2026-02-08 11:20:43 -10:00


<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.io.csv documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch current"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span 
class="top"></span><span class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li 
class="depth-4 branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li 
class="depth-4 branch"><a href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.io.csv.html#var-csv-.3Edataset"><div class="inner"><span>csv-&gt;dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.csv.html#var-csv-.3Edataset-seq"><div class="inner"><span>csv-&gt;dataset-seq</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.csv.html#var-rows-.3Ecsv.21"><div class="inner"><span>rows-&gt;csv!</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.csv.html#var-rows-.3Edataset-fn"><div class="inner"><span>rows-&gt;dataset-fn</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.io.csv</h1><div class="doc"><div class="markdown"><p>CSV parsing based on <a href="https://cnuernber.github.io/charred/">charred.api/read-csv</a>.</p>
</div></div><div class="public anchor" id="var-csv-.3Edataset"><h3>csv-&gt;dataset</h3><div class="usage"><code>(csv-&gt;dataset input &amp; [options])</code></div><div class="doc"><div class="markdown"><p>Read a csv into a dataset. Same options as <a href="tech.v3.dataset.html#var--.3Edataset">tech.v3.dataset/-&gt;dataset</a>.</p>
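<p>A minimal sketch of a call, assuming a <code>java.io.Reader</code> is an acceptable input and that the <code>:key-fn</code> option of <code>-&gt;dataset</code> applies here as the sentence above states. The temporary file and its contents are hypothetical sample data, not part of the library:</p>
<pre><code class="language-clojure">(require '[clojure.java.io :as io]
         '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.io.csv :as csv])

;; Hypothetical sample data written to a temporary file.
(def tmp-file (java.io.File/createTempFile "people" ".csv"))
(spit tmp-file "id,name,score\n1,ann,9.5\n2,bob,7.25\n")

;; Assumption: a java.io.Reader is an acceptable input.
;; :key-fn keyword turns the header strings into keyword column names.
(def people
  (csv/csv-&gt;dataset (io/reader tmp-file) {:key-fn keyword}))

(ds/row-count people)    ; number of data rows, header excluded
(ds/column-names people) ; column names after :key-fn is applied
</code></pre>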
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/csv.clj#L125">view source</a></div></div><div class="public anchor" id="var-csv-.3Edataset-seq"><h3>csv-&gt;dataset-seq</h3><div class="usage"><code>(csv-&gt;dataset-seq input &amp; [options])</code></div><div class="doc"><div class="markdown"><p>Read a csv into a lazy sequence of datasets. All options of <a href="tech.v3.dataset.html#var--.3Edataset">tech.v3.dataset/-&gt;dataset</a>
are supported except <code>:n-initial-skip-rows</code>, with an additional option of
<code>:batch-size</code>, which defaults to 128000.</p>
<p>Options are passed through to
<a href="https://cnuernber.github.io/charred/charred.bulk.html#var-batch-csv-rows">charred.bulk/batch-csv-rows</a>
renaming where necessary. This function defaults to using a load thread; see the function linked above
for more options. To disable the load thread, use <code>:csv-load-thread-name nil</code>.</p>
<p>When using multithreaded loading, options are also passed through to
<a href="https://cnuernber.github.io/ham-fisted/ham-fisted.api.html#var-pmap-opts">ham-fisted.api/pmap-opts</a>
so you can change the <code>:n-lookahead</code> value the pmap operation uses when submitting jobs
to the thread pool. By default this is set to 4 to reduce the chance of OOM situations.</p>
<p>The input will only be closed once the entire sequence is realized.</p>
<p>Options:</p>
<ul>
<li><code>:load-tfn</code> - dataset-&gt;x transformation function performed
in the same thread context just after the dataset is loaded. Doing some operations
in this transform function can be considerably more efficient than only loading
the dataset when using multithreaded loading.</li>
</ul>
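<p>The options above can be combined roughly as follows. This is a sketch under assumptions: it assumes a <code>java.io.Reader</code> is an acceptable input, the temporary file is hypothetical sample data, and the tiny <code>:batch-size</code> is chosen only to force several batches. The load thread is disabled with <code>:csv-load-thread-name nil</code> to keep the example single-threaded:</p>
<pre><code class="language-clojure">(require '[clojure.java.io :as io]
         '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.io.csv :as csv])

;; Hypothetical sample data: a header line plus ten data rows.
(def tmp-file (java.io.File/createTempFile "batches" ".csv"))
(spit tmp-file
      (apply str "id,score\n"
             (map (fn [i] (str i "," (* 2 i) "\n")) (range 10))))

;; Each batch is parsed into its own dataset. :load-tfn runs on every
;; freshly loaded dataset; here it keeps only the "score" column.
(def batch-row-counts
  (mapv ds/row-count
        (csv/csv-&gt;dataset-seq
         (io/reader tmp-file)
         {:batch-size 4
          :csv-load-thread-name nil
          :load-tfn (fn [batch] (ds/select-columns batch ["score"]))})))

;; mapv realizes the whole sequence, so the input is closed afterwards.
(reduce + batch-row-counts) ; total number of data rows across batches
</code></pre>
<p>Realizing the sequence eagerly, as <code>mapv</code> does here, is one way to make sure the input gets closed.</p>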
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/csv.clj#L83">view source</a></div></div><div class="public anchor" id="var-rows-.3Ecsv.21"><h3>rows-&gt;csv!</h3><div class="usage"><code>(rows-&gt;csv! output headers rows)</code><code>(rows-&gt;csv! output headers rows {:keys [separator], :or {separator \tab}, :as options})</code></div><div class="doc"><div class="markdown"><p>Given something convertible to an output stream, an optional set of headers
as string arrays, and a sequence of string arrays, write a CSV or a TSV file.</p>
<p>Options:</p>
<ul>
<li><code>:separator</code> - Defaults to <code>\tab</code>.</li>
<li><code>:quote</code> - Defaults to <code>"</code>.</li>
<li><code>:quote?</code> - A predicate function which determines if a string should be quoted.
Defaults to quoting only when necessary. May also be the value <code>true</code>, in which
case every field is quoted.</li>
<li><code>:newline</code> - <code>:lf</code> (default) or <code>:cr+lf</code>.</li>
<li><code>:close-writer?</code> - Defaults to true. When true, close writer when finished.</li>
</ul>
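<p>A minimal sketch of writing a comma-separated file, assuming a <code>java.io.File</code> counts as something convertible to an output stream. The file, headers, and rows are hypothetical sample data. Because the separator defaults to <code>\tab</code>, it is overridden explicitly:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset.io.csv :as csv])

;; Hypothetical output location.
(def out-file (java.io.File/createTempFile "scores" ".csv"))

;; Headers and rows are Java String arrays, matching the description above.
(def headers (into-array String ["name" "score"]))
(def rows [(into-array String ["ann" "9.5"])
           (into-array String ["bob" "7.25"])])

;; Override the default tab separator to get comma-separated output.
(csv/rows-&gt;csv! out-file headers rows {:separator \,})

(slurp out-file) ; read the written file back as a string
</code></pre>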
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/csv.clj#L164">view source</a></div></div><div class="public anchor" id="var-rows-.3Edataset-fn"><h3>rows-&gt;dataset-fn</h3><div class="usage"><code>(rows-&gt;dataset-fn {:keys [header-row?], :or {header-row? true}, :as options})</code></div><div class="doc"><div class="markdown"><p>Create an efficiently callable function to parse row-batches into datasets.
Returns a function that maps a row-iter to a dataset. Options passed in here are the
same as those for <code>-&gt;dataset</code>.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/csv.clj#L28">view source</a></div></div></div></body></html>