<!DOCTYPE html PUBLIC ""
    "">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.io.csv documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch current"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span 
class="top"></span><span class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li 
class="depth-4 branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li 
class="depth-4 branch"><a href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.io.csv.html#var-csv-.3Edataset"><div class="inner"><span>csv->dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.csv.html#var-csv-.3Edataset-seq"><div class="inner"><span>csv->dataset-seq</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.csv.html#var-rows-.3Ecsv.21"><div class="inner"><span>rows->csv!</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.csv.html#var-rows-.3Edataset-fn"><div class="inner"><span>rows->dataset-fn</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.io.csv</h1><div class="doc"><div class="markdown"><p>CSV parsing based on <a href="https://cnuernber.github.io/charred/">charred.api/read-csv</a>.</p>
</div></div><div class="public anchor" id="var-csv-.3Edataset"><h3>csv->dataset</h3><div class="usage"><code>(csv->dataset input & [options])</code></div><div class="doc"><div class="markdown"><p>Read a csv into a dataset. Same options as <a href="tech.v3.dataset.html#var--.3Edataset">tech.v3.dataset/->dataset</a>.</p>
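<p>A minimal sketch of a call, assuming the input may be a <code>java.io.InputStream</code> and that the <code>:key-fn</code> option of <code>->dataset</code> carries over as the text above suggests. The inline CSV data and the <code>ds-csv</code> alias are made up for illustration:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.io.csv :as ds-csv])

;; Hypothetical in-memory CSV: one header row and two data rows.
(def csv-bytes (.getBytes "id,price\n1,9.5\n2,12.25\n" "UTF-8"))

;; :key-fn is a ->dataset option (assumed to apply here too); it is
;; used to turn the header strings into keyword column names.
(def prices
  (ds-csv/csv->dataset (java.io.ByteArrayInputStream. csv-bytes)
                       {:key-fn keyword}))

;; Inspect the result.
(ds/column-names prices)
(ds/row-count prices)
</code></pre>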
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/csv.clj#L125">view source</a></div></div><div class="public anchor" id="var-csv-.3Edataset-seq"><h3>csv->dataset-seq</h3><div class="usage"><code>(csv->dataset-seq input & [options])</code></div><div class="doc"><div class="markdown"><p>Read a csv into a lazy sequence of datasets. All options of <a href="tech.v3.dataset.html#var--.3Edataset">tech.v3.dataset/->dataset</a>
are supported aside from <code>:n-initial-skip-rows</code>, with an additional option of
<code>:batch-size</code> which defaults to 128000.</p>
<p>Options are passed through to
<a href="https://cnuernber.github.io/charred/charred.bulk.html#var-batch-csv-rows">charred.bulk/batch-csv-rows</a>,
renaming where necessary. This function defaults to using a load thread; see the function linked above
for more options. To disable the load thread use <code>:csv-load-thread-name nil</code>.</p>
<p>When using multithreaded loading, options are also passed through to
<a href="https://cnuernber.github.io/ham-fisted/ham-fisted.api.html#var-pmap-opts">ham-fisted.api/pmap-opts</a>,
so you can change the <code>:n-lookahead</code> the pmap operation uses when submitting jobs
to the thread pool. By default this is set to 4 to reduce the chance of OOM situations.</p>
<p>The input will only be closed once the entire sequence is realized.</p>
<p>Options:</p>
<ul>
<li><code>:load-tfn</code> - dataset->x transformation function to be performed
in the same thread context just after the dataset is loaded. Doing some operations
in this transform function can be considerably more efficient than only loading
the dataset when using multithreaded loading.</li>
</ul>
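<p>The batching options described above might be used as in the following sketch. The inline CSV data and the deliberately tiny batch size are illustrative, and passing a <code>java.io.InputStream</code> as the input is an assumption:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.io.csv :as ds-csv])

;; Hypothetical in-memory CSV: one header row and five data rows.
(def csv-bytes
  (.getBytes "id,price\n1,9.5\n2,12.25\n3,7.0\n4,3.5\n5,8.75\n" "UTF-8"))

(def batches
  (ds-csv/csv->dataset-seq
   (java.io.ByteArrayInputStream. csv-bytes)
   {:batch-size 2                 ; documented default is 128000
    :csv-load-thread-name nil}))  ; disable the load thread

;; mapv is eager, so this realizes the entire sequence, after which
;; the input is closed.
(mapv ds/row-count batches)
</code></pre>
<p>Because the input is only closed once the sequence is fully realized, code that stops consuming early may want to manage the stream itself.</p>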
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/csv.clj#L83">view source</a></div></div><div class="public anchor" id="var-rows-.3Ecsv.21"><h3>rows->csv!</h3><div class="usage"><code>(rows->csv! output headers rows)</code><code>(rows->csv! output headers rows {:keys [separator], :or {separator \tab}, :as options})</code></div><div class="doc"><div class="markdown"><p>Given something convertible to an output stream, an optional set of headers
as string arrays, and a sequence of string arrays, write a CSV or a TSV file.</p>
<p>Options:</p>
<ul>
<li><code>:separator</code> - Defaults to <code>\tab</code>.</li>
<li><code>:quote</code> - Defaults to <code>"</code>.</li>
<li><code>:quote?</code> - A predicate function which determines if a string should be quoted.
Defaults to quoting only when necessary. May also be the value <code>true</code>, in which
case every field is quoted.</li>
<li><code>:newline</code> - <code>:lf</code> (default) or <code>:cr+lf</code>.</li>
<li><code>:close-writer?</code> - Defaults to true. When true, close the writer when finished.</li>
</ul>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/csv.clj#L164">view source</a></div></div><div class="public anchor" id="var-rows-.3Edataset-fn"><h3>rows->dataset-fn</h3><div class="usage"><code>(rows->dataset-fn {:keys [header-row?], :or {header-row? true}, :as options})</code></div><div class="doc"><div class="markdown"><p>Create an efficiently callable function to parse row-batches into datasets.
Returns a function from row-iter->dataset. Options passed in here are the
same as ->dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/csv.clj#L28">view source</a></div></div></div></body></html>