<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.io.univocity documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span class="top"></span><span 
class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5 current"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-4 
branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li class="depth-4 branch"><a 
href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.io.univocity.html#var-create-csv-parser"><div class="inner"><span>create-csv-parser</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.univocity.html#var-csv-.3Edataset"><div class="inner"><span>csv->dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.univocity.html#var-csv-.3Erows"><div class="inner"><span>csv->rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.univocity.html#var-PApplyWriteOptions"><div class="inner"><span>PApplyWriteOptions</span></div></a></li><li class="depth-2"><a href="tech.v3.dataset.io.univocity.html#var-apply-write-options.21"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apply-write-options!</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.univocity.html#var-raw-row-iterable"><div class="inner"><span>raw-row-iterable</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.io.univocity.html#var-rows-.3Ecsv.21"><div class="inner"><span>rows->csv!</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.io.univocity</h1><div class="doc"><div class="markdown"><p>Bindings to univocity. Transforms csv's, tsv's into sequences
of string arrays that are then passed into <code>tech.v3.dataset.io.string-row-parser</code>
methods.</p>
</div></div><div class="public anchor" id="var-create-csv-parser"><h3>create-csv-parser</h3><div class="usage"><code>(create-csv-parser {:keys [header-row? num-rows column-whitelist column-blacklist column-allowlist column-blocklist separator n-initial-skip-rows], :or {header-row? true}, :as options})</code></div><div class="doc"><div class="markdown"><p>Create an implementation of univocity csv parser.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/univocity.clj#L77">view source</a></div></div><div class="public anchor" id="var-csv-.3Edataset"><h3>csv->dataset</h3><div class="usage"><code>(csv->dataset input options)</code><code>(csv->dataset input)</code></div><div class="doc"><div class="markdown"><p>Non-lazily and serially parse the columns. Returns a vector of maps of
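the form:</p>
<ul>
<li><code>:name</code> - column-name</li>
<li><code>:missing</code> - long-reader of in-order missing indexes</li>
<li><code>:data</code> - typed reader/writer of data</li>
<li><code>:metadata</code> - optional map with unparsed-indexes and unparsed-values</li>
</ul>
<p>Supports a subset of <code>tech.v3.dataset/->dataset</code> options:</p>
<ul>
<li><code>:column-allowlist</code> - in preference to <code>:column-whitelist</code></li>
<li><code>:column-blocklist</code> - in preference to <code>:column-blacklist</code></li>
<li><code>:n-initial-skip-rows</code></li>
<li><code>:num-rows</code></li>
<li><code>:header-row?</code></li>
<li><code>:separator</code></li>
<li><code>:parser-fn</code></li>
<li><code>:parser-scan-len</code></li>
</ul>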
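<p>A minimal sketch of a call using a few of the options listed above. The file path <code>data.csv</code> and the column name <code>"price"</code> are hypothetical, and the sketch assumes that <code>input</code> may be given as a path string and that <code>:parser-fn</code> accepts a map from column name to datatype, as it does for <code>tech.v3.dataset/->dataset</code>.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.io.univocity :as univocity])

;; "data.csv" and the "price" column are placeholders for illustration only.
(def parsed
  (univocity/csv->dataset "data.csv"
                          {:header-row? true           ; first row holds column names
                           :separator \,               ; use a comma as the separator
                           :num-rows 100               ; read at most 100 rows
                           :parser-fn {"price" :float64}}))
</code></pre>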
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/univocity.clj#L209">view source</a></div></div><div class="public anchor" id="var-csv-.3Erows"><h3>csv->rows</h3><div class="usage"><code>(csv->rows input options)</code><code>(csv->rows input)</code></div><div class="doc"><div class="markdown"><p>Given a csv, produces a sequence of rows. The csv options from ->dataset
apply here.</p>
<p>Options:</p>
<ul>
<li><code>:column-allowlist</code> - Either a sequence of string column names or a sequence of column
indices to include. Preferred over <code>:column-whitelist</code>.</li>
<li><code>:column-blocklist</code> - Either a sequence of string column names or a sequence of column
indices to exclude. Preferred over <code>:column-blacklist</code>.</li>
<li><code>:num-rows</code> - Number of rows to read.</li>
<li><code>:separator</code> - Add a character separator to the list of separators to auto-detect.</li>
<li><code>:max-chars-per-column</code> - Defaults to 4096. Columns with more characters than this
will result in an exception.</li>
<li><code>:max-num-columns</code> - Defaults to 8192. CSV or TSV files with more columns than this
will fail to parse. For more information on this option, see
<a href="https://github.com/uniVocity/univocity-parsers/issues/301">https://github.com/uniVocity/univocity-parsers/issues/301</a>.</li>
</ul>
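<p>The options above might be combined as in the following sketch. The file path and column names are hypothetical, and the sketch assumes <code>input</code> may be given as a path string.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.io.univocity :as univocity])

;; "data.csv", "id", and "name" are placeholders for illustration only.
(def rows
  (univocity/csv->rows "data.csv"
                       {:column-allowlist ["id" "name"] ; keep only these columns
                        :num-rows 100                   ; read at most 100 rows
                        :max-chars-per-column 65536}))  ; raise the 4096 default

;; Print the first few rows as vectors of strings.
(doseq [row (take 5 rows)]
  (println (vec row)))
</code></pre>
<p>Raising <code>:max-chars-per-column</code> is one way to avoid the exception described above when a file is known to contain long text fields.</p>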
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/univocity.clj#L184">view source</a></div></div><div class="public anchor" id="var-PApplyWriteOptions"><h3>PApplyWriteOptions</h3><h4 class="type">protocol</h4><div class="usage"></div><div class="doc"><div class="markdown"></div></div><div class="members"><h4>members</h4><div class="inner"><div class="public anchor" id="var-apply-write-options.21"><h3>apply-write-options!</h3><div class="usage"><code>(apply-write-options! settings options)</code></div><div class="doc"><div class="markdown"></div></div></div></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/univocity.clj#L257">view source</a></div></div><div class="public anchor" id="var-raw-row-iterable"><h3>raw-row-iterable</h3><div class="usage"><code>(raw-row-iterable input parser)</code><code>(raw-row-iterable input)</code></div><div class="doc"><div class="markdown"><p>Returns an iterable that produces string[]'s</p>
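<p>As a rough sketch, a parser built with <code>create-csv-parser</code> can be passed as the second argument. The file path is hypothetical, and the sketch assumes <code>input</code> may be given as a path string.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.io.univocity :as univocity])

;; Ask the parser to limit how many rows it reads; "data.csv" is a placeholder.
(let [parser (univocity/create-csv-parser {:num-rows 10})]
  (doseq [row (univocity/raw-row-iterable "data.csv" parser)]
    ;; Each row is a string array; convert it to a vector for printing.
    (println (vec row))))
</code></pre>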
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/univocity.clj#L172">view source</a></div></div><div class="public anchor" id="var-rows-.3Ecsv.21"><h3>rows->csv!</h3><div class="usage"><code>(rows->csv! output header-string-array row-string-array-seq)</code><code>(rows->csv! output header-string-array row-string-array-seq {:keys [separator], :or {separator \tab}, :as options})</code></div><div class="doc"><div class="markdown"><p>Given something convertible to an output stream, an optional set of headers
as a string array, and a sequence of string arrays, write a CSV or a TSV file.</p>
<p>Options:</p>
<ul>
<li><code>:separator</code> - Defaults to <code>\tab</code>.</li>
<li><code>:quoted-columns</code> - For csv, specify which columns should always be quoted
regardless of their data.</li>
</ul>
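<p>A minimal sketch of writing a small comma-separated file. The output path <code>out.csv</code> and the data are hypothetical, and the sketch assumes a path string counts as something convertible to an output stream.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.io.univocity :as univocity])

;; The header and each row are Java string arrays.
(let [header (into-array String ["id" "name"])
      rows   [(into-array String ["1" "alice"])
              (into-array String ["2" "bob"])]]
  ;; Pass a comma explicitly, because the separator defaults to \tab.
  (univocity/rows->csv! "out.csv" header rows {:separator \,}))
</code></pre>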
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/io/univocity.clj#L274">view source</a></div></div></div></body></html>