<!DOCTYPE html PUBLIC ""
    "">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.column documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch current"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span 
class="top"></span><span class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li 
class="depth-4 branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li 
class="depth-4 branch"><a href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.column.html#var-clone"><div class="inner"><span>clone</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-column-map"><div class="inner"><span>column-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-column-name"><div class="inner"><span>column-name</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-correlation"><div class="inner"><span>correlation</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-extend-column-with-empty"><div class="inner"><span>extend-column-with-empty</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-intersect-missing-sets"><div class="inner"><span>intersect-missing-sets</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-is-column.3F"><div class="inner"><span>is-column?</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-is-missing.3F"><div class="inner"><span>is-missing?</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-missing"><div class="inner"><span>missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-new-column"><div class="inner"><span>new-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-parse-column"><div class="inner"><span>parse-column</span></div></a></li><li class="depth-1"><a 
href="tech.v3.dataset.column.html#var-prepend-column-with-empty"><div class="inner"><span>prepend-column-with-empty</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-select"><div class="inner"><span>select</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-set-missing"><div class="inner"><span>set-missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-set-name"><div class="inner"><span>set-name</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-stats"><div class="inner"><span>stats</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-string-table-keyset"><div class="inner"><span>string-table-keyset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-supported-stats"><div class="inner"><span>supported-stats</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-to-double-array"><div class="inner"><span>to-double-array</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-union-missing-sets"><div class="inner"><span>union-missing-sets</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column.html#var-unique"><div class="inner"><span>unique</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.column</h1><div class="doc"><div class="markdown"></div></div><div class="public anchor" id="var-clone"><h3>clone</h3><div class="usage"><code>(clone col)</code></div><div class="doc"><div class="markdown"><p>Clone this column not changing anything.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L116">view source</a></div></div><div class="public anchor" id="var-column-map"><h3>column-map</h3><div class="usage"><code>(column-map map-fn res-dtype & args)</code></div><div class="doc"><div class="markdown"><p>Map a scalar function across one or more columns.
This is the semi-missing-set aware version of tech.v3.datatype/emap. This function
is never lazy.</p>
<p>If res-dtype is nil then the result is scanned to infer datatype and
missing set. res-dtype may also be a map of options:</p>
<p>Options:</p>
<ul>
<li><code>:datatype</code> - Set the datatype of the result column. If not given, the result is scanned
to infer the result datatype and missing set.</li>
<li><code>:missing-fn</code> - If given, the columns are first passed to missing-fn as a sequence and
its return value dictates the missing set. Otherwise the missing set is found by scanning the results
during the inference process. See <code>tech.v3.dataset.column/union-missing-sets</code> and
<code>tech.v3.dataset.column/intersect-missing-sets</code> for example functions to pass in
here.</li>
</ul>
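<p>A minimal sketch of the options-map form described above. The alias <code>ds-col</code>, the column names, and the sample values are illustrative assumptions, not part of the original documentation. It also assumes that <code>nil</code> entries are picked up as missing when <code>new-column</code> scans the data.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.column :as ds-col])

;; Two hypothetical columns; the nil entries are assumed to be
;; recorded as missing when new-column scans the data.
(def a (ds-col/new-column :a [1.0 2.0 nil 4.0]))
(def b (ds-col/new-column :b [10.0 nil 30.0 40.0]))

;; Pass an options map in place of res-dtype: the result datatype is
;; fixed and the missing set is taken from the union of the inputs
;; instead of being inferred by scanning the results.
(def summed
  (ds-col/column-map (fn [x y] (+ x y))
                     {:datatype :float64
                      :missing-fn ds-col/union-missing-sets}
                     a b))

;; The missing set of the result should cover every index that is
;; missing in either input column.
(ds-col/missing summed)
</code></pre>
<p>Swapping in <code>ds-col/intersect-missing-sets</code> would instead mark an index as missing only when it is missing in every input column.</p>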
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L200">view source</a></div></div><div class="public anchor" id="var-column-name"><h3>column-name</h3><div class="usage"><code>(column-name col)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L33">view source</a></div></div><div class="public anchor" id="var-correlation"><h3>correlation</h3><div class="usage"><code>(correlation lhs rhs correlation-type)</code></div><div class="doc"><div class="markdown"><p>Correlation coefficient for given 2 columns. Available correlation types
are <code>:pearson</code>, <code>:spearman</code>, and <code>:kendall</code>.</p>
<p>Returns a floating point number in the range [-1, 1].</p>
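<p>As an illustration, the call might look like the following. The alias <code>ds-col</code> and the sample data are assumptions made for this sketch.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.column :as ds-col])

;; Hypothetical data: y is an exact increasing linear function of x.
(def x (ds-col/new-column :x [1.0 2.0 3.0 4.0 5.0]))
(def y (ds-col/new-column :y [2.0 4.0 6.0 8.0 10.0]))

;; For perfectly linear, increasing data the Pearson coefficient
;; should sit at the top of the range, up to floating point error.
(ds-col/correlation x y :pearson)

;; The rank-based variants take the same arguments.
(ds-col/correlation x y :spearman)
(ds-col/correlation x y :kendall)
</code></pre>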
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L92">view source</a></div></div><div class="public anchor" id="var-extend-column-with-empty"><h3>extend-column-with-empty</h3><div class="usage"><code>(extend-column-with-empty column n-empty)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L179">view source</a></div></div><div class="public anchor" id="var-intersect-missing-sets"><h3>intersect-missing-sets</h3><div class="usage"><code>(intersect-missing-sets col-seq)</code></div><div class="doc"><div class="markdown"><p>Intersect the missing sets of the columns returning a roaring bitmap</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L245">view source</a></div></div><div class="public anchor" id="var-is-column.3F"><h3>is-column?</h3><div class="usage"><code>(is-column? item)</code></div><div class="doc"><div class="markdown"><p>Return true if this item is a column.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L26">view source</a></div></div><div class="public anchor" id="var-is-missing.3F"><h3>is-missing?</h3><div class="usage"><code>(is-missing? col idx)</code></div><div class="doc"><div class="markdown"><p>Return true if this index is missing.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L56">view source</a></div></div><div class="public anchor" id="var-missing"><h3>missing</h3><div class="usage"><code>(missing col)</code></div><div class="doc"><div class="markdown"><p>Indexes of missing values. Both iterable and reader.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L50">view source</a></div></div><div class="public anchor" id="var-new-column"><h3>new-column</h3><div class="usage"><code>(new-column name data)</code><code>(new-column name data metadata)</code><code>(new-column name data metadata missing)</code><code>(new-column data-or-data-map)</code></div><div class="doc"><div class="markdown"><p>Create a new column. Data will be scanned for missing values
unless the full 4-argument pathway is used.</p>
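<p>The two pathways could be contrasted roughly as follows. This is a sketch under assumptions: the alias <code>ds-col</code>, the column name, and the data are hypothetical. It further assumes that the <code>missing</code> argument accepts a plain sequence of indexes and that an empty map is acceptable as metadata.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.column :as ds-col])

;; Two-argument form: the data is scanned, so the nil entry is
;; expected to end up in the missing set.
(def scanned (ds-col/new-column :price [1.5 nil 3.0]))

;; Four-argument form: no scan, the caller supplies the missing set.
;; Assumption: a plain sequence of indexes is accepted here.
(def explicit
  (ds-col/new-column :price (double-array [1.5 0.0 3.0]) {} [1]))

(ds-col/missing scanned)
(ds-col/is-missing? explicit 1)
</code></pre>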
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L166">view source</a></div></div><div class="public anchor" id="var-parse-column"><h3>parse-column</h3><div class="usage"><code>(parse-column datatype col options)</code><code>(parse-column datatype col)</code></div><div class="doc"><div class="markdown"><p>parse a text or a str column, returning a new column with the same name but with
a different datatype. This method is single-threaded.</p>
<p>parser-fn-or-kwd is nil by default and can be the keyword :relaxed? or a function that
must return one of: the parsed value; :tech.v3.dataset/missing, in which case a
missing value will be added; or :tech.v3.dataset/parse-failure, in which case
a missing index will be added and the string value will be recorded in the metadata's
:unparsed-data and :unparsed-indexes entries.</p>
<p>Options:</p>
<p>Roughly the same options as ->dataset; <code>:text-temp-file</code> may be of particular interest.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L139">view source</a></div></div><div class="public anchor" id="var-prepend-column-with-empty"><h3>prepend-column-with-empty</h3><div class="usage"><code>(prepend-column-with-empty column n-empty)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L184">view source</a></div></div><div class="public anchor" id="var-select"><h3>select</h3><div class="usage"><code>(select col selection)</code></div><div class="doc"><div class="markdown"><p>Return a new column with the subset of indexes based on the provided <code>selection</code>.
<code>selection</code> can be a list of indexes to select or a sequence of boolean values, where the
position of each true element indicates an index to select. When supplying a list
of indexes, duplicates are allowed and select the specified position more
than once.</p>
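<p>Both selection styles can be sketched as below. The alias <code>ds-col</code> and the sample column are assumptions for illustration only.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.column :as ds-col])

(def col (ds-col/new-column :v [10 20 30 40]))

;; Index list: position 0 is named twice, so it is selected twice.
(def by-index (ds-col/select col [0 0 3]))

;; Boolean selection: positions 0 and 2 hold true, so those are kept.
(def by-mask (ds-col/select col [true false true false]))
</code></pre>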
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L107">view source</a></div></div><div class="public anchor" id="var-set-missing"><h3>set-missing</h3><div class="usage"><code>(set-missing col idx-seq)</code></div><div class="doc"><div class="markdown"><p>Set the missing indexes for a column. This doesn't change any values in the
underlying data store.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L63">view source</a></div></div><div class="public anchor" id="var-set-name"><h3>set-name</h3><div class="usage"><code>(set-name col name)</code></div><div class="doc"><div class="markdown"><p>Return a new column.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L38">view source</a></div></div><div class="public anchor" id="var-stats"><h3>stats</h3><div class="usage"><code>(stats col stats-set)</code></div><div class="doc"><div class="markdown"><p>Return a map of stats. Stats set is a set of the desired stats in keyword
form. Support for :mean, :variance, :median, and :skew is guaranteed across implementations.
Implementations should check their metadata before doing calculations.</p>
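<p>A short sketch of requesting a subset of stats. The alias <code>ds-col</code> and the sample values are assumptions. Only keywords from the guaranteed list are requested.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.column :as ds-col])

(def col (ds-col/new-column :v [1.0 2.0 3.0 4.0]))

;; Ask which stats this column implementation offers.
(ds-col/supported-stats col)

;; Request two of the guaranteed stats; the result is a map keyed by
;; the requested keywords.
(def summary (ds-col/stats col #{:mean :median}))
</code></pre>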
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L84">view source</a></div></div><div class="public anchor" id="var-string-table-keyset"><h3>string-table-keyset</h3><div class="usage"><code>(string-table-keyset col)</code></div><div class="doc"><div class="markdown"><p>Get the string table for this column. Returns nil if this isn't a string column.
This doesn't necessarily tell you the unique set of the column unless you have just
parsed a file. When non-nil, it is a superset of the strings in the
column.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L122">view source</a></div></div><div class="public anchor" id="var-supported-stats"><h3>supported-stats</h3><div class="usage"><code>(supported-stats col)</code></div><div class="doc"><div class="markdown"><p>List of available stats for the column</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L44">view source</a></div></div><div class="public anchor" id="var-to-double-array"><h3>to-double-array</h3><div class="usage"><code>(to-double-array col & [error-on-missing?])</code></div><div class="doc"><div class="markdown"><p>Convert to a java primitive array of a given datatype. For strings,
an implicit string->double mapping is expected. For booleans, true=1 false=0.
Finally, any missing values should be indicated by a NaN of the expected type.</p>
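<p>The missing-value behaviour might be checked with something like the following. The alias <code>ds-col</code> and the data are assumptions. The optional <code>error-on-missing?</code> argument is left out because its exact behaviour is not described here.</p>
<pre><code class="clojure">(require '[tech.v3.dataset.column :as ds-col])

;; Hypothetical column with one missing entry at index 1.
(def col (ds-col/new-column :v [1.0 nil 3.0]))

;; Convert to a primitive double array.
(def arr (ds-col/to-double-array col))

;; Per the description above, the missing slot should read back as NaN.
(Double/isNaN (nth arr 1))
</code></pre>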
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L189">view source</a></div></div><div class="public anchor" id="var-union-missing-sets"><h3>union-missing-sets</h3><div class="usage"><code>(union-missing-sets col-seq)</code></div><div class="doc"><div class="markdown"><p>Union the missing sets of the columns returning a roaring bitmap</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L239">view source</a></div></div><div class="public anchor" id="var-unique"><h3>unique</h3><div class="usage"><code>(unique col)</code></div><div class="doc"><div class="markdown"><p>Set of all unique values</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column.clj#L78">view source</a></div></div></div></body></html>