<!DOCTYPE html PUBLIC ""
    "">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.math documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span class="top"></span><span 
class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch current"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-4 
branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li class="depth-4 branch"><a 
href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.math.html#var-correlation-table"><div class="inner"><span>correlation-table</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-fill-range-replace"><div class="inner"><span>fill-range-replace</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-fit-minmax"><div class="inner"><span>fit-minmax</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-fit-std-scale"><div class="inner"><span>fit-std-scale</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-interpolate-loess"><div class="inner"><span>interpolate-loess</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-transform-minmax"><div class="inner"><span>transform-minmax</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-transform-std-scale"><div class="inner"><span>transform-std-scale</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.math</h1><div class="doc"><div class="markdown"><p>Various mathematic transformations of datasets such as (inefficiently)
building simple tables, PCA, and normalizing columns to have a mean of 0 and a variance of 1.
More in-depth transformations are found at <code>tech.v3.dataset.neanderthal</code>.</p>
</div></div><div class="public anchor" id="var-correlation-table"><h3>correlation-table</h3><div class="usage"><code>(correlation-table dataset & {:keys [correlation-type colname-seq]})</code></div><div class="doc"><div class="markdown"><p>Return a map of colname->list of sorted tuples of [colname, coefficient].
Sort is:
(sort-by (comp #(Math/abs (double %)) second) >)</p>
<p>Thus the first entry for each column is the column itself:
[colname, 1.0]</p>
<p>There are three possible correlation types:
:pearson
:spearman
:kendall</p>
<p>:pearson is the default.</p>
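<p>A minimal usage sketch, not taken from the library's own documentation. The dataset and its column names are hypothetical, and it is an assumption that <code>:colname-seq</code> restricts the result to the listed columns.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.math :as ds-math])

;; Hypothetical toy data; the column names are illustrative only.
(def prices
  (ds/->dataset {:sqft  [800.0 950.0 1200.0 1500.0 2000.0]
                 :rooms [2.0 2.0 3.0 4.0 5.0]
                 :price [150.0 170.0 230.0 300.0 410.0]}))

;; With no options the correlation type is :pearson.
(ds-math/correlation-table prices)

;; Options are keyword arguments. Here: rank correlation over two columns
;; (assumes :colname-seq limits which columns are reported).
(ds-math/correlation-table prices
                           :correlation-type :spearman
                           :colname-seq [:sqft :price])
</code></pre>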
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L37">view source</a></div></div><div class="public anchor" id="var-fill-range-replace"><h3>fill-range-replace</h3><div class="usage"><code>(fill-range-replace ds colname max-span)</code><code>(fill-range-replace ds colname max-span missing-strategy)</code><code>(fill-range-replace ds colname max-span missing-strategy missing-value)</code></div><div class="doc"><div class="markdown"><p>Given an in-order column of a numeric or datetime type, fill in spans that are
larger than the given max-span. The source column must not have missing values.
For more documentation on fill-range, see <code>tech.v3.datatype.functional/fill-range</code>.</p>
<p>If the column is a datetime type, the operation happens in millisecond space and
max-span may be a datetime type convertible to milliseconds.</p>
<p>The result column has the same datatype as the input column.</p>
<p>After the operation, if the missing strategy is not nil, the newly produced missing
values, along with the existing missing values, will be replaced using the given
missing strategy for all other columns. See
<code>tech.v3.dataset.missing/replace-missing</code> for documentation on missing strategies.
The missing strategy defaults to :down unless explicitly set.</p>
<p>Returns a new dataset.</p>
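<p>A minimal sketch of the three arities, under stated assumptions: the dataset and column names are hypothetical, and <code>:value</code> is assumed to be one of the strategies accepted by <code>replace-missing</code>.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.math :as ds-math])

;; Hypothetical readings with a large gap in :t between 2.0 and 10.0.
(def readings
  (ds/->dataset {:t     [0.0 1.0 2.0 10.0]
                 :value [1.0 2.0 3.0 4.0]}))

;; Fill spans in :t that are larger than 2.0; the other columns use the
;; default :down missing strategy.
(ds-math/fill-range-replace readings :t 2.0)

;; Pass nil to leave the newly produced rows missing in the other columns.
(ds-math/fill-range-replace readings :t 2.0 nil)

;; Assumed strategy: replace missing entries with a constant value.
(ds-math/fill-range-replace readings :t 2.0 :value 0.0)
</code></pre>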
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L160">view source</a></div></div><div class="public anchor" id="var-fit-minmax"><h3>fit-minmax</h3><div class="usage"><code>(fit-minmax dataset {:keys [min max], :or {min -0.5, max 0.5}, :as options})</code><code>(fit-minmax dataset)</code></div><div class="doc"><div class="markdown"><p>nan-aware min-max fit of the dataset. Returns an object that can be used
in transform-minmax. The target min and max default to -0.5 and 0.5.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L291">view source</a></div></div><div class="public anchor" id="var-fit-std-scale"><h3>fit-std-scale</h3><div class="usage"><code>(fit-std-scale dataset {:keys [mean? stddev?], :or {mean? true, stddev? true}, :as options})</code><code>(fit-std-scale dataset)</code></div><div class="doc"><div class="markdown"><p>Calculate nan-aware means, stddev - per-column - of a dataset.</p>
<p>Options are passed through to
tech.v3.datatype.statistics/descriptive-statistics.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L238">view source</a></div></div><div class="public anchor" id="var-interpolate-loess"><h3>interpolate-loess</h3><div class="usage"><code>(interpolate-loess ds x-colname y-colname {:keys [bandwidth iterations accuracy result-name], :or {bandwidth 0.75, iterations 4, accuracy LoessInterpolator/DEFAULT_ACCURACY}})</code><code>(interpolate-loess ds x-colname y-colname)</code></div><div class="doc"><div class="markdown"><p>Interpolate using the LOESS regression engine. Useful for smoothing out graphs.</p>
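<p>A minimal usage sketch. The data and column names are hypothetical, and it is an assumption that <code>:result-name</code> names the column holding the smoothed values.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.math :as ds-math])

;; Hypothetical noisy samples of a roughly linear trend; :x is increasing.
(def noisy
  (ds/->dataset {:x (mapv double (range 10))
                 :y [1.0 2.1 2.9 4.2 4.8 6.1 7.2 7.9 9.1 9.8]}))

;; Default bandwidth (0.75), iterations (4), and accuracy.
(ds-math/interpolate-loess noisy :x :y)

;; A narrower bandwidth follows the data more closely.
(ds-math/interpolate-loess noisy :x :y {:bandwidth 0.5
                                        :result-name :y-smooth})
</code></pre>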
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L112">view source</a></div></div><div class="public anchor" id="var-transform-minmax"><h3>transform-minmax</h3><div class="usage"><code>(transform-minmax dataset {:keys [min max column-data]})</code></div><div class="doc"><div class="markdown"><p>Scale columns listed in the min-max transform to the mins and maxes dictated
by that transform.</p>
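<p>A minimal sketch of the fit-then-transform workflow, assuming the object returned by <code>fit-minmax</code> is passed unchanged as the second argument. The datasets and column names are hypothetical.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.math :as ds-math])

;; Hypothetical training and held-out data with the same columns.
(def train (ds/->dataset {:a [1.0 2.0 3.0 4.0]
                          :b [10.0 20.0 30.0 40.0]}))
(def held-out (ds/->dataset {:a [2.5 3.5]
                             :b [15.0 35.0]}))

;; Fit on the training data only; the target range here is [0.0, 1.0]
;; instead of the default [-0.5, 0.5].
(def minmax-fit (ds-math/fit-minmax train {:min 0.0 :max 1.0}))

;; Apply the same fitted transform to both datasets.
(ds-math/transform-minmax train minmax-fit)
(ds-math/transform-minmax held-out minmax-fit)
</code></pre>
<p>Fitting once and reusing the result keeps the held-out data scaled by the training data's per-column minimum and maximum.</p>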
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L312">view source</a></div></div><div class="public anchor" id="var-transform-std-scale"><h3>transform-std-scale</h3><div class="usage"><code>(transform-std-scale dataset std-scale-xform)</code></div><div class="doc"><div class="markdown"><p>Given a dataset and a standard scale transform return a new dataset
with the columns scaled according to that transform.</p>
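<p>A minimal sketch of standard scaling, assuming the object returned by <code>fit-std-scale</code> is the <code>std-scale-xform</code> argument. The dataset and column names are hypothetical.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.math :as ds-math])

;; Hypothetical numeric data.
(def measurements
  (ds/->dataset {:height [150.0 160.0 170.0 180.0 190.0]
                 :weight [50.0 60.0 65.0 80.0 95.0]}))

;; Compute per-column means and standard deviations.
(def std-fit (ds-math/fit-std-scale measurements))

;; Returns a new dataset; the input dataset is left unchanged.
(ds-math/transform-std-scale measurements std-fit)
</code></pre>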
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L262">view source</a></div></div></div></body></html>