<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.math documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span class="top"></span><span 
class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch current"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-4 
branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li class="depth-4 branch"><a 
href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.math.html#var-correlation-table"><div class="inner"><span>correlation-table</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-fill-range-replace"><div class="inner"><span>fill-range-replace</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-fit-minmax"><div class="inner"><span>fit-minmax</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-fit-std-scale"><div class="inner"><span>fit-std-scale</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-interpolate-loess"><div class="inner"><span>interpolate-loess</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-transform-minmax"><div class="inner"><span>transform-minmax</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.math.html#var-transform-std-scale"><div class="inner"><span>transform-std-scale</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.math</h1><div class="doc"><div class="markdown"><p>Various mathematic transformations of datasets such as (inefficiently)
building simple tables, PCA, and normalizing columns to have a mean of 0 and a variance of 1.
More in-depth transformations are found at <code>tech.v3.dataset.neanderthal</code>.</p>
</div></div><div class="public anchor" id="var-correlation-table"><h3>correlation-table</h3><div class="usage"><code>(correlation-table dataset &amp; {:keys [correlation-type colname-seq]})</code></div><div class="doc"><div class="markdown"><p>Return a map of colname-&gt;list of sorted tuples of <code>[colname, coefficient]</code>.
The sort is:
<code>(sort-by (comp #(Math/abs (double %)) second) &gt;)</code></p>
<p>Thus the first entry is:
<code>[colname, 1.0]</code></p>
<p>There are three possible correlation types:
<code>:pearson</code>, <code>:spearman</code>, and <code>:kendall</code>.</p>
<p>:pearson is the default.</p>
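<p>As a rough illustration of the result shape and sort order described above, the following plain-Clojure sketch builds a small Pearson table by hand. It does not call the library: <code>pearson</code> and <code>correlation-table-sketch</code> are hypothetical helper names, the column data is made up, and columns are modelled as a plain map of name to vector rather than a dataset.</p>
<pre><code class="clojure">(defn pearson
  "Pearson correlation coefficient of two equal-length seqs of doubles."
  [xs ys]
  (let [n    (count xs)
        mean (fn [v] (/ (reduce + v) n))
        mx   (mean xs)
        my   (mean ys)
        dx   (map #(- % mx) xs)
        dy   (map #(- % my) ys)
        dot  (fn [a b] (reduce + (map * a b)))]
    (/ (dot dx dy)
       (Math/sqrt (* (dot dx dx) (dot dy dy))))))

(defn correlation-table-sketch
  "Hypothetical stand-in: map of colname to a sorted seq of [colname coefficient]."
  [columns]
  (into {}
        (for [[lhs-name lhs] columns]
          [lhs-name
           ;; Same sort as documented: descending absolute coefficient.
           (sort-by (comp #(Math/abs (double %)) second) >
                    (for [[rhs-name rhs] columns]
                      [rhs-name (pearson lhs rhs)]))])))

(def corr-example-columns
  ;; Made-up data, chosen only so the coefficients are distinct.
  {:a [1.0 2.0 3.0 4.0]
   :b [1.0 3.0 2.0 4.0]
   :c [3.0 4.0 1.0 2.0]})

(def corr-table (correlation-table-sketch corr-example-columns))

;; Column names correlated with :a, strongest absolute correlation first.
(map first (get corr-table :a))
</code></pre>
<p>Because a column always correlates perfectly with itself, each list starts with the column's own name, and negatively correlated columns are ranked by magnitude rather than sign.</p>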
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L37">view source</a></div></div><div class="public anchor" id="var-fill-range-replace"><h3>fill-range-replace</h3><div class="usage"><code>(fill-range-replace ds colname max-span)</code><code>(fill-range-replace ds colname max-span missing-strategy)</code><code>(fill-range-replace ds colname max-span missing-strategy missing-value)</code></div><div class="doc"><div class="markdown"><p>Given an in-order column of a numeric or datetime type, fill in spans that are
larger than the given max-span. The source column must not have missing values.
For more documentation on fill-range, see <code>tech.v3.datatype.functional/fill-range</code>.</p>
<p>If the column is a datetime type the operation happens in millisecond space and
max-span may be a datetime type convertible to milliseconds.</p>
<p>The result column has the same datatype as the input column.</p>
<p>After the operation, if the missing strategy is not nil, the newly produced missing
values, along with any existing missing values, are replaced in all other columns
using the given missing strategy. See
<code>tech.v3.dataset.missing/replace-missing</code> for documentation on missing strategies.
The missing strategy defaults to <code>:down</code> unless explicitly set.</p>
<p>Returns a new dataset.</p>
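<p>The span-filling idea for the key column can be sketched in plain Clojure. This is only an illustration under stated assumptions, not the library's implementation: <code>fill-range-sketch</code> is a hypothetical name, the input is assumed to be a non-empty, in-order vector of doubles, and inserted values are assumed to be evenly spaced by linear interpolation. It does not model the other columns or the missing strategy.</p>
<pre><code class="clojure">(defn fill-range-sketch
  "Insert evenly spaced values wherever the gap between neighbours exceeds max-span."
  [xs max-span]
  (reduce (fn [acc b]
            (let [a     (peek acc)
                  gap   (- b a)
                  ;; Number of sub-steps needed so that no step exceeds max-span.
                  n     (max 1 (long (Math/ceil (/ gap max-span))))
                  step  (/ gap n)
                  ;; Interior points only; b itself is appended unchanged.
                  fills (map #(+ a (* % step)) (range 1 n))]
              (conj (into acc fills) b)))
          [(first xs)]
          (rest xs)))

(fill-range-sketch [0.0 1.0 5.0 6.0] 2.0)
;; => [0.0 1.0 3.0 5.0 6.0]
</code></pre>
<p>Only the gap between 1.0 and 5.0 exceeds the span of 2.0, so a single value is inserted there. In the dataset function, the rows created this way have no values in the other columns until the missing strategy fills them.</p>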
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L160">view source</a></div></div><div class="public anchor" id="var-fit-minmax"><h3>fit-minmax</h3><div class="usage"><code>(fit-minmax dataset {:keys [min max], :or {min -0.5, max 0.5}, :as options})</code><code>(fit-minmax dataset)</code></div><div class="doc"><div class="markdown"><p>nan-aware min-max fit of the dataset. Returns an object that can be used
in <code>transform-minmax</code>. The target min and max default to -0.5 and 0.5.</p>
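<p>The fit-then-transform arithmetic can be sketched in plain Clojure. This is a minimal sketch, assuming the conventional linear min-max rescaling; it is not the library's code. The helper names and the inner shape of <code>:column-data</code> are hypothetical, columns are modelled as a map of name to vector of doubles, and each column is assumed to be non-constant.</p>
<pre><code class="clojure">(defn fit-minmax-sketch
  "Record each column's NaN-ignoring min and max alongside the target range."
  [columns target-min target-max]
  {:min target-min
   :max target-max
   :column-data
   (into {}
         (for [[colname values] columns
               :let [xs (remove #(Double/isNaN (double %)) values)]]
           [colname {:min (apply min xs) :max (apply max xs)}]))})

(defn transform-minmax-sketch
  "Linearly rescale each column from its fitted range to the target range."
  [columns {:keys [column-data] target-min :min target-max :max}]
  (into {}
        (for [[colname values] columns
              :let [{col-min :min col-max :max} (get column-data colname)
                    scale (/ (- target-max target-min) (- col-max col-min))]]
          [colname (mapv #(+ target-min (* scale (- % col-min))) values)])))

;; Made-up column containing one NaN, which the fit step ignores.
(def minmax-example-columns {:a [0.0 Double/NaN 10.0 5.0]})

(def minmax-fit (fit-minmax-sketch minmax-example-columns -0.5 0.5))
;; (:column-data minmax-fit) => {:a {:min 0.0, :max 10.0}}

(def minmax-scaled (transform-minmax-sketch minmax-example-columns minmax-fit))
</code></pre>
<p>Keeping the fit result separate from the transform lets the ranges learned on one dataset be reapplied to another, which is the reason the library splits the operation into two functions.</p>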
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L291">view source</a></div></div><div class="public anchor" id="var-fit-std-scale"><h3>fit-std-scale</h3><div class="usage"><code>(fit-std-scale dataset {:keys [mean? stddev?], :or {mean? true, stddev? true}, :as options})</code><code>(fit-std-scale dataset)</code></div><div class="doc"><div class="markdown"><p>Calculate nan-aware means, stddev - per-column - of a dataset.</p>
<p>Options are passed through to
tech.v3.datatype.statistics/descriptive-statistics.</p>
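<p>A plain-Clojure sketch of per-column standard scaling follows. It is illustrative only and does not call the library: the helper names are hypothetical, columns are modelled as a map of name to vector of doubles, and the sample standard deviation (dividing by n - 1) is an assumption made for this example.</p>
<pre><code class="clojure">(defn fit-std-scale-sketch
  "Compute the NaN-ignoring mean and sample standard deviation of each column."
  [columns]
  (into {}
        (for [[colname values] columns
              :let [xs       (remove #(Double/isNaN (double %)) values)
                    n        (count xs)
                    mean     (/ (reduce + xs) n)
                    sq-devs  (map #(let [d (- % mean)] (* d d)) xs)
                    variance (/ (reduce + sq-devs) (dec n))]]
          [colname {:mean mean
                    :standard-deviation (Math/sqrt variance)}])))

(defn transform-std-scale-sketch
  "Subtract the fitted mean and divide by the fitted standard deviation."
  [columns fit]
  (into {}
        (for [[colname values] columns
              :let [{:keys [mean standard-deviation]} (get fit colname)]]
          [colname (mapv #(/ (- % mean) standard-deviation) values)])))

(def std-example-columns {:a [1.0 2.0 3.0]})

(def std-fit (fit-std-scale-sketch std-example-columns))
;; => {:a {:mean 2.0, :standard-deviation 1.0}}

(transform-std-scale-sketch std-example-columns std-fit)
;; => {:a [-1.0 0.0 1.0]}
</code></pre>
<p>The <code>mean?</code> and <code>stddev?</code> options shown in the usage above are not modelled here; the sketch always applies both steps.</p>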
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L238">view source</a></div></div><div class="public anchor" id="var-interpolate-loess"><h3>interpolate-loess</h3><div class="usage"><code>(interpolate-loess ds x-colname y-colname {:keys [bandwidth iterations accuracy result-name], :or {bandwidth 0.75, iterations 4, accuracy LoessInterpolator/DEFAULT_ACCURACY}})</code><code>(interpolate-loess ds x-colname y-colname)</code></div><div class="doc"><div class="markdown"><p>Interpolate using the LOESS regression engine. Useful for smoothing out graphs.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L112">view source</a></div></div><div class="public anchor" id="var-transform-minmax"><h3>transform-minmax</h3><div class="usage"><code>(transform-minmax dataset {:keys [min max column-data]})</code></div><div class="doc"><div class="markdown"><p>Scale columns listed in the min-max transform to the mins and maxes dictated
by that transform.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L312">view source</a></div></div><div class="public anchor" id="var-transform-std-scale"><h3>transform-std-scale</h3><div class="usage"><code>(transform-std-scale dataset std-scale-xform)</code></div><div class="doc"><div class="markdown"><p>Given a dataset and a standard scale transform return a new dataset
with the columns scaled according to that transform.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/math.clj#L262">view source</a></div></div></div></body></html>