df-research/tech.ml.dataset/docs/tech.v3.dataset.neanderthal.html
2026-02-08 11:20:43 -10:00


<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.neanderthal documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">7.031</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span class="top"></span><span 
class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch current"><a href="tech.v3.dataset.neanderthal.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>neanderthal</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a 
href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -672px;"><span class="top" style="height: 681px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a 
href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>smile</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.smile.data.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>data</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.neanderthal.html#var-dataset-.3Edense"><div class="inner"><span>dataset-&gt;dense</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.neanderthal.html#var-dense-.3Edataset"><div class="inner"><span>dense-&gt;dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.neanderthal.html#var-fit-pca"><div class="inner"><span>fit-pca</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.neanderthal.html#var-fit-pca.21"><div class="inner"><span>fit-pca!</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.neanderthal.html#var-neanderthal-enabled.3F"><div class="inner"><span>neanderthal-enabled?</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.neanderthal.html#var-transform-pca"><div class="inner"><span>transform-pca</span></div></a></li><li class="depth-1"><a 
href="tech.v3.dataset.neanderthal.html#var-transform-pca.21"><div class="inner"><span>transform-pca!</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.neanderthal</h1><div class="doc"><div class="markdown"><p>Conversion of a dataset to/from a neanderthal dense matrix as well as various
dataset transformations such as PCA, covariance, and correlation matrices.</p>
<p>Please include these additional dependencies in your project:</p>
<pre><code class="language-clojure"> [uncomplicate/neanderthal "0.45.0"]
</code></pre>
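<p>For projects built with deps.edn rather than Leiningen, the same coordinate could be written as below. This is a sketch of the equivalent entry and is not part of the original docs; the version is simply the one listed above.</p>
<pre><code class="language-clojure">;; deps.edn equivalent of the Leiningen coordinate above
{:deps {uncomplicate/neanderthal {:mvn/version "0.45.0"}}}
</code></pre>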
</div></div><div class="public anchor" id="var-dataset-.3Edense"><h3>dataset-&gt;dense</h3><div class="usage"><code>(dataset-&gt;dense dataset neanderthal-layout datatype)</code><code>(dataset-&gt;dense dataset neanderthal-layout)</code><code>(dataset-&gt;dense dataset)</code></div><div class="doc"><div class="markdown"><p>Convert a dataset into a dense neanderthal CPU matrix. If the matrix
is column-major, then potentially you can get accelerated copies from the dataset
into neanderthal.</p>
<ul>
<li>neanderthal-layout - either :column for a column-major matrix or :row for a row-major
matrix.</li>
<li>datatype - either :float64 or :float32</li>
</ul>
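<p>A minimal sketch of a conversion, assuming the neanderthal dependency is on the classpath. The dataset, the var names, and the <code>ds-neanderthal</code> alias are made up for illustration; the round trip uses dense-&gt;dataset from this namespace.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.neanderthal :as ds-neanderthal])

;; Hypothetical two-column dataset with three rows of doubles.
(def example-ds
  (ds/-&gt;dataset {:a [1.0 2.0 3.0]
                 :b [4.0 5.0 6.0]}))

;; Column-major, double-precision dense matrix.
(def dense-matrix
  (ds-neanderthal/dataset-&gt;dense example-ds :column :float64))

;; Convert the matrix columns back into dataset columns.
(def round-tripped
  (ds-neanderthal/dense-&gt;dataset dense-matrix))
</code></pre>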
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/neanderthal.clj#L31">view source</a></div></div><div class="public anchor" id="var-dense-.3Edataset"><h3>dense-&gt;dataset</h3><div class="usage"><code>(dense-&gt;dataset matrix)</code></div><div class="doc"><div class="markdown"><p>Given a neanderthal matrix, convert its columns into the columns of a
tech.v3.dataset. This does the conversion in-place. If you would like to copy
the neanderthal matrix into JVM arrays, then use dtype/clone after calling this method.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/neanderthal.clj#L61">view source</a></div></div><div class="public anchor" id="var-fit-pca"><h3>fit-pca</h3><div class="usage"><code>(fit-pca dataset {:keys [n-components variance-amount], :or {variance-amount 0.95}, :as options})</code><code>(fit-pca dataset)</code></div><div class="doc"><div class="markdown"><p>Run PCA on the dataset. Dataset must not have missing values
or non-numeric string columns.</p>
<p>Keep in mind that PCA may be highly influenced by outliers in the dataset,
and a probabilistic or auto-encoder-based dimensionality reduction may be
more effective for your problem.</p>
<p>Returns pca-info:</p>
<ul>
<li>:means - vec of means</li>
<li>:eigenvalues - vec of eigenvalues</li>
<li>:eigenvectors - matrix of eigenvectors</li>
</ul>
<p>Use transform-pca with a dataset and the returned value to perform
PCA on a dataset.</p>
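<p>The fit-then-transform workflow might look like the sketch below. The dataset, the var names, and the <code>ds-neanderthal</code> alias are hypothetical; only fit-pca, transform-pca, and the <code>:n-components</code> option come from this page.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.neanderthal :as ds-neanderthal])

;; Hypothetical numeric dataset: no missing values, no string columns.
(def example-ds
  (ds/-&gt;dataset {:x [1.0 2.0 3.0 4.0 5.0]
                 :y [2.1 3.9 6.2 8.1 9.8]
                 :z [0.5 0.4 0.7 0.3 0.6]}))

;; Fit once, asking for two components.
(def pca-info
  (ds-neanderthal/fit-pca example-ds {:n-components 2}))

;; Project the dataset using the fitted result.
(def projected
  (ds-neanderthal/transform-pca example-ds pca-info))
</code></pre>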
<p>Options:</p>
<ul>
<li>method - svd, cov - Either use SVD or covariance based method. SVD is faster
but covariance method means the post-projection variances are accurate.
Defaults to :cov. Both methods produce similar projection matrices.</li>
<li>variance-amount - fractional amount of variance to keep. Defaults to 0.95.</li>
<li>n-components - If provided overrides variance amount and sets the number of
components to keep. This controls the number of result columns directly as an
integer.</li>
<li>covariance-bias? - When using :cov, divide by n-rows if true and (dec n-rows)
if false. Defaults to false.</li>
</ul>
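<p>To make the variance-amount option concrete, here is one common way a component count can be derived from eigenvalues: keep the smallest number of leading components whose cumulative share of the total variance reaches the requested fraction. This is an illustrative assumption, not a confirmed description of the library's internals, and <code>components-for-variance</code> is a hypothetical helper.</p>
<pre><code class="language-clojure">;; Hypothetical helper, not part of tech.v3.dataset.neanderthal.
(defn components-for-variance
  "Smallest number of leading eigenvalues whose cumulative share of the
  total reaches variance-amount. Assumes eigenvalues are sorted descending."
  [eigenvalues variance-amount]
  (let [total      (reduce + eigenvalues)
        cumulative (reductions + eigenvalues)
        ;; Count the leading components that still fall short of the target.
        short      (count (take-while #(&lt; (/ % total) variance-amount)
                                      cumulative))]
    ;; One more component crosses the target; never exceed what is available.
    (min (count eigenvalues) (inc short))))

;; Cumulative shares are 0.6, 0.9, 0.98, 1.0, so three components are kept.
(components-for-variance [6.0 3.0 0.8 0.2] 0.95) ;; =&gt; 3
</code></pre>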
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/neanderthal.clj#L197">view source</a></div></div><div class="public anchor" id="var-fit-pca.21"><h3>fit-pca!</h3><div class="usage"><code>(fit-pca! tensor {:keys [method covariance-bias?], :or {method :cov, covariance-bias? false}, :as _options})</code><code>(fit-pca! tensor)</code></div><div class="doc"><div class="markdown"><p>Run Principal Component Analysis on a tensor.</p>
<p>Keep in mind that PCA may be highly influenced by outliers in the dataset,
and a probabilistic or auto-encoder-based dimensionality reduction may be
more effective for your problem.</p>
<p>Returns a map of:</p>
<ul>
<li>:means - vec of means</li>
<li>:eigenvalues - vec of eigenvalues. These are the variances of the columns of the
post-projected tensor if :cov is used, and approximations of them if :svd is used.</li>
<li>:eigenvectors - matrix of eigenvectors</li>
</ul>
<p>Options:</p>
<ul>
<li>method - svd, cov - Either use SVD or covariance based method. SVD is faster
but covariance method means the post-projection variances are accurate. Both
methods produce an identical or extremely similar projection matrix. Defaults
to <code>:cov</code>.</li>
<li>covariance-bias? - When using :cov, divide by n-rows if true and (dec n-rows)
if false. Defaults to false.</li>
</ul>
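<p>The covariance-bias? option only changes the divisor. The sketch below shows the effect on the variance of a single column; <code>column-variance</code> is a hypothetical helper written for illustration, while the library itself computes a full covariance matrix.</p>
<pre><code class="language-clojure">;; Hypothetical helper illustrating the divisor choice only.
(defn column-variance
  "Variance of values, dividing by n when bias? is true and by n - 1 otherwise."
  [values bias?]
  (let [n      (count values)
        mean   (/ (reduce + values) n)
        sum-sq (reduce + (map #(let [d (- % mean)] (* d d)) values))]
    (/ sum-sq (if bias? n (dec n)))))

;; The sum of squared deviations from the mean 2.5 is 5.0.
(column-variance [1.0 2.0 3.0 4.0] true)  ;; =&gt; 1.25, i.e. 5.0 / 4
(column-variance [1.0 2.0 3.0 4.0] false) ;; 5.0 / 3, slightly larger
</code></pre>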
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/neanderthal.clj#L71">view source</a></div></div><div class="public anchor" id="var-neanderthal-enabled.3F"><h3>neanderthal-enabled?</h3><div class="usage"><code>(neanderthal-enabled?)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/neanderthal.clj#L192">view source</a></div></div><div class="public anchor" id="var-transform-pca"><h3>transform-pca</h3><div class="usage"><code>(transform-pca dataset {:keys [n-components result-datatype], :as pca-transform})</code></div><div class="doc"><div class="markdown"><p>PCA transform the dataset returning a new dataset. The method used to generate the pca information
is indicated in the metadata of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/neanderthal.clj#L260">view source</a></div></div><div class="public anchor" id="var-transform-pca.21"><h3>transform-pca!</h3><div class="usage"><code>(transform-pca! tensor pca-info n-components)</code></div><div class="doc"><div class="markdown"><p>PCA transform the tensor returning a new tensor. Mean-centers
the tensor in-place.</p>
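<p>A rough sketch of the tensor-level workflow, assuming the tensors come from <code>tech.v3.tensor</code>. The row data, the var names, and the <code>fresh-tensor</code> helper are hypothetical. Because the tensor is mean-centered in place, the sketch builds a fresh tensor for each call rather than reusing one.</p>
<pre><code class="language-clojure">(require '[tech.v3.tensor :as dtt]
         '[tech.v3.dataset.neanderthal :as ds-neanderthal])

;; Hypothetical data: five observations of three features.
(def rows [[1.0 2.1 0.5]
           [2.0 3.9 0.4]
           [3.0 6.2 0.7]
           [4.0 8.1 0.3]
           [5.0 9.8 0.6]])

;; Hypothetical helper: a new tensor per call, so in-place changes made by
;; the bang functions never leak into a later call.
(defn fresh-tensor []
  (dtt/-&gt;tensor rows :datatype :float64))

;; Fit using the SVD method instead of the default :cov.
(def pca-info
  (ds-neanderthal/fit-pca! (fresh-tensor) {:method :svd}))

;; Project onto the first two components.
(def projected
  (ds-neanderthal/transform-pca! (fresh-tensor) pca-info 2))
</code></pre>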
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/neanderthal.clj#L153">view source</a></div></div></div></body></html>