<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.column-filters documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch current"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span 
class="top"></span><span class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li 
class="depth-4 branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li 
class="depth-4 branch"><a href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-boolean"><div class="inner"><span>boolean</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-categorical"><div class="inner"><span>categorical</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-column-filter"><div class="inner"><span>column-filter</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-datetime"><div class="inner"><span>datetime</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-difference"><div class="inner"><span>difference</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-feature"><div class="inner"><span>feature</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-intersection"><div class="inner"><span>intersection</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-metadata-filter"><div class="inner"><span>metadata-filter</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-missing"><div class="inner"><span>missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-no-missing"><div class="inner"><span>no-missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-numeric"><div class="inner"><span>numeric</span></div></a></li><li 
class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-of-datatype"><div class="inner"><span>of-datatype</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-prediction"><div class="inner"><span>prediction</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-probability-distribution"><div class="inner"><span>probability-distribution</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-string"><div class="inner"><span>string</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-target"><div class="inner"><span>target</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.column-filters.html#var-union"><div class="inner"><span>union</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.column-filters</h1><div class="doc"><div class="markdown"><p>Queries to select column subsets that have various properties, such as all numeric
columns, all feature columns, or columns that have a specific datatype.</p>
<p>A few set operations (union, intersection, difference) are also provided
for combining subsets of columns.</p>
<p>All functions are transformations from dataset to dataset.</p>
<p>The functions in this namespace use the metadata on the columns of the dataset, which can be inspected via <code>clojure.core/meta</code>.</p>
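<p>A minimal sketch of the dataset-in, dataset-out style, assuming TMD is on the classpath. It also uses <code>tech.v3.dataset/->dataset</code>, <code>tech.v3.dataset/column</code> and <code>tech.v3.dataset/column-names</code> from the main namespace (not documented on this page), and assumes that <code>nil</code> in an input vector is read as a missing value. Because every filter returns a dataset, filters compose with <code>-></code>.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

;; A small mixed dataset; nil marks a missing value in column :b.
(def sample
  (ds/->dataset {:a [1 2 3]
                 :b [1.5 nil 3.5]
                 :c ["x" "y" "z"]}))

;; The filters consult column metadata such as :datatype.
(def a-datatype (:datatype (meta (ds/column sample :a))))

;; Keep the numeric columns, then drop any that contain missing values.
(def complete-numeric
  (-> sample cf/numeric cf/no-missing))

(println a-datatype (ds/column-names complete-numeric))
</code></pre>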
</div></div><div class="public anchor" id="var-boolean"><h3>boolean</h3><div class="usage"><code>(boolean dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset containing only the boolean columns.</p>
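<p>For example, assuming TMD is on the classpath and that <code>tech.v3.dataset/->dataset</code> (from the main namespace) infers a <code>:boolean</code> column from a vector of <code>true</code>/<code>false</code> values:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def flags
  (ds/->dataset {:flag [true false true]
                 :n    [1 2 3]}))

;; Only :flag holds boolean values.
(def bool-cols (cf/boolean flags))

(println (ds/column-names bool-cols))
</code></pre>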
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L63">view source</a></div></div><div class="public anchor" id="var-categorical"><h3>categorical</h3><div class="usage"><code>(categorical dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset containing only the categorical columns.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L44">view source</a></div></div><div class="public anchor" id="var-column-filter"><h3>column-filter</h3><div class="usage"><code>(column-filter dataset filter-fn)</code></div><div class="doc"><div class="markdown"><p>Return a dataset with only the columns for which the filter function returns a truthy
value.</p>
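<p>A sketch, assuming the filter function is called once per column with the column itself, and that a column can be consumed as a sequence of its values. The data and the predicate are illustrative only; <code>tech.v3.dataset/->dataset</code> and <code>tech.v3.dataset/column-names</code> come from the main namespace.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def readings
  (ds/->dataset {:a [1 2 3]
                 :b [-1 2 3]}))

;; The predicate receives each column; it is treated here as a sequence
;; of values, and the column is kept only if every value is positive.
(def all-positive
  (cf/column-filter readings (fn [col] (every? pos? col))))

(println (ds/column-names all-positive))
</code></pre>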
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L22">view source</a></div></div><div class="public anchor" id="var-datetime"><h3>datetime</h3><div class="usage"><code>(datetime dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset containing only the datetime columns.</p>
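<p>A sketch assuming that a vector of <code>java.time.LocalDate</code> values passed to <code>tech.v3.dataset/->dataset</code> yields a column whose datatype counts as a datetime datatype:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def events
  (ds/->dataset {:day   [(java.time.LocalDate/of 2024 1 1)
                         (java.time.LocalDate/of 2024 1 2)]
                 :count [10 20]}))

;; :day holds dates; :count is a plain integer column.
(def date-cols (cf/datetime events))

(println (ds/column-names date-cols))
</code></pre>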
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L114">view source</a></div></div><div class="public anchor" id="var-difference"><h3>difference</h3><div class="usage"><code>(difference lhs-ds rhs-ds)</code><code>(difference lhs-ds)</code></div><div class="doc"><div class="markdown"><p>Return the columns in lhs which do not have an equivalently named column in
rhs.</p>
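<p>Only column names are compared. A minimal sketch of the two-argument form, assuming TMD is on the classpath; the result is checked as a set of names because this page does not state the resulting column order.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def lhs (ds/->dataset {:a [1] :b [2] :c [3]}))
(def rhs (ds/->dataset {:b [0]}))

;; Columns of lhs whose names do not appear in rhs.
(def only-in-lhs (cf/difference lhs rhs))

(println (ds/column-names only-in-lhs))
</code></pre>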
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L143">view source</a></div></div><div class="public anchor" id="var-feature"><h3>feature</h3><div class="usage"><code>(feature dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset containing only the columns which have not been marked as inference
targets.</p>
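<p>A sketch assuming that <code>tech.v3.dataset.modelling/set-inference-target</code> (documented in the modelling namespace, not here) is what marks a column as an inference target in its metadata:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf]
         '[tech.v3.dataset.modelling :as ds-mod])

(def training
  (-> (ds/->dataset {:x1 [1 2 3]
                     :x2 [4 5 6]
                     :y  [0 1 0]})
      ;; Mark :y as the inference target.
      (ds-mod/set-inference-target :y)))

;; Everything that is not a target is treated as a feature.
(def features (cf/feature training))

(println (ds/column-names features))
</code></pre>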
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L95">view source</a></div></div><div class="public anchor" id="var-intersection"><h3>intersection</h3><div class="usage"><code>(intersection lhs-ds rhs-ds)</code></div><div class="doc"><div class="markdown"><p>Return only the columns of rhs for which an equivalently named column exists in lhs.</p>
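<p>A minimal sketch with one shared column name, assuming TMD is on the classpath. Per the docstring, the returned column is taken from <code>rhs</code>; the check below only looks at names.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def lhs (ds/->dataset {:a [1] :b [2]}))
(def rhs (ds/->dataset {:b [20] :c [30]}))

;; :b is the only name present in both datasets.
(def shared (cf/intersection lhs rhs))

(println (ds/column-names shared))
</code></pre>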
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L120">view source</a></div></div><div class="public anchor" id="var-metadata-filter"><h3>metadata-filter</h3><div class="usage"><code>(metadata-filter dataset filter-fn)</code></div><div class="doc"><div class="markdown"><p>Return a dataset with only the columns for which, given the column metadata,
the filter function returns a truthy value.</p>
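<p>Unlike <code>column-filter</code>, the predicate here sees each column's metadata map (the map returned by <code>meta</code>), so it can test keys such as <code>:name</code> or <code>:datatype</code> directly. A sketch, assuming TMD is on the classpath:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def wide (ds/->dataset {:a [1] :b [2] :c [3]}))

;; Keep columns whose metadata :name is in a fixed set.
(def picked
  (cf/metadata-filter wide #(#{:a :c} (:name %))))

(println (ds/column-names picked))
</code></pre>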
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L33">view source</a></div></div><div class="public anchor" id="var-missing"><h3>missing</h3><div class="usage"><code>(missing dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset with only the columns that have missing values.</p>
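<p>A sketch assuming that <code>nil</code> entries passed to <code>tech.v3.dataset/->dataset</code> are recorded as missing values:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

;; :a has one missing entry; :b is complete.
(def sparse (ds/->dataset {:a [1 nil 3] :b [4 5 6]}))

(def with-gaps (cf/missing sparse))

(println (ds/column-names with-gaps))
</code></pre>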
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L102">view source</a></div></div><div class="public anchor" id="var-no-missing"><h3>no-missing</h3><div class="usage"><code>(no-missing dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset with only columns that have no missing values.</p>
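<p>By definition, <code>missing</code> and <code>no-missing</code> split a dataset's columns into two disjoint groups. A sketch of that check, again assuming that <code>nil</code> input values become missing values:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def sparse (ds/->dataset {:a [1 nil 3]
                           :b [4 5 6]
                           :c [nil nil 9]}))

;; Only :b has no missing entries.
(def complete (cf/no-missing sparse))

;; Together the two filters should cover every column exactly once.
(def partitioned?
  (= (set (ds/column-names sparse))
     (into (set (ds/column-names complete))
           (ds/column-names (cf/missing sparse)))))

(println (ds/column-names complete) partitioned?)
</code></pre>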
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L108">view source</a></div></div><div class="public anchor" id="var-numeric"><h3>numeric</h3><div class="usage"><code>(numeric dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset containing only the numeric columns.</p>
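<p>A sketch assuming that integer and floating-point vectors are inferred as numeric datatypes and string vectors as <code>:string</code> by <code>tech.v3.dataset/->dataset</code>:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def mixed (ds/->dataset {:i [1 2]
                          :f [1.5 2.5]
                          :s ["a" "b"]}))

;; :i (integers) and :f (doubles) are numeric; :s is not.
(def nums (cf/numeric mixed))

(println (ds/column-names nums))
</code></pre>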
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L50">view source</a></div></div><div class="public anchor" id="var-of-datatype"><h3>of-datatype</h3><div class="usage"><code>(of-datatype dataset datatype)</code></div><div class="doc"><div class="markdown"><p>Return a dataset containing only the columns of a specific datatype.</p>
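<p>The datatype argument is a keyword such as <code>:float64</code>, presumably matched against the <code>:datatype</code> key of the column metadata. A sketch, assuming Clojure doubles are inferred as <code>:float64</code> and longs as <code>:int64</code>:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def mixed (ds/->dataset {:i [1 2]
                          :f [1.5 2.5]}))

;; Select the double-precision column only.
(def floats (cf/of-datatype mixed :float64))

(println (ds/column-names floats))
</code></pre>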
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L57">view source</a></div></div><div class="public anchor" id="var-prediction"><h3>prediction</h3><div class="usage"><code>(prediction dataset)</code></div><div class="doc"><div class="markdown"><p>Return the columns of the dataset marked as predictions.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L89">view source</a></div></div><div class="public anchor" id="var-probability-distribution"><h3>probability-distribution</h3><div class="usage"><code>(probability-distribution dataset)</code></div><div class="doc"><div class="markdown"><p>Return the columns of the dataset that comprise the probability distribution
after classification.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L82">view source</a></div></div><div class="public anchor" id="var-string"><h3>string</h3><div class="usage"><code>(string dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset containing only the string columns.</p>
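<p>A minimal sketch, assuming TMD is on the classpath and that a vector of Clojure strings is inferred as a <code>:string</code> column by <code>tech.v3.dataset/->dataset</code>:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def people (ds/->dataset {:name ["Ann" "Bo"]
                           :age  [31 42]}))

;; :name holds text; :age is numeric.
(def text-cols (cf/string people))

(println (ds/column-names text-cols))
</code></pre>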
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L69">view source</a></div></div><div class="public anchor" id="var-target"><h3>target</h3><div class="usage"><code>(target dataset)</code></div><div class="doc"><div class="markdown"><p>Return a dataset containing only the columns that have been marked as inference
targets.</p>
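<p>Several columns can be targets at once. A sketch, assuming <code>tech.v3.dataset.modelling/set-inference-target</code> (from the modelling namespace) accepts either a single column name or a sequence of names:</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf]
         '[tech.v3.dataset.modelling :as ds-mod])

(def multi
  (-> (ds/->dataset {:x  [1 2 3]
                     :y1 [0 1 0]
                     :y2 [1 1 0]})
      ;; Mark both :y1 and :y2 as inference targets.
      (ds-mod/set-inference-target [:y1 :y2])))

(def targets (cf/target multi))

(println (ds/column-names targets))
</code></pre>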
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L75">view source</a></div></div><div class="public anchor" id="var-union"><h3>union</h3><div class="usage"><code>(union lhs-ds rhs-ds)</code></div><div class="doc"><div class="markdown"><p>Return all columns of lhs along with any columns in rhs which have names that
do not exist in lhs.</p>
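<p>A minimal sketch with one overlapping name, assuming TMD is on the classpath. Per the docstring, the <code>:b</code> column in the result comes from <code>lhs</code>; the check below only looks at names.</p>
<pre><code class="clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

(def lhs (ds/->dataset {:a [1] :b [2]}))
(def rhs (ds/->dataset {:b [20] :c [30]}))

;; All columns of lhs, plus :c from rhs (rhs's :b is not added,
;; since lhs already has a column named :b).
(def combined (cf/union lhs rhs))

(println (ds/column-names combined))
</code></pre>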
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/column_filters.clj#L130">view source</a></div></div></div></body></html>