<!DOCTYPE html PUBLIC ""
    "">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.set documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());

gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span class="top"></span><span 
class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-4 branch 
current"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li class="depth-4 branch"><a 
href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.set.html#var-difference"><div class="inner"><span>difference</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.set.html#var-intersection"><div class="inner"><span>intersection</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.set.html#var-reduce-intersection"><div class="inner"><span>reduce-intersection</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.set.html#var-reduce-union"><div class="inner"><span>reduce-union</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.set.html#var-union"><div class="inner"><span>union</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.set</h1><div class="doc"><div class="markdown"><p>Extensions to datasets for per-row set operations with bag semantics: union, intersection, and difference.</p>
</div></div><div class="public anchor" id="var-difference"><h3>difference</h3><div class="usage"><code>(difference a)</code><code>(difference a b)</code></div><div class="doc"><div class="markdown"><p>Remove tuples from a that also appear in b.</p>
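<p>A minimal sketch of <code>difference</code>, assuming the <code>ds</code> and <code>ds-set</code> aliases used in the REPL examples on this page. The datasets <code>left</code> and <code>right</code> are illustrative names, and the sample rows deliberately avoid duplicates because the handling of repeated tuples is not described here.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.set :as ds-set])

;; Illustrative data: `left` holds two distinct tuples, `right` shares one of them.
(def left  (ds/->dataset [{:a 1 :b 2} {:a 2 :b 3}]))
(def right (ds/->dataset [{:a 1 :b 2} {:a 3 :b 3}]))

;; Tuples of `left` that also appear in `right` are removed, so only the
;; {:a 2 :b 3} row should remain.
(def left-minus-right (ds-set/difference left right))

;; Inspect the surviving rows as plain maps.
(mapv #(into {} %) (ds/rows left-minus-right))
</code></pre>
<p>The single-argument arity, <code>(difference a)</code>, takes only one dataset, so there is nothing to subtract from it.</p>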
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/set.clj#L183">view source</a></div></div><div class="public anchor" id="var-intersection"><h3>intersection</h3><div class="usage"><code>(intersection a)</code><code>(intersection a b)</code><code>(intersection a b & args)</code></div><div class="doc"><div class="markdown"><p>Intersect the given datasets, producing a new dataset with the intersection of tuples.
Tuples that appear in every dataset are repeated in the final dataset at their minimum
per-dataset repetition count.</p>
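<p>A short sketch of the variadic arity, assuming the <code>ds</code> and <code>ds-set</code> aliases used in the REPL examples on this page. The dataset names <code>ds-x</code>, <code>ds-y</code>, and <code>ds-z</code> and their rows are made up for illustration.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.set :as ds-set])

;; Illustrative data: {:a 1 :b 2} occurs 2, 2, and 1 times respectively.
(def ds-x (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 2 :b 3}]))
(def ds-y (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 3 :b 3}]))
(def ds-z (ds/->dataset [{:a 1 :b 2} {:a 2 :b 3}]))

;; Only {:a 1 :b 2} appears in all three datasets. Its minimum
;; per-dataset count is 1, so a single copy should be kept.
;; {:a 2 :b 3} is missing from ds-y, so it should be dropped.
(def common (ds-set/intersection ds-x ds-y ds-z))

(ds/row-count common)
</code></pre>
<p>When an explicit count column is wanted instead of duplicated rows, <code>reduce-intersection</code> with the <code>:count</code> option is the documented route.</p>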
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/set.clj#L174">view source</a></div></div><div class="public anchor" id="var-reduce-intersection"><h3>reduce-intersection</h3><div class="usage"><code>(reduce-intersection options datasets)</code><code>(reduce-intersection datasets)</code></div><div class="doc"><div class="markdown"><p>Given a sequence of datasets, intersect the rows such that tuples that exist in all datasets
appear in the final dataset at their minimum repetition count. Can return either a
dataset with duplicate tuples or a dataset with a :count column.</p>
<p>Options:</p>
<ul>
<li><code>:count</code> - Name of the count column. If nil, tuples are duplicated and the count is implicit.</li>
</ul>
<pre><code class="language-clojure">user> (require '[tech.v3.dataset :as ds] '[tech.v3.dataset.set :as ds-set])
nil
user> (def ds-a (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 2 :b 3}]))
#'user/ds-a
user> (def ds-b (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 3 :b 3}]))
#'user/ds-b
user> (ds-set/reduce-intersection [ds-a ds-b])
_unnamed [2 2]:

| :a | :b |
|---:|---:|
|  1 |  2 |
|  1 |  2 |
user> (ds-set/reduce-intersection {:count :count} [ds-a ds-b])
_unnamed [1 3]:

| :a | :b | :count |
|---:|---:|-------:|
|  1 |  2 |      2 |
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/set.clj#L72">view source</a></div></div><div class="public anchor" id="var-reduce-union"><h3>reduce-union</h3><div class="usage"><code>(reduce-union options datasets)</code><code>(reduce-union datasets)</code></div><div class="doc"><div class="markdown"><p>Given a sequence of datasets, union the rows such that all tuples appear in the final
dataset at their maximum repetition count. Can return either a dataset with duplicate
tuples or a dataset with a :count column.</p>
<p>Options:</p>
<ul>
<li><code>:count</code> - Name of the count column. If nil, tuples are duplicated and the count is implicit.</li>
</ul>
<pre><code class="language-clojure">user> (require '[tech.v3.dataset :as ds] '[tech.v3.dataset.set :as ds-set])
nil
user> (def ds-a (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 2 :b 3}]))
#'user/ds-a
user> (def ds-b (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 3 :b 3}]))
#'user/ds-b
user> (ds-set/reduce-union [ds-a ds-b])
_unnamed [4 2]:

| :a | :b |
|---:|---:|
|  2 |  3 |
|  3 |  3 |
|  1 |  2 |
|  1 |  2 |
user> (ds-set/reduce-union {:count :count} [ds-a ds-b])
_unnamed [3 3]:

| :a | :b | :count |
|---:|---:|-------:|
|  2 |  3 |      1 |
|  3 |  3 |      1 |
|  1 |  2 |      2 |
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/set.clj#L118">view source</a></div></div><div class="public anchor" id="var-union"><h3>union</h3><div class="usage"><code>(union a)</code><code>(union a b)</code><code>(union a b & args)</code></div><div class="doc"><div class="markdown"><p>Union the given datasets, producing a new dataset with the union of tuples. Repeated tuples
appear in the final dataset at their maximum per-dataset repetition count.</p>
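<p>A brief sketch of the maximum-count rule across three datasets, assuming the <code>ds</code> and <code>ds-set</code> aliases used in the REPL examples on this page. The dataset names <code>ds-x</code>, <code>ds-y</code>, and <code>ds-z</code> and their rows are illustrative only.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.set :as ds-set])

;; Illustrative data: {:a 1 :b 2} occurs 2, 2, and 3 times respectively.
(def ds-x (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 2 :b 3}]))
(def ds-y (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 3 :b 3}]))
(def ds-z (ds/->dataset [{:a 1 :b 2} {:a 1 :b 2} {:a 1 :b 2}]))

;; Each distinct tuple is kept at its maximum per-dataset count:
;; {:a 1 :b 2} three times, {:a 2 :b 3} once, {:a 3 :b 3} once,
;; which should give five rows in total.
(def combined (ds-set/union ds-x ds-y ds-z))

(ds/row-count combined)
</code></pre>
<p>This differs from plain concatenation, which would keep all nine input rows. Row order in the result is not something this sketch relies on.</p>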
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/set.clj#L166">view source</a></div></div></div></body></html>