<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.rolling documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span class="top"></span><span 
class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch current"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-4 
branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li class="depth-4 branch"><a 
href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-expanding"><div class="inner"><span>expanding</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-first"><div class="inner"><span>first</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-last"><div class="inner"><span>last</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-max"><div class="inner"><span>max</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-mean"><div class="inner"><span>mean</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-min"><div class="inner"><span>min</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-nth"><div class="inner"><span>nth</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-rolling"><div class="inner"><span>rolling</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-standard-deviation"><div class="inner"><span>standard-deviation</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-sum"><div class="inner"><span>sum</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.rolling.html#var-variance"><div class="inner"><span>variance</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" id="top">tech.v3.dataset.rolling</h1><div class="doc"><div class="markdown"><p>Implement a generalized rolling 
window, including support for time-based, variable-width
windows.</p>
</div></div><div class="public anchor" id="var-expanding"><h3>expanding</h3><div class="usage"><code>(expanding ds reducer-map)</code></div><div class="doc"><div class="markdown"><p>Run a set of reducers across a dataset with an expanding set of windows. These
produce cumulative results, similar to a cumsum operation.</p>
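<p>A minimal sketch of an expanding reduction, assuming the aliases <code>ds</code> and <code>ds-roll</code> refer to <code>tech.v3.dataset</code> and <code>tech.v3.dataset.rolling</code>. The dataset <code>sales</code> and the result column names are hypothetical. The reducer map has the same shape as the one passed to <code>rolling</code>.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.rolling :as ds-roll])

;; Hypothetical data: a single numeric column.
(def sales (ds/-&gt;dataset {:amount [10 20 30 40]}))

;; Each window grows as the rows advance, so a sum reducer is expected
;; to behave like a running total and a max reducer like a running maximum.
(def running
  (ds-roll/expanding sales
                     {:running-total (ds-roll/sum :amount)
                      :running-max   (ds-roll/max :amount)}))
</code></pre>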
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L307">view source</a></div></div><div class="public anchor" id="var-first"><h3>first</h3><div class="usage"><code>(first column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L64">view source</a></div></div><div class="public anchor" id="var-last"><h3>last</h3><div class="usage"><code>(last column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L70">view source</a></div></div><div class="public anchor" id="var-max"><h3>max</h3><div class="usage"><code>(max column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L37">view source</a></div></div><div class="public anchor" id="var-mean"><h3>mean</h3><div class="usage"><code>(mean column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L17">view source</a></div></div><div class="public anchor" id="var-min"><h3>min</h3><div class="usage"><code>(min column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L31">view source</a></div></div><div class="public anchor" id="var-nth"><h3>nth</h3><div class="usage"><code>(nth column-name nth-val)</code></div><div class="doc"><div class="markdown"><p>Get the nth window value</p>
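<p>A minimal sketch of using <code>nth</code> as a lag, assuming the index counts from the start of each window and that the aliases <code>ds</code> and <code>ds-roll</code> refer to <code>tech.v3.dataset</code> and <code>tech.v3.dataset.rolling</code>. The dataset and the result column name are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.rolling :as ds-roll])

;; Hypothetical data.
(def lag-demo (ds/-&gt;dataset {:a [1 2 3 4 5]}))

;; With a 3-row window positioned to the left of each row, index 0 is
;; assumed to be the oldest value in the window, i.e. the value two rows
;; back. Rows near the start are filled according to the edge mode.
(def lagged
  (ds-roll/rolling lag-demo
                   {:window-type :fixed
                    :window-size 3
                    :relative-window-position :left}
                   {:a-lag-2 (ds-roll/nth :a 0)}))
</code></pre>
<p>For variable windows the number of rows per window can differ, so a fixed index may not exist in every window. This sketch therefore sticks to a fixed window.</p>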
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L57">view source</a></div></div><div class="public anchor" id="var-rolling"><h3>rolling</h3><div class="usage"><code>(rolling ds window reducer-map options)</code><code>(rolling ds window reducer-map)</code></div><div class="doc"><div class="markdown"><p>Perform a rolling window operation appending columns to the original dataset.</p>
<ul>
<li>ds - src dataset.</li>
<li>window - either an integer giving a fixed window size, or a map describing the window
operation with the following keys:
<ul>
<li><code>:window-type</code> - either <code>:fixed</code> or <code>:variable</code>. For variable window operations,
<code>:column-name</code> must name a monotonically increasing column.</li>
<li><code>:window-size</code> - for fixed window operations, must be a positive integer. For
variable window operations, must be a double value in the numeric space produced by the
comparison function (see <code>:comp-fn</code>).</li>
<li><code>:relative-window-position</code> - describes where the window is
positioned. Options are <code>:left</code>, <code>:center</code>, and <code>:right</code>; defaults to
<code>:center</code> for fixed and <code>:right</code> for variable window types.</li>
<li><code>:edge-mode</code> - for fixed windows, describes what values to fill in at the edges
of the source column. Options are <code>:zero</code>, which is 0 for numeric types and <code>nil</code>
for object types, and <code>:clamp</code>, which fills in the first and last values of the column
respectively. Defaults to <code>:clamp</code>.</li>
<li><code>:comp-fn</code> - if provided, must return a double that is the result of comparing
the last value of the range to the first, which means <code>clojure.core/-</code>
is a reasonable default.</li>
<li><code>:units</code> - for datetime types, describes the units of <code>:window-size</code> and will
dictate the numeric space if <code>:comp-fn</code> is not provided.</li>
</ul>
</li>
<li>reducer-map - a map of result column name to reducer map. Each reducer map
must contain at least the keys <code>:column-name</code> and <code>:reducer</code>, where the reducer is an ifn
that is passed each window. The result column is scanned to ascertain its datatype and
missing value status. Multi-column reducers are supported if <code>:column-name</code> is a vector
of column names; in that case each column's window is passed to the reducer as a separate
argument. The reducer map can also specify the final datatype with a <code>:datatype</code> key. Beware,
however, that this disables missing value detection for integer datatypes.</li>
</ul>
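<p>The argument descriptions above can be sketched with a hand-written reducer map. This is an illustration under stated assumptions, not the library's canonical usage: the dataset <code>prices</code> and the column <code>:price-range</code> are hypothetical, and the reducer uses plain <code>clojure.core</code> functions on the window for clarity rather than speed.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.rolling :as ds-roll])

;; Hypothetical data.
(def prices (ds/-&gt;dataset {:price [10.0 12.0 11.0 15.0 14.0]}))

(def with-range
  (ds-roll/rolling prices
                   ;; window described as a map instead of a bare integer
                   {:window-type :fixed
                    :window-size 3
                    :relative-window-position :left
                    :edge-mode :clamp}
                   ;; result column name -&gt; reducer map
                   {:price-range
                    {:column-name :price
                     ;; the reducer is called once per window
                     :reducer (fn [window]
                                (- (double (apply max window))
                                   (double (apply min window))))
                     ;; optional: fix the result datatype up front
                     :datatype :float64}}))
</code></pre>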
<p><strong>Fixed Window Examples:</strong></p>
<pre><code class="language-clojure">user&gt; (require '[tech.v3.dataset :as ds]
               '[tech.v3.dataset.rolling :as ds-roll]
               '[tech.v3.datatype.functional :as dfn])
nil
user&gt; (def test-ds (ds/-&gt;dataset {:a (map #(Math/sin (double %))
                                          (range 0 200 0.1))}))
#'user/test-ds
user&gt; (ds/head (ds-roll/rolling test-ds 10 {:mean (ds-roll/mean :a)
                                            :min (ds-roll/min :a)
                                            :max (ds-roll/max :a)}))
_unnamed [5 4]:
|         :a |      :mean | :min |       :max |
|-----------:|-----------:|-----:|-----------:|
| 0.00000000 | 0.09834413 |  0.0 | 0.38941834 |
| 0.09983342 | 0.14628668 |  0.0 | 0.47942554 |
| 0.19866933 | 0.20275093 |  0.0 | 0.56464247 |
| 0.29552021 | 0.26717270 |  0.0 | 0.64421769 |
| 0.38941834 | 0.33890831 |  0.0 | 0.71735609 |
user&gt; (ds/head (ds-roll/rolling test-ds
                                {:window-type :fixed
                                 :window-size 10
                                 :relative-window-position :left}
                                {:mean (ds-roll/mean :a)
                                 :min (ds-roll/min :a)
                                 :max (ds-roll/max :a)}))
_unnamed [5 4]:
|         :a |      :mean | :min |       :max |
|-----------:|-----------:|-----:|-----------:|
| 0.00000000 | 0.00000000 |  0.0 | 0.00000000 |
| 0.09983342 | 0.00998334 |  0.0 | 0.09983342 |
| 0.19866933 | 0.02985027 |  0.0 | 0.19866933 |
| 0.29552021 | 0.05940230 |  0.0 | 0.29552021 |
| 0.38941834 | 0.09834413 |  0.0 | 0.38941834 |
user&gt; (ds/head (ds-roll/rolling test-ds
                                {:window-type :fixed
                                 :window-size 10
                                 :relative-window-position :right}
                                {:mean (ds-roll/mean :a)
                                 :min (ds-roll/min :a)
                                 :max (ds-roll/max :a)}))
_unnamed [5 4]:
|         :a |      :mean |       :min |       :max |
|-----------:|-----------:|-----------:|-----------:|
| 0.00000000 | 0.41724100 | 0.00000000 | 0.78332691 |
| 0.09983342 | 0.50138810 | 0.09983342 | 0.84147098 |
| 0.19866933 | 0.58052549 | 0.19866933 | 0.89120736 |
| 0.29552021 | 0.65386247 | 0.29552021 | 0.93203909 |
| 0.38941834 | 0.72066627 | 0.38941834 | 0.96355819 |
user&gt; ;;Multi column reducer
user&gt; (ds/head (ds-roll/rolling test-ds 10
                                {:c {:column-name [:a :a]
                                     :reducer (fn [a b]
                                                (Math/round
                                                 (+ (dfn/sum a) (dfn/sum b))))
                                     :datatype :int16}}))
_unnamed [5 2]:
|         :a | :c |
|-----------:|---:|
| 0.00000000 |  2 |
| 0.09983342 |  3 |
| 0.19866933 |  4 |
| 0.29552021 |  5 |
| 0.38941834 |  7 |
</code></pre>
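<p>The remaining reducer constructors in this namespace (<code>sum</code>, <code>standard-deviation</code>, <code>variance</code>, <code>first</code>, <code>last</code>) take a column name in the same way as <code>mean</code>, <code>min</code>, and <code>max</code>. The following is a minimal sketch that combines several of them with <code>:edge-mode :zero</code>; the dataset <code>demo-ds</code> and the result column names are hypothetical. Because the namespace defines vars named <code>first</code>, <code>last</code>, <code>max</code>, <code>min</code>, and <code>nth</code>, requiring it under an alias avoids clashing with <code>clojure.core</code>.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.rolling :as ds-roll])

;; Hypothetical data whose first value is non-zero, so that zero-filled
;; edges can be told apart from clamped edges in the result.
(def demo-ds (ds/-&gt;dataset {:a [2.0 4.0 6.0 8.0 10.0 12.0]}))

(def summary
  (ds-roll/rolling demo-ds
                   {:window-type :fixed
                    :window-size 3
                    :relative-window-position :left
                    :edge-mode :zero}
                   {:a-sum   (ds-roll/sum :a)
                    :a-sd    (ds-roll/standard-deviation :a)
                    :a-first (ds-roll/first :a)
                    :a-last  (ds-roll/last :a)}))
</code></pre>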
<p><strong>Variable Window Examples:</strong></p>
<pre><code class="language-clojure">user&gt; (def stocks (ds/-&gt;dataset "test/data/stocks.csv" {:key-fn keyword}))
#'user/stocks
user&gt; ;;variable window column must be monotonically increasing
user&gt; (def stocks (ds/sort-by-column stocks :date))
#'user/stocks
user&gt; (ds/head stocks)
test/data/stocks.csv [5 3]:
| :symbol | :date      | :price |
|---------|------------|-------:|
| AAPL    | 2000-01-01 |  25.94 |
| IBM     | 2000-01-01 | 100.52 |
| MSFT    | 2000-01-01 |  39.81 |
| AMZN    | 2000-01-01 |  64.56 |
| AAPL    | 2000-02-01 |  28.66 |
user&gt; (ds/head (ds-roll/rolling stocks
                                {:window-type :variable
                                 :column-name :date
                                 :units :days
                                 :window-size 3}
                                {:price-mean-3d (ds-roll/mean :price)
                                 :price-max-3d (ds-roll/max :price)
                                 :price-min-3d (ds-roll/min :price)}))
test/data/stocks.csv [5 6]:
| :symbol | :date      | :price | :price-mean-3d | :price-max-3d | :price-min-3d |
|---------|------------|-------:|---------------:|--------------:|--------------:|
| AAPL    | 2000-01-01 |  25.94 |    57.70750000 |        100.52 |         25.94 |
| IBM     | 2000-01-01 | 100.52 |    68.29666667 |        100.52 |         39.81 |
| MSFT    | 2000-01-01 |  39.81 |    52.18500000 |         64.56 |         39.81 |
| AMZN    | 2000-01-01 |  64.56 |    64.56000000 |         64.56 |         64.56 |
| AAPL    | 2000-02-01 |  28.66 |    56.49750000 |         92.11 |         28.66 |
user&gt; (ds/head (ds-roll/rolling stocks
                                {:window-type :variable
                                 :column-name :date
                                 :units :months
                                 :window-size 3}
                                {:price-mean-3d (ds-roll/mean :price)
                                 :price-max-3d (ds-roll/max :price)
                                 :price-min-3d (ds-roll/min :price)}))
test/data/stocks.csv [5 6]:
| :symbol | :date      | :price | :price-mean-3d | :price-max-3d | :price-min-3d |
|---------|------------|-------:|---------------:|--------------:|--------------:|
| AAPL    | 2000-01-01 |  25.94 |    58.92500000 |        106.11 |         25.94 |
| IBM     | 2000-01-01 | 100.52 |    61.92363636 |        106.11 |         28.66 |
| MSFT    | 2000-01-01 |  39.81 |    58.06400000 |        106.11 |         28.66 |
| AMZN    | 2000-01-01 |  64.56 |    60.09222222 |        106.11 |         28.66 |
| AAPL    | 2000-02-01 |  28.66 |    57.56583333 |        106.11 |         28.37 |
</code></pre>
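<p>The variable window examples above depend on <code>test/data/stocks.csv</code>. A self-contained sketch of the same pattern, assuming an in-memory dataset whose <code>:date</code> column holds <code>java.time.LocalDate</code> values, might look like the following. The dataset <code>readings</code> and the result column name are hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.rolling :as ds-roll])
(import 'java.time.LocalDate)

;; Hypothetical, irregularly spaced observations, already sorted by date
;; so the window column is monotonically increasing.
(def readings
  (ds/-&gt;dataset {:date  [(LocalDate/of 2024 1 1)
                         (LocalDate/of 2024 1 2)
                         (LocalDate/of 2024 1 4)
                         (LocalDate/of 2024 1 9)]
                 :value [1.0 2.0 3.0 4.0]}))

;; The window is measured in days on the :date column rather than in rows,
;; so the number of rows in each window can differ.
(def rolled
  (ds-roll/rolling readings
                   {:window-type :variable
                    :column-name :date
                    :units :days
                    :window-size 3}
                   {:value-mean-3d (ds-roll/mean :value)}))
</code></pre>
<p>No <code>:relative-window-position</code> is given here, so the documented default for variable windows, <code>:right</code>, applies.</p>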
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L116">view source</a></div></div><div class="public anchor" id="var-standard-deviation"><h3>standard-deviation</h3><div class="usage"><code>(standard-deviation column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L50">view source</a></div></div><div class="public anchor" id="var-sum"><h3>sum</h3><div class="usage"><code>(sum column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L24">view source</a></div></div><div class="public anchor" id="var-variance"><h3>variance</h3><div class="usage"><code>(variance column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/rolling.clj#L43">view source</a></div></div></div></body></html>