<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>tech.v3.dataset.metamorph documentation</title><script async="true" src="https://www.googletagmanager.com/gtag/js?id=G-RGTB4J7LGP"></script><script>window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-95TVFC1FEB');</script><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">TMD</span> <span class="project-version">8.003</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="000-getting-started.html"><div class="inner"><span>tech.ml.dataset Getting Started</span></div></a></li><li class="depth-1 "><a href="100-walkthrough.html"><div class="inner"><span>tech.ml.dataset Walkthrough</span></div></a></li><li class="depth-1 "><a href="200-quick-reference.html"><div class="inner"><span>tech.ml.dataset Quick Reference</span></div></a></li><li class="depth-1 "><a href="columns-readers-and-datatypes.html"><div class="inner"><span>tech.ml.dataset Columns, Readers, and Datatypes</span></div></a></li><li class="depth-1 "><a href="nippy-serialization-rocks.html"><div class="inner"><span>tech.ml.dataset And nippy</span></div></a></li><li class="depth-1 "><a href="supported-datatypes.html"><div class="inner"><span>tech.ml.dataset Supported Datatypes</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.dataset.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dataset</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.categorical.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>categorical</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.clipboard.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clipboard</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.column-filters.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>column-filters</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>io</span></div></div></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.csv.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>csv</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-5 branch"><a href="tech.v3.dataset.io.string-row-parser.html"><div class="inner"><span class="tree"><span class="top"></span><span 
class="bottom"></span></span><span>string-row-parser</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.io.univocity.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>univocity</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.join.html"><div class="inner"><span class="tree" style="top: -145px;"><span class="top" style="height: 154px;"></span><span class="bottom"></span></span><span>join</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.math.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>math</span></div></a></li><li class="depth-4 branch current"><a href="tech.v3.dataset.metamorph.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>metamorph</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.modelling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>modelling</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.print.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>print</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-5"><a href="tech.v3.dataset.reductions.apache-data-sketch.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>apache-data-sketch</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.rolling.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-4 
branch"><a href="tech.v3.dataset.set.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>set</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.dataset.tensor.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4"><a href="tech.v3.dataset.zip.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>zip</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -641px;"><span class="top" style="height: 650px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.arrow.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>arrow</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.clj-transit.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>clj-transit</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.fastexcel.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>fastexcel</span></div></a></li><li class="depth-4"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>guava</span></div></div></li><li class="depth-5"><a href="tech.v3.libs.guava.cache.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>cache</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.libs.parquet.html"><div class="inner"><span class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>parquet</span></div></a></li><li class="depth-4 branch"><a 
href="tech.v3.libs.poi.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>poi</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.tribuo.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tribuo</span></div></a></li></ul></div><div class="sidebar secondary"><h3><a href="#top"><span class="inner">Public Vars</span></a></h3><ul><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-add-column"><div class="inner"><span>add-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-add-or-update-column"><div class="inner"><span>add-or-update-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-append-columns"><div class="inner"><span>append-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-assoc-ds"><div class="inner"><span>assoc-ds</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-assoc-metadata"><div class="inner"><span>assoc-metadata</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-brief"><div class="inner"><span>brief</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-build-pipelined-function"><div class="inner"><span>build-pipelined-function</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-categorical-.3Enumber"><div class="inner"><span>categorical-&gt;number</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-categorical-.3Eone-hot"><div class="inner"><span>categorical-&gt;one-hot</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-column"><div class="inner"><span>column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-column-.3Edataset"><div 
class="inner"><span>column-&gt;dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-column-cast"><div class="inner"><span>column-cast</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-column-count"><div class="inner"><span>column-count</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-column-labeled-mapseq"><div class="inner"><span>column-labeled-mapseq</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-column-map"><div class="inner"><span>column-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-column-names"><div class="inner"><span>column-names</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-column-values-.3Ecategorical"><div class="inner"><span>column-values-&gt;categorical</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-columns"><div class="inner"><span>columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-columns-with-missing-seq"><div class="inner"><span>columns-with-missing-seq</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-columnwise-concat"><div class="inner"><span>columnwise-concat</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-concat"><div class="inner"><span>concat</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-concat-copying"><div class="inner"><span>concat-copying</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-concat-inplace"><div class="inner"><span>concat-inplace</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-data-.3Edataset"><div class="inner"><span>data-&gt;dataset</span></div></a></li><li class="depth-1"><a 
href="tech.v3.dataset.metamorph.html#var-dataset-.3Ecategorical-xforms"><div class="inner"><span>dataset-&gt;categorical-xforms</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-dataset-.3Edata"><div class="inner"><span>dataset-&gt;data</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-dataset-name"><div class="inner"><span>dataset-name</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-dataset.3F"><div class="inner"><span>dataset?</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-descriptive-stats"><div class="inner"><span>descriptive-stats</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-drop-columns"><div class="inner"><span>drop-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-drop-missing"><div class="inner"><span>drop-missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-drop-rows"><div class="inner"><span>drop-rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-empty-column-names"><div class="inner"><span>empty-column-names</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-empty-dataset"><div class="inner"><span>empty-dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-ensure-array-backed"><div class="inner"><span>ensure-array-backed</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-feature-ecount"><div class="inner"><span>feature-ecount</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-filter"><div class="inner"><span>filter</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-filter-column"><div class="inner"><span>filter-column</span></div></a></li><li class="depth-1"><a 
href="tech.v3.dataset.metamorph.html#var-filter-dataset"><div class="inner"><span>filter-dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-group-by"><div class="inner"><span>group-by</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-group-by-.3Eindexes"><div class="inner"><span>group-by-&gt;indexes</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-group-by-column"><div class="inner"><span>group-by-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-group-by-column-.3Eindexes"><div class="inner"><span>group-by-column-&gt;indexes</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-group-by-column-consumer"><div class="inner"><span>group-by-column-consumer</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-has-column.3F"><div class="inner"><span>has-column?</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-head"><div class="inner"><span>head</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-induction"><div class="inner"><span>induction</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-inference-column.3F"><div class="inner"><span>inference-column?</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-inference-target-column-names"><div class="inner"><span>inference-target-column-names</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-inference-target-ds"><div class="inner"><span>inference-target-ds</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-inference-target-label-inverse-map"><div class="inner"><span>inference-target-label-inverse-map</span></div></a></li><li class="depth-1"><a 
href="tech.v3.dataset.metamorph.html#var-inference-target-label-map"><div class="inner"><span>inference-target-label-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-k-fold-datasets"><div class="inner"><span>k-fold-datasets</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-labels"><div class="inner"><span>labels</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-mapseq-reader"><div class="inner"><span>mapseq-reader</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-min-n-by-column"><div class="inner"><span>min-n-by-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-missing"><div class="inner"><span>missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-model-type"><div class="inner"><span>model-type</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-new-column"><div class="inner"><span>new-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-new-dataset"><div class="inner"><span>new-dataset</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-num-inference-classes"><div class="inner"><span>num-inference-classes</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-order-column-names"><div class="inner"><span>order-column-names</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-pmap-ds"><div class="inner"><span>pmap-ds</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-print-all"><div class="inner"><span>print-all</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-probability-distributions-.3Elabel-column"><div class="inner"><span>probability-distributions-&gt;label-column</span></div></a></li><li 
class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-rand-nth"><div class="inner"><span>rand-nth</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-remove-column"><div class="inner"><span>remove-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-remove-columns"><div class="inner"><span>remove-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-remove-empty-columns"><div class="inner"><span>remove-empty-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-remove-rows"><div class="inner"><span>remove-rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-rename-columns"><div class="inner"><span>rename-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-replace-missing"><div class="inner"><span>replace-missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-replace-missing-value"><div class="inner"><span>replace-missing-value</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-reverse-rows"><div class="inner"><span>reverse-rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-row-at"><div class="inner"><span>row-at</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-row-count"><div class="inner"><span>row-count</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-row-map"><div class="inner"><span>row-map</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-row-mapcat"><div class="inner"><span>row-mapcat</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-rows"><div class="inner"><span>rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-rowvec-at"><div 
class="inner"><span>rowvec-at</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-rowvecs"><div class="inner"><span>rowvecs</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-sample"><div class="inner"><span>sample</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-select"><div class="inner"><span>select</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-select-by-index"><div class="inner"><span>select-by-index</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-select-columns"><div class="inner"><span>select-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-select-columns-by-index"><div class="inner"><span>select-columns-by-index</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-select-missing"><div class="inner"><span>select-missing</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-select-rows"><div class="inner"><span>select-rows</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-set-dataset-name"><div class="inner"><span>set-dataset-name</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-set-inference-target"><div class="inner"><span>set-inference-target</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-shape"><div class="inner"><span>shape</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-shuffle"><div class="inner"><span>shuffle</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-sort-by"><div class="inner"><span>sort-by</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-sort-by-column"><div class="inner"><span>sort-by-column</span></div></a></li><li class="depth-1"><a 
href="tech.v3.dataset.metamorph.html#var-tail"><div class="inner"><span>tail</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-take-nth"><div class="inner"><span>take-nth</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-train-test-split"><div class="inner"><span>train-test-split</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-unique-by"><div class="inner"><span>unique-by</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-unique-by-column"><div class="inner"><span>unique-by-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-unordered-select"><div class="inner"><span>unordered-select</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-unroll-column"><div class="inner"><span>unroll-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-update"><div class="inner"><span>update</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-update-column"><div class="inner"><span>update-column</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-update-columns"><div class="inner"><span>update-columns</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-update-columnwise"><div class="inner"><span>update-columnwise</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-update-elemwise"><div class="inner"><span>update-elemwise</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-value-reader"><div class="inner"><span>value-reader</span></div></a></li><li class="depth-1"><a href="tech.v3.dataset.metamorph.html#var-write.21"><div class="inner"><span>write!</span></div></a></li></ul></div><div class="namespace-docs" id="content"><h1 class="anchor" 
id="top">tech.v3.dataset.metamorph</h1><div class="doc"><div class="markdown"><p>This API is auto-generated: it scans the dataset namespaces and makes each function metamorph-compliant.
Each generated function takes the same arguments as the original, minus the leading dataset, and operates on a
metamorph context, a map of the form <code>{:metamorph/data ds}</code>, instead of on a bare dataset.
The result is also returned as a metamorph context.</p>
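<p>A minimal sketch of how these functions are meant to be used, assuming tech.ml.dataset is on the classpath. The aliases <code>ds</code> and <code>ds-mm</code> and the sample data are illustrative. Each call is treated as building a step that takes a metamorph context and returns one, and the steps are chained with plain <code>comp</code> rather than with the metamorph library's own pipeline helpers.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

;; Wrap a plain dataset in a metamorph context.
(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3] :b [4 5 6]})})

;; Each call returns a context -&gt; context step. The dataset argument of the
;; underlying function is read from :metamorph/data when the step runs.
(def drop-b (ds-mm/drop-columns [:b]))
(def first-two (ds-mm/head 2))

;; comp applies right to left: drop-b runs first, then first-two.
(def result ((comp first-two drop-b) ctx))

;; The result is again a context; only column :a and two rows remain.
(ds/column-names (:metamorph/data result))
(ds/row-count (:metamorph/data result))
</code></pre>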
</div></div><div class="public anchor" id="var-add-column"><h3>add-column</h3><div class="usage"><code>(add-column column)</code></div><div class="doc"><div class="markdown"><p>Add a new column. Errors if a column with the same name already exists.</p>
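<p>A minimal sketch, assuming tech.ml.dataset is on the classpath; the aliases and sample columns are illustrative. The second call reuses the name <code>:b</code>, so, per the docstring above, it is expected to throw.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3]})})

;; :b is a new name, so the column is appended.
(def ctx2 ((ds-mm/add-column (ds/new-column :b [10 20 30])) ctx))

;; :b now exists, so adding another :b is a name collision.
(def collided?
  (try
    ((ds-mm/add-column (ds/new-column :b [0 0 0])) ctx2)
    false
    (catch Exception _ true)))
</code></pre>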
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L10">view source</a></div></div><div class="public anchor" id="var-add-or-update-column"><h3>add-or-update-column</h3><div class="usage"><code>(add-or-update-column colname column)</code><code>(add-or-update-column column)</code></div><div class="doc"><div class="markdown"><p>If the column exists, replace it. Otherwise, append it as a new column.</p>
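<p>A minimal sketch of the single-argument form, assuming tech.ml.dataset is on the classpath; the aliases and sample columns are illustrative. The same call replaces <code>:a</code> because it already exists, then appends <code>:c</code> because it does not.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3]})})

;; :a already exists, so its values are replaced.
(def replaced
  ((ds-mm/add-or-update-column (ds/new-column :a [10 20 30])) ctx))

;; :c does not exist yet, so it is appended.
(def appended
  ((ds-mm/add-or-update-column (ds/new-column :c [7 8 9])) replaced))
</code></pre>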
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L16">view source</a></div></div><div class="public anchor" id="var-append-columns"><h3>append-columns</h3><div class="usage"><code>(append-columns column-seq)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L24">view source</a></div></div><div class="public anchor" id="var-assoc-ds"><h3>assoc-ds</h3><div class="usage"><code>(assoc-ds cname cdata &amp; args)</code></div><div class="doc"><div class="markdown"><p>If the dataset is not nil, calls <code>clojure.core/assoc</code>. Otherwise, creates a new empty dataset and
then calls <code>clojure.core/assoc</code>. Unlike <code>assoc</code>, it is guaranteed to return a dataset.</p>
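<p>A minimal sketch, assuming tech.ml.dataset is on the classpath and that, as with <code>clojure.core/assoc</code>, further name/data pairs may follow the first one. Starting from a context whose <code>:metamorph/data</code> is <code>nil</code> exercises the "creates a new empty dataset" branch; the aliases and data are illustrative.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

;; No dataset yet: assoc-ds starts from an empty dataset, then assocs
;; the columns :a and :b onto it.
(def result
  ((ds-mm/assoc-ds :a [1 2 3] :b [4 5 6]) {:metamorph/data nil}))

;; The context now holds a real dataset rather than nil.
(ds/dataset? (:metamorph/data result))
</code></pre>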
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L29">view source</a></div></div><div class="public anchor" id="var-assoc-metadata"><h3>assoc-metadata</h3><div class="usage"><code>(assoc-metadata filter-fn-or-ds k v &amp; args)</code></div><div class="doc"><div class="markdown"><p>Set metadata across a set of columns.</p>
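<p>A minimal sketch, assuming tech.ml.dataset is on the classpath and assuming that <code>filter-fn-or-ds</code> may be given as a sequence of column names. The metadata key <code>:categorical?</code> and the sample columns are illustrative choices.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2] :b [3 4] :c [5 6]})})

;; Set :categorical? true on columns :a and :b only.
(def result ((ds-mm/assoc-metadata [:a :b] :categorical? true) ctx))

;; Read the key back from each column's metadata; :c was not selected.
(def flags
  (mapv #(:categorical? (meta (ds/column (:metamorph/data result) %)))
        [:a :b :c]))
</code></pre>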
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L36">view source</a></div></div><div class="public anchor" id="var-brief"><h3>brief</h3><div class="usage"><code>(brief options)</code><code>(brief)</code></div><div class="doc"><div class="markdown"><p>Get a brief description of a dataset in mapseq form. A brief description is
the mapseq form of the descriptive stats.</p>
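<p>A minimal sketch, assuming tech.ml.dataset is on the classpath and assuming each map carries a <code>:col-name</code> key, as the descriptive-stats output does. Because results come back as a metamorph context, the sequence of maps is read from <code>:metamorph/data</code>.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3] :b [4.0 5.0 6.0]})})

;; The zero-argument form uses the default options.
(def stats (:metamorph/data ((ds-mm/brief) ctx)))

;; One map per column; :col-name identifies the column.
(map :col-name stats)
</code></pre>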
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L42">view source</a></div></div><div class="public anchor" id="var-build-pipelined-function"><h3>build-pipelined-function</h3><h4 class="type">macro</h4><div class="usage"><code>(build-pipelined-function f m)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L51">view source</a></div></div><div class="public anchor" id="var-categorical-.3Enumber"><h3>categorical-&gt;number</h3><div class="usage"><code>(categorical-&gt;number filter-fn-or-ds)</code><code>(categorical-&gt;number filter-fn-or-ds table-args)</code><code>(categorical-&gt;number filter-fn-or-ds table-args result-datatype)</code></div><div class="doc"><div class="markdown"><p>Convert columns into a discrete, numeric representation.
See tech.v3.dataset.categorical/fit-categorical-map.</p>
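<p>A minimal sketch, assuming tech.ml.dataset is on the classpath and that the column to convert can be named in a vector. The exact numbers assigned to each category depend on the fitted categorical map, so the sketch only relies on equal strings mapping to equal numbers; the sample data is illustrative.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data
          (ds/-&gt;dataset {:species ["setosa" "virginica" "setosa"]
                          :len     [5.1 6.3 4.9]})})

;; Replace each distinct :species string with a number.
(def result ((ds-mm/categorical-&gt;number [:species]) ctx))

;; Equal strings map to equal numbers, distinct strings to distinct ones.
(def species (vec (ds/column (:metamorph/data result) :species)))
</code></pre>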
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L56">view source</a></div></div><div class="public anchor" id="var-categorical-.3Eone-hot"><h3>categorical-&gt;one-hot</h3><div class="usage"><code>(categorical-&gt;one-hot filter-fn-or-ds)</code><code>(categorical-&gt;one-hot filter-fn-or-ds table-args)</code><code>(categorical-&gt;one-hot filter-fn-or-ds table-args result-datatype)</code></div><div class="doc"><div class="markdown"><p>Convert string columns to numeric columns.
See tech.v3.dataset.categorical/fit-one-hot.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L67">view source</a></div></div><div class="public anchor" id="var-column"><h3>column</h3><div class="usage"><code>(column colname)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L78">view source</a></div></div><div class="public anchor" id="var-column-.3Edataset"><h3>column-&gt;dataset</h3><div class="usage"><code>(column-&gt;dataset colname transform-fn options)</code><code>(column-&gt;dataset colname transform-fn)</code></div><div class="doc"><div class="markdown"><p>Transform a column into a sequence of maps using transform-fn.
Returns a dataset created from that sequence of maps.</p>
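<p>A minimal sketch, assuming tech.ml.dataset is on the classpath. <code>parse-point</code> is a hypothetical transform-fn written for this example: it turns one <code>"x,y"</code> string into a row map, and those maps become the rows of the returned dataset.</p>
<pre><code class="language-clojure">(require '[clojure.string :as str]
         '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:point ["1,2" "3,4"]})})

;; Hypothetical transform-fn: one "x,y" string -&gt; one row map.
(defn parse-point [s]
  (let [[x y] (str/split s #",")]
    {:x (Long/parseLong x) :y (Long/parseLong y)}))

;; The result is a new dataset built from the maps, not the original one.
(def points
  (:metamorph/data ((ds-mm/column-&gt;dataset :point parse-point) ctx)))
</code></pre>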
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L83">view source</a></div></div><div class="public anchor" id="var-column-cast"><h3>column-cast</h3><div class="usage"><code>(column-cast colname datatype)</code><code>(column-cast colname datatype options)</code></div><div class="doc"><div class="markdown"><p>Cast a column to a new datatype. This is never a lazy operation. If the old
and new datatypes match and no cast-fn is provided then dtype/clone is called
on the column.</p>
<p>colname may be a scalar or a tuple of <code>[src-col dst-col]</code>.</p>
<p>datatype may be a datatype enumeration or a tuple of
<code>[datatype cast-fn]</code>, where cast-fn may return either a new value,
:tech.v3.dataset/missing, or :tech.v3.dataset/parse-failure.
Exceptions are propagated to the caller. The new column has at least the
existing missing set, and exactly that set if no call to cast-fn returns
:tech.v3.dataset/missing or :tech.v3.dataset/parse-failure.
:tech.v3.dataset/parse-failure means the source value gets added to the metadata key :unparsed-data
and its index gets added to :unparsed-indexes.</p>
<p>Casts between numeric datatypes need no cast-fn, but one may be provided.
Casts to string need no cast-fn, but one may be provided.
Casts from string to any other datatype call tech.v3.dataset.column/parse-column.</p>
<p>Options:</p>
<ul>
<li><code>:track-parse-errors</code> - defaults to false. When true, the extra metadata keys
<code>:unparsed-indexes</code> and <code>:unparsed-data</code> are added to the column metadata. Be aware that
these values may not serialize, because the unparsed indexes are stored as a roaring bitmap.</li>
</ul>
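<p>A minimal sketch of both colname forms and the <code>[datatype cast-fn]</code> form, assuming tech.ml.dataset is on the classpath. The scaling cast-fn and the destination name <code>:a-scaled</code> are hypothetical choices for illustration.</p>
<pre><code class="language-clojure">(require '[tech.v3.datatype :as dtype]
         '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3]})})

;; Scalar colname: cast :a itself from an integer datatype to :float64.
(def in-place (:metamorph/data ((ds-mm/column-cast :a :float64) ctx)))

;; [src-col dst-col] colname plus a [datatype cast-fn] datatype:
;; read :a, apply the cast-fn to each value, store the result as :a-scaled.
(def scaled
  (:metamorph/data
   ((ds-mm/column-cast [:a :a-scaled] [:float64 (fn [v] (* 1.5 v))]) ctx)))
</code></pre>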
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L92">view source</a></div></div><div class="public anchor" id="var-column-count"><h3>column-count</h3><div class="usage"><code>(column-count)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L126">view source</a></div></div><div class="public anchor" id="var-column-labeled-mapseq"><h3>column-labeled-mapseq</h3><div class="usage"><code>(column-labeled-mapseq value-colname-seq)</code></div><div class="doc"><div class="markdown"><p>Given a dataset, return a sequence of maps in which the values of several columns are all stored
under a :value key, and a :label key holds the name of the column each value came from. Used for quickly creating
timeseries or scatterplot labeled graphs. Returns a lazy sequence, not a reader!</p>
<p>See also <code>columnwise-concat</code></p>
<p>Each map in the returned sequence has the form:</p>
<pre><code class="language-clojure"> {... - columns not in value-colname-seq
:value - value from one of the value columns
:label - name of the column the value came from
}
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L131">view source</a></div></div><div class="public anchor" id="var-column-map"><h3>column-map</h3><div class="usage"><code>(column-map result-colname map-fn res-dtype-or-opts filter-fn-or-ds)</code><code>(column-map result-colname map-fn filter-fn-or-ds)</code><code>(column-map result-colname map-fn)</code></div><div class="doc"><div class="markdown"><p>Produce a new (or updated) column as the result of mapping a fn over columns. This
function is never lazy - all results are immediately calculated.</p>
<ul>
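<li><code>dataset</code> - the dataset. In this namespace it is taken from the <code>:metamorph/data</code> key of the context rather than passed as an argument.</li>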
<li><code>result-colname</code> - Name of new (or existing) column.</li>
<li><code>map-fn</code> - function to map over columns. Same rules as <code>tech.v3.datatype/emap</code>.</li>
<li><code>res-dtype-or-opts</code> - If not given result is scanned to infer missing and datatype.
If using an option map, options are described below.</li>
<li><code>filter-fn-or-ds</code> - A dataset, a sequence of columns, or a <code>tech.v3.dataset.column-filters</code>
column filter function. Defaults to all the columns of the existing dataset.</li>
</ul>
<p>Returns a new dataset with a new or updated column.</p>
<p>Options:</p>
<ul>
<li><code>:datatype</code> - Set the datatype of the result column. If not given, the result is scanned
to infer the result datatype and missing set.</li>
<li><code>:missing-fn</code> - if given, columns are first passed to missing-fn as a sequence and
this dictates the missing set. Otherwise the missing set is determined by scanning the results
during the inference process. See <code>tech.v3.dataset.column/union-missing-sets</code> and
<code>tech.v3.dataset.column/intersect-missing-sets</code> for example functions to pass in
here.</li>
</ul>
<p>Examples:</p>
<pre><code class="language-clojure">
;;From the tests --
(let [testds (ds/-&gt;dataset [{:a 1.0 :b 2.0} {:a 3.0 :b 5.0} {:a 4.0 :b nil}])]
;;result scanned for both datatype and missing set
(is (= (vec [3.0 6.0 nil])
(:b2 (ds/column-map testds :b2 #(when % (inc %)) [:b]))))
;;result scanned for missing set only. Result used in-place.
(is (= (vec [3.0 6.0 nil])
(:b2 (ds/column-map testds :b2 #(when % (inc %))
{:datatype :float64} [:b]))))
;;Nothing scanned at all.
(is (= (vec [3.0 6.0 nil])
(:b2 (ds/column-map testds :b2 #(inc %)
{:datatype :float64
:missing-fn ds-col/union-missing-sets} [:b]))))
;;Missing set scanning causes NPE at inc.
(is (thrown? Throwable
(ds/column-map testds :b2 #(inc %)
{:datatype :float64}
[:b]))))
;;Ad-hoc repl --
user&gt; (require '[tech.v3.dataset :as ds])
nil
user&gt; (def ds (ds/-&gt;dataset "test/data/stocks.csv"))
#'user/ds
user&gt; (ds/head ds)
test/data/stocks.csv [5 3]:
| symbol | date | price |
|--------|------------|-------|
| MSFT | 2000-01-01 | 39.81 |
| MSFT | 2000-02-01 | 36.35 |
| MSFT | 2000-03-01 | 43.22 |
| MSFT | 2000-04-01 | 28.37 |
| MSFT | 2000-05-01 | 25.45 |
user&gt; (-&gt; (ds/column-map ds "price^2" #(* % %) ["price"])
(ds/head))
test/data/stocks.csv [5 4]:
| symbol | date | price | price^2 |
|--------|------------|-------|-----------|
| MSFT | 2000-01-01 | 39.81 | 1584.8361 |
| MSFT | 2000-02-01 | 36.35 | 1321.3225 |
| MSFT | 2000-03-01 | 43.22 | 1867.9684 |
| MSFT | 2000-04-01 | 28.37 | 804.8569 |
| MSFT | 2000-05-01 | 25.45 | 647.7025 |
user&gt; (def ds1 (ds/-&gt;dataset [{:a 1} {:b 2.0} {:a 2 :b 3.0}]))
#'user/ds1
user&gt; ds1
_unnamed [3 2]:
| :b | :a |
|----:|---:|
| | 1 |
| 2.0 | |
| 3.0 | 2 |
user&gt; (ds/column-map ds1 :c (fn [a b]
(when (and a b)
(+ (double a) (double b))))
[:a :b])
_unnamed [3 3]:
| :b | :a | :c |
|----:|---:|----:|
| | 1 | |
| 2.0 | | |
| 3.0 | 2 | 5.0 |
user&gt; (ds/missing (*1 :c))
{0,1}
</code></pre>
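<p>The examples above call <code>tech.v3.dataset/column-map</code> directly. In this namespace the dataset argument is dropped; the sketch below assumes each call returns a step that reads and replaces <code>:metamorph/data</code> in a context map, and composes two steps with plain <code>comp</code> to stay dependency-free (in practice <code>scicloj.metamorph.core/pipeline</code> is the usual way to chain steps).</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3] :b [10 20 30]})})

;; comp applies the rightmost step first: add column :c, then keep 2 rows
(def pipe (comp (ds-mm/head 2)
                (ds-mm/column-map :c (fn [a b] (+ a b)) [:a :b])))

(def result (pipe ctx))

(vec ((:metamorph/data result) :c))
;; =&gt; [11 22]
</code></pre>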
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L149">view source</a></div></div><div class="public anchor" id="var-column-names"><h3>column-names</h3><div class="usage"><code>(column-names)</code></div><div class="doc"><div class="markdown"><p>In-order sequence of column names</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L261">view source</a></div></div><div class="public anchor" id="var-column-values-.3Ecategorical"><h3>column-values-&gt;categorical</h3><div class="usage"><code>(column-values-&gt;categorical src-column)</code></div><div class="doc"><div class="markdown"><p>Given a column encoded via either string-&gt;number or one-hot, reverse
map to a sequence of the original string column values.
In the case of one-hot mappings, src-column must be the original
column name before the one-hot map.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L267">view source</a></div></div><div class="public anchor" id="var-columns"><h3>columns</h3><div class="usage"><code>(columns)</code></div><div class="doc"><div class="markdown"><p>Return sequence of all columns in dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L276">view source</a></div></div><div class="public anchor" id="var-columns-with-missing-seq"><h3>columns-with-missing-seq</h3><div class="usage"><code>(columns-with-missing-seq)</code></div><div class="doc"><div class="markdown"><p>Return a sequence of:</p>
<pre><code class="language-clojure"> {:column-name column-name
:missing-count missing-count
}
</code></pre>
<p>or nil if no columns are missing data.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L282">view source</a></div></div><div class="public anchor" id="var-columnwise-concat"><h3>columnwise-concat</h3><div class="usage"><code>(columnwise-concat colnames options)</code><code>(columnwise-concat colnames)</code></div><div class="doc"><div class="markdown"><p>Given a dataset and a list of columns, produce a new dataset with
the listed columns concatenated into a single value column, plus a :column column indicating
which column each value came from. Columns not in the list are repeated once for each
listed column.</p>
<p>Example:</p>
<pre><code class="language-clojure">user&gt; (-&gt; [{:a 1 :b 2 :c 3 :d 1} {:a 4 :b 5 :c 6 :d 2}]
(ds/-&gt;dataset)
(ds/columnwise-concat [:c :a :b]))
null [6 3]:
| :column | :value | :d |
|---------+--------+----|
| :c | 3 | 1 |
| :c | 6 | 2 |
| :a | 1 | 1 |
| :a | 4 | 2 |
| :b | 2 | 1 |
| :b | 5 | 2 |
</code></pre>
<p>Options:</p>
<ul>
<li><code>:value-column-name</code> - defaults to <code>:value</code></li>
<li><code>:colname-column-name</code> - defaults to <code>:column</code></li>
</ul>
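<p>A minimal sketch of the same operation as a metamorph step, assuming the step reads and replaces <code>:metamorph/data</code> in the context and that the default option values apply. The column names are illustrative.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:id [1 2] :x [3 4] :y [5 6]})})

;; Stack :x and :y into one :value column; :id is repeated for each stacked column
(def result ((ds-mm/columnwise-concat [:x :y]) ctx))

(vec ((:metamorph/data result) :value))
;; =&gt; [3 4 5 6]
</code></pre>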
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L294">view source</a></div></div><div class="public anchor" id="var-concat"><h3>concat</h3><div class="usage"><code>(concat &amp; args)</code><code>(concat)</code></div><div class="doc"><div class="markdown"><p>Concatenate datasets using a copying-concatenation.
See also <a href="tech.v3.dataset.html#var-concat-inplace">concat-inplace</a> as it may be more efficient for your use case if you have
a small number (like less than 3) of datasets.</p>
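<p>A sketch of the metamorph form, under the assumption that the dataset held in the context is passed as the first dataset and any extra arguments are appended after it:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ds-a (ds/-&gt;dataset {:a [1 2]}))
(def ds-b (ds/-&gt;dataset {:a [3 4]}))

;; ds-a comes from the context; ds-b is the extra argument
(def result ((ds-mm/concat ds-b) {:metamorph/data ds-a}))

(vec ((:metamorph/data result) :a))
</code></pre>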
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L328">view source</a></div></div><div class="public anchor" id="var-concat-copying"><h3>concat-copying</h3><div class="usage"><code>(concat-copying &amp; args)</code><code>(concat-copying)</code></div><div class="doc"><div class="markdown"><p>Concatenate datasets into a new dataset copying data. Respects missing values.
Datasets must all have the same columns. Result column datatypes will be a widening
cast of the datatypes.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L338">view source</a></div></div><div class="public anchor" id="var-concat-inplace"><h3>concat-inplace</h3><div class="usage"><code>(concat-inplace &amp; args)</code><code>(concat-inplace)</code></div><div class="doc"><div class="markdown"><p>Concatenate datasets in place. Respects missing values. Datasets must all have the
same columns. Result column datatypes will be a widening cast of the datatypes.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L348">view source</a></div></div><div class="public anchor" id="var-data-.3Edataset"><h3>data-&gt;dataset</h3><div class="usage"><code>(data-&gt;dataset)</code></div><div class="doc"><div class="markdown"><p>Convert a data-ized dataset created via dataset-&gt;data back into a
full dataset</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L357">view source</a></div></div><div class="public anchor" id="var-dataset-.3Ecategorical-xforms"><h3>dataset-&gt;categorical-xforms</h3><div class="usage"><code>(dataset-&gt;categorical-xforms)</code></div><div class="doc"><div class="markdown"><p>Given a dataset, return a map of column-name-&gt;xform information.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L364">view source</a></div></div><div class="public anchor" id="var-dataset-.3Edata"><h3>dataset-&gt;data</h3><div class="usage"><code>(dataset-&gt;data)</code></div><div class="doc"><div class="markdown"><p>Convert a dataset to a pure clojure datastructure. Returns a map with two keys:
{:metadata :columns}.
:columns is a vector of column definitions appropriate for passing directly back
into new-dataset.
A column definition in this case is a map of {:name :missing :data :metadata}.</p>
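<p>A round-trip sketch through <code>dataset-&gt;data</code> and <code>data-&gt;dataset</code>, assuming both return steps that replace <code>:metamorph/data</code> in the context with their result:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3]})})

;; :metamorph/data becomes a plain map with :metadata and :columns keys
(def as-data ((ds-mm/dataset-&gt;data) ctx))

;; Convert the plain data back into a dataset
(def back ((ds-mm/data-&gt;dataset) as-data))

(vec ((:metamorph/data back) :a))
;; =&gt; [1 2 3]
</code></pre>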
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L370">view source</a></div></div><div class="public anchor" id="var-dataset-name"><h3>dataset-name</h3><div class="usage"><code>(dataset-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L380">view source</a></div></div><div class="public anchor" id="var-dataset.3F"><h3>dataset?</h3><div class="usage"><code>(dataset?)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L385">view source</a></div></div><div class="public anchor" id="var-descriptive-stats"><h3>descriptive-stats</h3><div class="usage"><code>(descriptive-stats)</code><code>(descriptive-stats options)</code></div><div class="doc"><div class="markdown"><p>Get descriptive statistics across the columns of the dataset.
</p>
<p>Options:</p>
<ul>
<li><code>:stat-names</code> - defaults to <code>(remove #{:values :num-distinct-values} (all-descriptive-stats-names))</code>.</li>
<li><code>:n-categorical-values</code> - Number of categorical values to report in the <code>:values</code> field. Defaults to 21.</li>
</ul>
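<p>A minimal sketch, assuming the step replaces <code>:metamorph/data</code> with the statistics dataset, which has one row per column of the input:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3] :b [4.0 5.0 6.0]})})

(def stats (:metamorph/data ((ds-mm/descriptive-stats) ctx)))

;; One row of statistics for each of :a and :b
(ds/row-count stats)
;; =&gt; 2
</code></pre>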
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L390">view source</a></div></div><div class="public anchor" id="var-drop-columns"><h3>drop-columns</h3><div class="usage"><code>(drop-columns colname-seq-or-fn)</code></div><div class="doc"><div class="markdown"><p>Same as remove-columns. Remove columns indexed by column name seq or
column filter function.
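For example, using the equivalent <code>tech.v3.dataset</code> function, where <code>DS</code> is a dataset:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])
(ds/drop-columns DS [:A :B])
(ds/drop-columns DS cf/categorical)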
</code></pre>
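<p>In this namespace the dataset argument is omitted. A minimal sketch, assuming the returned step reads and replaces <code>:metamorph/data</code> in the context:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2] :b [3 4] :c ["x" "y"]})})

(def result ((ds-mm/drop-columns [:b :c]) ctx))

(vec (ds/column-names (:metamorph/data result)))
;; =&gt; [:a]
</code></pre>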
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L404">view source</a></div></div><div class="public anchor" id="var-drop-missing"><h3>drop-missing</h3><div class="usage"><code>(drop-missing)</code><code>(drop-missing colname)</code></div><div class="doc"><div class="markdown"><p>Remove missing entries by simply selecting out the missing indexes.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L417">view source</a></div></div><div class="public anchor" id="var-drop-rows"><h3>drop-rows</h3><div class="usage"><code>(drop-rows row-indexes)</code></div><div class="doc"><div class="markdown"><p>Drop rows from dataset or column</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L425">view source</a></div></div><div class="public anchor" id="var-empty-column-names"><h3>empty-column-names</h3><div class="usage"><code>(empty-column-names)</code></div><div class="doc"><div class="markdown"><p>Return a sequence of column names whose empty set length matches the row count of the dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L431">view source</a></div></div><div class="public anchor" id="var-empty-dataset"><h3>empty-dataset</h3><div class="usage"><code>(empty-dataset)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L437">view source</a></div></div><div class="public anchor" id="var-ensure-array-backed"><h3>ensure-array-backed</h3><div class="usage"><code>(ensure-array-backed options)</code><code>(ensure-array-backed)</code></div><div class="doc"><div class="markdown"><p>Ensure the column data in the dataset is stored in pure java arrays. This is
sometimes necessary for interop with other libraries and this operation will
force any lazy computations to complete. This also clears the missing set
for each column and writes the missing values to the new arrays.</p>
<p>Columns that are already array backed and that have no missing values are
returned unchanged.</p>
<p>The postcondition is that dtype/-&gt;array will return a java array in the appropriate
datatype for each column.</p>
<p>Options:</p>
<ul>
<li><code>:unpack?</code> - unpack packed datetime types. Defaults to true</li>
</ul>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L442">view source</a></div></div><div class="public anchor" id="var-feature-ecount"><h3>feature-ecount</h3><div class="usage"><code>(feature-ecount)</code></div><div class="doc"><div class="markdown"><p>Number of feature columns. Feature columns are columns that are not
inference targets.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L463">view source</a></div></div><div class="public anchor" id="var-filter"><h3>filter</h3><div class="usage"><code>(filter predicate)</code></div><div class="doc"><div class="markdown"><p>dataset-&gt;dataset transformation. Predicate is passed a map of
colname-&gt;column-value.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L470">view source</a></div></div><div class="public anchor" id="var-filter-column"><h3>filter-column</h3><div class="usage"><code>(filter-column colname predicate)</code><code>(filter-column colname)</code></div><div class="doc"><div class="markdown"><p>Filter a given column by a predicate. Predicate is passed column values.
If predicate is <em>not</em> an instance of IFn it is treated as a value and will
be used as if the predicate is #(= value %).</p>
<p>The form without a predicate (2-arity in tech.v3.dataset, 1-arity here) reads the column as a boolean reader,
so, for instance, numeric 0 values are false, as are Double/NaN and Float/NaN. Objects are
false only when nil.</p>
<p>Returns a dataset.</p>
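<p>A sketch of the three ways to call the metamorph form described above, assuming each returned step reads and replaces <code>:metamorph/data</code>. The columns are illustrative.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3 4] :flag [true false true false]})})

;; IFn predicate: keep rows where :a is greater than 2
(def big ((ds-mm/filter-column :a #(&gt; % 2)) ctx))

;; Non-IFn value: treated as #(= 2 %)
(def twos ((ds-mm/filter-column :a 2) ctx))

;; No predicate: :flag is read as a boolean column
(def flagged ((ds-mm/filter-column :flag) ctx))

(map #(vec ((:metamorph/data %) :a)) [big twos flagged])
;; =&gt; ([3 4] [2] [1 3])
</code></pre>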
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L477">view source</a></div></div><div class="public anchor" id="var-filter-dataset"><h3>filter-dataset</h3><div class="usage"><code>(filter-dataset filter-fn-or-ds)</code></div><div class="doc"><div class="markdown"><p>Filter the columns of the dataset returning a new dataset. This pathway is
designed to work with the tech.v3.dataset.column-filters namespace.</p>
<ul>
<li>If filter-fn-or-ds is a dataset, it is returned.</li>
<li>If filter-fn-or-ds is sequential, then select-columns is called.</li>
<li>If filter-fn-or-ds is :all, all columns are returned</li>
<li>If filter-fn-or-ds is an instance of IFn, the dataset is passed into it.</li>
</ul>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L493">view source</a></div></div><div class="public anchor" id="var-group-by"><h3>group-by</h3><div class="usage"><code>(group-by key-fn options)</code><code>(group-by key-fn)</code></div><div class="doc"><div class="markdown"><p>Produce a map of key-fn-value-&gt;dataset. The argument to key-fn
is a map of colname-&gt;column-value representing a row in dataset.
Each dataset in the resulting map contains all and only rows
that produce the same key-fn-value.</p>
<p>Options - options are passed into dtype arggroup:</p>
<ul>
<li><code>:group-by-finalizer</code> - when provided this is run on each dataset immediately after the
rows are selected. This can be used to immediately perform a reduction on each new
dataset which is faster than doing it in a separate run.</li>
</ul>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L505">view source</a></div></div><div class="public anchor" id="var-group-by-.3Eindexes"><h3>group-by-&gt;indexes</h3><div class="usage"><code>(group-by-&gt;indexes key-fn options)</code><code>(group-by-&gt;indexes key-fn)</code></div><div class="doc"><div class="markdown"><p>(Non-lazy) - Group a dataset and return a map of key-fn-value-&gt;indexes where indexes
is an in-order contiguous group of indexes.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L522">view source</a></div></div><div class="public anchor" id="var-group-by-column"><h3>group-by-column</h3><div class="usage"><code>(group-by-column colname options)</code><code>(group-by-column colname)</code></div><div class="doc"><div class="markdown"><p>Return a map of column-value-&gt;dataset. Each dataset in the
resulting map contains all and only rows with the same value in
column.</p>
<ul>
<li><code>:group-by-finalizer</code> - when provided this is run on each dataset immediately after the
rows are selected. This can be used to immediately perform a reduction on each new
dataset which is faster than doing it in a separate run.</li>
</ul>
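<p>A minimal sketch, assuming the step stores the resulting map in <code>:metamorph/data</code>. Because that value is a map rather than a dataset, later dataset steps presumably cannot consume it directly.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:k ["x" "y" "x"] :v [1 2 3]})})

;; Map of :k value -&gt; dataset of the matching rows
(def groups (:metamorph/data ((ds-mm/group-by-column :k) ctx)))

(vec ((groups "x") :v))
;; =&gt; [1 3]
</code></pre>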
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L531">view source</a></div></div><div class="public anchor" id="var-group-by-column-.3Eindexes"><h3>group-by-column-&gt;indexes</h3><div class="usage"><code>(group-by-column-&gt;indexes colname options)</code><code>(group-by-column-&gt;indexes colname)</code></div><div class="doc"><div class="markdown"><p>(Non-lazy) - Group a dataset by a column return a map of column-val-&gt;indexes
where indexes is an in-order contiguous group of indexes.</p>
<p>Options are passed into dtype's arggroup method.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L545">view source</a></div></div><div class="public anchor" id="var-group-by-column-consumer"><h3>group-by-column-consumer</h3><div class="usage"><code>(group-by-column-consumer cname)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L556">view source</a></div></div><div class="public anchor" id="var-has-column.3F"><h3>has-column?</h3><div class="usage"><code>(has-column? column-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L561">view source</a></div></div><div class="public anchor" id="var-head"><h3>head</h3><div class="usage"><code>(head n)</code><code>(head)</code></div><div class="doc"><div class="markdown"><p>Get the first n row of a dataset. Equivalent to
`(select-rows ds (range n)). Arguments are reversed, however, so this can
be used in -&gt;&gt; operators.</p>
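<p>In this namespace the dataset comes from the context rather than an argument. A minimal sketch, assuming the step reads and replaces <code>:metamorph/data</code>:</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a (vec (range 10))})})

(def result ((ds-mm/head 3) ctx))

(ds/row-count (:metamorph/data result))
;; =&gt; 3
</code></pre>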
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L566">view source</a></div></div><div class="public anchor" id="var-induction"><h3>induction</h3><div class="usage"><code>(induction induct-fn &amp; args)</code></div><div class="doc"><div class="markdown"><p>Given a dataset and a function from dataset-&gt;row produce a new dataset.
The produced row will be merged with the current row and then added to the
dataset.</p>
<p>Options are the same as those used for <a href="tech.v3.dataset.html#var--.3Edataset">-&gt;dataset</a>, so the
user can control the parsing of the return values of <code>induct-fn</code>.
A new dataset is returned.</p>
<p>Example:</p>
<pre><code class="language-clojure">user&gt; (def ds (ds/-&gt;dataset {:a [0 1 2 3] :b [1 2 3 4]}))
#'user/ds
user&gt; ds
_unnamed [4 2]:
| :a | :b |
|---:|---:|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
user&gt; (ds/induction ds (fn [ds]
{:sum-of-previous-row (dfn/sum (ds/rowvec-at ds -1))
:sum-a (dfn/sum (ds :a))
:sum-b (dfn/sum (ds :b))}))
_unnamed [4 5]:
| :a | :b | :sum-b | :sum-a | :sum-of-previous-row |
|---:|---:|-------:|-------:|---------------------:|
| 0 | 1 | 0.0 | 0.0 | 0.0 |
| 1 | 2 | 1.0 | 0.0 | 1.0 |
| 2 | 3 | 3.0 | 1.0 | 5.0 |
| 3 | 4 | 6.0 | 3.0 | 14.0 |
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L576">view source</a></div></div><div class="public anchor" id="var-inference-column.3F"><h3>inference-column?</h3><div class="usage"><code>(inference-column?)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L616">view source</a></div></div><div class="public anchor" id="var-inference-target-column-names"><h3>inference-target-column-names</h3><div class="usage"><code>(inference-target-column-names)</code></div><div class="doc"><div class="markdown"><p>Return the names of the columns that are inference targets.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L621">view source</a></div></div><div class="public anchor" id="var-inference-target-ds"><h3>inference-target-ds</h3><div class="usage"><code>(inference-target-ds)</code></div><div class="doc"><div class="markdown"><p>Given a dataset return reverse-mapped inference target columns or nil
in the case where there are no inference targets.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L627">view source</a></div></div><div class="public anchor" id="var-inference-target-label-inverse-map"><h3>inference-target-label-inverse-map</h3><div class="usage"><code>(inference-target-label-inverse-map &amp; args)</code></div><div class="doc"><div class="markdown"><p>Given options generated during ETL operations and annotated with :label-columns
sequence containing one label column, generate a reverse map that maps from a dataset
value back to the label that generated that value.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L634">view source</a></div></div><div class="public anchor" id="var-inference-target-label-map"><h3>inference-target-label-map</h3><div class="usage"><code>(inference-target-label-map &amp; args)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L642">view source</a></div></div><div class="public anchor" id="var-k-fold-datasets"><h3>k-fold-datasets</h3><div class="usage"><code>(k-fold-datasets k options)</code><code>(k-fold-datasets k)</code></div><div class="doc"><div class="markdown"><p>Given 1 dataset, prepary K datasets using the k-fold algorithm.
Randomize dataset defaults to true which will realize the entire dataset
so use with care if you have large datasets.</p>
<p>Returns a sequence of {:test-ds :train-ds}</p>
<p>Options:</p>
<ul>
<li><code>:randomize-dataset?</code> - When true, shuffle the dataset. In that case 'seed' may be
provided. Defaults to true.</li>
<li><code>:seed</code> - when <code>:randomize-dataset?</code> is true then this can either be an
implementation of java.util.Random or an integer seed which will be used to
construct java.util.Random.</li>
</ul>
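<p>A minimal sketch, assuming the step stores the sequence of <code>{:test-ds :train-ds}</code> maps in <code>:metamorph/data</code>. Passing an integer <code>:seed</code> makes the shuffle repeatable.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a (vec (range 6))})})

(def folds (:metamorph/data ((ds-mm/k-fold-datasets 3 {:seed 42}) ctx)))

;; Each fold splits all 6 rows between its test and train datasets
(map (fn [{:keys [test-ds train-ds]}]
       [(ds/row-count test-ds) (ds/row-count train-ds)])
     folds)
</code></pre>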
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L647">view source</a></div></div><div class="public anchor" id="var-labels"><h3>labels</h3><div class="usage"><code>(labels)</code></div><div class="doc"><div class="markdown"><p>Return the labels. The labels sequence is the reverse mapped inference
column. This returns a single column of data or errors out.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L667">view source</a></div></div><div class="public anchor" id="var-mapseq-reader"><h3>mapseq-reader</h3><div class="usage"><code>(mapseq-reader options)</code><code>(mapseq-reader)</code></div><div class="doc"><div class="markdown"><p>Return a reader that produces a map of column-name-&gt;column-value
upon read.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L674">view source</a></div></div><div class="public anchor" id="var-min-n-by-column"><h3>min-n-by-column</h3><div class="usage"><code>(min-n-by-column cname N comparator options)</code><code>(min-n-by-column cname N comparator)</code><code>(min-n-by-column cname N)</code></div><div class="doc"><div class="markdown"><p>Find the minimum N entries (unsorted) by column. Resulting data will be indexed in
original order. If you want a sorted order then sort the result.</p>
<p>See options to <a href="tech.v3.dataset.html#var-sort-by-column">sort-by-column</a>.</p>
<p>Example:</p>
<pre><code class="language-clojure">user&gt; (ds/min-n-by-column ds "price" 10 nil nil)
test/data/stocks.csv [10 3]:
| symbol | date | price |
|--------|------------|------:|
| AMZN | 2001-09-01 | 5.97 |
| AMZN | 2001-10-01 | 6.98 |
| AAPL | 2000-12-01 | 7.44 |
| AAPL | 2002-08-01 | 7.38 |
| AAPL | 2002-09-01 | 7.25 |
| AAPL | 2002-12-01 | 7.16 |
| AAPL | 2003-01-01 | 7.18 |
| AAPL | 2003-02-01 | 7.51 |
| AAPL | 2003-03-01 | 7.07 |
| AAPL | 2003-04-01 | 7.11 |
user&gt; (ds/min-n-by-column ds "price" 10 &gt; nil)
test/data/stocks.csv [10 3]:
| symbol | date | price |
|--------|------------|-------:|
| GOOG | 2007-09-01 | 567.27 |
| GOOG | 2007-10-01 | 707.00 |
| GOOG | 2007-11-01 | 693.00 |
| GOOG | 2007-12-01 | 691.48 |
| GOOG | 2008-01-01 | 564.30 |
| GOOG | 2008-04-01 | 574.29 |
| GOOG | 2008-05-01 | 585.80 |
| GOOG | 2009-11-01 | 583.00 |
| GOOG | 2009-12-01 | 619.98 |
| GOOG | 2010-03-01 | 560.19 |
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L683">view source</a></div></div><div class="public anchor" id="var-missing"><h3>missing</h3><div class="usage"><code>(missing)</code></div><div class="doc"><div class="markdown"><p>Given a dataset or a column, return the missing set as a roaring bitmap</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L731">view source</a></div></div><div class="public anchor" id="var-model-type"><h3>model-type</h3><div class="usage"><code>(model-type &amp; args)</code></div><div class="doc"><div class="markdown"><p>Check the label column after dataset processing.
Returns either :regression or :classification.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L737">view source</a></div></div><div class="public anchor" id="var-new-column"><h3>new-column</h3><div class="usage"><code>(new-column data)</code><code>(new-column data metadata)</code><code>(new-column data metadata missing)</code><code>(new-column)</code></div><div class="doc"><div class="markdown"><p>Create a new column. Data will scanned for missing values
unless the full 4-argument pathway is used.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L746">view source</a></div></div><div class="public anchor" id="var-new-dataset"><h3>new-dataset</h3><div class="usage"><code>(new-dataset ds-metadata column-seq)</code><code>(new-dataset column-seq)</code><code>(new-dataset)</code></div><div class="doc"><div class="markdown"><p>Create a new dataset from a sequence of columns. Data will be converted
into columns using ds-col-proto/ensure-column-seq. If the column seq is simply a
collection of vectors, for instance, columns will be named ordinally.</p>
<p>Options map:</p>
<ul>
<li><code>:dataset-name</code> - Name of the dataset. Defaults to "_unnamed".</li>
<li><code>:key-fn</code> - Key function used on all column names before insertion into dataset.</li>
</ul>
<p>The return value fulfills the dataset protocols.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L759">view source</a></div></div><div class="public anchor" id="var-num-inference-classes"><h3>num-inference-classes</h3><div class="usage"><code>(num-inference-classes)</code></div><div class="doc"><div class="markdown"><p>Given a dataset and correctly built options from pipeline operations,
return the number of classes used for the label. Error if not classification
dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L776">view source</a></div></div><div class="public anchor" id="var-order-column-names"><h3>order-column-names</h3><div class="usage"><code>(order-column-names colname-seq)</code></div><div class="doc"><div class="markdown"><p>Order a sequence of columns names so they match the order in the
original dataset. Missing columns are placed last.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L784">view source</a></div></div><div class="public anchor" id="var-pmap-ds"><h3>pmap-ds</h3><div class="usage"><code>(pmap-ds ds-map-fn options)</code><code>(pmap-ds ds-map-fn)</code></div><div class="doc"><div class="markdown"><p>Parallelize mapping a function from dataset-&gt;dataset across a single dataset. Results are
coalesced back into a single dataset. The original dataset is simply sliced into n-core
batches and ds-map-fn is called n-core times. ds-map-fn must be a function from
dataset-&gt;dataset, although it may return nil.</p>
<p>Options:</p>
<ul>
<li><code>:max-batch-size</code> - controls how many rows are processed in a given batch (see
tech.v3.parallel.for/indexed-map-reduce); the default is 64000. If your mapping pathway
greatly expands the size of the dataset, it may be good to reduce the max batch size and
use :as-seq to produce a sequence of datasets.</li>
<li><code>:result-type</code>
<ul>
<li><code>:as-seq</code> - Return a sequence of datasets, one for each batch.</li>
<li><code>:as-ds</code> - Return a single dataset with all results in memory (default option).</li>
</ul>
</li>
</ul>
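<p>A minimal sketch of <code>pmap-ds</code> used as a metamorph step. It assumes, as the
generated functions in this namespace appear to do, that <code>(pmap-ds f)</code> returns an op
from a context map <code>{:metamorph/data ds}</code> to an updated context. The names
<code>ctx</code>, <code>add-y</code> and <code>out</code> are illustrative.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

;; Assumption: metamorph ops are functions ctx -&gt; ctx keyed on :metamorph/data.
(def ctx {:metamorph/data (ds/-&gt;dataset {:x (range 100)})})

;; dataset-&gt;dataset function, called once per batch: adds :y = :x + 1.
(defn add-y [batch]
  (assoc batch :y (mapv inc (batch :x))))

;; Apply the op to the context and pull the coalesced dataset back out.
(def out (:metamorph/data ((ds-mm/pmap-ds add-y) ctx)))
</code></pre>
<p>Because batches are mapped in parallel, <code>add-y</code> should only depend on rows in its own
batch. Passing <code>{:result-type :as-seq}</code> as the options map would yield a sequence of
datasets instead of one.</p>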
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L791">view source</a></div></div><div class="public anchor" id="var-print-all"><h3>print-all</h3><div class="usage"><code>(print-all)</code></div><div class="doc"><div class="markdown"><p>Helper function equivalent to <code>(tech.v3.dataset.print/print-range ... :all)</code></p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L812">view source</a></div></div><div class="public anchor" id="var-probability-distributions-.3Elabel-column"><h3>probability-distributions-&gt;label-column</h3><div class="usage"><code>(probability-distributions-&gt;label-column dst-colname label-column-datatype)</code><code>(probability-distributions-&gt;label-column dst-colname)</code></div><div class="doc"><div class="markdown"><p>Given a dataset that has columns in which the column names describe labels and the
rows describe a probability distribution, create a label column by taking, for each row,
the name of the column that holds the maximum value.
Creates a categorical label column which has a categorical map in its metadata.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L818">view source</a></div></div><div class="public anchor" id="var-rand-nth"><h3>rand-nth</h3><div class="usage"><code>(rand-nth)</code></div><div class="doc"><div class="markdown"><p>Return a random row from the dataset in map format</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L830">view source</a></div></div><div class="public anchor" id="var-remove-column"><h3>remove-column</h3><div class="usage"><code>(remove-column col-name)</code></div><div class="doc"><div class="markdown"><p>Same as:</p>
<pre><code class="language-clojure">(dissoc dataset col-name)
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L836">view source</a></div></div><div class="public anchor" id="var-remove-columns"><h3>remove-columns</h3><div class="usage"><code>(remove-columns colname-seq-or-fn)</code></div><div class="doc"><div class="markdown"><p>Remove columns indexed by column name seq or column filter function.
For example:</p>
<pre><code class="language-clojure"> (remove-columns DS [:A :B])
(remove-columns DS cf/categorical)
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L846">view source</a></div></div><div class="public anchor" id="var-remove-empty-columns"><h3>remove-empty-columns</h3><div class="usage"><code>(remove-empty-columns)</code></div><div class="doc"><div class="markdown"><p>Remove all columns that have no data - missing set length equals row count.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L858">view source</a></div></div><div class="public anchor" id="var-remove-rows"><h3>remove-rows</h3><div class="usage"><code>(remove-rows row-indexes)</code></div><div class="doc"><div class="markdown"><p>Same as drop-rows.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L864">view source</a></div></div><div class="public anchor" id="var-rename-columns"><h3>rename-columns</h3><div class="usage"><code>(rename-columns colnames)</code></div><div class="doc"><div class="markdown"><p>Rename columns using a map or vector of column names.</p>
<p>Does not reorder columns; rename is in-place for maps and
positional for vectors.</p>
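<p>A hedged sketch contrasting the map and vector forms, assuming the op returned by
<code>rename-columns</code> maps a context <code>{:metamorph/data ds}</code> to an updated context.
<code>ctx</code>, <code>by-map</code> and <code>by-vec</code> are illustrative names.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

;; Assumption: ops are ctx -&gt; ctx functions keyed on :metamorph/data.
(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2] :b [3 4]})})

;; Map form: :a is renamed in place, :b is left alone.
(def by-map (:metamorph/data ((ds-mm/rename-columns {:a :alpha}) ctx)))

;; Vector form: new names are assigned by column position.
(def by-vec (:metamorph/data ((ds-mm/rename-columns [:x :y]) ctx)))
</code></pre>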
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L870">view source</a></div></div><div class="public anchor" id="var-replace-missing"><h3>replace-missing</h3><div class="usage"><code>(replace-missing)</code><code>(replace-missing strategy)</code><code>(replace-missing columns-selector strategy)</code><code>(replace-missing columns-selector strategy value)</code></div><div class="doc"><div class="markdown"><p>Replace missing values in some columns with a given strategy.
The columns selector may be:</p>
<ul>
<li>seq of any legal column names</li>
<li>or a column filter function, such as <code>numeric</code> and <code>categorical</code></li>
</ul>
<p>Strategies may be:</p>
<ul>
<li>
<p><code>:down</code> - take value from previous non-missing row if possible else use provided value.</p>
</li>
<li>
<p><code>:up</code> - take value from next non-missing row if possible else use provided value.</p>
</li>
<li>
<p><code>:downup</code> - take value from previous if possible else use next.</p>
</li>
<li>
<p><code>:updown</code> - take value from next if possible else use previous.</p>
</li>
<li>
<p><code>:nearest</code> - Use nearest of next or previous values. <code>:mid</code> is an alias for <code>:nearest</code>.</p>
</li>
<li>
<p><code>:midpoint</code> - Use the midpoint (average) of the previous and next non-missing
values.</p>
</li>
<li>
<p><code>:abb</code> - Impute missing values with an approximate Bayesian bootstrap. See <a href="https://search.r-project.org/CRAN/refmans/LaplacesDemon/html/ABB.html">R's ABB</a>.</p>
</li>
<li>
<p><code>:lerp</code> - Linearly interpolate values between previous and next nonmissing rows.</p>
</li>
<li>
<p><code>:value</code> - Value will be provided - see below.</p>
<p>A value may be provided, which will then be used. The value may be a function, in which
case it will be called on the column with missing values elided and the return value will
be used as the filler.</p>
</li>
</ul>
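<p>A small sketch of two strategies on a column with one missing entry. It assumes the
op returned by <code>replace-missing</code> takes a context <code>{:metamorph/data ds}</code> and
returns the updated context; the data and names are illustrative.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

;; Row 1 has no :x key, so :x is missing at index 1.
(def ctx {:metamorph/data (ds/-&gt;dataset [{:id 0 :x 1} {:id 1} {:id 2 :x 3}])})

;; :down fills from the previous non-missing row.
(def down-ds (:metamorph/data ((ds-mm/replace-missing [:x] :down) ctx)))

;; :value fills with the provided constant.
(def value-ds (:metamorph/data ((ds-mm/replace-missing [:x] :value 0) ctx)))
</code></pre>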
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L879">view source</a></div></div><div class="public anchor" id="var-replace-missing-value"><h3>replace-missing-value</h3><div class="usage"><code>(replace-missing-value filter-fn-or-ds scalar-value)</code><code>(replace-missing-value scalar-value)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L912">view source</a></div></div><div class="public anchor" id="var-reverse-rows"><h3>reverse-rows</h3><div class="usage"><code>(reverse-rows)</code></div><div class="doc"><div class="markdown"><p>Reverse the rows in the dataset or column.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L919">view source</a></div></div><div class="public anchor" id="var-row-at"><h3>row-at</h3><div class="usage"><code>(row-at idx)</code></div><div class="doc"><div class="markdown"><p>Get the row at an individual index. If indexes are negative then the dataset
is indexed from the end.</p>
<pre><code class="language-clojure">user&gt; (ds/row-at stocks 1)
{"date" #object[java.time.LocalDate 0x534cb03b "2000-02-01"],
"symbol" "MSFT",
"price" 36.35}
user&gt; (ds/row-at stocks -1)
{"date" #object[java.time.LocalDate 0x6bf60ed5 "2010-03-01"],
"symbol" "AAPL",
"price" 223.02}
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L925">view source</a></div></div><div class="public anchor" id="var-row-count"><h3>row-count</h3><div class="usage"><code>(row-count)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L943">view source</a></div></div><div class="public anchor" id="var-row-map"><h3>row-map</h3><div class="usage"><code>(row-map map-fn options)</code><code>(row-map map-fn)</code></div><div class="doc"><div class="markdown"><p>Map a function across the rows of the dataset producing a new dataset
that is merged back into the original, potentially replacing existing columns.
Options are passed into the <a href="tech.v3.dataset.html#var--.3Edataset">-&gt;dataset</a> function so you can control the resulting
column types by the usual dataset parsing options described there.</p>
<p>Options:</p>
<p>See options for <a href="tech.v3.dataset.html#var-pmap-ds">pmap-ds</a>. In particular, note that you can
produce a sequence of datasets as opposed to a single large dataset.</p>
<p>Speed demons should attempt both <code>{:copying? false}</code> and <code>{:copying? true}</code> in the options
map as that changes rather drastically how data is read from the datasets. If you are
going to read all the data in the dataset, <code>{:copying? true}</code> will most likely be
the faster of the two.</p>
<p>Examples:</p>
<pre><code class="language-clojure">user&gt; (def stocks (ds/-&gt;dataset "test/data/stocks.csv"))
#'user/stocks
user&gt; (ds/head stocks)
test/data/stocks.csv [5 3]:
| symbol | date | price |
|--------|------------|------:|
| MSFT | 2000-01-01 | 39.81 |
| MSFT | 2000-02-01 | 36.35 |
| MSFT | 2000-03-01 | 43.22 |
| MSFT | 2000-04-01 | 28.37 |
| MSFT | 2000-05-01 | 25.45 |
user&gt; (ds/head (ds/row-map stocks (fn [row]
{"symbol" (keyword (row "symbol"))
:price2 (* (row "price")(row "price"))})))
test/data/stocks.csv [5 4]:
| symbol | date | price | :price2 |
|--------|------------|------:|----------:|
| :MSFT | 2000-01-01 | 39.81 | 1584.8361 |
| :MSFT | 2000-02-01 | 36.35 | 1321.3225 |
| :MSFT | 2000-03-01 | 43.22 | 1867.9684 |
| :MSFT | 2000-04-01 | 28.37 | 804.8569 |
| :MSFT | 2000-05-01 | 25.45 | 647.7025 |
</code></pre>
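<p>The same idea expressed as a metamorph step: a hedged sketch assuming
<code>(row-map f)</code> returns an op from a context <code>{:metamorph/data ds}</code> to an updated
context. The keys of each returned map become (or replace) columns.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:x [1 2 3]})})

;; map-fn sees each row as a map and returns the new column values for that row.
(def out (:metamorph/data
          ((ds-mm/row-map (fn [row] {:y (* 2 (row :x))})) ctx)))
</code></pre>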
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L948">view source</a></div></div><div class="public anchor" id="var-row-mapcat"><h3>row-mapcat</h3><div class="usage"><code>(row-mapcat mapcat-fn options)</code><code>(row-mapcat mapcat-fn)</code></div><div class="doc"><div class="markdown"><p>Map a function across the rows of the dataset. The function must produce a sequence of
maps, and the original dataset rows will be duplicated and then merged into the dataset
produced by concatenating the results of <code>mapcat-fn</code> and passing them to
<a href="tech.v3.dataset.html#var--.3Edataset">-&gt;dataset</a> with <code>options</code>. Options
are the same as for <a href="tech.v3.dataset.html#var--.3Edataset">-&gt;dataset</a>.</p>
<p>The smaller the maps returned from mapcat-fn, the better; consider using records.
If a mapcat-fn result map has a key that overlaps a column name, that
column will be replaced with the output of mapcat-fn. The returned map will have the
key <code>:_row-id</code> assoc'd onto it, so for absolutely minimal gc usage include this
as a member variable in your map.</p>
<p>Options:</p>
<ul>
<li>See options for <a href="tech.v3.dataset.html#var-pmap-ds">pmap-ds</a>. Especially note <code>:max-batch-size</code> and <code>:result-type</code>.
In order to conserve memory it may be much more efficient to return a sequence of datasets
rather than one large dataset. If returning sequences of datasets perhaps consider
a transducing pathway across them or the <a href="tech.v3.dataset.reductions.html">tech.v3.dataset.reductions</a> namespace.</li>
</ul>
<p>Example:</p>
<pre><code class="language-clojure">user&gt; (def ds (ds/-&gt;dataset {:rid (range 10)
:data (repeatedly 10 #(rand-int 3))}))
#'user/ds
user&gt; (ds/head ds)
_unnamed [5 2]:
| :rid | :data |
|-----:|------:|
| 0 | 0 |
| 1 | 2 |
| 2 | 0 |
| 3 | 1 |
| 4 | 2 |
user&gt; (def mapcat-fn (fn [row]
(for [idx (range (row :data))]
{:idx idx})))
#'user/mapcat-fn
user&gt; (mapcat mapcat-fn (ds/rows ds))
({:idx 0} {:idx 1} {:idx 0} {:idx 0} {:idx 1} {:idx 0} {:idx 1} {:idx 0} {:idx 1})
user&gt; (ds/row-mapcat ds mapcat-fn)
_unnamed [9 3]:
| :rid | :data | :idx |
|-----:|------:|-----:|
| 1 | 2 | 0 |
| 1 | 2 | 1 |
| 3 | 1 | 0 |
| 4 | 2 | 0 |
| 4 | 2 | 1 |
| 6 | 2 | 0 |
| 6 | 2 | 1 |
| 8 | 2 | 0 |
| 8 | 2 | 1 |
user&gt;
</code></pre>
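<p>A metamorph-style sketch, assuming <code>(row-mapcat f)</code> returns an op from a context
<code>{:metamorph/data ds}</code> to an updated context. Each row with <code>:n</code> = k expands
into k rows; <code>:n</code>, <code>:i</code> and <code>ctx</code> are illustrative names.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:n [1 2]})})

;; Original row values (:n) are duplicated alongside each generated :i.
(def out (:metamorph/data
          ((ds-mm/row-mapcat (fn [row] (for [i (range (row :n))] {:i i}))) ctx)))
</code></pre>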
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L999">view source</a></div></div><div class="public anchor" id="var-rows"><h3>rows</h3><div class="usage"><code>(rows options)</code><code>(rows)</code></div><div class="doc"><div class="markdown"><p>Get the rows of the dataset as a list of potentially flyweight maps.</p>
<p>Options:</p>
<ul>
<li><code>:copying?</code> - When true the data is copied out of the dataset row by row upon read of that
row. When false the data is only referenced upon each read of a particular key. Copying
is appropriate if you want to use the row values as keys in a map, and it is inappropriate if
you are only going to read a very small portion of the row map.</li>
<li><code>:nil-missing?</code> - When true, maps returned have nil values for missing entries as opposed
to eliding the missing keys entirely. It is legacy behavior and slightly faster to
use <code>:nil-missing? true</code>.</li>
</ul>
<pre><code class="language-clojure">user&gt; (take 5 (ds/rows stocks))
({"date" #object[java.time.LocalDate 0x6c433971 "2000-01-01"],
"symbol" "MSFT",
"price" 39.81}
{"date" #object[java.time.LocalDate 0x28f96b14 "2000-02-01"],
"symbol" "MSFT",
"price" 36.35}
{"date" #object[java.time.LocalDate 0x7bdbf0a "2000-03-01"],
"symbol" "MSFT",
"price" 43.22}
{"date" #object[java.time.LocalDate 0x16d3871e "2000-04-01"],
"symbol" "MSFT",
"price" 28.37}
{"date" #object[java.time.LocalDate 0x47094da0 "2000-05-01"],
"symbol" "MSFT",
"price" 25.45})
user&gt; (ds/rows (ds/-&gt;dataset [{:a 1 :b 2} {:a 2} {:b 3}]))
[{:a 1, :b 2} {:a 2} {:b 3}]
user&gt; (ds/rows (ds/-&gt;dataset [{:a 1 :b 2} {:a 2} {:b 3}]) {:nil-missing? true})
[{:a 1, :b 2} {:a 2, :b nil} {:a nil, :b 3}]
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1062">view source</a></div></div><div class="public anchor" id="var-rowvec-at"><h3>rowvec-at</h3><div class="usage"><code>(rowvec-at idx)</code></div><div class="doc"><div class="markdown"><p>Return a persistent-vector-like row at a given index. Negative indexes index
from the end.</p>
<pre><code class="language-clojure">user&gt; (ds/rowvec-at stocks 1)
["MSFT" #object[java.time.LocalDate 0x5848b8b3 "2000-02-01"] 36.35]
user&gt; (ds/rowvec-at stocks -1)
["AAPL" #object[java.time.LocalDate 0x4b70b0d5 "2010-03-01"] 223.02]
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1106">view source</a></div></div><div class="public anchor" id="var-rowvecs"><h3>rowvecs</h3><div class="usage"><code>(rowvecs options)</code><code>(rowvecs)</code></div><div class="doc"><div class="markdown"><p>Return a randomly addressable list of rows in persistent vector-like form.</p>
<p>Options:</p>
<ul>
<li><code>:copying?</code> - When true the data is copied out of the dataset row by row upon read of that
row. When false the data is only referenced upon each read of a particular key. Copying
is appropriate if you want to use the row values as keys in a map, and it is inappropriate if
you are only going to read a given key for a given row once.</li>
</ul>
<pre><code class="language-clojure">user&gt; (take 5 (ds/rowvecs stocks))
(["MSFT" #object[java.time.LocalDate 0x5be9e4c8 "2000-01-01"] 39.81]
["MSFT" #object[java.time.LocalDate 0xf758e5 "2000-02-01"] 36.35]
["MSFT" #object[java.time.LocalDate 0x752cc84d "2000-03-01"] 43.22]
["MSFT" #object[java.time.LocalDate 0x7bad4827 "2000-04-01"] 28.37]
["MSFT" #object[java.time.LocalDate 0x3a62c34a "2000-05-01"] 25.45])
</code></pre>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1120">view source</a></div></div><div class="public anchor" id="var-sample"><h3>sample</h3><div class="usage"><code>(sample n options)</code><code>(sample n)</code><code>(sample)</code></div><div class="doc"><div class="markdown"><p>Sample n-rows from a dataset. Defaults to sampling <em>without</em> replacement.</p>
<p>For the definition of seed, see the <a href="https://cnuernber.github.io/dtype-next/tech.v3.datatype.argops.html#var-argshuffle">argshuffle documentation</a>.</p>
<p>The returned dataset's metadata is altered by merging in <code>{:print-index-range (range n)}</code>, so you
will always see the entire returned dataset. If this isn't desired, <code>vary-meta</code> is a good pathway.</p>
<p>Options:</p>
<ul>
<li><code>:replacement?</code> - Do sampling with replacement. Defaults to false.</li>
<li><code>:seed</code> - Provide a seed as a number or provide a Random implementation.</li>
</ul>
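<p>A sketch showing that a numeric <code>:seed</code> makes sampling repeatable. It assumes
<code>(sample n options)</code> returns an op from a context <code>{:metamorph/data ds}</code> to an
updated context; <code>a</code> and <code>b</code> are illustrative names.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:x (range 100)})})

;; Same seed, same rows: both runs draw 5 rows without replacement.
(def a (:metamorph/data ((ds-mm/sample 5 {:seed 42}) ctx)))
(def b (:metamorph/data ((ds-mm/sample 5 {:seed 42}) ctx)))
</code></pre>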
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1144">view source</a></div></div><div class="public anchor" id="var-select"><h3>select</h3><div class="usage"><code>(select colname-seq selection)</code></div><div class="doc"><div class="markdown"><p>Reorder/trim dataset according to this sequence of indexes. Returns a new dataset.
<code>colname-seq</code> - one of:</p>
<ul>
<li><code>:all</code> - all the columns.</li>
<li>sequence of column names - those columns in that order.</li>
<li>implementation of java.util.Map - column order is dictated by map iteration order.
Selected columns are subsequently named after the corresponding value in the map.
This is similar to <code>rename-columns</code> except that it trims the result to only the columns
in the map.</li>
</ul>
<p><code>selection</code> - either the keyword <code>:all</code>, a list of indexes to select, or a list of booleans where
the index position of each true value indicates an index to select. When providing indices,
duplicates will select the specified index position more than once.</p>
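<p>A hedged sketch of the map and sequence forms of <code>colname-seq</code>, assuming the op
returned by <code>select</code> maps a context <code>{:metamorph/data ds}</code> to an updated
context. Data and names are illustrative.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [10 20 30] :b [1 2 3]})})

;; Map form: keep only :a, renamed to :alpha; rows 0 and 2.
(def renamed (:metamorph/data ((ds-mm/select {:a :alpha} [0 2]) ctx)))

;; Sequence form: columns and rows are both reordered.
(def reordered (:metamorph/data ((ds-mm/select [:b :a] [2 0]) ctx)))
</code></pre>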
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1164">view source</a></div></div><div class="public anchor" id="var-select-by-index"><h3>select-by-index</h3><div class="usage"><code>(select-by-index col-index row-index)</code></div><div class="doc"><div class="markdown"><p>Trim dataset according to this sequence of indexes. Returns a new dataset.</p>
<p>col-index and row-index - one of:</p>
<ul>
<li><code>:all</code> - all the columns (or, for row-index, all the rows)</li>
<li>list of indexes. May contain duplicates. Negative values will be counted from
the end of the sequence.</li>
</ul>
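<p>A sketch using negative indexes for both columns and rows, assuming the op returned by
<code>select-by-index</code> maps a context <code>{:metamorph/data ds}</code> to an updated context.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2 3] :b [4 5 6]})})

;; Last column; first and last rows (-1 counts from the end).
(def out (:metamorph/data ((ds-mm/select-by-index [-1] [0 -1]) ctx)))
</code></pre>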
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1181">view source</a></div></div><div class="public anchor" id="var-select-columns"><h3>select-columns</h3><div class="usage"><code>(select-columns colname-seq-or-fn)</code></div><div class="doc"><div class="markdown"><p>Select columns from the dataset by:</p>
<ul>
<li>seq of column names</li>
<li>column selector function</li>
<li><code>:all</code> keyword</li>
</ul>
<p>For example:</p>
<pre><code class="language-clojure">(select-columns DS [:A :B])
(select-columns DS cf/numeric)
(select-columns DS :all)
</code></pre>
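<p>Because each function here (assumed, as elsewhere in this namespace) returns a plain
function from a context <code>{:metamorph/data ds}</code> to a context, several steps can be chained
with <code>reduce</code>. This is only a stand-in for a real metamorph pipeline runner; the op list
is illustrative.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:a [1 2] :s ["x" "y"] :b [3.0 4.0]})})

;; Step 1 keeps numeric columns (:a, :b); step 2 narrows to :b.
(def ops [(ds-mm/select-columns cf/numeric)
          (ds-mm/select-columns [:b])])

;; Thread the context through each op in order.
(def out (:metamorph/data (reduce (fn [c op] (op c)) ctx ops)))
</code></pre>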
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1193">view source</a></div></div><div class="public anchor" id="var-select-columns-by-index"><h3>select-columns-by-index</h3><div class="usage"><code>(select-columns-by-index col-index)</code></div><div class="doc"><div class="markdown"><p>Select columns from the dataset by a seq of indexes (negative indexes are allowed) or <code>:all</code>.</p>
<p>See documentation for <code>select-by-index</code>.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1211">view source</a></div></div><div class="public anchor" id="var-select-missing"><h3>select-missing</h3><div class="usage"><code>(select-missing)</code></div><div class="doc"><div class="markdown"><p>Remove missing entries by simply selecting out the missing indexes</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1219">view source</a></div></div><div class="public anchor" id="var-select-rows"><h3>select-rows</h3><div class="usage"><code>(select-rows row-indexes options)</code><code>(select-rows row-indexes)</code></div><div class="doc"><div class="markdown"><p>Select rows from the dataset or column.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1225">view source</a></div></div><div class="public anchor" id="var-set-dataset-name"><h3>set-dataset-name</h3><div class="usage"><code>(set-dataset-name ds-name)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1233">view source</a></div></div><div class="public anchor" id="var-set-inference-target"><h3>set-inference-target</h3><div class="usage"><code>(set-inference-target target-name-or-target-name-seq)</code></div><div class="doc"><div class="markdown"><p>Set the inference target on the column. This sets the :column-type member
of the column metadata to :inference-target?.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1238">view source</a></div></div><div class="public anchor" id="var-shape"><h3>shape</h3><div class="usage"><code>(shape)</code></div><div class="doc"><div class="markdown"><p>Returns the shape in column-major format: <code>[n-columns n-rows]</code>.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1245">view source</a></div></div><div class="public anchor" id="var-shuffle"><h3>shuffle</h3><div class="usage"><code>(shuffle options)</code><code>(shuffle)</code></div><div class="doc"><div class="markdown"><p>Shuffle the rows of the dataset optionally providing a seed.
See <a href="https://cnuernber.github.io/dtype-next/tech.v3.datatype.argops.html#var-argshuffle">https://cnuernber.github.io/dtype-next/tech.v3.datatype.argops.html#var-argshuffle</a>.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1251">view source</a></div></div><div class="public anchor" id="var-sort-by"><h3>sort-by</h3><div class="usage"><code>(sort-by key-fn compare-fn &amp; args)</code><code>(sort-by key-fn)</code></div><div class="doc"><div class="markdown"><p>Sort a dataset by a key-fn and compare-fn.</p>
<ul>
<li><code>key-fn</code> - function from map to sort value.</li>
<li><code>compare-fn</code> may be one of:
<ul>
<li>a clojure operator like clojure.core/&lt;</li>
<li><code>:tech.numerics/&lt;</code>, <code>:tech.numerics/&gt;</code> for unboxing comparisons of primitive
values.</li>
<li>clojure.core/compare</li>
<li>A custom java.util.Comparator instantiation.</li>
</ul>
</li>
</ul>
<p>Options:</p>
<ul>
<li><code>:nan-strategy</code> - General missing strategy. Options are <code>:first</code>, <code>:last</code>, and
<code>:exception</code>.</li>
<li><code>:parallel?</code> - Uses parallel quicksort when true and regular quicksort when false.</li>
</ul>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1260">view source</a></div></div><div class="public anchor" id="var-sort-by-column"><h3>sort-by-column</h3><div class="usage"><code>(sort-by-column colname compare-fn &amp; args)</code><code>(sort-by-column colname)</code></div><div class="doc"><div class="markdown"><p>Sort a dataset by a given column using the given compare fn.</p>
<ul>
<li><code>compare-fn</code> may be one of:
<ul>
<li>a clojure operator like clojure.core/&lt;</li>
<li><code>:tech.numerics/&lt;</code>, <code>:tech.numerics/&gt;</code> for unboxing comparisons of primitive
values.</li>
<li>clojure.core/compare</li>
<li>A custom java.util.Comparator instantiation.</li>
</ul>
</li>
</ul>
<p>Options:</p>
<ul>
<li><code>:nan-strategy</code> - General missing strategy. Options are <code>:first</code>, <code>:last</code>, and
<code>:exception</code>.</li>
<li><code>:parallel?</code> - Uses parallel quicksort when true and regular quicksort when false.</li>
</ul>
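<p>A sketch of a descending sort using a Clojure operator as <code>compare-fn</code>. It assumes
<code>(sort-by-column colname compare-fn)</code> returns an op from a context
<code>{:metamorph/data ds}</code> to an updated context; other columns follow the sorted rows.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset {:x [2 3 1] :label ["b" "c" "a"]})})

;; clojure.core/&gt; as compare-fn gives descending order on :x.
(def out (:metamorph/data ((ds-mm/sort-by-column :x &gt;) ctx)))
</code></pre>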
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1282">view source</a></div></div><div class="public anchor" id="var-tail"><h3>tail</h3><div class="usage"><code>(tail n)</code><code>(tail)</code></div><div class="doc"><div class="markdown"><p>Get the last n rows of a dataset. Equivalent to
<code>(select-rows ds (range ...))</code>. Argument order is dataset-last, however, so this can
be used in -&gt;&gt; operators.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1303">view source</a></div></div><div class="public anchor" id="var-take-nth"><h3>take-nth</h3><div class="usage"><code>(take-nth n-val)</code></div><div class="doc"><div class="markdown"></div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1313">view source</a></div></div><div class="public anchor" id="var-train-test-split"><h3>train-test-split</h3><div class="usage"><code>(train-test-split options)</code><code>(train-test-split)</code></div><div class="doc"><div class="markdown"><p>Probabilistically split the dataset returning a map of <code>{:train-ds :test-ds}</code>.</p>
<p>Options:</p>
<ul>
<li><code>:randomize-dataset?</code> - When true, shuffle the dataset. In that case 'seed' may be
provided. Defaults to true.</li>
<li><code>:seed</code> - when <code>:randomize-dataset?</code> is true then this can either be an
implementation of java.util.Random or an integer seed which will be used to
construct java.util.Random.</li>
<li><code>:train-fraction</code> - Fraction of the dataset to use as training set. Defaults to
0.7.</li>
</ul>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1318">view source</a></div></div><div class="public anchor" id="var-unique-by"><h3>unique-by</h3><div class="usage"><code>(unique-by options map-fn)</code><code>(unique-by map-fn)</code></div><div class="doc"><div class="markdown"><p>map-fn is passed a map for each row, and rows are grouped by its
return value. Keep-fn is used to decide which index to keep.</p>
<p>:keep-fn - Function from key,idx-seq-&gt;idx. Defaults to #(first %2).</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1336">view source</a></div></div><div class="public anchor" id="var-unique-by-column"><h3>unique-by-column</h3><div class="usage"><code>(unique-by-column options colname)</code><code>(unique-by-column colname)</code></div><div class="doc"><div class="markdown"><p>Rows are grouped by their value in the column <code>colname</code>.
Keep-fn is used to decide which index to keep.</p>
<p>:keep-fn - Function from key, idx-seq-&gt;idx. Defaults to #(first %2).</p>
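<p>A sketch with the default <code>:keep-fn</code>, which keeps the first index in each group. It
assumes <code>(unique-by-column colname)</code> returns an op from a context
<code>{:metamorph/data ds}</code> to an updated context. Only the set of kept values is checked,
since the row order of the result is not documented here.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

;; Two rows share :k = 1; the first of them ("a") should be kept.
(def ctx {:metamorph/data (ds/-&gt;dataset {:k [1 1 2] :v ["a" "b" "c"]})})

(def out (:metamorph/data ((ds-mm/unique-by-column :k) ctx)))
</code></pre>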
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1347">view source</a></div></div><div class="public anchor" id="var-unordered-select"><h3>unordered-select</h3><div class="usage"><code>(unordered-select colname-seq index-seq)</code></div><div class="doc"><div class="markdown"><p>Perform a selection but use the order of the columns in the existing table; do
<em>not</em> reorder the columns based on colname-seq. Useful when doing selection based
on sets or persistent hash maps.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1358">view source</a></div></div><div class="public anchor" id="var-unroll-column"><h3>unroll-column</h3><div class="usage"><code>(unroll-column column-name)</code><code>(unroll-column column-name options)</code></div><div class="doc"><div class="markdown"><p>Unroll a column that has some (or all) sequential data as entries.
Returns a new dataset with the same columns, but with the other columns duplicated
where the unroll happened. The unrolled column now contains only scalar data.</p>
<p>Any missing indexes are dropped.</p>
<pre><code class="language-clojure">user&gt; (-&gt; (ds/-&gt;dataset [{:a 1 :b [2 3]}
{:a 2 :b [4 5]}
{:a 3 :b :a}])
(ds/unroll-column :b {:indexes? true}))
_unnamed [5 3]:
| :a | :b | :indexes |
|----+----+----------|
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 2 | 4 | 0 |
| 2 | 5 | 1 |
| 3 | :a | 0 |
</code></pre>
<p>Options:</p>
<ul>
<li><code>:datatype</code> - datatype of the resulting column if one aside from <code>:object</code> is desired.</li>
<li><code>:indexes?</code> - If true, create a new column that records the indexes of the values from
the original column. Can also be a truthy value (like a keyword), in which case the new
column is given that name.</li>
</ul>
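<p>The same operation as a metamorph step: a sketch assuming <code>(unroll-column colname)</code>
returns an op from a context <code>{:metamorph/data ds}</code> to an updated context. The other
column (<code>:a</code>) is duplicated once per unrolled element.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm])

(def ctx {:metamorph/data (ds/-&gt;dataset [{:a 1 :b [2 3]}
                                          {:a 2 :b [4]}])})

(def out (:metamorph/data ((ds-mm/unroll-column :b) ctx)))
</code></pre>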
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1366">view source</a></div></div><div class="public anchor" id="var-update"><h3>update</h3><div class="usage"><code>(update filter-fn-or-ds update-fn &amp; args)</code></div><div class="doc"><div class="markdown"><p>Update this dataset. Filters this dataset into a new dataset,
applies update-fn, then merges the result into the original dataset.</p>
<p>This pathway is designed to work with the tech.v3.dataset.column-filters namespace.</p>
<ul>
<li><code>filter-fn-or-ds</code> is a generalized parameter. May be a function,
a dataset or a sequence of column names.</li>
<li>update-fn must take the dataset as the first argument and must return
a dataset.</li>
</ul>
<pre><code class="language-clojure">(ds/bind-&gt; (ds/-&gt;dataset dataset) ds
(ds/remove-column "Id")
(ds/update cf/string ds/replace-missing-value "NA")
(ds/update-elemwise cf/string #(get {"" "NA"} % %))
(ds/update cf/numeric ds/replace-missing-value 0)
(ds/update cf/boolean ds/replace-missing-value false)
(ds/update-columnwise (cf/union (cf/numeric ds) (cf/boolean ds))
#(dtype/elemwise-cast % :float64)))
</code></pre>
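<p>A single step of the chain above in metamorph form. This sketch assumes
<code>(update filter-fn update-fn &amp; args)</code> returns an op from a context
<code>{:metamorph/data ds}</code> to an updated context; <code>update-fn</code> itself is the plain
<code>tech.v3.dataset</code> function and receives the filtered dataset.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf]
         '[tech.v3.dataset.metamorph :as ds-mm])

;; :a is missing in row 1; the string column :s should be left untouched.
(def ctx {:metamorph/data (ds/-&gt;dataset [{:a 1 :s "x"} {:s "y"}])})

;; Only numeric columns are passed to ds/replace-missing-value.
(def out (:metamorph/data
          ((ds-mm/update cf/numeric ds/replace-missing-value 0) ctx)))
</code></pre>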
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1400">view source</a></div></div><div class="public anchor" id="var-update-column"><h3>update-column</h3><div class="usage"><code>(update-column col-name update-fn)</code></div><div class="doc"><div class="markdown"><p>Update a column, returning a new dataset. <code>update-fn</code> is a column-&gt;column
transformation. It is an error if the column does not exist.</p>
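<p>A minimal sketch of calling the metamorph variant. It assumes, as the usage signature above suggests, that the call takes no dataset and instead returns a step: a function of a metamorph context map that reads the dataset from <code>:metamorph/data</code> and stores the updated dataset back under the same key. The column names, data, and the <code>add-ten</code> var are illustrative only.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm]
         '[tech.v3.datatype.functional :as dfn])

;; No dataset argument: the call returns a step, i.e. a function of a context map.
(def add-ten (ds-mm/update-column :a #(dfn/+ % 10.0)))

;; Run the step against a context holding the dataset under :metamorph/data.
(def ctx-out
  (add-ten {:metamorph/data (ds/->dataset {:a [1.0 2.0 3.0] :b [4.0 5.0 6.0]})}))

;; The updated dataset is read back from the same key; column :b is untouched.
(def updated (:metamorph/data ctx-out))
</code></pre>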
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1426">view source</a></div></div><div class="public anchor" id="var-update-columns"><h3>update-columns</h3><div class="usage"><code>(update-columns column-name-seq-or-fn update-fn)</code></div><div class="doc"><div class="markdown"><p>Update a sequence of columns selected by column name seq or column selector
function.</p>
<p>For example:</p>
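<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds] '[tech.v3.dataset.column-filters :as cf]
         '[tech.v3.datatype.functional :as dfn])
(ds/update-columns DS [:A :B] #(dfn/+ % 2))
(ds/update-columns DS cf/numeric #(dfn// % 2))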
</code></pre>
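<p>Because each function in this namespace returns a step rather than a dataset, the dataset-level calls above become steps that are composed and then run against a context. The sketch below threads a context through two such steps with a plain <code>reduce</code>, assuming each step reads and writes <code>:metamorph/data</code>; in practice a metamorph pipeline (for example from <code>scicloj.metamorph.core</code>) would usually do this composition. The data is hypothetical.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.metamorph :as ds-mm]
         '[tech.v3.dataset.column-filters :as cf]
         '[tech.v3.datatype.functional :as dfn])

;; Metamorph versions of the two calls above; note there is no DS argument.
(def steps
  [(ds-mm/update-columns [:a :b] #(dfn/+ % 2))
   (ds-mm/update-columns cf/numeric #(dfn// % 2))])

;; Apply the steps in order, passing the context map from one to the next.
(def result
  (:metamorph/data
   (reduce (fn [ctx step] (step ctx))
           {:metamorph/data (ds/->dataset {:a [2.0 4.0] :b [6.0 8.0]})}
           steps)))
</code></pre>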
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1433">view source</a></div></div><div class="public anchor" id="var-update-columnwise"><h3>update-columnwise</h3><div class="usage"><code>(update-columnwise filter-fn-or-ds cwise-update-fn &amp; args)</code></div><div class="doc"><div class="markdown"><p>Call <code>cwise-update-fn</code> on each column selected by <code>filter-fn-or-ds</code>. Returns the updated dataset.
See <code>update</code> for the meaning of the arguments.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1447">view source</a></div></div><div class="public anchor" id="var-update-elemwise"><h3>update-elemwise</h3><div class="usage"><code>(update-elemwise filter-fn-or-ds map-fn)</code><code>(update-elemwise map-fn)</code></div><div class="doc"><div class="markdown"><p>Replace all elements in selected columns by calling <code>map-fn</code> on each
element. When provided, <code>filter-fn-or-ds</code> follows the same rules as in <code>update</code>.
This implicitly clears the missing set, so <code>map-fn</code> must handle type-specific missing values correctly.
Returns a new dataset.</p>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1454">view source</a></div></div><div class="public anchor" id="var-value-reader"><h3>value-reader</h3><div class="usage"><code>(value-reader options)</code><code>(value-reader)</code></div><div class="doc"><div class="markdown"><p>Return a reader with one entry per row index, where each entry is itself a reader of that row's column values.</p>
<p>Options:</p>
<ul>
<li><code>:copying?</code> - defaults to false. When true, row values are copied on read.</li>
</ul>
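<p>A short sketch of the row-oriented view this produces. It uses the dataset-level <code>tech.v3.dataset/value-reader</code>, which takes the dataset as its first argument, rather than the metamorph step; the example data is made up.</p>
<pre><code class="language-clojure">(require '[tech.v3.dataset :as ds])

(def example-ds (ds/->dataset {:a [1 2] :b ["x" "y"]}))

;; One entry per row index; each entry is a reader over that row's values,
;; in column order. vec realizes each row reader so it can be compared/printed.
(def rows (mapv vec (ds/value-reader example-ds)))

;; With {:copying? true} each row's values are copied as they are read.
(def copied-rows (mapv vec (ds/value-reader example-ds {:copying? true})))
</code></pre>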
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1466">view source</a></div></div><div class="public anchor" id="var-write.21"><h3>write!</h3><div class="usage"><code>(write! output-path options)</code><code>(write! output-path)</code></div><div class="doc"><div class="markdown"><p>Write a dataset out to a file or an output stream. Supported forms are:</p>
<pre><code class="language-clojure">(ds/write! test-ds "test.csv")
(ds/write! test-ds "test.tsv")
(ds/write! test-ds "test.tsv.gz")
(ds/write! test-ds "test.nippy")
(ds/write! test-ds out-stream {:file-type :csv})
</code></pre>
<p>Options:</p>
<ul>
<li><code>:max-chars-per-column</code> - csv/tsv specific, defaults to 65536. Values longer than this cause an exception during serialization.</li>
<li><code>:max-num-columns</code> - csv/tsv specific, defaults to 8192. If the dataset has more columns than this, an exception is thrown during serialization.</li>
<li><code>:quoted-columns</code> - csv specific - sequence of column names that should always be quoted.</li>
<li><code>:file-type</code> - manually specify the file type. This is usually inferred from the filename, but when writing to an output stream you must specify it.</li>
<li><code>:headers?</code> - whether csv headers are written; defaults to true.</li>
</ul>
</div></div><div class="src-link"><a href="https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/dataset/metamorph.clj#L1476">view source</a></div></div></div></body></html>