# Change Log

## [7.042]

* Deps updated
* added `map-column->columns` function ([#178](https://github.com/scicloj/tablecloth/issues/178))

## [7.029]

### Added

* `reorder-columns` can work on grouped dataset now

### Fixed

* arrays of 2 element arrays behave as expected on dataset creation ([#142](https://github.com/scicloj/tablecloth/issues/142))

## [7.021]

Deps updated

Documentation changed to be generated by Clay instead of RMarkdown

## [7.017]

### Fixed

* semi and anti joins failed on tables containing missing values, multiple columns and duplicated rows

## [7.014]

Deps updated to fix `j/left-join` issue.

## [7.012]

### Fixed

* `join-columns` should treat only `nil` as a missing value, [discussion](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/tablecloth.20join-columns.20is.20nil-ing.20false.3F.20values)
* `:nil-missing?` was needed in more places (group-by operations), [discussion](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/tablecloth.20group.20operations.20dropping.20a.20nil.20group-by.20column)
* changes to the `group-by` documentation [PR115](https://github.com/scicloj/tablecloth/pull/115), thanks to [Marshall](https://github.com/mars0i)
* reflection warning for `Collections/shuffle` removed

## [7.007]

### Added

* Extended documentation for `dataset` (copied from TMD), [#112](https://github.com/scicloj/tablecloth/issues/112)

### Changed

* `rows` accepts `:nil-missing?` (default: true) and `copying?` (default: false) options.

## [7.000-beta-51]

Deps updated

## [7.000-beta-50.2]

### Added

* `:hashing` is available for single column joins too

## [7.000-beta-50.1]

### Added

* `:hashing` option determines the method of creating an index for multicolumn joins (was: `hash`, is: `identity`)

### Fixed

* [#108](https://github.com/scicloj/tablecloth/issues/108) - hashing replaced with packing data into a sequence

## [7.000-beta-50]

Deps updated

## [7.000-beta-38]

### Fixed

* dataset from singleton creation generated from wrong structure

## [7.000-beta-27]

### Added

* `map-rows` to map each row and produce new columns
* `rows` can return sequence of vectors (`:as-vecs`)

### Fixed

* balanced k-fold partitioning as proposed in [#92](https://github.com/scicloj/tablecloth/issues/92) by @behrica

## [7.000-beta-16]

Updated to TMD v7

Differences:

* the order of columns is persisted in more cases
* the order of groups in grouped dataset can be random

## [6.103.1]

### Added

* docstrings for every function, #87, #88
* aggregate-columns should default to all columns when called without a column selector #91
* create functions for packing / unpacking columns to arrays #82

### Changed

* [breaking] throw an exception when the dataset file does not exist, #84, #85

## [6.103]

Clojure upgraded to 1.11.1

### Added

* `separate-column` infers column names when function is used and `target-columns` is `nil`, [#78](https://github.com/scicloj/tablecloth/issues/78)

### Changed

* [breaking][minor] `separate-column` replaces the source column with the target in every case

## [6.102]

### Fixed

* replace `clojure.core/pmap` with `dtype-next` version (related to [#325](https://github.com/techascent/tech.ml.dataset/issues/325))

## [6.101]

### Added

* `get-entry` introduced

## [6.094.1]

### Fixed

* [#77] `anti-join` and `semi-join` bugs when tables contain missing values

## [6.094]

### Added

* `crosstab` - cross tabulation
* `pivot->longer` `:coerce-to-number` option added

### Changed

* [breaking] `pivot->wider` no longer coerces column names to strings; this is left to the user

### Fixed

* predicates should behave as in Clojure ([discussion](https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/tablecloth.2Fselect-rows.20predicate.20requirements.3F))

## [6.090]

TMD version bump

### Changed

* [breaking] `replace-missing` up/down strategies clarified. `:down` and `:up` now work only in one direction; the previous two-way behaviour is available as `:downup` (formerly `:down`) and `:updown` (formerly `:up`). See [TMD #305](https://github.com/techascent/tech.ml.dataset/issues/305).

## [6.088.1]

### Fixed

* Wrong way of selecting columns for joins (shouldn't be a set) ([discussion](https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/complete.20ala.20R/near/286277344))

## [6.088]

### Added

* `data frame` term in the title of docs ([discussion](https://app.slack.com/client/T03RZGPFR/threads/thread/C03RZGPG3-1649857892.946909))
* joins can accept different names for left/right datasets
* `cross-join`, `expand` and `complete` introduced

### Changed

* removed setting `*warn-on-reflection*`
* [breaking] creation of singleton dataset adds an error message as a column by default ([discussion](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/Empty.20csv.20regression.3F.3F))

## [6.076]

Version bump

### Added

* docstrings for `unroll` and `fold-by` by @holyjak (#60 and #61)

## [6.051]

### Fixed

- [#58] - editor friendly api file

## [6.031]

### Fixed

- #57 - InputStream should be dispatched first (the flow now: tries to create a dataset and, if that fails, packs the object as a singleton)

### Changed

- `select-rows` accepts `IFn` for row selection.
- [breaking] #54, #56 - `pipeline` namespace is stripped, all functions are moved to the [metamorph](https://github.com/scicloj/metamorph) library. This is a temporary solution before removing this namespace completely. Pipelined versions of functions will be moved to metamorph as well later.

## [6.025]

### Added

* [#49] added docstring to `add-column`

### Fixed

* [#53] summary prefix ignored for aggregate (when fn[ds] is passed)

## [6.023]

### Added

* Documented columns / rows functions [PR52](https://github.com/scicloj/tablecloth/pull/52)
* Reference to the original function added to the metadata of lifted functions for pipelines [PR51](https://github.com/scicloj/tablecloth/pull/51)

### Changed

* alias for api functions in reference (was: `api`, is: `tc`)

## [6.012]

### Fixed

* `replace-missing` on grouped dataset has swapped arguments

## [6.006]

### Fixed

* `update-columns` on grouped dataset

## [6.002]

### Changed

* [#43] Align with TMD for dataset creation from a map of sequences.
* [breaking] creation from tensor is `:as-rows` now

## [6.00-beta-16]

### Changed

* [#42] [breaking] `add-column` default strategy is `:strict` now.

## [6.00-beta-10]

### Fixed

* [#41] dataset name not set on tensor path

## [6.00-beta-7]

TMD upgrade, no changes in TC

## [5.17]

TMD upgrade

### Fixed

* [#36] `reorder-columns` on empty dataset returns nil

## [5.11]

### Fixed

* `aggregate-columns` didn't keep column order (#35)

## [5.05.1]

### Added

* `pipeline` functions have `doc` copied from original ones

## [5.05]

### Added

* `split` can turn off shuffling now (`:shuffle?` option)
* `split :holdouts` - sequence of consecutive holdouts

## [5.04]

tech.ml.dataset version bump, this introduces the change of the order of the groups after `group-by` operation

## [5.02]

### Added

* `split :holdout` supports any number of splits (minimum 2) [#28]
* `split` supports `split-names` to provide custom names for subdatasets
* `concat` and `concat-copying` are working with grouped datasets

### Fixed

* `kfold` split failed on small number of rows (due to `partition-all` behaviour)

## [5.01]

### Added

* `split->seq` to return train/test splits as a sequence of datasets, or as a map of sequences for grouped datasets

### Changed

* [breaking] `tablecloth.pipeline` returns a map with dataset under `:metamorph/data` key (see [metamorph](https://github.com/scicloj/metamorph))
* [breaking] `split` now returns a dataset or grouped dataset with two new columns indicating train/test and split id. See `split->seq` for previous behaviour.

## [5.00-beta-29.1]

### Added

* `without-grouping->` threading macro which allows operating on a grouped dataset as if it were a regular one.

### Changed

* `group-by` accepts any java.util.Map for a collection of indexes (use LinkedHashMap to persist an order)
* some `tablecloth.api.group-by` functions moved to `tablecloth.api.utils`, no changes to API

## [5.00-beta-29]

### Changed

* `add-or-replace-column(s)` replaced by `add-column(s)` (`add-or-replace-column(s)` is marked as deprecated) (#16)

### Fixed

* `mark-as-group` wasn't visible in API (#18)
* `map-columns` didn't propagate `new-type` for grouped case (#20)
* broken links (#14) in readme

## [5.00-beta-28]

### Added

* `let-dataset` - to simulate `tibble` from R

### Fixed

* Adding a column to an empty dataset returned empty dataset

## [5.00-beta-27]

### Changed

* re-implementation of the dataset creation path for numerical arrays

## [5.00-beta-25]

### Added

* `rows` and `columns` new result: `:as-double-arrays` - convert rows to 2d double array
* dataset can be created from numerical arrays [discussion](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/dataset.20.3C-.3E.20jvm.20arrays)

### Fixed

* column from single value should create valid datatype (#10)

## [5.00-beta-21a]

### Added

* `tablecloth.pipeline` for pipeline operations

## [5.00-beta-21]

### Added

* `concat-copying` exposed.
* `split` function for splitting into train-test pairs with `:kfold`, `:bootstrap`, `:loo` and `:holdout` strategies + stratified versions
* `replace-missing` with new strategy `:midpoint`

## [5.00-beta-5a]

### Fixed

* column names should keep order for provided names (#9)

## [5.00-beta-5]

t.m.d update

## [5.00-beta-3]

t.m.d update

### Changed

* contribution guide in readme

## [5.00-beta-2]

t.m.d update

### Changed

* `write-nippy!` and `read-nippy` are deprecated, replaced by `write!` and `dataset`

## [5.0-SNAPSHOT]

`tech.ml.dataset` version 5.0-alpha*

### Added

* `map-columns` accepts optional target datatype
* `ds/column->dataset` functionality introduced in `separate-column`
* more datatypes included for conversion (`:text` among others)

### Changed

* `write-csv!` replaced by `write!` (`write-csv!` is marked as deprecated)
* `info` field `:size` is replaced by `:n-elems`
* [breaking] `separate-column` 3-arity version accepts `separator` instead of `target-columns` now

### Fixed

* do not skip 1-row DS when folding
* do not attempt to fold empty dataset

## [4.04]

`tech.ml.dataset` version 4.04

### Added

* tests: dataset

### Changed

* version number to match t.m.dataset version
* documentation:
    - gfm renderer for markdown

### Fixed

* code block language alignment fix in css

## [1.0.0-pre-alpha9]

`tech.ml.dataset` version 4.03

### Added

* some operations on grouped dataset can be parallel (`parallel?` option set to `true`). These are: `aggregate`, `unique-by`, `order-by`, `join-columns`, `separate-columns`, `ungroup`

### Fixed

* #2 - docs typo
* #3 - recover datatypes after ungrouping

### Changed

* `aggregation` now uses in-place ungrouping, which is much faster

## [1.0.0-pre-alpha8]

`tech.ml.dataset` version 3.06

### Added

* `fill-range-replace` to inject data to make a continuous sequence in a column
* `write-nippy!` and `read-nippy`

## [1.0.0-pre-alpha7]

`tech.ml.dataset` version 2.13

### Added

* `replace-missing` new strategies: `:mid` and `:lerp`, working also for dates.

### Changed

* [breaking] `replace-missing` has a different contract and default strategy `:mid`. `value` argument is the last argument now.
* [breaking] `replace-missing` `:up` and `:down` strategies, when `value` is `nil`, fill border missing values with the nearest value.

## [1.0.0-pre-alpha6]

`tech.ml.dataset` version 2.06

### Added

* `asof-join` added

## [1.0.0-pre-alpha4]

### Added

* `reshape` tests
* `pivot->wider` accepts `:drop-missing?` option (default: `true`)

### Changed

* `pivot->wider` drops missing rows by default
* `pivot->wider` order of concatenated column names is reversed (first: colnames, last: value), was opposite.
* `pivot->longer` `:splitter` accepts string used for splitting column name