# Change Log

## [7.042]

* Deps updated
* Added `map-column->columns` function ([#178](https://github.com/scicloj/tablecloth/issues/178))

## [7.029]

### Added

* `reorder-columns` now works on grouped datasets

### Fixed

* arrays of 2-element arrays behave as expected on dataset creation ([#142](https://github.com/scicloj/tablecloth/issues/142))

## [7.021]

Deps updated

Documentation is now generated by Clay instead of RMarkdown

## [7.017]

### Fixed

* semi and anti joins failed on tables containing missing values, multiple columns and duplicated rows

## [7.014]

Deps updated to fix `j/left-join` issue.

## [7.012]

### Fixed

* `join-columns` should treat only `nil` as a missing value, [discussion](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/tablecloth.20join-columns.20is.20nil-ing.20false.3F.20values)
* `:nil-missing?` option added in more places where it was needed (group-by operations), [discussion](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/tablecloth.20group.20operations.20dropping.20a.20nil.20group-by.20column)
* changes to the `group-by` documentation [PR115](https://github.com/scicloj/tablecloth/pull/115), thanks to [Marshall](https://github.com/mars0i)
* reflection warning for `Collections/shuffle` removed

## [7.007]

### Added

* Extended documentation for `dataset` (copied from TMD), [#112](https://github.com/scicloj/tablecloth/issues/112)

### Changed

* `rows` accepts `:nil-missing?` (default: `true`) and `:copying?` (default: `false`) options.

## [7.000-beta-51]

Deps updated

## [7.000-beta-50.2]

### Added

* `:hashing` is available for single column joins too

## [7.000-beta-50.1]

### Added

* `:hashing` option determines the method of creating an index for multicolumn joins (was: `hash`, is: `identity`)

### Fixed

* [#108](https://github.com/scicloj/tablecloth/issues/108) - hashing replaced with packing data into a sequence

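Why an index built from hashes of the key values can misbehave, while one built from the packed values themselves cannot, is easy to see in a plain-Python sketch. The helper `inner_join_rows` is hypothetical (not tablecloth code), and the collision it relies on, `hash(-1) == hash(-2)`, is a CPython implementation detail chosen for illustration; it is not claimed to be the case reported in #108.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def inner_join_rows(
    left: Sequence[Row],
    right: Sequence[Row],
    key: Callable[[Row], Hashable],
) -> List[Tuple[Row, Row]]:
    """Hypothetical multicolumn inner join: index `right` by key(row),
    then look up each `left` row by the same key."""
    index: Dict[Hashable, List[Row]] = {}
    for rrow in right:
        index.setdefault(key(rrow), []).append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in index.get(key(lrow), [])]


left = [(-1, "a")]
right = [(-2, "a")]

# Keying on the hash: in CPython hash(-1) == hash(-2), so two distinct
# keys collide and produce a spurious match.
by_hash = inner_join_rows(left, right, key=hash)

# Keying on the packed values themselves (identity): only equal tuples match.
by_identity = inner_join_rows(left, right, key=lambda row: row)

print(by_hash)      # [((-1, 'a'), (-2, 'a'))]
print(by_identity)  # []
```

Packing the values costs more memory than a single hash per row, but the lookup then compares the actual key values, so equality of keys is exact.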
## [7.000-beta-50]

Deps updated

## [7.000-beta-38]

### Fixed

* dataset created from a singleton was generated from the wrong structure

## [7.000-beta-27]

### Added

* `map-rows` to map each row and produce new columns
* `rows` can return a sequence of vectors (`:as-vecs`)

### Fixed

* balanced k-fold partitioning as proposed in [#92](https://github.com/scicloj/tablecloth/issues/92) by @behrica

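A common reading of "balanced" k-fold partitioning is that fold sizes differ by at most one row. The plain-Python sketch below illustrates that idea on row indices (shuffling omitted) and contrasts it with naive fixed-size chunking; both helpers are hypothetical, and the sketch is not claimed to be the exact scheme adopted from #92.

```python
from typing import List


def balanced_folds(n: int, k: int) -> List[List[int]]:
    """Split row indices 0..n-1 into k folds whose sizes differ by at most one.
    The first n % k folds get one extra row."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    base, extra = divmod(n, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def chunked_folds(n: int, k: int) -> List[List[int]]:
    """Naive comparison: fixed-size chunks of ceil(n / k) rows, similar in
    spirit to Clojure's partition-all. The last fold can be much smaller,
    and fewer than k folds may come out."""
    size = -(-n // k)  # ceil(n / k)
    indices = list(range(n))
    return [indices[i:i + size] for i in range(0, n, size)]


print([len(f) for f in balanced_folds(10, 4)])  # [3, 3, 2, 2]
print([len(f) for f in chunked_folds(10, 4)])   # [3, 3, 3, 1]
print(len(chunked_folds(9, 4)))                 # 3
```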
## [7.000-beta-16]

Updated to TMD v7

Differences:

* the order of columns is preserved in more cases
* the order of groups in a grouped dataset can be random

## [6.103.1]

### Added

* docstrings for all functions, #87, #88
* `aggregate-columns` defaults to all columns when called without a column selector #91
* functions for packing / unpacking columns to arrays #82

### Changed

* [breaking] throw an exception when the dataset file does not exist #84, #85

## [6.103]

Clojure upgraded to 1.11.1

### Added

* `separate-column` infers column names when a function is used and `target-columns` is `nil`, [#78](https://github.com/scicloj/tablecloth/issues/78)

### Changed

* [breaking][minor] `separate-column` replaces the source column with the target columns in every case

## [6.102]

### Fixed

* replace `clojure.core/pmap` with `dtype-next` version (related to [#325](https://github.com/techascent/tech.ml.dataset/issues/325))

## [6.101]

### Added

* `get-entry` introduced

## [6.094.1]

### Fixed

* [#77] `anti-join` and `semi-join` bugs when tables contain missing values

## [6.094]

### Added

* `crosstab` - cross tabulation
* `pivot->longer` `:coerce-to-number` option added

### Changed

* [breaking] `pivot->wider` no longer coerces column names to strings, it's up to the user

### Fixed

* predicates should behave as in Clojure ([discussion](https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/tablecloth.2Fselect-rows.20predicate.20requirements.3F))

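Cross tabulation, as added with `crosstab` in this release, counts how often each combination of values from two columns occurs. The plain-Python sketch below shows the idea on two parallel lists; `crosstab` here is a hypothetical helper and does not reproduce the signature, options or output format of tablecloth's `crosstab`.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def crosstab(
    row_values: Sequence[Hashable], col_values: Sequence[Hashable]
) -> Dict[Hashable, Dict[Hashable, int]]:
    """Count how often each (row value, column value) pair occurs.
    Row and column keys keep first-seen order; absent pairs count as 0."""
    if len(row_values) != len(col_values):
        raise ValueError("both columns must have the same length")
    counts = Counter(zip(row_values, col_values))
    row_keys = list(dict.fromkeys(row_values))
    col_keys = list(dict.fromkeys(col_values))
    return {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}


sex = ["m", "f", "m", "f", "m"]
smoker = ["yes", "no", "no", "no", "yes"]
print(crosstab(sex, smoker))
# {'m': {'yes': 2, 'no': 1}, 'f': {'yes': 0, 'no': 2}}
```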
## [6.090]

TMD version bump

### Changed

* [breaking] `replace-missing` up/down strategies clarified. `:down` is replaced by `:downup` and `:up` is replaced by `:updown` (the previous two-directional behaviour); `:down` and `:up` now work in one direction only. See [tech.ml.dataset#305](https://github.com/techascent/tech.ml.dataset/issues/305).

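The difference between the one-directional and two-directional `replace-missing` strategies can be sketched in plain Python over a list where `None` marks a missing value. The helper names are hypothetical and this is not tablecloth's implementation; it only models the fill directions described in this entry (no `value` argument).

```python
from typing import List, Optional

Value = Optional[float]  # None marks a missing value


def fill_down(xs: List[Value]) -> List[Value]:
    """One direction only: carry the last seen value forward.
    Missing values before the first known value stay missing."""
    out: List[Value] = []
    last: Value = None
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out


def fill_up(xs: List[Value]) -> List[Value]:
    """One direction only: carry the next known value backward.
    Missing values after the last known value stay missing."""
    return fill_down(xs[::-1])[::-1]


def fill_downup(xs: List[Value]) -> List[Value]:
    """Fill down first, then fill up whatever is still missing (leading gaps)."""
    return fill_up(fill_down(xs))


def fill_updown(xs: List[Value]) -> List[Value]:
    """Fill up first, then fill down whatever is still missing (trailing gaps)."""
    return fill_down(fill_up(xs))


xs = [None, 1, None, None, 4, None]
print(fill_down(xs))    # [None, 1, 1, 1, 4, 4]
print(fill_up(xs))      # [1, 1, 4, 4, 4, None]
print(fill_downup(xs))  # [1, 1, 1, 1, 4, 4]
print(fill_updown(xs))  # [1, 1, 4, 4, 4, 4]
```

With a one-directional fill a leading (down) or trailing (up) gap can remain; the combined strategies close it by filling in the other direction afterwards.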
## [6.088.1]

### Fixed

* Wrong way of selecting columns for joins (they shouldn't be a set), [discussion](https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/complete.20ala.20R/near/286277344)

## [6.088]

### Added

* `data frame` term in the title of docs ([discussion](https://app.slack.com/client/T03RZGPFR/threads/thread/C03RZGPG3-1649857892.946909))
* joins can accept different names for left/right datasets
* `cross-join`, `expand` and `complete` introduced

### Changed

* removed setting `*warn-on-reflection*`
* [breaking] creation of singleton dataset adds an error message as a column by default ([discussion](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/Empty.20csv.20regression.3F.3F))

## [6.076]

Version bump

### Added

* docstring for `unroll` and `fold-by` by @holyjak (#60 and #61)

## [6.051]

### Fixed

- [#58] - editor friendly api file

## [6.031]

### Fixed

- #57 - InputStream should be dispatched first (the flow now: try to create a dataset and, if that fails, pack an object as a singleton)

### Changed

- `select-rows` accepts `IFn` for row selection.
- [breaking] #54, #56 - `pipeline` namespace is stripped, all functions are moved to the [metamorph](https://github.com/scicloj/metamorph) library. This is a temporary solution before removing this namespace completely. Pipelined versions of functions will be moved to metamorph as well later.

## [6.025]

### Added

* [#49] added docstring to `add-column`

### Fixed

* [#53] summary prefix ignored for aggregate (when fn[ds] is passed)

## [6.023]

### Added

* Documented columns / rows functions [PR52](https://github.com/scicloj/tablecloth/pull/52)
* Reference to the original functions in the metadata of lifted functions for pipelines [PR51](https://github.com/scicloj/tablecloth/pull/51)

### Changed

* alias for api functions in reference (was: `api`, is: `tc`)

## [6.012]

### Fixed

* `replace-missing` on grouped dataset has swapped arguments

## [6.006]

### Fixed

* `update-columns` on grouped dataset

## [6.002]

### Changed

* [#43] Align with TMD for dataset creation from a map of sequences.
* [breaking] creation from tensor is `:as-rows` now

## [6.00-beta-16]

### Changed

* [#42] [breaking] `add-column` default strategy is `:strict` now.

## [6.00-beta-10]

### Fixed

* [#41] dataset name not set on tensor path

## [6.00-beta-7]

TMD upgrade, no changes in TC

## [5.17]

TMD upgrade

### Fixed

* [#36] `reorder-columns` on empty dataset returns nil

## [5.11]

### Fixed

* `aggregate-columns` didn't keep column order (#35)

## [5.05.1]

### Added

* `pipeline` functions have `doc` copied from original ones

## [5.05]

### Added

* `split` can turn off shuffling now (`:shuffle?` option)
* `split :holdouts` - sequence of consecutive holdouts

## [5.04]

tech.ml.dataset version bump, which changes the order of the groups after a `group-by` operation

## [5.02]

### Added

* `split :holdout` supports any number of splits (minimum 2) [#28]
* `split` supports `split-names` to provide custom names for subdatasets
* `concat` and `concat-copying` work with grouped datasets

### Fixed

* `kfold` split failed on small number of rows (due to `partition-all` behaviour)

## [5.01]

### Added

* `split->seq` to return train/test splits as a sequence of datasets or as a map of sequences for grouped datasets

### Changed

* [breaking] `tablecloth.pipeline` returns a map with dataset under `:metamorph/data` key (see [metamorph](https://github.com/scicloj/metamorph))
* [breaking] `split` now returns a dataset or grouped dataset with two new columns indicating train/test and split id. See `split->seq` for the previous behaviour.

## [5.00-beta-29.1]

### Added

* `without-grouping->` threading macro which allows operations on a grouped dataset treated as a regular one.

### Changed

* `group-by` accepts any `java.util.Map` for a collection of indexes (use `LinkedHashMap` to preserve order)
* some `tablecloth.api.group-by` functions moved to `tablecloth.api.utils`, no changes to API

## [5.00-beta-29]

### Changed

* `add-or-replace-column(s)` replaced by `add-column(s)` (`add-or-replace-column(s)` is marked as deprecated) (#16)

### Fixed

* `mark-as-group` wasn't visible in API (#18)
* `map-columns` didn't propagate `new-type` for grouped case (#20)
* broken links (#14) in readme

## [5.00-beta-28]

### Added

* `let-dataset` - to simulate `tibble` from R

### Fixed

* Adding a column to an empty dataset returned empty dataset

## [5.00-beta-27]

### Changed

* re-implementation of numerical arrays path dataset creation

## [5.00-beta-25]

### Added

* `rows` and `columns` new result: `:as-double-arrays` - convert rows to 2d double array
* dataset can be created from numerical arrays [discussion](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/dataset.20.3C-.3E.20jvm.20arrays)

### Fixed

* column from single value should create valid datatype (#10)

## [5.00-beta-21a]

### Added

* `tablecloth.pipeline` for pipeline operations

## [5.00-beta-21]

### Added

* `concat-copying` exposed.
* `split` function for splitting into train-test pairs with `:kfold`, `:bootstrap`, `:loo` and `:holdout` strategies + stratified versions
* `replace-missing` with new strategy `:midpoint`

## [5.00-beta-5a]

### Fixed

* column names should keep order for provided names (#9)

## [5.00-beta-5]

t.m.d update

## [5.00-beta-3]

t.m.d update

### Changed

* contribution guide in readme

## [5.00-beta-2]

t.m.d update

### Changed

* `write-nippy!` and `read-nippy` are deprecated, replaced by `write!` and `dataset`

## [5.0-SNAPSHOT]

`tech.ml.dataset` version 5.0-alpha*

### Added

* `map-columns` accepts optional target datatype
* `ds/column->dataset` functionality introduced in `separate-column`
* more datatypes included for conversion (`:text` among others)

### Changed

* `write-csv!` replaced by `write!` (`write-csv!` is marked as deprecated)
* `info` field `:size` is replaced by `:n-elems`
* [breaking] `separate-column` 3-arity version accepts `separator` instead of `target-columns` now

### Fixed

* do not skip 1-row DS when folding
* do not attempt to fold empty dataset

## [4.04]

`tech.ml.dataset` version 4.04

### Added

* tests: dataset

### Changed

* version number to match t.m.dataset version
* documentation:
  - gfm renderer for markdown

### Fixed

* code block language alignment fix in css

## [1.0.0-pre-alpha9]

`tech.ml.dataset` version 4.03

### Added

* some operations on grouped dataset can be parallel (`parallel?` option set to `true`). These are: `aggregate`, `unique-by`, `order-by`, `join-columns`, `separate-columns`, `ungroup`

### Fixed

* #2 - docs typo
* #3 - recover datatypes after ungrouping

### Changed

* `aggregation` now uses in-place ungrouping, which is much faster

## [1.0.0-pre-alpha8]

`tech.ml.dataset` version 3.06

### Added

* `fill-range-replace` to inject data to make a continuous sequence in a column
* `write-nippy!` and `read-nippy`

## [1.0.0-pre-alpha7]

`tech.ml.dataset` version 2.13

### Added

* `replace-missing` new strategies: `:mid` and `:lerp`, working also for dates.

### Changed

* [breaking] `replace-missing` has a different contract and default strategy `:mid`. `value` argument is the last argument now.
* [breaking] `replace-missing` `:up` and `:down` strategies fill border missing values with the nearest value when `value` is `nil`.

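The `:lerp` strategy fills gaps by linear interpolation. A plain-Python sketch of that technique over a list follows; `lerp_fill` is hypothetical, not tablecloth's code. How tablecloth treats gaps at the borders, and dates, is not modelled here: leaving border gaps missing is an assumption of the sketch.

```python
from typing import List, Optional


def lerp_fill(xs: List[Optional[float]]) -> List[Optional[float]]:
    """Replace interior missing values (None) by linear interpolation
    between the nearest known values on each side, using positions as x.
    Leading/trailing gaps have no neighbour on one side and are left as None."""
    out = list(xs)
    known = [i for i, x in enumerate(xs) if x is not None]
    for left, right in zip(known, known[1:]):
        a, b = xs[left], xs[right]
        span = right - left
        for i in range(left + 1, right):  # empty when left and right are adjacent
            t = (i - left) / span
            out[i] = a + t * (b - a)
    return out


print(lerp_fill([None, 0.0, None, None, None, 8.0, None]))
# [None, 0.0, 2.0, 4.0, 6.0, 8.0, None]
```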
## [1.0.0-pre-alpha6]

`tech.ml.dataset` version 2.06

### Added

* `asof-join` added

## [1.0.0-pre-alpha4]

### Added

* `reshape` tests
* `pivot->wider` accepts `:drop-missing?` option (default: `true`)

### Changed

* `pivot->wider` drops missing rows by default
* `pivot->wider` order of concatenated column names is reversed (first: colnames, last: value), was opposite.
* `pivot->longer` `:splitter` accepts a string used for splitting column names