Change Log

[7.042]

Deps updated
added fn 'map-column->columns' (#178, https://github.com/scicloj/tablecloth/issues/178)

[7.029]

Added

reorder-columns can work on grouped dataset now

Fixed

arrays of 2 element arrays behave as expected on dataset creation (#142)

[7.021]

Deps updated

Documentation changed to be generated by Clay instead of RMarkdown

[7.017]

Fixed

semi and anti joins fail on table containing missing values, multi columns and duplicated rows

[7.014]

Deps updated to fix j/left-join issue.

[7.012]

Fixed

join columns should consider nil as missing value only, discussion
:nil-missing? in more places needed (group-by operations), discussion
changes to the group-by documentation PR115, thanks to Marshall
reflection warning for Collections/shuffle removed

[7.007]

Added

Extended documentation for dataset (copied from TMD), #112

Changed

rows accepts :nil-missing? (default: true) and :copying? (default: false) options.

[7.000-beta-51]

Deps updated

[7.000-beta-50.2]

Added

:hashing is available for single column joins too

[7.000-beta-50.1]

Added

:hashing option determines the method of creating an index for multicolumn joins (was: hash, is: identity)

Fixed

#108 - hashing replaced with packing data into the sequence

[7.000-beta-50]

Deps updated

[7.000-beta-38]

Fixed

dataset from singleton creation generated from wrong structure

[7.000-beta-27]

Added

map-rows to map each row and produce new columns
rows can return sequence of vectors (:as-vecs)

Fixed

balanced k-fold partitioning as proposed in #92 by @behrica

[7.000-beta-16]

Updated to TMD v7

Differences:

the order of columns is persisted in more cases
the order of groups in grouped dataset can be random

[6.103.1]

Added

docstrings for every function, #87, #88
aggregate-columns should default to all columns when called without a column selector #91
create functions for packing / unpacking columns to arrays #82

Changed

[breaking] when a dataset file does not exist, an exception is thrown #84, #85

[6.103]

Clojure upgraded to 1.11.1

Added

separate-column infers column names when function is used and target-columns is nil, #78

Changed

[breaking][minor] separate-column replaces source column with target in every case

[6.102]

Fixed

replace clojure.core/pmap with dtype-next version (related to #325)

[6.101]

Added

get-entry introduced

[6.094.1]

Fixed

[#77] anti-join and semi-join bugs when tables contain missing values

[6.094]

Added

crosstab - cross tabulation
pivot->longer :coerce-to-number option added

Changed

[breaking] pivot->wider no longer coerces column names to strings, it's up to user

Fixed

predicates should behave as in Clojure (discussion)

[6.090]

TMD version bump

Changed

[breaking]

replace-missing up/down strategies clarified. :down is replaced by :downup and :up is replaced by :updown. :down and :up work only in one direction now.

https://github.com/techascent/tech.ml.dataset/issues/305

[6.088.1]

Fixed

Wrong way of selecting columns for joins (shouldn't be a set). https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/complete.20ala.20R/near/286277344

[6.088]

Added

data frame term in the title of docs (discussion)
joins can accept different names for left/right datasets
cross-join, expand and complete introduced

Changed

removed setting *warn-on-reflection*
[breaking] creation of singleton dataset adds an error message as a column by default (discussion)

[6.076]

Version bump

Added

docstring for unroll and fold-by by @holyjak (#60 and #61)

[6.051]

Fixed

[#58] - editor friendly api file

[6.031]

Fixed

#57 - InputStream should be dispatched first (the flow now: tries to create a dataset and, if it fails, packs an object as a singleton)

Changed

select-rows accepts IFn for row selection.
[breaking] #54, #56 - pipeline namespace is stripped, all functions are moved to metamorph library. This is a temporary solution before removing this namespace completely. Pipelined versions of functions will be moved to metamorph as well later.

[6.025]

Added

[#49] added docstring to add-column

Fixed

[#53] summary prefix ignored for aggregate (when fn[ds] is passed)

[6.023]

Added

Documented columns / rows functions PR52
Reference to the original function added to the metadata of lifted functions for pipelines PR51

Changed

alias for api functions in reference (was: api, is: tc)

[6.012]

Fixed

replace-missing on grouped dataset has swapped arguments

[6.006]

Fixed

update-columns on grouped dataset

[6.002]

Changed

[#43] Align with TMD for dataset creation from a map of sequences.
[breaking] creation from tensor is :as-rows now

[6.00-beta-16]

Changed

[#42] [breaking] add-column default strategy is :strict now.

[6.00-beta-10]

Fixed

[#41] dataset name not set on tensor path

[6.00-beta-7]

TMD upgrade, no changes in TC

[5.17]

TMD upgrade

Fixed

[#36] reorder-columns on empty dataset returns nil

[5.11]

Fixed

aggregate-columns didn't keep column order (#35)

[5.05.1]

Added

pipeline functions have doc copied from original ones

[5.05]

Added

split can turn off shuffling now (:shuffle? option)
split :holdouts - sequence of consecutive holdouts

[5.04]

tech.ml.dataset version bump, this introduces the change of the order of the groups after group-by operation

[5.02]

Added

split :holdout supports any number of splits (minimum 2) [#28]
split supports split-names to provide custom names for subdatasets
concat and concat-copying are working with grouped datasets

Fixed

kfold split failed on small number of rows (due to partition-all behaviour)

[5.01]

Added

split->seq to return train/test splits as a sequence of datasets or as a map of sequences for grouped datasets

Changed

[breaking] tablecloth.pipeline returns a map with dataset under :metamorph/data key (see metamorph)
[breaking] split now returns a dataset or grouped dataset with two new columns indicating train/test and split id. See split->seq for previous behaviour.

[5.00-beta-29.1]

Added

without-grouping-> threading macro which allows operating on a grouped dataset as if it were a regular one.

Changed

group-by accepts any java.util.Map for a collection of indexes (use LinkedHashMap to persist an order)
some tablecloth.api.group-by functions moved to tablecloth.api.utils, no changes to API

[5.00-beta-29]

Changed

add-or-replace-column(s) replaced by add-column(s) (add-or-replace-column(s) is marked as deprecated) (#16)

Fixed

mark-as-group wasn't visible in API (#18)
map-columns didn't propagate new-type for grouped case (#20)
broken links (#14) in readme

[5.00-beta-28]

Added

let-dataset - to simulate tibble from R

Fixed

Adding a column to an empty dataset returned empty dataset

[5.00-beta-27]

Changed

re-implementation of numerical arrays path dataset creation

[5.00-beta-25]

Added

rows and columns new result: :as-double-arrays - convert rows to 2d double array
dataset can be created from numerical arrays (discussion)

Fixed

column from single value should create valid datatype (#10)

[5.00-beta-21a]

Added

tablecloth.pipeline for pipeline operations

[5.00-beta-21]

Added

concat-copying exposed.
split function for splitting into train-test pairs with :kfold, :bootstrap, :loo and holdout strategies + stratified versions
replace-missing with new strategy :midpoint

[5.00-beta-5a]

Fixed

column names should keep order for provided names (#9)

[5.00-beta-5]

t.m.d update

[5.00-beta-3]

t.m.d update

Changed

contribution guide in readme

[5.00-beta-2]

t.m.d update

Changed

write-nippy! and read-nippy are deprecated, replaced by write! and dataset

[5.0-SNAPSHOT]

tech.ml.dataset version 5.0-alpha*

Added

map-columns accepts optional target datatype
ds/column->dataset functionality introduced in separate-column
more datatypes included for conversion (:text among others)

Changed

write-csv! replaced by write! (write-csv! is marked as deprecated)
info field :size is replaced by :n-elems
[breaking] separate-column 3-arity version accepts separator instead of target-columns now

Fixed

do not skip 1-row DS when folding
do not attempt to fold empty dataset

[4.04]

tech.ml.dataset version 4.04

Added

tests: dataset

Changed

version number to match t.m.dataset version
documentation:
- gfm renderer for markdown

Fixed

code block language alignment fix in css

[1.0.0-pre-alpha9]

tech.ml.dataset version 4.03

Added

some operations on grouped dataset can be parallel (parallel? option set to true). These are: aggregate, unique-by, order-by, join-columns, separate-columns, ungroup

Fixed

#2 - docs typo
#3 - recover datatypes after ungrouping

Changed

aggregation now uses in-place ungrouping which is much faster

[1.0.0-pre-alpha8]

tech.ml.dataset version 3.06

Added

fill-range-replace to inject data to make a continuous sequence in a column
write-nippy! and read-nippy

[1.0.0-pre-alpha7]

tech.ml.dataset version 2.13

Added

replace-missing new strategies: :mid and :lerp, working also for dates.

Changed

[breaking] replace-missing has a different contract and default strategy :mid. value argument is the last argument now.
[breaking] replace-missing :up and :down strategies, when value is nil fills border missing values with nearest value.

[1.0.0-pre-alpha6]

tech.ml.dataset version 2.06

Added

asof-join added

[1.0.0-pre-alpha4]

Added

reshape tests
pivot->wider accepts :drop-missing? option (default: true)

Changed

pivot->wider drops missing rows by default
pivot->wider order of concatenated column names is reversed (first: colnames, last: value), was opposite.
pivot->longer :splitter accepts string used for splitting column name
