A DataFrame object is immutable, and every operation returns a new DataFrame instance.
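As a minimal sketch of what immutability means in practice, the snippet below removes a column and checks that the original frame is unchanged. It assumes the Kotlin DataFrame library (`org.jetbrains.kotlinx:dataframe`) is on the classpath and uses the String-based column access API; the sample data and the function name `immutabilityDemo` are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Returns the column names of the original frame and of the derived frame.
fun immutabilityDemo(): Pair<List<String>, List<String>> {
    // Hypothetical sample data.
    val df = dataFrameOf("name", "age")(
        "Alice", 15,
        "Bob", 20,
    )
    // remove returns a new DataFrame; df itself is left untouched.
    val withoutAge = df.remove("age")
    return df.columnNames() to withoutAge.columnNames()
}

fun main() {
    val (original, derived) = immutabilityDemo()
    println(original) // the original frame still has both columns
    println(derived)  // the derived frame no longer has "age"
}
```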
Naming conventions
DataFrame is a columnar data structure oriented toward column-wise operations. Most transformation operations start with a column selector that selects the target columns for the operation.
The syntax of most column operations assumes that they are applied to columns, so their names don't include the word column.
On the other hand, the Kotlin DataFrame library follows Kotlin Collections naming for row-wise operations,
as a DataFrame can be interpreted as a Collection of rows. The slight naming difference from Kotlin Collections is that all operations are named in the imperative form: sortBy and shuffle rather than sortedBy and shuffled.
Pairs of column/row operations:
- add columns / append rows
- remove columns / drop rows
- select columns / filter rows
- group columns / groupBy for rows
- reorder columns / sortBy for rows
- join to unite columns / concat to unite rows
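The pairs above can be sketched side by side. This is an illustrative example rather than a complete tour: it assumes the Kotlin DataFrame library is on the classpath and uses the String-based column access API. The sample data, the `people` helper, and the column name `ageNextYear` are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data used only for this illustration.
fun people(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Carol", 31,
)

fun main() {
    val df = people()

    // Column-wise operations: the names do not mention "column".
    val withNextAge = df.add("ageNextYear") { "age"<Int>() + 1 }
    val namesOnly = df.select("name")
    val noAge = df.remove("age")

    // Row-wise operations: named after Kotlin Collections, in imperative form.
    val oneMore = df.append("Dave", 42)
    val adults = df.filter { "age"<Int>() >= 18 }
    val noMinors = df.drop { "age"<Int>() < 18 }
    val byAge = df.sortBy("age")

    println(withNextAge.columnNames())
    println(namesOnly.columnNames())
    println(noAge.columnNames())
    println(oneMore.rowsCount())
    println(adults.rowsCount())
    println(noMinors.rowsCount())
    println(byAge)
}
```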
Horizontal (column) operations:
- add — add columns
- addId — add id column
- flatten — remove column groupings recursively
- group — group columns into ColumnGroup
- insert — insert column
- map — map columns into new DataFrame or DataColumn
- merge — merge several columns into one
- move — move columns or change column groupings
- remove — remove columns
- rename — rename columns
- reorder — reorder columns
- replace — replace columns
- select — select subset of columns
- split — split values into new columns
- ungroup — remove column grouping
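A few of the horizontal operations listed above, sketched on a small frame. The example assumes the Kotlin DataFrame library is on the classpath and relies on the String-based overloads of rename, group, and flatten. The data and the helper names (`contacts`, `renamed`, `grouped`) are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data.
fun contacts(): DataFrame<*> = dataFrameOf("firstName", "lastName", "city")(
    "Alice", "Cooper", "London",
    "Bob", "Dylan", "Dubai",
)

// rename: change a column name without touching the values.
fun renamed(): DataFrame<*> = contacts().rename("city").into("town")

// group: nest two columns under a new ColumnGroup called "name".
fun grouped(): DataFrame<*> = contacts().group("firstName", "lastName").into("name")

fun main() {
    println(renamed().columnNames())
    println(grouped().columnNames())
    // flatten: remove the column grouping again, recursively.
    println(grouped().flatten().columnNames())
}
```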
Vertical (row) operations:
- append — add rows
- concat — union rows from several DataFrame objects
- distinct / distinctBy — remove duplicated rows
- drop / dropLast / dropWhile / dropNulls / dropNA — remove rows by condition
- duplicate — duplicate rows
- explode — spread lists and DataFrame objects vertically into new rows
- filter / filterBy — filter rows
- implode — merge column values into lists grouping by other columns
- reverse — reverse rows
- shuffle — reorder rows randomly
- sortBy / sortByDesc / sortWith — sort rows
- split — split values into new rows
- take / takeLast / takeWhile — get first/last rows
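Some of the vertical operations from this list, shown as a short sketch. It assumes the Kotlin DataFrame library is on the classpath; the sample data (which deliberately contains a duplicated row) and the helper name `visits` are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data with one duplicated row.
fun visits(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Bob", 20,
    "Carol", 31,
)

fun main() {
    val df = visits()

    // distinct: remove duplicated rows.
    val unique = df.distinct()
    // sortByDesc + take: keep the two rows with the highest age.
    val oldestTwo = unique.sortByDesc("age").take(2)
    // reverse: flip the row order.
    val reversed = unique.reverse()

    println(unique.rowsCount())
    println(oldestTwo)
    println(reversed)
}
```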
Value modification:
- convert — convert values into new types
- parse — try to convert String values into appropriate types
- unfold — convert / "unfold" objects to ColumnGroup
- update — update values preserving column types
- fillNulls / fillNaNs / fillNA — replace missing values
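The value-modification operations can be chained. The sketch below fills a missing value, converts a column to another type, and updates values in place. It assumes the Kotlin DataFrame library is on the classpath and that the String-based overloads of fillNulls, convert, and update are available in the version used. The data and the helper names (`rawAges`, `cleaned`) are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data with one missing age.
fun rawAges(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", null,
)

fun cleaned(): DataFrame<*> = rawAges()
    // fillNulls: replace the missing age with 0.
    .fillNulls("age").with { 0 }
    // convert: change the column type from Int to Double.
    .convert("age").to<Double>()
    // update: change values while keeping the column type (String).
    .update("name").with { (it as String).uppercase() }

fun main() {
    println(cleaned())
}
```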
Reshaping:
- pivot / pivotCounts / pivotMatches — convert values into new columns
- gather — convert pairs of column names and values into key and value columns
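As a rough illustration of gather, the sketch below turns two value columns into a pair of key and value columns. It assumes the Kotlin DataFrame library is on the classpath and that the String-based overload of gather is available. The data, the helper names, and the column names `quarter` and `sales` are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical wide-format data: one column per quarter.
fun wideSales(): DataFrame<*> = dataFrameOf("city", "q1", "q2")(
    "London", 10, 12,
    "Dubai", 7, 9,
)

// gather: each (column name, value) pair becomes its own row,
// with the name stored in "quarter" and the value in "sales".
fun longSales(): DataFrame<*> = wideSales().gather("q1", "q2").into("quarter", "sales")

fun main() {
    println(wideSales())
    println(longSales())
}
```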
Learn how to: