init research

This commit is contained in:
2026-02-08 11:20:43 -10:00
commit bdf064f54d
3041 changed files with 1592200 additions and 0 deletions
+617
@@ -0,0 +1,617 @@
[//]: # (title: Column selectors)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
[`DataFrame`](DataFrame.md) provides a DSL for selecting an arbitrary set of columns: the Columns Selection DSL.
Column selectors are used in many operations:
<!---FUN columnSelectorsUsages-->
```kotlin
df.select { age and name }
df.fillNaNs { colsAtAnyDepth().colsOf<Double>() }.withZero()
df.remove { cols { it.hasNulls() } }
df.group { cols { it.data != name } }.into { "nameless" }
df.update { city }.notNull { it.lowercase() }
df.gather { colsOf<Number>() }.into("key", "value")
df.move { name.firstName and name.lastName }.after { city }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectorsUsages.html" width="100%"/>
<!---END-->
#### Full DSL Grammar {collapsible="true"}
**Definitions**
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.api.ColumnsSelectionDsl.DslGrammar.DefinitionsPartOfGrammar.html" width="100%"/>
<tabs>
<tab title="Directly in the DSL">
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.api.ColumnsSelectionDsl.DslGrammar.PlainDslPartOfGrammar.html" width="100%"/>
</tab>
<tab title="On a Column Set">
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.api.ColumnsSelectionDsl.DslGrammar.ColumnSetPartOfGrammar.ForHtml.html" width="100%"/>
</tab>
<tab title="On a Column Group">
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.api.ColumnsSelectionDsl.DslGrammar.ColumnGroupPartOfGrammar.ForHtml.html" width="100%"/>
</tab>
</tabs>
#### Functions Overview {collapsible="true"}
##### First (Col), Last (Col), Single (Col) {collapsible="true"}
`first {}`, `firstCol()`, `last {}`, `lastCol()`, `single {}`, `singleCol()`
Returns the first, last, or single column from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers) that adheres to the optional given condition. If no column adheres to the given condition,
`NoSuchElementException` is thrown.
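As a rough illustration of these selectors, the sketch below uses the String API on a small made-up dataframe. The column names and values are assumptions for the example, not part of the library's samples.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, invented for this illustration.
val df = dataFrameOf("yearStart", "yearEnd", "city")(
    2019, 2021, "London",
    2020, 2022, "Berlin",
)

// Names of the column picked by each selector.
fun firstYear() = df.select { first { it.name().startsWith("year") } }.columnNames()
fun lastYear() = df.select { last { it.name().startsWith("year") } }.columnNames()
fun onlyCity() = df.select { single { it.name() == "city" } }.columnNames()

// No column name starts with "zip", so resolving the selector should throw.
fun missingColumnThrows(): Boolean =
    try {
        df.select { first { it.name().startsWith("zip") } }
        false
    } catch (e: NoSuchElementException) {
        true
    }

fun main() {
    println(firstYear())
    println(lastYear())
    println(onlyCity())
    println(missingColumnThrows())
}
```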
##### Col {collapsible="true"}
`col(name)`, `col(5)`
Creates a [`ColumnAccessor`](#column-resolvers) (or [`SingleColumn`](#column-resolvers)) for a column with the given
argument from the top-level or specified [column group](DataColumn.md#columngroup). The argument can be either an
index (`Int`) or a reference to a column (`String`, [`ColumnPath`](#column-resolvers), or
[`ColumnAccessor`](#column-resolvers);
any [AccessApi](apiLevels.md)).
##### Value Col, Frame Col, Col Group {collapsible="true"}
`valueCol(name)`, `valueCol(5)`, `frameCol(name)`, `frameCol(5)`, `colGroup(name)`, `colGroup(5)`
Creates a [`ColumnAccessor`](DataColumn.md) (or `SingleColumn`) for a
[value column](DataColumn.md#valuecolumn) / [frame column](DataColumn.md#framecolumn) /
[column group](DataColumn.md#columngroup) with the given argument from the top-level or
specified [column group](DataColumn.md#columngroup). The argument can be either an index (`Int`) or a reference
to a column (`String`, [`ColumnPath`](#column-resolvers), or [`ColumnAccessor`](#column-resolvers); any [AccessApi](apiLevels.md)).
The functions can be both typed and untyped (in case you're supplying a column name, path, or index).
These functions throw an `IllegalArgumentException` if the column found is not the right kind.
##### Cols {collapsible="true"}
`cols {}`, `cols()`, `cols(colA, colB)`, `cols(1, 5)`, `cols(1..5)`, `[{}]`, `colSet[1, 3]`
Creates a subset of columns ([`ColumnSet`](#column-resolvers)) from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers).
You can use either a `ColumnFilter`, or any of the `vararg` overloads for any [AccessApi](apiLevels.md).
The function can be both typed and untyped (in case you're supplying a column name, -path, or index (range)).
Note that you can also use the `[]` operator for most overloads of `cols` to achieve the same result.
##### Range of Columns {collapsible="true"}
`colA.."colB"`
Creates a [`ColumnSet`](#column-resolvers) containing all columns from `colA` to `colB` (inclusive) from the top-level.
Columns inside [column groups](DataColumn.md#columngroup) are also supported
(as long as they share the same direct parent), as well as any combination of [AccessApi](apiLevels.md).
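A minimal sketch of a column range using the String API, assuming a hypothetical four-column dataframe. Both end points are included in the result.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, invented for this illustration.
val df = dataFrameOf("id", "name", "age", "city")(
    1, "Alice", 15, "London",
    2, "Bob", 45, "Berlin",
)

// Selects every top-level column from "name" up to and including "city".
fun nameToCity() = df.select { "name".."city" }.columnNames()

fun main() {
    println(nameToCity())
}
```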
##### Value Columns, Frame Columns, Column Groups {collapsible="true"}
`valueCols {}`, `valueCols()`, `frameCols {}`, `frameCols()`, `colGroups {}`, `colGroups()`
Creates a subset of columns ([`ColumnSet`](#column-resolvers)) from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers) containing only [value columns](DataColumn.md#valuecolumn) / [frame columns](DataColumn.md#framecolumn) /
[column groups](DataColumn.md#columngroup) that adhere to the optional condition.
##### Cols of Kind {collapsible="true"}
`colsOfKind(Value, Frame) {}`, `colsOfKind(Group, Frame)`
Creates a subset of columns ([`ColumnSet`](#column-resolvers)) from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers) containing only columns of the specified kind(s) that adhere to the optional condition.
##### All (Cols) {collapsible="true"}
`all()`, `allCols()`
Creates a [`ColumnSet`](#column-resolvers) containing all columns from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers). This is the opposite of [`none()`](ColumnSelectors.md#none) and equivalent to
[`cols()`](ColumnSelectors.md#cols) without filter.
Note, on [column groups](DataColumn.md#columngroup), `all` is named `allCols` instead to avoid confusion.
##### All (Cols) After, -Before, -From, -Up To {collapsible="true"}
`allAfter(colA)`, `allBefore(colA)`, `allColsFrom(colA)`, `allColsUpTo(colA)`
Creates a [`ColumnSet`](#column-resolvers) containing a subset of columns from the top-level,
specified [column group](DataColumn.md#columngroup), or [`ColumnSet`](#column-resolvers).
The subset includes:
- `all(Cols)Before(colA)`: All columns before the specified column, excluding that column.
- `all(Cols)After(colA)`: All columns after the specified column, excluding that column.
- `all(Cols)From(colA)`: All columns from the specified column, including that column.
- `all(Cols)UpTo(colA)`: All columns up to the specified column, including that column.
NOTE: The `{}` overloads of these functions in the Plain DSL and on [column groups](DataColumn.md#columngroup)
are a `ColumnSelector` (relative to the receiver).
On `ColumnSets` they are a `ColumnFilter` instead.
##### Cols at any Depth {collapsible="true"}
`colsAtAnyDepth {}`, `colsAtAnyDepth()`
Creates a [`ColumnSet`](#column-resolvers) containing all columns from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers) at any depth if they satisfy the optional given predicate. This means that columns (of all three kinds!)
nested inside [column groups](DataColumn.md#columngroup) are also included.
This function can also be followed by another [`ColumnSet`](#column-resolvers) filter-function like `colsOf<>()`, `single()`,
or `valueCols()`.
**For example:**
Depth-first search to a column containing the value "Alice":
`df.select { colsAtAnyDepth().first { "Alice" in it.values() } }`
The columns at any depth excluding the top-level:
`df.select { colGroups().colsAtAnyDepth() }`
All [value-](DataColumn.md#valuecolumn) and [frame columns](DataColumn.md#framecolumn) at any depth:
`df.select { colsAtAnyDepth { !it.isColumnGroup() } }`
All value columns at any depth nested under a column group named "myColGroup":
`df.select { myColGroup.colsAtAnyDepth().valueCols() }`
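To make the nesting concrete, here is a small sketch, assuming a made-up dataframe in which two columns are first grouped under a hypothetical `name` column group. It uses the `colsAtAnyDepth().filter {}` form that also appears in the examples further down this page.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; "firstName" and "lastName" are moved into a "name" group.
val df = dataFrameOf("firstName", "lastName", "age", "city")(
    "Alice", "Cooper", 15, "London",
    "Bob", "Dylan", 45, "Berlin",
).group { "firstName" and "lastName" }.into("name")

// All String columns, whether top-level or nested inside a column group.
fun stringColumns() = df.select { colsAtAnyDepth().colsOf<String>() }.columnNames()

// All columns at any depth that are not column groups themselves.
fun leafColumns() = df.select { colsAtAnyDepth().filter { !it.isColumnGroup() } }.columnNames()

fun main() {
    println(stringColumns())
    println(leafColumns())
}
```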
**Converting from deprecated syntax:**
`dfs { condition }` -> `colsAtAnyDepth { condition }`
`allDfs(includeGroups = false)` -> `colsAtAnyDepth { includeGroups || !it.isColumnGroup() }`
`dfsOf<Type> { condition }` -> `colsAtAnyDepth().colsOf<Type> { condition }`
`cols { condition }.recursively()` -> `colsAtAnyDepth { condition }`
`first { condition }.rec()` -> `colsAtAnyDepth { condition }.first()`
`all().recursively()` -> `colsAtAnyDepth()`
##### Cols in Groups {collapsible="true"}
`colsInGroups {}`, `colsInGroups()`
Creates a [`ColumnSet`](#column-resolvers) containing all columns that are nested in the [column groups](DataColumn.md#columngroup) at
the top-level, specified [column group](DataColumn.md#columngroup), or [`ColumnSet`](#column-resolvers) adhering to an optional predicate.
This is useful if you want to select all columns that are "one level down".
This function used to be called `children()` in the past.
**For example:**
To get the columns inside all [column groups](DataColumn.md#columngroup) in a [dataframe](DataFrame.md),
instead of having to write:
`df.select { colGroupA.cols() and colGroupB.cols() ... }`
you can use:
`df.select { colsInGroups() }`
or with filter:
`df.select { colsInGroups { "user" in it.name } }`
Similarly, you can take the columns inside all [column groups](DataColumn.md#columngroup) in a [`ColumnSet`](#column-resolvers):
`df.select { colGroups { "my" in it.name }.colsInGroups() }`
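A short sketch of the "one level down" behaviour, assuming a made-up dataframe with a single hypothetical `name` column group. With only one group present, `colsInGroups()` should select the same columns as addressing that group explicitly.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data with one column group called "name".
val df = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Cooper", 15,
    "Bob", "Dylan", 45,
).group { "firstName" and "lastName" }.into("name")

// Columns nested directly inside any top-level column group.
fun oneLevelDown() = df.select { colsInGroups() }.columnNames()

// The same selection, written out for the single group.
fun explicitGroup() = df.select { "name".allCols() }.columnNames()

fun main() {
    println(oneLevelDown())
    println(explicitGroup())
}
```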
##### Take (Last) (Cols) (While) {collapsible="true"}
`take(5)`, `takeLastCols(2)`, `takeLastWhile {}`, `takeColsWhile {}`
Creates a [`ColumnSet`](#column-resolvers) containing the first / last `n` columns from the top-level,
specified [column group](DataColumn.md#columngroup), or [`ColumnSet`](#column-resolvers) or those that adhere to the given condition.
Note, to avoid ambiguity, `take` is called `takeCols` when called on a [column group](DataColumn.md#columngroup).
##### Drop (Last) (Cols) (While) {collapsible="true"}
`drop(5)`, `dropLastCols(2)`, `dropLastWhile {}`, `dropColsWhile {}`
Creates a [`ColumnSet`](#column-resolvers) without the first / last `n` columns from the top-level,
specified [column group](DataColumn.md#columngroup), or [`ColumnSet`](#column-resolvers) or those that adhere to the given condition.
Note, to avoid ambiguity, `drop` is called `dropCols` when called on a [column group](DataColumn.md#columngroup).
##### Select from [Column Group](DataColumn.md#columngroup) {collapsible="true"}
`colGroupA.select {}`, `"colGroupA" {}`
Creates a [`ColumnSet`](#column-resolvers) containing the columns selected by a `ColumnsSelector` relative to the specified
[column group](DataColumn.md#columngroup). In practice, this means you're opening a new selection DSL scope inside a
[column group](DataColumn.md#columngroup) and selecting columns from there.
The selected columns are referenced individually and "unpacked" from their parent
[column group](DataColumn.md#columngroup).
**For example:**
Select `myColGroup.someCol` and all `String` columns from `myColGroup`:
`df.select { myColGroup.select { someCol and colsOf<String>() } }`
`df.select { "myGroupCol" { "colA" and expr("newCol") { colB + 1 } } }`
`df.select { "pathTo"["myGroupCol"].select { "colA" and "colB" } }`
`df.select { it["myGroupCol"].asColumnGroup()() { "colA" and "colB" } }`
> Did you know? Because the Columns Selection DSL uses
> [`@DslMarker`](https://kotlinlang.org/docs/type-safe-builders.html#scope-control-dslmarker), outer scope leaking is
> prohibited! This means that you can't reference columns from the outer scope inside the `select {}` block. This
> ensures safety and prevents issues for when multiple columns exist with the same name.
>
> `userData.select { age and address.select { `~~`age`~~` } }`
##### (All) (Cols) Except {collapsible="true"}
`colSet.except()`, `allExcept {}`, `colGroupA.allColsExcept {}`, `colGroupA.except {}`
Exclude a selection of columns from the current selection using a relative `ColumnsSelector`.
This function is best explained in parts:
**On Column Sets:** `except {}`
This function is easiest to explain with a [`ColumnSet`](#column-resolvers).
Let's say we want all `Int` columns apart from `age` and `height`.
We can do:
`df.select { colsOf<Int>() except (age and height) }`
which will 'subtract' the [`ColumnSet`](#column-resolvers) created by `age and height` from the [`ColumnSet`](#column-resolvers) created by
[`colsOf<Int>()`](ColumnSelectors.md#cols-of).
This operation can also be used to exclude columns that are originally in [column groups](DataColumn.md#columngroup).
For instance, excluding `userData.age`:
`df.select { colsAtAnyDepth { "a" in it.name() } except userData.age }`
Note that the selection of columns to exclude from column sets is always done relative to the outer scope.
Use the [Extension Properties API](concepts/extensionPropertiesApi.md) to prevent scoping issues if possible.
> Special case: If a column that needs to be removed appears multiple times in the [`ColumnSet`](#column-resolvers),
> it is excepted each time it is encountered (including inside [Column Groups](DataColumn.md#columngroup)).
> You could say the receiver `ColumnSet` is [simplified](ColumnSelectors.md#simplify) before the operation is performed:
>
> `cols(a, a, a.b, a.b).except(a.b) == cols(a).except(a.b)`
**Directly in the DSL:** `allExcept {}`
Instead of writing `all() except { ... }` in the DSL, you can use the shorthand `allExcept { ... }` to achieve the same result.
For example:
`df.select { allExcept { userData.age and height } }`
**On [Column Groups](DataColumn.md#columngroup):** `allColsExcept {}`
The variant of this function on [Column Groups](DataColumn.md#columngroup) is a bit different, as it changes the scope
to be relative to the [Column Group](DataColumn.md#columngroup).
This is similar to the [`select`](ColumnSelectors.md#select-from-column-group) function.
In other words:
`df.select { myColGroup.allColsExcept { colA and colB } }`
is shorthand for
`df.select { myColGroup.select { allExcept { colA and colB } } }`
or
`df.select { myColGroup.allCols() except { myColGroup.colA and myColGroup.colB } }`
Note the name change: as with [`allCols`](ColumnSelectors.md#all-cols), the `Cols` in the name makes it clearer that you're selecting
columns inside the group, 'lifting' them out.
**On [Column Groups](DataColumn.md#columngroup):** `except {}`
This variant can be used to exclude some nested columns from a [Column Group](DataColumn.md#columngroup) in the selection.
In contrast to `allColsExcept`, this function does not 'lift' the columns out of the group, preserving the structure.
So:
`df.select { colGroup.except { col } }`
is shorthand for:
`df.select { cols(colGroup) except colGroup.col }`
or:
`df.remove { colGroup.col }.select { colGroup }`
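The column-set and plain-DSL variants can be sketched with the String API as follows. The dataframe and its column names are assumptions made up for this example.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, invented for this illustration.
val df = dataFrameOf("name", "age", "height", "weight")(
    "Alice", 15, 160, 54,
    "Bob", 45, 180, 90,
)

// On a column set: all Int columns, minus "age" and "height".
fun intsExceptAgeAndHeight() =
    df.select { colsOf<Int>() except ("age" and "height") }.columnNames()

// Directly in the DSL: every top-level column except the String ones.
fun nonStringColumns() = df.select { allExcept { colsOf<String>() } }.columnNames()

fun main() {
    println(intsExceptAgeAndHeight())
    println(nonStringColumns())
}
```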
##### Column Name Filters {collapsible="true"}
`nameContains()`, `colsNameContains()`, `nameStartsWith()`, `colsNameEndsWith()`
Creates a [`ColumnSet`](#column-resolvers) containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers) that have names that satisfy the given function. These functions accept a `String` as argument, as
well as an optional `ignoreCase` parameter. For the `nameContains` variant, you can also pass a `Regex` as an argument.
Note, on [column groups](DataColumn.md#columngroup), the functions have names starting with `cols` to avoid
ambiguity.
##### (Cols) Without Nulls {collapsible="true"}
`withoutNulls()`, `colsWithoutNulls()`
Creates a [`ColumnSet`](#column-resolvers) containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers) that have no `null` values. This is a shorthand for `cols { !it.hasNulls() }`.
Note, to avoid ambiguity, `withoutNulls` is called `colsWithoutNulls` when called on a
[column group](DataColumn.md#columngroup).
##### Distinct {collapsible="true"}
`colSet.distinct()`
Returns a new [`ColumnSet`](#column-resolvers) from the specified [`ColumnSet`](#column-resolvers) containing only distinct columns (by path).
This is useful when you've selected the same column multiple times but only want it once.
This does not cover the case where a column is selected individually and through its enclosing
[column group](DataColumn.md#columngroup). See [`simplify`](ColumnSelectors.md#simplify) for that.
NOTE: This doesn't solve the `DuplicateColumnNamesException` if you've selected two columns with the same name.
For this, you'll need to [rename](ColumnSelectors.md#rename) one of the columns.
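A minimal sketch of `distinct()`, assuming a made-up dataframe in which the hypothetical `age` column is selected twice: once through `colsOf<Int>()` and once by name.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, invented for this illustration.
val df = dataFrameOf("name", "age", "height")(
    "Alice", 15, 160,
    "Bob", 45, 180,
)

// "age" is matched by colsOf<Int>() and added again by name;
// distinct() keeps each column path only once.
fun uniqueInts() = df.select { (colsOf<Int>() and "age").distinct() }.columnNames()

fun main() {
    println(uniqueInts())
}
```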
##### None {collapsible="true"}
`none()`
Creates an empty [`ColumnSet`](#column-resolvers), essentially selecting no columns at all.
This is the opposite of [`all()`](ColumnSelectors.md#all-cols).
This function mostly exists for completeness, but can be useful in some very specific cases.
##### Cols Of {collapsible="true"}
`colsOf<T>()`, `colsOf<T> {}`
Creates a [`ColumnSet`](#column-resolvers) containing columns from the top-level, specified [column group](DataColumn.md#columngroup),
or [`ColumnSet`](#column-resolvers) that are a subtype of the specified type `T` and adhere to the optional condition.
##### Simplify {collapsible="true"}
`colSet.simplify()`
Returns a new [`ColumnSet`](#column-resolvers) from the specified [`ColumnSet`](#column-resolvers) in 'simplified' form.
This function simplifies the structure of the [`ColumnSet`](#column-resolvers) by removing columns that are already present in
[column groups](DataColumn.md#columngroup), returning only these groups,
plus columns not belonging in any of the groups.
In other words, this means that if a column in the [`ColumnSet`](#column-resolvers) is inside a [column group](DataColumn.md#columngroup)
in the [`ColumnSet`](#column-resolvers), it will not be included in the result.
It's useful in combination with [`colsAtAnyDepth {}`](ColumnSelectors.md#cols-at-any-depth), as that function can
create a [`ColumnSet`](#column-resolvers) containing both a column and the [column group](DataColumn.md#columngroup) it's in.
This function was previously named `top()` and `roots()`, but these names have been deprecated.
**For example:**
`cols(a, a.b, d.c).simplify() == cols(a, d.c)`
##### Filter {collapsible="true"}
`colSet.filter {}`
Returns a new [`ColumnSet`](#column-resolvers) from the specified [`ColumnSet`](#column-resolvers) containing only columns that satisfy the given condition.
This function behaves the same as [`cols {}` and `[{}]`](ColumnSelectors.md#cols), but only exists on column sets.
##### And {collapsible="true"}
`colSet and colB`
Creates a [`ColumnSet`](#column-resolvers) containing the columns from both the left and right side of the function. This allows
you to combine selections or simply select multiple columns at once.
Any combination of [AccessApi](concepts/apiLevels.md) can be used on either side of the `and` operator.
Note, while you can write `col1 and col2 and col3...`, it may be more concise to use
[`cols(col1, col2, col3...)`](ColumnSelectors.md#cols) instead. The only downside is that you can't mix
[Access APIs](concepts/apiLevels.md) with that notation.
##### Rename {collapsible="true"}
`colA named "colB"`, `colA into namedColAccessor`
Renaming a column in the Columns Selection DSL is done by calling the infix functions
`named` or `into`.
They behave exactly the same, so it's up to contextual preference which one to use.
Any combination of [Access API](concepts/apiLevels.md) can be used to specify the column to rename
and which name should be used instead.
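Both infix functions can be sketched with the String API as below. The dataframe and the new names (`fullName`, `years`) are hypothetical and chosen only for this example.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, invented for this illustration.
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
)

// Rename both columns while selecting them; `named` and `into` are interchangeable.
fun renamed() =
    df.select { ("name" named "fullName") and ("age" into "years") }.columnNames()

fun main() {
    println(renamed())
}
```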
##### Expr (Column Expression) {collapsible="true"}
`expr {}`, `expr("newCol") {}`
Creates a temporary new column by defining an expression to fill up each row.
You may have come across this name before in the [Add DSL](add.md) or
[`toDataFrame {}` DSL](createDataFrame.md#todataframe).
It's extremely useful when you want to create a new column based on existing columns for operations like
[`sortBy`](sortBy.md), [`groupBy`](groupBy.md), etc.
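As a small sketch of a column expression inside a selection, the example below derives a hypothetical `birthYear` column from a made-up `age` column. The reference year 2021 mirrors the arithmetic example further down this page.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, invented for this illustration.
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
)

// Select "name" plus a temporary column computed per row from "age".
fun withBirthYear() =
    df.select { "name" and expr("birthYear") { 2021 - "age"<Int>() } }

fun main() {
    withBirthYear().print()
}
```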
#### Examples
**Select columns by name:**
<!---FUN columnSelectors-->
<tabs>
<tab title="Properties">
```kotlin
// by column name
df.select { it.name }
df.select { name }
// by column path
df.select { name.firstName }
// with a new name
df.select { name named "Full Name" }
// converted
df.select { name.firstName.map { it.lowercase() } }
// column arithmetics
df.select { 2021 - age }
// two columns
df.select { name and age }
// range of columns
df.select { name..age }
// all columns of ColumnGroup
df.select { name.allCols() }
// traversal of columns at any depth from here excluding ColumnGroups
df.select { name.colsAtAnyDepth().filter { !it.isColumnGroup() } }
```
</tab>
<tab title="Strings">
```kotlin
// by column name
df.select { it["name"] }
// by column path
df.select { it["name"]["firstName"] }
df.select { "name"["firstName"] }
// with a new name
df.select { "name" named "Full Name" }
// converted
df.select { "name"["firstName"]<String>().map { it.uppercase() } }
// column arithmetics
df.select { 2021 - "age"<Int>() }
// two columns
df.select { "name" and "age" }
// by range of names
df.select { "name".."age" }
// all columns of ColumnGroup
df.select { "name".allCols() }
// traversal of columns at any depth from here excluding ColumnGroups
df.select { "name".colsAtAnyDepth().filter { !it.isColumnGroup() } }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectors.html" width="100%"/>
<!---END-->
**Select columns by column index:**
<!---FUN columnsSelectorByIndices-->
```kotlin
// by index
df.select { col(2) }
// by several indices
df.select { cols(0, 1, 3) }
// by range of indices
df.select { cols(1..4) }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.columnsSelectorByIndices.html" width="100%"/>
<!---END-->
**Other column selectors:**
<!---FUN columnSelectorsMisc-->
```kotlin
// by condition
df.select { cols { it.name().startsWith("year") } }
df.select { nameStartsWith("year") }
// by type
df.select { colsOf<String>() }
// by type with condition
df.select { colsOf<String?> { it.countDistinct() > 5 } }
// all top-level columns
df.select { all() }
// first/last n columns
df.select { take(2) }
df.select { takeLast(2) }
// all except first/last n columns
df.select { drop(2) }
df.select { dropLast(2) }
// find the first column satisfying the condition
df.select { first { it.name.startsWith("year") } }
// find the last column inside a column group satisfying the condition
df.select {
    colGroup("name").lastCol { it.name().endsWith("Name") }
}
// traversal of columns at any depth from here excluding ColumnGroups
df.select { colsAtAnyDepth().filter { !it.isColumnGroup() } }
// traversal of columns at any depth from here including ColumnGroups
df.select { colsAtAnyDepth() }
// traversal of columns at any depth with condition
df.select { colsAtAnyDepth().filter { it.name().contains(":") } }
// traversal of columns at any depth to find columns of given type
df.select { colsAtAnyDepth().colsOf<String>() }
// all columns except given column set
df.select { allExcept { colsOf<String>() } }
// union of column sets
df.select { take(2) and col(3) }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectorsMisc.html" width="100%"/>
<!---END-->
**Modify the set of selected columns:**
<!---FUN columnSelectorsModifySet-->
```kotlin
// first/last n value- and frame columns in column set
df.select { colsAtAnyDepth().filter { !it.isColumnGroup() }.take(3) }
df.select { colsAtAnyDepth().filter { !it.isColumnGroup() }.takeLast(3) }
// all except first/last n value- and frame columns in column set
df.select { colsAtAnyDepth().filter { !it.isColumnGroup() }.drop(3) }
df.select { colsAtAnyDepth().filter { !it.isColumnGroup() }.dropLast(3) }
// filter column set by condition
df.select { colsAtAnyDepth().filter { !it.isColumnGroup() && it.name().startsWith("year") } }
// exclude columns from column set
df.select { colsAtAnyDepth().filter { !it.isColumnGroup() }.except { age } }
// keep only unique columns
df.select { (colsOf<Int>() and age).distinct() }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectorsModifySet.html" width="100%"/>
<!---END-->
### Column Resolvers
`ColumnsResolver` is the base type used to resolve columns within the **Columns Selection DSL**,
as well as the return type of columns selection expressions.
All functions described above for selecting columns in various ways return a `ColumnsResolver` of a specific kind:
- **`SingleColumn`** — resolves to a single [`DataColumn`](DataColumn.md).
- **`ColumnAccessor`** — a specialized `SingleColumn` with a defined path and type argument.
It can also be renamed during selection.
- **`ColumnPath`** — a wrapper for a [`DataColumn`](DataColumn.md) path
in a [`DataFrame`](DataFrame.md) that can also serve as a `ColumnAccessor`.
```kotlin
// Select all columns from the group by path "group2"/"info":
df.select { pathOf("group2", "info").allCols() }
// For each selected column, place it under its ancestor group
// from two levels up in the column path hierarchy:
df.group { colsAtAnyDepth().colsOf<String>() }
    .into { it.path.dropLast(2) }
```
- **`ColumnSet`** — resolves to an ordered list of [`DataColumn`s](DataColumn.md).
+170
@@ -0,0 +1,170 @@
# Setup And Overview
<web-summary>
Explore the Kotlin DataFrame Compiler Plugin —
a powerful tool providing on-the-fly type-safe column-accessors for dataframes.
</web-summary>
<card-summary>
Explore the Kotlin DataFrame Compiler Plugin —
a powerful tool providing on-the-fly type-safe column-accessors for dataframes.
</card-summary>
<link-summary>
Explore the Kotlin DataFrame Compiler Plugin —
a powerful tool providing on-the-fly type-safe column-accessors for dataframes.
</link-summary>
> Now available in Gradle (IDEA 2025.2+) and Maven (IDEA 2025.3+) projects, and coming soon to Kotlin Notebook.
**Kotlin DataFrame Compiler Plugin** is a Kotlin compiler plugin that automatically generates
**[type-safe extension properties](extensionPropertiesApi.md)** for your dataframes,
allowing you to access columns and row values in a type-safe way and avoid mistakes in column names.
## Why use it?
- Access columns as regular properties: `df.name` instead of `df["name"]`.
- Get full IDE and compiler support: autocompletion, refactoring, and type checking.
- Improve code readability and safety when working with DataFrame.
Check out this video that shows how expressions update the schema of a dataframe:
<video src="compiler_plugin.mp4" controls=""/>
## Setup
We recommend using up-to-date versions of IntelliJ IDEA and Kotlin for the best experience. The plugin requires at least IntelliJ IDEA 2025.2 and Kotlin 2.2.20.
<tabs>
<tab title="Gradle">
Set up the plugins in `build.gradle.kts`:
```kotlin
kotlin("jvm") version "%compilerPluginKotlinVersion%"
```
```kotlin
kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
```
Add the library dependency:
```kotlin
implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
```
Due to a [known issue](https://youtrack.jetbrains.com/issue/KT-66735), incremental compilation must be disabled for now.
Add the following line to your `gradle.properties` file:
```properties
kotlin.incremental=false
```
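Put together, the Gradle pieces above might be combined as in the following sketch of a `build.gradle.kts`. The `repositories` block is an assumption about a typical project layout, and the `%...%` placeholders stand for the versions you actually use.

```kotlin
plugins {
    // Kotlin JVM plugin and the DataFrame compiler plugin share the Kotlin version.
    kotlin("jvm") version "%compilerPluginKotlinVersion%"
    kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
}

repositories {
    // Assumption: dependencies are resolved from Maven Central.
    mavenCentral()
}

dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}
```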
Sync the project.
</tab>
<tab title="Maven">
The DataFrame compiler plugin can be used in Maven projects starting from IntelliJ IDEA 2025.3, available now as EAP builds.
Set up the plugin in `pom.xml`:
```xml
<plugin>
    <artifactId>kotlin-maven-plugin</artifactId>
    <groupId>org.jetbrains.kotlin</groupId>
    <version>%compilerPluginKotlinVersion%</version>
    <configuration>
        <compilerPlugins>
            <plugin>kotlin-dataframe</plugin>
        </compilerPlugins>
    </configuration>
    <dependencies>
        <dependency>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-dataframe</artifactId>
            <version>%compilerPluginKotlinVersion%</version>
        </dependency>
    </dependencies>
</plugin>
```
Add the library dependency:
```xml
<dependency>
    <groupId>org.jetbrains.kotlinx</groupId>
    <artifactId>dataframe</artifactId>
    <version>%dataFrameVersion%</version>
</dependency>
```
Sync the project.
</tab>
</tabs>
## Features overview
### Static interpretation of DataFrame API
The plugin evaluates dataframe operations when their arguments are known at compile time, such as constant `String` values, resolved types, and property access calls.
It updates the return type of the function call to provide properties that match column names and types.
The goal is to reflect the result of the operations you apply to a dataframe in its type, giving you a convenient typed API:
```kotlin
val weatherData = dataFrameOf(
    "time" to columnOf(0, 1, 2, 4, 5, 7, 8, 9),
    "temperature" to columnOf(12.0, 14.2, 15.1, 15.9, 17.9, 15.6, 14.2, 24.3),
    "humidity" to columnOf(0.5, 0.32, 0.11, 0.89, 0.68, 0.57, 0.56, 0.5)
)
weatherData.filter { temperature > 15.0 }.print()
```
The schema of DataFrame, as the compiler plugin sees it,
is displayed when you hover on an expression or variable:
![image.png](schema_info.png)
### @DataSchema declarations
An untyped `DataFrame` can be assigned a data schema: a top-level interface or class that describes the names and types of the columns in the dataframe.
```kotlin
@DataSchema
data class Repositories(
    @ColumnName("full_name")
    val fullName: String,
    @ColumnName("html_url")
    val htmlUrl: java.net.URL,
    @ColumnName("stargazers_count")
    val stargazersCount: Int,
    val topics: String,
    val watchers: Int
)
fun main() {
    val df = DataFrame
        .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
        .convertTo<Repositories>()
    df.filter { stargazersCount > 50 }.print()
}
```
[Learn more](dataSchema.md) about data schema declarations.
## Examples
* [Kotlin DataFrame in the IntelliJ IDEA Gradle project example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-gradle-example)
— an IntelliJ IDEA Gradle project showcasing simple DataFrame expressions using the Compiler Plugin.
* [Kotlin DataFrame in the IntelliJ IDEA Maven project example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-maven-example)
— an IntelliJ IDEA Maven project showcasing simple DataFrame expressions using the Compiler Plugin.
* [](compilerPluginExamples.md): a few examples of Compiler Plugin usage.
+220
@@ -0,0 +1,220 @@
# Frequently Asked Questions
Here's a list of frequently asked questions about Kotlin DataFrame.
If you haven't found an answer to yours, feel free to ask it on:
- [GitHub Issues](https://github.com/Kotlin/dataframe/issues)
- [#datascience](https://slack-chats.kotlinlang.org/c/datascience) channel in Kotlin Slack
([request an invite](https://surveys.jetbrains.com/s3/kotlin-slack-sign-up?_gl=1*1ssyqy3*_gcl_au*MTk5NzUwODYzOS4xNzQ2NzkxMDMz*FPAU*MTk5NzUwODYzOS4xNzQ2NzkxMDMz*_ga*MTE0ODQ1MzY3OS4xNzM4OTY1NzM3*_ga_9J976DJZ68*czE3NTE1NDUxODUkbzIyNyRnMCR0MTc1MTU0NTE4NSRqNjAkbDAkaDA.)).
## What is Kotlin DataFrame?
**Kotlin DataFrame** is an official open-source Kotlin framework written in pure
Kotlin for working with tabular data.
Its goal is to reconcile Kotlin's static typing with the dynamic nature of data,
providing a flexible and convenient idiomatic DSL for working with data in Kotlin.
## Is Kotlin DataFrame a Multiplatform Library?
Not yet — Kotlin DataFrame currently supports only the **JVM** target.
We're actively exploring multiplatform support.
To stay updated on progress, subscribe to the
[corresponding issue](https://github.com/Kotlin/dataframe/issues/24).
### Does Kotlin DataFrame work on Android?
Yes — Kotlin DataFrame can be used in Android projects.
There is no dedicated Android artifact yet, but you can include the standard **JVM artifact**
by setting up a [custom Gradle configuration](SetupAndroid.md).
## How to start with Kotlin DataFrame?
If you're new to Kotlin DataFrame, the [Quickstart guide](quickstart.md) is the perfect place to begin —
it gives a brief yet comprehensive introduction to the basics of working with DataFrame.
You can also check out [other guides and examples](Guides-And-Examples.md)
to explore various use cases and deepen your understanding of Kotlin DataFrame.
## What is the best environment to use Kotlin DataFrame?
Kotlin DataFrame is most effective in an interactive environment.
- **[Kotlin Notebook](SetupKotlinNotebook.md)** is ideal for exploring Kotlin DataFrame.
Everything works out of the box — interactivity, rich rendering of DataFrames and plots.
You can instantly see the results of each operation, view the contents of your DataFrames after every transformation,
inspect individual rows and columns, and explore data step-by-step in a live and interactive way.
See the [](quickstart.md) to get started quickly.
- **[Kotlin DataFrame Compiler Plugin for IDEA projects](Compiler-Plugin.md)** enhances your usual
[IntelliJ IDEA](https://www.jetbrains.com/idea/) Kotlin projects by enabling compile-time
[extension properties](extensionPropertiesApi.md) generation.
This allows you to work with DataFrames in a name- and type-safe manner,
integrating seamlessly with the IDE.
## Is `DataFrame` mutable?
No, [`DataFrame`](DataFrame.md) is a completely immutable structure.
Kotlin DataFrame follows the functional style of Kotlin —
each [operation](operations.md) that modifies the data returns a new, updated `DataFrame` instance.
This means the original data is never changed in place, which improves code safety.
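
The following is a minimal sketch of what immutability means in practice, assuming the `dataframe` artifact is on the classpath. The sample data and the `ageNextYear` column name are invented for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Sample data invented for this illustration.
    val original = dataFrameOf("name", "age")(
        "Alice", 15,
        "Bob", 20,
    )

    // `add` returns a new DataFrame; `original` is left untouched.
    val extended = original.add("ageNextYear") { "age"<Int>() + 1 }

    println(original.columnNames()) // [name, age]
    println(extended.columnNames()) // [name, age, ageNextYear]
}
```

Because every operation returns a new instance, intermediate results can be kept and reused safely.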
## How do I interoperate with collections like `List` or `Map`?
[`DataFrame`](DataFrame.md) integrates seamlessly with Kotlin collections.
You can:
- Create a `DataFrame` from a `Map` using [`toDataFrame()`](createDataFrame.md#todataframe).
- Convert a `DataFrame` back to a `Map` using [`toMap()`](toMap.md).
- Create a [`DataColumn`](DataColumn.md) from a `List` using [`toColumn()`](createColumn.md#tocolumn).
- Convert a `DataColumn` to a `List` of values.
- Convert a `DataFrame<T>` into a `List<T>` of data class instances corresponding to each row
using [`toList()`](toList.md).
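
The conversions listed above can be sketched as follows. This is an illustrative example, not a complete reference: `Person` is a hypothetical data class, and `toListOf<Person>()` is used here because the frame built from a `Map` is untyped.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical row type used only for this illustration.
data class Person(val name: String, val age: Int)

fun main() {
    // Map of column name to values => DataFrame
    val df = mapOf(
        "name" to listOf("Alice", "Bob"),
        "age" to listOf(15, 20),
    ).toDataFrame()

    // DataFrame => Map of column name to values
    val asMap: Map<String, List<Any?>> = df.toMap()

    // List => DataColumn, and back to a List of values
    val scores = listOf(4.5, 3.9).toColumn("score")
    val scoreValues: List<Double> = scores.toList()

    // DataFrame rows => data class instances
    val people: List<Person> = df.toListOf<Person>()

    println(asMap)
    println(scoreValues)
    println(people)
}
```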
## Are there any limitations on the types used in a DataFrame?
No! You can store values of **any Kotlin or Java types** inside a [`DataFrame`](DataFrame.md)
and work with them in a type-safe manner using [extension properties](extensionPropertiesApi.md)
across various [operations](operations.md).
For some commonly used types — such as
[Kotlin basic types](https://kotlinlang.org/docs/basic-types.html) and
[Kotlin date-time types](https://github.com/Kotlin/kotlinx-datetime) —
there is built-in support for automatic conversion and parsing.
## What data sources are supported?
<!------TODO data sources---->
Kotlin DataFrame supports all popular data sources — CSV, JSON, Excel, Apache Arrow, SQL databases, and more!
See the [Data Sources section](Data-Sources.md) for a complete list of supported formats
and instructions on how to integrate them into your workflow.
Some sources — such as Apache Spark, [Exposed](https://www.jetbrains.com/help/exposed/home.html),
and [Multik](https://github.com/Kotlin/multik) — are not supported directly (yet),
but you can find [official integration examples here](Integrations.md).
If the data source you need isn't supported yet,
feel free to open an [issue](https://github.com/Kotlin/dataframe/issues)
and describe your use case. We'd love to hear from you!
## I see magically appearing properties in examples. What are they?
These are [extension properties](extensionPropertiesApi.md) — one of the key features of Kotlin DataFrame.
Extension properties correspond to the columns of a [`DataFrame`](DataFrame.md), allowing you to access and select them
in a **type-safe** and **name-safe** way.
They are generated automatically when working with Kotlin DataFrame in:
- [Kotlin Notebook](SetupKotlinNotebook.md), where extension properties are generated
after each cell execution.
- A Kotlin project in [IntelliJ IDEA](https://www.jetbrains.com/idea/) with the
[compiler plugin](Compiler-Plugin.md) enabled, where the properties are generated at compile time.
## I used the KProperties API in older versions, what should I use now that it's deprecated?
The KProperty API was a useful access mechanism in earlier versions.
However, with the introduction of [extension properties](extensionPropertiesApi.md)
and the [Kotlin DataFrame compiler plugin](Compiler-Plugin.md),
you now have a more flexible and powerful alternative.
Annotate your Kotlin class with [`@DataSchema`](Compiler-Plugin.md#dataschema-declarations),
and the plugin will automatically generate type-safe extension properties
for your [`DataFrame`](DataFrame.md).
Alternatively, call [`toDataFrame()`](createDataFrame.md#todataframe) on a list of Kotlin or Java objects,
and the resulting `DataFrame` will have a schema derived from their properties or getters.
See [compiler plugin examples](Compiler-Plugin.md#examples).
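
As a rough sketch of both options, assuming a hypothetical `Person` class: the extension-property form in the comment compiles only with the compiler plugin enabled, so the runnable line uses the String API instead.

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical schema class; the names are illustrative only.
@DataSchema
data class Person(val name: String, val age: Int)

fun main() {
    // The schema is derived from the properties of `Person`.
    val df = listOf(Person("Alice", 15), Person("Bob", 20)).toDataFrame()

    // With the compiler plugin enabled, columns are available as extension properties:
    //     df.filter { age > 18 }
    // Without the plugin, the String API expresses the same filter:
    val adults = df.filter { "age"<Int>() > 18 }

    println(df.schema())
    println(adults.rowsCount()) // 1
}
```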
## How to visualize data from a DataFrame?
[Kandy](https://kotlin.github.io/kandy) is a Kotlin plotting library
designed to integrate seamlessly with Kotlin DataFrame.
It provides a convenient and idiomatic Kotlin DSL for building charts,
leveraging all Kotlin DataFrame features — including [extension properties](extensionPropertiesApi.md).
See the [Kandy Quick Start Guide](https://kotlin.github.io/kandy/quick-start-guide.html)
and explore the [Examples Gallery](https://kotlin.github.io/kandy/examples.html).
## Can I work with hierarchical/nested data?
Yes, Kotlin DataFrame is designed to work with hierarchical data.
You can read JSON or any other nested format into a [`DataFrame`](DataFrame.md)
with hierarchical structure — using `FrameColumn`
(a column of dataframes) and `ColumnGroup` (a column with nested subcolumns).
Both [dataframe schemas](schemas.md) and [extension properties](extensionPropertiesApi.md)
fully support nested data structures, allowing type-safe access and transformations at any depth.
See [](hierarchical.md) for more information.
Also, you can transform your data into grouped structures using [`groupBy`](groupBy.md) or [`pivot`](pivot.md).
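
A small sketch of the two nested column kinds, using invented sample data: `group ... into` builds a `ColumnGroup`, and converting a `groupBy` result to a `DataFrame` yields a `FrameColumn`.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Sample data invented for this illustration.
    val flat = dataFrameOf("firstName", "lastName", "city", "age")(
        "Alice", "Cooper", "London", 15,
        "Bob", "Dylan", "Dubai", 20,
        "Charlie", "Daniels", "London", 30,
    )

    // ColumnGroup: "name" now contains the subcolumns firstName and lastName.
    val nested = flat.group("firstName", "lastName").into("name")
    println(nested.schema())

    // FrameColumn: each row holds a DataFrame with the rows of one city.
    val byCity = flat.groupBy("city").toDataFrame()
    println(byCity.schema())

    // `flatten` moves nested subcolumns back to the top level.
    println(nested.flatten().columnNames())
}
```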
## Does Kotlin DataFrame support OpenAPI schemas?
Yes — the experimental `dataframe-openapi` module adds support for OpenAPI JSON schemas.
You can use it to parse and work with OpenAPI-defined structures directly in Kotlin DataFrame.
See the [OpenAPI Guide](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/json/KeyValueAndOpenApi.ipynb)
for details and examples.
## Does Kotlin DataFrame support geospatial data?
Yes — the experimental `dataframe-geo` module provides functionality for working with geospatial data,
including support for reading and writing GeoJSON and Shapefile formats, as well as tools for manipulating geometry types.
See the [GeoDataFrame Guide](https://kotlin.github.io/kandy/geo-plotting-guide.html) for details
and examples with beautiful [Kandy](https://kotlin.github.io/kandy) geo visualizations.
## What is the difference between Compiler Plugin, Gradle Plugin, and KSP Plugin?
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated
> in future releases.
>
> The KSP plugin is **not compatible with [KSP2](https://github.com/google/ksp?tab=readme-ov-file#ksp2-is-here)**
> and may **not work properly with Kotlin 2.1 or newer**.
>
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead
> of relying on the plugins.
> See [](Migration-From-Plugins.md).
{style="warning"}
All these plugins relate to working with [dataframe schemas](schemas.md), but they serve different purposes:
- **[Gradle Plugin](Gradle-Plugin.md)** and **[KSP Plugin](https://github.com/Kotlin/dataframe/tree/master/plugins/symbol-processor)**
are used to **generate data schemas** from external sources as part of the Gradle build process.
- **Gradle Plugin**: You declare the data source in your `build.gradle.kts` file
using the `dataframes { ... }` block.
- **KSP Plugin**: You annotate your Kotlin file with `@ImportDataSchema` file annotation,
and the schema will be generated via Kotlin Symbol Processing.
See [Data Schemas in Gradle Projects](https://kotlin.github.io/dataframe/schemasgradle.html) for more.
- **[Compiler Plugin](Compiler-Plugin.md)** provides **on-the-fly generation** of
[extension properties](extensionPropertiesApi.md)
based on an existing schema **during compilation**, and updates the [`DataFrame`](DataFrame.md)
schema seamlessly after operations.
However, when reading data from files or external sources (like SQL),
the initial `DataFrame` schema cannot be inferred automatically —
you need to specify it manually or generate it using the [`generate..()` methods](DataSchemaGenerationMethods.md).
## How do I contribute or report an issue?
We're always happy to receive contributions!
If you'd like to contribute, please refer to our
[contributing guidelines](https://github.com/Kotlin/dataframe/blob/master/CONTRIBUTING.md).
To report bugs or suggest improvements, open an issue on the
[DataFrame GitHub repository](https://github.com/Kotlin/dataframe/issues).
You're also welcome to ask questions or discuss anything related to Kotlin DataFrame in the
[#datascience](https://slack-chats.kotlinlang.org/c/datascience) channel on Kotlin Slack.
If you're not yet a member, you can
[request an invitation](https://surveys.jetbrains.com/s3/kotlin-slack-sign-up?_gl=1*1ssyqy3*_gcl_au*MTk5NzUwODYzOS4xNzQ2NzkxMDMz*FPAU*MTk5NzUwODYzOS4xNzQ2NzkxMDMz*_ga*MTE0ODQ1MzY3OS4xNzM4OTY1NzM3*_ga_9J976DJZ68*czE3NTE1NDUxODUkbzIyNyRnMCR0MTc1MTU0NTE4NSRqNjAkbDAkaDA.).
+39
View File
@@ -0,0 +1,39 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
SYSTEM "https://resources.jetbrains.com/writerside/1.0/xhtml-entities.dtd">
<topic xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="https://resources.jetbrains.com/writerside/1.0/topic.v2.xsd"
title="Home" id="Home">
<section-starting-page>
<title>Kotlin DataFrame</title>
<description>
Kotlin DataFrame is an open-source library for Kotlin that provides a powerful
and typesafe DSL for structured in-memory data processing.
</description>
<spotlight>
<a href="quickstart.md" type="start"/>
<a href="Guides-And-Examples.md" type="library"/>
</spotlight>
<primary>
<title>First steps</title>
<a href="SetupKotlinNotebook.md"/>
<a href="concepts.md"/>
<a href="operations.md"/>
<a href="read.md">Reading from files: CSV, JSON, ApacheArrow</a>
</primary>
<secondary>
<title>Featured topics</title>
<a href="Kotlin-DataFrame-Features-in-Kotlin-Notebook.md"/>
<a href="Compiler-Plugin.md"/>
<a href="Data-Sources.md"/>
<a href="readSqlDatabases.md"/>
</secondary>
</section-starting-page>
</topic>
+187
View File
@@ -0,0 +1,187 @@
# Migration to 1.0
## Deprecations and removals
As we move toward version 1.0, many functions have been changed, deprecated, or removed.
This section provides a complete overview of all API changes to help you migrate to 1.0.
### Renamed functions and classes to the correct CamelCase spelling { id="camelCase" }
All functions and classes in Kotlin DataFrame
have been renamed to
[the correct CamelCase spelling](https://developer.android.com/kotlin/style-guide#camel_case).
See below for a complete list of the renamed functions and classes.
### Migration to Deephaven CSV
All CSV (as well as TSV) IO was migrated to a new, fast, and efficient
[Deephaven CSV](https://github.com/deephaven/deephaven-csv) implementation.
It significantly improves CSV IO performance and brings many new parametrization options.
All related methods are now located in the separate [`dataframe-csv`](Modules.md#dataframe-csv) module
(which is included by default in the general [`dataframe`](Modules.md#dataframe-general) artifact
and in `%use dataframe` in [Kotlin Notebook](SetupKotlinNotebook.md)).
Functions were also renamed to [the correct CamelCase spelling](#camelCase).
All new functions keep the same arguments as before and additionally introduce new ones.
Also, [there are new arguments that expose Deephaven CSV features](read.md#unlocking-deephaven-csv-features).
See [](read.md#read-from-csv).
> All outdated CSV IO functions raise `WARNING` in 1.0 and will raise `ERROR` in 1.1.
| 0.15 | 1.0 |
|-------------------------------------------------|-------------------------------------------------|
| `CSV`/`TSV` | `CsvDeephaven`/`TsvDeephaven` |
| `DataFrame.readCSV(..)`/`DataFrame.readTSV(..)` | `DataFrame.readCsv(..)`/`DataFrame.readTsv(..)` |
| `DataFrame.read(delimiter=.., ..)`              | `DataFrame.readCsv(delimiter=.., ..)`           |
| `df.writeCSV(..)`/`df.writeTSV(..)` | `df.writeCsv(..)`/`df.writeTsv(..)` |
| `df.toCSV(..)` | `df.toCsvStr(..)` |
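
For illustration, here is a short sketch of the renamed functions, assuming the `dataframe-csv` module is available. The CSV text is invented sample data.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readCsvStr
import org.jetbrains.kotlinx.dataframe.io.toCsvStr

fun main() {
    // Sample semicolon-separated text invented for this illustration.
    val text = "name;age\nAlice;15\nBob;20"

    // The CamelCase `readCsvStr` reads CSV from a String.
    val df = DataFrame.readCsvStr(text, delimiter = ';')

    // 0.15: df.toCSV()  ->  1.0: df.toCsvStr()
    println(df.toCsvStr())
}
```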
### Migration to Standard Library `Instant`
Since Kotlin 2.1.20,
[`Instant` is now part of the standard library](https://kotlinlang.org/docs/whatsnew2120.html#new-time-tracking-functionality)
(as `kotlin.time.Instant`).
You can still use the old (deprecated) `kotlinx.datetime.Instant` type, but its support will be removed in Kotlin DataFrame 1.1.
> New `Instant` in the Kotlin Standard Library becomes stable in 2.3.0.
> In earlier versions, all related operations should be marked with the `@OptIn(ExperimentalTime::class)` annotation.
{style="note"}
For now, each `Instant`-related operation has been split into two new ones —
one for the new stdlib `kotlin.time.Instant` and one for the old deprecated `kotlinx.datetime.Instant`.
The behavior of old operations remains unchanged: they work with `kotlinx.datetime.Instant` and raise `ERROR` in 1.0.
In version 1.1, they will be reintroduced and will operate on the new stdlib `kotlin.time.Instant`.
<table>
<tr>
<th>0.15</th>
<th>1.0</th>
<th>Note</th>
</tr>
<tr>
<td rowspan="2"><code>col.convertToInstant()</code></td>
<td><code>col.convertToDeprecatedInstant()</code></td>
<td><code>WARNING</code> in 1.0, <code>ERROR</code> in 1.1</td>
</tr>
<tr>
<td><code>col.convertToStdlibInstant()</code></td>
<td>Will be renamed back into <code>convertToInstant()</code> in 1.1</td>
</tr>
<tr>
<td rowspan="2"><code>df.convert { columns }.toInstant()</code></td>
<td><code>df.convert { columns }.convertToDeprecatedInstant()</code></td>
<td><code>WARNING</code> in 1.0, <code>ERROR</code> in 1.1</td>
</tr>
<tr>
<td><code>df.convert { columns }.convertToStdlibInstant()</code></td>
<td>Will be renamed back into <code>convertToInstant()</code> in 1.1</td>
</tr>
<tr>
<td rowspan="2"><code>ColType.Instant</code></td>
<td><code>ColType.DeprecatedInstant</code></td>
<td><code>WARNING</code> in 1.0, <code>ERROR</code> in 1.1</td>
</tr>
<tr>
<td><code>ColType.StdlibInstant</code></td>
<td>Will be renamed back into <code>Instant</code> in 1.1</td>
</tr>
</table>
In version 1.0-Beta5 and later, all parsing operations convert `Instant`
values into the new standard library `kotlin.time.Instant` type by default.
To enable parsing into the deprecated `kotlinx.datetime.Instant`,
set the corresponding parsing option **`ParserOptions.parseExperimentalInstant = false`**
(before 1.0-Beta5, this option was `false`, from 1.0-Beta5 onwards it is `true` by default).
For example:
```kotlin
DataFrame.readCsv(
...,
parserOptions = ParserOptions(parseExperimentalInstant = false)
)
```
### Deprecation of `cols()` and other methods in Columns Selection DSL
`cols()` overloads without arguments, which select all columns of a DataFrame or
all subcolumns inside a column group in the [Columns Selection DSL](ColumnSelectors.md),
are deprecated in favor of `all()` and `allCols()` respectively.
These replacements allow the [Compiler Plugin](Compiler-Plugin.md) to fully support such selections.
`colsAtAnyDepth()`, `colsInGroups()`, and `single()` overloads with a `predicate` argument
that filters columns are also deprecated for better Compiler Plugin support.
Use `.filter(predicate)` for filtering instead.
| 0.15 | 1.0 |
|--------------------------------------------------|------------------------------------------------------------------|
| `df.select { cols() }` | `df.select { all() }` |
| `df.select { colGroup.cols() }` | `df.select { colGroup.allCols() }` |
| `df.select { colsAtAnyDepth { predicate } }` | `df.select { colsAtAnyDepth().filter { predicate } }` |
| `df.select { colsInGroups { predicate } }` | `df.select { colsInGroups().filter { predicate } }` |
| `df.select { single { predicate } }`             | `df.select { all().filter { predicate }.single() }`              |
| `df.select { colGroup.singleCol { predicate } }` | `df.select { colGroup.allCols().filter { predicate }.single() }` |
| `df.select { colSet.single { predicate } }` | `df.select { colSet.filter { predicate }.single() }` |
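
The replacements in the table can be sketched as follows. The sample data and the predicate are invented for illustration, and the old 0.15 forms are shown only in comments.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Sample data invented for this illustration; "name" is a column group.
    val df = dataFrameOf("firstName", "lastName", "age")(
        "Alice", "Cooper", 15,
        "Bob", "Dylan", 20,
    ).group("firstName", "lastName").into("name")

    // 0.15: df.select { cols() }
    val everything = df.select { all() }

    // 0.15: df.select { colsAtAnyDepth { it.name().endsWith("Name") } }
    val nameParts = df.select {
        colsAtAnyDepth().filter { it.name().endsWith("Name") }
    }

    println(everything.columnNames())
    println(nameParts.columnNames())
}
```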
### Removed functions and classes
The following functions and classes raise `ERROR` in 1.0 and will be removed in 1.1.
| 0.15 | 1.0 | Reason |
|----------------------------------------------------------------|------------------------------------------------------------------------------|----------------------------------------|
| `DataColumn.createFrameColumn(name, df, startIndices)` | `df.chunked(name, startIndices)` | Replaced with another function. |
| `DataColumn.createWithTypeInference(name, values, nullable)` | `DataColumn.createByInference(name, values, TypeSuggestion.Infer, nullable)` | Replaced with another function. |
| `DataColumn.create(name, values, infer)` | `DataColumn.createByType(name, values, infer)` | Replaced with another function. |
| `col.isComparable()` | `col.valuesAreComparable()` | Renamed to better reflect its purpose. |
| `df.minus { columns }` | `df.remove { columns }` | Replaced with another function. |
| `df.move { columns }.toLeft()`/`df.moveToLeft{ columns }` | `df.move { columns }.toStart()`/`df.moveToStart { columns }` | Renamed to better reflect its purpose. |
| `df.move { columns }.toRight()`/`df.moveToRight{ columns }` | `df.move { columns }.toEnd()`/`df.moveToEnd{ columns }` | Renamed to better reflect its purpose. |
| `row.rowMin()`/`row.rowMinOrNull()` | `row.rowMinOf()`/`row.rowMinOfOrNull()` | Renamed to better reflect its purpose. |
| `row.rowMax()`/`row.rowMaxOrNull()`                            | `row.rowMaxOf()`/`row.rowMaxOfOrNull()`                                      | Renamed to better reflect its purpose. |
| `row.rowPercentile()`/`row.rowPercentileOrNull()` | `row.rowPercentileOf()`/`row.rowPercentileOfOrNull()` | Renamed to better reflect its purpose. |
| `row.rowMedian()`/`row.rowMedianOrNull()` | `row.rowMedianOf()`/`row.rowMedianOfOrNull()` | Renamed to better reflect its purpose. |
| `df.convert { columns }.to { converter }` | `df.convert { columns }.asColumn { converter }` | Renamed to better reflect its purpose. |
| `df.toHTML(..)`/`df.toStandaloneHTML()` | `df.toHtml(..)`/`df.toStandaloneHtml()` | Renamed to the correct CamelCase. |
| `df.writeHTML()` | `df.writeHtml()` | Renamed to the correct CamelCase. |
| `asURL(fileOrUrl)`/`isURL(path)` | `asUrl(fileOrUrl)`/`isUrl(path)` | Renamed to the correct CamelCase. |
| `df.convert { columns }.toURL()`/`df.convertToURL { columns }` | `df.convert { columns }.toUrl()`/`df.convertToUrl { columns }` | Renamed to the correct CamelCase. |
| `df.filterBy(column)` | `df.filter { column }` | Replaced with another function. |
| `FormattingDSL` | `FormattingDsl` | Renamed to the correct CamelCase. |
| `RGBColor` | `RgbColor` | Renamed to the correct CamelCase. |
| `df.insert(column).after(columnPath)` | `df.insert(column).after { columnPath }` | Replaced with another function. |
| `CompareResult.Equals` / `CompareResult.isEqual()` | `CompareResult.Matches` / `CompareResult.matches()` | Renamed to better reflect its purpose. |
| `CompareResult.isSuperOrEqual()` | `CompareResult.isSuperOrMatches()` | Renamed to better reflect its purpose. |
The following functions and classes raise `WARNING` in 1.0 and `ERROR` in 1.1.
| 0.15 | 1.0 | Reason |
|----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| `df.split { columns }.default(..)` / `df.split { columns }.into(..)` / `df.split { columns }.inward(..)` | `df.split { columns }.by(..).default(..)` / `df.split { columns }.by(..).into(..)` / `df.split { columns }.by(..).inward(..)` | Removed a shortcut to clarify the behaviour; Only for `String` columns. |
| `dataFrameOf(header, values)` | `dataFrameOf(header).withValues(values)` | Replaced with another function. |
| `df.generateCode(..)` | `df.generateInterfaces(..)` | Replaced with another function. |
| `df.select { mapToColumn(name, infer) { body } }` | `df.select { expr(name, infer) { body } }` | Removed duplicated functionality. |
| `stringCol.length()` | `stringCol.map { it?.length ?: 0 }` | Removed a shortcut to clarify the behaviour; Only for `String` columns. |
| `stringCol.lowercase()` / `stringCol.uppercase()` | `stringCol.map { it?.lowercase() }` / `stringCol.map { it?.uppercase() }` | Removed a shortcut to clarify the behaviour; Only for `String` columns. |
| `df.add(columns)` / `df.add(dataframes)`                                                                 | `df.addAll(columns)` / `df.addAll(dataframes)`                                                                                | Renamed to improve completion.                                          |
| `row.isEmpty()` / `row.isNotEmpty()`                                                                     | `row.values().all { it == null }` / `row.values().any { it != null }`                                                         | Removed a shortcut to clarify the behaviour.                            |
| `row.getRow(index)` / `row.getRowOrNull(index)` / `row.getRows(indices)`                                 | `row.df().getRow(index)` / `row.df().getRowOrNull(index)` / `row.df().getRows(indices)`                                       | Removed a shortcut to clarify the behaviour.                            |
| `df.copy()`                                                                                              | `df.columns().toDataFrame().cast()`                                                                                           | Removed a shortcut to clarify the behaviour.                            |
| `KeyValueProperty<T>` | `NameValueProperty<T>` | Removed duplicated functionality. |
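
A brief sketch of a few of these replacements, using invented sample data. The old 0.15 forms appear only in comments.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // 0.15: dataFrameOf(header, values)
    val df = dataFrameOf(listOf("name", "city"))
        .withValues(listOf("Alice", "London", "Bob", null))

    // 0.15: stringCol.lowercase()
    val lowerNames = df["name"].cast<String>().map { it.lowercase() }
    println(lowerNames.toList())

    // 0.15: row.isNotEmpty()
    val row = df[1]
    println(row.values().any { it != null })
}
```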
<!--TODO (https://github.com/Kotlin/dataframe/issues/1630)
## Modules
## Compiler Plugin
## Changes in working with JDBC
-->
+14
View File
@@ -0,0 +1,14 @@
# Support
* <img src="github-mark.svg" alt="GitHub logo" height="24"/>
[**Issue Tracker**](https://github.com/Kotlin/dataframe/issues)
If you find a bug or have an idea for a new feature,
file an issue in our [DataFrame GitHub repository](https://github.com/Kotlin/dataframe).
* <img src="https://kotlinlang.org/docs/images/slack.svg" alt="Slack logo" height="24"/>
[**Community**](https://github.com/Kotlin/dataframe/issues)
Peer-to-peer support is available on the Kotlin Slack
[#datascience](https://kotlinlang.slack.com/archives/C4W52CFEZ) channel
([request an invite](https://surveys.jetbrains.com/s3/kotlin-slack-sign-up?_gl=1*1ssyqy3*_gcl_au*MTk5NzUwODYzOS4xNzQ2NzkxMDMz*FPAU*MTk5NzUwODYzOS4xNzQ2NzkxMDMz*_ga*MTE0ODQ1MzY3OS4xNzM4OTY1NzM3*_ga_9J976DJZ68*czE3NTE1NDUxODUkbzIyNyRnMCR0MTc1MTU0NTE4NSRqNjAkbDAkaDA.)).
+12
View File
@@ -0,0 +1,12 @@
# Troubleshooting
## Freezing when working with large page sizes
Some operations in the UI may lag or freeze noticeably when working with large page sizes. Examples of these operations include sorting using the column header or navigating to the next page. This occurs because the UI attempts to render all the data at once, leading to excessive allocations and garbage collection (GC) pressure.
To mitigate this issue, try the following:
- Reduce the page size to a smaller value.
- Increase the JVM heap size in the Kotlin Notebook Plugin settings.
- Tune the JVM GC settings of IntelliJ IDEA.
- For example, adjust the `G1ReservePercent` parameter.
- Further tuning is possible. For more detailed guidance, refer to the [Java Garbage-First Garbage Collector Tuning manual](https://docs.oracle.com/en/java/javase/17/gctuning/garbage-first-garbage-collector-tuning.html#GUID-90E30ACA-8040-432E-B3A0-1E0440AB556A).
+209
View File
@@ -0,0 +1,209 @@
<resource src=".DS_Store"></resource>
<resource src="example.csv"></resource>
<resource src="movies.csv"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotAsDataRowOrFrame.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.mergeDefault.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.compareInnerColumns.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.dropNulls.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.statisticPivotManySeparate.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.compareLeft.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.concatGroupBy.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.DataRowApi.conditions.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.add.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.flattenAll.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByExpr.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.updateAsFrame.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotInward.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.columnsSelectorByIndices.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Join.joinSpecial.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.duplicatedColumns.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.dropWhere.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.select.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertToValueClass.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.mergeDifferentWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.fillNaNs.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.mergeSameWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivot2.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.reorderInGroup.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.getSeveralRowsByRanges.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectorsModifySet.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.move.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.merge.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.countAggregation.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.sortBy.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Join.join.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotAggregate1.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.parseAll.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.meanAggregationsSkipNA.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotGroupBy.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.split.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.filterJoinWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.statisticPivotMany.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotGroupByOther.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByAggregateWithoutInto.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.head.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitIntoRows.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.group.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.statisticGroupByMany.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.columnAccessorMap.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.rightJoinWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectors.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertAsFrame.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.gatherNames.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.takeWhile.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.excludeJoinWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.mergeIntoList.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotCounts.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotCommonAggregations.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotInAggregate.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.dropLast.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.sortByDesc.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotAggregate.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.valueCounts.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.dropNA.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.convert.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.parseWithOptions.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.getColumnsByName.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.insertColumn.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.implode.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.compareInnerValues.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.distinctColumns.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.split1.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertTo.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.dropWhile.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.fillNulls.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.createDataFrameFromMap.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.schemaGroupBy.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.renameExpression.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.statisticGroupBySingleNamed.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.dropNaNs.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.gather.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.fullJoinWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.joinWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectorsUsages.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.getRowByCondition.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Join.joinWithMatch.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.remove.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.describeColumns.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.statisticGroupBySingle.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.takeLast.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.toDataFrameColumn.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.toDataFrameLists.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.updateWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.addRecurrent.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByWithoutAggregation.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Join.joinDefault.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupBy.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.addDataFrames.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.leftJoinWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.take.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.drop.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.reorder.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.shuffle.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.parseSome.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByMoveToTopFalse.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.addMany.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitRegex.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitRegex1.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.distinctBy.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.DataRowApi.expressions.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.updatePerRowCol.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.mapMany.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByAggregations.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivot.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.getRowByIndex.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.insert.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.updateWithConst.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByMoveToTop.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.ungroup.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.flatten.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.filter.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.xs.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.distinct.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.reverse.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertToEnum.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByToFrame.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitInplace.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.byRow.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertAsColumn.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.columnSelectorsMisc.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.replace.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.rename.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotDefault1.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotDefault.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.sortWith.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.update.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.crossProduct.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.describe.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.compareRight.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.addExisting.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.columnsFor.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotMatches.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Analyze.statisticPivotSingle.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Access.getSeveralRowsByIndices.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Modify.fillNA.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.api.ColumnsSelectionDsl.DslGrammar.ColumnGroupPartOfGrammar.ForHtml.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.api.FormatDocs.Grammar.ForHtml.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.api.ColumnsSelectionDsl.DslGrammar.ColumnSetPartOfGrammar.ForHtml.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.documentation.UnifyingNumbers.Graph.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.api.ColumnsSelectionDsl.DslGrammar.PlainDslPartOfGrammar.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.api.ColumnsSelectionDsl.DslGrammar.DefinitionsPartOfGrammar.html"></resource>
<resource src="extensionPropertiesApi1.html"></resource>
<resource src="formatExample_strings.html"></resource>
<resource src="formatExample_properties.html"></resource>
<resource src="formatExampleNumbers.html"></resource>
<resource src="quickstart.ipynb"></resource>
<resource src="notebook_test_quickstart_6.html"></resource>
<resource src="notebook_test_quickstart_11.html"></resource>
<resource src="notebook_test_quickstart_10.html"></resource>
<resource src="notebook_test_quickstart_7.html"></resource>
<resource src="notebook_test_quickstart_13.html"></resource>
<resource src="notebook_test_quickstart_4.html"></resource>
<resource src="notebook_test_quickstart_8.html"></resource>
<resource src="notebook_test_quickstart_5.html"></resource>
<resource src="notebook_test_quickstart_12.html"></resource>
<resource src="notebook_test_quickstart_3.html"></resource>
<resource src="notebook_test_quickstart_14.html"></resource>
<resource src="notebook_test_generate_docs_1.html"></resource>
<resource src="notebook_test_shuffle_2.html"></resource>
<resource src="notebook_test_shuffle_1.html"></resource>
<resource src="notebook_test_join_8.html"></resource>
<resource src="notebook_test_join_10.html"></resource>
<resource src="dfRightImplicit.html"></resource>
<resource src="notebook_test_join_5.html"></resource>
<resource src="notebook_test_join_11.html"></resource>
<resource src="notebook_test_join_20.html"></resource>
<resource src="notebook_test_join_16.html"></resource>
<resource src="notebook_test_join_17.html"></resource>
<resource src="notebook_test_join_3.html"></resource>
<resource src="notebook_test_join_18.html"></resource>
<resource src="notebook_test_join_14.html"></resource>
<resource src="notebook_test_join_15.html"></resource>
<resource src="dfLeftImplicit.html"></resource>
<resource src="notebook_test_join_19.html"></resource>
<resource src="notebook_test_join_12.html"></resource>
<resource src="notebook_test_join_6.html"></resource>
<resource src="notebook_test_join_13.html"></resource>
<resource src="notebook_test_chunked_1.html"></resource>
<resource src="notebook_test_chunked_3.html"></resource>
<resource src="notebook_test_chunked_2.html"></resource>
<resource src="notebook_test_all_3.html"></resource>
<resource src="notebook_test_tail_1.html"></resource>
<resource src="notebook_test_tail_2.html"></resource>
<resource src="notebook_test_tail_3.html"></resource>
<resource src="notebook_test_between_3.html"></resource>
<resource src="notebook_test_between_2.html"></resource>
<resource src="notebook_test_between_1.html"></resource>
<resource src="notebook_test_any_3.html"></resource>
<resource src="notebook_test_associate_1.html"></resource>
<resource src="notebook_test_associateBy_1.html"></resource>
<resource src="formatHeader.html"></resource>
<resource src="notebook_test_rename_3.html"></resource>
<resource src="notebook_test_rename_4.html"></resource>
<resource src="notebook_test_rename_5.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.createNestedRandomDataFrame.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.createDataFrameWithFill.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.createRandomDataFrame.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.readDataFrameFromObject.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.readDataFrameFromDeepObject.html"></resource>
<resource src="org.jetbrains.kotlinx.dataframe.samples.api.Create.readDataFrameFromDeepObjectWithExclude.html"></resource>
+22
View File
@@ -0,0 +1,22 @@
[//]: # (title: Access Data)
<show-structure depth="3"/>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
Get [rows](DataRow.md) or [columns](DataColumn.md):
<!---FUN getRowsColumns-->
```kotlin
df.columns() // List<DataColumn>
df.rows() // Iterable<DataRow>
df.values() // Sequence<Any?>
```
<!---END-->
**Learn how to:**
* [Access data by index](indexing.md)
* [Iterate over data](iterate.md)
* [Get a single row](getRow.md)
* [Get a single column](getColumns.md)
@@ -0,0 +1,22 @@
# asSequence
<web-summary>
Discover the `asSequence` operation in Kotlin DataFrame.
</web-summary>
<card-summary>
Discover the `asSequence` operation in Kotlin DataFrame.
</card-summary>
<link-summary>
Discover the `asSequence` operation in Kotlin DataFrame.
</link-summary>
Returns [`DataRow`s](DataRow.md) of this [`DataFrame`](DataFrame.md) as
[Sequence](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.sequences/-sequence/).
```kotlin
df.asSequence()
```
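As a minimal sketch of why a sequence can be useful, the rows can be filtered and mapped lazily, so only as many rows as needed are visited. The dataframe contents and the helper name `firstAdults` are assumptions made for this illustration and are not part of the documentation samples.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: names of the first `limit` adults, evaluated lazily row by row.
fun firstAdults(limit: Int): List<String> {
    // Illustrative data, not taken from the documentation samples.
    val df = dataFrameOf("name", "age")(
        "Alice", 15,
        "Bob", 45,
        "Charlie", 20,
    )
    return df.asSequence()
        .filter { it["age"] as Int >= 18 } // access values by column name
        .map { it["name"] as String }
        .take(limit)                       // stops iterating once enough rows are found
        .toList()
}

fun main() {
    println(firstAdults(1)) // [Bob]
}
```

Values are read here with string accessors and explicit casts so that the sketch does not depend on generated extension properties.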
+191
View File
@@ -0,0 +1,191 @@
[//]: # (title: add)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Returns a [`DataFrame`](DataFrame.md) that contains all columns from the original [`DataFrame`](DataFrame.md) followed by the newly added columns.
The original [`DataFrame`](DataFrame.md) is not modified.
`add` appends columns to the end of the dataframe by default.
If you want to add a single column to a specific position in the dataframe, use [insert](insert.md).
**Related operations**: [](addRemove.md)
## Create a new column and add it to [`DataFrame`](DataFrame.md)
```text
add(columnName: String) { rowExpression }
rowExpression: DataRow.(DataRow) -> Value
```
<!---FUN add-->
<tabs>
<tab title="Properties">
```kotlin
df.add("year of birth") { 2021 - age }
```
</tab>
<tab title="Strings">
```kotlin
df.add("year of birth") { 2021 - "age"<Int>() }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.add.html" width="100%"/>
<!---END-->
See [row expressions](DataRow.md#row-expressions)
You can use the `newValue()` function to access the value that was already calculated for the preceding row.
It is helpful for recurrent computations:
<!---FUN addRecurrent-->
```kotlin
df.add("fibonacci") {
if (index() < 2) 1
else prev()!!.newValue<Int>() + prev()!!.prev()!!.newValue<Int>()
}
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.addRecurrent.html" width="100%"/>
<!---END-->
## Create and add several columns to [`DataFrame`](DataFrame.md)
```text
add {
columnMapping
columnMapping
...
}
columnMapping = column into columnName
| columnName from column
| columnName from { rowExpression }
| columnGroupName {
columnMapping
columnMapping
...
}
```
<!---FUN addMany-->
<tabs>
<tab title="Properties">
```kotlin
df.add {
"year of birth" from { 2021 - age }
expr { age > 18 } into "is adult"
"details" {
name.lastName.map { it.length } into "last name length"
"full name" from { name.firstName + " " + name.lastName }
}
}
```
</tab>
<tab title="Strings">
```kotlin
df.add {
"year of birth" from { 2021 - "age"<Int>() }
expr { "age"<Int>() > 18 } into "is adult"
"details" {
"name"["lastName"]<String>().map { it.length } into "last name length"
"full name" from { "name"["firstName"]<String>() + " " + "name"["lastName"]<String>() }
}
}
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.addMany.html" width="100%"/>
<!---END-->
### Create columns using intermediate result
Consider this API:
<!---FUN addCalculatedApi-->
```kotlin
class CityInfo(val city: String?, val population: Int, val location: String)
fun queryCityInfo(city: String?): CityInfo = CityInfo(city, city?.length ?: 0, "35.5 32.2")
```
<!---END-->
Use the following approach to add multiple columns by calling the given API only once per row:
<!---FUN addCalculated-->
<tabs>
<tab title="Properties">
```kotlin
val personWithCityInfo = df.add {
val cityInfo = city.map { queryCityInfo(it) }
"cityInfo" {
cityInfo.map { it.location } into "location"
cityInfo.map { it.population } into "population"
}
}
```
</tab>
<tab title="Strings">
```kotlin
val personWithCityInfo = df.add {
val cityInfo = "city"<String?>().map { queryCityInfo(it) }
"cityInfo" {
cityInfo.map { it.location } into "location"
cityInfo.map { it.population } into "population"
}
}
```
</tab></tabs>
<!---END-->
## Add existing column to [`DataFrame`](DataFrame.md)
<!---FUN addExisting-->
```kotlin
val score by columnOf(4, 3, 5, 2, 1, 3, 5)
df.addAll(score)
df + score
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.addExisting.html" width="100%"/>
<!---END-->
## Add all columns from another [`DataFrame`](DataFrame.md)
<!---FUN addDataFrames-->
```kotlin
df.addAll(df1, df2)
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.addDataFrames.html" width="100%"/>
<!---END-->
## addId
Adds a column with sequential values 0, 1, 2,...
The new column will be added at the beginning of the column list
and will become the first column in [`DataFrame`](DataFrame.md).
```
addId(name: String = "id")
```
**Parameters:**
* `name: String = "id"` - name of the new column.
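A small sketch of the behaviour described above, assuming a hypothetical two-column dataframe and the hypothetical helper names below: the generated column is placed first and holds the row indices.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Illustrative data, not taken from the documentation samples.
fun sampleFrame() = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
)

// Column order after adding an id column with a custom name.
fun columnOrderAfterAddId(): List<String> = sampleFrame().addId("rowId").columnNames()

// Values of the id column when the default name "id" is used.
fun defaultIdValues(): List<Any?> = sampleFrame().addId()["id"].toList()

fun main() {
    println(columnOrderAfterAddId()) // [rowId, name, age]
    println(defaultIdValues())       // [0, 1]
}
```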
+16
View File
@@ -0,0 +1,16 @@
[//]: # (title: add)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Returns [`DataFrame`](DataFrame.md) with union of columns from several given [`DataFrame`](DataFrame.md) objects.
<!---FUN addDataFrames-->
```kotlin
df.addAll(df1, df2)
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.addDataFrames.html" width="100%"/>
<!---END-->
See [all use cases of 'add' operation](add.md).
+5
View File
@@ -0,0 +1,5 @@
[//]: # (title: Add / map / remove columns)
* [`add`](add.md) columns to [`DataFrame`](DataFrame.md)
* [`map`](map.md) columns to new [`DataFrame`](DataFrame.md), [`DataColumn`](DataColumn.md) or `List`
* [`remove`](remove.md) columns from [`DataFrame`](DataFrame.md)
+16
View File
@@ -0,0 +1,16 @@
[//]: # (title: Adjust schema)
The [`DataFrame`](DataFrame.md) interface has a type argument `T` that doesn't affect the contents of the [`DataFrame`](DataFrame.md),
but marks the [`DataFrame`](DataFrame.md) with a type that represents the data schema that this [`DataFrame`](DataFrame.md) is supposed to have.
This argument is used to generate [extension properties](extensionPropertiesApi.md) for typed data access.
Another place where this argument has a special role is in [interop with data classes](collectionsInterop.md#interop-with-data-classes):
* `List<T>` -> `DataFrame<T>`: [toDataFrame](createDataFrame.md#todataframe)
* `DataFrame<T>` -> `List<T>`: [toList](toList.md)
The actual data in a [`DataFrame`](DataFrame.md) may diverge from the compile-time schema marker `T` because the contents of a [`DataFrame`](DataFrame.md) are dynamic.
However, at some points in your code you may know exactly which [`DataFrame`](DataFrame.md) schema is expected.
To align the type argument with the expected runtime contents of the [`DataFrame`](DataFrame.md), use one of two functions:
* [`cast`](cast.md) — change type argument of [`DataFrame`](DataFrame.md) to the expected schema without changing data in [`DataFrame`](DataFrame.md).
* [`convertTo`](convertTo.md) — convert [`DataFrame`](DataFrame.md) contents to match the expected schema.
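The difference between the two functions can be sketched as follows. This is an illustration under stated assumptions: the `Person` schema, the sample data, and the helper names are hypothetical, and the `age` column is deliberately stored as `String` so that it diverges from the schema.

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical schema used only for this sketch.
@DataSchema
interface Person {
    val name: String
    val age: Int
}

// Illustrative data: `age` holds String values, which does not match Person.
fun rawPeople() = dataFrameOf("name", "age")(
    "Alice", "15",
    "Bob", "45",
)

// cast only changes the type argument; the stored values stay Strings.
fun agesAfterCast(): List<Any?> = rawPeople().cast<Person>()["age"].toList()

// convertTo converts the column contents so that they match the schema.
fun agesAfterConvertTo(): List<Any?> = rawPeople().convertTo<Person>()["age"].toList()

fun main() {
    println(agesAfterCast().map { it?.javaClass?.simpleName })
    println(agesAfterConvertTo().map { it?.javaClass?.simpleName })
}
```

After `cast`, typed access to `age` would fail at run time because the data was never checked, which is why `convertTo` is the safer choice when the contents may not match the schema.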
+28
View File
@@ -0,0 +1,28 @@
[//]: # (title: append)
Adds one or several rows to a [`DataFrame`](DataFrame.md).
```kotlin
df.append(
"Mike", 15,
"John", 17,
"Bill", 30,
)
```
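As the call above suggests, the values are passed as one flat argument list and consumed row by row in column order. A self-contained sketch, assuming a hypothetical one-row starting dataframe and a hypothetical helper name:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: appends two rows and returns the resulting `name` column.
fun namesAfterAppend(): List<Any?> {
    // Illustrative data, not taken from the documentation samples.
    val df = dataFrameOf("name", "age")("Alice", 15)
    val extended = df.append(
        "Mike", 15, // first appended row: name, age
        "John", 17, // second appended row: name, age
    )
    return extended["name"].toList()
}

fun main() {
    println(namesAfterAppend()) // [Alice, Mike, John]
}
```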
If the [compiler plugin](Compiler-Plugin.md) is enabled, a typesafe overload of `append` is available for `@DataSchema` classes.
```kotlin
@DataSchema
data class Person(val name: String, val age: Int)
```
```kotlin
val df = dataFrameOf(
Person("Mike", 15),
Person("John", 17),
)
df.append(Person("Bill", 30))
```
**Related operations**: [](appendDuplicate.md)
+4
View File
@@ -0,0 +1,4 @@
[//]: # (title: Append / duplicate rows)
* [`append`](append.md) — append new rows
* [`duplicate`](duplicate.md) — duplicate selected rows
+64
View File
@@ -0,0 +1,64 @@
[//]: # (title: cast)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Changes the type argument of the [`DataFrame`](DataFrame.md) instance without changing its contents.
```kotlin
cast<T>(verify = false)
```
**Related operations**: [](adjustSchema.md)
**Parameters:**
* `verify: Boolean = false`
when `true`, the function throws an exception if the [`DataFrame`](DataFrame.md) instance doesn't match the given schema.
Otherwise, it just changes the formal type without actual data checks.
Use this operation to change the formal type of a [`DataFrame`](DataFrame.md) instance
to match the expected schema and enable generated [extension properties](extensionPropertiesApi.md) for it.
```kotlin
@DataSchema
interface Person {
val age: Int
val name: String
}
df.cast<Person>()
```
To convert [`DataFrame`](DataFrame.md) columns to match a given schema, use the [`convertTo`](convertTo.md) operation.
**Reusing implicitly generated schema**
```kotlin
castTo<T>(df: DataFrame<T>)
```
In notebooks, dataframe types are implicitly generated.
![Implicitly generated schema](implicitlyGeneratedSchema.png)
This type can be referred to, but its name will change whenever you re-execute cells.
Here is how you can do it in a more robust way:
<!---FUN castToGenerateSchema-->
```kotlin
val sample = DataFrame.readJson("sample.json")
```
<!---END-->
<!---FUN castTo-->
```kotlin
for (file in files) {
// df here is expected to have the same structure as sample
val df = DataFrame.readJson(file).castTo(sample)
val count = df.count { perf > 10.0 }
println("$file: $count")
}
```
<!---END-->
+125
View File
@@ -0,0 +1,125 @@
[//]: # (title: Interop with Collections)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Collections-->
_Kotlin DataFrame_ and _Kotlin Collection_ represent two different approaches to data storage:
* [`DataFrame`](DataFrame.md) stores data by fields/columns
* `Collection` stores data by records/rows
Although [`DataFrame`](DataFrame.md)
doesn't implement the [`Collection`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-collection/#kotlin.collections.Collection)
or [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/)
interface, it has many similar operations,
such as [`filter`](filter.md), [`take`](sliceRows.md#take),
[`first`](first.md), [`map`](map.md), [`groupBy`](groupBy.md) etc.
[`DataFrame`](DataFrame.md) has two-way compatibility with [`Map`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-map/) and [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/):
* `List<T>` -> `DataFrame<T>`: [toDataFrame](createDataFrame.md#dataframe-from-iterable-t)
* `DataFrame<T>` -> `List<T>`: [toList](toList.md)
* `Map<String, List<*>>` -> `DataFrame<*>`: [toDataFrame](createDataFrame.md#dataframe-from-map-string-list)
* `DataFrame<*>` -> `Map<String, List<*>>`: [toMap](toMap.md)
* `List<List<T>>` -> `DataFrame<*>`: [toDataFrame](createDataFrame.md#dataframe-from-list-list-t)
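The `Map` conversions listed above can be sketched as a round trip. The column names, values, and the helper name `roundTrip` are assumptions chosen for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: Map -> DataFrame -> Map.
fun roundTrip(): Map<String, List<Any?>> {
    // Each map entry becomes one column: the key is the column name, the list holds its values.
    val columns = mapOf(
        "name" to listOf("Alice", "Bob"),
        "age" to listOf(15, 45),
    )
    val df = columns.toDataFrame()
    // toMap goes back from columns to a map of column name to values.
    return df.toMap()
}

fun main() {
    println(roundTrip())
}
```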
Columns, rows, and values of [`DataFrame`](DataFrame.md)
can be accessed as [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/),
[`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/)
and [`Sequence`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/-sequence/) accordingly:
<!---FUN getRowsColumns-->
```kotlin
df.columns() // List<DataColumn>
df.rows() // Iterable<DataRow>
df.values() // Sequence<Any?>
```
<!---END-->
## Interop with data classes
[`DataFrame`](DataFrame.md) can be used as an intermediate object for transformation from one data structure to another.
Assume you have a list of instances of some [data class](https://kotlinlang.org/docs/data-classes.html) that you need to transform into some other format.
<!---FUN listInterop1-->
```kotlin
data class Input(val a: Int, val b: Int)
val list = listOf(Input(1, 2), Input(3, 4))
```
<!---END-->
You can convert this list into [`DataFrame`](DataFrame.md) using [`toDataFrame()`](createDataFrame.md#todataframe) extension:
<!---FUN listInterop2-->
```kotlin
val df = list.toDataFrame()
```
<!---END-->
Mark the original data class with [`DataSchema`](schemas.md)
annotation to get [extension properties](extensionPropertiesApi.md) and perform data transformations.
<!---FUN listInterop3-->
```kotlin
@DataSchema
data class Input(val a: Int, val b: Int)
val df2 = df.add("c") { a + b }
```
<!---END-->
<tip>
To enable extension properties generation, you should use the [DataFrame plugin](schemasGradle.md)
for Gradle or the [Kotlin Jupyter kernel](SetupJupyter.md)
</tip>
After your data is transformed, [`DataFrame`](DataFrame.md) instances can be exported eagerly
into [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) of another data class using [toList](toList.md) or [toListOf](toList.md#tolistof) extensions:
<!---FUN listInterop4-->
```kotlin
data class Output(val a: Int, val b: Int, val c: Int)
val result = df2.toListOf<Output>()
```
<!---END-->
Alternatively, one can create lazy [`Sequence`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/-sequence/) objects.
This avoids holding the entire list of objects in memory as objects are created on the fly as needed.
<!---FUN listInterop5-->
```kotlin
val df = dataFrameOf("name", "lastName", "age")("John", "Doe", 21)
.group("name", "lastName").into("fullName")
data class FullName(val name: String, val lastName: String)
data class Person(val fullName: FullName, val age: Int)
val persons = df.toListOf<Person>() // [Person(fullName = FullName(name = "John", lastName = "Doe"), age = 21)]
```
<!---END-->
### Converting columns with object instances to ColumnGroup
[unfold](unfold.md) can be used as an analogue of [`toDataFrame()`](createDataFrame.md#todataframe) for specific columns inside existing dataframes.
@@ -0,0 +1,67 @@
# associate
<web-summary>
Discover `associate` operation for Kotlin DataFrame.
</web-summary>
<card-summary>
Discover `associate` operation for Kotlin DataFrame.
</card-summary>
<link-summary>
Discover `associate` operation for Kotlin DataFrame.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.collectionsInterop.AssociateSamples-->
The `associate` function builds a `Map` from key-value `Pair`s produced by applying a transformation to each row
of this [`DataFrame`](DataFrame.md)
using a [row expression](DataRow.md#row-expressions).
If multiple rows produce the same key, only the last value for that key is kept. This matches the behavior of Kotlin's standard [`kotlin.collections.associate`](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.sequences/associate.html) function.
```kotlin
df.associate { pairSelector }
pairSelector: (DataRow) -> Pair
```
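The "last value wins" rule can be observed with the standard library function that `associate` is said to mirror. This sketch uses only the Kotlin standard library on a plain list, not the DataFrame API; the data and the helper name are made up for illustration.

```kotlin
// Hypothetical helper: builds a map from (name, age) pairs with a duplicate key.
fun agesByFirstName(): Map<String, Int> {
    val people = listOf("Alice" to 15, "Bob" to 45, "Alice" to 20)
    // "Alice" occurs twice; the later pair overwrites the earlier value.
    return people.associate { (name, age) -> name to age }
}

fun main() {
    println(agesByFirstName()) // {Alice=20, Bob=45}
}
```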
### Related functions
- [`toMap`](toMap.md) — converts a [`DataFrame`](DataFrame.md) into a `Map` by using column names as keys and their values as map values.
- [`associateBy`](associateBy.md) — creates a map with rows as values.
### Example
<!---FUN notebook_test_associate_1-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_associate_1.html" width="100%" height="500px"></inline-frame>
Create a map from name to age using a pair selector:
<!---FUN notebook_test_associate_2-->
```kotlin
df.associate { "${name.firstName} ${name.lastName}" to age }
```
<!---END-->
Output:
```text
{
Alice Cooper: 15,
Bob Dylan: 45,
Charlie Daniels: 20,
Charlie Chaplin: 40,
Bob Marley: 30,
Alice Wolf: 20,
Charlie Byrd: 30
}
```
@@ -0,0 +1,70 @@
# associateBy
<web-summary>
Discover `associateBy` operation for Kotlin DataFrame.
</web-summary>
<card-summary>
Discover `associateBy` operation for Kotlin DataFrame.
</card-summary>
<link-summary>
Discover `associateBy` operation for Kotlin DataFrame.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.collectionsInterop.AssociateBySamples-->
The `associateBy` function builds a `Map` from a [`DataFrame`](DataFrame.md)
by selecting a key for each row using a [row expression](DataRow.md#row-expressions).
The rows themselves (or values derived from them) become the map values.
If multiple rows produce the same key, only the last row (or value) for that key is kept.
This matches the behavior of Kotlin's standard
[`kotlin.collections.associateBy`](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.sequences/associate-by.html)
function.
```kotlin
df.associateBy { keySelector }
keySelector: (DataRow) -> Key
```
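To illustrate how duplicate keys are resolved, here is a sketch with the standard library `associateBy` on a plain list, which the text above names as the reference behavior. The `Visitor` class, the data, and the helper name are hypothetical and stand in for dataframe rows.

```kotlin
// Hypothetical record type standing in for a dataframe row.
data class Visitor(val firstName: String, val city: String)

// Hypothetical helper: keys are first names, values are the records themselves.
fun lastVisitorByFirstName(): Map<String, Visitor> {
    val visitors = listOf(
        Visitor("Alice", "London"),
        Visitor("Bob", "Dubai"),
        Visitor("Alice", "Milan"),
    )
    // Two records share the key "Alice"; only the last one is kept.
    return visitors.associateBy { it.firstName }
}

fun main() {
    println(lastVisitorByFirstName()["Alice"]) // Visitor(firstName=Alice, city=Milan)
}
```

If every row must be retained, a key that is unique per row (such as the full name used in the example below) avoids the overwrite.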
### Related functions
- [`toMap`](toMap.md) — converts a [`DataFrame`](DataFrame.md) into a `Map` by using column names as keys and their values as map values.
- [`associate`](associate.md) — builds a map from key-value pairs produced by transforming each row.
### Example
<!---FUN notebook_test_associateBy_1-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_associateBy_1.html" width="100%" height="500px"></inline-frame>
Create a map with names as keys:
<!---FUN notebook_test_associateBy_2-->
```kotlin
df.associateBy { "${name.firstName} ${name.lastName}" }
```
<!---END-->
Output:
```text
{
Alice Cooper: { name:{ firstName:Alice, lastName:Cooper }, age:15, city:London, weight:54, isHappy:true },
Bob Dylan: { name:{ firstName:Bob, lastName:Dylan }, age:45, city:Dubai, weight:87, isHappy:true },
Charlie Daniels: { name:{ firstName:Charlie, lastName:Daniels }, age:20, city:Moscow, isHappy:false },
Charlie Chaplin: { name:{ firstName:Charlie, lastName:Chaplin }, age:40, city:Milan, isHappy:true },
Bob Marley: { name:{ firstName:Bob, lastName:Marley }, age:30, city:Tokyo, weight:68, isHappy:true },
Alice Wolf: { name:{ firstName:Alice, lastName:Wolf }, age:20, weight:55, isHappy:false },
Charlie Byrd: { name:{ firstName:Charlie, lastName:Byrd }, age:30, city:Moscow, weight:90, isHappy:true }
}
```
+3
View File
@@ -0,0 +1,3 @@
[//]: # (title: columnNames)
Returns a list of names for top-level columns of the [`DataFrame`](DataFrame.md) instance.
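
A brief sketch of what "top-level" means here, assuming a hypothetical dataframe in which two columns are grouped under `name`: the nested column names are not part of the result.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: groups two columns and returns the top-level column names.
fun topLevelNames(): List<String> {
    // Illustrative data, not taken from the documentation samples.
    val df = dataFrameOf("firstName", "lastName", "age")("Alice", "Cooper", 15)
        .group("firstName", "lastName").into("name")
    // Only the group itself and the remaining top-level column are reported.
    return df.columnNames()
}

fun main() {
    println(topLevelNames())
}
```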
@@ -0,0 +1,7 @@
[//]: # (title: Column Operations)
<show-structure depth="3"/>
* [`statistics`](columnStatistics.md) — summary statistics ([`sum`](sum.md), [`mean`](mean.md), etc.) and cumulative statistics ([`cumSum`](cumSum.md), etc.)
* [`asIterable`](asIterable.md) — returns values of [`DataColumn`](DataColumn.md) as [`Iterable`](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.collections/-iterable/)
* [`asSequence`](asSequenceColumn.md) — returns values of [`DataColumn`](DataColumn.md) as [`Sequence`](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.sequences/-sequence/)
* [`between`](between.md) — returns a Boolean [`DataColumn`](DataColumn.md) indicating whether each value lies between two bounds
@@ -0,0 +1,5 @@
[//]: # (title: Column statistics)
Statistics on columns are described:
- [here](summaryStatistics.md) for summary statistics, like [sum](sum.md) and [mean](mean.md)
- [here](columnStatistics.md) for cumulative statistics, like [cumSum](cumSum.md)
+3
View File
@@ -0,0 +1,3 @@
[//]: # (title: columnTypes)
Returns a list of types for top-level columns of [`DataFrame`](DataFrame.md).
+3
View File
@@ -0,0 +1,3 @@
[//]: # (title: columns)
Returns the top-level columns of [`DataFrame`](DataFrame.md) as `List<DataColumn<*>>`.
+3
View File
@@ -0,0 +1,3 @@
[//]: # (title: columnsCount)
Returns the number of top-level columns in [`DataFrame`](DataFrame.md).
@@ -0,0 +1,91 @@
[//]: # (title: Compiler Plugin Examples)
This page provides a few examples that you can copy directly to your project.
[Schema info](staticInterpretation.md#schema-info) will be a convenient way to observe the result of different operations.
> See also an
> [IntelliJ IDEA project example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-gradle-example),
> showcasing simple DataFrame expressions using the Compiler Plugin.
### Example 1
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
fun main() {
val df = dataFrameOf("location", "income")(
"mall", "2.49",
"university", "2.99",
"university", "1.49",
"school", "0.99",
"hospital", "2.99",
"university", "0.49",
"hospital", "1.49",
"mall", "0.99",
"hospital", "0.49",
)
df
.convert { income }.with { it.toDouble() }
.groupBy { location }.aggregate {
income.toList() into "allTransactions"
sumOf { income } into "totalIncome"
}.forEach {
println(location)
println("totalIncome = $totalIncome")
}
}
```
### Example 2
```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
enum class State {
Idle, Productive, Maintenance
}
class Event(val toolId: String, val state: State, val timestamp: Long)
fun main() {
val tool1 = "tool_1"
val tool2 = "tool_2"
val tool3 = "tool_3"
val events = listOf(
Event(tool1, State.Idle, 0),
Event(tool1, State.Productive, 5),
Event(tool2, State.Idle, 0),
Event(tool2, State.Maintenance, 10),
Event(tool2, State.Idle, 20),
Event(tool3, State.Idle, 0),
Event(tool3, State.Productive, 25),
).toDataFrame()
val lastTimestamp = events.maxOf { timestamp }
val groupBy = events
.groupBy { toolId }
.sortBy { timestamp }
.add("stateDuration") {
(next()?.timestamp ?: lastTimestamp) - timestamp
}
groupBy.updateGroups {
val allStates = State.entries.toDataFrame {
"state" from { it }
}
val df = allStates.leftJoin(it) { state }
.fillNulls { stateDuration }
.with { -1 }
df.groupBy { state }.sumFor { stateDuration }
}
.toDataFrame()
.toStandaloneHtml()
.openInBrowser()
}
```
@@ -0,0 +1,65 @@
# Known limitations and workarounds
### Compiler plugin in lambdas with receiver marked as DslMarker
Problem: Property calls on a dataframe type created inside a `@Composable` `Column { }` lambda cannot be resolved.
[Issue 1604](https://github.com/Kotlin/dataframe/issues/1604)
The lambda of `Column` has a receiver parameter `content: @Composable ColumnScope.() -> Unit`.
Here's the declaration from Compose. Receiver parameter types with annotations similar to this one will conflict with the plugin.
```kotlin
@LayoutScopeMarker
interface ColumnScope
@DslMarker
annotation class LayoutScopeMarker
```
Repro: The snippet below shows a dataframe variable initialized with a local DataFrame type inside a `Column` lambda. `ageComposableLambdaScope` cannot be resolved.
```kotlin
@DataSchema
data class Person(val age: Int, val name: String)
@Composable
fun DataFrameScreen(df: DataFrame<Person>) {
Column {
val filteredDf = remember(df) {
df
.add("ageComposableLambdaScope") { age }
.filter { ageComposableLambdaScope >= 20 }
}
filteredDf.ageComposableLambdaScope // error
}
}
```
Error message:
```
val ColumnsScope<Person_59I>.ageComposableLambdaScope: Int'
cannot be called in this context with an implicit receiver.
Use an explicit receiver if necessary
```
Workaround:
Initialize your dataframe properties outside lambdas with DslMarker receiver parameters.
```kotlin
@DataSchema
data class Person(val age: Int, val name: String)
@Composable
fun DataFrameScreen(df: DataFrame<Person>) {
val filteredDf = remember(df) {
df
.add("ageValidScope") { age }
.filter { ageValidScope >= 20 }
}
filteredDf.ageValidScope // OK
Column {
Text(filteredDf.ageValidScope.toString())
}
}
```
@@ -0,0 +1,106 @@
[//]: # (title: concat)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Returns a [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrame`](DataFrame.md) objects.
**Related operations**: [](multipleDataFrames.md)
`concat` is available for:
[`DataFrame`](DataFrame.md):
<!---FUN concatDataFrames-->
```kotlin
df.concat(df1, df2)
```
<!---END-->
[`DataColumn`](DataColumn.md):
<!---FUN concatColumns-->
```kotlin
val a by columnOf(1, 2)
val b by columnOf(3, 4)
a.concat(b)
```
<!---END-->
`Iterable<DataFrame>`:
<!---FUN concatIterable-->
```kotlin
listOf(df1, df2).concat()
```
<!---END-->
`Iterable<DataRow>`:
<!---FUN concatRows-->
```kotlin
val rows = listOf(df[2], df[4], df[5])
rows.concat()
```
<!---END-->
`Iterable<DataColumn>`:
<!---FUN concatColumnsIterable-->
```kotlin
val a by columnOf(1, 2)
val b by columnOf(3, 4)
listOf(a, b).concat()
```
<!---END-->
[`groupBy`](groupBy.md#transformation):
<!---FUN concatGroupBy-->
```kotlin
df.groupBy { name }.concat()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.concatGroupBy.html" width="100%"/>
<!---END-->
[`FrameColumn`](DataColumn.md#framecolumn):
<!---FUN concatFrameColumn-->
```kotlin
val x = dataFrameOf("a", "b")(
1, 2,
3, 4,
)
val y = dataFrameOf("b", "c")(
5, 6,
7, 8,
)
val frameColumn by columnOf(x, y)
frameColumn.concat()
```
<!---END-->
If you want to take the union of columns (not rows) from several [`DataFrame`](DataFrame.md) objects, see [`add`](add.md).
## Schema unification
If input [`DataFrame`](DataFrame.md) objects have different schemas, every column in the resulting [`DataFrame`](DataFrame.md)
will get the lowest common type of the original columns with the same name.
For example, if one [`DataFrame`](DataFrame.md) has a column `A: Int` and another [`DataFrame`](DataFrame.md) has a column `A: Double`,
the resulting [`DataFrame`](DataFrame.md) will have a column `A: Number`.
Missing columns in [`DataFrame`](DataFrame.md) objects will be filled with `null`.
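The unification rule above can be tried on a small case. This is a minimal sketch that uses only `dataFrameOf` and `concat` as shown on this page, plus `schema()` to inspect the result; the helper name `concatMixed` and the sample values are made up for illustration. By the rule above, column `A` is expected to become `Number`, and `B` to be filled with `null` for the rows coming from the first dataframe.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: concatenates two dataframes whose schemas differ.
fun concatMixed() =
    dataFrameOf("A")(
        1,
        2,
    ).concat(
        // `A` holds Double values here, and there is an extra column `B`
        dataFrameOf("A", "B")(
            1.5, "x",
            2.5, "y",
        ),
    )

fun main() {
    val unified = concatMixed()
    // Prints the unified schema so the resulting column types can be inspected
    println(unified.schema())
    // Two rows from each input
    println(unified.rowsCount())
}
```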
@@ -0,0 +1,23 @@
[//]: # (title: concat)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Returns [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrame`](DataFrame.md) objects.
<!---FUN concatDataFrames-->
```kotlin
df.concat(df1, df2)
```
<!---END-->
<!---FUN concatIterable-->
```kotlin
listOf(df1, df2).concat()
```
<!---END-->
See [all use cases of 'concat' operation](concat.md).
@@ -0,0 +1,41 @@
[//]: # (title: DataColumn)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Create-->
[`DataColumn`](DataColumn.md) represents a column of values.
It can store objects of primitive or reference types,
or other [`DataFrame`](DataFrame.md) objects.
See [how to create columns](createColumn.md)
### Properties
* `name: String` — name of the column; should be unique within containing dataframe
* `path: ColumnPath` — path to the column; depends on the way column was retrieved from dataframe
* `type: KType` — type of elements in the column
* `hasNulls: Boolean` — flag indicating whether column contains `null` values
* `values: Iterable<T>` — column data
* `size: Int` — number of elements in the column
### Column kinds
[`DataColumn`](DataColumn.md) instances can be one of three subtypes: `ValueColumn`, [`ColumnGroup`](DataColumn.md#columngroup) or [`FrameColumn`](DataColumn.md#framecolumn).
#### ValueColumn
Represents a sequence of values.
It can store values of primitive (integers, strings, decimals, etc.) or reference types.
Currently, it uses [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) as underlying data storage.
#### ColumnGroup
Container for nested columns. Used to create column hierarchy.
You can create column groups using the group operation or by splitting inward — see [group](group.md) and [split](split.md) for details.
#### FrameColumn
Special case of [`ValueColumn`](#valuecolumn) that stores other [`DataFrame`](DataFrame.md) objects as elements.
[`DataFrame`](DataFrame.md) objects stored in a [`FrameColumn`](DataColumn.md#framecolumn) may have different schemas.
[`FrameColumn`](DataColumn.md#framecolumn) may appear after [reading](read.md) from JSON or other hierarchical data structures, or after grouping operations such as [groupBy](groupBy.md) or [pivot](pivot.md).
@@ -0,0 +1,14 @@
[//]: # (title: DataFrame)
[`DataFrame`](DataFrame.md) represents a list of [`DataColumn`](DataColumn.md).
Columns in [`DataFrame`](DataFrame.md) must have equal size and unique names.
**Learn how to:**
- [Create DataFrame](createDataFrame.md)
- [Read DataFrame](read.md)
- [Get an overview of DataFrame](info.md)
- [Access data in DataFrame](access.md)
- [Modify data in DataFrame](modify.md)
- [Compute statistics for DataFrame](summaryStatistics.md)
- [Combine several DataFrame objects](multipleDataFrames.md)
@@ -0,0 +1,103 @@
[//]: # (title: DataRow)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.DataRowApi-->
`DataRow` represents a single record, one piece of data within a [`DataFrame`](DataFrame.md).
## Row functions
<snippet id="rowFunctions">
* `index(): Int` — sequential row number in [`DataFrame`](DataFrame.md), starts from 0
* `prev(): DataRow?` — previous row (`null` for the first row)
* `next(): DataRow?` — next row (`null` for the last row)
* `diff(T) { rowExpression }: T / diffOrNull { rowExpression }: T?` — difference between the results of a [row expression](DataRow.md#row-expressions) calculated for current and previous rows
* `explode(columns): DataFrame<T>` — spread lists and [`DataFrame`](DataFrame.md) objects vertically into new rows
* `values(): List<Any?>` — list of all cell values from the current row
* `valuesOf<T>(): List<T>` — list of values of the given type
* `columnsCount(): Int` — number of columns
* `columnNames(): List<String>` — list of all column names
* `columnTypes(): List<KType>` — list of all column types
* `namedValues(): List<NameValuePair<Any?>>` — list of name-value pairs where `name` is a column name and `value` is cell value
* `namedValuesOf<T>(): List<NameValuePair<T>>` — list of name-value pairs where value has given type
* `transpose(): DataFrame<NameValuePair<*>>` — [`DataFrame`](DataFrame.md) of two columns: `name: String` is column names and `value: Any?` is cell values
* `transposeTo<T>(): DataFrame<NameValuePair<T>>`— [`DataFrame`](DataFrame.md) of two columns: `name: String` is column names and `value: T` is cell values
* `getRow(Int): DataRow` — row from [`DataFrame`](DataFrame.md) by row index
* `getRows(Iterable<Int>): DataFrame` — [`DataFrame`](DataFrame.md) with subset of rows selected by absolute row index.
* `relative(Iterable<Int>): DataFrame` — [`DataFrame`](DataFrame.md) with subset of rows selected by relative row index: `relative(-1..1)` will return previous, current and next row. Requested indices will be coerced to the valid range and invalid indices will be skipped
* `getValue<T>(columnName)` — cell value of type `T` by this row and given `columnName`
* `getValueOrNull<T>(columnName)` — cell value of type `T?` by this row and given `columnName` or `null` if there's no such column
* `get(column): T` — cell value by this row and given `column`
* `String.invoke<T>(): T` — cell value of type `T` by this row and given `this` column name
* `ColumnPath.invoke<T>(): T` — cell value of type `T` by this row and given `this` column path
* `ColumnReference.invoke(): T` — cell value of type `T` by this row and given `this` column
* `df()` — [`DataFrame`](DataFrame.md) that current row belongs to
</snippet>
## Row expressions
Row expressions provide a value for every row of [`DataFrame`](DataFrame.md) and are used in [add](add.md), [filter](filter.md), [forEach](iterate.md), [update](update.md) and other operations.
<!---FUN expressions-->
```kotlin
// Row expression computes values for a new column
df.add("fullName") { name.firstName + " " + name.lastName }
// Row expression computes updated values
df.update { weight }.at(1, 3, 4).with { prev()?.weight }
// Row expression computes cell content for values of pivoted column
df.pivot { city }.with { name.lastName.uppercase() }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.DataRowApi.expressions.html" width="100%"/>
<!---END-->
Row expression signature: ```DataRow.(DataRow) -> T```. Row values can be accessed with or without the ```it``` keyword: the implicit receiver and the explicit argument refer to the same `DataRow` object.
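The shape of this signature can be illustrated without the library. The sketch below uses a hypothetical `Row` class and a hypothetical `eval` helper (neither is part of Kotlin DataFrame) to show how the same object can be passed both as the lambda receiver and as its argument:

```kotlin
// Hypothetical stand-in for a row; this is not the library's DataRow.
class Row(val name: String, val age: Int)

// Passes the same object as the receiver (`this`) and as the argument (`it`),
// mirroring the shape `DataRow.(DataRow) -> T`.
fun <T> Row.eval(expression: Row.(Row) -> T): T = expression(this, this)

fun main() {
    val row = Row("Alice", 15)
    // Access through the implicit receiver
    println(row.eval { age + 1 })      // 16
    // Access through the explicit argument
    println(row.eval { it.age + 1 })   // 16
    // Both refer to the same object
    println(row.eval { this === it })  // true
}
```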
## Row conditions
Row condition is a special case of [row expression](#row-expressions) that returns `Boolean`.
<!---FUN conditions-->
```kotlin
// Row condition is used to filter rows by index
df.filter { index() % 5 == 0 }
// Row condition is used to drop rows where `age` is the same as in the previous row
df.drop { diffOrNull { age } == 0 }
// Row condition is used to filter rows for value update
df.update { weight }.where { index() > 4 && city != "Paris" }.with { 50 }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.DataRowApi.conditions.html" width="100%"/>
<!---END-->
Row condition signature: ```DataRow.(DataRow) -> Boolean```
## Row statistics
<snippet id="rowStatistics">
The following [statistics](summaryStatistics.md) are available for `DataRow`:
* `rowSum`
* `rowMean`
* `rowStd`
These statistics will be applied only to values of appropriate types, and incompatible values will be ignored.
For example, if a [dataframe](DataFrame.md) has columns of types `String` and `Int`,
`rowSum()` will compute the sum of the `Int` values in the row and ignore `String` values.
To apply statistics only to values of a particular type use `-Of` versions:
* `rowSumOf<T>`
* `rowMeanOf<T>`
* `rowStdOf<T>`
* `rowMinOf<T>`
* `rowMaxOf<T>`
* `rowMedianOf<T>`
* `rowPercentileOf<T>`
</snippet>
@@ -0,0 +1,117 @@
[//]: # (title: Access APIs)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->
By nature, dataframes are dynamic objects;
column labels depend on the input source and new columns can be added
or deleted while wrangling.
Kotlin, in contrast, is a statically typed language where all types are defined and verified
ahead of execution.
That's why creating a flexible, handy, and, at the same time, safe API to a dataframe is tricky.
In the Kotlin DataFrame library, we provide two different ways to access columns.
## List of Access APIs
Here's a list of all APIs in order of increasing safety.
* **String API** <br/>
Columns are accessed by `string` representing their name. Type-checking is done at runtime, name-checking too.
* [**Extension Properties API**](extensionPropertiesApi.md) <br/>
Extension access properties are generated based on the dataframe schema. The name and type of properties are inferred
from the name and type of the corresponding columns.
## Example
Here's an example of how the same operations can be performed via different Access APIs:
<note>
In most of the code snippets in this documentation there's a tab selector that allows switching across Access APIs.
</note>
<tabs>
<tab title="String API">
<!---FUN strings-->
```kotlin
DataFrame.read("titanic.csv")
.add("lastName") { "name"<String>().split(",").last() }
.dropNulls("age")
.filter {
"survived"<Boolean>() &&
"home"<String>().endsWith("NY") &&
"age"<Int>() in 10..20
}
```
<!---END-->
</tab>
<tab title = "Extension Properties API">
<!---FUN extensionProperties1-->
```kotlin
val df /* : AnyFrame */ = DataFrame.read("titanic.csv")
```
<!---END-->
<!---FUN extensionProperties2-->
```kotlin
df.add("lastName") { name.split(",").last() }
.dropNulls { age }
.filter { survived && home.endsWith("NY") && age in 10..20 }
```
<!---END-->
</tab>
</tabs>
The `titanic.csv` file can be found [here](https://github.com/Kotlin/dataframe/blob/master/data/titanic.csv).
## Comparing APIs
The String API is the simplest and the least safe of them all. Its main advantage is that it can be
used at any time, including when accessing new columns in chain calls. So we can write something like:
```kotlin
df.add("weight") { ... } // add a new column `weight`, calculated by some expression
.sortBy("weight") // sorting dataframe rows by its value
```
In contrast, generated [extension properties](extensionPropertiesApi.md) form the most convenient and the safest API.
Using them, you can always be sure that you work with correct data and types.
However, there's a bottleneck at the moment of generation.
To get new extension properties, you have to run a cell in a notebook,
which could lead to unnecessary variable declarations.
Currently, we are working on a compiler plugin that generates these properties on the fly while typing!
<table>
<tr>
<td> API </td>
<td> Type-checking </td>
<td> Column names checking </td>
<td> Column existence checking </td>
</tr>
<tr>
<td> String API </td>
<td> Runtime </td>
<td> Runtime </td>
<td> Runtime </td>
</tr>
<tr>
<td> Extension Properties API </td>
<td> Generation-time </td>
<td> Generation-time </td>
<td> Generation-time </td>
</tr>
</table>
@@ -0,0 +1,62 @@
# Concepts And Principles
<web-summary>
Learn what Kotlin DataFrame is about — its core concepts, design principles, and usage philosophy.
</web-summary>
<card-summary>
Discover the fundamentals of the library —
understand key concepts, motivation, and the overall structure of the library.
</card-summary>
<link-summary>
Explore the fundamentals of Kotlin DataFrame —
understand key concepts, motivation, and the overall structure of the library.
</link-summary>
<show-structure depth="3"/>
## What is a dataframe
A *dataframe* is an abstraction for working with structured data.
Essentially, it's a 2-dimensional table with labeled columns of potentially different types.
You can think of it like a spreadsheet or SQL table, or a dictionary of series objects.
The handiness of this abstraction is not in the table itself but in a set of operations defined on it.
The Kotlin DataFrame library is an idiomatic Kotlin DSL defining such operations.
Working with a dataframe is often called *data wrangling*:
the process of transforming and mapping data from one "raw" form into another format
that is more appropriate for analytics and visualization.
The goal of data wrangling is to ensure that the data is of high quality and useful.
## Main Features and Concepts
* [**Hierarchical**](hierarchical.md) — the Kotlin DataFrame library provides an ability to read and present data from different sources,
including not only plain **CSV** but also **JSON** or **[SQL databases](readSqlDatabases.md)**.
This is why it was designed to be hierarchical and allows nesting of columns and cells.
* **Functional** — the data processing pipeline is organized in a chain of [`DataFrame`](DataFrame.md) transformation operations.
* **Immutable** — every operation returns a new instance of [`DataFrame`](DataFrame.md) reusing underlying storage wherever it's possible.
* **Readable** — data transformation operations are defined in DSL close to natural language.
* **Practical** — provides simple solutions for common problems and the ability to perform complex tasks.
* **Minimalistic** — simple, yet powerful data model of three [column kinds](DataColumn.md#column-kinds).
* [**Interoperable**](collectionsInterop.md) — convertible with Kotlin data classes and collections.
This also means conversion to/from other libraries' data structures is usually quite straightforward!
See our [examples](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources)
for some conversions between DataFrame and [Apache Spark](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/spark), [Multik](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik), and [JetBrains Exposed](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed).
* **Generic** — can store objects of any type, not only numbers or strings.
* **Typesafe** — the Kotlin DataFrame library provides a mechanism of on-the-fly [**generation of extension properties**](extensionPropertiesApi.md)
that correspond to the columns of a dataframe.
In interactive notebooks like Jupyter or Datalore, the generation runs after each cell execution.
In IntelliJ IDEA there's a Gradle plugin that generates properties based on a CSV or JSON file.
Also, we're working on a compiler plugin that infers and transforms [`DataFrame`](DataFrame.md) schema while typing.
You can now clone this [project with many examples](https://github.com/koperagen/df-plugin-demo) showcasing how it allows you to reliably use our most convenient extension properties API.
The generated properties ensure you'll never misspell a column name or mix up its type, and of course nullability is also preserved.
* [**Polymorphic**](schemas.md) —
if all columns of a [`DataFrame`](DataFrame.md) instance are present in another dataframe,
then the first one will be seen as a superclass for the latter.
This means you can define a function on an interface with some set of columns
and then execute it safely on any [`DataFrame`](DataFrame.md) which contains this same set of columns.
In notebooks, this works out-of-the-box.
In ordinary projects, this requires casting (for now).
@@ -0,0 +1,20 @@
[//]: # (title: Hierarchical data structures)
[`DataFrame`](DataFrame.md) can represent hierarchical data structures using two special types of columns:
* [`ColumnGroup`](DataColumn.md#columngroup) is a group of [columns](DataColumn.md)
* [`FrameColumn`](DataColumn.md#framecolumn) is a column of [dataframes](DataFrame.md)
You can read [`DataFrame`](DataFrame.md) [from json](read.md#read-from-json) or [from in-memory object graph](createDataFrame.md#todataframe) preserving original tree structure.
Hierarchical columns can also appear as a result of some [modification operations](modify.md):
* [group](group.md) produces [`ColumnGroup`](DataColumn.md#columngroup)
* [groupBy](groupBy.md) produces [`FrameColumn`](DataColumn.md#framecolumn)
* [pivot](pivot.md) may produce [`FrameColumn`](DataColumn.md#framecolumn)
* [split](split.md) of [`FrameColumn`](DataColumn.md#framecolumn) will produce several [`ColumnGroup`](DataColumn.md#columngroup)
* [implode](implode.md) converts [`ColumnGroup`](DataColumn.md#columngroup) into [`FrameColumn`](DataColumn.md#framecolumn)
* [explode](explode.md) converts [`FrameColumn`](DataColumn.md#framecolumn) into [`ColumnGroup`](DataColumn.md#columngroup)
* [merge](merge.md) converts [`ColumnGroup`](DataColumn.md#columngroup) into [`FrameColumn`](DataColumn.md#framecolumn)
* etc.
Operations in the navigation tree are grouped such that you can find operations and their respective inverse together, like `group` and `ungroup`. This allows you to quickly find out how to simplify any hierarchical structure you come across.
@@ -0,0 +1,24 @@
[//]: # (title: NaN and NA)
Using the Kotlin DataFrame library, you might come across the terms `NaN` and `NA`.
This page explains what they mean and how to work with them.
## NaN
`Float` or `Double` values can be `NaN`
when a mathematical operation is undefined, such as dividing zero by zero. The
result of such an operation can only be described as "**N**ot **a** **N**umber".
This is different from `null`, which means that a value is missing and, in Kotlin, can only occur
for `Float?` and `Double?` types.
You can use [fillNaNs](fill.md#fillnans) to replace `NaNs` in certain columns with a given value or expression
or [dropNaNs](drop.md#dropnans) to drop rows with `NaNs` in them.
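The difference between `NaN` and `null` can be seen with the Kotlin standard library alone. The sketch below does not use Kotlin DataFrame; the helper `divide` is made up for illustration. Note that under IEEE 754 floating-point arithmetic only zero divided by zero yields `NaN`, while a nonzero value divided by zero yields infinity.

```kotlin
// Hypothetical helper: plain floating-point division.
fun divide(a: Double, b: Double): Double = a / b

fun main() {
    val undefined = divide(0.0, 0.0)  // NaN: the result is mathematically undefined
    val infinite = divide(1.0, 0.0)   // Infinity, which is not NaN
    val missing: Double? = null       // a missing value, only possible for Double?

    println(undefined.isNaN())  // true
    println(infinite.isNaN())   // false
    println(missing == null)    // true
}
```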
## NA
`NA` in Kotlin DataFrame means either [`NaN`](#nan) or `null`. In other words, the value
is "**N**ot **A**vailable".
You can use [fillNA](fill.md#fillna) to replace `NAs` in certain columns with a given value or expression
or [dropNA](drop.md#dropna) to drop rows with `NAs` in them.
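The definition of `NA` can be written down as a small predicate. This is only a sketch in plain Kotlin under the definition given above; `isNotAvailable` is a hypothetical name chosen for this example and is not the library's own function.

```kotlin
// Hypothetical helper: true for null and for NaN of either floating-point type.
fun Any?.isNotAvailable(): Boolean = when (this) {
    null -> true
    is Double -> this.isNaN()
    is Float -> this.isNaN()
    else -> false
}

fun main() {
    val values = listOf(1.0, Double.NaN, null, "text", Float.NaN)
    // NaN (Double), null, and NaN (Float) are counted as not available
    println(values.count { it.isNotAvailable() })  // 3
}
```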
@@ -0,0 +1,46 @@
[//]: # (title: Number Unification)
Unifying numbers means converting them to a common number type without losing information.
This is currently an internal part of the library,
but its logic implementation can be encountered in multiple places, such as
[statistics](summaryStatistics.md), and [reading JSON](read.md#read-from-json).
The following graph shows the hierarchy of number types in Kotlin DataFrame.
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.documentation.UnifyingNumbers.Graph.html" width="100%"/>
The order is top-down from the most complex type to the simplest one.
For each number type in the graph, it holds that a number of that type can be expressed losslessly by
a number of a more complex type (any of its parents).
This is either because the more complex type has a larger range or higher precision (in terms of bits).
Nullability, while not displayed in the graph, is also taken into account.
This means that `Int?` and `Float` will be unified to `Double?`.
`Nothing` is at the bottom of the graph and is the starting point in unification.
This can be interpreted as "no type" and can have no instance, while `Nothing?` can only be `null`.
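To see why `Int` and `Float` meet at `Double` rather than at `Float`, it helps to check which conversions keep every value intact. The sketch below uses only the Kotlin standard library; the helper `isExactAsFloat` is a name made up for this example.

```kotlin
// Hypothetical helper: true if the Int survives conversion to Float unchanged.
// The comparison is done in Double, which represents every Int and every Float exactly.
fun isExactAsFloat(n: Int): Boolean = n.toFloat().toDouble() == n.toDouble()

fun main() {
    val n = 16_777_217  // 2^24 + 1, one more than Float's 24-bit significand can hold
    println(isExactAsFloat(n))          // false: Float cannot represent every Int
    println(n.toDouble().toInt() == n)  // true: Double can

    val f = 0.1f
    println(f.toDouble().toFloat() == f)  // true: widening Float to Double is exact
}
```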
> There may be parts of the library that "unify" numbers, such as [`readCsv`](read.md#column-type-inference-from-csv),
> or [`readExcel`](read.md#read-from-excel).
> However, because they rely on another library (like [Deephaven CSV](https://github.com/deephaven/deephaven-csv))
> this may behave slightly differently.
### Unified Number Type Options
There are variants of this graph that exclude some types, such as `BigDecimal` and `BigInteger`, or
allow some slightly lossy conversions, like from `Long` to `Double`.
This follows either `UnifiedNumberTypeOptions.PRIMITIVES_ONLY` or
`UnifiedNumberTypeOptions.DEFAULT`.
For `PRIMITIVES_ONLY`, used by [statistics](summaryStatistics.md), big numbers are excluded from the graph.
Additionally, `Double` is considered the most complex type,
meaning `Long`/`ULong` and `Double` can be joined to `Double`,
potentially losing a little precision(!).
For `DEFAULT`, used by [`readJson`](read.md#read-from-json), big numbers can appear.
`BigDecimal` is considered the most complex type, meaning that `Long`/`ULong` and `Double` will be joined
to `BigDecimal` instead.
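The precision loss mentioned for `PRIMITIVES_ONLY` can be reproduced with plain Kotlin. This sketch does not call the library; `roundTripThroughDouble` is a hypothetical helper that shows what happens to a large `Long` when it is stored as a `Double`, and how `BigDecimal` avoids the problem.

```kotlin
import java.math.BigDecimal

// Hypothetical helper: converts a Long to Double and back.
fun roundTripThroughDouble(value: Long): Long = value.toDouble().toLong()

fun main() {
    val big = 9_007_199_254_740_993L  // 2^53 + 1, beyond Double's 53-bit significand

    // Double rounds to the nearest representable value, so the last digit changes
    println(roundTripThroughDouble(big))  // 9007199254740992

    // BigDecimal keeps the Long exactly and can also hold a fractional part
    println(BigDecimal.valueOf(big).toLong() == big)       // true
    println(BigDecimal.valueOf(big) + BigDecimal("0.5"))   // 9007199254740993.5
}
```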
@@ -0,0 +1,25 @@
# Spelling Conventions
<web-summary>
Clarifies naming conventions used in Kotlin DataFrame documentation for the library, data format, and Kotlin type.
</web-summary>
<card-summary>
Understand how to distinguish between "Kotlin DataFrame", "dataframe", and `DataFrame` in the documentation.
</card-summary>
<link-summary>
Spelling and naming rules for using "Kotlin DataFrame", "dataframe", and `DataFrame` properly.
</link-summary>
While reading Kotlin DataFrame documentation, you may come across several similar terms referring to different concepts:
* **Kotlin DataFrame** (or just "DataFrame") — the name of the official library.
* *dataframe* — a general term for data in a tabular (frame) format.
* [`DataFrame`](DataFrame.md) — a Kotlin type or its instance that represents a wrapper around a dataframe.
Here's a correct usage example:
```markdown
Kotlin DataFrame allows you to read a dataframe from a CSV file into a `DataFrame`.
```
@@ -0,0 +1,5 @@
[//]: # (title: Data Abstractions)
* [`DataColumn`](DataColumn.md) is a named, typed and ordered collection of elements
* [`DataFrame`](DataFrame.md) consists of one or several [`DataColumns`](DataColumn.md) with unique names and equal size
* [`DataRow`](DataRow.md) is a single row of [`DataFrame`](DataFrame.md) and provides a single value for every [`DataColumn`](DataColumn.md)
@@ -0,0 +1,130 @@
[//]: # (title: convert)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Returns [`DataFrame`](DataFrame.md) with changed values in some columns. Allows changing column types.
```text
convert { columnsSelector }
.with { rowExpression } | .asFrame { frameExpression } | .perRowCol { rowColExpression } | to<Type>() | to { colExpression }
rowExpression = DataRow.(OldValue) -> NewValue
rowColExpression = (DataRow, DataColumn) -> NewValue
colExpression = DataFrame.(DataColumn) -> DataColumn
frameExpression = DataFrame.(DataFrame) -> DataFrame
```
**Related operations**: [](updateConvert.md)
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation and
[row expressions](DataRow.md#row-expressions) for how to provide new values.
<!---FUN convert-->
```kotlin
df.convert { age }.with { it.toDouble() }
df.convert { colsAtAnyDepth().colsOf<String>() }.with { it.toCharArray().toList() }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.convert.html" width="100%"/>
<!---END-->
A `ColumnGroup` can be converted using the `DataFrame` API, for example:
<!---FUN convertAsFrame-->
```kotlin
df.convert { name }.asFrame { it.add("fullName") { "$firstName $lastName" } }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertAsFrame.html" width="100%"/>
<!---END-->
Similar to the `replace with` operation,
columns can be converted in a compiler plugin-friendly fashion
whenever you need to perform an operation on the entire column without changing its name.
For example, to process the column values in parallel:
<!---FUN convertAsColumn-->
```kotlin
df.convert { name }.asColumn { col ->
col.toList().parallelStream().map { it.toString() }.collect(Collectors.toList()).toColumn()
}
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertAsColumn.html" width="100%"/>
<!---END-->
`convert {}.to<>()` supports automatic type conversions between the following types:
* `String`, `Char` (uses [`parse`](parse.md) to convert from `String` to other types)
* `Boolean`
* `Byte`
* `Short`
* `Int` (and `Char`)
* `Long`
* `Float`
* `Double` (See [parsing doubles](parse.md#parsing-doubles) for `String` to `Double` conversion)
* `BigDecimal`
* `BigInteger`
* `LocalDateTime` (kotlinx.datetime and java.time)
* `LocalDate` (kotlinx.datetime and java.time)
* `LocalTime` (kotlinx.datetime and java.time)
* `Instant` (kotlinx.datetime, kotlin.time, and java.time)
* `enum` classes (by name)
> Note that converting between `Char` and `Int` is done by UTF-16 character code.
> This means the `Char` `'1'` becomes the `Int` `49`.
> To convert `Char -> Int` the way it is written, use `parse()` instead, or,
> in either case, use `String` as intermediary type.
> {style="warning"}
If you want to convert `Char` `'1'` to the `Int` `1`, use [parse()](parse.md) instead, or use `String`
as intermediate type.
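The two meanings of "converting a `Char` to an `Int`" can be compared using the Kotlin standard library alone. This sketch does not go through `convert` or `parse()`; it only shows the underlying difference between a character code and the digit a character denotes.

```kotlin
fun main() {
    val c = '1'

    // UTF-16 character code: this is what a Char -> Int conversion by code yields
    println(c.code)                // 49

    // The digit the character denotes
    println(c.digitToInt())        // 1

    // Going through String as an intermediate type gives the same digit
    println(c.toString().toInt())  // 1

    // The reverse direction by character code
    println(49.toChar())           // 1
}
```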
<!---FUN convertTo-->
```kotlin
df.convert { age }.to<Double>()
df.convert { colsOf<Number>() }.to<String>()
df.convert { name.firstName and name.lastName }.asColumn { col -> col.map { it.length } }
df.convert { weight }.toFloat()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertTo.html" width="100%"/>
<!---END-->
Automatic conversion from `String` to [enum classes](https://kotlinlang.org/docs/enum-classes.html#enum-classes.md)
is also supported:
```kotlin
enum class Direction { NORTH, SOUTH, WEST, EAST }
```
<!---FUN convertToEnum-->
```kotlin
dataFrameOf("direction")("NORTH", "WEST")
.convert("direction").to<Direction>()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertToEnum.html" width="100%"/>
<!---END-->
And finally, [Value classes](https://kotlinlang.org/docs/inline-classes.html) can be used with `convert` too.
Both as conversion source and target:
```kotlin
@JvmInline
value class IntClass(val value: Int)
```
<!---FUN convertToValueClass-->
```kotlin
dataFrameOf("value")("1", "2") // note that values are strings; conversion is done automatically
.convert("value").to<IntClass>()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.convertToValueClass.html" width="100%"/>
<!---END-->
@@ -0,0 +1,57 @@
[//]: # (title: convertTo)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
[Converts](convert.md) all columns in the [`DataFrame`](DataFrame.md) to match a given schema [`Schema`](schema.md).
```kotlin
convertTo<Schema>(excessiveColumns = ExcessiveColumns.Keep)
```
**Related operations**: [](adjustSchema.md), [](convert.md)
Conversion to match the target schema is done mostly automatically;
DataFrame knows how to convert between many types (see [](convert.md) for details and the supported types).
However, if you have a custom type in your target schema, or the automatic conversion fails,
you can provide a custom converter, parser, or filler for it.
These have priority over the automatic ones.
Customization DSL:
* `convert<A>().with { it.toB() }`
* Provides `convertTo<>()` with the knowledge of how to convert `A` to `B`
* `parser { YourType.fromString(it) }`
* Provides `convertTo<>()` with the knowledge of how to parse strings/chars into `YourType`
* Shortcut for `convert<String>().with { YourType.fromString(it) }`
* Chars are treated as strings unless you explicitly specify `convert<Char>().with { YourType.fromChar(it) }`
* `fill { some cols }.with { rowExpression }`
* Makes `convertTo<>()` fill missing (or existing) columns from the target schema
with values computed by the given row expression
<!---FUN customConvertersData-->
```kotlin
class MyType(val value: Int)
@DataSchema
class MySchema(val a: MyType, val b: MyType, val c: Int)
```
<!---END-->
<!---FUN customConverters-->
```kotlin
val df: AnyFrame = dataFrameOf(
"a" to columnOf(1, 2, 3),
"b" to columnOf("1", "2", "3"),
)
df.convertTo<MySchema> {
// providing the converter: Int -> MyType, so column `a` can be converted
convert<Int>().with { MyType(it) }
// providing the parser: String -> MyType, so column `b` can be converted
parser { MyType(it.toInt()) }
// providing the filler for `c`, as it's missing in `df`
fill { c }.with { a.value + b.value }
}
```
<!---END-->
@@ -0,0 +1,34 @@
[//]: # (title: corr)
Returns [`DataFrame`](DataFrame.md) with the pairwise correlation between two sets of columns.
It computes the [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient).
```kotlin
corr { columns1 }
.with { columns2 } | .withItself()
```
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
To compute pairwise correlation between all columns in the [`DataFrame`](DataFrame.md) use `corr` without arguments:
```kotlin
corr()
```
The function is available for numeric and `Boolean` columns.
`Boolean` values are converted into `1` for `true` and `0` for `false`.
All other columns are ignored.
If a [`ColumnGroup`](DataColumn.md#columngroup) instance is passed as the target column for correlation,
it will be unpacked into suitable nested columns.
The resulting [`DataFrame`](DataFrame.md) will have `n1` rows and `n2+1` columns,
where `n1` and `n2` are the number of columns in `columns1` and `columns2` respectively.
The first column will have the name "column" and will contain names of columns in `columns1`.
Other columns will have the same names as in `columns2` and will contain the computed correlation coefficients.
If exactly one [`ColumnGroup`](DataColumn.md#columngroup) is passed in `columns1`,
the first column in the output will have its name.
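To make the contents of each cell concrete, here is a minimal plain-Kotlin sketch of the Pearson coefficient and of the `n1` by `n2+1` layout described above. It does not call the library: `pearson`, `asNumbers`, and `corrTable` are hypothetical helper names, and the library's actual implementation may differ.

```kotlin
import kotlin.math.sqrt

// Pearson correlation coefficient of two equally sized, non-empty samples.
fun pearson(x: List<Double>, y: List<Double>): Double {
    require(x.size == y.size && x.isNotEmpty()) { "samples must be non-empty and equally sized" }
    val meanX = x.average()
    val meanY = y.average()
    var cov = 0.0
    var varX = 0.0
    var varY = 0.0
    for (i in x.indices) {
        val dx = x[i] - meanX
        val dy = y[i] - meanY
        cov += dx * dy
        varX += dx * dx
        varY += dy * dy
    }
    return cov / sqrt(varX * varY)
}

// Boolean values mapped as the text describes: true -> 1, false -> 0.
fun List<Boolean>.asNumbers(): List<Double> = map { if (it) 1.0 else 0.0 }

// One row per column in `columns1`: the column name first,
// then one coefficient per column in `columns2`.
fun corrTable(
    columns1: Map<String, List<Double>>,
    columns2: Map<String, List<Double>>,
): List<List<Any>> =
    columns1.map { (name, values) ->
        buildList<Any> {
            add(name)
            columns2.values.forEach { other -> add(pearson(values, other)) }
        }
    }
```

With one column in `columns1` and two in `columns2`, `corrTable` returns a single row of three cells: the column name followed by two coefficients.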
+37
View File
@@ -0,0 +1,37 @@
[//]: # (title: count)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
Counts the number of rows.
<!---FUN count-->
```kotlin
df.count()
```
<!---END-->
Pass a [row condition](DataRow.md#row-conditions) to count only the number of rows that satisfy that condition:
<!---FUN countCondition-->
```kotlin
df.count { age > 15 }
```
<!---END-->
When `count` is used in [`groupBy`](groupBy.md#aggregation) or [`pivot`](pivot.md#aggregation) aggregations,
it counts rows for every data group:
<!---FUN countAggregation-->
```kotlin
df.groupBy { city }.count()
df.pivot { city }.count { age > 18 }
df.pivot { name.firstName }.groupBy { name.lastName }.count()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.countAggregation.html" width="100%"/>
<!---END-->
+33
View File
@@ -0,0 +1,33 @@
[//]: # (title: countDistinct)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
Returns the number of distinct combinations of values in the selected columns of a [`DataFrame`](DataFrame.md).
<!---FUN countDistinctColumns-->
<tabs>
<tab title="Properties">
```kotlin
df.countDistinct { age and name }
```
</tab>
<tab title="Strings">
```kotlin
df.countDistinct("age", "name")
```
</tab></tabs>
<!---END-->
When `columns` are not specified, returns the number of distinct rows in the [`DataFrame`](DataFrame.md).
<!---FUN countDistinct-->
```kotlin
df.countDistinct()
```
<!---END-->
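The difference between the two forms can be sketched in plain Kotlin without the library. The `Person` row type and the helpers `countDistinctBy` and `countDistinctRows` are hypothetical names used only for illustration.

```kotlin
// Hypothetical row type used only for this sketch.
data class Person(val name: String, val age: Int, val city: String)

// Counts distinct combinations of the selected values,
// analogous to `countDistinct { age and name }`.
fun countDistinctBy(rows: List<Person>, vararg selectors: (Person) -> Any?): Int =
    rows.map { row -> selectors.map { select -> select(row) } }.toSet().size

// Without a selection, whole rows are compared, analogous to `countDistinct()`.
fun countDistinctRows(rows: List<Person>): Int = rows.toSet().size
```

Two rows that differ only in `city` count once under `countDistinctBy(rows, { it.name }, { it.age })` but twice under `countDistinctRows(rows)`.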
+10
View File
@@ -0,0 +1,10 @@
[//]: # (title: Create)
<show-structure depth="3"/>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Create-->
There are several ways to create [`DataFrame`](DataFrame.md) objects from data that is already loaded into memory:
* [create columns with data](createColumn.md) and then [bundle them](createDataFrame.md) into a [`DataFrame`](DataFrame.md)
* create and initialize [`DataFrame`](DataFrame.md) directly from values using `vararg` variants of the [corresponding functions](createDataFrame.md).
* [convert Kotlin objects](createDataFrame.md#todataframe) into [`DataFrame`](DataFrame.md)
To learn how to read dataframes from files and URLs, go to the [next section](read.md).
+102
View File
@@ -0,0 +1,102 @@
[//]: # (title: Create DataColumn)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Create-->
This section describes ways to create a [`DataColumn`](DataColumn.md).
### columnOf
Returns a new column with the given elements.
The column [`type`](DataColumn.md#properties) is deduced from the compile-time type of the elements inside.
The column [`name`](DataColumn.md#properties) is taken from the name of the variable.
<!---FUN createValueByColumnOf-->
```kotlin
// Create ValueColumn with name 'student' and two elements of type String
val student by columnOf("Alice", "Bob")
```
<!---END-->
To assign column name explicitly, use the `named` infix function and replace `by` with `=`.
<!---FUN createColumnRenamed-->
```kotlin
val column = columnOf("Alice", "Bob") named "student"
```
<!---END-->
When column elements are columns themselves, it returns a [`ColumnGroup`](DataColumn.md#columngroup):
<!---FUN createColumnGroup-->
```kotlin
val firstName by columnOf("Alice", "Bob")
val lastName by columnOf("Cooper", "Marley")
// Create ColumnGroup with two nested columns
val fullName by columnOf(firstName, lastName)
```
<!---END-->
When column elements are [`DataFrame`](DataFrame.md) objects it returns a [`FrameColumn`](DataColumn.md#framecolumn):
<!---FUN createFrameColumn-->
```kotlin
val df1 = dataFrameOf("name", "age")("Alice", 20, "Bob", 25)
val df2 = dataFrameOf("name", "temp")("Charlie", 36.6)
// Create FrameColumn with two elements of type DataFrame
val frames by columnOf(df1, df2)
```
<!---END-->
### toColumn
Converts an `Iterable` of values into a column.
<!---FUN createValueByToColumn-->
```kotlin
listOf("Alice", "Bob").toColumn("name")
```
<!---END-->
To compute a column type at runtime by scanning through the actual values, enable the `Infer.Type` option.
To inspect values only for nullability, enable the `Infer.Nulls` option.
<!---FUN createValueColumnInferred-->
```kotlin
val values: List<Any?> = listOf(1, 2.5)
values.toColumn("data") // type: Any?
values.toColumn("data", Infer.Type) // type: Number
values.toColumn("data", Infer.Nulls) // type: Any
```
<!---END-->
### toColumnOf
Converts an [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/)
of values into a column of a given type:
<!---FUN createValueColumnOfType-->
```kotlin
val values: List<Any?> = listOf(1, 2.5)
values.toColumnOf<Number?>("data") // type: Number?
```
<!---END-->
+325
View File
@@ -0,0 +1,325 @@
[//]: # (title: Create DataFrame)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Create-->
This section describes ways to create a [`DataFrame`](DataFrame.md) instance.
### emptyDataFrame
Returns a [`DataFrame`](DataFrame.md) with no rows and no columns.
<!---FUN createEmptyDataFrame-->
```kotlin
val df = emptyDataFrame<Any>()
```
<!---END-->
### dataFrameOf
<!---FUN createDataFrameOfPairs-->
```kotlin
// DataFrame with 2 columns and 3 rows
val df = dataFrameOf(
"name" to listOf("Alice", "Bob", "Charlie"),
"age" to listOf(15, 20, 100),
)
```
<!---END-->
Create a DataFrame with nested columns in place:
<!---FUN createNestedDataFrameInplace-->
```kotlin
// DataFrame with 2 columns and 3 rows
val df = dataFrameOf(
"name" to columnOf(
"firstName" to columnOf("Alice", "Bob", "Charlie"),
"lastName" to columnOf("Cooper", "Dylan", "Daniels"),
),
"age" to columnOf(15, 20, 100),
)
```
<!---END-->
<!---FUN createDataFrameFromColumns-->
```kotlin
// DataFrame with 2 columns
val df = dataFrameOf(
"name" to columnOf("Alice", "Bob", "Charlie"),
"age" to columnOf(15, 20, 22)
)
```
<!---END-->
Returns a [`DataFrame`](DataFrame.md) with given column names and values.
<!---FUN createDataFrameOf-->
```kotlin
// DataFrame with 2 columns and 3 rows
val df = dataFrameOf("name", "age")(
"Alice", 15,
"Bob", 20,
"Charlie", 100,
)
```
<!---END-->
### toDataFrame
#### `DataFrame` from `Map<String, List<*>>`:
<!---FUN createDataFrameFromMap-->
```kotlin
val map = mapOf("name" to listOf("Alice", "Bob", "Charlie"), "age" to listOf(15, 20, 22))
// DataFrame with 2 columns
map.toDataFrame()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.createDataFrameFromMap.html" width="100%"/>
<!---END-->
#### `DataFrame` from random data:
Use `IntRange` to generate rows filled with random values:
<!---FUN createRandomDataFrame-->
```kotlin
val categories = listOf("Electronics", "Books", "Clothing")
// DataFrame with 4 columns and 7 rows
(0 until 7).toDataFrame {
"productId" from { "P${1000 + it}" }
"category" from { categories.random() }
"price" from { Random.nextDouble(10.0, 500.0) }
"inStock" from { Random.nextInt(0..100) }
}
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.createRandomDataFrame.html" width="100%"/>
<!---END-->
Generate DataFrame with nested ColumnGroup and FrameColumn:
<!---FUN createNestedRandomDataFrame-->
```kotlin
val categories = listOf("Electronics", "Books", "Clothing")
// DataFrame with 5 columns and 7 rows
(0 until 7).toDataFrame {
"productId" from { "P${1000 + it}" }
"category" from { categories.random() }
"price" from { Random.nextDouble(10.0, 500.0) }
// Column Group
"manufacturer" {
"country" from { listOf("USA", "China", "Germany", "Japan").random() }
"yearEstablished" from { Random.nextInt(1950..2020) }
}
// Frame Column
"reviews" from {
val reviewCount = Random.nextInt(0..7)
(0 until reviewCount).toDataFrame {
val ratings: DataColumn<Int> = expr { Random.nextInt(1..5) }
val comments = ratings.map {
when (it) {
5 -> listOf("Amazing quality!", "Best purchase ever!", "Highly recommend!", "Absolutely perfect!")
4 -> listOf("Great product!", "Very satisfied", "Good value for money", "Would buy again")
3 -> listOf("It's okay", "Does the job", "Average quality", "Neither good nor bad")
2 -> listOf("Could be better", "Disappointed", "Not what I expected", "Poor quality")
else -> listOf("Terrible!", "Not worth the price", "Complete waste of money", "Do not buy!")
}.random()
}
"author" from { "User${Random.nextInt(1000..10000)}" }
ratings into "rating"
comments into "comment"
}
}
}
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.createNestedRandomDataFrame.html" width="100%"/>
<!---END-->
Use `from` in combination with loops to generate DataFrame:
<!---FUN createDataFrameWithFill-->
```kotlin
// Multiplication table
(1..10).toDataFrame {
(1..10).forEach { x ->
"$x" from { x * it }
}
}
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.createDataFrameWithFill.html" width="100%"/>
<!---END-->
#### `DataFrame` from [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/) of [basic types](https://kotlinlang.org/docs/basic-types.html) (except arrays):
The return type of these overloads is a typed [`DataFrame`](DataFrame.md).
Its data schema defines the column that can be used right after the conversion for additional computations.
<!---FUN readDataFrameFromValues-->
```kotlin
val names = listOf("Alice", "Bob", "Charlie")
// TODO fix with plugin???
val df = names.toDataFrame() as DataFrame<ValueProperty<String>>
df.add("length") { value.length }
```
<!---END-->
#### [`DataFrame`](DataFrame.md) with one column from [`Iterable<T>`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/)
This is an easy way to create a [`DataFrame`](DataFrame.md) when you have a list of Files, URLs, or a structure
you want to extract data from.
In a notebook,
it can be convenient to start from the column of these values to see the number of rows, their `toString` in a table
and then iteratively add columns with the parts of the data you're interested in.
It could be a File's content, a specific section of an HTML document, some metadata, etc.
<!---FUN toDataFrameColumn-->
```kotlin
val files = listOf(File("data.csv"), File("data1.csv"))
val df = files.toDataFrame(columnName = "data")
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.toDataFrameColumn.html" width="100%"/>
<!---END-->
#### [`DataFrame`](DataFrame.md) from `List<List<T>>`:
This is useful for parsing text files. For example, the `.srt` subtitle format can be parsed like this:
<!---FUN toDataFrameLists-->
```kotlin
val lines = """
1
00:00:05,000 --> 00:00:07,500
This is the first subtitle.

2
00:00:08,000 --> 00:00:10,250
This is the second subtitle.
""".trimIndent().lines()
lines.chunked(4) { it.take(3) }.toDataFrame(header = listOf("n", "timestamp", "text"))
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.toDataFrameLists.html" width="100%"/>
<!---END-->
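The chunking step can be looked at on its own with the standard library only. The sketch below assumes that each subtitle block consists of three content lines followed by one blank separator line; `srtBlocks` is a hypothetical helper name.

```kotlin
// Splits subtitle text into blocks of [number, timestamp, text].
// `chunked(4)` groups three content lines with the blank separator line,
// and `take(3)` drops that separator from each block.
fun srtBlocks(text: String): List<List<String>> =
    text.trimIndent().lines().chunked(4) { it.take(3) }
```

Each inner list then becomes one row when the result is passed to `toDataFrame(header = ...)`. If the blocks are not separated by exactly one blank line, the grouping shifts and the rows no longer line up with the header.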
#### [`DataFrame`](DataFrame.md) from `Iterable<T>`:
<!---FUN readDataFrameFromObject-->
```kotlin
data class Person(val name: String, val age: Int)
val persons = listOf(Person("Alice", 15), Person("Bob", 20), Person("Charlie", 22))
val df = persons.toDataFrame()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.readDataFrameFromObject.html" width="100%"/>
<!---END-->
Scans object properties using reflection and creates a [ValueColumn](DataColumn.md#valuecolumn) for every property.
The scope of properties for scanning is defined at compile-time by the formal types of the objects in the [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/),
so the properties of implementation classes will not be scanned.
Specify the `maxDepth` parameter to perform deep object graph traversal
and convert nested objects into [ColumnGroups](DataColumn.md#columngroup) and [FrameColumns](DataColumn.md#framecolumn):
<!---FUN readDataFrameFromDeepObject-->
```kotlin
data class Name(val firstName: String, val lastName: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)
val students = listOf(
Student(Name("Alice", "Cooper"), 15, listOf(Score("math", 4), Score("biology", 3))),
Student(Name("Bob", "Marley"), 20, listOf(Score("music", 5))),
)
val df = students.toDataFrame(maxDepth = 1)
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.readDataFrameFromDeepObject.html" width="100%"/>
<!---END-->
For detailed control over object graph transformations, use the configuration DSL.
It allows you to exclude particular properties or classes from the object graph traversal,
compute additional columns, and configure column grouping.
<!---FUN readDataFrameFromDeepObjectWithExclude-->
```kotlin
val df = students.toDataFrame {
// add column
"year of birth" from { 2021 - it.age }
// scan all properties
properties(maxDepth = 1) {
exclude(Score::subject) // `subject` property will be skipped from object graph traversal
preserve<Name>() // `Name` objects will be stored as-is without transformation into DataFrame
}
// add column group
"summary" {
"max score" from { it.scores.maxOf { it.value } }
"min score" from { it.scores.minOf { it.value } }
}
}
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.readDataFrameFromDeepObjectWithExclude.html" width="100%"/>
<!---END-->
### DynamicDataFrameBuilder
The previously mentioned [`DataFrame`](DataFrame.md) constructors throw an exception when column names are duplicated.
When implementing a custom operation that involves multiple [`DataFrame`](DataFrame.md) objects or computed columns,
or when parsing third-party data,
it might be desirable to disambiguate column names instead of throwing an exception.
<!---FUN duplicatedColumns-->
```kotlin
fun peek(vararg dataframes: AnyFrame): AnyFrame {
val builder = DynamicDataFrameBuilder()
for (df in dataframes) {
df.columns().firstOrNull()?.let { builder.add(it) }
}
return builder.toDataFrame()
}
val col by columnOf(1, 2, 3)
peek(dataFrameOf(col), dataFrameOf(col))
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Create.duplicatedColumns.html" width="100%"/>
<!---END-->
+32
View File
@@ -0,0 +1,32 @@
[//]: # (title: cumSum)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
Computes the cumulative sum of values in the selected columns.
```text
cumSum(skipNA = true) [ { columns } ]
```
Returns a [`DataFrame`](DataFrame.md) or [`DataColumn`](DataColumn.md) containing the cumulative sum.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
**Parameters:**
* `skipNA` — when `true`, ignores [`NA` values](nanAndNa.md#na) (`null` or `NaN`).
When `false`, all values after first `NA` will be `NaN` (for `Double` and `Float` columns) or `null` (for integer columns).
**Available for:**
* [`DataFrame`](DataFrame.md)
* [`DataColumn`](DataColumn.md)
* [`GroupBy DataFrame`](groupBy.md#transformation) — cumulative sum per every data group
<!---FUN cumSum-->
```kotlin
df.cumSum { weight }
df.weight.cumSum()
df.groupBy { city }.cumSum { weight }.concat()
```
<!---END-->
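The effect of `skipNA` can be sketched in plain Kotlin for a nullable `Double` sequence. This is not the library's code: `cumSum` here is a hypothetical stand-alone function, and it assumes that with `skipNA = true` an `NA` element is passed through unchanged while the running total continues.

```kotlin
// Running total over nullable doubles, where NA means `null` or `NaN`.
fun cumSum(values: List<Double?>, skipNA: Boolean = true): List<Double?> {
    var sum = 0.0
    var poisoned = false // becomes true once an NA is met with skipNA = false
    return values.map { v ->
        if (poisoned) return@map Double.NaN
        if (v == null || v.isNaN()) {
            // Assumption: a skipped NA stays in place and does not affect the sum.
            if (skipNA) return@map v
            poisoned = true
            return@map Double.NaN
        }
        sum += v
        sum
    }
}
```

Under these assumptions, `cumSum(listOf(1.0, null, 2.0))` continues the total past the `null`, while `cumSum(listOf(1.0, null, 2.0), skipNA = false)` yields `NaN` from the first `NA` onward.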
@@ -0,0 +1,3 @@
[//]: # (title: Cumulative statistics)
* [cumSum](cumSum.md) — cumulative sum (running total)
+184
View File
@@ -0,0 +1,184 @@
# @DataSchema Declarations
`DataSchema` can be used as an argument for [cast](cast.md) and [convertTo](convertTo.md) functions.
It provides typed data access for raw dataframes you read from I/O sources and serves as a starting point for the compiler plugin to derive schema changes.
Example 1:
```kotlin
@DataSchema
interface Person {
val firstName: String
}
```
Generated code:
```kotlin
val DataRow<Person>.firstName: String get() = this["firstName"] as String
val ColumnsScope<Person>.firstName: DataColumn<String> get() = this["firstName"] as DataColumn<String>
```
Example 2:
```kotlin
@DataSchema
interface Person {
@ColumnName("first_name")
val firstName: String
}
```
`ColumnName` annotation changes how generated extension properties pull the data from a dataframe:
Generated code:
```kotlin
val DataRow<Person>.firstName: String get() = this["first_name"] as String
val ColumnsScope<Person>.firstName: DataColumn<String> get() = this["first_name"] as DataColumn<String>
```
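The accessor pattern can be tried without the library. In the sketch below, `Row` is a simplified, hypothetical stand-in for the library's row type and `Person` is an empty marker interface; only the shape of the extension property mirrors the generated code.

```kotlin
// Simplified stand-in for a typed row; not the library's API.
class Row<T>(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? = values[column]
}

// Marker type that plays the role of the schema interface.
interface Person

// The Kotlin property is called `firstName`,
// but the lookup uses the column name "first_name".
val Row<Person>.firstName: String
    get() = this["first_name"] as String
```

Because extension properties cannot have initializers, the accessor is written as a getter that performs the lookup and the cast on every access.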
Generated extension properties are used to access values in `DataRow` and to access columns in `ColumnsScope`, which is either `DataFrame` or `ColumnSelectionDsl`.
`DataRow`:
```kotlin
val row = df[0]
row.firstName
```
```kotlin
df.filter { firstName.startsWith("L") }
df.add("newCol") { firstName }
```
`DataFrame`:
```kotlin
val col = df.firstName
val value = col[0]
```
`ColumnSelectionDsl`:
```kotlin
df.convert { firstName }.with { it.uppercase() }
df.select { firstName }
df.rename { firstName }.into("name")
```
## Data Class
`DataSchema` can be a top-level data class, in which case two additional APIs become available:
```kotlin
@DataSchema
class WikiData(val name: String, val paradigms: List<String>)
```
1. `dataFrameOf` overload that creates a dataframe instance from objects
```kotlin
val languages = dataFrameOf(
WikiData("Kotlin", listOf("object-oriented", "functional", "imperative")),
WikiData("Haskell", listOf("Purely functional")),
WikiData("C", listOf("imperative")),
WikiData("Pascal", listOf("imperative")),
WikiData("Idris", listOf("functional")),
)
```
2. `append` overload that takes an object and appends it as a row
```kotlin
val ocaml = WikiData("OCaml", listOf("functional", "imperative", "modular", "object-oriented"))
val languages1 = languages.append(ocaml)
```
## Schemas for nested structures
A nested structure can be, for example, JSON that you read from a file.
```json
[
{
"id": "1",
"participants": [
{
"name": {
"firstName": "Alice",
"lastName": "Cooper"
},
"age": 15,
"city": "London"
},
{
"name": {
"firstName": "Bob",
"lastName": "Dylan"
},
"age": 45,
"city": "Dubai"
}
]
},
{
"id": "2",
"participants": [
{
"name": {
"firstName": "Charlie",
"lastName": "Daniels"
},
"age": 20,
"city": "Moscow"
},
{
"name": {
"firstName": "Charlie",
"lastName": "Chaplin"
},
"age": 40,
"city": "Milan"
}
]
}
]
```
You get a dataframe with this schema:
```text
id: String
participants: *
name:
firstName: String
lastName: String
age: Int
city: String
```
- `participants` is `FrameColumn`
- `name` is `ColumnGroup`
Here's the data schema that matches it:
```kotlin
@DataSchema
data class Group(
val id: String,
val participants: List<Person>
)
@DataSchema
data class Person(
val name: Name,
val age: Int,
val city: String?
)
@DataSchema
data class Name(
val firstName: String,
val lastName: String,
)
```
```kotlin
val url = "https://raw.githubusercontent.com/Kotlin/dataframe/refs/heads/master/data/participants.json"
val df = DataFrame.readJson(url).cast<Group>()
```
@@ -0,0 +1,53 @@
# Apache Arrow
<web-summary>
Read and write Apache Arrow files in Kotlin — efficient binary format support with Kotlin DataFrame.
</web-summary>
<card-summary>
Work with Arrow files in Kotlin for fast I/O — supports both streaming and random access formats.
</card-summary>
<link-summary>
Kotlin DataFrame provides full support for reading and writing Apache Arrow files in high-performance workflows.
</link-summary>
Kotlin DataFrame supports reading from and writing to Apache Arrow files.
Requires the [`dataframe-arrow` module](Modules.md#dataframe-arrow), which is included by
default in the general [`dataframe`](Modules.md#dataframe-general) artifact
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
> Make sure to follow the
> [Apache Arrow Java compatibility guide](https://arrow.apache.org/docs/java/install.html#java-compatibility)
> when using Java 9+.
> {style="warning"}
> Structured (nested) Arrow types such as Struct are not supported yet in Kotlin DataFrame.
> See the issue: [Add inner / Struct type support in Arrow](https://github.com/Kotlin/dataframe/issues/536)
> {style="warning"}
## Read
[`DataFrame`](DataFrame.md) supports both the
[Arrow interprocess streaming format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-streaming-format)
and the [Arrow random access format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files).
You can read a `DataFrame` from Apache Arrow data sources
(via a file path, URL, or stream) using the [`readArrowFeather()`](read.md#read-apache-arrow-formats) method:
```kotlin
val df = DataFrame.readArrowFeather("example.feather")
```
```kotlin
val df = DataFrame.readArrowFeather("https://kotlin.github.io/dataframe/resources/example.feather")
```
## Write
A [`DataFrame`](DataFrame.md) can be written to Arrow format using the interprocess streaming or random access format.
Output targets include `WritableByteChannel`, `OutputStream`, `File`, or `ByteArray`.
See [](write.md#writing-to-apache-arrow-formats) for more details.
@@ -0,0 +1,52 @@
# CSV / TSV
<web-summary>
Work with CSV and TSV files — read, analyze, and export tabular data using Kotlin DataFrame.
</web-summary>
<card-summary>
Seamlessly load and write CSV or TSV files in Kotlin — perfect for common tabular data workflows.
</card-summary>
<link-summary>
Kotlin DataFrame support for reading and writing CSV and TSV files with simple, type-safe APIs.
</link-summary>
Kotlin DataFrame supports reading from and writing to CSV and TSV files.
Requires the [`dataframe-csv` module](Modules.md#dataframe-csv),
which is included by default in the general [`dataframe`](Modules.md#dataframe-general)
artifact and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
## Read
You can read a [`DataFrame`](DataFrame.md) from a CSV or TSV file (via a file path or URL)
using the [`readCsv()`](read.md#read-from-csv) or `readTsv()` methods:
```kotlin
val df = DataFrame.readCsv("example.csv")
```
```kotlin
val df = DataFrame.readCsv("https://kotlin.github.io/dataframe/resources/example.csv")
```
## Write
You can write a [`DataFrame`](DataFrame.md) to a CSV file using the [`writeCsv()`](write.md#writing-to-csv) method:
```kotlin
df.writeCsv("example.csv")
```
## Deephaven CSV
The [`dataframe-csv`](Modules.md#dataframe-csv) module uses the high-performance
[Deephaven CSV library](https://github.com/deephaven/deephaven-csv) under the hood
for fast and efficient CSV reading and writing.
If you're working with large CSV files, you can adjust the parser manually
by [configuring Deephaven-specific parameters](https://kotlin.github.io/dataframe/read.html#unlocking-deephaven-csv-features)
to get the best performance for your use case.
@@ -0,0 +1,35 @@
# Data Sources
<web-summary>
Discover all the data formats Kotlin DataFrame can work with — including JSON, CSV, Excel, SQL databases, and more.
</web-summary>
<card-summary>
Explore supported data sources in Kotlin DataFrame and how to integrate them into your data processing workflow.
</card-summary>
<link-summary>
Explore supported data sources in Kotlin DataFrame and how to integrate them into your data processing workflow.
</link-summary>
One of the key aspects of working with data is being able to read from and write to various data sources.
Kotlin DataFrame provides seamless support for a wide range of formats to integrate into your data workflows.
Below you'll find a list of supported sources along with instructions on how to read and write data using them.
- [JSON](JSON.md)
- [OpenAPI](OpenAPI.md)
- [CSV / TSV](CSV-TSV.md)
- [Excel](Excel.md)
- [Apache Arrow](ApacheArrow.md)
- [Parquet](Parquet.md)
- [SQL](SQL.md):
- [PostgreSQL](PostgreSQL.md)
- [MySQL](MySQL.md)
- [Microsoft SQL Server](Microsoft-SQL-Server.md)
- [SQLite](SQLite.md)
- [H2](H2.md)
- [MariaDB](MariaDB.md)
- [DuckDB](DuckDB.md)
- [Custom SQL Source](Custom-SQL-Source.md)
- [Custom integrations with unsupported data sources](Integrations.md)
+42
View File
@@ -0,0 +1,42 @@
# Excel
<web-summary>
Read from and write to Excel files in `.xls` or `.xlsx` formats with Kotlin DataFrame for seamless spreadsheet integration.
</web-summary>
<card-summary>
Kotlin DataFrame makes it easy to load and save data from Excel files — perfect for working with spreadsheet-based workflows.
</card-summary>
<link-summary>
Learn how to read and write Excel files using Kotlin DataFrame with just a single line of code.
</link-summary>
Kotlin DataFrame supports reading from and writing to Excel files in both `.xls` and `.xlsx` formats.
Requires the [`dataframe-excel` module](Modules.md#dataframe-excel),
which is included by default in the general [`dataframe`](Modules.md#dataframe-general)
artifact and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
## Read
You can read a [`DataFrame`](DataFrame.md) from an Excel file (via a file path or URL)
using the [`readExcel()`](read.md#read-from-excel) method:
```kotlin
val df = DataFrame.readExcel("example.xlsx")
```
```kotlin
val df = DataFrame.readExcel("https://kotlin.github.io/dataframe/resources/example.xlsx")
```
## Write
You can write a [`DataFrame`](DataFrame.md) to an Excel file using the
[`writeExcel()`](write.html#write-to-excel-spreadsheet) method:
```kotlin
df.writeExcel("example.xlsx")
```
@@ -0,0 +1,26 @@
# Custom integrations with unsupported data sources
<web-summary>
Examples of how to integrate Kotlin DataFrame with other data frameworks like Exposed, Spark, or Multik.
</web-summary>
<card-summary>
Integrate Kotlin DataFrame with unsupported sources — see practical examples with Exposed, Spark, and more.
</card-summary>
<link-summary>
How to connect Kotlin DataFrame with data sources like Exposed, Apache Spark, or Multik.
</link-summary>
Some data sources are not officially supported in the Kotlin DataFrame API yet —
but you can still integrate them easily using custom code.
Below is a list of example integrations with other data frameworks.
These examples demonstrate how to bridge Kotlin DataFrame with external libraries or APIs.
- [Kotlin Exposed](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed)
- [Apache Spark (with/without Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/spark)
- [Multik](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik)
You can use these examples as templates to create your own integrations
with any data processing library that produces structured tabular data.
+47
View File
@@ -0,0 +1,47 @@
# JSON
<web-summary>
Support for working with JSON data — load, explore, and save structured JSON using Kotlin DataFrame.
</web-summary>
<card-summary>
Easily handle JSON data in Kotlin — read from files or URLs, and export your data back to JSON format.
</card-summary>
<link-summary>
Kotlin DataFrame support for reading and writing JSON files in a structured and type-safe way.
</link-summary>
Kotlin DataFrame supports reading from and writing to JSON files.
Requires the [`dataframe-json` module](Modules.md#dataframe-json),
which is included by default in the general [`dataframe`](Modules.md#dataframe-general)
artifact and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe)
for Kotlin Notebook.
> Kotlin DataFrame is suitable only for working with table-like structured JSON —
> a list of objects where each object represents a row and all objects share the same structure.
>
> Experimental support for [OpenAPI JSON schemas](OpenAPI.md) is also available.
> {style="note"}
## Read
You can read a [`DataFrame`](DataFrame.md) or [`DataRow`](DataRow.md)
from a JSON file (via a file path or URL) using the [`readJson()`](read.md#read-from-json) method:
```kotlin
val df = DataFrame.readJson("example.json")
```
```kotlin
val df = DataFrame.readJson("https://kotlin.github.io/dataframe/resources/example.json")
```
## Write
You can write a [`DataFrame`](DataFrame.md) to a JSON file using the [`writeJson()`](write.md#writing-to-json) method:
```kotlin
df.writeJson("example.json")
```
@@ -0,0 +1,34 @@
# OpenAPI
<web-summary>
Work with JSON data based on OpenAPI 3.0 schemas using Kotlin DataFrame — helpful for consuming structured API responses.
</web-summary>
<card-summary>
Use Kotlin DataFrame to read and write data that conforms to OpenAPI specifications. Great for API-driven data workflows.
</card-summary>
<link-summary>
Learn how to use OpenAPI 3.0 JSON schemas with Kotlin DataFrame to load and manipulate API-defined data.
</link-summary>
> **Experimental**: Support for OpenAPI 3.0.0 schemas is currently experimental
> and may change or be removed in future releases.
> {style="warning"}
Kotlin DataFrame provides support for reading and writing JSON data
that conforms to [OpenAPI 3.0 specifications](https://www.openapis.org).
This feature is useful when working with APIs that expose structured data defined via OpenAPI schemas.
Requires the [`dataframe-openapi` module](Modules.md#dataframe-openapi),
which **is not included** in the general [`dataframe`](Modules.md#dataframe-general) artifact.
To enable it in Kotlin Notebook, use:
```kotlin
%use dataframe(enableExperimentalOpenApi=true)
```
See [the OpenAPI guide notebook](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/json/KeyValueAndOpenApi.ipynb)
for details on how to work with OpenAPI-based data.
@@ -0,0 +1,159 @@
# Parquet
<web-summary>
Read Parquet files via Apache Arrow in Kotlin DataFrame: high-performance columnar storage for analytics.
</web-summary>
<card-summary>
Use Kotlin DataFrame to read Parquet datasets using Apache Arrow for fast, typed, columnar I/O.
</card-summary>
<link-summary>
Kotlin DataFrame can read Parquet files through Apache Arrow's Dataset API. Learn how and when to use it.
</link-summary>
Kotlin DataFrame supports reading [Apache Parquet](https://parquet.apache.org/) files through the Apache Arrow integration.
Requires the [`dataframe-arrow` module](Modules.md#dataframe-arrow), which is included by default in the general [`dataframe`](Modules.md#dataframe-general) artifact and in `%use dataframe` for Kotlin Notebook.
> We currently only support READING Parquet via Apache Arrow; writing Parquet is not supported in Kotlin DataFrame.
> {style="note"}
> Apache Arrow is not supported on Android, so reading Parquet files on Android is not available.
> {style="warning"}
> Structured (nested) Arrow types such as Struct are not supported yet in Kotlin DataFrame.
> See the issue: [Add inner / Struct type support in Arrow](https://github.com/Kotlin/dataframe/issues/536)
> {style="warning"}
## Reading Parquet Files
Kotlin DataFrame provides four `readParquet()` methods that can read from different source types.
All overloads accept optional `nullability` inference settings and `batchSize` for Arrow scanning.
```kotlin
// 1) URLs
public fun DataFrame.Companion.readParquet(
vararg urls: URL,
nullability: NullabilityOptions = NullabilityOptions.Infer,
batchSize: Long = ARROW_PARQUET_DEFAULT_BATCH_SIZE,
): AnyFrame
// 2) Strings (interpreted as file paths or URLs, e.g., "data/file.parquet", "file://", or "http(s)://")
public fun DataFrame.Companion.readParquet(
vararg strUrls: String,
nullability: NullabilityOptions = NullabilityOptions.Infer,
batchSize: Long = ARROW_PARQUET_DEFAULT_BATCH_SIZE,
): AnyFrame
// 3) Paths
public fun DataFrame.Companion.readParquet(
vararg paths: Path,
nullability: NullabilityOptions = NullabilityOptions.Infer,
batchSize: Long = ARROW_PARQUET_DEFAULT_BATCH_SIZE,
): AnyFrame
// 4) Files
public fun DataFrame.Companion.readParquet(
vararg files: File,
nullability: NullabilityOptions = NullabilityOptions.Infer,
batchSize: Long = ARROW_PARQUET_DEFAULT_BATCH_SIZE,
): AnyFrame
```
These overloads are defined in the `dataframe-arrow` module and internally use `FileFormat.PARQUET` from Apache Arrow's
Dataset API to scan the data and materialize it as a Kotlin `DataFrame`.
### Examples
```kotlin
// Read from file paths (as strings)
val df = DataFrame.readParquet("data/sales.parquet")
```
<!---FUN readParquetFilePath-->
```kotlin
// Read from Path objects
val path = Paths.get("data/sales.parquet")
val df = DataFrame.readParquet(path)
```
<!---END-->
<!---FUN readParquetURL-->
```kotlin
// Read from URLs
val df = DataFrame.readParquet(url)
```
<!---END-->
<!---FUN readParquetFile-->
```kotlin
// Read from File objects
val file = File("data/sales.parquet")
val df = DataFrame.readParquet(file)
```
<!---END-->
<!---FUN readParquetFileWithParameters-->
```kotlin
// Read from a File with explicit nullability and batch size settings
val file = File("data/sales.parquet")
val df = DataFrame.readParquet(
file,
nullability = NullabilityOptions.Infer,
batchSize = 64L * 1024
)
```
<!---END-->
If you want to see a complete, realistic data-engineering example using Spark and Parquet with Kotlin DataFrame,
check out the [example project](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/spark-parquet-dataframe).
### Multiple Files
It's possible to read multiple Parquet files:
<!---FUN readMultipleParquetFiles-->
```kotlin
val file = File("data/sales.parquet")
val file1 = File("data/sales1.parquet")
val file2 = File("data/sales2.parquet")
val df = DataFrame.readParquet(file, file1, file2)
```
<!---END-->
**Requirements:**
- All files must have compatible schemas
- Files are vertically concatenated (union of rows)
- Column types must match exactly
- Missing columns in some files will result in null values
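As a rough sketch of the multi-file case, the snippet below collects every `.parquet` file in a directory and passes them to the `File` overload listed above. The helper names `parquetFilesIn` and `readAllParquet`, the sorting by file name, and the import path of `readParquet` (assumed to be the `org.jetbrains.kotlinx.dataframe.io` package used by the other readers) are assumptions made for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readParquet
import java.io.File

// Hypothetical helper: lists the Parquet files of a directory in a stable order.
fun parquetFilesIn(dir: File): List<File> =
    dir.listFiles { file -> file.isFile && file.extension == "parquet" }
        ?.sortedBy { it.name }
        .orEmpty()

// Hypothetical helper: reads all Parquet files of a directory into one DataFrame.
fun readAllParquet(dir: File): AnyFrame {
    val files = parquetFilesIn(dir)
    require(files.isNotEmpty()) { "No Parquet files found in $dir" }
    // The spread operator passes the list to the vararg `files` parameter.
    return DataFrame.readParquet(*files.toTypedArray())
}
```

Calling `readAllParquet(File("data"))` would then concatenate the rows of all matching files, subject to the schema requirements above.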
### Performance tips
- **Column selection**: Because the `readParquet` method reads all columns, use DataFrame operations like `select()` immediately after reading to reduce memory usage in later operations
- **Predicate pushdown**: Currently not supported—filtering happens after data is loaded into memory
- Use Arrow-compatible JVMs as documented in
[Apache Arrow Java compatibility](https://arrow.apache.org/docs/java/install.html#java-compatibility).
- Adjust `batchSize` if you read huge files and need to tune throughput vs. memory.
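The column-selection and `batchSize` tips could be combined roughly as follows. This is only a sketch: the helper `readSalesColumns` and the column names `region` and `amount` are made up, and the batch size is the same illustrative value used in the earlier example rather than a recommendation.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.select
import org.jetbrains.kotlinx.dataframe.io.readParquet
import java.io.File

// Hypothetical helper: reads a Parquet file and keeps only the columns needed downstream.
// "region" and "amount" are placeholder column names.
fun readSalesColumns(file: File): AnyFrame =
    DataFrame.readParquet(file, batchSize = 64L * 1024)
        .select("region", "amount")
```

Note that the whole file is still read first; selecting early only reduces what later operations have to hold and copy.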
### See also
- [](ApacheArrow.md) — reading/writing Arrow IPC formats.
- [Parquet official site](https://parquet.apache.org/).
- Example: [Spark + Parquet + Kotlin DataFrame](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/spark-parquet-dataframe)
- [](Data-Sources.md) — Overview of all supported formats
@@ -0,0 +1,22 @@
# Custom SQL Source
<web-summary>
Connect Kotlin DataFrame to any JDBC-compatible database using a custom SQL source configuration.
</web-summary>
<card-summary>
Easily integrate unsupported SQL databases in Kotlin DataFrame using a flexible custom source setup.
</card-summary>
<link-summary>
Define a custom SQL source in Kotlin DataFrame to work with any JDBC-based database.
</link-summary>
If your SQL database is not officially supported, you can either
[create an issue](https://github.com/Kotlin/dataframe/issues)
or define a simple, configurable custom SQL source.
See the [How to Extend DataFrame Library for Custom SQL Database Support guide](readSqlFromCustomDatabase.md)
for detailed instructions and an example with HSQLDB.
@@ -0,0 +1,107 @@
# DuckDB
<web-summary>
Work with DuckDB databases in Kotlin — read tables and queries into DataFrames using JDBC.
</web-summary>
<card-summary>
Use Kotlin DataFrame to query and transform DuckDB data directly via JDBC.
</card-summary>
<link-summary>
Read DuckDB data into Kotlin DataFrame with JDBC support.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.io.DuckDb-->
Kotlin DataFrame supports reading from [DuckDB](https://duckdb.org/) databases using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official DuckDB JDBC driver](https://duckdb.org/docs/stable/clients/java):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
implementation("org.duckdb:duckdb_jdbc:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
dependencies("org.duckdb:duckdb_jdbc:$version")
}
```
</tab>
</tabs>
The latest driver version on Maven Central can be found
[here](https://mvnrepository.com/artifact/org.duckdb/duckdb_jdbc).
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([
`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
<!---FUN readSqlTable-->
```kotlin
val url = "jdbc:duckdb:/testDatabase"
val username = "duckdb"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
<!---END-->
### Extensions
DuckDB has a special trick up its sleeve: it has support
for [extensions](https://duckdb.org/docs/stable/extensions/overview).
These can be installed, loaded, and used to connect to a different database via DuckDB.
See [Core Extensions](https://duckdb.org/docs/stable/core_extensions/overview) for a list of available extensions.
For example, let's load a dataframe
from [Apache Iceberg via DuckDB](https://duckdb.org/docs/stable/core_extensions/iceberg/overview.html),
as Iceberg is an unsupported data source in DataFrame at the moment:
<!---FUN readIcebergExtension-->
```kotlin
// Creating an in-memory DuckDB database
val connection = DriverManager.getConnection("jdbc:duckdb:")
val df = connection.use { connection ->
// install and load Iceberg
connection.createStatement().execute("INSTALL iceberg; LOAD iceberg;")
// query a table from Iceberg using a specific SQL query
DataFrame.readSqlQuery(
connection = connection,
sqlQuery = "SELECT * FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);",
)
}
```
<!---END-->
As you can see, the process is very similar to reading from any other JDBC database,
just without needing explicit DataFrame support.
@@ -0,0 +1,98 @@
# H2
<web-summary>
Use Kotlin DataFrame to query H2 databases via JDBC — read tables, run SQL queries, or fetch result sets directly.
</web-summary>
<card-summary>
Connect to H2 databases in Kotlin DataFrame and load data using simple JDBC configurations.
</card-summary>
<link-summary>
Read from H2 databases in Kotlin DataFrame using built-in SQL reading methods.
</link-summary>
Kotlin DataFrame supports reading from an [H2](https://www.h2database.com/html/main.html) database using JDBC.
Requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official H2 JDBC driver](https://www.h2database.com/html/main.html):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
implementation("com.h2database:h2:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
dependencies("com.h2database:h2:$version")
}
```
</tab>
</tabs>
The latest driver version on Maven Central can be found
[here](https://mvnrepository.com/artifact/com.h2database/h2).
## Read
A [`DataFrame`](DataFrame.md) can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
### H2 Compatibility Modes
When working with an H2 database, the library automatically detects the compatibility mode from the connection.
If no `MODE` is specified in the JDBC URL, the default `Regular` mode is used.
H2 supports the following compatibility modes: `MySQL`, `PostgreSQL`, `MSSQLServer`, `MariaDB`, and `Regular`.
```kotlin
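import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable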
// Basic H2 connection (uses Regular mode by default)
val url = "jdbc:h2:mem:testDatabase"
val username = "sa"
val password = ""
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
```kotlin
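import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable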
// H2 with PostgreSQL compatibility mode
val postgresUrl = "jdbc:h2:mem:testDatabase;MODE=PostgreSQL"
val username = "sa"
val password = ""
val postgresConfig = DbConnectionConfig(postgresUrl, username, password)
val tableName = "Customer"
val dfPostgres = DataFrame.readSqlTable(postgresConfig, tableName)
```
@@ -0,0 +1,72 @@
# MariaDB
<web-summary>
Access MariaDB databases using Kotlin DataFrame and JDBC — fetch data from tables or custom SQL queries with ease.
</web-summary>
<card-summary>
Seamlessly integrate MariaDB with Kotlin DataFrame — load data using JDBC and analyze it in Kotlin.
</card-summary>
<link-summary>
Read data from MariaDB into Kotlin DataFrame using standard JDBC configurations.
</link-summary>
Kotlin DataFrame supports reading from a [MariaDB](https://mariadb.org) database using JDBC.
Requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official MariaDB JDBC driver](https://mariadb.com/docs/connectors/mariadb-connector-j):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
implementation("org.mariadb.jdbc:mariadb-java-client:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
dependencies("org.mariadb.jdbc:mariadb-java-client:$version")
}
```
</tab>
</tabs>
The latest driver version on Maven Central can be found
[here](https://mvnrepository.com/artifact/org.mariadb.jdbc/mariadb-java-client).
## Read
A [`DataFrame`](DataFrame.md) can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
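import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable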
val url = "jdbc:mariadb://localhost:3306/testDatabase"
val username = "root"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
@@ -0,0 +1,74 @@
# Microsoft SQL Server (MS SQL)
<web-summary>
Connect to Microsoft SQL Server using Kotlin DataFrame and JDBC — load structured data directly into your Kotlin workflow.
</web-summary>
<card-summary>
Use Kotlin DataFrame to read from Microsoft SQL Server — run queries or load entire tables via JDBC.
</card-summary>
<link-summary>
Fetch data from Microsoft SQL Server into Kotlin DataFrame using JDBC configuration.
</link-summary>
Kotlin DataFrame supports reading from a [Microsoft SQL Server (MS SQL)](https://www.microsoft.com/en-us/sql-server)
database using JDBC.
Requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need
[the official MS SQL JDBC driver](https://learn.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver17):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
implementation("com.microsoft.sqlserver:mssql-jdbc:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
dependencies("com.microsoft.sqlserver:mssql-jdbc:$version")
}
```
</tab>
</tabs>
The latest driver version on Maven Central can be found
[here](https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc).
## Read
A [`DataFrame`](DataFrame.md) can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
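import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable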
val url = "jdbc:sqlserver://localhost:1433;databaseName=testDatabase"
val username = "sa"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
@@ -0,0 +1,72 @@
# MySQL
<web-summary>
Connect to MySQL databases and load data into Kotlin DataFrame using JDBC — query, analyze, and transform SQL data in Kotlin.
</web-summary>
<card-summary>
Use Kotlin DataFrame with MySQL — easily read tables and queries over JDBC into powerful data structures.
</card-summary>
<link-summary>
Read data from MySQL into Kotlin DataFrame using JDBC configuration.
</link-summary>
Kotlin DataFrame supports reading from a [MySQL](https://www.mysql.com) database using JDBC.
Requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official MySQL JDBC driver](https://dev.mysql.com/downloads/connector/j/):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
implementation("com.mysql:mysql-connector-j:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
dependencies("com.mysql:mysql-connector-j:$version")
}
```
</tab>
</tabs>
The latest driver version on Maven Central can be found
[here](https://mvnrepository.com/artifact/com.mysql/mysql-connector-j).
## Read
A [`DataFrame`](DataFrame.md) can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
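import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable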
val url = "jdbc:mysql://localhost:3306/testDatabase"
val username = "root"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
@@ -0,0 +1,71 @@
# PostgreSQL
<web-summary>
Work with PostgreSQL databases in Kotlin — read tables and queries into DataFrames using JDBC.
</web-summary>
<card-summary>
Use Kotlin DataFrame to query and transform PostgreSQL data directly via JDBC.
</card-summary>
<link-summary>
Read PostgreSQL data into Kotlin DataFrame with JDBC support.
</link-summary>
Kotlin DataFrame supports reading from a [PostgreSQL](https://www.postgresql.org) database using JDBC.
Requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official PostgreSQL JDBC driver](https://jdbc.postgresql.org):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
implementation("org.postgresql:postgresql:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
dependencies("org.postgresql:postgresql:$version")
}
```
</tab>
</tabs>
The latest driver version on Maven Central can be found
[here](https://mvnrepository.com/artifact/org.postgresql/postgresql).
## Read
A [`DataFrame`](DataFrame.md) can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
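import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable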
val url = "jdbc:postgresql://localhost:5432/testDatabase"
val username = "postgres"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
@@ -0,0 +1,46 @@
# SQL
<web-summary>
Work with SQL databases in Kotlin using DataFrame and JDBC — read tables and queries with ease.
</web-summary>
<card-summary>
Connect to PostgreSQL, MySQL, SQLite, and other SQL databases using Kotlin DataFrame's JDBC support.
</card-summary>
<link-summary>
Load data from SQL databases into Kotlin DataFrame using JDBC and built-in reading functions.
</link-summary>
Kotlin DataFrame supports reading from SQL databases using JDBC.
Requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need a JDBC driver for the specific database.
## Supported databases
Kotlin DataFrame provides out-of-the-box support for the most common SQL databases:
- [PostgreSQL](PostgreSQL.md)
- [MySQL](MySQL.md)
- [Microsoft SQL Server](Microsoft-SQL-Server.md)
- [SQLite](SQLite.md)
- [H2](H2.md)
- [MariaDB](MariaDB.md)
- [DuckDB](DuckDB.md)
You can also define a [Custom SQL Source](Custom-SQL-Source.md)
to work with any other JDBC-compatible database.
## Read
A [`DataFrame`](DataFrame.md) can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame`
([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
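A minimal sketch of the first two reading styles is shown below. It assumes a PostgreSQL database with the same placeholder connection settings used on the database-specific pages; the `Customer` table, its `name` and `age` columns, and the helper function names are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

// Placeholder connection settings; replace them with your own.
val dbConfig = DbConnectionConfig(
    "jdbc:postgresql://localhost:5432/testDatabase",
    "postgres",
    "password",
)

// Reads a whole table by name.
fun readCustomers(): AnyFrame =
    DataFrame.readSqlTable(dbConfig, "Customer")

// Reads the result of a user-defined query.
fun readAdultCustomers(): AnyFrame =
    DataFrame.readSqlQuery(dbConfig, "SELECT name, age FROM Customer WHERE age >= 18")
```

Pushing the filter into the SQL query, as in `readAdultCustomers`, lets the database do the filtering before any rows reach the JVM.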
@@ -0,0 +1,70 @@
# SQLite
<web-summary>
Use Kotlin DataFrame to read data from SQLite databases with minimal setup via JDBC.
</web-summary>
<card-summary>
Query and transform SQLite data directly in Kotlin using DataFrame and JDBC.
</card-summary>
<link-summary>
Read SQLite tables into Kotlin DataFrame using the built-in JDBC integration.
</link-summary>
Kotlin DataFrame supports reading from a [SQLite](https://www.sqlite.org) database using JDBC.
Requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need the [SQLite JDBC driver](https://github.com/xerial/sqlite-jdbc):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
implementation("org.xerial:sqlite-jdbc:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
dependencies("org.xerial:sqlite-jdbc:$version")
}
```
</tab>
</tabs>
The latest driver version on Maven Central can be found
[here](https://mvnrepository.com/artifact/org.xerial/sqlite-jdbc).
## Read
A [`DataFrame`](DataFrame.md) can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
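import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable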
val url = "jdbc:sqlite:testDatabase.db"
val dbConfig = DbConnectionConfig(url)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
+62
View File
@@ -0,0 +1,62 @@
[//]: # (title: describe)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
Returns a [`DataFrame`](DataFrame.md) with general statistics for all [`ValueColumns`](DataColumn.md#valuecolumn).
```kotlin
describe [ columns ]
```
[`ColumnGroup`](DataColumn.md#columngroup) and [`FrameColumns`](DataColumn.md#framecolumn) are traversed recursively down to `ValueColumns`.
### Summary Metrics:
- **`name`** — The name of the column.
- **`path`** — The path to the column (for a hierarchical `DataFrame`).
- **`type`** — The data type of the column (e.g., Int, String, Boolean).
- **`count`** — The total number of non-null values in the column.
- **`unique`** — The number of unique values in the column.
- **`nulls`** — The count of null (missing) values in the column.
- **`top`** — The most frequently occurring value in the column.
- **`freq`** — The frequency of the most common value.
- **`mean`** — The arithmetic mean (only for numeric columns).
- **`std`** — The standard deviation (only for numeric columns).
- **`min`** — The minimum value in the column.
- **`p25`** — The 25th percentile value (first quartile).
- **`median`** — The median value (50th percentile / second quartile).
- **`p75`** — The 75th percentile value (third quartile).
- **`max`** — The maximum value in the column.
For non-numeric columns, statistical metrics
such as `mean` and `std` will return `null`. If column values are incomparable,
percentile values (`min`, `p25`, `median`, `p75`, `max`) will also return `null`.
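Because the result is itself a `DataFrame` with one row per described column, the metrics can be processed like any other data. The sketch below uses a small made-up dataframe (the `people` data and variable names are illustrative) and keeps only a few of the metric columns listed above.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.describe
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.select

// Made-up sample data: one String column and one Int column.
val people = dataFrameOf("name", "age")(
    "Alice", 23,
    "Bob", 27,
    "Charlie", 27,
)

// One row per described column ("name" and "age").
val stats = people.describe()

// Keep only a few of the metric columns.
val summary = stats.select("name", "type", "nulls", "mean", "median")

fun main() {
    summary.print()
}
```

In this sketch, the row describing the `name` column is expected to have no `mean`, since that column is not numeric.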
<!---FUN describe-->
```kotlin
df.describe()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.describe.html" width="100%"/>
<!---END-->
To describe only specific columns, pass them as an argument:
<!---FUN describeColumns-->
<tabs>
<tab title="Properties">
```kotlin
df.describe { age and name.allCols() }
```
</tab>
<tab title="Strings">
```kotlin
df.describe { "age" and "name".allCols() }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.describeColumns.html" width="100%"/>
<!---END-->
+73
View File
@@ -0,0 +1,73 @@
[//]: # (title: distinct)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
Removes duplicate rows.
The rows in the resulting [`DataFrame`](DataFrame.md) are in the same order as they were in the original [`DataFrame`](DataFrame.md).
Related operations: [](filterRows.md)
<!---FUN distinct-->
```kotlin
df.distinct()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.distinct.html" width="100%"/>
<!---END-->
If columns are specified, the resulting [`DataFrame`](DataFrame.md) will contain only the given columns, with duplicate combinations of their values removed.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN distinctColumns-->
<tabs>
<tab title="Properties">
```kotlin
df.distinct { age and name }
// same as
df.select { age and name }.distinct()
```
</tab>
<tab title="Strings">
```kotlin
df.distinct("age", "name")
// same as
df.select("age", "name").distinct()
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.distinctColumns.html" width="100%"/>
<!---END-->
## distinctBy
Keeps only the first row of every group of rows that have the same values in the given columns.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN distinctBy-->
<tabs>
<tab title="Properties">
```kotlin
df.distinctBy { age and name }
// same as
df.groupBy { age and name }.mapToRows { group.first() }
```
</tab>
<tab title="Strings">
```kotlin
df.distinctBy("age", "name")
// same as
df.groupBy("age", "name").mapToRows { group.first() }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.distinctBy.html" width="100%"/>
<!---END-->
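The difference between `distinct` with columns and `distinctBy` can be seen on a small made-up dataframe. This is only an illustrative sketch; the `visits` data and variable names are not part of the library samples.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.distinct
import org.jetbrains.kotlinx.dataframe.api.distinctBy

// Two rows share the same (name, age) pair but differ in city.
val visits = dataFrameOf("name", "age", "city")(
    "Alice", 23, "London",
    "Alice", 23, "Paris",
    "Bob", 27, "London",
)

// Only the selected columns remain: 2 rows, 2 columns.
val pairs = visits.distinct("name", "age")

// All columns remain, first row of each (name, age) group is kept: 2 rows, 3 columns.
val firstRows = visits.distinctBy("name", "age")

fun main() {
    println("${pairs.rowsCount()} x ${pairs.columnsCount()}")
    println("${firstRows.rowsCount()} x ${firstRows.columnsCount()}")
}
```

Here `distinctBy` keeps the `London` row for Alice, because it comes first in the original order.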
+83
View File
@@ -0,0 +1,83 @@
[//]: # (title: drop / dropNulls / dropNaNs / dropNA)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
Removes all rows that satisfy the given [row condition](DataRow.md#row-conditions).
**Related operations**: [](filterRows.md)
<!---FUN dropWhere-->
<tabs>
<tab title="Properties">
```kotlin
df.drop { weight == null || city == null }
```
</tab>
<tab title="Strings">
```kotlin
df.drop { it["weight"] == null || it["city"] == null }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.dropWhere.html" width="100%"/>
<!---END-->
## dropNulls
Remove rows with `null` values. This is a DataFrame equivalent of `filterNotNull`.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN dropNulls-->
```kotlin
df.dropNulls() // remove rows with null value in any column
df.dropNulls(whereAllNull = true) // remove rows with null values in all columns
df.dropNulls { city } // remove rows with null value in 'city' column
df.dropNulls { city and weight } // remove rows with null value in 'city' OR 'weight' columns
df.dropNulls(whereAllNull = true) { city and weight } // remove rows with null value in 'city' AND 'weight' columns
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.dropNulls.html" width="100%"/>
<!---END-->
## dropNaNs
Remove rows with [`NaN` values](nanAndNa.md#nan) (`Double.NaN` or `Float.NaN`).
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN dropNaNs-->
```kotlin
df.dropNaNs() // remove rows containing NaN in any column
df.dropNaNs(whereAllNaN = true) // remove rows with NaN in all columns
df.dropNaNs { weight } // remove rows where 'weight' is NaN
df.dropNaNs { age and weight } // remove rows where either 'age' or 'weight' is NaN
df.dropNaNs(whereAllNaN = true) { age and weight } // remove rows where both 'age' and 'weight' are NaN
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.dropNaNs.html" width="100%"/>
<!---END-->
## dropNA
Remove rows with [`NA` values](nanAndNa.md#na) (`null`, `Double.NaN`, or `Float.NaN`).
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN dropNA-->
```kotlin
df.dropNA() // remove rows containing null or NaN in any column
df.dropNA(whereAllNA = true) // remove rows with null or NaN in all columns
df.dropNA { weight } // remove rows where 'weight' is null or NaN
df.dropNA { age and weight } // remove rows where either 'age' or 'weight' is null or NaN
df.dropNA(whereAllNA = true) { age and weight } // remove rows where both 'age' and 'weight' are null or NaN
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.dropNA.html" width="100%"/>
<!---END-->
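To compare the three operations side by side, here is a small sketch on made-up data where one `weight` value is `NaN` and another is `null`. The `measurements` dataframe and the variable names are illustrative only.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.dropNA
import org.jetbrains.kotlinx.dataframe.api.dropNaNs
import org.jetbrains.kotlinx.dataframe.api.dropNulls

// One regular value, one NaN, and one null in the 'weight' column.
val measurements = dataFrameOf("age", "weight")(
    15, 54.0,
    20, Double.NaN,
    30, null,
)

// Drops only the row where 'weight' is null (age 30).
val withoutNulls = measurements.dropNulls("weight")

// Drops only the row where 'weight' is NaN (age 20).
val withoutNaNs = measurements.dropNaNs("weight")

// Drops both the null and the NaN rows, leaving only age 15.
val withoutNA = measurements.dropNA("weight")

fun main() {
    println(withoutNulls.rowsCount())
    println(withoutNaNs.rowsCount())
    println(withoutNA.rowsCount())
}
```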
+20
View File
@@ -0,0 +1,20 @@
[//]: # (title: duplicate)
Returns a [`DataFrame`](DataFrame.md) with the original [`DataRow`](DataRow.md) repeated `n` times.
```text
DataRow.duplicate(n): DataFrame
```
Returns a [`FrameColumn`](DataColumn.md#framecolumn) with the original [`DataFrame`](DataFrame.md) repeated `n` times.
The resulting [`FrameColumn`](DataColumn.md#framecolumn) will have an empty [`name`](DataColumn.md#properties).
**Related operations**: [](appendDuplicate.md)
```text
DataFrame.duplicate(n): FrameColumn
```
Returns a [`DataFrame`](DataFrame.md) where rows that satisfy the given [condition](DataRow.md#row-conditions) are repeated `n` times. If `rowCondition` is not specified, all rows will be duplicated.
```text
DataFrame.duplicateRows(n) [ { rowCondition } ]: DataFrame
```
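The signatures above could be exercised roughly like this. The sample data and variable names are made up for illustration, and the string-based row condition is just one way to write it.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.duplicate
import org.jetbrains.kotlinx.dataframe.api.duplicateRows

val people = dataFrameOf("name", "age")(
    "Alice", 23,
    "Bob", 27,
)

// DataRow.duplicate: a DataFrame with the first row repeated twice.
val aliceTwice = people[0].duplicate(2)

// duplicateRows without a condition: every row repeated twice.
val everyoneTwice = people.duplicateRows(2)

// duplicateRows with a condition: only matching rows are repeated, the rest are kept once.
val bobThreeTimes = people.duplicateRows(3) { it["age"] == 27 }

fun main() {
    println(aliceTwice.rowsCount())
    println(everyoneTwice.rowsCount())
    println(bobThreeTimes.rowsCount())
}
```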
+86
View File
@@ -0,0 +1,86 @@
[//]: # (title: explode)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Splits list-like values in given columns and spreads them vertically. Values in other columns are duplicated.
```text
explode(dropEmpty = true) [ { columns } ]
```
**Reverse operation:** [`implode`](implode.md)
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
**Parameters:**
* `dropEmpty` — if `true`, removes rows with empty lists or empty [`DataFrame`](DataFrame.md) objects. Otherwise, they will be exploded into `null`.
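The effect of `dropEmpty` can be sketched on a made-up dataframe in which one row holds an empty list. The data and variable names are illustrative, and the string-based overload with a named `dropEmpty` argument is assumed here.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.explode

// The second row contains an empty list in column 'b'.
val lists = dataFrameOf("a", "b")(
    1, listOf(1, 2),
    2, emptyList<Int>(),
)

// Default (dropEmpty = true): the row with the empty list disappears.
val dropped = lists.explode("b")

// dropEmpty = false: the row with the empty list is kept, with null in 'b'.
val kept = lists.explode("b", dropEmpty = false)

fun main() {
    println(dropped.rowsCount())
    println(kept.rowsCount())
}
```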
**Available for:**
* [`DataFrame`](DataFrame.md)
* [`FrameColumn`](DataColumn.md#framecolumn)
* `DataColumn<Collection>`
Exploded columns will change their types:
* `List<T>` to `T`
* [`DataFrame`](DataFrame.md) to [`DataRow`](DataRow.md)
Exploded [`FrameColumn`](DataColumn.md#framecolumn) will be converted into [`ColumnGroup`](DataColumn.md#columngroup).
Explode [`DataFrame`](DataFrame.md):
<!---FUN explode-->
<tabs>
<tab title="Strings">
```kotlin
val df = dataFrameOf("a", "b")(
1, listOf(1, 2),
2, listOf(3, 4),
)
df.explode("b")
```
</tab></tabs>
<!---END-->
When several columns are exploded in one operation, lists in different columns will be aligned.
<!---FUN explodeSeveral-->
```kotlin
val a by columnOf(listOf(1, 2), listOf(3, 4, 5))
val b by columnOf(listOf(1, 2, 3), listOf(4, 5))
val df = dataFrameOf(a, b)
df.explode { a and b }
```
<!---END-->
Explode [`DataColumn<Collection>`](DataColumn.md):
<!---FUN explodeColumnList-->
```kotlin
val col by columnOf(listOf(1, 2), listOf(3, 4))
col.explode()
```
<!---END-->
Explode [`FrameColumn`](DataColumn.md#framecolumn):
<!---FUN explodeColumnFrames-->
```kotlin
val col by columnOf(
dataFrameOf("a", "b")(1, 2, 3, 4),
dataFrameOf("a", "b")(5, 6, 7, 8),
)
col.explode()
```
<!---END-->
+4
View File
@@ -0,0 +1,4 @@
[//]: # (title: Explode / implode columns)
* [`explode`](explode.md) — distributes lists of values or [`DataFrame`](DataFrame.md) objects in the given columns vertically, replicating data in other columns
* [`implode`](implode.md) — collects column values in given columns into lists or [`DataFrame`](DataFrame.md) objects, grouping by other columns
@@ -0,0 +1,214 @@
[//]: # (title: Extension Properties API)
When working with a [`DataFrame`](DataFrame.md), the most convenient and reliable way
to access its columns — including for operations and retrieving column values
in row expressions — is through *auto-generated extension properties*.
They are generated based on a [dataframe schema](schemas.md),
with the name and type of properties inferred from the name and type of the corresponding columns.
They also work for all kinds of hierarchical dataframes.
> The behavior of data schema generation differs between the
> [Compiler Plugin](Compiler-Plugin.md) and [Kotlin Notebook](SetupKotlinNotebook.md).
>
> * In **Kotlin Notebook**, a schema is generated **only after cell execution** for
> `DataFrame` variables defined within that cell.
> * With the **Compiler Plugin**, a new schema is generated **after every operation**
> — but support for all operations is still in progress.
> Retrieving the schema for `DataFrame` read from a file or URL is **not yet supported** either.
>
> This behavior may change in future releases. See the [example](#example) below that demonstrates these differences.
{style="warning"}
## Example
Consider a simple hierarchical dataframe from
<resource src="example.csv"></resource>.
This table consists of two columns: `name`, which is a `String` column, and `info`,
which is a [**column group**](DataColumn.md#columngroup) containing two nested
[value columns](DataColumn.md#valuecolumn) —
`age` of type `Int`, and `height` of type `Double`.
<table width="705">
<thead>
<tr>
<th>name</th>
<th colspan="2">info</th>
</tr>
<tr>
<th></th>
<th>age</th>
<th>height</th>
</tr>
</thead>
<tbody>
<tr>
<td>Alice</td>
<td>23</td>
<td>175.5</td>
</tr>
<tr>
<td>Bob</td>
<td>27</td>
<td>160.2</td>
</tr>
</tbody>
</table>
<tabs>
<tab title="Kotlin Notebook">
Read the [`DataFrame`](DataFrame.md) from the CSV file:
```kotlin
val df = DataFrame.readCsv("example.csv")
```
**After cell execution**, the data schema and extensions for this `DataFrame` will be generated,
so you can use the extensions to access columns
in operations inside the [Column Selector DSL](ColumnSelectors.md)
and the [DataRow API](DataRow.md):
```kotlin
// Get nested column
df.info.age
// Sort by multiple columns
df.sortBy { name and info.height }
// Filter rows using a row condition.
// These extensions express the exact value in the row
// with the corresponding type:
df.filter { name.startsWith("A") && info.age >= 16 }
```
If you change the dataframe's schema by [renaming](rename.md) a column,
changing its [type](convert.md), or [adding](add.md) a new one, you need to
run a cell with a new [`DataFrame`](DataFrame.md) declaration first.
For example, rename the `name` column to "firstName":
```kotlin
val dfRenamed = df.rename { name }.into("firstName")
```
After running the cell with the code above, you can use `firstName` extensions in the following cells:
```kotlin
dfRenamed.firstName
dfRenamed.rename { firstName }.into("name")
dfRenamed.filter { firstName == "Nikita" }
```
See the [](quickstart.md) in Kotlin Notebook with basic Extension Properties API examples.
</tab>
<tab title="Compiler Plugin">
For now, if you read [`DataFrame`](DataFrame.md) from a file or URL, you need to define its schema manually.
You can do it quickly with [`generate..()` methods](DataSchemaGenerationMethods.md).
Define schemas:
```kotlin
@DataSchema
data class PersonInfo(
    val age: Int,
    val height: Double
)
@DataSchema
data class Person(
    val info: PersonInfo,
    val name: String
)
```
Read the [`DataFrame`](DataFrame.md) from the CSV file and specify the schema with
[`.convertTo()`](convertTo.md) or [`cast()`](cast.md):
```kotlin
val df = DataFrame.readCsv("example.csv").convertTo<Person>()
```
Extensions for this `DataFrame` will be generated automatically by the plugin,
so you can use them to access columns
in operations inside the [Column Selector DSL](ColumnSelectors.md)
and the [DataRow API](DataRow.md).
```kotlin
// Get nested column
df.info.age
// Sort by multiple columns
df.sortBy { name and info.height }
// Filter rows using a row condition.
// These extensions express the exact value in the row
// with the corresponding type:
df.filter { name.startsWith("A") && info.age >= 16 }
```
Moreover, new extensions will be generated on the fly after each schema change:
[renaming](rename.md) a column,
changing its [type](convert.md), or [adding](add.md) a new one.
For example, rename the `name` column to "firstName"; the `firstName` extension can then be used
in the following operations:
```kotlin
// Rename "name" column into "firstName"
df.rename { name }.into("firstName")
    // Can use `firstName` extension in the row condition
    // right after renaming
    .filter { firstName == "Nikita" }
```
See [Compiler Plugin Example](https://github.com/Kotlin/dataframe/tree/plugin_example/examples/kotlin-dataframe-plugin-gradle-example)
IDEA project with basic Extension Properties API examples.
</tab>
</tabs>
## Properties name generation
By default, each extension property is generated with a name equal to the original column name.
```kotlin
val df = dataFrameOf("size_in_inches" to listOf(..))
df.size_in_inches
```
If the original column name cannot be used as a property name (for example, if it contains spaces
or has a name equal to a keyword in Kotlin),
it will be enclosed in backticks.
```kotlin
val df = dataFrameOf("size in inches" to listOf(..))
df.`size in inches`
```
However, sometimes the original column name contains special symbols
and can't be used as a property name in backticks.
In such cases, special symbols in the auto-generated property name will be replaced.
```kotlin
val df = dataFrameOf("size\nin:inches" to listOf(..))
df.`size in - inches`
```
> In such cases, use [**`rename`**](rename.md) to update column names,
> or [**`renameToCamelCase`**](rename.md#renametocamelcase) to convert all column names
> in a `DataFrame` to `camelCase`, which is the idiomatic and widely preferred naming style in Kotlin.
If you don't want to change the actual column name, but you need a convenient accessor for this column,
you can use the `@ColumnName` annotation in a manually declared [data schema](schemas.md).
It allows you to use a property name different
from the original column name without changing the column's actual name:
```kotlin
@DataSchema
interface Info {
    @ColumnName("size\nin:inches")
    val sizeInInches: Double
}
```
```kotlin
val df = dataFrameOf("size\nin:inches" to listOf(..)).cast<Info>()
df.sizeInInches
```
+54
@@ -0,0 +1,54 @@
[//]: # (title: fill)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Replace missing values.
**Related operations**: [](updateConvert.md)
## fillNulls
Replaces `null` values with a given value or expression.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN fillNulls-->
```kotlin
df.fillNulls { colsOf<Int?>() }.with { -1 }
// same as
df.update { colsOf<Int?>() }.where { it == null }.with { -1 }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.fillNulls.html" width="100%"/>
<!---END-->
## fillNaNs
Replaces [`NaN` values](nanAndNa.md#nan) (`Double.NaN` and `Float.NaN`) with a given value or expression.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN fillNaNs-->
```kotlin
df.fillNaNs { colsOf<Double>() }.withZero()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.fillNaNs.html" width="100%"/>
<!---END-->
## fillNA
Replaces [`NA` values](nanAndNa.md#na) (`null`, `Double.NaN`, and `Float.NaN`) with a given value or expression.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN fillNA-->
```kotlin
df.fillNA { weight }.with { -1 }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.fillNA.html" width="100%"/>
<!---END-->
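To make the difference between the three operations concrete, here is a minimal sketch that applies each of them to the same column. The `weight` data and the helper name `fillVariants` are invented for illustration, and the String overloads are used so that no generated extension properties are needed.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: returns the "weight" column after fillNulls, fillNaNs, and fillNA.
fun fillVariants(): Triple<List<Any?>, List<Any?>, List<Any?>> {
    // One regular value, one null, one NaN.
    val df = dataFrameOf("weight")(1.0, null, Double.NaN)

    val nullsFilled = df.fillNulls("weight").with { 0.0 } // only the null is replaced
    val nansFilled = df.fillNaNs("weight").with { 0.0 }   // only the NaN is replaced
    val naFilled = df.fillNA("weight").with { 0.0 }       // both are replaced

    return Triple(
        nullsFilled["weight"].toList(),
        nansFilled["weight"].toList(),
        naFilled["weight"].toList(),
    )
}

fun main() {
    val (nulls, nans, na) = fillVariants()
    println(nulls)
    println(nans)
    println(na)
}
```

`fillNA` is the one to reach for when a column may contain both kinds of missing values.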
+26
@@ -0,0 +1,26 @@
[//]: # (title: filter)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
Returns a [`DataFrame`](DataFrame.md) with the rows that satisfy a [row condition](DataRow.md#row-conditions).
**Related operations**: [](filterRows.md)
<!---FUN filter-->
<tabs>
<tab title="Properties">
```kotlin
df.filter { age > 18 && name.firstName.startsWith("A") }
```
</tab>
<tab title="Strings">
```kotlin
df.filter { "age"<Int>() > 18 && "name"["firstName"]<String>().startsWith("A") }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.filter.html" width="100%"/>
<!---END-->
+5
@@ -0,0 +1,5 @@
[//]: # (title: Filter rows)
* [`filter`](filter.md) — keep only rows that satisfy the condition
* [`drop`](drop.md) — remove rows that satisfy the condition
* [`distinct`](distinct.md) — remove duplicate rows
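The three operations can be compared side by side on a tiny frame. This is only a sketch: the data and the helper names (`people`, `adults`, `minors`, `unique`) are invented, and columns are accessed through the String API.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: "Bob" appears twice on purpose.
fun people(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Bob", 20,
)

// filter keeps the rows for which the condition holds.
fun adults(): DataFrame<*> = people().filter { "age"<Int>() >= 18 }

// drop removes the rows for which the same condition holds.
fun minors(): DataFrame<*> = people().drop { "age"<Int>() >= 18 }

// distinct removes duplicate rows.
fun unique(): DataFrame<*> = people().distinct()

fun main() {
    adults().print()
    minors().print()
    unique().print()
}
```

With the same predicate, `filter` and `drop` split the rows into two complementary parts.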
+7
@@ -0,0 +1,7 @@
[//]: # (title: first)
Returns the first [row](DataRow.md) that matches the given [condition](DataRow.md#row-conditions), or throws an exception if there are no matching rows.
## firstOrNull
Returns the first [row](DataRow.md) that matches the given [condition](DataRow.md#row-conditions), or `null` if there are no matching rows.
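A small sketch of the difference between the two, using made-up data and hypothetical helper names (`sample`, `firstAdultName`, `firstCentenarian`):

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data.
fun sample(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Charlie", 30,
)

// first returns the first matching row; here its "name" value is read.
fun firstAdultName(): Any? = sample().first { "age"<Int>() >= 18 }["name"]

// firstOrNull returns null instead of throwing when nothing matches.
fun firstCentenarian(): DataRow<*>? = sample().firstOrNull { "age"<Int>() >= 100 }

fun main() {
    println(firstAdultName())
    println(firstCentenarian())
    // first throws when no row matches, so the failure is captured here.
    println(runCatching { sample().first { "age"<Int>() >= 100 } }.isFailure)
}
```

Prefer `firstOrNull` when an empty result is an expected outcome rather than an error.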
+50
@@ -0,0 +1,50 @@
[//]: # (title: flatten)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Returns a [`DataFrame`](DataFrame.md) without column groupings under the selected columns.
```text
flatten [ { columns } ]
```
Columns will keep their original names after flattening.
Potential column name clashes are resolved by adding the shortest possible name prefix from ancestor columns.
**Related operations**: [](groupUngroupFlatten.md)
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN flatten-->
<tabs>
<tab title="Properties">
```kotlin
// name.firstName -> firstName
// name.lastName -> lastName
df.flatten { name }
```
</tab>
<tab title="Strings">
```kotlin
// name.firstName -> firstName
// name.lastName -> lastName
df.flatten("name")
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.flatten.html" width="100%"/>
<!---END-->
To remove all column groupings in [`DataFrame`](DataFrame.md), invoke `flatten` without parameters:
<!---FUN flattenAll-->
```kotlin
df.flatten()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.flattenAll.html" width="100%"/>
<!---END-->
+165
@@ -0,0 +1,165 @@
[//]: # (title: format)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
<web-summary>
DataFrame Format Operation: Apply CSS formatting for rendering a dataframe to HTML.
</web-summary>
<card-summary>
DataFrame Format Operation: Apply CSS formatting for rendering a dataframe to HTML.
</card-summary>
<link-summary>
DataFrame Format Operation: Apply CSS formatting for rendering a dataframe to HTML.
</link-summary>
Formats the specified columns or cells within the dataframe such that
they have specific CSS attributes applied to them when rendering the dataframe to HTML.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
The selection of columns and rows to apply formatting to follows the [`update` operation](update.md).
This means you can `format { }` some columns `where {}` some predicate holds true `at()` a certain range of rows
`with {}` some cell attributes, just to name an example.
`.perRowCol { row, col -> }` is also available as an alternative to `.with {}`, if you want to format cells based on
their relative context. See the example below for a use-case for this operation.
There are also a handful of shortcuts for common operations within `format`, such as `.linearBg(-20 to blue, 50 to red)`,
which is a shortcut for `.with { background(linear(it, -20 to blue, 50 to red)) }`, and `.notNull {}`, which is a
shortcut for `.notNull().with {}`, filtering cells to only include non-null ones.
Finally, you can decide which attributes the selected cells get.
You can combine as many as you like by chaining
them with the `and` infix inside the Formatting DSL.
Some common examples include `background(white)`, which sets the background to `white` for a cell,
`italic`, which makes the cell text _italic_, `textColor(linear(it, 0 to green, 100 to rgb(255, 255, 0)))`, which
interpolates the text color between green and yellow based on where the value of the cell lies in between 0 and 100, and
finally `attr("text-align", "center")`, a custom attribute which centers the text inside the cell.
See [](#grammar) for everything that's available.
The `format` function can be repeated as many times as needed and, to view the result, you can call
[`toHtml()`/`toStandaloneHtml()`](toHTML.md).
Specifying a [column group](DataColumn.md#columngroup) makes all of its inner columns
be formatted in the same way unless overridden.
Formatting is done additively, meaning you can add more formatting to a cell that's already formatted or
override certain attributes inherited from its outer group.
Specifying a [frame column](DataColumn.md#framecolumn) currently does nothing
([Issue #1375](https://github.com/Kotlin/dataframe/issues/1375)).
Instead, use [](convert.md) to turn each nested [`DataFrame`](DataFrame.md) into a `FormattedFrame`:
```kotlin
df.convert { myFrameCol }.with {
it.format { someCol }.with { background(green) }
}.toStandaloneHtml()
```
#### Grammar {collapsible="true"}
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.api.FormatDocs.Grammar.ForHtml.html" width="100%"/>
#### Examples
The formatting DSL allows you to create all sorts of formatted tables.
The formatting can depend on the data; for instance, to highlight how the value of
a column corresponds to values of other columns:
<!---FUN formatExample-->
<tabs>
<tab title="Properties">
```kotlin
val ageMin = df.age.min()
val ageMax = df.age.max()
df
    .format().with { bold and textColor(black) and background(white) }
    .format { name }.with { underline }
    .format { name.lastName }.with { italic }
    .format { isHappy }.with { background(if (it) green else red) }
    .format { weight }.notNull().linearBg(50 to FormattingDsl.blue, 90 to FormattingDsl.red)
    .format { age }.perRowCol { row, col ->
        textColor(
            linear(value = col[row], from = ageMin to blue, to = ageMax to green),
        )
    }
```
</tab>
<tab title="Strings">
```kotlin
val ageMin = df.min { "age"<Int>() }
val ageMax = df.max { "age"<Int>() }
df
    .format().with { bold and textColor(black) and background(white) }
    .format("name").with { underline }
    .format { "name"["lastName"] }.with { italic }
    .format("isHappy").with {
        background(if (it as Boolean) green else red)
    }
    .format("weight").notNull().with { linearBg(it as Int, 50 to blue, 90 to red) }
    .format("age").perRowCol { row, col ->
        col as DataColumn<Int>
        textColor(
            linear(value = col[row], from = ageMin to blue, to = ageMax to green),
        )
    }
```
</tab></tabs>
<!---END-->
<inline-frame src="resources/formatExample_properties.html" width="100%"/>
Alternatively, you could also customize the dataframe in a data-independent manner:
<!---FUN formatExampleNumbers-->
```kotlin
df2.format().perRowCol { row, col ->
    val rowIndex = row.index()
    val colIndex = row.df().getColumnIndex(col)
    if ((rowIndex - colIndex) % 3 == 0) {
        background(darkGray) and textColor(white)
    } else {
        background(white) and textColor(black)
    }
}
```
<!---END-->
<inline-frame src="resources/formatExampleNumbers.html" width="100%"/>
## formatHeader
> This method is experimental and may be unstable.
>
> {type="warning"}
Formats the specified column headers.
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.render.FormatHeaderSamples-->
<!---FUN formatHeader-->
```kotlin
df
    // Format all column headers with bold
    .formatHeader().with { bold }
    // Format the "name" column (including nested) header with red text
    .formatHeader { name }.with { textColor(red) }
    // Override "name"/"lastName" column header formatting with blue text
    .formatHeader { name.lastName }.with { textColor(blue) }
    // Format all numeric column headers with underlines
    .formatHeader { colsOf<Number?>() }.with { underline }
```
<!---END-->
<inline-frame src="resources/formatHeader.html" width="100%"/>
+59
@@ -0,0 +1,59 @@
[//]: # (title: gather)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Converts several columns into two columns, `key` and `value`. The `key` column will contain the names of the original columns, and the `value` column will contain the values from the original columns.
This operation is the reverse of [](pivot.md).
```kotlin
gather { columns }
    [.explodeLists()]
    [.cast<Type>()]
    [.notNull()]
    [.where { valueFilter }]
    [.mapKeys { keyTransform }]
    [.mapValues { valueTransform }]
    .into(keyColumn, valueColumn) | .keysInto(keyColumn) | .valuesInto(valueColumn)
valueFilter: (value) -> Boolean
keyTransform: (columnName: String) -> K
valueTransform: (value) -> R
```
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
Configuration options:
* `explodeLists` — gathered values of type [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) will be exploded into their elements, so `where`, `cast`, `notNull` and `mapValues` will be applied to list elements instead of lists themselves
* `cast` — inform the compiler about the expected type of gathered elements. This type will be passed to the `where` and `mapValues` lambdas
* `notNull` — skip gathered `null` values
* `where` — filter gathered values
* `mapKeys` — transform gathered column names (keys)
* `mapValues` — transform gathered column values
Storage options:
* `into(keyColumn, valueColumn)` — store gathered key-value pairs in two new columns with names `keyColumn` and `valueColumn`
* `keysInto(keyColumn)` — store only gathered keys (column names) in a new column `keyColumn`
* `valuesInto(valueColumn)` — store only gathered values in a new column `valueColumn`
<!---FUN gather-->
```kotlin
pivoted.gather { "London".."Tokyo" }.into("city", "population")
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.gather.html" width="100%"/>
<!---END-->
<!---FUN gatherWithMapping-->
```kotlin
pivoted.gather { "London".."Tokyo" }
    .cast<Int>()
    .where { it > 10 }
    .mapKeys { it.lowercase() }
    .mapValues { 1.0 / it }
    .into("city", "density")
```
<!---END-->
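Since `pivoted` is defined elsewhere in the samples, the following self-contained sketch may help to see the reshaping. The wide frame, its numbers, and the helper names (`wide`, `longForm`, `largeOnly`) are all invented for illustration; columns are selected by name instead of with a column range.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical wide frame: one column per city, values are made-up numbers.
fun wide(): DataFrame<*> = dataFrameOf("year", "London", "Tokyo")(
    2020, 9, 14,
    2021, 10, 15,
)

// Every (row, gathered column) pair becomes one row in the long form.
fun longForm(): DataFrame<*> =
    wide().gather("London", "Tokyo").into("city", "population")

// cast<Int>() tells the compiler the value type, so `where` can compare numbers.
fun largeOnly(): DataFrame<*> =
    wide().gather("London", "Tokyo")
        .cast<Int>()
        .where { it > 10 }
        .into("city", "population")

fun main() {
    longForm().print()
    largeOnly().print()
}
```

The `year` column is not gathered, so its values are repeated for each key-value pair that originates from the same row.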
+89
@@ -0,0 +1,89 @@
[//]: # (title: getColumn)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
Returns a column by column name or [column selector](ColumnSelectors.md) as a [`DataColumn`](DataColumn.md). Throws an exception if the requested column doesn't exist.
<!---FUN getColumn-->
<tabs>
<tab title="Properties">
```kotlin
df.getColumn { age }
```
</tab>
<tab title="Strings">
```kotlin
df.getColumn("age")
```
</tab></tabs>
<!---END-->
## getColumnOrNull
Returns a top-level column by column name or [column selector](ColumnSelectors.md) as a [`DataColumn`](DataColumn.md), or `null` if the requested column doesn't exist.
<!---FUN getColumnOrNull-->
<tabs>
<tab title="Properties">
```kotlin
df.getColumnOrNull { age }
```
</tab>
<tab title="Strings">
```kotlin
df.getColumnOrNull("age")
```
</tab></tabs>
<!---END-->
## getColumnGroup
Returns a top-level column by column name or [column selector](ColumnSelectors.md) as a [`ColumnGroup`](DataColumn.md#columngroup). Throws an exception if the requested column doesn't exist or is not a `ColumnGroup`.
<!---FUN getColumnGroup-->
<tabs>
<tab title="Properties">
```kotlin
df.getColumnGroup { name }
```
</tab>
<tab title="Strings">
```kotlin
df.getColumnGroup("name")
```
</tab></tabs>
<!---END-->
## getColumns
Returns a list of the selected columns.
<!---FUN getColumns-->
<tabs>
<tab title="Properties">
```kotlin
df.getColumns { age and name }
```
</tab>
<tab title="Strings">
```kotlin
df.getColumns("age", "name")
```
</tab></tabs>
<!---END-->
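The functions on this page can be tried together on a small frame. This is a sketch with invented data and a hypothetical helper name (`smallFrame`); it uses the String overloads only.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data.
fun smallFrame(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 23,
    "Bob", 27,
)

fun main() {
    val df = smallFrame()

    // getColumn returns the column itself; toList() exposes its values.
    println(df.getColumn("age").toList())

    // getColumnOrNull returns null for a column that does not exist.
    println(df.getColumnOrNull("height"))

    // getColumn throws for a missing column, so the failure is captured here.
    println(runCatching { df.getColumn("height") }.isFailure)

    // getColumns returns the selected columns in the requested order.
    println(df.getColumns("age", "name").map { it.name() })
}
```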
+36
@@ -0,0 +1,36 @@
[//]: # (title: Get columns)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
Get single column by column name:
<!---FUN getColumnByName-->
<tabs>
<tab title="Properties">
```kotlin
df.age
df.name.lastName
```
</tab>
<tab title="Strings">
```kotlin
df["age"]
df["name"]["firstName"]
```
</tab></tabs>
<!---END-->
Get single column by index (starting from 0):
<!---FUN getColumnByIndex-->
```kotlin
df.getColumn(2)
df.getColumnGroup(0).getColumn(1)
```
<!---END-->
+43
@@ -0,0 +1,43 @@
[//]: # (title: Get rows)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
Get single [`DataRow`](DataRow.md) by [index](indexing.md):
<!---FUN getRowByIndex-->
```kotlin
df[2]
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.getRowByIndex.html" width="100%"/>
<!---END-->
Get single [`DataRow`](DataRow.md) by [row condition](DataRow.md#row-conditions):
<!---FUN getRowByCondition-->
<tabs>
<tab title="Properties">
```kotlin
df.single { age == 45 }
df.first { weight != null }
df.minBy { age }
df.maxBy { name.firstName.length }
df.maxByOrNull { weight }
```
</tab>
<tab title="Strings">
```kotlin
df.single { "age"<Int>() == 45 }
df.first { it["weight"] != null }
df.minBy("weight")
df.maxBy { "name"["firstName"]<String>().length }
df.maxByOrNull("weight")
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Access.getRowByCondition.html" width="100%"/>
<!---END-->
+3
@@ -0,0 +1,3 @@
[//]: # (title: Get values)
// TODO
+29
@@ -0,0 +1,29 @@
[//]: # (title: group)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Groups columns into [`ColumnGroups`](DataColumn.md#columngroup).
```text
group { columns }
    .into(groupName) | .into { groupNameExpression }
groupNameExpression = DataColumn.(DataColumn) -> String
```
**Reverse operations:** [`ungroup`](ungroup.md), [`flatten`](flatten.md)
It is a special case of the [`move`](move.md) operation.
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
<!---FUN group-->
```kotlin
df.group { age and city }.into("info")
df.group { all() }.into { it.type().toString() }.print()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.group.html" width="100%"/>
<!---END-->
+281
@@ -0,0 +1,281 @@
[//]: # (title: groupBy)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Splits the rows of [`DataFrame`](DataFrame.md) into groups using one or several columns as grouping keys.
```text
groupBy(moveToTop = true) { columns }
    [ transformations ]
    reducer | aggregator | pivot
transformations = [ .sortByCount() | .sortByCountAsc() | .sortBy { columns } | .sortByDesc { columns } ]
                  [ .updateGroups { frameExpression } ]
                  [ .add(column) { rowExpression } ]
reducer = .minBy { column } | .maxBy { column } | .first [ { rowCondition } ] | .last [ { rowCondition } ]
          .concat() | .into([column]) [{ rowExpression }] | .values { valueColumns }
aggregator = .count() | .concat() | .into([column]) [{ rowExpression }] | .values { valueColumns } | .aggregate { aggregations } | .<stat> [ { columns } ]
pivot = .pivot { columns }
            [ .default(defaultValue) ]
        pivotReducer | pivotAggregator
```
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation,
[groupBy transformations](#transformation), [groupBy aggregations](#aggregation), and [pivot+groupBy](pivot.md#pivot-groupby).
<!---FUN groupBy-->
<tabs>
<tab title="Properties">
```kotlin
df.groupBy { name }
df.groupBy { city and name.lastName }
df.groupBy { age / 10 named "ageDecade" }
```
</tab>
<tab title="Strings">
```kotlin
df.groupBy("name")
df.groupBy { "city" and "name"["lastName"] }
df.groupBy { "age"<Int>() / 10 named "ageDecade" }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupBy.html" width="100%"/>
<!---END-->
Grouping columns can be created in place:
<!---FUN groupByExpr-->
<tabs>
<tab title="Properties">
```kotlin
df.groupBy { expr { name.firstName.length + name.lastName.length } named "nameLength" }
```
</tab>
<tab title="Strings">
```kotlin
df.groupBy { expr { "name"["firstName"]<String>().length + "name"["lastName"]<String>().length } named "nameLength" }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByExpr.html" width="100%"/>
<!---END-->
With the optional `moveToTop` parameter, you can choose whether to make a selected *nested column* a top-level column:
<!---FUN groupByMoveToTop-->
```kotlin
df.groupBy(moveToTop = true) { name.lastName }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByMoveToTop.html" width="100%"/>
<!---END-->
or to keep it inside a `ColumnGroup`:
<!---FUN groupByMoveToTopFalse-->
```kotlin
df.groupBy(moveToTop = false) { name.lastName }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByMoveToTopFalse.html" width="100%"/>
<!---END-->
`groupBy` returns a `GroupBy` object.
## Transformation
A `GroupBy DataFrame` is a [`DataFrame`](DataFrame.md) with one chosen [`FrameColumn`](DataColumn.md#framecolumn) containing data groups.
It supports the following operations:
* [`add`](add.md)
* [`sortBy`](sortBy.md)
* [`map`](map.md)
* [`pivot`](pivot.md#pivot-groupby)
* [`concat`](concat.md)
Any [`DataFrame`](DataFrame.md) with a `FrameColumn` can be reinterpreted as a `GroupBy DataFrame`:
<!---FUN dataFrameToGroupBy-->
```kotlin
val key by columnOf(1, 2) // create int column with name "key"
val data by columnOf(df[0..3], df[4..6]) // create frame column with name "data"
val df = dataFrameOf(key, data) // create dataframe with two columns
df.asGroupBy { data } // convert dataframe to GroupBy by interpreting 'data' column as groups
```
<!---END-->
And any [`GroupBy DataFrame`](groupBy.md#transformation) can be reinterpreted as a [`DataFrame`](DataFrame.md) with a `FrameColumn`:
<!---FUN groupByToFrame-->
```kotlin
df.groupBy { city }.toDataFrame()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByToFrame.html" width="100%"/>
<!---END-->
Use [`concat`](concat.md) to union all data groups of a `GroupBy` back into a single [`DataFrame`](DataFrame.md), preserving the new order of rows produced by grouping:
<!---FUN concatGroupBy-->
```kotlin
df.groupBy { name }.concat()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.concatGroupBy.html" width="100%"/>
<!---END-->
<!---TODO ## Reducing--->
## Aggregation
To compute one or several [statistics](summaryStatistics.md) for every group of a `GroupBy`, use the `aggregate` function.
Its body is executed for every data group and has a receiver of type [`DataFrame`](DataFrame.md) that represents the data group currently being aggregated.
To add a new column to the resulting [`DataFrame`](DataFrame.md), pass the name of the new column to the infix function `into`:
<!---FUN groupByAggregations-->
<tabs>
<tab title="Properties">
```kotlin
df.groupBy { city }.aggregate {
    count() into "total"
    count { age > 18 } into "adults"
    median { age } into "median age"
    min { age } into "min age"
    maxBy { age }.name into "oldest"
}
```
</tab>
<tab title="Strings">
```kotlin
df.groupBy("city").aggregate {
    count() into "total"
    count { "age"<Int>() > 18 } into "adults"
    median("age") into "median age"
    min("age") into "min age"
    maxBy("age")["name"] into "oldest"
}
// or
df.groupBy("city").aggregate {
    count() into "total"
    count { "age"<Int>() > 18 } into "adults"
    "age"<Int>().median() into "median age"
    "age"<Int>().min() into "min age"
    maxBy("age")["name"] into "oldest"
}
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByAggregations.html" width="100%"/>
<!---END-->
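For readers without the sample `df` at hand, here is a minimal self-contained sketch of the same pattern. The data and the helper names (`residents`, `cityStats`) are invented; the aggregation calls are the String API variants shown above.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: two people in London, one in Tokyo.
fun residents(): DataFrame<*> = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 30, "London",
    "Charlie", 40, "Tokyo",
)

// One output row per city; every `into` adds one column to the result.
fun cityStats(): DataFrame<*> =
    residents().groupBy("city").aggregate {
        count() into "total"                       // rows in the group
        count { "age"<Int>() > 18 } into "adults"  // rows in the group matching the condition
        min("age") into "min age"                  // smallest age in the group
    }

fun main() {
    cityStats().print()
}
```

The body runs once per group, so `count()` and `min("age")` see only the rows of the current city.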
If only one aggregation function is used, the column name can be omitted:
<!---FUN groupByAggregateWithoutInto-->
<tabs>
<tab title="Properties">
```kotlin
df.groupBy { city }.aggregate { maxBy { age }.name }
```
</tab>
<tab title="Strings">
```kotlin
df.groupBy("city").aggregate { maxBy("age")["name"] }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByAggregateWithoutInto.html" width="100%"/>
<!---END-->
The most common aggregation functions can be computed directly on a [`GroupBy DataFrame`](groupBy.md#transformation):
<!---FUN groupByDirectAggregations-->
<tabs>
<tab title="Properties">
```kotlin
df.groupBy { city }.max() // max for every column with mutually comparable values
df.groupBy { city }.mean() // mean for every numeric column
df.groupBy { city }.max { age } // max age into column "age"
df.groupBy { city }.sum("total weight") { weight } // sum of weights into column "total weight"
df.groupBy { city }.count() // number of rows into column "count"
df.groupBy { city }
    .max { name.firstName.map { it.length } and name.lastName.map { it.length } } // maximum length of firstName or lastName into column "max"
df.groupBy { city }
    .medianFor { age and weight } // median age into column "age", median weight into column "weight"
df.groupBy { city }
    .minFor { (age into "min age") and (weight into "min weight") } // min age into column "min age", min weight into column "min weight"
df.groupBy { city }.meanOf("mean ratio") { weight?.div(age) } // mean of weight/age into column "mean ratio"
```
</tab>
<tab title="Strings">
```kotlin
df.groupBy("city").max() // max for every column with mutually comparable values
df.groupBy("city").mean() // mean for every numeric column
df.groupBy("city").max("age") // max age into column "age"
df.groupBy("city").sum("weight", name = "total weight") // sum of weights into column "total weight"
df.groupBy("city").count() // number of rows into column "count"
df.groupBy("city").max {
    "name"["firstName"]<String>().map { it.length } and "name"["lastName"]<String>().map { it.length }
} // maximum length of firstName or lastName into column "max"
df.groupBy("city")
    .medianFor("age", "weight") // median age into column "age", median weight into column "weight"
df.groupBy("city")
    .minFor { ("age"<Int>() into "min age") and ("weight"<Int?>() into "min weight") } // min age into column "min age", min weight into column "min weight"
df.groupBy("city").meanOf("mean ratio") {
    "weight"<Int?>()?.div("age"<Int>())
} // mean of weight/age into column "mean ratio"
```
</tab></tabs>
<!---END-->
To get all column values for every group without aggregation, use the `values` function:
* for [ValueColumn](DataColumn.md#valuecolumn) of type `T` it will gather group values into lists of type `List<T>`
* for [ColumnGroup](DataColumn.md#columngroup) it will gather group values into [`DataFrame`](DataFrame.md) and convert [ColumnGroup](DataColumn.md#columngroup) into [FrameColumn](DataColumn.md#framecolumn)
<!---FUN groupByWithoutAggregation-->
<tabs>
<tab title="Properties">
```kotlin
df.groupBy { city }.values()
df.groupBy { city }.values { name and age }
df.groupBy { city }.values { weight into "weights" }
```
</tab>
<tab title="Strings">
```kotlin
df.groupBy("city").values()
df.groupBy("city").values("name", "age")
df.groupBy("city").values { "weight" into "weights" }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.groupByWithoutAggregation.html" width="100%"/>
<!---END-->
+4
@@ -0,0 +1,4 @@
[//]: # (title: GroupBy / concat rows)
* [`groupBy`](groupBy.md) — groups rows of [`DataFrame`](DataFrame.md) by given key columns.
* [`concat`](concat.md) — concatenates rows from several [`DataFrame`](DataFrame.md) objects into single [`DataFrame`](DataFrame.md).
@@ -0,0 +1,7 @@
[//]: # (title: Group / ungroup / flatten columns)
* [`group`](group.md) — groups given columns into [`ColumnGroups`](DataColumn.md#columngroup).
* [`ungroup`](ungroup.md) — ungroups given [`ColumnGroups`](DataColumn.md#columngroup) by replacing them with their children columns
* [`flatten`](flatten.md) — recursively removes all column groupings under given [`ColumnGroups`](DataColumn.md#columngroup), leaving only [`ValueColumns`](DataColumn.md#valuecolumn) and [`FrameColumns`](DataColumn.md#framecolumn)
These operations are special cases of the general [`move`](move.md) operation.
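A short round trip may illustrate how the three operations relate. This is a sketch under stated assumptions: the data and the helper names (`flat`, `grouped`) are made up, and columns are referenced by their String names.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical flat frame with three top-level columns.
fun flat(): DataFrame<*> = dataFrameOf("firstName", "lastName", "age")(
    "Alice", "Cooper", 15,
    "Bob", "Dylan", 45,
)

// group: moves firstName and lastName under a new column group called "name".
fun grouped(): DataFrame<*> = flat().group("firstName", "lastName").into("name")

fun main() {
    println(grouped().columnNames())                 // top level now holds "name" and "age"
    println(grouped().ungroup("name").columnNames()) // children of "name" are top-level again
    println(grouped().flatten().columnNames())       // all groupings removed
}
```

For a single level of nesting, `ungroup` and `flatten` give the same result; they differ once groups are nested inside other groups.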
@@ -0,0 +1,245 @@
# Kotlin DataFrame for SQL & Backend Developers
<web-summary>
Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results — directly within a Kotlin Notebook.
</web-summary>
<card-summary>
Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data — fully interactive and type-safe!
</card-summary>
<link-summary>
Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook.
</link-summary>
This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar
SQL and ORM operations to DataFrame concepts.
If you plan to work on a Gradle project without a Kotlin Notebook,
we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since version 2.2.*).
This plugin generates type-safe schemas at compile time,
tracking schema changes throughout your data pipeline.
## Add Kotlin DataFrame Gradle dependency
You can read more about setting up the Gradle build in the [Gradle Setup Guide](SetupGradle.md).
In your Gradle build file (`build.gradle` or `build.gradle.kts`), add the Kotlin DataFrame library as a dependency:
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
    implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
}
```
</tab>
</tabs>
---
## 1. What is a dataframe?
If you're used to SQL, a **dataframe** is conceptually like a **table**:
- **Rows**: ordered records of data
- **Columns**: named, typed fields
- **Schema**: a mapping of column names to types
Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) —
columns can contain *[nested dataframes](DataColumn.md#framecolumn)* or *column groups*,
allowing you to represent and transform tree-like structures without flattening.
Unlike a relational DB table:
- A DataFrame object **lives in memory**: there's no storage engine or transaction log
- It's **immutable**: each operation produces a *new* DataFrame
- There is **no concept of foreign keys or relations** between DataFrames
- It can be created from
*any* [source](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md),
in-memory objects
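
The immutability point can be illustrated with a small sketch. It assumes name-based (String) column access and hypothetical sample data; adding a column returns a new DataFrame and leaves the original untouched.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; the column names are illustrative only.
fun samplePeople(): DataFrame<*> = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
)

// Returns a NEW DataFrame with an extra Boolean column; `df` itself is not modified.
fun withIsAdult(df: DataFrame<*>): DataFrame<*> =
    df.add("isAdult") { "age"<Int>() >= 18 }

fun main() {
    val original = samplePeople()
    val extended = withIsAdult(original)
    println(original.columnNames()) // still only the original columns
    println(extended.columnNames()) // original columns plus "isAdult"
}
```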
---
## 2. Reading Data From SQL
Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.
| Approach | Example |
|----------------------------------|---------------------------------------------------------------------|
| **From a table** | `val df = DataFrame.readSqlTable(dbConfig, "customers")` |
| **From a SQL query** | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` |
| **From a JDBC Connection** | `val df = connection.readDataFrame("SELECT * FROM orders")` |
| **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)` |
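```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.* // DbConnectionConfig, readSqlTable, readSqlQuery, readDataFrame
import java.sql.DriverManager

val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/mydb",
    user = "postgres",
    password = "secret"
)

// Table
val customers = DataFrame.readSqlTable(dbConfig, "customers")

// Query
val salesByRegion = DataFrame.readSqlQuery(
    dbConfig,
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    """
)

// From JDBC connection (opened here with the same credentials as dbConfig)
val connection = DriverManager.getConnection(dbConfig.url, dbConfig.user, dbConfig.password)
val orders = connection.readDataFrame("SELECT * FROM orders")

// From ResultSet
val rs = connection.createStatement().executeQuery("SELECT * FROM orders")
val ordersFromResultSet = rs.readDataFrame(connection)
```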
More information can be found [here](readSqlDatabases.md).
## 3. Why It's Not an ORM
Frameworks like **[Hibernate](https://hibernate.org/orm/)** or **[Exposed](https://github.com/JetBrains/Exposed)**:
- Map DB tables to Kotlin objects (entities)
- Track object changes and sync them back to the database
- Focus on **persistence** and **transactions**
Kotlin DataFrame:
- Has no persistence layer
- Doesn't try to map rows to mutable entities
- Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines**
- The **main idea** is that the schema *changes together with your transformations*, and the
  [**Compiler Plugin**](Compiler-Plugin.md) updates the type-safe API automatically under the hood.
- You don't have to manually define or recreate schemas every time: the plugin infers them dynamically from the data or
  transformations.
- In ORMs, the mapping layer is **frozen**: schema changes require manual model edits and migrations.
Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM.
---
## 4. Key Differences from SQL & ORMs
| Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame |
|----------------------------|------------------------------------|------------------------------------|---------------------------------------------------------------------|
| **Storage** | Persistent | Persistent | In-memory only |
| **Schema definition** | `CREATE TABLE` DDL | Defined in entity classes | Derived from data or transformations or defined manually |
| **Schema change** | `ALTER TABLE` | Manual migration of entity classes | Automatic via transformations + Compiler Plugin or defined manually |
| **Relations** | Foreign keys | Mapped via annotations | Not applicable |
| **Transactions** | Yes | Yes | Not applicable |
| **DB Indexes** | Yes | Yes (via DB) | Not applicable |
| **Data manipulation** | SQL DML (`INSERT`, `UPDATE`) | CRUD mapped to DB | Transformations only (immutable) |
| **Joins** | `JOIN` keyword | Eager/lazy loading | [`.join()` / `.leftJoin()` DSL](join.md) |
| **Grouping & aggregation** | `GROUP BY` | DB query with groupBy | [`.groupBy().aggregate()`](groupBy.md) |
| **Filtering** | `WHERE` | Criteria API / query DSL | [`.filter { ... }`](filter.md) |
| **Permissions** | `GRANT` / `REVOKE` | DB-level permissions | Not applicable |
| **Execution** | On DB engine | On DB engine | In JVM process |
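
To make the **Joins** row concrete, here is a small sketch of an inner and a left join on a shared key column. It uses the name-based overloads of `join` and `leftJoin`; the `customers` and `orders` data and the `customerId` column are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; names and values are illustrative only.
fun customers(): DataFrame<*> = dataFrameOf("customerId", "name")(
    1, "Alice",
    2, "Bob",
    3, "Carol",
)

fun orders(): DataFrame<*> = dataFrameOf("customerId", "amount")(
    1, 100,
    1, 50,
    2, 70,
)

// Roughly: SELECT * FROM customers JOIN orders USING (customerId)
fun innerJoined(): DataFrame<*> = customers().join(orders(), "customerId")

// Roughly: SELECT * FROM customers LEFT JOIN orders USING (customerId)
fun leftJoined(): DataFrame<*> = customers().leftJoin(orders(), "customerId")

fun main() {
    println(innerJoined())
    println(leftJoined())
}
```

In the left join, the customer without orders is kept, with a `null` in the `amount` column.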
---
## 5. SQL → Kotlin DataFrame Cheatsheet
### DDL Analogues
| SQL DDL Command / Example | Kotlin DataFrame Equivalent |
|---------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| **Create table:**<br>`CREATE TABLE person (name text, age int);` | `@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` |
| **Add column:**<br>`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }` |
| **Rename column:**<br>`ALTER TABLE sales RENAME COLUMN old_name TO new_name;` | `.rename { old_name }.into("new_name")` |
| **Drop column:**<br>`ALTER TABLE sales DROP COLUMN old_col;` | `.remove { old_col }` |
| **Modify column type:**<br>`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;` | `.convert { amount }.to<Double>()` |
---
### DML Analogues
| SQL DML Command / Example | Kotlin DataFrame Equivalent |
|--------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|
| `SELECT col1, col2` | `df.select { col1 and col2 }` |
| `WHERE amount > 100` | `df.filter { amount > 100 }` |
| `ORDER BY amount DESC` | `df.sortByDesc { amount }` |
| `GROUP BY region` | `df.groupBy { region }` |
| `SUM(amount)` | `.aggregate { sum { amount } }` |
| `JOIN` | `.join(otherDf) { id match right.id }` |
| `LIMIT 5` | `.take(5)` |
| **Pivot:** <br>`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `.groupBy { region }.pivot { year }.sum { amount }` |
| **Explode array column:** <br>`SELECT id, unnest(tags) AS tag FROM products;` | `.explode { tags }` |
| **Update column:** <br>`UPDATE sales SET amount = amount * 1.2;` | `.update { amount }.with { it * 1.2 }` |
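
As an illustration of the `unnest` analogue from the table, the following sketch explodes a list column so that each list element gets its own row. The `products` data is hypothetical, and columns are addressed by name.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: each product carries a list of tags.
fun products(): DataFrame<*> = dataFrameOf("id", "tags")(
    1, listOf("kotlin", "jvm"),
    2, listOf("sql"),
)

// Roughly: SELECT id, unnest(tags) AS tag FROM products
fun explodedProducts(): DataFrame<*> = products().explode("tags")

fun main() {
    println(explodedProducts())
}
```

The other columns (`id` here) are repeated for every element of the exploded list.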
## 6. Example: SQL vs. DataFrame Side-by-Side
**SQL (PostgreSQL):**
```sql
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY region
ORDER BY total DESC LIMIT 5;
```
**Kotlin DataFrame:**
```kotlin
sales.filter { amount > 0 }
    .groupBy { region }
    .aggregate { sum { amount } into "total" }
    .sortByDesc { total }
    .take(5)
```
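
The pipeline above relies on generated extension properties (`amount`, `region`, `total`). As a self-contained sketch that runs without them, the same query can be written with name-based column access. The sample `sales` data is hypothetical, and the direct `sum(..., name = ...)` aggregation is used here in place of the `aggregate { }` block.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data standing in for the `sales` table.
fun sampleSales(): DataFrame<*> = dataFrameOf("region", "amount")(
    "EU", 200,
    "US", 150,
    "EU", 150,
    "APAC", -20,
    "US", 0,
)

fun topRegions(sales: DataFrame<*>): DataFrame<*> =
    sales
        .filter { "amount"<Int>() > 0 }   // WHERE amount > 0
        .groupBy("region")                // GROUP BY region
        .sum("amount", name = "total")    // SUM(amount) AS total
        .sortByDesc("total")              // ORDER BY total DESC
        .take(5)                          // LIMIT 5

fun main() {
    println(topRegions(sampleSales()))
}
```

With this sample data, the `APAC` row and the zero-amount `US` row are filtered out before grouping, so only two regions remain.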
## In Conclusion
- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it
  **type-safe** and fully integrated into Kotlin.
- The main focus is **readability** and schema change safety via
the [Compiler Plugin](Compiler-Plugin.md).
- It is neither a database nor an ORM: the Kotlin DataFrame library does not store data or manage transactions but works as an in-memory
  layer for analytics and transformations.
- It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working
with JSON-like structures and combining multiple data sources.
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the
JVM, while keeping your code easily refactorable and IDE-assisted.
- Use Kotlin DataFrame for small and medium-sized datasets; for large datasets, consider using a more
  **performant** database engine.
## What's Next?
If you're ready to go through a complete example, we recommend our **[Quickstart Guide](quickstart.md)**,
where you'll learn the basics of reading data, transforming it, and creating visualizations step by step.
Ready to go deeper? Check out what's next:
- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.
- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).
- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
and make working with your data both convenient and type-safe.
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
for auto-generated column access in your IntelliJ IDEA projects.
- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations with the
  [Kandy Documentation](https://kotlin.github.io/kandy).
@@ -0,0 +1,120 @@
# Guides And Examples
<web-summary>
Browse a collection of guides and examples covering key features and real-world use cases of Kotlin DataFrame — from basics to advanced data analysis.
</web-summary>
<card-summary>
Explore Kotlin DataFrame with detailed user guides and real-world examples,
showcasing practical use cases and data workflows.
</card-summary>
<link-summary>
A curated list of Kotlin DataFrame guides and examples that walk you through common operations and data analysis patterns step by step.
</link-summary>
<!--- TODO: add more guides (migration from pandas and others) and replace GH notebooks with topics --->
## Guides
Explore our structured, in-depth guides to steadily improve your Kotlin DataFrame skills — step by step.
* [](quickstart.md) — get started with Kotlin DataFrame in a few simple steps:
load data, transform it, and visualize it.
<img src="quickstart_preview.png" border-effect="rounded" width="705"/>
* [](Guide-for-backend-SQL-developers.md) — migration guide for backend developers with SQL/ORM experience moving to Kotlin DataFrame
* [](extensionPropertiesApi.md) — learn about extension properties for [`DataFrame`](DataFrame.md)
and make working with your data both convenient and type-safe.
* [Enhanced Column Selection DSL](https://blog.jetbrains.com/kotlin/2024/07/enhanced-column-selection-dsl-in-kotlin-dataframe/)
— explore a powerful DSL for typesafe and flexible column selection in Kotlin DataFrame.
* [](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md)
— discover interactive Kotlin DataFrame outputs in
[Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html).
<img src="ktnb_features_preview.png" border-effect="rounded" width="705"/>
* [40 Puzzles](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/puzzles/40%20puzzles.ipynb)
— inspired by [100 pandas puzzles](https://github.com/ajcr/100-pandas-puzzles).
An interactive guide that takes you from simple tasks to complex challenges,
teaching you how to solve them using Kotlin DataFrame in a concise and elegant style.
* [Reading from files: CSV, JSON, ApacheArrow](read.md)
— read your data from various formats into `DataFrame`.
* [SQL Databases Interaction](readSqlDatabases.md)
— set up SQL database access and read query results efficiently into `DataFrame`.
* [Custom SQL Database Support](readSqlFromCustomDatabase.md)
— extend the DataFrame library for custom SQL database support.
* [GeoDataFrame Guide](https://kotlin.github.io/kandy/geo-plotting-guide.html)
— explore the GeoDataFrame module that brings a convenient Kotlin DataFrame API to geospatial workflows,
enhanced with beautiful Kandy-Geo visualizations (*experimental*).
<img src="geoguide_preview.png" border-effect="rounded" width="705"/>
* [Using Unsupported Data Sources](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples)
  — a guide by examples. While these might one day become proper integrations of DataFrame, for now,
  we provide them as examples of how to make such integrations yourself.
  * [Apache Spark Interop (With and Without Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/spark)
  * [Multik Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik)
  * [JetBrains Exposed Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed)
  * [Hibernate ORM](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/hibernate)
* [OpenAPI Guide](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/json/KeyValueAndOpenApi.ipynb)
— learn how to parse and explore [OpenAPI](https://swagger.io) JSON structures using Kotlin DataFrame,
enabling structured access and intuitive analysis of complex API schemas (*experimental*, supports OpenAPI 3.0.0).
## Examples
Explore our extensive collection of practical examples and real-world analytics workflows.
* [Kotlin DataFrame Compiler Plugin Gradle Example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-gradle-example)
— a simple Gradle project demonstrating the usage of the [compiler plugin](Compiler-Plugin.md),
showcasing DataFrame expressions with [extension properties](extensionPropertiesApi.md)
that are generated on-the-fly in the IDEA project.
* [Kotlin DataFrame Compiler Plugin Maven Example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-gradle-example)
— a simple Maven project demonstrating the usage of the [compiler plugin](Compiler-Plugin.md),
showcasing DataFrame expressions with [extension properties](extensionPropertiesApi.md)
that are generated on-the-fly in the IDEA project.
* [Titanic Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/titanic/Titanic.ipynb)
— discover the famous "Titanic"
dataset with the Kotlin DataFrame analysis toolkit
and [Kandy](https://kotlin.github.io/kandy/) visualizations.
* [Track and Analyze GitHub Star Growth](https://blog.jetbrains.com/kotlin/2024/08/track-and-analyze-github-star-growth-with-kandy-and-kotlin-dataframe/)
— query GitHub's API with the Kotlin Notebook Ktor client,
then analyze and visualize the data using Kotlin DataFrame and [Kandy](https://kotlin.github.io/kandy/).
* [GitHub Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/github/github.ipynb)
— a practical example of working with deeply nested, hierarchical DataFrames using GitHub data.
* [Netflix Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/netflix/netflix.ipynb)
— explore TV shows and movies from Netflix with the powerful Kotlin DataFrame API and beautiful
[Kandy](https://kotlin.github.io/kandy/) visualizations.
* [Top-12 German Companies Financial Analysis](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/top_12_german_companies)
— analyze key financial metrics for several major German companies.
* [Movies Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb)
— basic Kotlin DataFrame operations on data from [movielens](https://movielens.org/).
* [YouTube Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/youtube/Youtube.ipynb)
— explore YouTube videos with YouTube REST API and Kotlin DataFrame.
* [IMDb SQL Database Example](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/notebooks/imdb.ipynb)
— analyze IMDb data stored in MariaDB using Kotlin DataFrame
and visualize with [Kandy](https://kotlin.github.io/kandy/).
* [Reading Parquet files from Apache Spark](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/spark-parquet-dataframe)
— this project showcases how to export data and ML models from Apache Spark via reading from Parquet files.
Also, [Kandy](https://kotlin.github.io/kandy/) is used to visualize the exported data and the Linear Regression model.
See also [Kandy User Guides](https://kotlin.github.io/kandy/user-guide.html)
and [Examples Gallery](https://kotlin.github.io/kandy/examples.html)
for the best data visualizations using Kotlin DataFrame and Kandy together!
<img src="kandy_gallery_preview.png" border-effect="rounded" width="705"/>
@@ -0,0 +1,108 @@
# Kotlin DataFrame Features in Kotlin Notebook
<web-summary>
Discover how Kotlin DataFrame integrates with Kotlin Notebook for seamless interactive data analysis in IntelliJ IDEA.
</web-summary>
<card-summary>
Load, explore, and export your data interactively using Kotlin DataFrame in Kotlin Notebook.
</card-summary>
<link-summary>
Learn how to load, explore, drill into, export, and interact with data using Kotlin DataFrame in Kotlin Notebook.
</link-summary>
The [Kotlin Notebook Plugin for IntelliJ IDEA](https://plugins.jetbrains.com/plugin/16340-kotlin-notebook),
combined with Kotlin DataFrame, offers powerful data analysis capabilities within an interactive environment.
Here are the key features:
### Drag-and-Drop Data Files
You can quickly load data into a `DataFrame` in a notebook by simply dragging and dropping a file
(.csv/.json/.xlsx and .geojson/.shp) directly into the notebook editor:
<video src="ktnb_drag_n_drop.mp4" controls=""/>
### Visual Data Exploration
**Page through your data**:
The pagination feature lets you move through your data one page at a time, making it possible to view large datasets.
**Sort by column with a single click**:
You can sort any column with a click.
This is a convenient alternative to using `sortBy` in separate cells.
**Go straight to the data you need**:
You can jump directly to a particular row or column if you want something specific.
This makes working with large datasets more straightforward.
<video src="https://github.com/user-attachments/assets/aeae1c79-9755-4558-bac4-420bf1331f39" controls=""/>
### Drill down into nested data
When your data has multiple layers, like a table within a table,
you can now click on a cell containing a nested table to view these details directly.
This makes it easy to go deeper into your data and then return to where you were.
<video src="https://github.com/user-attachments/assets/ef9509be-e19b-469c-9bad-0ce81eec36b0" controls=""/>
### Visualize multiple tables via tabs
You can open and visualize multiple tables in separate tabs.
This feature is tailored to those who need to compare, contrast, or monitor different datasets simultaneously.
<video src="https://github.com/user-attachments/assets/51b7a6e3-0187-49b3-bf5e-0c4d60f8b769" controls=""/>
### Exporting to files
You can export data directly from the dataframe into various file formats.
This simplifies sharing and further analysis.
The interface supports exporting data to JSON for web applications,
CSV for spreadsheet tools, and XML for data interchange.
<video src="https://github.com/user-attachments/assets/ec28c59a-1555-44ce-98f6-a60d8feae347" controls=""/>
### Convenient copying of data from tables
You can click and drag to select the data you need,
or you can use keyboard shortcuts for quicker selection
and then copy what's needed with a simple right-click or another shortcut.
It's designed to feel intuitive,
like copying text from a document, but with the structure and format of your data preserved.
<video src="https://github.com/user-attachments/assets/88e53dfb-361f-40f8-bffb-52a512cdd3cd" controls=""/>
### Rendering of images in the cell
The table widget can render `BufferedImage`s.
Given a column of images, right-click on the cell and click `View Image` in the context menu.
![ktnb_cell_image.png](ktnb_cell_image.png)
### Clickable URI links
String values starting with `http://`, `https://`, or `file:/` are treated as clickable links that open, for example, your browser or file manager.
Click on the cell to trigger a toolbar to appear.
![ktnb_clickable_link.png](ktnb_clickable_link.png)
Clicking on `Open URL` or `Open File URI` for the first time triggers a notification with a link to `Settings` → `URL Click Settings`.
Choose what protocols should be allowed.
![ktnb_link_settings.png](ktnb_link_settings.png)
To get started, ensure you have the latest version of the Kotlin Notebook Plugin installed in IntelliJ IDEA,
and begin exploring your data using Kotlin DataFrame in your notebook cells.
## Related documentation
- [Kotlin for Data Analysis in notebooks](https://kotlinlang.org/docs/kotlin-notebook-overview.html):
Learn more about Kotlin Notebook capabilities for data analysis.
- [Kotlin Notebooks in IntelliJ IDEA](https://www.jetbrains.com/help/idea/kotlin-notebook.html):
Detailed documentation on working with Kotlin Notebooks in the IDE.
@@ -0,0 +1,361 @@
# Quickstart Guide
<web-summary>
Get started with Kotlin DataFrame in a few simple steps: load data, transform it, and visualize it — all in an interactive Kotlin Notebook.
</web-summary>
<card-summary>
Get started with Kotlin DataFrame right away: integrate it seamlessly and load, process, analyze, and visualize some data!
</card-summary>
<link-summary>
Learn the basics of Kotlin DataFrame: reading data, applying transformations, and building plots — with full interactivity in Kotlin Notebook.
</link-summary>
This guide shows how to quickly get started with **Kotlin DataFrame**:
you'll learn how to load data, perform basic transformations, and build a simple plot using Kandy.
We recommend [starting with **Kotlin Notebook**](SetupKotlinNotebook.md) for the best beginner experience —
everything works out of the box,
including interactivity and rich rendering of DataFrames and plots.
You can instantly see the results of each operation: view the contents of your DataFrames after every transformation,
inspect individual rows and columns, and explore data step-by-step in a live and interactive way.
You can view this guide as a
[notebook on GitHub](https://github.com/Kotlin/dataframe/tree/master/examples/notebooks/quickstart/quickstart.ipynb)
or download <resource src="quickstart.ipynb"></resource>.
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.guides.QuickStartGuide-->
To start working with Kotlin DataFrame in a notebook, run a cell with the following code:
```kotlin
%useLatestDescriptors
%use dataframe
```
This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame
rendering. Learn more [here](SetupKotlinNotebook.md#integrate-kotlin-dataframe).
## Read DataFrame
Kotlin DataFrame supports all popular data formats, including CSV, JSON, and Excel, as well as reading from various
databases. Read a CSV with the "JetBrains Repositories" dataset into the `df` variable:
<!---FUN notebook_test_quickstart_2-->
```kotlin
val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)
```
<!---END-->
## Display And Explore
To display your dataframe as a cell output, place it in the last line of the cell:
<!---FUN notebook_test_quickstart_3-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_3.html" width="705px" height="500px"></inline-frame>
Kotlin Notebook has special interactive outputs for `DataFrame`. Learn more about them [here](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md).
Use the `.describe()` method to get dataset summaries: column types, number of nulls, and simple statistics.
<!---FUN notebook_test_quickstart_4-->
```kotlin
df.describe()
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_4.html" width="705px" height="500px"></inline-frame>
## Select Columns
Kotlin DataFrame features a typesafe Columns Selection DSL, enabling flexible and safe selection of any combination of
columns.
Column selectors are widely used across operations. One of the simplest examples is `.select { }`, which returns a new
DataFrame with only the columns chosen in the Columns Selection expression.
*After executing the cell* where a `DataFrame` variable is declared,
[extension properties](extensionPropertiesApi.md) for its columns are automatically generated.
These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.
Select some columns:
<!---FUN notebook_test_quickstart_5-->
```kotlin
// Select "full_name", "stargazers_count" and "topics" columns
val dfSelected = df.select { full_name and stargazers_count and topics }
dfSelected
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_5.html" width="705px" height="500px"></inline-frame>
> With a [Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md) enabled,
> you can use auto-generated properties in your IntelliJ IDEA projects.
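
Outside a notebook and without the compiler plugin, generated properties are not available. As a sketch under that assumption, columns can also be selected by name. The single-row sample below is hypothetical and only mimics the column names of the dataset used in this guide.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical one-row sample with made-up values; only the column names follow the guide.
fun sampleRepos(): DataFrame<*> = dataFrameOf("full_name", "stargazers_count", "topics", "html_url")(
    "example/repo", 100, "[kotlin, example]", "https://example.com/repo",
)

// Select columns by passing their names directly.
fun selectedByNames(): DataFrame<*> =
    sampleRepos().select("full_name", "stargazers_count")

// The same idea inside the Columns Selection DSL, combining names with `and`.
fun selectedInDsl(): DataFrame<*> =
    sampleRepos().select { "full_name" and "topics" }

fun main() {
    println(selectedByNames().columnNames())
    println(selectedInDsl().columnNames())
}
```

Name-based access is checked only at runtime, so a misspelled column name fails when the code runs rather than at compile time.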
## Row Filtering
Some operations use the [DataRow API](DataRow.md), with expressions and conditions
that apply to all `DataFrame` rows.
For example, `.filter { }` returns a new `DataFrame` with the rows that satisfy a condition given by a row expression.
Inside a row expression, you can access the values of the current row by column names through auto-generated properties.
This is similar to the [Columns Selection DSL](ColumnSelectors.md),
but in this case the properties represent actual values, not column references.
Filter rows by "stargazers_count" value:
<!---FUN notebook_test_quickstart_6-->
```kotlin
// Keep only rows where "stargazers_count" value is at least 1000
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
dfFiltered
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_6.html" width="705px" height="500px"></inline-frame>
## Columns Rename
Columns can be renamed using the `.rename { }` operation, which also uses the Columns Selection DSL to select a column
to rename.
The `rename` operation does not perform the renaming immediately; instead, it creates an intermediate object that must
be finalized into a new `DataFrame` by calling the `.into()` function with the new column name.
Rename "full_name" and "stargazers_count" columns:
<!---FUN notebook_test_quickstart_7-->
```kotlin
// Rename "full_name" column into "name"
val dfRenamed = dfFiltered.rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
dfRenamed
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_7.html" width="705px" height="500px"></inline-frame>
## Modify Columns
Columns can be modified using the `update { }` and `convert { }` operations.
Both operations select columns to modify via the Columns Selection DSL and, similar to `rename`, create an intermediate
object that must be finalized to produce a new `DataFrame`.
The `update` operation preserves the original column types, while `convert` allows changing the type.
In both cases, column names and their positions remain unchanged.
Update "name" and convert "topics":
<!---FUN notebook_test_quickstart_8-->
```kotlin
val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removePrefix("[").removeSuffix("]").split(", ") }
dfUpdated
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_8.html" width="705px" height="500px"></inline-frame>
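
To see what the `convert` lambda does to a single value, here is the same string logic extracted into a plain Kotlin function. The function name `parseTopics` and the sample inputs are hypothetical; the sketch assumes that raw values look like `[a, b, c]`.

```kotlin
// Same parsing logic as in the `convert { topics }.with { ... }` lambda,
// extracted into a standalone function (hypothetical name).
fun parseTopics(raw: String): List<String> =
    raw.removePrefix("[").removeSuffix("]").split(", ")

fun main() {
    println(parseTopics("[kotlin, ide, intellij]"))
    println(parseTopics("[]"))
}
```

Note that an empty `[]` value produces a list with one empty string rather than an empty list, because splitting an empty string yields a single empty element. If that matters for your analysis, filter out blank entries after splitting.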
Check out the new "topics" type:
<!---FUN notebook_test_quickstart_9-->
```kotlin
dfUpdated.topics.type()
```
<!---END-->
Output:
```
kotlin.collections.List<kotlin.String>
```
## Adding New Columns
The `.add { }` function allows creating a `DataFrame` with a new column, where the value for each row is computed based
on the existing values in that row. These values can be accessed within the row expressions.
Add a new `Boolean` column "isIntellij":
<!---FUN notebook_test_quickstart_10-->
```kotlin
// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_10.html" width="705px" height="500px"></inline-frame>
## Grouping And Aggregating
A `DataFrame` can be grouped by column keys, meaning its rows are split into groups based on the values in the key
columns.
The `.groupBy { }` operation selects columns and groups the `DataFrame` by their values, using them as grouping keys.
The result is a `GroupBy` — a `DataFrame`-like structure that associates each key with the corresponding subset of the
original `DataFrame`.
Group `dfWithIsIntellij` by "isIntellij":
<!---FUN notebook_test_quickstart_11-->
```kotlin
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_11.html" width="705px" height="500px"></inline-frame>
A `GroupBy` can be aggregated — that is, you can compute one or several summary statistics for each group.
The result of the aggregation is a `DataFrame` containing the key columns along with new columns holding the computed
statistics for a corresponding group.
For example, `count()` computes the size of each group:
<!---FUN notebook_test_quickstart_12-->
```kotlin
groupedByIsIntellij.count()
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_12.html" width="705px" height="500px"></inline-frame>
Compute several statistics with `.aggregate { }` that provides an expression for aggregating:
<!---FUN notebook_test_quickstart_13-->
```kotlin
groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_13.html" width="705px" height="500px"></inline-frame>
## Sorting Rows
`.sortBy { }` and `.sortByDesc { }` sort rows by the values in the selected columns, returning a `DataFrame` with sorted rows. `take(n)`
returns a new `DataFrame` with the first `n` rows.
Combine them to get Top-10 repositories by number of stars:
<!---FUN notebook_test_quickstart_14-->
```kotlin
val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }.take(10)
dfTop10
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_14.html" width="705px" height="500px"></inline-frame>
## Plotting With Kandy
Kandy is a Kotlin plotting library designed to bring Kotlin DataFrame features into chart creation, providing a
convenient and typesafe way to build data visualizations.
Kandy can be loaded into a notebook using `%use kandy`:
```kotlin
%use kandy
```
Build a simple bar chart with the `.plot { }` extension for DataFrame, which lets you use extension properties inside the Kandy
plotting DSL (the plot will be rendered as an output after cell execution):
<!---FUN notebook_test_quickstart_16-->
```kotlin
dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }
    layout.title = "Top 10 JetBrains repositories by stars count"
}
```
<!---END-->
![notebook_test_quickstart_16](notebook_test_quickstart_16.svg)
## Write DataFrame
A `DataFrame` supports writing to all formats that it is capable of reading.
Write into Excel:
<!---FUN notebook_test_quickstart_17-->
```kotlin
dfWithIsIntellij.writeExcel("jb_repos.xlsx")
```
<!---END-->
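
Other formats follow the same pattern. As a sketch, the snippet below writes a small DataFrame to a temporary CSV file and reads it back. It assumes that `writeCsv` is available as the counterpart of the `readCsv` function used earlier in this guide; the sample data is hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
import java.io.File

// Writes a small hypothetical DataFrame to a temporary CSV file and reads it back.
fun csvRoundTrip(): DataFrame<*> {
    val df = dataFrameOf("name", "starsCount")(
        "repo-a", 3,
        "repo-b", 2,
    )
    val file = File.createTempFile("jb_repos", ".csv")
    file.deleteOnExit()
    df.writeCsv(file)               // assumed counterpart of readCsv
    return DataFrame.readCsv(file)  // column types are inferred again while reading
}

fun main() {
    println(csvRoundTrip())
}
```

Because CSV stores plain text, column types are inferred again on reading, so nested structures such as lists are generally not restored as-is.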
## What's Next?
In this quickstart, we covered the basics — reading data, transforming it, and building a simple visualization.
Ready to go deeper? Check out what's next:
- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.
- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).
- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
and make working with your data both convenient and type-safe.
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
for auto-generated column access in your IntelliJ IDEA projects.
- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations by reading the
  [Kandy Documentation](https://kotlin.github.io/kandy).
+16
View File
@@ -0,0 +1,16 @@
[//]: # (title: head)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
Returns a [`DataFrame`](DataFrame.md) containing the first `n` rows (5 by default).
<!---FUN head-->
```kotlin
df.head(3)
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.head.html" width="100%"/>
<!---END-->
Similar to [`take`](sliceRows.md#take).
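A minimal sketch of the default and explicit row counts, assuming a small frame built with `dataFrameOf`. The `numbers` frame and its column names are hypothetical and used only for illustration:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: seven rows, two columns
val numbers = dataFrameOf("id", "label")(
    1, "a",
    2, "b",
    3, "c",
    4, "d",
    5, "e",
    6, "f",
    7, "g",
)

val firstFive = numbers.head()    // no argument: the default of 5 rows
val firstTwo = numbers.head(2)    // explicit row count
val sameAsTake = numbers.take(2)  // `take` selects the same rows
```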
+31
View File
@@ -0,0 +1,31 @@
[//]: # (title: implode)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Returns a [`DataFrame`](DataFrame.md) in which values of the given columns are merged into lists, grouped by the remaining columns.
```text
implode(dropNA = false) [ { columns } ]
```
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
**Parameters:**
* `dropNA` — if `true`, removes `NA` values from merged lists.
**Reverse operation:** [`explode`](explode.md)
Imploded columns will change their types:
* `T` to `List<T>`
* [`DataRow`](DataRow.md) to [`DataFrame`](DataFrame.md)
An imploded [`ColumnGroup`](DataColumn.md#columngroup) is converted into a [`FrameColumn`](DataColumn.md#framecolumn).
<!---FUN implode-->
```kotlin
df.implode { name and age and weight and isHappy }
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.implode.html" width="100%"/>
<!---END-->
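To make the `T` to `List<T>` change concrete, here is a minimal sketch on a hypothetical two-column frame (the `visits` frame and its values are assumptions for illustration). It uses the String overload of `implode`, so the `name` values are collected into one list per distinct `city`:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: two rows share the same city
val visits = dataFrameOf("city", "name")(
    "London", "Alice",
    "London", "Bob",
    "Paris", "Charlie",
)

// Merge "name" values into lists, grouped by the remaining column "city"
val imploded = visits.implode("name")
```

The result has one row per city, and the `name` column now holds lists of strings instead of single strings.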
+23
View File
@@ -0,0 +1,23 @@
[//]: # (title: Indexing)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
<!---FUN getCell-->
<tabs>
<tab title="Properties">
```kotlin
df.age[1]
df[1].age
```
</tab>
<tab title="Strings">
```kotlin
df["age"][1]
df[1]["age"]
```
</tab></tabs>
<!---END-->
+11
View File
@@ -0,0 +1,11 @@
[//]: # (title: inferType)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
Changes the type of the selected columns based on the runtime values stored in these columns.
The resulting type of the column will be the nearest common supertype of all column values.
```text
inferType [ { columns } ]
```
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
+13
View File
@@ -0,0 +1,13 @@
[//]: # (title: General info)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
General information about [`DataFrame`](DataFrame.md):
* [`count()`](count.md) / [`rowsCount()`](rowsCount.md) — number of rows
* [`countDistinct()`](countDistinct.md) — number of distinct rows
* [`columnsCount()`](columnsCount.md) — number of columns
* [`columnNames()`](columnNames.md) — list of column names
* [`columnTypes()`](columnTypes.md) — list of column types
* [`head(n)`](head.md) — first n rows (default 5)
* [`schema()`](schema.md) — schema of columns
* [`describe()`](describe.md) — general statistics for every column
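A few of these calls can be sketched on a small hypothetical frame (the `people` frame and its values are assumptions for illustration). The third row repeats the first so that the row count and the distinct-row count differ:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: the third row duplicates the first
val people = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
    "Alice", 15,
)

val rows = people.rowsCount()              // 3
val distinctRows = people.countDistinct()  // 2
val cols = people.columnsCount()           // 2
val names = people.columnNames()           // [name, age]
```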
+58
View File
@@ -0,0 +1,58 @@
# tail
<web-summary>
Discover `tail` operation in Kotlin DataFrame.
</web-summary>
<card-summary>
Discover `tail` operation in Kotlin DataFrame.
</card-summary>
<link-summary>
Discover `tail` operation in Kotlin DataFrame.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.info.TailSamples-->
Returns a [`DataFrame`](DataFrame.md) with the last `numRows` rows.
This is equivalent to calling [`takeLast`](sliceRows.md#takelast) with the same `numRows` argument.
By default, `numRows = 5`.
```kotlin
df.tail(numRows: Int = 5)
```
**Related operations**: [`head`](head.md), [`takeLast`](sliceRows.md#takelast), [`take`](sliceRows.md#take).
### Examples
<!---FUN notebook_test_tail_1-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_tail_1.html" width="100%" height="500px"></inline-frame>
Default last 5 rows:
<!---FUN notebook_test_tail_2-->
```kotlin
df.tail()
```
<!---END-->
<inline-frame src="./resources/notebook_test_tail_2.html" width="100%" height="500px"></inline-frame>
Specify number of rows:
<!---FUN notebook_test_tail_3-->
```kotlin
df.tail(numRows = 2)
```
<!---END-->
<inline-frame src="./resources/notebook_test_tail_3.html" width="100%" height="500px"></inline-frame>
