init research

2026-02-08 11:20:43 -10:00
commit bdf064f54d
3041 changed files with 1592200 additions and 0 deletions
@@ -0,0 +1,24 @@
# asIterable
<web-summary>
Discover the `asIterable` operation in Kotlin DataFrame.
</web-summary>
<card-summary>
Discover the `asIterable` operation in Kotlin DataFrame.
</card-summary>
<link-summary>
Discover the `asIterable` operation in Kotlin DataFrame.
</link-summary>
Returns values of this [`DataColumn`](DataColumn.md) as
[Iterable](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.collections/-iterable/).
```kotlin
col.asIterable()
```
**Related operation**: [](asSequenceColumn.md).
@@ -0,0 +1,24 @@
# asSequence
<web-summary>
Discover the `asSequence` operation in Kotlin DataFrame.
</web-summary>
<card-summary>
Discover the `asSequence` operation in Kotlin DataFrame.
</card-summary>
<link-summary>
Discover the `asSequence` operation in Kotlin DataFrame.
</link-summary>
Returns values of this [`DataColumn`](DataColumn.md) as
[Sequence](https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.sequences/-sequence/).
```kotlin
col.asSequence()
```
**Related operation**: [](asIterable.md).
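
A minimal sketch of why the choice between the two views can matter, using a plain `List<Int>` as a stand-in for column values. The names `values`, `eagerCalls`, and `lazyCalls` are illustrative and not part of the library. The sketch relies only on standard Kotlin behaviour: `Iterable` operations run eagerly over every element, while `Sequence` operations are lazy and stop as soon as the terminal operation is satisfied.

```kotlin
// Stand-in for column values; a real DataColumn is not used here.
val values = listOf(15, 20, 100, 45, 30)

// Counts how many elements the eager Iterable pipeline passes through map.
fun eagerCalls(): Int {
    var calls = 0
    values.asIterable().map { calls++; it * 2 }.first { it > 30 }
    return calls
}

// Counts how many elements the lazy Sequence pipeline passes through map.
fun lazyCalls(): Int {
    var calls = 0
    values.asSequence().map { calls++; it * 2 }.first { it > 30 }
    return calls
}

fun main() {
    // The eager pipeline maps all 5 values; the lazy one stops after the 2nd.
    println("eager: ${eagerCalls()}, lazy: ${lazyCalls()}") // eager: 5, lazy: 2
}
```

For short columns the difference is negligible; the lazy form tends to pay off for long pipelines that terminate early.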
@@ -0,0 +1,57 @@
# between
<web-summary>
Return a Boolean DataColumn indicating whether each value lies between two bounds.
</web-summary>
<card-summary>
Return a Boolean DataColumn indicating whether each value lies between two bounds.
</card-summary>
<link-summary>
Return a Boolean DataColumn indicating whether each value lies between two bounds.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.column.BetweenSamples-->
Returns a [`DataColumn`](DataColumn.md) of `Boolean` values indicating whether each element in this column
lies between the given lower and upper boundaries.
If `includeBoundaries` is `true` (default), values equal to the lower or upper boundary are also considered in range.
```kotlin
col.between(left, right, includeBoundaries)
```
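
The boundary rule can be sketched with the standard library alone. The function `betweenSketch` below is a hypothetical stand-in that works on a `List<Int>` instead of a `DataColumn`; it is meant to illustrate the semantics described above, not the library's implementation.

```kotlin
// Hypothetical stand-in for the column operation: one Boolean per element.
fun List<Int>.betweenSketch(
    left: Int,
    right: Int,
    includeBoundaries: Boolean = true,
): List<Boolean> = map { value ->
    if (includeBoundaries) value in left..right   // left <= value <= right
    else value > left && value < right            // left <  value <  right
}

fun main() {
    val ages = listOf(15, 18, 21, 25, 40) // illustrative data
    println(ages.betweenSketch(18, 25))
    // [false, true, true, true, false]
    println(ages.betweenSketch(18, 25, includeBoundaries = false))
    // [false, false, true, false, false]
}
```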
### Examples
<!---FUN notebook_test_between_1-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_between_1.html" width="100%" height="500px"></inline-frame>
Check ages are between 18 and 25 inclusive:
<!---FUN notebook_test_between_2-->
```kotlin
df.age.between(left = 18, right = 25)
```
<!---END-->
<inline-frame src="./resources/notebook_test_between_2.html" width="100%" height="500px"></inline-frame>
Strictly between 18 and 25 (excluding boundaries):
<!---FUN notebook_test_between_3-->
```kotlin
df.age.between(left = 18, right = 25, includeBoundaries = false)
```
<!---END-->
<inline-frame src="./resources/notebook_test_between_3.html" width="100%" height="500px"></inline-frame>
@@ -0,0 +1,397 @@
[//]: # (title: join)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.multiple.JoinSamples-->
Joins two [`DataFrame`](DataFrame.md) objects by join columns.
A *join* creates a new dataframe by combining rows from two input dataframes according to one or more key columns.
Rows are merged when the values in the join columns match.
If there is no match, whether the row is included and how missing values are filled depends on the type of join (e.g., inner, left, right, full).
Returns a new [`DataFrame`](DataFrame.md) that contains the merged rows and columns from both inputs.
```kotlin
join(otherDf, type = JoinType.Inner) [ { joinColumns } ]
joinColumns: JoinDsl.(LeftDataFrame) -> Columns
interface JoinDsl: LeftDataFrame {
    val right: RightDataFrame
    fun DataColumn.match(rightColumn: DataColumn)
}
```
`joinColumns` is a special case of [columns selector](ColumnSelectors.md) that defines column mapping for join.
Related operations: [](multipleDataFrames.md)
## Examples
### Join with explicit keys (with different names) {collapsible="true"}
Use the Join DSL when the key column names differ:
- access the right `DataFrame` via `right`;
- define the join condition with **`match`**.
<!---FUN notebook_test_join_3-->
```kotlin
dfAges
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_3.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_5-->
```kotlin
dfCities
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_5.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_6-->
```kotlin
// INNER JOIN on differently named keys:
// Merge a row when dfAges.firstName == dfCities.name.
// With the given data all 3 names match → all rows merge.
dfAges.join(dfCities) { firstName match right.name }
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_6.html" width="100%" height="500px"></inline-frame>
### Join with explicit keys (with the same names) {collapsible="true"}
If mapped columns have the same name, just select join columns (one or several) from the left [`DataFrame`](DataFrame.md):
<!---FUN notebook_test_join_8-->
```kotlin
dfLeft
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_8.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_10-->
```kotlin
dfRight
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_10.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_11-->
```kotlin
// INNER JOIN on "name" only:
// Merge when left.name == right.name.
// Duplicate keys produce multiple merged rows (one per pairing).
dfLeft.join(dfRight) { name }
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_11.html" width="100%" height="500px"></inline-frame>
> In this example, the "city" columns from the left and right dataframes do not match each other.
> After joining, the "city" column from the right dataframe is included in the result dataframe
> with the name **"city1"** to avoid a name conflict.
> { style = "note" }
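
The effect of duplicate keys and of the non-key "city" column can be modelled with plain Kotlin collections. This is a sketch under assumptions: the row classes, the sample values, and `joinOnName` are illustrative, and the field `city1` merely mirrors the renamed column mentioned in the note above.

```kotlin
data class LeftRow(val name: String, val city: String)
data class RightRow(val name: String, val city: String)

// Merged row: `city` comes from the left side, `city1` from the right side.
data class Merged(val name: String, val city: String, val city1: String)

// Inner join on `name` only: every left row is paired with every right row
// that has the same name, so duplicate keys yield one row per pairing.
fun joinOnName(left: List<LeftRow>, right: List<RightRow>): List<Merged> =
    left.flatMap { l ->
        right.filter { it.name == l.name }
            .map { r -> Merged(l.name, l.city, r.city) }
    }

fun main() {
    val left = listOf(LeftRow("Alice", "London"), LeftRow("Bob", "Dubai"))
    val right = listOf(
        RightRow("Alice", "London"),
        RightRow("Alice", "Paris"),
        RightRow("Bob", "Tokyo"),
    )
    // "Alice" appears twice on the right, so she produces two merged rows.
    joinOnName(left, right).forEach(::println)
}
```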
### Join with implicit keys (all columns with the same name) {collapsible="true"}
If `joinColumns` is not specified, columns with the same name from both [`DataFrame`](DataFrame.md)
objects will be used as join columns:
<!---FUN dfLeftImplicit-->
```kotlin
dfLeft
```
<!---END-->
<inline-frame src="./resources/dfLeftImplicit.html" width="100%" height="500px"></inline-frame>
<!---FUN dfRightImplicit-->
```kotlin
dfRight
```
<!---END-->
<inline-frame src="./resources/dfRightImplicit.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_12-->
```kotlin
// INNER JOIN on all same-named columns ("name" and "city"):
// Merge when BOTH name AND city are equal; otherwise the row is dropped.
dfLeft.join(dfRight)
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_12.html" width="100%" height="500px"></inline-frame>
## Join types
Supported join types:
* `Inner` (default) — only matched rows from left and right [`DataFrame`](DataFrame.md) objects
* `Filter` — only matched rows from left [`DataFrame`](DataFrame.md)
* `Left` — all rows from left [`DataFrame`](DataFrame.md), mismatches from right [`DataFrame`](DataFrame.md) filled with `null`
* `Right` — all rows from right [`DataFrame`](DataFrame.md), mismatches from left [`DataFrame`](DataFrame.md) filled with `null`
* `Full` — all rows from left and right [`DataFrame`](DataFrame.md) objects, any mismatches filled with `null`
* `Exclude` — only mismatched rows from left [`DataFrame`](DataFrame.md)
For every join type there is a shortcut operation:
```kotlin
df.innerJoin(otherDf) [ { joinColumns } ]
df.filterJoin(otherDf) [ { joinColumns } ]
df.leftJoin(otherDf) [ { joinColumns } ]
df.rightJoin(otherDf) [ { joinColumns } ]
df.fullJoin(otherDf) [ { joinColumns } ]
df.excludeJoin(otherDf) [ { joinColumns } ]
```
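
The six join types can be compared side by side with a standard-library model. Everything in this sketch is an assumption made for illustration: the row classes, the `*Sketch` function names, and the sample data are not part of the library, and the row order of the real operations may differ. A `null` side in `Joined` stands for the columns that would be filled with `null`.

```kotlin
data class LeftRow(val name: String, val city: String, val age: Int)
data class RightRow(val name: String, val city: String, val isBusy: Boolean)

// A merged row; a null side models the "filled with null" columns.
data class Joined(val left: LeftRow?, val right: RightRow?)

// Join key is the (name, city) pair on both sides.
fun key(l: LeftRow) = l.name to l.city
fun key(r: RightRow) = r.name to r.city

fun innerJoinSketch(ls: List<LeftRow>, rs: List<RightRow>): List<Joined> =
    ls.flatMap { l -> rs.filter { key(it) == key(l) }.map { Joined(l, it) } }

fun filterJoinSketch(ls: List<LeftRow>, rs: List<RightRow>): List<LeftRow> =
    ls.filter { l -> rs.any { key(it) == key(l) } }

fun excludeJoinSketch(ls: List<LeftRow>, rs: List<RightRow>): List<LeftRow> =
    ls.filter { l -> rs.none { key(it) == key(l) } }

fun leftJoinSketch(ls: List<LeftRow>, rs: List<RightRow>): List<Joined> =
    ls.flatMap { l ->
        val matches = rs.filter { key(it) == key(l) }
        if (matches.isEmpty()) listOf(Joined(l, null)) else matches.map { Joined(l, it) }
    }

fun rightJoinSketch(ls: List<LeftRow>, rs: List<RightRow>): List<Joined> =
    rs.flatMap { r ->
        val matches = ls.filter { key(it) == key(r) }
        if (matches.isEmpty()) listOf(Joined(null, r)) else matches.map { Joined(it, r) }
    }

// Full join: the left join plus the right rows that matched nothing.
fun fullJoinSketch(ls: List<LeftRow>, rs: List<RightRow>): List<Joined> =
    leftJoinSketch(ls, rs) +
        rs.filter { r -> ls.none { key(it) == key(r) } }.map { Joined(null, it) }

fun main() {
    val ls = listOf(LeftRow("Alice", "London", 15), LeftRow("Bob", "Dubai", 20))
    val rs = listOf(RightRow("Alice", "London", true), RightRow("Bob", "Tokyo", false))
    println("inner:   ${innerJoinSketch(ls, rs).size} row(s)")   // 1
    println("filter:  ${filterJoinSketch(ls, rs).size} row(s)")  // 1
    println("left:    ${leftJoinSketch(ls, rs).size} row(s)")    // 2
    println("right:   ${rightJoinSketch(ls, rs).size} row(s)")   // 2
    println("full:    ${fullJoinSketch(ls, rs).size} row(s)")    // 3
    println("exclude: ${excludeJoinSketch(ls, rs).size} row(s)") // 1
}
```

Note that `Filter` and `Exclude` return only left rows, so together they partition the left input.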
### Examples {id="examples_1"}
#### Inner {collapsible="true"}
<!---FUN notebook_test_join_13-->
```kotlin
dfLeft
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_13.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_14-->
```kotlin
dfRight
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_14.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_15-->
```kotlin
// INNER JOIN:
// Combine columns from the left and right dataframes
// and keep only rows where (name, city) matches on both sides.
dfLeft.innerJoin(dfRight) { name and city }
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_15.html" width="100%" height="500px"></inline-frame>
#### Filter {collapsible="true"}
<!---FUN notebook_test_join_13-->
```kotlin
dfLeft
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_13.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_14-->
```kotlin
dfRight
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_14.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_16-->
```kotlin
// FILTER JOIN:
// Keep ONLY left rows that have ANY match on (name, city).
// No right-side columns are added.
dfLeft.filterJoin(dfRight) { name and city }
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_16.html" width="100%" height="500px"></inline-frame>
#### Left {collapsible="true"}
<!---FUN notebook_test_join_13-->
```kotlin
dfLeft
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_13.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_14-->
```kotlin
dfRight
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_14.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_17-->
```kotlin
// LEFT JOIN:
// Keep ALL left rows and add columns from the right dataframe.
// If (name, city) matches, attach the right columns' values from
// the corresponding row in the right dataframe;
// if not (e.g. ("Bob", "Dubai") row), fill them with `null`.
dfLeft.leftJoin(dfRight) { name and city }
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_17.html" width="100%" height="500px"></inline-frame>
#### Right {collapsible="true"}
<!---FUN notebook_test_join_13-->
```kotlin
dfLeft
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_13.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_14-->
```kotlin
dfRight
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_14.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_18-->
```kotlin
// RIGHT JOIN:
// Keep ALL right rows and add columns from the left dataframe.
// If (name, city) matches, attach the left columns' values from
// the corresponding row in the left dataframe;
// if not (e.g. ("Bob", "Tokyo") row), fill them with `null`.
dfLeft.rightJoin(dfRight) { name and city }
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_18.html" width="100%" height="500px"></inline-frame>
#### Full {collapsible="true"}
<!---FUN notebook_test_join_13-->
```kotlin
dfLeft
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_13.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_14-->
```kotlin
dfRight
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_14.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_19-->
```kotlin
// FULL JOIN:
// Keep ALL rows from both sides. Where there's no match on (name, city),
// the other side is filled with nulls.
dfLeft.fullJoin(dfRight) { name and city }
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_19.html" width="100%" height="500px"></inline-frame>
#### Exclude {collapsible="true"}
<!---FUN notebook_test_join_13-->
```kotlin
dfLeft
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_13.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_14-->
```kotlin
dfRight
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_14.html" width="100%" height="500px"></inline-frame>
<!---FUN notebook_test_join_20-->
```kotlin
// EXCLUDE JOIN:
// Keep ONLY left rows that have NO match on (name, city).
// Useful to find "unpaired" left rows.
dfLeft.excludeJoin(dfRight) { name and city }
```
<!---END-->
<inline-frame src="./resources/notebook_test_join_20.html" width="100%" height="500px"></inline-frame>
@@ -0,0 +1,22 @@
# Utility functions
<web-summary>
Overview of common utility operations in Kotlin Dataframe.
</web-summary>
<card-summary>
Overview of common utility operations in Kotlin Dataframe.
</card-summary>
<link-summary>
Overview of common utility operations in Kotlin Dataframe.
</link-summary>
Explore frequently used helpers for querying and transforming your data:
- [`all`](all.md) — Check whether all rows satisfy a predicate.
- [`any`](any.md) — Check whether any row satisfies a predicate.
- [`chunked`](chunked.md) — Split a [`DataFrame`](DataFrame.md) into consecutive chunks and return them as a
[`FrameColumn`](DataColumn.md#framecolumn).
- [`shuffle`](shuffle.md) — Randomly reorder rows.
@@ -0,0 +1,70 @@
# all
<web-summary>
Discover the `all` operation in Kotlin DataFrame.
</web-summary>
<card-summary>
Discover the `all` operation in Kotlin DataFrame.
</card-summary>
<link-summary>
Discover the `all` operation in Kotlin DataFrame.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.utils.AllSamples-->
Checks if all rows in the [](DataFrame.md) satisfy the predicate.
Returns `Boolean`: `true` if every row satisfies the predicate, `false` otherwise.
```kotlin
df.all { rowCondition }
rowCondition: (DataRow) -> Boolean
```
**Related operations**: [](any.md), [](filter.md), [](single.md), [](count.md).
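
As a rough model of the row predicate, the sketch below applies the standard-library `all` to a list of data-class rows. `Person`, `people`, and `allSketch` are hypothetical names, and the sample values are invented for illustration; the real operation works on `DataRow` objects.

```kotlin
// Illustrative row type; a real DataRow is not used here.
data class Person(val name: String, val age: Int)

val people = listOf(Person("Alice", 15), Person("Bob", 20), Person("Charlie", 21))

// True only if the condition holds for every row.
fun allSketch(rows: List<Person>, rowCondition: (Person) -> Boolean): Boolean =
    rows.all(rowCondition)

fun main() {
    println(allSketch(people) { it.age > 21 })                                 // false
    println(allSketch(people) { it.name.first().isUpperCase() && it.age >= 15 }) // true
}
```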
### Examples
<!---FUN notebook_test_all_3-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_all_3.html" width="100%" height="500px"></inline-frame>
Check if all persons' `age` is greater than 21:
<!---FUN notebook_test_all_4-->
```kotlin
df.all { age > 21 }
```
<!---END-->
Output:
```text
false
```
Check if all persons have a `name` that starts with an uppercase letter and an `age` greater than or equal to 15:
<!---FUN notebook_test_all_5-->
```kotlin
df.all { name.first().isUpperCase() && age >= 15 }
```
<!---END-->
Output:
```text
true
```
@@ -0,0 +1,70 @@
# any
<web-summary>
Discover the `any` operation in Kotlin DataFrame.
</web-summary>
<card-summary>
Discover the `any` operation in Kotlin DataFrame.
</card-summary>
<link-summary>
Discover the `any` operation in Kotlin DataFrame.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.utils.AnySamples-->
Checks if there is at least one row in the [](DataFrame.md) that satisfies the predicate.
Returns `Boolean`: `true` if there is at least one row that satisfies the predicate, `false` otherwise.
```kotlin
df.any { rowCondition }
rowCondition: (DataRow) -> Boolean
```
**Related operations**: [](all.md), [](filter.md), [](single.md), [](count.md).
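
A small standard-library sketch of the same idea, assuming rows are modelled as data-class instances. `Person`, `people`, and `anySketch` are hypothetical names with invented sample values. The last line shows the usual relationship between the two checks: "some row satisfies the condition" is the negation of "every row fails it".

```kotlin
// Illustrative row type; a real DataRow is not used here.
data class Person(val name: String, val age: Int)

val people = listOf(Person("Alice", 15), Person("Bob", 20), Person("Charlie", 21))

// True as soon as at least one row satisfies the condition.
fun anySketch(rows: List<Person>, rowCondition: (Person) -> Boolean): Boolean =
    rows.any(rowCondition)

fun main() {
    println(anySketch(people) { it.age > 21 })                        // false
    println(anySketch(people) { it.age == 15 && it.name == "Alice" }) // true

    // any(p) is equivalent to !all(!p).
    val olderThan21: (Person) -> Boolean = { it.age > 21 }
    println(anySketch(people, olderThan21) == !people.all { !olderThan21(it) }) // true
}
```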
### Examples
<!---FUN notebook_test_any_3-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_any_3.html" width="100%" height="500px"></inline-frame>
Check if any person's `age` is greater than 21:
<!---FUN notebook_test_any_4-->
```kotlin
df.any { age > 21 }
```
<!---END-->
Output:
```text
false
```
Check if there is any person with `age` equal to 15 and `name` equal to "Alice":
<!---FUN notebook_test_any_5-->
```kotlin
df.any { age == 15 && name == "Alice" }
```
<!---END-->
Output:
```text
true
```
@@ -0,0 +1,64 @@
# chunked
<web-summary>
Discover the `chunked` operation in Kotlin DataFrame.
</web-summary>
<card-summary>
Discover the `chunked` operation in Kotlin DataFrame.
</card-summary>
<link-summary>
Discover the `chunked` operation in Kotlin DataFrame.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.utils.ChunkedSamples-->
Splits a [`DataFrame`](DataFrame.md) into consecutive sub-dataframes (chunks) and returns them as a
[`FrameColumn`](DataColumn.md#framecolumn). Chunks are formed in order and do not overlap.
Each chunk contains at most the specified number of rows.
The resulting `FrameColumn`'s name can be customized; by default, it is "groups".
`DataFrame` can be split into chunks in two ways:
- By fixed size: split into chunks of up to the given size.
- By start indices: split using custom zero-based start indices for each chunk; each chunk ends right before the next start index or the end of the DataFrame.
```kotlin
df.chunked(size: Int, name: String)
df.chunked(startIndices: List<Int>, name: String)
```
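
Both splitting strategies can be sketched on a plain list. The helpers `chunkBySize` and `chunkByStartIndices` are hypothetical and only model the rules described above; the second one assumes the start indices are sorted in ascending order and lie within the bounds of the list.

```kotlin
// Fixed-size chunks: the standard library already provides this for lists.
fun <T> chunkBySize(rows: List<T>, size: Int): List<List<T>> = rows.chunked(size)

// Chunks by zero-based start indices: each chunk ends right before the next
// start index, and the last chunk runs to the end of the list.
fun <T> chunkByStartIndices(rows: List<T>, startIndices: List<Int>): List<List<T>> =
    startIndices.mapIndexed { i, start ->
        val end = if (i + 1 < startIndices.size) startIndices[i + 1] else rows.size
        rows.subList(start, end)
    }

fun main() {
    val rows = listOf("a", "b", "c", "d", "e") // illustrative rows
    println(chunkBySize(rows, 2))                        // [[a, b], [c, d], [e]]
    println(chunkByStartIndices(rows, listOf(0, 1, 3)))  // [[a], [b, c], [d, e]]
}
```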
### Examples
<!---FUN notebook_test_chunked_1-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_chunked_1.html" width="100%" height="500px"></inline-frame>
Fixed size chunks:
<!---FUN notebook_test_chunked_2-->
```kotlin
df.chunked(size = 2)
```
<!---END-->
<inline-frame src="./resources/notebook_test_chunked_2.html" width="100%" height="500px"></inline-frame>
Custom start indices:
<!---FUN notebook_test_chunked_3-->
```kotlin
df.chunked(startIndices = listOf(0, 1, 3), name = "segments")
```
<!---END-->
<inline-frame src="./resources/notebook_test_chunked_3.html" width="100%" height="500px"></inline-frame>
@@ -0,0 +1,47 @@
# shuffle
<web-summary>
Discover the `shuffle` operation in Kotlin DataFrame.
</web-summary>
<card-summary>
Discover the `shuffle` operation in Kotlin DataFrame.
</card-summary>
<link-summary>
Discover the `shuffle` operation in Kotlin DataFrame.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.utils.ShuffleSamples-->
Returns a new [`DataFrame`](DataFrame.md) with rows in random order.
You can supply a [kotlin.random.Random](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.random/-random/)
instance with a fixed seed for reproducible results.
```kotlin
df.shuffle()
df.shuffle(random: Random)
```
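
Why a fixed seed gives reproducible results can be shown with the standard library's `shuffled`. `shuffleSketch` is a hypothetical stand-in that reorders a plain list rather than the rows of a `DataFrame`.

```kotlin
import kotlin.random.Random

// Returns a new list with the same elements in random order.
fun <T> shuffleSketch(rows: List<T>, random: Random = Random.Default): List<T> =
    rows.shuffled(random)

fun main() {
    val rows = listOf("Alice", "Bob", "Charlie", "Dave") // illustrative rows
    val first = shuffleSketch(rows, Random(42))
    val second = shuffleSketch(rows, Random(42))

    println(first == second)                  // true: same seed, same order
    println(first.sorted() == rows.sorted())  // true: no rows lost or added
}
```

Each call needs its own freshly seeded `Random`; reusing one instance across calls advances its state and generally produces different orders.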
### Examples
<!---FUN notebook_test_shuffle_1-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_shuffle_1.html" width="100%" height="500px"></inline-frame>
Deterministic shuffle using a fixed seed:
<!---FUN notebook_test_shuffle_2-->
```kotlin
df.shuffle(Random(42))
```
<!---END-->
<inline-frame src="./resources/notebook_test_shuffle_2.html" width="100%" height="500px"></inline-frame>