init research

2026-02-08 11:20:43 -10:00
commit bdf064f54d
3041 changed files with 1592200 additions and 0 deletions
@@ -0,0 +1,350 @@
+[//]: # (title: pivot)
+
+<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
+
+Splits the rows of a [`DataFrame`](DataFrame.md) and groups them horizontally into new columns based on values from one or several columns of the original [`DataFrame`](DataFrame.md).
+
+```text
+pivot (inward = true) { pivotColumns }
+  [ .groupBy { indexColumns } | .groupByOther() ]
+  [ .default(defaultValue) ]
+     reducer | aggregator
+    
+reducer = .minBy { column } | .maxBy { column } | .first [ { rowCondition } ] | .last [ { rowCondition } ] 
+          .with { rowExpression } | .values { valueColumns }
+
+aggregator = .count() | .matches() | .frames() | .with { rowExpression } | .values { valueColumns } | .aggregate { aggregations } | .<stat> [ { columns } ]
+```
+
+See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
+
+**Parameters:**
+* `inward` — if `true` generated columns are nested inside the original column, otherwise they will be top-level
+* `pivotColumns` — columns with values for horizontal data grouping and generation of new columns
+* `indexColumns` — columns with values for vertical data grouping
+* `defaultValue` — value to fill mismatched pivot-index column pairs
+* `valueColumns` — columns with output values
+
+<!---FUN pivot-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city }
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot("city")
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivot.html" width="100%"/>
+<!---END-->
+
+To pivot several columns at once, you can combine them using `and` or `then` infix function:
+* `and` will pivot columns independently
+* `then` will create column hierarchy from combinations of values from pivoted columns
+
+<!---FUN pivot2-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city and name.firstName }
+df.pivot { city then name.firstName }
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot { "city" and "name"["firstName"] }
+df.pivot { "city" then "name"["firstName"] }
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivot2.html" width="100%"/>
+<!---END-->
+
+## pivot + groupBy
+
+To create a matrix table that is expanded both horizontally and vertically,
+apply [`groupBy`](groupBy.md) transformation passing the columns for vertical grouping. 
+Reversed order of `pivot` and [`groupBy`](groupBy.md) will produce the same result.
+
+<!---FUN pivotGroupBy-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city }.groupBy { name }
+// same as
+df.groupBy { name }.pivot { city }
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot("city").groupBy("name")
+// same as
+df.groupBy("name").pivot("city")
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotGroupBy.html" width="100%"/>
+<!---END-->
+
+To group by all columns except pivoted use `groupByOther`:
+
+<!---FUN pivotGroupByOther-->
+
+```kotlin
+df.pivot { city }.groupByOther()
+```
+
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotGroupByOther.html" width="100%"/>
+<!---END-->
+
+<!---TODO ## Reducing--->
+
+## Aggregation
+
+To aggregate data groups with one or several statistics use `aggregate`:
+
+<!---FUN pivotAggregate-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city }.aggregate { minBy { age }.name }
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot("city").aggregate { minBy("age")["name"] }
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotAggregate.html" width="100%"/>
+<!---END-->
+
+<!---FUN pivotAggregate1-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city }.groupBy { name.firstName }.aggregate {
+    meanFor { age and weight } into "means"
+    stdFor { age and weight } into "stds"
+    maxByOrNull { weight }?.name?.lastName into "biggest"
+}
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot("city").groupBy { "name"["firstName"] }.aggregate {
+    meanFor("age", "weight") into "means"
+    stdFor("age", "weight") into "stds"
+    maxByOrNull("weight")?.getColumnGroup("name")?.get("lastName") into "biggest"
+}
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotAggregate1.html" width="100%"/>
+<!---END-->
+
+Shortcuts for common aggregation functions are also available:
+
+<!---FUN pivotCommonAggregations-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city }.maxFor { age and weight }
+df.groupBy { name }.pivot { city }.median { age }
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot("city").maxFor("age", "weight")
+df.groupBy("name").pivot("city").median("age")
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotCommonAggregations.html" width="100%"/>
+<!---END-->
+
+By default, when aggregation function produces several values for single data group, 
+column hierarchy in resulting [`DataFrame`](DataFrame.md) 
+will be indexed first by pivot keys and then by the names of aggregated values.
+To reverse this order so that resulting columns will be indexed first by names of aggregated values and then by pivot keys, 
+use `separate=true` flag that is available in multi-result aggregation operations, such as `aggregate` or `<stat>For`:
+
+<!---FUN pivotSeparate-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city }.maxFor(separate = true) { age and weight }
+df.pivot { city }.aggregate(separate = true) {
+    min { age } into "min age"
+    maxOrNull { weight } into "max weight"
+}
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot("city").maxFor("age", "weight", separate = true)
+df.pivot("city").aggregate(separate = true) {
+    min("age") into "min age"
+    maxOrNull("weight") into "max weight"
+}
+```
+
+</tab></tabs>
+<!---END-->
+
+By default, any aggregation function will result in `null` value for those matrix cells, where intersection of column and row keys produced an empty data group.
+You can specify default value for any aggregation by `default` infix function. This value will replace all `null` results of aggregation function over non-empty data groups as well.
+To use one default value for all aggregation functions, use `default()` before aggregation.
+
+<!---FUN pivotDefault-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city }.groupBy { name }.aggregate { min { age } default 0 }
+df.pivot { city }.groupBy { name }.default(0).min()
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot("city").groupBy("name").aggregate { min("age") default 0 }
+df.pivot("city").groupBy("name").default(0).min()
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotDefault.html" width="100%"/>
+<!---END-->
+
+<!---FUN pivotDefault1-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.pivot { city }.groupBy { name }.aggregate {
+    median { age } into "median age" default 0
+    minOrNull { weight } into "min weight" default 100
+}
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.pivot("city").groupBy("name").aggregate {
+    median("age") into "median age" default 0
+    minOrNull("weight") into "min weight" default 100
+}
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotDefault1.html" width="100%"/>
+<!---END-->
+
+### Pivot inside aggregate
+
+pivot transformation can be used inside [`aggregate`](groupBy.md#aggregation) function of [`groupBy`](groupBy.md). 
+This allows combining column pivoting with other [`groupBy`](groupBy.md) aggregations:
+
+<!---FUN pivotInAggregate-->
+<tabs>
+<tab title="Properties">
+
+```kotlin
+df.groupBy { name.firstName }.aggregate {
+    pivot { city }.aggregate(separate = true) {
+        mean { age } into "mean age"
+        count() into "count"
+    }
+    count() into "total"
+}
+```
+
+</tab>
+<tab title="Strings">
+
+```kotlin
+df.groupBy { "name"["firstName"] }.aggregate {
+    pivot("city").aggregate(separate = true) {
+        mean("age") into "mean age"
+        count() into "count"
+    }
+    count() into "total"
+}
+```
+
+</tab></tabs>
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotInAggregate.html" width="100%"/>
+<!---END-->
+
+### pivotCounts
+
+Pivots with [`count`](count.md) statistics one or several columns preserving all other columns of [`DataFrame`](DataFrame.md) 
+or [`GroupBy DataFrame`](groupBy.md#transformation).
+
+<!---FUN pivotCounts-->
+
+```kotlin
+df.pivotCounts { city }
+// same as
+df.pivot { city }.groupByOther().count()
+
+df.groupBy { name }.pivotCounts { city }
+// same as
+df.groupBy { name }.pivot { city }.count()
+// same as
+df.groupBy { name }.aggregate {
+    pivotCounts { city }
+}
+```
+
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotCounts.html" width="100%"/>
+<!---END-->
+
+### pivotMatches
+
+Pivots with `Boolean` statistics one or several columns preserving all other columns of [`DataFrame`](DataFrame.md).
+
+<!---FUN pivotMatches-->
+
+```kotlin
+df.pivotMatches { city }
+// same as
+df.pivot { city }.groupByOther().matches()
+
+df.groupBy { name }.pivotMatches { city }
+// same as
+df.groupBy { name }.pivot { city }.matches()
+// same as
+df.groupBy { name }.aggregate {
+    pivotMatches { city }
+}
+```
+
+<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotMatches.html" width="100%"/>
+<!---END-->