Files
2026-02-08 11:20:43 -10:00

351 lines
9.2 KiB
Markdown
Vendored

[//]: # (title: pivot)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Analyze-->
Splits the rows of a [`DataFrame`](DataFrame.md) and groups them horizontally into new columns based on values from one or several columns of the original [`DataFrame`](DataFrame.md).
```text
pivot (inward = true) { pivotColumns }
[ .groupBy { indexColumns } | .groupByOther() ]
[ .default(defaultValue) ]
reducer | aggregator
reducer = .minBy { column } | .maxBy { column } | .first [ { rowCondition } ] | .last [ { rowCondition } ]
.with { rowExpression } | .values { valueColumns }
aggregator = .count() | .matches() | .frames() | .with { rowExpression } | .values { valueColumns } | .aggregate { aggregations } | .<stat> [ { columns } ]
```
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
**Parameters:**
* `inward` — if `true` generated columns are nested inside the original column, otherwise they will be top-level
* `pivotColumns` — columns with values for horizontal data grouping and generation of new columns
* `indexColumns` — columns with values for vertical data grouping
* `defaultValue` — value to fill mismatched pivot-index column pairs
* `valueColumns` — columns with output values
<!---FUN pivot-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city }
```
</tab>
<tab title="Strings">
```kotlin
df.pivot("city")
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivot.html" width="100%"/>
<!---END-->
To pivot several columns at once, you can combine them using `and` or `then` infix function:
* `and` will pivot columns independently
* `then` will create column hierarchy from combinations of values from pivoted columns
<!---FUN pivot2-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city and name.firstName }
df.pivot { city then name.firstName }
```
</tab>
<tab title="Strings">
```kotlin
df.pivot { "city" and "name"["firstName"] }
df.pivot { "city" then "name"["firstName"] }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivot2.html" width="100%"/>
<!---END-->
## pivot + groupBy
To create a matrix table that is expanded both horizontally and vertically,
apply [`groupBy`](groupBy.md) transformation passing the columns for vertical grouping.
Reversed order of `pivot` and [`groupBy`](groupBy.md) will produce the same result.
<!---FUN pivotGroupBy-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city }.groupBy { name }
// same as
df.groupBy { name }.pivot { city }
```
</tab>
<tab title="Strings">
```kotlin
df.pivot("city").groupBy("name")
// same as
df.groupBy("name").pivot("city")
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotGroupBy.html" width="100%"/>
<!---END-->
To group by all columns except pivoted use `groupByOther`:
<!---FUN pivotGroupByOther-->
```kotlin
df.pivot { city }.groupByOther()
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotGroupByOther.html" width="100%"/>
<!---END-->
<!---TODO ## Reducing--->
## Aggregation
To aggregate data groups with one or several statistics use `aggregate`:
<!---FUN pivotAggregate-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city }.aggregate { minBy { age }.name }
```
</tab>
<tab title="Strings">
```kotlin
df.pivot("city").aggregate { minBy("age")["name"] }
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotAggregate.html" width="100%"/>
<!---END-->
<!---FUN pivotAggregate1-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city }.groupBy { name.firstName }.aggregate {
meanFor { age and weight } into "means"
stdFor { age and weight } into "stds"
maxByOrNull { weight }?.name?.lastName into "biggest"
}
```
</tab>
<tab title="Strings">
```kotlin
df.pivot("city").groupBy { "name"["firstName"] }.aggregate {
meanFor("age", "weight") into "means"
stdFor("age", "weight") into "stds"
maxByOrNull("weight")?.getColumnGroup("name")?.get("lastName") into "biggest"
}
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotAggregate1.html" width="100%"/>
<!---END-->
Shortcuts for common aggregation functions are also available:
<!---FUN pivotCommonAggregations-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city }.maxFor { age and weight }
df.groupBy { name }.pivot { city }.median { age }
```
</tab>
<tab title="Strings">
```kotlin
df.pivot("city").maxFor("age", "weight")
df.groupBy("name").pivot("city").median("age")
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotCommonAggregations.html" width="100%"/>
<!---END-->
By default, when aggregation function produces several values for single data group,
column hierarchy in resulting [`DataFrame`](DataFrame.md)
will be indexed first by pivot keys and then by the names of aggregated values.
To reverse this order so that resulting columns will be indexed first by names of aggregated values and then by pivot keys,
use `separate=true` flag that is available in multi-result aggregation operations, such as `aggregate` or `<stat>For`:
<!---FUN pivotSeparate-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city }.maxFor(separate = true) { age and weight }
df.pivot { city }.aggregate(separate = true) {
min { age } into "min age"
maxOrNull { weight } into "max weight"
}
```
</tab>
<tab title="Strings">
```kotlin
df.pivot("city").maxFor("age", "weight", separate = true)
df.pivot("city").aggregate(separate = true) {
min("age") into "min age"
maxOrNull("weight") into "max weight"
}
```
</tab></tabs>
<!---END-->
By default, any aggregation function will result in `null` value for those matrix cells, where intersection of column and row keys produced an empty data group.
You can specify default value for any aggregation by `default` infix function. This value will replace all `null` results of aggregation function over non-empty data groups as well.
To use one default value for all aggregation functions, use `default()` before aggregation.
<!---FUN pivotDefault-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city }.groupBy { name }.aggregate { min { age } default 0 }
df.pivot { city }.groupBy { name }.default(0).min()
```
</tab>
<tab title="Strings">
```kotlin
df.pivot("city").groupBy("name").aggregate { min("age") default 0 }
df.pivot("city").groupBy("name").default(0).min()
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotDefault.html" width="100%"/>
<!---END-->
<!---FUN pivotDefault1-->
<tabs>
<tab title="Properties">
```kotlin
df.pivot { city }.groupBy { name }.aggregate {
median { age } into "median age" default 0
minOrNull { weight } into "min weight" default 100
}
```
</tab>
<tab title="Strings">
```kotlin
df.pivot("city").groupBy("name").aggregate {
median("age") into "median age" default 0
minOrNull("weight") into "min weight" default 100
}
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotDefault1.html" width="100%"/>
<!---END-->
### Pivot inside aggregate
pivot transformation can be used inside [`aggregate`](groupBy.md#aggregation) function of [`groupBy`](groupBy.md).
This allows combining column pivoting with other [`groupBy`](groupBy.md) aggregations:
<!---FUN pivotInAggregate-->
<tabs>
<tab title="Properties">
```kotlin
df.groupBy { name.firstName }.aggregate {
pivot { city }.aggregate(separate = true) {
mean { age } into "mean age"
count() into "count"
}
count() into "total"
}
```
</tab>
<tab title="Strings">
```kotlin
df.groupBy { "name"["firstName"] }.aggregate {
pivot("city").aggregate(separate = true) {
mean("age") into "mean age"
count() into "count"
}
count() into "total"
}
```
</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotInAggregate.html" width="100%"/>
<!---END-->
### pivotCounts
Pivots with [`count`](count.md) statistics one or several columns preserving all other columns of [`DataFrame`](DataFrame.md)
or [`GroupBy DataFrame`](groupBy.md#transformation).
<!---FUN pivotCounts-->
```kotlin
df.pivotCounts { city }
// same as
df.pivot { city }.groupByOther().count()
df.groupBy { name }.pivotCounts { city }
// same as
df.groupBy { name }.pivot { city }.count()
// same as
df.groupBy { name }.aggregate {
pivotCounts { city }
}
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotCounts.html" width="100%"/>
<!---END-->
### pivotMatches
Pivots with `Boolean` statistics one or several columns preserving all other columns of [`DataFrame`](DataFrame.md).
<!---FUN pivotMatches-->
```kotlin
df.pivotMatches { city }
// same as
df.pivot { city }.groupByOther().matches()
df.groupBy { name }.pivotMatches { city }
// same as
df.groupBy { name }.pivot { city }.matches()
// same as
df.groupBy { name }.aggregate {
pivotMatches { city }
}
```
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Analyze.pivotMatches.html" width="100%"/>
<!---END-->