179 lines
4.8 KiB
Markdown
Vendored
179 lines
4.8 KiB
Markdown
Vendored
[//]: # (title: split)
|
|
|
|
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
|
|
|
|
This operation splits every value in the given columns into several values
|
|
and optionally spreads them horizontally or vertically.
|
|
|
|
```text
|
|
df.split { columns }
|
|
[.cast<Type>()]
|
|
[.by(delimiters|regex [,trim=true][,ignoreCase=true][,limit=0]) | .by { splitter } | .match(regex)] // how to split cell value
|
|
[.default(value)] // how to fill nulls
|
|
.into(columnNames) [ { columnNamesGenerator } ] | .inward(columnNames) [ { columnNamesGenerator } | .inplace() | .intoRows() | .intoColumns() ] // where to store results
|
|
|
|
splitter = DataRow.(T) -> Iterable<Any>
|
|
columnNamesGenerator = DataColumn.(columnIndex: Int) -> String
|
|
```
|
|
The following types of columns can be split easily:
|
|
* `String`: for instance, by `","`
|
|
* `List`: splits into elements, no `by` required!
|
|
* [`DataFrame`](DataFrame.md): splits into rows, no `by` required!
|
|
|
|
**Related operations**: [](splitMerge.md)
|
|
|
|
See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.
|
|
|
|
## Split in place
|
|
|
|
Stores split values as lists in their original columns.
|
|
|
|
Use the `.inplace()` terminal operation in your `split` configuration to spread split values in place:
|
|
|
|
<!---FUN splitInplace-->
|
|
<tabs>
|
|
<tab title="Properties">
|
|
|
|
```kotlin
|
|
df.split { name.firstName }.by { it.asIterable() }.inplace()
|
|
```
|
|
|
|
</tab>
|
|
<tab title="Strings">
|
|
|
|
```kotlin
|
|
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.inplace()
|
|
```
|
|
|
|
</tab></tabs>
|
|
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitInplace.html" width="100%"/>
|
|
<!---END-->
|
|
|
|
## Split horizontally
|
|
|
|
Stores split values in new columns.
|
|
* `into(col1, col2, ... )` — stores split values in new top-level columns
|
|
* `inward(col1, col2, ...)` — stores split values in new columns nested inside the original column
|
|
* `intoColumns` — splits [`FrameColumns`](DataColumn.md#framecolumn) into [`ColumnGroups`](DataColumn.md#columngroup) storing in every cell in a `List` of the original values per column
|
|
|
|
**Reverse operation:** [`merge`](merge.md)
|
|
|
|
`columnNamesGenerator` is used to generate names for additional columns when the list of explicitly specified `columnNames` is not long enough.
|
|
`columnIndex` starts with `1` for the first additional column name.
|
|
|
|
The default `columnNamesGenerator` generates column names like `split1`, `split2`, etc.
|
|
|
|
Some examples:
|
|
|
|
<!---FUN split-->
|
|
<tabs>
|
|
<tab title="Properties">
|
|
|
|
```kotlin
|
|
df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")
|
|
```
|
|
|
|
</tab>
|
|
<tab title="Strings">
|
|
|
|
```kotlin
|
|
df.split { "name"["lastName"]<String>() }.by { it.asIterable() }.into("char1", "char2")
|
|
```
|
|
|
|
</tab></tabs>
|
|
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.split.html" width="100%"/>
|
|
<!---END-->
|
|
|
|
<!---FUN split1-->
|
|
<tabs>
|
|
<tab title="Properties">
|
|
|
|
```kotlin
|
|
df.split { name.lastName }
|
|
.by { it.asIterable() }.default(' ')
|
|
.inward { "char$it" }
|
|
```
|
|
|
|
</tab>
|
|
<tab title="Strings">
|
|
|
|
```kotlin
|
|
df.split { "name"["lastName"]<String>() }
|
|
.by { it.asIterable() }.default(' ')
|
|
.inward { "char$it" }
|
|
```
|
|
|
|
</tab></tabs>
|
|
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.split1.html" width="100%"/>
|
|
<!---END-->
|
|
|
|
`String` columns can also be split into group matches of [`Regex`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/-regex/) patterns:
|
|
|
|
<!---FUN splitRegex1-->
|
|
|
|
```kotlin
|
|
merged.split { "name"<String>() }
|
|
.match("""(.*) \((.*)\)""")
|
|
.inward("firstName", "lastName")
|
|
```
|
|
|
|
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitRegex1.html" width="100%"/>
|
|
<!---END-->
|
|
|
|
[`FrameColumn`](DataColumn.md#framecolumn) can be split into columns:
|
|
|
|
<!---FUN splitFrameColumn-->
|
|
|
|
```kotlin
|
|
val df1 = dataFrameOf("a", "b", "c")(
|
|
1, 2, 3,
|
|
4, 5, 6,
|
|
)
|
|
val df2 = dataFrameOf("a", "b")(
|
|
5, 6,
|
|
7, 8,
|
|
9, 10,
|
|
)
|
|
val df = dataFrameOf(
|
|
"id" to columnOf("x", "y"),
|
|
"group" to columnOf(df1, df2)
|
|
)
|
|
|
|
df.split { "group"<AnyFrame>() }.intoColumns()
|
|
```
|
|
|
|
<!---END-->
|
|
|
|
## Split vertically
|
|
|
|
Stores split values in new rows, duplicating values in other columns.
|
|
|
|
**Reverse operation:** [`implode`](implode.md)
|
|
|
|
Use the `.intoRows()` terminal operation in your `split` configuration to spread split values vertically:
|
|
|
|
<!---FUN splitIntoRows-->
|
|
<tabs>
|
|
<tab title="Properties">
|
|
|
|
```kotlin
|
|
df.split { name.firstName }.by { it.asIterable() }.intoRows()
|
|
|
|
df.split { name }.by { it.values() }.intoRows()
|
|
```
|
|
|
|
</tab>
|
|
<tab title="Strings">
|
|
|
|
```kotlin
|
|
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.intoRows()
|
|
|
|
df.split { colGroup("name") }.by { it.values() }.intoRows()
|
|
```
|
|
|
|
</tab></tabs>
|
|
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitIntoRows.html" width="100%"/>
|
|
<!---END-->
|
|
|
|
Equals to `split { column }...inplace().explode { column }`. See [`explode`](explode.md) for details.
|