df-research/dataframe/docs/StardustDocs/topics/split.md

[//]: # (title: split)

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->

This operation splits every value in the given columns into several values
and optionally spreads them horizontally or vertically.

```text
df.split { columns }
    [.cast<Type>()]
    [.by(delimiters|regex [,trim=true][,ignoreCase=true][,limit=0]) | .by { splitter } | .match(regex)] // how to split cell value
    [.default(value)] // how to fill nulls
    .into(columnNames) [ { columnNamesGenerator } ] | .inward(columnNames) [ { columnNamesGenerator } | .inplace() | .intoRows() | .intoColumns() ] // where to store results

splitter = DataRow.(T) -> Iterable<Any>
columnNamesGenerator = DataColumn.(columnIndex: Int) -> String
```
The following types of columns can be split easily:
* `String`: for instance, by `","`
* `List`: splits into elements, no `by` required!
* [`DataFrame`](DataFrame.md): splits into rows, no `by` required!

**Related operations**: [](splitMerge.md)

See [column selectors](ColumnSelectors.md) for how to select the columns for this operation.

## Split in place

Stores split values as lists in their original columns.

Use the `.inplace()` terminal operation in your `split` configuration to spread split values in place:

<!---FUN splitInplace-->
<tabs>
<tab title="Properties">

```kotlin
df.split { name.firstName }.by { it.asIterable() }.inplace()
```

</tab>
<tab title="Strings">

```kotlin
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.inplace()
```

</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitInplace.html" width="100%"/>
<!---END-->

## Split horizontally

Stores split values in new columns.
* `into(col1, col2, ... )` — stores split values in new top-level columns
* `inward(col1, col2, ...)` — stores split values in new columns nested inside the original column
* `intoColumns` — splits [`FrameColumns`](DataColumn.md#framecolumn) into [`ColumnGroups`](DataColumn.md#columngroup) storing in every cell in a `List` of the original values per column

**Reverse operation:** [`merge`](merge.md)

`columnNamesGenerator` is used to generate names for additional columns when the list of explicitly specified `columnNames` is not long enough.
`columnIndex` starts with `1` for the first additional column name.

The default `columnNamesGenerator` generates column names like `split1`, `split2`, etc.

Some examples:

<!---FUN split-->
<tabs>
<tab title="Properties">

```kotlin
df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")
```

</tab>
<tab title="Strings">

```kotlin
df.split { "name"["lastName"]<String>() }.by { it.asIterable() }.into("char1", "char2")
```

</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.split.html" width="100%"/>
<!---END-->

<!---FUN split1-->
<tabs>
<tab title="Properties">

```kotlin
df.split { name.lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }
```

</tab>
<tab title="Strings">

```kotlin
df.split { "name"["lastName"]<String>() }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }
```

</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.split1.html" width="100%"/>
<!---END-->

`String` columns can also be split into group matches of [`Regex`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/-regex/) patterns:

<!---FUN splitRegex1-->

```kotlin
merged.split { "name"<String>() }
    .match("""(.*) \((.*)\)""")
    .inward("firstName", "lastName")
```

<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitRegex1.html" width="100%"/>
<!---END-->

[`FrameColumn`](DataColumn.md#framecolumn) can be split into columns:

<!---FUN splitFrameColumn-->

```kotlin
val df1 = dataFrameOf("a", "b", "c")(
    1, 2, 3,
    4, 5, 6,
)
val df2 = dataFrameOf("a", "b")(
    5, 6,
    7, 8,
    9, 10,
)
val df = dataFrameOf(
    "id" to columnOf("x", "y"),
    "group" to columnOf(df1, df2)
)

df.split { "group"<AnyFrame>() }.intoColumns()
```

<!---END-->

## Split vertically

Stores split values in new rows, duplicating values in other columns.

**Reverse operation:** [`implode`](implode.md)

Use the `.intoRows()` terminal operation in your `split` configuration to spread split values vertically:

<!---FUN splitIntoRows-->
<tabs>
<tab title="Properties">

```kotlin
df.split { name.firstName }.by { it.asIterable() }.intoRows()

df.split { name }.by { it.values() }.intoRows()
```

</tab>
<tab title="Strings">

```kotlin
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.intoRows()

df.split { colGroup("name") }.by { it.values() }.intoRows()
```

</tab></tabs>
<inline-frame src="resources/org.jetbrains.kotlinx.dataframe.samples.api.Modify.splitIntoRows.html" width="100%"/>
<!---END-->

Equals to `split { column }...inplace().explode { column }`. See [`explode`](explode.md) for details.