126 lines
4.4 KiB
Markdown
Vendored
126 lines
4.4 KiB
Markdown
Vendored
[//]: # (title: Interop with Collections)
|
|
|
|
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->
|
|
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Collections-->
|
|
|
|
_Kotlin DataFrame_ and _Kotlin Collection_ represent two different approaches to data storage:
|
|
* [`DataFrame`](DataFrame.md) stores data by fields/columns
|
|
* `Collection` stores data by records/rows
|
|
|
|
Although [`DataFrame`](DataFrame.md)
|
|
doesn't implement the [`Collection`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-collection/#kotlin.collections.Collection)
|
|
or [`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/)
|
|
interface, it has many similar operations,
|
|
such as [`filter`](filter.md), [`take`](sliceRows.md#take),
|
|
[`first`](first.md), [`map`](map.md), [`groupBy`](groupBy.md) etc.
|
|
|
|
[`DataFrame`](DataFrame.md) has two-way compatibility with [`Map`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-map/) and [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/):
|
|
* `List<T>` -> `DataFrame<T>`: [toDataFrame](createDataFrame.md#dataframe-from-iterable-t)
|
|
* `DataFrame<T>` -> `List<T>`: [toList](toList.md)
|
|
* `Map<String, List<*>>` -> `DataFrame<*>`: [toDataFrame](createDataFrame.md#dataframe-from-map-string-list)
|
|
* `DataFrame<*>` -> `Map<String, List<*>>`: [toMap](toMap.md)
|
|
* `List<List<T>>` -> `DataFrame<*>`: [toDataFrame](createDataFrame.md#dataframe-from-list-list-t)
|
|
|
|
Columns, rows, and values of [`DataFrame`](DataFrame.md)
|
|
can be accessed as [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/),
|
|
[`Iterable`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-iterable/)
|
|
and [`Sequence`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/-sequence/) accordingly:
|
|
|
|
<!---FUN getRowsColumns-->
|
|
|
|
```kotlin
|
|
df.columns() // List<DataColumn>
|
|
df.rows() // Iterable<DataRow>
|
|
df.values() // Sequence<Any?>
|
|
```
|
|
|
|
<!---END-->
|
|
|
|
## Interop with data classes
|
|
|
|
[`DataFrame`](DataFrame.md) can be used as an intermediate object for transformation from one data structure to another.
|
|
|
|
Assume you have a list of instances of some [data class](https://kotlinlang.org/docs/data-classes.html) that you need to transform into some other format.
|
|
|
|
<!---FUN listInterop1-->
|
|
|
|
```kotlin
|
|
data class Input(val a: Int, val b: Int)
|
|
|
|
val list = listOf(Input(1, 2), Input(3, 4))
|
|
```
|
|
|
|
<!---END-->
|
|
|
|
You can convert this list into [`DataFrame`](DataFrame.md) using [`toDataFrame()`](createDataFrame.md#todataframe) extension:
|
|
|
|
<!---FUN listInterop2-->
|
|
|
|
```kotlin
|
|
val df = list.toDataFrame()
|
|
```
|
|
|
|
<!---END-->
|
|
|
|
Mark the original data class with [`DataSchema`](schemas.md)
|
|
annotation to get [extension properties](extensionPropertiesApi.md) and perform data transformations.
|
|
|
|
<!---FUN listInterop3-->
|
|
|
|
```kotlin
|
|
@DataSchema
|
|
data class Input(val a: Int, val b: Int)
|
|
|
|
val df2 = df.add("c") { a + b }
|
|
```
|
|
|
|
<!---END-->
|
|
|
|
<tip>
|
|
|
|
To enable extension properties generation, you should use the [DataFrame plugin](schemasGradle.md)
|
|
for Gradle or the [Kotlin Jupyter kernel](SetupJupyter.md)
|
|
|
|
</tip>
|
|
|
|
After your data is transformed, [`DataFrame`](DataFrame.md) instances can be exported eagerly
|
|
into [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) of another data class using [toList](toList.md) or [toListOf](toList.md#tolistof) extensions:
|
|
|
|
<!---FUN listInterop4-->
|
|
|
|
```kotlin
|
|
data class Output(val a: Int, val b: Int, val c: Int)
|
|
|
|
val result = df2.toListOf<Output>()
|
|
```
|
|
|
|
<!---END-->
|
|
|
|
```kotlin
|
|
data class Output(val a: Int, val b: Int, val c: Int)
|
|
|
|
val result = df2.toListOf<Output>()
|
|
```
|
|
|
|
Alternatively, one can create lazy [`Sequence`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-sequence/) objects.
|
|
This avoids holding the entire list of objects in memory as objects are created on the fly as needed.
|
|
|
|
<!---FUN listInterop5-->
|
|
|
|
```kotlin
|
|
val df = dataFrameOf("name", "lastName", "age")("John", "Doe", 21)
|
|
.group("name", "lastName").into("fullName")
|
|
|
|
data class FullName(val name: String, val lastName: String)
|
|
|
|
data class Person(val fullName: FullName, val age: Int)
|
|
|
|
val persons = df.toListOf<Person>() // [Person(fullName = FullName(name = "John", lastName = "Doe"), age = 21)]
|
|
```
|
|
|
|
<!---END-->
|
|
|
|
### Converting columns with object instances to ColumnGroup
|
|
|
|
[unfold](unfold.md) can be used as [`toDataFrame()`](createDataFrame.md#todataframe) analogue for specific columns inside existing dataframes
|