# Kotlin DataFrame: typesafe in-memory structured data processing for JVM

[JetBrains official project](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub) · [Kotlin component stability](https://kotlinlang.org/docs/components-stability.html) · [Kotlin](http://kotlinlang.org) · [Maven Central](https://search.maven.org/artifact/org.jetbrains.kotlinx/dataframe) · [Apache 2.0 license](http://www.apache.org/licenses/LICENSE-2.0) · [Launch on Binder](https://mybinder.org/v2/gh/Kotlin/dataframe/HEAD)

Kotlin DataFrame aims to reconcile Kotlin's static typing with the dynamic nature of data by utilizing both the full power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and the REPL.

* **Hierarchical** — represents hierarchical data structures, such as JSON or a tree of JVM objects.
* **Functional** — the data processing pipeline is organized as a chain of `DataFrame` transformation operations.
* **Immutable** — every operation returns a new instance of `DataFrame`, reusing the underlying storage wherever possible.
* **Readable** — data transformation operations are defined in a DSL close to natural language.
* **Practical** — provides simple solutions for common problems and the ability to perform complex tasks.
* **Minimalistic** — a simple yet powerful data model of three column kinds.
* **Interoperable** — convertible to and from Kotlin data classes and collections. This also means conversion to/from other libraries' data structures is usually quite straightforward!
* **Generic** — can store objects of any type, not only numbers or strings.
* **Typesafe** — on-the-fly [generation of extension properties](https://kotlin.github.io/dataframe/extensionpropertiesapi.html) for type-safe data access with Kotlin-style care for null safety.
* **Polymorphic** — type compatibility derives from column schema compatibility. You can define a function that requires a specific subset of columns in a dataframe but doesn't care about other columns. In notebooks this works out of the box; in ordinary projects it requires casting (for now).
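
To make the **Interoperable** and **Immutable** points concrete, here is a minimal sketch that turns a list of data class instances into a `DataFrame`, adds a column, and converts the rows back. The `Repo` class, the `reposFrame` helper, and all values are hypothetical and exist only for this illustration; the calls are imported via a wildcard from `org.jetbrains.kotlinx.dataframe.api`.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical data class, used only for this illustration.
data class Repo(val name: String, val stars: Int)

// Interoperable: each property of `Repo` becomes a column of the frame.
fun reposFrame(): DataFrame<Repo> =
    listOf(Repo("alpha", 120), Repo("beta", 30)).toDataFrame()

fun main() {
    val df = reposFrame()

    // Immutable: `add` returns a new DataFrame and leaves `df` untouched.
    val flagged = df.add("popular") { (it["stars"] as Int) > 50 }
    println(df.columnNames())
    println(flagged.columnNames())

    // Interoperable in the other direction: rows map back onto the data class.
    val back: List<Repo> = df.toListOf<Repo>()
    println(back)
}
```

Because `add` returns a new frame, `df` keeps only its `name` and `stars` columns; the extra `popular` column exists only in `flagged`.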

Integrates with [Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html). Inspired by [krangl](https://github.com/holgerbrandl/krangl), Kotlin Collections, and [pandas](https://pandas.pydata.org/).

## 🚀 Quickstart

Looking for a fast and simple way to learn the basics? Get started in minutes with our [Quickstart Guide](https://kotlin.github.io/dataframe/quickstart.html).

It walks you through the core features of Kotlin DataFrame with minimal setup and clear examples, so you can get up to speed quickly.

## Documentation

Explore the [**documentation**](https://kotlin.github.io/dataframe) for details.

You can find the following articles there:

* [Guides and Examples](https://kotlin.github.io/dataframe/guides-and-examples.html)
* [Get started with Kotlin DataFrame](https://kotlin.github.io/dataframe/setup.html)
* [Working with Data Schemas](https://kotlin.github.io/dataframe/schemas.html)
* [Set up the compiler plugin in a Gradle project](https://kotlin.github.io/dataframe/compiler-plugin.html)
* [Full list of supported operations](https://kotlin.github.io/dataframe/operations.html)
* [Reading from SQL databases](https://kotlin.github.io/dataframe/readsqldatabases.html)
* [Reading/writing from/to different file formats like JSON, CSV, Apache Arrow](https://kotlin.github.io/dataframe/read.html)
* [Joining dataframes](https://kotlin.github.io/dataframe/join.html)
* [GroupBy operation](https://kotlin.github.io/dataframe/groupby.html)
* [Rendering to HTML](https://kotlin.github.io/dataframe/tohtml.html#jupyter-notebooks)

### What's new

1.0.0-Beta4: [Release notes](https://github.com/Kotlin/dataframe/releases/tag/v1.0.0-Beta4)

Check out this [notebook with new features](examples/notebooks/feature_overviews/0.15/new_features.ipynb) in v0.15.

## Setup

> For more detailed instructions on how to get started with Kotlin DataFrame, refer to the [Getting Started](https://kotlin.github.io/dataframe/setup.html) guide.

### Kotlin Notebook

You can use Kotlin DataFrame in [Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html) or in other interactive environments with [Kotlin Jupyter Kernel](https://github.com/Kotlin/kotlin-jupyter) support, such as [Datalore](https://datalore.jetbrains.com/) and [Jupyter Notebook](https://jupyter.org/).

You can include all the necessary dependencies and imports in the notebook using *line magic*:

```
%use dataframe
```

This adds the version of `dataframe` bundled with the selected Kotlin Jupyter kernel.
You can use `%useLatestDescriptors` to get the latest stable version without updating the Kotlin kernel:

```
%useLatestDescriptors
%use dataframe
```

Or specify the version manually:

```
%use dataframe(1.0.0-Beta4n)
```

> [!WARNING]
> Please use Kotlin Jupyter kernel version `0.16.0-736` or higher for descriptor compatibility.
>
> Use the `1.0.0-Beta4n` version in Kotlin Notebook.
> Due to [a known issue](https://github.com/Kotlin/dataframe/issues/1116),
> the standard `dataframe:1.0.0-Beta4` version does not work correctly in notebooks.
>
> If you use [`kandy`](https://github.com/Kotlin/kandy) in your notebook, add it after `dataframe`:
> ```kotlin
> %useLatestDescriptors
> %use dataframe, kandy
> ```

Refer to [Setup Kotlin DataFrame in Kotlin Notebook](https://kotlin.github.io/dataframe/setupkotlinnotebook.html) for details.

### Gradle

Add the dependency to your `build.gradle.kts` script:

```kotlin
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta4")
}
```

Make sure that you have `mavenCentral()` in the list of repositories:

```kotlin
repositories {
    mavenCentral()
}
```

Refer to [Get started with Kotlin DataFrame on Gradle](https://kotlin.github.io/dataframe/setupgradle.html) for detailed setup instructions (including Groovy DSL).

* You can also check the [Custom Gradle Configuration](https://kotlin.github.io/dataframe/setupcustomgradle.html) if you want to leave out dependencies for formats you don't need.
* For Android projects, see [Setup Kotlin DataFrame on Android](https://kotlin.github.io/dataframe/setupandroid.html).
* See [IDEA Gradle example projects](examples/idea-examples) and [the Gradle project with the Kotlin DataFrame Compiler plugin](examples/kotlin-dataframe-plugin-gradle-example).
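
After adding the dependency, a tiny program can confirm that the library resolves and runs. This is a minimal sketch, not an official template: the function name `smokeTestFrame` and the placeholder columns `id` and `label` are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical placeholder data, used only to check that the library loads.
fun smokeTestFrame(): AnyFrame = dataFrameOf(
    "id" to listOf(1, 2, 3),
    "label" to listOf("a", "b", "c"),
)

fun main() {
    val df = smokeTestFrame()
    // Print the frame itself, then its dimensions.
    println(df)
    println("rows = ${df.rowsCount()}, columns = ${df.columnsCount()}")
}
```

If this compiles and prints a three-row, two-column frame, the dependency and repository configuration are in place.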

### Maven

Add the dependency to your `pom.xml` file:

```xml
<dependency>
    <groupId>org.jetbrains.kotlinx</groupId>
    <artifactId>dataframe</artifactId>
    <version>1.0.0-Beta4</version>
</dependency>
```

Make sure that you have `mavenCentral` in the list of repositories:

```xml
<repositories>
    <repository>
        <id>mavenCentral</id>
        <url>https://repo1.maven.org/maven2/</url>
    </repository>
</repositories>
```

Refer to [Get started with Kotlin DataFrame on Maven](https://kotlin.github.io/dataframe/setupmaven.html) for details.

* See [the Maven project with the Kotlin DataFrame Compiler plugin](examples/kotlin-dataframe-plugin-gradle-example).

## Code example

This is an example of Kotlin DataFrame code with the [Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html) enabled.
See [the full project](https://github.com/Kotlin/dataframe/tree/master/examples/kotlin-dataframe-plugin-gradle-example).
See also [this example in Kotlin Notebook](https://github.com/Kotlin/dataframe/tree/master/examples/notebooks/readme_example.ipynb).

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*

// The `Repositories` schema class is declared in the full example project (not shown here).

fun main() {
    val repos = DataFrame
        // Read DataFrame from the CSV file.
        .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
        // And convert it to match the `Repositories` schema.
        .convertTo<Repositories>()

    // Update the DataFrame.
    val reposUpdated = repos
        // Rename columns to CamelCase.
        .renameToCamelCase()
        // Rename "stargazersCount" column to "stars".
        .rename { stargazersCount }.into("stars")
        // Filter by the number of stars.
        .filter { stars > 50 }
        // Convert values in the "topics" column (which were `String` initially)
        // to a list of topics.
        .convert { topics }.with {
            val inner = it.removeSurrounding("[", "]")
            if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
        }
        // Add a new column with the number of topics.
        .add("topicCount") { topics.size }

    // Write the updated DataFrame to a CSV file.
    reposUpdated.writeCsv("jetbrains_repositories_new.csv")
}
```
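
If the typed accessors generated by the compiler plugin are not available, the same steps can be written with column names as strings. The following is a minimal sketch of that variant under stated assumptions: a small in-memory frame stands in for the CSV file, and the helpers `sampleRepos`, `parseTopics`, and `updateRepos`, as well as all values, are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical stand-in for the CSV file; names and values are made up.
fun sampleRepos(): AnyFrame = dataFrameOf(
    "full_name" to listOf("JetBrains/alpha", "JetBrains/beta"),
    "stargazers_count" to listOf(120, 30),
    "topics" to listOf("[kotlin, data]", "[]"),
)

// Same parsing logic as the `convert` lambda in the example above.
fun parseTopics(raw: String): List<String> {
    val inner = raw.removeSurrounding("[", "]")
    return if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
}

// The pipeline from the example above, addressing columns by name.
fun updateRepos(repos: AnyFrame): AnyFrame = repos
    .renameToCamelCase()
    .rename("stargazersCount").into("stars")
    .filter { (it["stars"] as Int) > 50 }
    .convert("topics").with { parseTopics(it as String) }
    .add("topicCount") { (it["topics"] as List<*>).size }

fun main() {
    println(updateRepos(sampleRepos()))
}
```

The string-based version trades compile-time checking for independence from generated code: a misspelled column name or a wrong cast only fails at runtime.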

Explore [**more examples here**](https://kotlin.github.io/dataframe/guides-and-examples.html).

## Data model

* `DataFrame` is a list of columns with equal sizes and distinct names.
* `DataColumn` is a named list of values. It can be one of three kinds:
    * `ValueColumn` — contains data
    * `ColumnGroup` — contains columns
    * `FrameColumn` — contains dataframes
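
All three column kinds can appear in a single frame. The sketch below is an illustration under assumptions: the `Name`, `Score`, and `Student` classes, the `studentsFrame` helper, and their values are hypothetical, and it relies on `toDataFrame(maxDepth = 1)` unfolding one level of nesting, so that a nested object becomes a `ColumnGroup` and a list of objects becomes a `FrameColumn`.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.columns.ColumnGroup
import org.jetbrains.kotlinx.dataframe.columns.FrameColumn
import org.jetbrains.kotlinx.dataframe.columns.ValueColumn

// Hypothetical classes chosen so that one object graph yields all three column kinds.
data class Name(val first: String, val last: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)

// maxDepth = 1: `name` is unfolded into a ColumnGroup, `scores` into a FrameColumn,
// and `age` stays a ValueColumn.
fun studentsFrame(): DataFrame<Student> = listOf(
    Student(Name("Alice", "Smith"), 15, listOf(Score("math", 4))),
    Student(Name("Bob", "Jones"), 16, listOf(Score("math", 5), Score("art", 3))),
).toDataFrame(maxDepth = 1)

fun main() {
    val df = studentsFrame()
    println(df["age"] is ValueColumn<*>)
    println(df["name"] is ColumnGroup<*>)
    println(df["scores"] is FrameColumn<*>)
}
```

Each cell of the `scores` column holds its own small `DataFrame` with `subject` and `value` columns.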

## Visualizations

The [Kandy](https://kotlin.github.io/kandy/welcome.html) plotting library provides seamless visualizations for your dataframes.

## Kotlin, Kotlin Jupyter, Arrow, and JDK versions

This table maps each Kotlin DataFrame version to its minimum supported Java version, along with the recommended versions of Kotlin, Kotlin Jupyter, Apache Arrow, the compiler plugin, and Kandy.

| Kotlin DataFrame Version | Minimum Java Version | Kotlin Version | Kotlin Jupyter Version | Apache Arrow Version | Compiler Plugin Version | Compatible Kandy Version |
|--------------------------|----------------------|----------------|------------------------|----------------------|-------------------------|--------------------------|
| 0.10.0                   | 8                    | 1.8.20         | 0.11.0-358             | 11.0.0               |                         |                          |
| 0.10.1                   | 8                    | 1.8.20         | 0.11.0-358             | 11.0.0               |                         |                          |
| 0.11.0                   | 8                    | 1.8.20         | 0.11.0-358             | 11.0.0               |                         |                          |
| 0.11.1                   | 8                    | 1.8.20         | 0.11.0-358             | 11.0.0               |                         |                          |
| 0.12.0                   | 8                    | 1.9.0          | 0.11.0-358             | 11.0.0               |                         |                          |
| 0.12.1                   | 8                    | 1.9.0          | 0.11.0-358             | 11.0.0               |                         |                          |
| 0.13.1                   | 8                    | 1.9.22         | 0.12.0-139             | 15.0.0               |                         |                          |
| 0.14.1                   | 8                    | 2.0.20         | 0.12.0-139             | 17.0.0               |                         |                          |
| 0.15.0                   | 8                    | 2.0.20         | 0.12.0-139             | 18.1.0               |                         | 0.8.0                    |
| 1.0.0-Beta2              | 8 / 11               | 2.0.20         | 0.12.0-383             | 18.1.0               | 2.2.20-dev-3524         | 0.8.1-dev-66             |
| 1.0.0-Beta3n (notebooks) | 8 / 11               | 2.2.20         | 0.15.0-587 (K1 only)   | 18.3.0               | -                       | 0.8.1n                   |
| 1.0.0-Beta3              | 8 / 11               | 2.2.20         | 0.15.0-587             | 18.3.0               | 2.2.20 / IDEA 2025.2+   | 0.8.1                    |
| 1.0.0-Beta4n (notebooks) | 8 / 11               | 2.2.21         | 0.16.0-736             | 18.3.0               | -                       | 0.8.3                    |
| 1.0.0-Beta4              | 8 / 11               | 2.2.21         | 0.16.0-736             | 18.3.0               | 2.2.21 / IDEA 2025.2+   | 0.8.3                    |

## Code of Conduct

This project and the corresponding community are governed by the [JetBrains Open Source and Community Code of Conduct](https://confluence.jetbrains.com/display/ALL/JetBrains+Open+Source+and+Community+Code+of+Conduct). Please make sure you read it.

## License

Kotlin DataFrame is licensed under the [Apache 2.0 License](LICENSE).