Files
2026-02-08 11:20:43 -10:00

615 lines
16 KiB
Markdown
Vendored
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Modules
Kotlin DataFrame is composed of modules, allowing you to include only the functionality you need.
In addition, Kotlin DataFrame provides several [plugins](#plugins)
that significantly enhance the development experience
— making it more convenient, powerful, and enjoyable to work with.
<table>
<thead>
<tr>
<th>Module</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr><td colspan="2"><strong><a href="#dataframe-general">General</a></strong></td></tr>
<tr>
<td><a href="#dataframe-general"><code>dataframe</code></a></td>
<td>General artifact combines all core and IO artifacts except experimental ones.</td>
</tr>
<tr>
<td colspan="2"><strong><a href="#core-modules">Core</a></strong></td></tr>
<tr>
<td><code><a href="#dataframe-core">dataframe-core</a></code></td>
<td>The <a href="DataFrame.md">DataFrame</a> API and its implementation.</td>
</tr>
<tr><td colspan="2"><strong><a href="#io-modules">IO</a></strong></td></tr>
<tr>
<td><code><a href="#dataframe-json">dataframe-json</a></code></td>
<td>Provides support for JSON format writing and reading.</td>
</tr>
<tr>
<td><code><a href="#dataframe-csv">dataframe-csv</a></code></td>
<td>Provides support for CSV format writing and reading.</td>
</tr>
<tr>
<td><code><a href="#dataframe-excel">dataframe-excel</a></code></td>
<td>Provides support for XSL/XLSX format writing and reading.</td>
</tr>
<tr>
<td><code><a href="#dataframe-jdbc">dataframe-jdbc</a></code></td>
<td>Provides support for JDBC data sources reading.</td>
</tr>
<tr>
<td><code><a href="#dataframe-arrow">dataframe-arrow</a></code></td>
<td>Provides support for <a href="https://arrow.apache.org">Apache Arrow</a> format writing and reading. Provides support for reading <a href="https://parquet.apache.org/">Apache Parquet</a> files through <a href="https://arrow.apache.org/docs/java/dataset.html">Apache Arrow Dataset</a></td>
</tr>
<tr><td colspan="2"><strong><a href="#experimental-modules">Experimental modules</a></strong></td></tr>
<tr>
<td><code><a href="#dataframe-geo">dataframe-geo</a></code></td>
<td>Provides a new API for working with geospatial data and IO for geographic formats (GeoJSON, Shapefile).</td>
</tr>
<tr>
<td><code><a href="#dataframe-openapi">dataframe-openapi</a></code></td>
<td>Provides support for <a href="https://www.openapis.org">OpenAPI JSON format</a> reading and writing.</td>
</tr>
<tr>
<td><code><a href="#dataframe-openapi-generator">dataframe-openapi-generator</a></code></td>
<td>Provides <a href="schemas.md">schema generation</a> from OpenAPI specifications. Requires <code>dataframe-openapi</code>.</td>
</tr>
<tr><td colspan="2"><strong><a href="#plugins">Plugins</a></strong></td></tr>
<tr>
<td><code><a href="#kotlin.plugin.dataframe">kotlin.plugin.dataframe</a></code></td>
<td>Kotlin compiler plugin. Provides compile-time <a href="extensionPropertiesApi.md">extension properties</a> generation.
</td>
</tr>
<tr>
<td><code><a href="#kotlinx.dataframe">kotlinx.dataframe</a></code></td>
<td>Gradle plugin. Provides <a href="schemas.md">schemas generation</a> using Gradle.</td>
</tr>
<tr>
<td><code><a href="#ksp-plugin">kotlinx.dataframe:symbol-processor-all</a></code></td>
<td>KSP plugin. Provides <a href="schemas.md">schemas generation</a> using KSP.</td>
</tr>
</tbody>
</table>
## Configure the repository
All Kotlin DataFrame modules are available from the Maven Central repository.
To use them, add the appropriate dependency into your repositories mapping:
<tabs>
<tab title="Kotlin DSL">
```kotlin
repositories {
mavenCentral()
}
```
</tab>
<tab title="Groovy DSL">
```kotlin
repositories {
mavenCentral()
}
```
</tab>
</tabs>
## `dataframe` - general Kotlin DataFrame dependency {id="dataframe-general"}
General-purpose artifact that includes all [core](#core-modules) and [IO](#io-modules) modules.
Does **not** include any [experimental modules](#experimental-modules).
Recommended if you dont need fine-grained control over individual module dependencies.
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
}
```
</tab>
</tabs>
## Core Kotlin DataFrame modules {id="core-modules"}
#### `dataframe-core`
The core [DataFrame](DataFrame.md) API and its implementation.
Includes all core functionality for working with data structures, expressions, schema management, and operations.
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-core:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-core:%dataFrameVersion%'
}
```
</tab>
</tabs>
## IO Kotlin DataFrame modules {id="io-modules"}
#### `dataframe-json` {id="dataframe-json"}
Provides all logic for DataFrame to be able to work with
JSON data sources; [reading](https://kotlin.github.io/dataframe/read.html#read-from-json)
and [writing](https://kotlin.github.io/dataframe/write.html#writing-to-json).
It's based on [Kotlinx Serialization](https://github.com/Kotlin/kotlinx.serialization).
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-json:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-json:%dataFrameVersion%'
}
```
</tab>
</tabs>
#### `dataframe-csv` {id="dataframe-csv"}
Provides support for reading and writing CSV files.
Supports standard CSV format features such as delimiters, headers, and quotes.
Based on high-performance [Deephaven CSV](https://github.com/deephaven/deephaven-csv).
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%'
}
```
</tab>
</tabs>
Note that `dataframe-json` is included with `dataframe-csv` by default.
This is to support JSON structures inside CSV files.
If you don't need this functionality, you can exclude it like so:
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%") {
exclude("org.jetbrains.kotlinx", "dataframe-json")
}
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation('org.jetbrains.kotlinx:dataframe-csv:%dataFrameVersion%') {
exclude group: 'org.jetbrains.kotlinx', module: 'dataframe-json'
}
}
```
</tab>
</tabs>
#### `dataframe-excel` {id="dataframe-excel"}
Provides support for reading and writing Excel files (`.xls` and `.xlsx`).
Compatible with standard spreadsheet editors and supports embedded structured data.
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-excel:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-excel:%dataFrameVersion%'
}
```
</tab>
</tabs>
Note that `dataframe-json` is included with `dataframe-excel` by default.
This is to support JSON structures inside Excel files.
If you don't need this functionality, you can exclude it like so:
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-excel:%dataFrameVersion%") {
exclude("org.jetbrains.kotlinx", "dataframe-json")
}
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation('org.jetbrains.kotlinx:dataframe-excel:%dataFrameVersion%') {
exclude group: 'org.jetbrains.kotlinx', module: 'dataframe-json'
}
}
```
</tab>
</tabs>
#### `dataframe-jdbc` {id="dataframe-jdbc"}
Provides all logic for DataFrame to be able to work with
SQL databases that implement the JDBC protocol.
See [Read from SQL databases](https://kotlin.github.io/dataframe/readsqldatabases.html) for more information
about how to use it.
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-jdbc:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-jdbc:%dataFrameVersion%'
}
```
</tab>
</tabs>
#### `dataframe-arrow` {id="dataframe-arrow"}
Provides all logic and tests for DataFrame to be able to work with
[Apache Arrow](https://arrow.apache.org).
See [Read Apache Arrow formats](https://kotlin.github.io/dataframe/read.html#read-apache-arrow-formats) and
[Writing to Apache Arrow formats](https://kotlin.github.io/dataframe/write.html#writing-to-apache-arrow-formats)
for more information about how to use it.
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-arrow:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-arrow:%dataFrameVersion%'
}
```
</tab>
</tabs>
## Experimental Kotlin DataFrame modules {id="experimental-modules"}
These modules are experimental and may be unstable.
#### `dataframe-geo`
Provides a new API for working with geospatial data,
including reading and writing geospatial formats (GeoJSON, Shapefile),
and performing geometry-aware operations.
See [Geo guide](https://kotlin.github.io/kandy/geo-plotting-guide.html) for more details and examples.
Requires [OSGeo Repository](https://repo.osgeo.org).
<tabs>
<tab title="Kotlin DSL">
```kotlin
repositories {
maven("https://repo.osgeo.org/repository/release")
}
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-geo:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
repositories {
maven {
url 'https://repo.osgeo.org/repository/release'
}
}
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-geo:%dataFrameVersion%'
}
```
</tab>
</tabs>
#### `dataframe-openapi`
Provides functionality to support auto-generated data schemas from OpenAPI 3.0.0 specifications.
This module is a companion to [`dataframe-openapi-generator`](#dataframe-openapi-generator):
- `dataframe-openapi-generator` is used internally by the Gradle plugin and Jupyter integration
to generate data schemas from OpenAPI specs.
In the Gradle plugin, it powers the `dataschemas {}` DSL and the `@file:ImportDataSchema()` annotation.
In Jupyter, it enables the `importDataSchema()` function.
- `dataframe-openapi` must be added as a dependency to the user project in order to use those generated data schemas.
See:
- [Import OpenAPI Schemas in Gradle project](https://kotlin.github.io/dataframe/schemasimportopenapigradle.html)
- [Import Data Schemas, e.g. from OpenAPI, in Jupyter](https://kotlin.github.io/dataframe/schemasimportopenapijupyter.html)
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-openapi:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-openapi:%dataFrameVersion%'
}
```
</tab>
</tabs>
#### `dataframe-openapi-generator`
Provides the logic and tooling necessary to import OpenAPI 3.0.0 specifications
as auto-generated data schemas for Kotlin DataFrame.
This module works in conjunction with [`dataframe-openapi`](#dataframe-openapi):
- `dataframe-openapi-generator` is used internally by the Gradle plugin and Jupyter integration
to generate data schemas from OpenAPI specifications.
- In Gradle, it enables the `dataschemas {}` DSL and the `@file:ImportDataSchema()` annotation.
- In Jupyter, it powers the `importDataSchema()` function.
- `dataframe-openapi` must be added as a dependency to the user project to actually use the generated schemas.
See:
- [Import OpenAPI Schemas in Gradle project](https://kotlin.github.io/dataframe/schemasimportopenapigradle.html)
- [Import Data Schemas, e.g. from OpenAPI, in Jupyter](https://kotlin.github.io/dataframe/schemasimportopenapijupyter.html)
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
implementation("org.jetbrains.kotlinx:dataframe-openapi-generator:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
implementation 'org.jetbrains.kotlinx:dataframe-openapi-generator:%dataFrameVersion%'
}
```
</tab>
</tabs>
## Plugins
<!-- TODO improve compiler plugin setup after release--->
#### `kotlin.plugin.dataframe` — Kotlin DataFrame Compiler Plugin {id="kotlin.plugin.dataframe"}
[Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md) enables automatic generation of
[extension properties](extensionPropertiesApi.md) and updates [data schemas](schemas.md)
on-the-fly in Gradle projects, making development with Kotlin DataFrame faster,
more convenient, and fully type- and name-safe.
> Requires Kotlin 2.2.20-Beta1 or higher.
> { style = "note" }
To enable the plugin in your Gradle project, add it to the `plugins` section:
<tabs>
<tab title="Kotlin DSL">
```kotlin
plugins {
kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
}
```
</tab>
<tab title="Groovy DSL">
```groovy
plugins {
id 'org.jetbrains.kotlin.plugin.dataframe' version '%compilerPluginKotlinVersion%'
}
```
</tab>
</tabs>
Due to [this issue](https://youtrack.jetbrains.com/issue/KT-66735), incremental compilation must be disabled for now.
Add the following line to your `gradle.properties` file:
```properties
kotlin.incremental=false
```
Published as a Kotlin official plugin.
[Source code is available in the Kotlin repository](https://github.com/JetBrains/kotlin/tree/master/plugins/kotlin-dataframe).
#### `kotlinx.dataframe` Gradle Plugin {id="kotlinx.dataframe"}
> The current Gradle plugin is **under consideration for deprecation**
> and may be officially marked as deprecated in future releases.
>
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
{style="warning"}
The Gradle plugin allows generating [data schemas](schemas.md) from samples of data
(of supported formats) like JSON, CSV, Excel files, or URLs, as well as from data fetched from SQL databases
using Gradle.
See the [Gradle Plugin Reference](Gradle-Plugin.md) for installation
and usage instructions in Gradle projects.
> By default, the Gradle plugin also applies the [KSP plugin](#ksp-plugin).
<tabs>
<tab title="Kotlin DSL">
```kotlin
plugins {
id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
}
```
</tab>
<tab title="Groovy DSL">
```groovy
plugins {
id 'org.jetbrains.kotlinx.dataframe' version '%dataFrameVersion%'
}
```
</tab>
</tabs>
#### `kotlinx.dataframe:symbol-processor-all` KSP Plugin {id="ksp-plugin"}
> The KSP plugin is **not compatible with [KSP2](https://github.com/google/ksp?tab=readme-ov-file#ksp2-is-here)**
> and may **not work properly with Kotlin 2.1 or newer**.
>
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
{style="warning"}
The Gradle plugin allows generating [data schemas](schemas.md) from samples of data
(of supported formats) like JSON, CSV, Excel files, or URLs, as well as from data fetched from SQL databases
using Kotlin Symbol Processing (KSP).
This is useful for projects where you prefer or require schema generation at the source level.
See [Data Schemas in Gradle Projects](schemasGradle.md) for usage details.
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
ksp("org.jetbrains.kotlinx.dataframe:symbol-processor-all:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
ksp 'org.jetbrains.kotlinx.dataframe:symbol-processor-all:%dataFrameVersion%'
}
```
</tab>
</tabs>