[//]: # (title: Data Schemas in Gradle projects)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
>
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
{style="warning"}
In Gradle projects, the Kotlin DataFrame library provides:
1. Annotation processing for generating extension properties.
2. Annotation processing for [`DataSchema`](schemas.md) inference from datasets.
3. A Gradle task for [`DataSchema`](schemas.md) inference from datasets.
### Configuration
To use the [extension properties API](extensionPropertiesApi.md) in a Gradle project, add the `dataframe` plugin as follows:
<tabs>
<tab title="Kotlin DSL">
```kotlin
plugins {
    id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
}

dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
plugins {
    id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
}

dependencies {
    implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
}
```
</tab>
</tabs>
### Annotation processing
Declare data schemas in your code and use them to access data in [`DataFrame`](DataFrame.md) objects.
A data schema is a class or interface annotated with [`@DataSchema`](schemas.md):
```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema

@DataSchema
interface Person {
    val name: String
    val age: Int
}
```
#### Execute the `assemble` task to generate type-safe accessors for schemas:
<!---FUN useProperties-->
```kotlin
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
).cast<Person>()
// `age` is only available after executing `build` or `kspKotlin`!
val teens = df.filter { age in 10..19 }
teens.print()
```
<!---END-->
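
For intuition about what the generated accessors do, here is a hand-written sketch of roughly equivalent extension properties. It is an approximation, not the plugin's exact output: the `PersonSketch` interface is a hypothetical stand-in for `Person`, used so that the example does not clash with the code generated for the real schema.

```kotlin
import org.jetbrains.kotlinx.dataframe.ColumnsContainer
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.DataRow

// Hypothetical marker interface standing in for the `Person` schema above.
interface PersonSketch {
    val name: String
    val age: Int
}

// Column-level accessor: looks the column up by name and casts it to the declared type.
@Suppress("UNCHECKED_CAST")
val ColumnsContainer<PersonSketch>.age: DataColumn<Int>
    get() = this["age"] as DataColumn<Int>

// Row-level accessor: this is what `age` resolves to inside `filter { ... }`.
val DataRow<PersonSketch>.age: Int
    get() = this["age"] as Int
```

The real accessors are generated for every property of every `@DataSchema` declaration, so you never need to write such code by hand.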
### Schema inference
Specify the schema using your preferred method and run the `assemble` task.
<tabs>
<tab title="Method 1. Annotation processing">
The `@ImportDataSchema` annotation must be placed above the package directive.
You can import schemas from a URL or from a relative file path.
By default, relative paths are resolved against the project root directory.
To change this, [pass](https://kotlinlang.org/docs/ksp-quickstart.html#pass-options-to-processors) the `dataframe.resolutionDir`
option to the KSP processor.
For example:
```kotlin
ksp {
    arg("dataframe.resolutionDir", file("data").absolutePath)
}
```
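
With `dataframe.resolutionDir` pointing at the `data` directory as above, a relative path in the annotation would be resolved inside that directory. The sketch below assumes a hypothetical local file `data/jetbrains_repositories.csv` and a hypothetical package `org.example`; adjust both to your project.

```kotlin
// Assumption: <project root>/data/jetbrains_repositories.csv exists locally.
@file:ImportDataSchema(
    name = "Repository",
    path = "jetbrains_repositories.csv", // resolved against dataframe.resolutionDir
)

package org.example // hypothetical package; the annotation must come before this line

import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
```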
**Note that, because of incremental processing, an imported schema is regenerated only if the source code has changed
since the previous invocation, even by a single character.**
With the following configuration, the file `Repository.Generated.kt` is generated in the `build/generated/ksp/` folder, in
the same package as the file containing the annotation.
```kotlin
@file:ImportDataSchema(
    "Repository",
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)

import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
import org.jetbrains.kotlinx.dataframe.api.*
```
See the KDocs for `@ImportDataSchema` in your IDE
or on [GitHub](https://github.com/Kotlin/dataframe/blob/master/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt)
for more details.
</tab>
<tab title="Method 2. Gradle task">
Put this in `build.gradle` or `build.gradle.kts`.
With the following configuration, the file `Repository.Generated.kt` is generated
in the `build/generated/dataframe/org/example` folder.
```kotlin
dataframes {
    schema {
        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
        name = "org.example.Repository"
    }
}
```
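
As a variation, the package can be stated separately from the schema name. This is a sketch that assumes the `packageName` property of the `schema` block behaves as described in the plugin reference; check the reference before relying on it.

```kotlin
dataframes {
    schema {
        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
        // Assumption: `packageName` sets the target package, so `name` can be a simple name.
        packageName = "org.example"
        name = "Repository"
    }
}
```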
See [reference](Gradle-Plugin.md) and [examples](Gradle-Plugin.md#examples) for more details.
</tab>
</tabs>
After `assemble`, the following code should compile and run:
<!---FUN useInferredSchema-->
```kotlin
// Repository.readCsv() has argument 'path' with default value https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv
val df = Repository.readCsv()
// Use generated properties to access data in rows
df.maxBy { stargazersCount }.print()
// Or to access columns in dataframe.
print(df.fullName.count { it.contains("kotlin") })
```
<!---END-->
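
The generated properties also work in other operations. The following sketch assumes that `Repository` has been generated as described above and is visible from this file (same package or imported); the threshold of 1000 stars is arbitrary.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Reads the CSV from the default path recorded at schema generation time.
    val df = Repository.readCsv()

    df.filter { stargazersCount > 1000 }        // row-level accessor
        .sortByDesc { stargazersCount }         // column-level accessor
        .select { fullName and stargazersCount }
        .print()
}
```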