158 lines
4.3 KiB
Markdown
Vendored
158 lines
4.3 KiB
Markdown
Vendored
[//]: # (title: Data Schemas in Gradle projects)
|
|
|
|
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
|
|
|
|
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
|
|
>
|
|
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
|
|
{style="warning"}
|
|
|
|
In Gradle projects, the Kotlin DataFrame library provides
|
|
|
|
1. Annotation processing for generation of extension properties
|
|
2. Annotation processing for [`DataSchema`](schemas.md) inference from datasets.
|
|
3. Gradle task for [`DataSchema`](schemas.md) inference from datasets.
|
|
|
|
### Configuration
|
|
|
|
To use the [extension properties API](extensionPropertiesApi.md) in Gradle project add the `dataframe` plugin as follows:
|
|
|
|
<tabs>
|
|
<tab title="Kotlin DSL">
|
|
|
|
```kotlin
|
|
plugins {
|
|
id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
|
|
}
|
|
|
|
dependencies {
|
|
implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
|
|
}
|
|
```
|
|
|
|
</tab>
|
|
|
|
<tab title="Groovy DSL">
|
|
|
|
```groovy
|
|
plugins {
|
|
id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
|
|
}
|
|
|
|
dependencies {
|
|
implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
|
|
}
|
|
```
|
|
|
|
</tab>
|
|
|
|
</tabs>
|
|
|
|
### Annotation processing
|
|
|
|
Declare data schemas in your code and use them to access data in [`DataFrame`](DataFrame.md) objects.
|
|
A data schema is a class or interface annotated with [`@DataSchema`](schemas.md):
|
|
|
|
```kotlin
|
|
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
|
|
|
|
@DataSchema
|
|
interface Person {
|
|
val name: String
|
|
val age: Int
|
|
}
|
|
```
|
|
|
|
#### Execute the `assemble` task to generate type-safe accessors for schemas:
|
|
|
|
<!---FUN useProperties-->
|
|
|
|
```kotlin
|
|
val df = dataFrameOf("name", "age")(
|
|
"Alice", 15,
|
|
"Bob", 20,
|
|
).cast<Person>()
|
|
// age only available after executing `build` or `kspKotlin`!
|
|
val teens = df.filter { age in 10..19 }
|
|
teens.print()
|
|
```
|
|
|
|
<!---END-->
|
|
|
|
### Schema inference
|
|
|
|
Specify schema with preferred method and execute the `assemble` task.
|
|
|
|
<tabs>
|
|
<tab title="Method 1. Annotation processing">
|
|
|
|
`@ImportDataSchema` annotation must be above package directive.
|
|
You can import schemas from a URL or from the relative path of a file.
|
|
Relative path by default is resolved to the project root directory.
|
|
You can configure it by [passing](https://kotlinlang.org/docs/ksp-quickstart.html#pass-options-to-processors) `dataframe.resolutionDir`
|
|
option to preprocessor.
|
|
For example:
|
|
|
|
```kotlin
|
|
ksp {
|
|
arg("dataframe.resolutionDir", file("data").absolutePath)
|
|
}
|
|
```
|
|
|
|
**Note that due to incremental processing, imported schema will be re-generated only if some source code has changed
|
|
from the previous invocation, at least one character.**
|
|
|
|
For the following configuration, file `Repository.Generated.kt` will be generated to `build/generated/ksp/` folder in
|
|
the same package as file containing the annotation.
|
|
|
|
```kotlin
|
|
@file:ImportDataSchema(
|
|
"Repository",
|
|
"https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
|
|
)
|
|
|
|
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
|
|
import org.jetbrains.kotlinx.dataframe.api.*
|
|
```
|
|
|
|
See KDocs for `@ImportDataSchema` in IDE
|
|
or [GitHub](https://github.com/Kotlin/dataframe/blob/master/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt)
|
|
for more details.
|
|
|
|
</tab>
|
|
|
|
<tab title="Method 2. Gradle task">
|
|
|
|
Put this in `build.gradle` or `build.gradle.kts`
|
|
For the following configuration, file `Repository.Generated.kt` will be generated
|
|
to `build/generated/dataframe/org/example` folder.
|
|
|
|
```kotlin
|
|
dataframes {
|
|
schema {
|
|
data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
|
|
name = "org.example.Repository"
|
|
}
|
|
}
|
|
```
|
|
|
|
See [reference](Gradle-Plugin.md) and [examples](Gradle-Plugin.md#examples) for more details.
|
|
|
|
</tab>
|
|
</tabs>
|
|
|
|
After `assemble`, the following code should compile and run:
|
|
|
|
<!---FUN useInferredSchema-->
|
|
|
|
```kotlin
|
|
// Repository.readCsv() has argument 'path' with default value https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv
|
|
val df = Repository.readCsv()
|
|
// Use generated properties to access data in rows
|
|
df.maxBy { stargazersCount }.print()
|
|
// Or to access columns in dataframe.
|
|
print(df.fullName.count { it.contains("kotlin") })
|
|
```
|
|
|
|
<!---END-->
|