init research
This commit is contained in:
@@ -0,0 +1,157 @@
|
||||
[//]: # (title: Data Schemas in Gradle projects)
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
|
||||
|
||||
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
|
||||
>
|
||||
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
|
||||
{style="warning"}
|
||||
|
||||
In Gradle projects, the Kotlin DataFrame library provides
|
||||
|
||||
1. Annotation processing for generation of extension properties
|
||||
2. Annotation processing for [`DataSchema`](schemas.md) inference from datasets.
|
||||
3. Gradle task for [`DataSchema`](schemas.md) inference from datasets.
|
||||
|
||||
### Configuration
|
||||
|
||||
To use the [extension properties API](extensionPropertiesApi.md) in Gradle project add the `dataframe` plugin as follows:
|
||||
|
||||
<tabs>
|
||||
<tab title="Kotlin DSL">
|
||||
|
||||
```kotlin
|
||||
plugins {
|
||||
id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
|
||||
}
|
||||
|
||||
dependencies {
|
||||
implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
|
||||
}
|
||||
```
|
||||
|
||||
</tab>
|
||||
|
||||
<tab title="Groovy DSL">
|
||||
|
||||
```groovy
|
||||
plugins {
|
||||
id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
|
||||
}
|
||||
|
||||
dependencies {
|
||||
implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
|
||||
}
|
||||
```
|
||||
|
||||
</tab>
|
||||
|
||||
</tabs>
|
||||
|
||||
### Annotation processing
|
||||
|
||||
Declare data schemas in your code and use them to access data in [`DataFrame`](DataFrame.md) objects.
|
||||
A data schema is a class or interface annotated with [`@DataSchema`](schemas.md):
|
||||
|
||||
```kotlin
|
||||
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
|
||||
|
||||
@DataSchema
|
||||
interface Person {
|
||||
val name: String
|
||||
val age: Int
|
||||
}
|
||||
```
|
||||
|
||||
#### Execute the `assemble` task to generate type-safe accessors for schemas:
|
||||
|
||||
<!---FUN useProperties-->
|
||||
|
||||
```kotlin
|
||||
val df = dataFrameOf("name", "age")(
|
||||
"Alice", 15,
|
||||
"Bob", 20,
|
||||
).cast<Person>()
|
||||
// age only available after executing `build` or `kspKotlin`!
|
||||
val teens = df.filter { age in 10..19 }
|
||||
teens.print()
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
### Schema inference
|
||||
|
||||
Specify schema with preferred method and execute the `assemble` task.
|
||||
|
||||
<tabs>
|
||||
<tab title="Method 1. Annotation processing">
|
||||
|
||||
`@ImportDataSchema` annotation must be above package directive.
|
||||
You can import schemas from a URL or from the relative path of a file.
|
||||
Relative path by default is resolved to the project root directory.
|
||||
You can configure it by [passing](https://kotlinlang.org/docs/ksp-quickstart.html#pass-options-to-processors) `dataframe.resolutionDir`
|
||||
option to preprocessor.
|
||||
For example:
|
||||
|
||||
```kotlin
|
||||
ksp {
|
||||
arg("dataframe.resolutionDir", file("data").absolutePath)
|
||||
}
|
||||
```
|
||||
|
||||
**Note that due to incremental processing, imported schema will be re-generated only if some source code has changed
|
||||
from the previous invocation, at least one character.**
|
||||
|
||||
For the following configuration, file `Repository.Generated.kt` will be generated to `build/generated/ksp/` folder in
|
||||
the same package as file containing the annotation.
|
||||
|
||||
```kotlin
|
||||
@file:ImportDataSchema(
|
||||
"Repository",
|
||||
"https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
|
||||
)
|
||||
|
||||
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
|
||||
import org.jetbrains.kotlinx.dataframe.api.*
|
||||
```
|
||||
|
||||
See KDocs for `@ImportDataSchema` in IDE
|
||||
or [GitHub](https://github.com/Kotlin/dataframe/blob/master/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt)
|
||||
for more details.
|
||||
|
||||
</tab>
|
||||
|
||||
<tab title="Method 2. Gradle task">
|
||||
|
||||
Put this in `build.gradle` or `build.gradle.kts`
|
||||
For the following configuration, file `Repository.Generated.kt` will be generated
|
||||
to `build/generated/dataframe/org/example` folder.
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
schema {
|
||||
data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
|
||||
name = "org.example.Repository"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
See [reference](Gradle-Plugin.md) and [examples](Gradle-Plugin.md#examples) for more details.
|
||||
|
||||
</tab>
|
||||
</tabs>
|
||||
|
||||
After `assemble`, the following code should compile and run:
|
||||
|
||||
<!---FUN useInferredSchema-->
|
||||
|
||||
```kotlin
|
||||
// Repository.readCsv() has argument 'path' with default value https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv
|
||||
val df = Repository.readCsv()
|
||||
// Use generated properties to access data in rows
|
||||
df.maxBy { stargazersCount }.print()
|
||||
// Or to access columns in dataframe.
|
||||
print(df.fullName.count { it.contains("kotlin") })
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
Reference in New Issue
Block a user