init research
+180
@@ -0,0 +1,180 @@
|
||||
[//]: # (title: Data Schemas in Kotlin Notebook)
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
|
||||
|
||||
After execution of a cell
|
||||
|
||||
<!---FUN createDfNullable-->
|
||||
|
||||
```kotlin
|
||||
val df = dataFrameOf("name", "age")(
|
||||
"Alice", 15,
|
||||
"Bob", null,
|
||||
)
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
the following actions take place:
|
||||
|
||||
1. Columns in `df` are analyzed to extract the data schema.
|
||||
2. An empty interface with the [`DataSchema`](schema.md) annotation is generated:
|
||||
|
||||
```kotlin
|
||||
@DataSchema
|
||||
interface DataFrameType
|
||||
```
|
||||
|
||||
3. Extension properties for this [`DataSchema`](schema.md) are generated:
|
||||
|
||||
```kotlin
|
||||
val ColumnsContainer<DataFrameType>.age: DataColumn<Int?> @JvmName("DataFrameType_age") get() = this["age"] as DataColumn<Int?>
|
||||
val DataRow<DataFrameType>.age: Int? @JvmName("DataFrameType_age") get() = this["age"] as Int?
|
||||
val ColumnsContainer<DataFrameType>.name: DataColumn<String> @JvmName("DataFrameType_name") get() = this["name"] as DataColumn<String>
|
||||
val DataRow<DataFrameType>.name: String @JvmName("DataFrameType_name") get() = this["name"] as String
|
||||
```
|
||||
|
||||
Every column produces two extension properties:
|
||||
|
||||
* The property for `ColumnsContainer<DataFrameType>` returns the column.
* The property for `DataRow<DataFrameType>` returns the cell value.
|
||||
|
||||
4. The `df` variable is typed by the schema interface:
|
||||
|
||||
```kotlin
|
||||
val temp = df
|
||||
```
|
||||
|
||||
```kotlin
|
||||
val df = temp.cast<DataFrameType>()
|
||||
```
|
||||
|
||||
> _Note that the object instance remains the same after casting. See [cast](cast.md)._
|
||||
|
||||
To log all these additional code executions, use the cell magic:
|
||||
|
||||
```
|
||||
%trackExecution -all
|
||||
```
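
Steps 2 to 4 can be imitated in plain Kotlin without the library. The following is a toy sketch under stated assumptions, not the DataFrame implementation: `Row`, `castTo`, `describe`, and `demo` are hypothetical names introduced here only to show how an empty marker interface plus extension properties gives typed access, and why casting can keep the same object instance.

```kotlin
// Toy model only: these are NOT the library's classes.
interface DataFrameType // empty marker interface, as in step 2

class Row<T>(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? = values[column]
}

// Step 3 in miniature: typed accessors exist only for rows tagged with the marker.
val Row<DataFrameType>.name: String get() = this["name"] as String
val Row<DataFrameType>.age: Int? get() = this["age"] as Int?

// Step 4 in miniature: the cast only changes the static type; no data is copied.
@Suppress("UNCHECKED_CAST")
fun <T> Row<*>.castTo(): Row<T> = this as Row<T>

fun describe(row: Row<DataFrameType>): String = "${row.name}: ${row.age ?: "unknown"}"

// Returns the description and whether the typed and untyped references are the same instance.
fun demo(): Pair<String, Boolean> {
    val untyped: Row<*> = Row<Any?>(mapOf("name" to "Bob", "age" to null))
    val typed = untyped.castTo<DataFrameType>()
    return describe(typed) to (typed === untyped)
}
```

In this sketch the marker carries no data at all; it only selects which extension properties the compiler offers, which is why a nullable column such as `age` surfaces as `Int?` at compile time.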
|
||||
|
||||
## Custom Data Schemas
|
||||
|
||||
You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent a [`DataFrame`](DataFrame.md) with
a specific set of columns:
|
||||
|
||||
```kotlin
|
||||
@DataSchema
|
||||
interface Person {
|
||||
val name: String
|
||||
val age: Int
|
||||
}
|
||||
```
|
||||
|
||||
After this cell is executed in a notebook, or after annotation processing in IDEA, extension properties for data access are
generated. Now we can use these properties to create functions for a typed [`DataFrame`](DataFrame.md):
|
||||
|
||||
```kotlin
|
||||
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
|
||||
fun DataFrame<Person>.adults() = filter { age > 18 }
|
||||
```
|
||||
|
||||
In Kotlin Notebook, these functions work automatically for any [`DataFrame`](DataFrame.md) that matches the `Person` schema:
|
||||
|
||||
<!---FUN extendedDf-->
|
||||
|
||||
```kotlin
|
||||
val df = dataFrameOf("name", "age", "weight")(
|
||||
"Merton, Alice", 15, 60.0,
|
||||
"Marley, Bob", 20, 73.5,
|
||||
)
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
The schema of `df` is compatible with `Person`, so the auto-generated schema interface inherits from it:
|
||||
|
||||
```kotlin
|
||||
@DataSchema(isOpen = false)
|
||||
interface DataFrameType : Person
|
||||
|
||||
val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
|
||||
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
|
||||
```
|
||||
|
||||
Although `df` has an additional column `weight`, the previously defined functions for `DataFrame<Person>` still work for it:
|
||||
|
||||
<!---FUN splitNameWorks-->
|
||||
|
||||
```kotlin
|
||||
df.splitName()
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
```text
|
||||
firstName lastName age weight
|
||||
Merton Alice 15 60.000
|
||||
Marley Bob 20 73.500
|
||||
```
|
||||
|
||||
<!---FUN adultsWorks-->
|
||||
|
||||
```kotlin
|
||||
df.adults()
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
```text
|
||||
name age weight
|
||||
Marley, Bob 20 73.5
|
||||
```
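
The reason functions written for the base schema accept a frame with a derived schema can be shown with a small stand-alone sketch. It assumes the frame type is covariant in its schema parameter (declared with `out`); `Frame` and `adultNames` are hypothetical stand-ins, and `Person` and `DataFrameType` are redeclared here as empty markers rather than taken from the library.

```kotlin
// Toy model only: NOT the DataFrame library, just the typing idea.
interface Person                 // hand-written schema marker
interface DataFrameType : Person // stands in for the auto-generated marker

// `out T` makes Frame<DataFrameType> a subtype of Frame<Person>.
class Frame<out T>(val rows: List<Map<String, Any?>>)

// Written once against the base schema.
fun Frame<Person>.adults(): List<Map<String, Any?>> =
    rows.filter { (it["age"] as Int) > 18 }

fun adultNames(): List<Any?> {
    val df = Frame<DataFrameType>(
        listOf(
            mapOf("name" to "Merton, Alice", "age" to 15, "weight" to 60.0),
            mapOf("name" to "Marley, Bob", "age" to 20, "weight" to 73.5),
        )
    )
    // Compiles because of covariance; the extra `weight` entry is simply carried along.
    return df.adults().map { it["name"] }
}
```

Without the `out` modifier the call `df.adults()` would not compile in this sketch, which is the design choice that lets one function serve every schema derived from `Person`.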
|
||||
|
||||
## Use external Data Schemas
|
||||
|
||||
Sometimes it is convenient to extract reusable code from Kotlin Notebook into a Kotlin JVM library.
Schema interfaces should also be extracted if this code uses [Custom Data Schemas](#custom-data-schemas).
|
||||
|
||||
To enable support for them in Kotlin Notebook, register them in the
library [integration class](https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md) with the `useSchema`
function:
|
||||
|
||||
```kotlin
|
||||
@DataSchema
|
||||
interface Person {
|
||||
val name: String
|
||||
val age: Int
|
||||
}
|
||||
|
||||
fun DataFrame<Person>.countAdults() = count { it[Person::age] > 18 }
|
||||
|
||||
@JupyterLibrary
|
||||
internal class Integration : JupyterIntegration() {
|
||||
|
||||
override fun Builder.onLoaded() {
|
||||
onLoaded {
|
||||
useSchema<Person>()
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
After loading this library into the notebook, schema interfaces for all [`DataFrame`](DataFrame.md) variables that match the `Person`
schema will derive from `Person`:
|
||||
|
||||
<!---FUN createDf-->
|
||||
|
||||
```kotlin
|
||||
val df = dataFrameOf("name", "age")(
|
||||
"Alice", 15,
|
||||
"Bob", 20,
|
||||
)
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
Now `df` is assignable to `DataFrame<Person>` and `countAdults` is available:
|
||||
|
||||
```kotlin
|
||||
df.countAdults()
|
||||
```
|
||||
@@ -0,0 +1,193 @@
|
||||
# Data Schemas Generation From Existing DataFrame
|
||||
|
||||
<web-summary>
|
||||
Generate useful Kotlin definitions based on your DataFrame structure.
|
||||
</web-summary>
|
||||
|
||||
<card-summary>
|
||||
Generate useful Kotlin definitions based on your DataFrame structure.
|
||||
</card-summary>
|
||||
|
||||
<link-summary>
|
||||
Generate useful Kotlin definitions based on your DataFrame structure.
|
||||
</link-summary>
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Generate-->
|
||||
|
||||
Special utility functions generate the code of useful Kotlin definitions (returned as a `String` wrapped in `CodeString`)
based on the current `DataFrame` schema.
|
||||
|
||||
## generateDataClasses
|
||||
|
||||
```kotlin
|
||||
inline fun <reified T> DataFrame<T>.generateDataClasses(
|
||||
markerName: String? = null,
|
||||
extensionProperties: Boolean = false,
|
||||
visibility: MarkerVisibility = MarkerVisibility.IMPLICIT_PUBLIC,
|
||||
useFqNames: Boolean = false,
|
||||
nameNormalizer: NameNormalizer = NameNormalizer.default,
|
||||
): CodeString
|
||||
```
|
||||
|
||||
Generates Kotlin data classes corresponding to the `DataFrame` schema
|
||||
(including all nested `DataFrame` columns and column groups).
|
||||
|
||||
Useful when you want to:
|
||||
|
||||
- Work with the data as regular Kotlin data classes.
|
||||
- Convert a dataframe to instantiated data classes with `df.toListOf<DataClassType>()`.
|
||||
- Work with data classes serialization.
|
||||
- Extract structured types for further use in your application.
|
||||
|
||||
### Arguments {id="generateDataClasses-arguments"}
|
||||
|
||||
* `markerName`: `String?` – The base name to use for generated data classes.
|
||||
If `null`, uses the `T` type argument of `DataFrame` simple name.
|
||||
Default: `null`.
|
||||
* `extensionProperties`: `Boolean` – Whether to generate [extension properties](extensionPropertiesApi.md)
|
||||
in addition to `data class` declarations.
|
||||
Useful if you don't use the [compiler plugin](Compiler-Plugin.md), otherwise they are not needed;
|
||||
the [compiler plugin](Compiler-Plugin.md), [notebooks](SetupKotlinNotebook.md),
|
||||
and older [Gradle/KSP plugin](schemasGradle.md) generate them automatically.
|
||||
Default: `false`.
|
||||
* `visibility`: `MarkerVisibility` – Visibility modifier for the generated declarations.
|
||||
Default: `MarkerVisibility.IMPLICIT_PUBLIC`.
|
||||
* `useFqNames`: `Boolean` – If `true`, fully qualified type names will be used in generated code.
|
||||
Default: `false`.
|
||||
* `nameNormalizer`: `NameNormalizer` – Strategy for converting column names (with spaces, underscores, etc.) to
|
||||
Kotlin-style identifiers.
|
||||
Generated properties will still refer to columns by their actual name using the `@ColumnName` annotation.
|
||||
Default: `NameNormalizer.default`.
|
||||
|
||||
### Returns {id="generateDataClasses-returns"}
|
||||
|
||||
* `CodeString` – A value class wrapper for `String`, containing
|
||||
the generated Kotlin code of `data class` declarations and optionally [extension properties](extensionPropertiesApi.md).
|
||||
|
||||
### Examples {id="generateDataClasses-examples"}
|
||||
|
||||
<!---FUN notebook_test_generate_docs_4-->
|
||||
|
||||
```kotlin
|
||||
df.generateDataClasses("Customer")
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
Output:
|
||||
|
||||
```kotlin
|
||||
@DataSchema
|
||||
data class Customer1(
|
||||
val amount: Double,
|
||||
val orderId: Int
|
||||
)
|
||||
|
||||
@DataSchema
|
||||
data class Customer(
|
||||
val orders: List<Customer1>,
|
||||
val user: String
|
||||
)
|
||||
```
|
||||
|
||||
Add these classes to your project and convert the DataFrame to a list of typed objects:
|
||||
|
||||
<!---FUN notebook_test_generate_docs_5-->
|
||||
|
||||
```kotlin
|
||||
val customers: List<Customer> = df.cast<Customer>().toList()
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
## generateInterfaces
|
||||
|
||||
```kotlin
|
||||
inline fun <reified T> DataFrame<T>.generateInterfaces(): CodeString
|
||||
|
||||
fun <T> DataFrame<T>.generateInterfaces(markerName: String): CodeString
|
||||
```
|
||||
|
||||
Generates [`@DataSchema`](schemas.md) interfaces for this `DataFrame`
|
||||
(including all nested `DataFrame` columns and column groups) as Kotlin interfaces.
|
||||
|
||||
This is useful when working with the [compiler plugin](Compiler-Plugin.md)
|
||||
in cases where the schema cannot be inferred automatically from the source.
|
||||
|
||||
### Arguments {id="generateInterfaces-arguments"}
|
||||
|
||||
* `markerName`: `String?` – The base name to use for generated interfaces.
|
||||
If `null`, uses the `T` type argument of `DataFrame` simple name.
|
||||
Default: `null`.
|
||||
* `extensionProperties`: `Boolean` – Whether to generate [extension properties](extensionPropertiesApi.md)
|
||||
in addition to `interface` declarations.
|
||||
Useful if you don't use the [compiler plugin](Compiler-Plugin.md), otherwise they are not needed;
|
||||
the [compiler plugin](Compiler-Plugin.md), [notebooks](SetupKotlinNotebook.md),
|
||||
and older [Gradle/KSP plugin](schemasGradle.md) generate them automatically.
|
||||
Default: `false`.
|
||||
* `visibility`: `MarkerVisibility` – Visibility modifier for the generated declarations.
|
||||
Default: `MarkerVisibility.IMPLICIT_PUBLIC`.
|
||||
* `useFqNames`: `Boolean` – If `true`, fully qualified type names will be used in generated code.
|
||||
Default: `false`.
|
||||
* `nameNormalizer`: `NameNormalizer` – Strategy for converting column names (with spaces, underscores, etc.) to
|
||||
Kotlin-style identifiers.
|
||||
Generated properties will still refer to columns by their actual name using the `@ColumnName` annotation.
|
||||
Default: `NameNormalizer.default`.
|
||||
|
||||
### Returns {id="generateInterfaces-returns"}
|
||||
|
||||
* `CodeString` – A value class wrapper for `String`, containing
|
||||
the generated Kotlin code of `@DataSchema` interfaces without [extension properties](extensionPropertiesApi.md).
|
||||
|
||||
### Examples {id="generateInterfaces-examples"}
|
||||
|
||||
<!---FUN notebook_test_generate_docs_1-->
|
||||
|
||||
```kotlin
|
||||
df
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
<inline-frame src="./resources/notebook_test_generate_docs_1.html" width="100%" height="500px"></inline-frame>
|
||||
|
||||
<!---FUN notebook_test_generate_docs_2-->
|
||||
|
||||
```kotlin
|
||||
df.generateInterfaces()
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
Output:
|
||||
|
||||
```kotlin
|
||||
@DataSchema(isOpen = false)
|
||||
interface _DataFrameType11 {
|
||||
val amount: kotlin.Double
|
||||
val orderId: kotlin.Int
|
||||
}
|
||||
|
||||
@DataSchema
|
||||
interface _DataFrameType1 {
|
||||
val orders: List<_DataFrameType11>
|
||||
val user: kotlin.String
|
||||
}
|
||||
```
|
||||
|
||||
By adding these interfaces to your project with the [compiler plugin](Compiler-Plugin.md) enabled,
|
||||
you'll gain full support for the [extension properties API](extensionPropertiesApi.md) and type-safe operations.
|
||||
|
||||
Use [`cast`](cast.md) to apply the generated schema to a `DataFrame`:
|
||||
|
||||
<!---FUN notebook_test_generate_docs_3-->
|
||||
|
||||
```kotlin
|
||||
df.cast<_DataFrameType1>().filter { orders.all { orderId >= 102 } }
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
<!--inline-frame src="./resources/notebook_test_generate_docs_3.html" width="100%" height="500px"></inline-frame>-->
|
||||
|
||||
|
||||
@@ -0,0 +1,53 @@
|
||||
# Migration from Gradle/KSP Plugin
|
||||
|
||||
Gradle and KSP plugins were useful tools in earlier versions of Kotlin DataFrame.
|
||||
However, they are now being phased out. This section provides an overview of their current state and migration guidance.
|
||||
|
||||
## Gradle Plugin
|
||||
|
||||
> Do not confuse this with the [compiler plugin](Compiler-Plugin.md), which is a Kotlin compiler plugin
|
||||
> and has a different plugin ID.
|
||||
> {style="note"}
|
||||
|
||||
1. **Generation of [data schemas](schemas.md)** from data sources
|
||||
(files, databases, or external URLs).
|
||||
- You can copy already generated schemas from `build/generated` into your project sources.
|
||||
- To generate a `DataSchema` for a [`DataFrame`](DataFrame.md) now, use
|
||||
the [`generate..()` methods](DataSchemaGenerationMethods.md).
|
||||
|
||||
2. **Generation of [extension properties](extensionPropertiesApi.md)** from data schemas
|
||||
This is now handled by the [compiler plugin](Compiler-Plugin.md), which:
|
||||
- Generates extension properties for declared data schemas.
|
||||
- Automatically updates the schema and regenerates properties after structural DataFrame operations.
|
||||
|
||||
> The Gradle plugin still works and may be helpful for generating schemas from data sources.
|
||||
> However, it is planned for deprecation, and **we do not recommend using it going forward**.
|
||||
> {style="warning"}
|
||||
|
||||
If you still choose to use the Gradle plugin, make sure to disable the automatic KSP plugin dependency
|
||||
to avoid compatibility issues with Kotlin 2.1+ by adding this line to `gradle.properties`:
|
||||
|
||||
```properties
|
||||
kotlin.dataframe.add.ksp=false
|
||||
```
|
||||
|
||||
## KSP Plugin
|
||||
|
||||
- **Generation of [data schemas](schemas.md)** from data sources
|
||||
(files, databases, or external URLs).
|
||||
- You can copy already generated schemas from `build/generated/ksp` into your project sources.
|
||||
- To generate a `DataSchema` for a [`DataFrame`](DataFrame.md) now, use the
|
||||
[`generate..()` methods](DataSchemaGenerationMethods.md) instead.
|
||||
|
||||
> The KSP plugin is **not compatible with [KSP2](https://github.com/google/ksp?tab=readme-ov-file#ksp2-is-here)**
|
||||
> and may **not work properly with Kotlin 2.1 or newer**.
|
||||
> It is planned for deprecation or major changes, and **we do not recommend using it at this time**.
|
||||
> {style="warning"}
|
||||
|
||||
If you still choose to use the KSP plugin with Kotlin 2.1+,
|
||||
disable [KSP2](https://github.com/google/ksp?tab=readme-ov-file#ksp2-is-here)
|
||||
by adding this line to `gradle.properties`:
|
||||
|
||||
```properties
|
||||
ksp.useKSP2=false
|
||||
```
|
||||
@@ -0,0 +1,234 @@
|
||||
[//]: # (title: Gradle Plugin (deprecated))
|
||||
|
||||
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
|
||||
>
|
||||
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
|
||||
{style="warning"}
|
||||
|
||||
This page describes the Gradle plugin that generates `@DataSchema` from data samples.
|
||||
```kotlin
|
||||
id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
|
||||
```
|
||||
|
||||
It's different from the DataFrame compiler plugin:
|
||||
```kotlin
|
||||
kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
|
||||
```
|
||||
|
||||
By default, the Gradle plugin adds a KSP annotation processor to your build:
|
||||
|
||||
```kotlin
|
||||
ksp("org.jetbrains.kotlinx.dataframe:symbol-processor-all:%dataFrameVersion%")
|
||||
```
|
||||
|
||||
You should disable it if you want to use the Gradle plugin together with the compiler plugin.
|
||||
|
||||
Add this to `gradle.properties`:
|
||||
```properties
|
||||
kotlin.dataframe.add.ksp=false
|
||||
```
|
||||
|
||||
## Examples
|
||||
In the simplest case, a schema can be defined like this:
|
||||
```kotlin
|
||||
dataframes {
|
||||
// output: build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
|
||||
schema {
|
||||
data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
|
||||
}
|
||||
}
|
||||
```
|
||||
Note that the name of the file and the interface are normalized: split by '_' and ' ' and joined to CamelCase.
|
||||
You can set parsing options for CSV:
|
||||
```kotlin
|
||||
dataframes {
|
||||
// output: build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
|
||||
schema {
|
||||
data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
|
||||
csvOptions {
|
||||
delimiter = ','
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
In this case, the output path will depend on your directory structure.
|
||||
For a project with package `org.example`, the path will be `build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt`.
|
||||
|
||||
Note that the name of the Kotlin file is derived from the name of the data file with the suffix
|
||||
`.Generated` and the package
|
||||
is derived from the directory structure with child directory `dataframe`.
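
The naming rule described above can be approximated in a few lines of plain Kotlin. This is a sketch for intuition only: `toSchemaName` is a hypothetical helper, the default delimiters are taken from the DSL reference on this page, and the plugin's real rules may differ in edge cases such as digits or names that are already camel-cased.

```kotlin
// Approximation of the normalization rule; NOT the plugin's actual code.
fun toSchemaName(
    fileName: String,
    delimiters: Set<Char> = setOf('\t', '_', ' '),
): String =
    fileName
        .substringBeforeLast('.')             // drop the extension, if there is one
        .split(*delimiters.toCharArray())     // split by every delimiter
        .filter { it.isNotEmpty() }           // ignore empty parts from repeated delimiters
        .joinToString("") { part ->
            part.lowercase().replaceFirstChar { it.uppercase() }
        }
```

With these assumptions, `toSchemaName("jetbrains_repositories.csv")` yields `JetbrainsRepositories`, which matches the schema name used in the examples on this page.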
|
||||
|
||||
The name of the **data schema** itself is `JetbrainsRepositories`.
|
||||
You could specify it explicitly:
|
||||
|
||||
```kotlin
|
||||
schema {
|
||||
// output: build/generated/dataframe/main/kotlin/org/example/dataframe/MyName.Generated.kt
|
||||
data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
|
||||
name = "MyName"
|
||||
}
|
||||
```
|
||||
|
||||
If you want to change the default package for all schemas:
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
packageName = "org.example"
|
||||
// Schemas...
|
||||
}
|
||||
```
|
||||
|
||||
You can also set `packageName` for a specific schema only:
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
// output: build/generated/dataframe/main/kotlin/org/example/data/Data.Generated.kt
|
||||
schema {
|
||||
packageName = "org.example.data"
|
||||
data = file("path/to/data.csv")
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
If you want a non-default name and package, consider using a fully qualified name:
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
// output: build/generated/dataframe/main/kotlin/org/example/data/OtherName.Generated.kt
|
||||
schema {
|
||||
name = "org.example.data.OtherName"
|
||||
data = file("path/to/data.csv")
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
By default, the plugin generates output in the `main` source set.
A source set can be specified for all schemas or for a specific schema:
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
packageName = "org.example"
|
||||
sourceSet = "test"
|
||||
// output: build/generated/dataframe/test/kotlin/org/example/Data.Generated.kt
|
||||
schema {
|
||||
data = file("path/to/data.csv")
|
||||
}
|
||||
// output: build/generated/dataframe/integrationTest/kotlin/org/example/Data.Generated.kt
|
||||
schema {
|
||||
sourceSet = "integrationTest"
|
||||
data = file("path/to/data.csv")
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
If you need the generated files to be put in another directory, set `src`:
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
// output: schemas/org/example/test/OtherName.Generated.kt
|
||||
schema {
|
||||
data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
|
||||
name = "org.example.test.OtherName"
|
||||
src = file("schemas")
|
||||
}
|
||||
}
|
||||
```
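
Putting the options from this section together, a complete build script might look like the sketch below. It only combines settings already shown on this page; the Kotlin version string is a placeholder to fill in, the `repositories` block is an assumption about a typical setup, and the output paths in the comments are inferred by analogy with the examples above rather than verified.

```kotlin
// build.gradle.kts (sketch; version strings are placeholders, not recommendations)
plugins {
    kotlin("jvm") version "<your Kotlin version>"
    id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}

dataframes {
    packageName = "org.example" // default package for every schema below

    // expected output: build/generated/dataframe/main/kotlin/org/example/JetbrainsRepositories.Generated.kt
    schema {
        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
    }

    // expected output: build/generated/dataframe/test/kotlin/org/example/data/OtherName.Generated.kt
    schema {
        sourceSet = "test"                  // overrides the default source set for this schema
        name = "org.example.data.OtherName" // fully qualified name overrides name and package
        data = file("path/to/data.csv")
    }
}
```

Settings placed directly in `dataframes` act as defaults, while the same settings inside a `schema` block apply to that schema only.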
|
||||
## Schema Definitions from SQL Databases
|
||||
|
||||
To generate a schema for an existing SQL table,
|
||||
you need to define a few parameters to establish a JDBC connection:
|
||||
URL (passed in the `data` field), username, and password.
|
||||
|
||||
Also, the `tableName` parameter should be specified to convert the data from the table with that name to the dataframe.
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
schema {
|
||||
data = "jdbc:mariadb://localhost:3306/imdb"
|
||||
name = "org.example.imdb.Actors"
|
||||
jdbcOptions {
|
||||
user = "root"
|
||||
password = "pass"
|
||||
tableName = "actors"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
To generate a schema for the result of an SQL query,
|
||||
you need to define the same connection parameters as before, together with the SQL query.
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
schema {
|
||||
data = "jdbc:mariadb://localhost:3306/imdb"
|
||||
name = "org.example.imdb.TarantinoFilms"
|
||||
jdbcOptions {
|
||||
user = "root"
|
||||
password = "pass"
|
||||
sqlQuery = """
|
||||
SELECT name, year, rank,
|
||||
GROUP_CONCAT (genre) as "genres"
|
||||
FROM movies JOIN movies_directors ON movie_id = movies.id
|
||||
JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
|
||||
WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
|
||||
GROUP BY name, year, rank
|
||||
ORDER BY year
|
||||
"""
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Find full example code [here](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/src/main/kotlin/Example_3_Import_schema_via_Gradle.kt).
|
||||
|
||||
**NOTE:** This is an experimental functionality and, for now,
|
||||
we only support these databases: MariaDB, MySQL, PostgreSQL, SQLite, MS SQL, and DuckDB.
|
||||
|
||||
Additionally, support for JSON and date-time types is limited.
|
||||
Please take this into consideration when using these functions.
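
Because the JDBC options are ordinary values in the build script, credentials do not have to be hard-coded. The sketch below reads them from environment variables instead; the variable names `IMDB_DB_USER` and `IMDB_DB_PASSWORD` are hypothetical, and the fallback values are only there to keep the example self-explanatory.

```kotlin
// build.gradle.kts (sketch; environment variable names are assumptions)
dataframes {
    schema {
        data = "jdbc:mariadb://localhost:3306/imdb"
        name = "org.example.imdb.Actors"
        jdbcOptions {
            // System.getenv returns null when the variable is not set
            user = System.getenv("IMDB_DB_USER") ?: "root"
            password = System.getenv("IMDB_DB_PASSWORD") ?: ""
            tableName = "actors"
        }
    }
}
```

This keeps secrets out of version control while leaving the rest of the schema declaration unchanged.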
|
||||
|
||||
## DSL reference
|
||||
Inside `dataframes` you can configure parameters that will apply to all schemas.
|
||||
Configuration inside `schema` will override these defaults for a specific schema.
|
||||
Here is the full DSL for declaring data schemas:
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
sourceSet = "mySources" // [optional; default: "main"]
|
||||
packageName = "org.jetbrains.data" // [optional; default: common package under source set]
|
||||
|
||||
visibility = // [optional; default: if explicitApiMode enabled then EXPLICIT_PUBLIC, else IMPLICIT_PUBLIC]
|
||||
// KOTLIN SCRIPT: DataSchemaVisibility.INTERNAL, DataSchemaVisibility.IMPLICIT_PUBLIC, DataSchemaVisibility.EXPLICIT_PUBLIC
|
||||
// GROOVY SCRIPT: 'internal', 'implicit_public', 'explicit_public'
|
||||
|
||||
withoutDefaultPath() // disable a default path for all schemas
|
||||
// i.e., plugin won't copy "data" property of the schemas to generated companion objects
|
||||
|
||||
// split property names by delimiters (arguments of this method), lowercase parts and join to camel case
|
||||
// enabled by default
|
||||
withNormalizationBy('_') // [optional: default: ['\t', '_', ' ']]
|
||||
withoutNormalization() // disable property names normalization
|
||||
|
||||
schema {
|
||||
sourceSet /* String */ = "…" // [optional; override default]
|
||||
packageName /* String */ = "…" // [optional; override default]
|
||||
visibility /* DataSchemaVisibility */ = "…" // [optional; override default]
|
||||
src /* File */ = file("…") // [optional; default: file("build/generated/dataframe/$sourceSet/kotlin")]
|
||||
|
||||
data /* URL | File | String */ = "…" // Data in JSON or CSV formats
|
||||
name = "org.jetbrains.data.Person" // [optional; default: from filename]
|
||||
csvOptions {
|
||||
delimiter /* Char */ = ';' // [optional; default: ',']
|
||||
}
|
||||
|
||||
// See names normalization
|
||||
withNormalizationBy('_') // enable property names normalization for this schema and use these delimiters
|
||||
withoutNormalization() // disable property names normalization for this schema
|
||||
|
||||
withoutDefaultPath() // disable the default path for this schema
|
||||
withDefaultPath() // enable the default path for this schema
|
||||
}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,157 @@
|
||||
[//]: # (title: Data Schemas in Gradle projects)
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
|
||||
|
||||
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
|
||||
>
|
||||
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
|
||||
{style="warning"}
|
||||
|
||||
In Gradle projects, the Kotlin DataFrame library provides:

1. Annotation processing for generation of extension properties.
|
||||
2. Annotation processing for [`DataSchema`](schemas.md) inference from datasets.
|
||||
3. Gradle task for [`DataSchema`](schemas.md) inference from datasets.
|
||||
|
||||
### Configuration
|
||||
|
||||
To use the [extension properties API](extensionPropertiesApi.md) in a Gradle project, add the `dataframe` plugin as follows:
|
||||
|
||||
<tabs>
|
||||
<tab title="Kotlin DSL">
|
||||
|
||||
```kotlin
|
||||
plugins {
|
||||
id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
|
||||
}
|
||||
|
||||
dependencies {
|
||||
implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
|
||||
}
|
||||
```
|
||||
|
||||
</tab>
|
||||
|
||||
<tab title="Groovy DSL">
|
||||
|
||||
```groovy
|
||||
plugins {
|
||||
id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
|
||||
}
|
||||
|
||||
dependencies {
|
||||
implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
|
||||
}
|
||||
```
|
||||
|
||||
</tab>
|
||||
|
||||
</tabs>
|
||||
|
||||
### Annotation processing
|
||||
|
||||
Declare data schemas in your code and use them to access data in [`DataFrame`](DataFrame.md) objects.
|
||||
A data schema is a class or interface annotated with [`@DataSchema`](schemas.md):
|
||||
|
||||
```kotlin
|
||||
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
|
||||
|
||||
@DataSchema
|
||||
interface Person {
|
||||
val name: String
|
||||
val age: Int
|
||||
}
|
||||
```
|
||||
|
||||
#### Execute the `assemble` task to generate type-safe accessors for schemas:
|
||||
|
||||
<!---FUN useProperties-->
|
||||
|
||||
```kotlin
|
||||
val df = dataFrameOf("name", "age")(
|
||||
"Alice", 15,
|
||||
"Bob", 20,
|
||||
).cast<Person>()
|
||||
// age only available after executing `build` or `kspKotlin`!
|
||||
val teens = df.filter { age in 10..19 }
|
||||
teens.print()
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
### Schema inference
|
||||
|
||||
Specify the schema with your preferred method and execute the `assemble` task.
|
||||
|
||||
<tabs>
|
||||
<tab title="Method 1. Annotation processing">
|
||||
|
||||
The `@ImportDataSchema` annotation must be placed above the package directive.
You can import schemas from a URL or from the relative path of a file.
By default, a relative path is resolved against the project root directory.
You can configure it by [passing](https://kotlinlang.org/docs/ksp-quickstart.html#pass-options-to-processors) the `dataframe.resolutionDir`
option to the processor.
|
||||
For example:
|
||||
|
||||
```kotlin
|
||||
ksp {
|
||||
arg("dataframe.resolutionDir", file("data").absolutePath)
|
||||
}
|
||||
```
|
||||
|
||||
**Note that due to incremental processing, an imported schema is regenerated only if the source code has changed
since the previous invocation, even by a single character.**
|
||||
|
||||
For the following configuration, the file `Repository.Generated.kt` will be generated in the `build/generated/ksp/` folder, in
the same package as the file containing the annotation.
|
||||
|
||||
```kotlin
|
||||
@file:ImportDataSchema(
|
||||
"Repository",
|
||||
"https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
|
||||
)
|
||||
|
||||
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
|
||||
import org.jetbrains.kotlinx.dataframe.api.*
|
||||
```
|
||||
|
||||
See KDocs for `@ImportDataSchema` in IDE
|
||||
or [GitHub](https://github.com/Kotlin/dataframe/blob/master/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt)
|
||||
for more details.
|
||||
|
||||
</tab>
|
||||
|
||||
<tab title="Method 2. Gradle task">
|
||||
|
||||
Put this in `build.gradle` or `build.gradle.kts`.
For the following configuration, the file `Repository.Generated.kt` will be generated
in the `build/generated/dataframe/main/kotlin/org/example` folder.
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
schema {
|
||||
data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
|
||||
name = "org.example.Repository"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
See [reference](Gradle-Plugin.md) and [examples](Gradle-Plugin.md#examples) for more details.
|
||||
|
||||
</tab>
|
||||
</tabs>
|
||||
|
||||
After `assemble`, the following code should compile and run:
|
||||
|
||||
<!---FUN useInferredSchema-->
|
||||
|
||||
```kotlin
|
||||
// Repository.readCsv() has argument 'path' with default value https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv
|
||||
val df = Repository.readCsv()
|
||||
// Use generated properties to access data in rows
|
||||
df.maxBy { stargazersCount }.print()
|
||||
// Or to access columns in dataframe.
|
||||
print(df.fullName.count { it.contains("kotlin") })
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
+75
@@ -0,0 +1,75 @@
|
||||
[//]: # (title: Import OpenAPI Schemas in Gradle project (Experimental))
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
|
||||
|
||||
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
|
||||
>
|
||||
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
|
||||
{style="warning"}
|
||||
|
||||
<warning>
|
||||
OpenAPI 3.0.0 schema support is marked as experimental. It might change or be removed in the future.
|
||||
</warning>
|
||||
|
||||
JSON schema inference is great, but it's not perfect. However, more and more APIs offer
|
||||
[OpenAPI (Swagger)](https://swagger.io/) specifications.
|
||||
|
||||
Aside from API endpoints, they also hold
|
||||
[Data Models](https://swagger.io/docs/specification/data-models/) which include all the information about the types
|
||||
that can be returned from or supplied to the API.
|
||||
|
||||
Why should we reinvent the wheel and write our own schema inference
|
||||
when we can use the one provided by the API?
|
||||
|
||||
Not only will we now get the proper names of the types, but we will also
|
||||
get enums, correct inheritance and overall better type safety.
|
||||
|
||||
First of all, you will need the extra dependency:
|
||||
|
||||
```kotlin
|
||||
implementation("org.jetbrains.kotlinx:dataframe-openapi:$dataframe_version")
|
||||
```
|
||||
|
||||
OpenAPI type schemas can be generated using both methods described above:
|
||||
|
||||
```kotlin
|
||||
@file:ImportDataSchema(
|
||||
path = "https://petstore3.swagger.io/api/v3/openapi.json",
|
||||
name = "PetStore",
|
||||
enableExperimentalOpenApi = true,
|
||||
)
|
||||
|
||||
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
|
||||
```
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
schema {
|
||||
data = "https://petstore3.swagger.io/api/v3/openapi.json"
|
||||
name = "PetStore"
|
||||
}
|
||||
enableExperimentalOpenApi = true
|
||||
}
|
||||
```
|
||||
|
||||
The only difference is that the name provided is now irrelevant, since the type names are provided by the OpenAPI spec.
|
||||
(If you were wondering, yes, the Kotlin DataFrame library can tell the difference between an OpenAPI spec and normal JSON data)
|
||||
|
||||
After importing the data schema, you can now start to import any JSON data you like using the generated schemas.
|
||||
For instance, one of the types in the schema above is `PetStore.Pet` (which can also be
|
||||
explored [here](https://petstore3.swagger.io/)),
|
||||
so let's parse some Pets:
|
||||
|
||||
```kotlin
|
||||
val df: DataFrame<PetStore.Pet> =
|
||||
PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available")
|
||||
```
|
||||
|
||||
Now you will have a correctly typed [`DataFrame`](DataFrame.md)!
|
||||
|
||||
You can always Ctrl+click on the `PetStore.Pet` type to see all the generated schemas.
|
||||
|
||||
If you experience any issues with the OpenAPI support (since there are many gotchas and edge-cases when converting
|
||||
something as
|
||||
type-fluid as JSON to a strongly typed language), please open an issue on
|
||||
the [GitHub repo](https://github.com/Kotlin/dataframe/issues).
|
||||
@@ -0,0 +1,141 @@
|
||||
[//]: # (title: Import SQL Metadata as a Schema in Gradle Project)
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
|
||||
|
||||
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
|
||||
>
|
||||
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
|
||||
{style="warning"}
|
||||
|
||||
Each SQL database contains the metadata for all the tables.
|
||||
This metadata can be used for schema generation.
|
||||
|
||||
**NOTE:** Visit this [page](readSqlDatabases.md) to see how to set up all Gradle dependencies for your project.
|
||||
|
||||
### With `@file:ImportDataSchema`
|
||||
|
||||
To generate a schema for an existing SQL table,
you need to define a few parameters to establish a JDBC connection:
URL, username, and password.
|
||||
|
||||
You can also specify the `tableName` parameter to select the table the schema is generated from.
|
||||
|
||||
You should also specify the name of the generated Kotlin class
|
||||
as the first parameter of the annotation `@file:ImportDataSchema`.
|
||||
|
||||
```kotlin
|
||||
@file:ImportDataSchema(
|
||||
"Directors",
|
||||
URL,
|
||||
jdbcOptions = JdbcOptions(USER_NAME, PASSWORD, tableName = TABLE_NAME_DIRECTORS)
|
||||
)
|
||||
|
||||
package org.jetbrains.kotlinx.dataframe.examples.jdbc
|
||||
|
||||
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
import org.jetbrains.kotlinx.dataframe.annotations.JdbcOptions
|
||||
```
|
||||
|
||||
```kotlin
|
||||
const val URL = "jdbc:mariadb://localhost:3306/imdb"
|
||||
|
||||
const val USER_NAME = "root"
|
||||
|
||||
const val PASSWORD = "pass"
|
||||
|
||||
const val TABLE_NAME_DIRECTORS = "directors"
|
||||
```
|
||||
To generate a schema for the result of an SQL query,
you need to define the SQL query itself
and the same parameters to establish a connection with the database.
|
||||
|
||||
You should also specify the name of the generated Kotlin class
|
||||
as the first parameter of the annotation `@file:ImportDataSchema`.
|
||||
|
||||
```kotlin
|
||||
@file:ImportDataSchema(
|
||||
"NewActors",
|
||||
URL,
|
||||
jdbcOptions = JdbcOptions(USER_NAME, PASSWORD, sqlQuery = ACTORS_IN_LATEST_MOVIES)
|
||||
)
|
||||
|
||||
package org.jetbrains.kotlinx.dataframe.examples.jdbc
|
||||
|
||||
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
import org.jetbrains.kotlinx.dataframe.annotations.JdbcOptions
|
||||
```
|
||||
|
||||
```kotlin
|
||||
const val URL = "jdbc:mariadb://localhost:3306/imdb"
|
||||
|
||||
const val USER_NAME = "root"
|
||||
|
||||
const val PASSWORD = "pass"
|
||||
|
||||
const val ACTORS_IN_LATEST_MOVIES = """
|
||||
SELECT a.first_name, a.last_name, r.role, m.name AS movie_name, m.year
|
||||
FROM actors a
|
||||
INNER JOIN roles r ON a.id = r.actor_id
|
||||
INNER JOIN movies m ON m.id = r.movie_id
|
||||
WHERE m.year > 2000
|
||||
"""
|
||||
```
|
||||
|
||||
Find full example code [here](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/src/main/kotlin/Example_2_Import_schema_annotation.kt).
|
||||
|
||||
### With Gradle Task
|
||||
|
||||
To generate a schema for an existing SQL table,
|
||||
you need to define a few parameters to establish a JDBC connection:
|
||||
URL (passed to the `data` field), username, and password.
|
||||
|
||||
Also, the `tableName` parameter should be specified to convert the data from the table with that name to the [`DataFrame`](DataFrame.md).
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
schema {
|
||||
data = "jdbc:mariadb://localhost:3306/imdb"
|
||||
name = "org.example.imdb.Actors"
|
||||
jdbcOptions {
|
||||
user = "root"
|
||||
password = "pass"
|
||||
tableName = "actors"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
To generate a schema for the result of an SQL query,
|
||||
you need to define the same connection parameters as before, together with the SQL query itself.
|
||||
|
||||
|
||||
```kotlin
|
||||
dataframes {
|
||||
schema {
|
||||
data = "jdbc:mariadb://localhost:3306/imdb"
|
||||
name = "org.example.imdb.TarantinoFilms"
|
||||
jdbcOptions {
|
||||
user = "root"
|
||||
password = "pass"
|
||||
sqlQuery = """
|
||||
SELECT name, year, rank,
|
||||
GROUP_CONCAT (genre) as "genres"
|
||||
FROM movies JOIN movies_directors ON movie_id = movies.id
|
||||
JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
|
||||
WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
|
||||
GROUP BY name, year, rank
|
||||
ORDER BY year
|
||||
"""
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Find full example code [here](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/src/main/kotlin/Example_3_Import_schema_via_Gradle.kt).
|
||||
|
||||
After importing the data schema, you can use the generated schemas to load any data you like
from an SQL table or from the result of an SQL query.
|
||||
|
||||
Now you will have a correctly typed [`DataFrame`](DataFrame.md)!
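
As a rough sketch of the loading step, the snippet below reads the `actors` table at runtime and prints its schema, which can then be compared with the generated one. It assumes the `dataframe-jdbc` module and a MariaDB JDBC driver are on the classpath and that the example database from this page is running locally. The connection values are the placeholder ones used above, not real credentials.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

fun main() {
    // Assumption: same URL, user, and password as in the Gradle configuration above.
    val dbConfig = DbConnectionConfig("jdbc:mariadb://localhost:3306/imdb", "root", "pass")

    // Reads the table into an untyped DataFrame.
    val actors = DataFrame.readSqlTable(dbConfig, "actors")

    // Prints column names and types as seen at runtime.
    println(actors.schema())
}
```

Once the types match, the frame can be cast to the generated schema class with `cast`, as described in [cast](cast.md).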
|
||||
|
||||
If you experience any issues with the SQL database support (since there are many edge cases when converting
|
||||
SQL types from different databases to Kotlin types), please open an issue on
|
||||
the [GitHub repo](https://github.com/Kotlin/dataframe/issues), specifying the database and the problem.
|
||||
@@ -0,0 +1,147 @@
|
||||
[//]: # (title: Data Schemas)
|
||||
|
||||
The Kotlin DataFrame library provides typed data access via
|
||||
[generation of extension properties](extensionPropertiesApi.md) for the type
|
||||
[`DataFrame<T>`](DataFrame.md) (as well as for [`DataRow<T>`](DataRow.md)), where
|
||||
`T` is a marker class representing the `DataSchema` of the [`DataFrame`](DataFrame.md).
|
||||
|
||||
A *schema* of a [`DataFrame`](DataFrame.md) is a mapping from column names to column types.
|
||||
This data schema can be expressed as a Kotlin class or interface.
|
||||
If the DataFrame is hierarchical — contains a [column group](DataColumn.md#columngroup) or a
|
||||
[column of dataframes](DataColumn.md#framecolumn) — the data schema reflects this structure,
|
||||
with a separate class representing the schema of each column group or nested `DataFrame`.
|
||||
|
||||
For example, consider a simple hierarchical DataFrame from
|
||||
<resource src="example.csv"></resource>.
|
||||
|
||||
This DataFrame consists of two columns:
|
||||
- `name`, which is a `String` column
|
||||
- `info`, which is a [column group](DataColumn.md#columngroup) containing two nested [value columns](DataColumn.md#valuecolumn):
|
||||
- `age` of type `Int`
|
||||
- `height` of type `Double`
|
||||
|
||||
<table width="705">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>name</th>
|
||||
<th colspan="2">info</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<th></th>
|
||||
<th>age</th>
|
||||
<th>height</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td>Alice</td>
|
||||
<td>23</td>
|
||||
<td>175.5</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Bob</td>
|
||||
<td>27</td>
|
||||
<td>160.2</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
The data schema corresponding to this DataFrame can be represented as:
|
||||
|
||||
```kotlin
|
||||
// Data schema of the "info" column group
|
||||
@DataSchema
|
||||
data class Info(
|
||||
val age: Int,
|
||||
    val height: Double
|
||||
)
|
||||
|
||||
// Data schema of the entire DataFrame
|
||||
@DataSchema
|
||||
data class Person(
|
||||
val info: Info,
|
||||
val name: String
|
||||
)
|
||||
```
|
||||
|
||||
[Extension properties](extensionPropertiesApi.md) for `DataFrame<Person>`
|
||||
are generated based on this schema and allow accessing columns
|
||||
or using them in operations:
|
||||
|
||||
```kotlin
|
||||
// Assuming `df` has type `DataFrame<Person>`
|
||||
|
||||
// Get "age" column from "info" group
|
||||
df.info.age
|
||||
|
||||
// Select "name" and "height" columns
|
||||
df.select { name and info.height }
|
||||
|
||||
// Filter rows by "age"
|
||||
df.filter { info.age >= 18 }
|
||||
```
|
||||
|
||||
See [](extensionPropertiesApi.md) for more information.
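
For reference, a dataframe with the structure shown above can be built by hand and its schema inspected at runtime. This is a minimal sketch that assumes the Kotlin DataFrame core library is on the classpath. It uses the String-based API, so it does not depend on generated extension properties. The helper name `buildPeople` is made up for this example.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.group
import org.jetbrains.kotlinx.dataframe.api.into
import org.jetbrains.kotlinx.dataframe.api.schema

// Hypothetical helper: creates flat columns first, then nests "age" and "height" under "info".
fun buildPeople(): DataFrame<*> =
    dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    ).group("age", "height").into("info")

fun main() {
    val df = buildPeople()
    // Prints the mapping from column names to types, including the nested "info" group.
    println(df.schema())
}
```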
|
||||
|
||||
|
||||
## Schema Retrieving
|
||||
|
||||
Defining a data schema manually can be difficult, especially for dataframes with many columns or deeply nested
|
||||
structures, and may lead to mistakes in column names or types.
|
||||
Kotlin DataFrame provides several methods for generating data schemas.
|
||||
|
||||
* [**`generate..()` methods**](DataSchemaGenerationMethods.md) are extensions for [`DataFrame`](DataFrame.md)
|
||||
(or for its [`schema`](schema.md)) that generate a code string representing its `DataSchema`.
|
||||
|
||||
* [**Kotlin DataFrame Compiler Plugin**](Compiler-Plugin.md) **cannot automatically infer** a
|
||||
data schema from external sources such as files or URLs.
|
||||
However, it **can** infer the schema if you construct the [`DataFrame`](DataFrame.md)
|
||||
manually — that is, by explicitly declaring the columns using the API.
|
||||
It will also **automatically update** the schema during operations that modify the structure of the DataFrame.
|
||||
|
||||
> For best results when working with the Compiler Plugin, it's recommended to
|
||||
> generate the initial schema using one of
|
||||
> the [`generate..()` methods](DataSchemaGenerationMethods.md).
|
||||
> Once generated, the Compiler Plugin will automatically keep the schema up to date
|
||||
> after any operations that change the structure of the DataFrame.
|
||||
|
||||
### Plugins
|
||||
|
||||
> The current Gradle plugin is **under consideration for deprecation** and
|
||||
> may be officially marked as deprecated in future releases.
|
||||
>
|
||||
> The KSP plugin is **not compatible with [KSP2](https://github.com/google/ksp?tab=readme-ov-file#ksp2-is-here)**
|
||||
> and may **not work properly with Kotlin 2.1 or newer**.
|
||||
>
|
||||
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugins.
|
||||
{style="warning"}
|
||||
|
||||
* The [Gradle plugin](Gradle-Plugin.md) allows generating a data schema automatically by specifying a source file path in the Gradle build script.
|
||||
|
||||
* The KSP plugin allows generating a data schema automatically using
|
||||
[Kotlin Symbol Processing](https://kotlinlang.org/docs/ksp-overview.html) by specifying
|
||||
a source file path in your code file.
|
||||
|
||||
## Extension Properties Generation
|
||||
|
||||
Once you have a data schema, you can generate [extension properties](extensionPropertiesApi.md).
|
||||
|
||||
The easiest and most convenient way is to use the [**Kotlin DataFrame Compiler Plugin**](Compiler-Plugin.md),
|
||||
which generates extension properties on the fly for declared data schemas
|
||||
and automatically keeps them up to date after operations
|
||||
that modify the structure of the [`DataFrame`](DataFrame.md).
|
||||
|
||||
> Extension properties generation was deprecated from the Gradle plugin in favor of the Compiler Plugin.
|
||||
{style="warning"}
|
||||
|
||||
* When using Kotlin DataFrame inside [Kotlin Notebook](SetupKotlinNotebook.md),
|
||||
the schema and extension properties
|
||||
are generated automatically after each cell execution for all `DataFrame` variables declared in that cell.
|
||||
See [extension properties example in Kotlin Notebook](extensionPropertiesApi.md#example).
|
||||
|
||||
> Compiler Plugin is coming to Kotlin Notebook soon.
|
||||
|
||||
* If you're not using the Compiler Plugin, you can still generate
|
||||
[extension properties](extensionPropertiesApi.md) for a [`DataFrame`](DataFrame.md)
|
||||
manually by calling one of the [`generate..()` methods](DataSchemaGenerationMethods.md)
|
||||
with the `extensionProperties = true` argument.
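
The last option might look roughly like the sketch below. It assumes that `generateDataClasses` accepts a marker name and an `extensionProperties` flag, as the linked page describes. The exact signature can differ between library versions, so treat the call as illustrative. `Person` and `personSchemaCode` are names made up for this example.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.generateDataClasses

// Hypothetical helper that returns the generated declarations as plain text.
fun personSchemaCode(): String {
    val df = dataFrameOf("name", "age")(
        "Alice", 15,
        "Bob", null,
    )
    // Assumption: the marker name comes first, and `extensionProperties = true`
    // asks for extension properties in addition to the schema declaration.
    return df.generateDataClasses("Person", extensionProperties = true).toString()
}

fun main() {
    // Copy the printed code into your project to get typed access to `df`.
    println(personSchemaCode())
}
```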
|
||||
@@ -0,0 +1,33 @@
|
||||
[//]: # (title: Import Data Schemas, e.g. from OpenAPI, in Kotlin Notebook)
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
|
||||
|
||||
<warning>
|
||||
OpenAPI 3.0.0 schema support is marked as experimental. It might change or be removed in the future.
|
||||
</warning>
|
||||
|
||||
Similar to [importing OpenAPI Data Schemas in Gradle projects](schemasImportOpenApiGradle.md),
|
||||
you can also do this in Kotlin Notebook.
|
||||
This requires enabling the `enableExperimentalOpenApi` setting, like:
|
||||
```
|
||||
%use dataframe(..., enableExperimentalOpenApi=true)
|
||||
```
|
||||
|
||||
There is only a slight difference in notation:
|
||||
|
||||
Import the schema using any path (`String`), `URL`, or `File`:
|
||||
|
||||
```kotlin
|
||||
val PetStore = importDataSchema("https://petstore3.swagger.io/api/v3/openapi.json")
|
||||
```
|
||||
|
||||
Then, starting from the next cell you run, you can call, for example:
|
||||
|
||||
```kotlin
|
||||
val df = PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available")
|
||||
```
|
||||
|
||||
So, very similar indeed!
|
||||
|
||||
(Note: The type of `PetStore` will be generated as `PetStoreDataSchema`, but this doesn't affect the way you can use
|
||||
it.)
|
||||
@@ -0,0 +1,38 @@
|
||||
[//]: # (title: Schema inheritance)
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
|
||||
|
||||
To reduce the amount of generated code, previously generated [`DataSchema`](schema.md) interfaces are reused, and only new
properties are introduced.
|
||||
|
||||
Let's filter out all `null` values from the `age` column and add one more column of type `Boolean`:
|
||||
|
||||
```kotlin
|
||||
val filtered = df.filter { age != null }.add("isAdult") { age!! > 18 }
|
||||
```
|
||||
|
||||
The new schema interface for the `filtered` variable is derived from the previously generated `DataFrameType`:
|
||||
|
||||
```kotlin
|
||||
@DataSchema
|
||||
interface DataFrameType1 : DataFrameType
|
||||
```
|
||||
|
||||
Extension properties for data access are generated only for new and overridden members of the `DataFrameType1` interface:
|
||||
|
||||
```kotlin
|
||||
val ColumnsContainer<DataFrameType1>.age: DataColumn<Int> get() = this["age"] as DataColumn<Int>
|
||||
val DataRow<DataFrameType1>.age: Int get() = this["age"] as Int
|
||||
val ColumnsContainer<DataFrameType1>.isAdult: DataColumn<Boolean> get() = this["isAdult"] as DataColumn<Boolean>
|
||||
val DataRow<DataFrameType1>.isAdult: Boolean get() = this["isAdult"] as Boolean
|
||||
```
|
||||
|
||||
Then the variable `filtered` is cast to the new interface:
|
||||
|
||||
```kotlin
|
||||
val temp = filtered
|
||||
```
|
||||
|
||||
```kotlin
|
||||
val filtered = temp.cast<DataFrameType1>()
|
||||
```
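
The resolution rule behind this can be reproduced in plain Kotlin without the library. The sketch below uses a simplified stand-in class `Row` (not the real `DataRow`) with a covariant type parameter. Properties declared for the base marker stay available on the derived one, and a property redeclared for the derived marker wins because its receiver type is more specific.

```kotlin
// Simplified stand-in for DataRow<T>; the real class has a much richer API.
class Row<out T>(private val values: Map<String, Any?>) {
    operator fun get(name: String): Any? = values[name]
}

interface DataFrameType
interface DataFrameType1 : DataFrameType

// Properties generated for the original schema.
val Row<DataFrameType>.age: Int? @JvmName("DataFrameType_age") get() = this["age"] as Int?
val Row<DataFrameType>.name: String @JvmName("DataFrameType_name") get() = this["name"] as String

// Only new and overridden properties are declared for the derived schema.
val Row<DataFrameType1>.age: Int @JvmName("DataFrameType1_age") get() = this["age"] as Int
val Row<DataFrameType1>.isAdult: Boolean @JvmName("DataFrameType1_isAdult") get() = this["isAdult"] as Boolean

fun main() {
    val row = Row<DataFrameType1>(mapOf("name" to "Alice", "age" to 15, "isAdult" to false))
    val age: Int = row.age       // resolves to the non-null override for DataFrameType1
    val name: String = row.name  // reused from DataFrameType thanks to covariance
    println("$name, $age, adult = ${row.isAdult}")
}
```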
|
||||