init research

2026-02-08 11:20:43 -10:00
commit bdf064f54d
3041 changed files with 1592200 additions and 0 deletions
@@ -0,0 +1,234 @@
+[//]: # (title: Gradle Plugin (deprecated))
+
+> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
+>
+> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
+{style="warning"}
+
+This page describes the Gradle plugin that generates `@DataSchema` declarations from data samples.
+```kotlin
+id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
+```
+
+It's different from the DataFrame compiler plugin:
+```kotlin
+kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
+```
+
+By default, the Gradle plugin adds a KSP annotation processor to your build:
+
+```kotlin
+ksp("org.jetbrains.kotlinx.dataframe:symbol-processor-all:%dataFrameVersion%")
+```
+
+You should disable it if you want to use the Gradle plugin together with the compiler plugin.
+
+Add this to `gradle.properties`:
+```properties
+kotlin.dataframe.add.ksp=false
+```
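+
+As a hedged illustration of that setup, the `plugins` block of a `build.gradle.kts` that applies both plugins might look like the sketch below. The `kotlin("jvm")` line and the choice to reuse `%compilerPluginKotlinVersion%` for it are assumptions made for this example, not requirements stated on this page:
+
+```kotlin
+// build.gradle.kts (sketch; adapt plugin versions to your project)
+plugins {
+    // Assumption: a Kotlin/JVM project whose Kotlin version matches the compiler plugin version
+    kotlin("jvm") version "%compilerPluginKotlinVersion%"
+    // Gradle plugin: generates `@DataSchema` declarations from data samples
+    id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
+    // DataFrame compiler plugin
+    kotlin("plugin.dataframe") version "%compilerPluginKotlinVersion%"
+}
+```
+
+With `kotlin.dataframe.add.ksp=false` in `gradle.properties`, the Gradle plugin no longer adds the KSP annotation processor to the build.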
+
+## Examples
+In the simplest case, a schema can be defined like this:
+```kotlin
+dataframes {
+    // output: build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
+    schema {
+        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
+    }
+}
+```
+Note that the names of the file and the interface are normalized: the data file name is split by `_` and ` `, and the parts are joined in CamelCase.
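+
+The naming rule can be approximated with a few lines of plain Kotlin. The function below is only an illustration of the rule as described on this page; `toCamelCaseName` is a hypothetical helper, not part of the plugin, and the plugin's actual implementation may handle more cases:
+
+```kotlin
+// Illustrative approximation of the naming rule; NOT the plugin's real implementation.
+// The default delimiters mirror the ones listed in the DSL reference on this page.
+fun toCamelCaseName(
+    fileName: String,
+    delimiters: CharArray = charArrayOf('\t', '_', ' '),
+): String =
+    fileName
+        .substringBeforeLast('.') // drop the extension, e.g. ".csv"
+        .split(*delimiters) // split on every delimiter
+        .filter { it.isNotEmpty() } // ignore empty parts produced by repeated delimiters
+        .joinToString("") { part ->
+            part.lowercase().replaceFirstChar { it.uppercase() }
+        }
+
+fun main() {
+    println(toCamelCaseName("jetbrains_repositories.csv")) // prints "JetbrainsRepositories"
+}
+```
+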
+You can set parsing options for CSV:
+```kotlin
+dataframes {
+    // output: build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
+    schema {
+        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
+        csvOptions {
+            delimiter = ','
+        }
+    }
+}
+```
+In this case, the output path depends on your directory structure.
+For a project with the package `org.example`, the path will be
+`build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt`.
+
+Note that the name of the Kotlin file is derived from the name of the data file, with the suffix `.Generated` appended,
+and the package is derived from the directory structure, with `dataframe` added as a child package.
+
+The name of the **data schema** itself is `JetbrainsRepositories`.
+You can specify it explicitly:
+
+```kotlin
+schema {
+    // output: build/generated/dataframe/main/kotlin/org/example/dataframe/MyName.Generated.kt
+    data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
+    name = "MyName"
+}
+```
+
+To change the default package for all schemas, set `packageName` inside `dataframes`:
+
+```kotlin
+dataframes {
+    packageName = "org.example"
+    // Schemas...
+}
+```
+
+You can then override `packageName` for a specific schema:
+
+```kotlin
+dataframes {
+    // output: build/generated/dataframe/main/kotlin/org/example/data/Data.Generated.kt
+    schema {
+        packageName = "org.example.data"
+        data = file("path/to/data.csv")
+    }
+}
+```
+
+If you want both a non-default name and a non-default package, consider using a fully qualified name:
+
+```kotlin
+dataframes {
+    // output: build/generated/dataframe/main/kotlin/org/example/data/OtherName.Generated.kt
+    schema {
+        name = "org.example.data.OtherName"
+        data = file("path/to/data.csv")
+    }
+}
+```
+
+By default, the plugin generates output for the `main` source set.
+The source set can be specified for all schemas or for a specific schema:
+
+```kotlin
+dataframes {
+    packageName = "org.example"
+    sourceSet = "test"
+    // output: build/generated/dataframe/test/kotlin/org/example/Data.Generated.kt
+    schema {
+        data = file("path/to/data.csv")
+    }
+    // output: build/generated/dataframe/integrationTest/kotlin/org/example/Data.Generated.kt
+    schema {
+        sourceSet = "integrationTest"
+        data = file("path/to/data.csv")
+    }
+}
+```
+
+If you need the generated files to be put in another directory, set `src`:
+
+```kotlin
+dataframes {
+    // output: schemas/org/example/test/OtherName.Generated.kt
+    schema {
+        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
+        name = "org.example.test.OtherName"
+        src = file("schemas")
+    }
+}
+```
+## Schema Definitions from SQL Databases
+
+To generate a schema for an existing SQL table,
+you need to define a few parameters to establish a JDBC connection:
+the URL (passed in the `data` field), the username, and the password.
+
+Also specify the `tableName` parameter so that the data from the table with that name is converted to a dataframe.
+
+```kotlin
+dataframes {
+    schema {
+        data = "jdbc:mariadb://localhost:3306/imdb"
+        name = "org.example.imdb.Actors"
+        jdbcOptions {
+            user = "root"
+            password = "pass" 
+            tableName = "actors"
+        }
+    }
+}
+```
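+
+Because the build script is ordinary Kotlin, the credentials do not have to be hard-coded. The variant below is a sketch that reads them from environment variables; `DB_USER` and `DB_PASSWORD` are hypothetical variable names chosen for this example, and the fallback values simply repeat the ones used above:
+
+```kotlin
+dataframes {
+    schema {
+        data = "jdbc:mariadb://localhost:3306/imdb"
+        name = "org.example.imdb.Actors"
+        jdbcOptions {
+            // DB_USER and DB_PASSWORD are hypothetical environment variable names
+            user = System.getenv("DB_USER") ?: "root"
+            password = System.getenv("DB_PASSWORD") ?: "pass"
+            tableName = "actors"
+        }
+    }
+}
+```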
+
+To generate a schema for the result of an SQL query,
+provide the same connection parameters as before, together with the SQL query.
+
+```kotlin
+dataframes {
+    schema {
+        data = "jdbc:mariadb://localhost:3306/imdb"
+        name = "org.example.imdb.TarantinoFilms"
+        jdbcOptions {
+            user = "root" 
+            password = "pass"
+            sqlQuery = """
+                SELECT name, year, rank,
+                GROUP_CONCAT (genre) as "genres"
+                FROM movies JOIN movies_directors ON movie_id = movies.id
+                JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
+                WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
+                GROUP BY name, year, rank
+                ORDER BY year
+                """
+        }
+    }
+}
+```
+
+Find full example code [here](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/src/main/kotlin/Example_3_Import_schema_via_Gradle.kt).
+
+**NOTE:** This functionality is experimental and, for now,
+only the following databases are supported: MariaDB, MySQL, PostgreSQL, SQLite, MS SQL, and DuckDB.
+
+Additionally, support for JSON and date-time types is limited.
+Please take this into consideration when using this functionality.
+
+## DSL reference
+Inside `dataframes` you can configure parameters that will apply to all schemas. 
+Configuration inside `schema` will override these defaults for a specific schema.
+Here is the full DSL for declaring data schemas:
+
+```kotlin
+dataframes {
+    sourceSet = "mySources" // [optional; default: "main"]
+    packageName = "org.jetbrains.data" // [optional; default: common package under source set]
+    
+    visibility = // [optional; default: if explicitApiMode enabled then EXPLICIT_PUBLIC, else IMPLICIT_PUBLIC]
+    // KOTLIN SCRIPT: DataSchemaVisibility.INTERNAL, DataSchemaVisibility.IMPLICIT_PUBLIC, DataSchemaVisibility.EXPLICIT_PUBLIC
+    // GROOVY SCRIPT: 'internal', 'implicit_public', 'explicit_public'
+        
+    withoutDefaultPath() // disable a default path for all schemas
+    // i.e., plugin won't copy "data" property of the schemas to generated companion objects
+
+    // split property names by delimiters (arguments of this method), lowercase parts and join to camel case
+    // enabled by default
+    withNormalizationBy('_') // [optional; default: ['\t', '_', ' ']]
+    withoutNormalization() // disable property names normalization
+    
+    schema {
+        sourceSet /* String */ = "…" // [optional; override default]
+        packageName /* String */ = "…" // [optional; override default]
+        visibility /* DataSchemaVisibility */ = "…" // [optional; override default]
+        src /* File */ = file("…") // [optional; default: file("build/generated/dataframe/$sourceSet/kotlin")]
+        
+        data /* URL | File | String */ = "…" // Data in JSON or CSV formats
+        name = "org.jetbrains.data.Person" // [optional; default: from filename]
+        csvOptions {
+            delimiter /* Char */ = ';' // [optional; default: ',']
+        }
+
+        // See names normalization
+        withNormalizationBy('_') // enable property names normalization for this schema and use these delimiters
+        withoutNormalization() // disable property names normalization for this schema
+        
+        withoutDefaultPath() // disable the default path for this schema
+        withDefaultPath() // enable the default path for this schema
+    }
+}
+```
@@ -0,0 +1,157 @@
+[//]: # (title: Data Schemas in Gradle projects)
+
+<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
+
+> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
+>
+> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
+{style="warning"}
+
+In Gradle projects, the Kotlin DataFrame library provides:
+
+1. Annotation processing for generation of extension properties.
+2. Annotation processing for [`DataSchema`](schemas.md) inference from datasets.
+3. A Gradle task for [`DataSchema`](schemas.md) inference from datasets.
+
+### Configuration
+
+To use the [extension properties API](extensionPropertiesApi.md) in a Gradle project, add the `dataframe` plugin as follows:
+
+<tabs>
+<tab title="Kotlin DSL">
+
+```kotlin
+plugins {
+    id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
+}
+
+dependencies {
+    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
+}
+```
+
+</tab>
+
+<tab title="Groovy DSL">
+
+```groovy
+plugins {
+    id("org.jetbrains.kotlinx.dataframe") version "%dataFrameVersion%"
+}
+
+dependencies {
+    implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
+}
+```
+
+</tab>
+
+</tabs>
+
+### Annotation processing
+
+Declare data schemas in your code and use them to access data in [`DataFrame`](DataFrame.md) objects.
+A data schema is a class or interface annotated with [`@DataSchema`](schemas.md):
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
+
+@DataSchema
+interface Person {
+    val name: String
+    val age: Int
+}
+```
+
+#### Execute the `assemble` task to generate type-safe accessors for schemas:
+
+<!---FUN useProperties-->
+
+```kotlin
+val df = dataFrameOf("name", "age")(
+    "Alice", 15,
+    "Bob", 20,
+).cast<Person>()
+// age only available after executing `build` or `kspKotlin`!
+val teens = df.filter { age in 10..19 }
+teens.print()
+```
+
+<!---END-->
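+
+For orientation, the accessors generated for `Person` are extension properties, roughly of the shape sketched below. This is a simplified illustration under the assumption that accessors delegate to name-based lookups; the real generated code may differ in details (for example, extra annotations), and the sketch is not meant to be added to a project where generation is enabled:
+
+```kotlin
+import org.jetbrains.kotlinx.dataframe.ColumnsContainer
+import org.jetbrains.kotlinx.dataframe.DataColumn
+import org.jetbrains.kotlinx.dataframe.DataRow
+
+// Assumes the `Person` schema declared above is in scope.
+
+// Column accessor: makes `df.age` resolve to the "age" column
+@Suppress("UNCHECKED_CAST")
+val ColumnsContainer<Person>.age: DataColumn<Int>
+    get() = this["age"] as DataColumn<Int>
+
+// Row accessor: makes `age` usable inside row lambdas such as `filter { age in 10..19 }`
+val DataRow<Person>.age: Int
+    get() = this["age"] as Int
+```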
+
+### Schema inference
+
+Specify the schema using your preferred method, then execute the `assemble` task.
+
+<tabs>
+<tab title="Method 1. Annotation processing">
+
+The `@ImportDataSchema` annotation must be placed above the package directive.
+You can import schemas from a URL or from a relative file path.
+By default, relative paths are resolved against the project root directory.
+You can change this by [passing](https://kotlinlang.org/docs/ksp-quickstart.html#pass-options-to-processors) the `dataframe.resolutionDir`
+option to the annotation processor.
+For example:
+
+```kotlin
+ksp {
+    arg("dataframe.resolutionDir", file("data").absolutePath)
+}
+```
+
+**Note that due to incremental processing, an imported schema is regenerated only if the source code has changed
+since the previous invocation, even by a single character.**
+
+For the following configuration, the file `Repository.Generated.kt` will be generated in the `build/generated/ksp/` folder, in
+the same package as the file containing the annotation.
+
+```kotlin
+@file:ImportDataSchema(
+    "Repository",
+    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
+)
+
+import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
+import org.jetbrains.kotlinx.dataframe.api.*
+```
+
+See KDocs for `@ImportDataSchema` in IDE
+or [GitHub](https://github.com/Kotlin/dataframe/blob/master/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt)
+for more details.
+
+</tab>
+
+<tab title="Method 2. Gradle task">
+
+Put the following configuration in `build.gradle` or `build.gradle.kts`.
+With the default source set and output directory, the file `Repository.Generated.kt` will be generated
+in the `build/generated/dataframe/main/kotlin/org/example` folder.
+
+```kotlin
+dataframes {
+    schema {
+        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
+        name = "org.example.Repository"
+    }
+}
+```
+
+See [reference](Gradle-Plugin.md) and [examples](Gradle-Plugin.md#examples) for more details.
+
+</tab>
+</tabs>
+
+After `assemble`, the following code should compile and run:
+
+<!---FUN useInferredSchema-->
+
+```kotlin
+// Repository.readCsv() has argument 'path' with default value https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv
+val df = Repository.readCsv()
+// Use generated properties to access data in rows
+df.maxBy { stargazersCount }.print()
+// Or to access columns in dataframe.
+print(df.fullName.count { it.contains("kotlin") })
+```
+
+<!---END-->
@@ -0,0 +1,75 @@
+[//]: # (title: Import OpenAPI Schemas in Gradle project (Experimental))
+
+<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
+
+> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
+>
+> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
+{style="warning"}
+
+<warning>
+OpenAPI 3.0.0 schema support is marked as experimental. It might change or be removed in the future.
+</warning>
+
+JSON schema inference is great, but it's not perfect. However, more and more APIs offer
+[OpenAPI (Swagger)](https://swagger.io/) specifications. 
+
+Aside from API endpoints, they also hold
+[Data Models](https://swagger.io/docs/specification/data-models/) which include all the information about the types
+that can be returned from or supplied to the API. 
+
+Why should we reinvent the wheel and write our own schema inference
+when we can use the one provided by the API? 
+
+Not only will we now get the proper names of the types, but we will also
+get enums, correct inheritance and overall better type safety.
+
+First of all, you will need the extra dependency:
+
+```kotlin
+implementation("org.jetbrains.kotlinx:dataframe-openapi:%dataFrameVersion%")
+```
+
+OpenAPI type schemas can be generated with either the `@file:ImportDataSchema` annotation or the Gradle task:
+
+```kotlin
+@file:ImportDataSchema(
+    path = "https://petstore3.swagger.io/api/v3/openapi.json",
+    name = "PetStore",
+    enableExperimentalOpenApi = true,
+)
+
+import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
+```
+
+```kotlin
+dataframes {
+    schema {
+        data = "https://petstore3.swagger.io/api/v3/openapi.json"
+        name = "PetStore"
+    }
+    enableExperimentalOpenApi = true
+}
+```
+
+The only difference is that the name provided is now irrelevant, since the type names are provided by the OpenAPI spec.
+(If you were wondering, yes, the Kotlin DataFrame library can tell the difference between an OpenAPI spec and normal JSON data.)
+
+After importing the data schema, you can now start to import any JSON data you like using the generated schemas.
+For instance, one of the types in the schema above is `PetStore.Pet` (which can also be
+explored [here](https://petstore3.swagger.io/)),
+so let's parse some Pets:
+
+```kotlin
+val df: DataFrame<PetStore.Pet> =
+    PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available")
+```
+
+Now you will have a correctly typed [`DataFrame`](DataFrame.md)!
+
+You can also Ctrl+click the `PetStore.Pet` type to see all the generated schemas.
+
+If you experience any issues with the OpenAPI support (since there are many gotchas and edge cases when converting
+something as type-fluid as JSON to a strongly typed language), please open an issue on
+the [GitHub repo](https://github.com/Kotlin/dataframe/issues).
@@ -0,0 +1,141 @@
+[//]: # (title: Import SQL Metadata as a Schema in Gradle Project)
+
+<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
+
+> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
+>
+> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead of relying on the plugin.
+{style="warning"}
+
+Each SQL database contains metadata for all of its tables.
+This metadata can be used for schema generation.
+
+**NOTE:** Visit this [page](readSqlDatabases.md) to see how to set up all Gradle dependencies for your project.
+
+### With `@file:ImportDataSchema`
+
+To generate a schema for an existing SQL table,
+you need to define a few parameters to establish a JDBC connection:
+the URL, the username, and the password.
+
+The `tableName` parameter can also be specified.
+
+You should also specify the name of the generated Kotlin class
+as the first parameter of the `@file:ImportDataSchema` annotation.
+
+```kotlin
+@file:ImportDataSchema(
+    "Directors",
+    URL,
+    jdbcOptions = JdbcOptions(USER_NAME, PASSWORD, tableName = TABLE_NAME_DIRECTORS)
+)
+
+package org.jetbrains.kotlinx.dataframe.examples.jdbc
+
+import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
+```
+
+```kotlin
+const val URL = "jdbc:mariadb://localhost:3306/imdb"
+
+const val USER_NAME = "root"
+
+const val PASSWORD = "pass"
+
+const val TABLE_NAME_DIRECTORS = "directors"
+```
+
+To generate a schema for the result of an SQL query,
+you need to define the SQL query itself
+and the same parameters to establish a connection to the database.
+
+You should also specify the name of the generated Kotlin class
+as the first parameter of the `@file:ImportDataSchema` annotation.
+
+```kotlin
+@file:ImportDataSchema(
+    "NewActors",
+    URL,
+    jdbcOptions = JdbcOptions(USER_NAME, PASSWORD, sqlQuery = ACTORS_IN_LATEST_MOVIES)
+)
+
+package org.jetbrains.kotlinx.dataframe.examples.jdbc
+
+import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
+```
+
+```kotlin
+const val URL = "jdbc:mariadb://localhost:3306/imdb"
+
+const val USER_NAME = "root"
+
+const val PASSWORD = "pass"
+
+const val ACTORS_IN_LATEST_MOVIES = """
+    SELECT a.first_name, a.last_name, r.role, m.name AS movie_name, m.year
+    FROM actors a
+    INNER JOIN roles r ON a.id = r.actor_id
+    INNER JOIN movies m ON m.id = r.movie_id
+    WHERE m.year > 2000
+    """
+```
+
+Find full example code [here](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/src/main/kotlin/Example_2_Import_schema_annotation.kt).
+
+### With Gradle Task 
+
+To generate a schema for an existing SQL table,
+you need to define a few parameters to establish a JDBC connection:
+the URL (passed in the `data` field), the username, and the password.
+
+Also specify the `tableName` parameter so that the data from the table with that name is converted to a [`DataFrame`](DataFrame.md).
+
+```kotlin
+dataframes {
+    schema {
+        data = "jdbc:mariadb://localhost:3306/imdb"
+        name = "org.example.imdb.Actors"
+        jdbcOptions {
+            user = "root"
+            password = "pass" 
+            tableName = "actors"
+        }
+    }
+}
+```
+
+To generate a schema for the result of an SQL query,
+provide the same connection parameters as before, together with the SQL query.
+
+```kotlin
+dataframes {
+    schema {
+        data = "jdbc:mariadb://localhost:3306/imdb"
+        name = "org.example.imdb.TarantinoFilms"
+        jdbcOptions {
+            user = "root" 
+            password = "pass"
+            sqlQuery = """
+                SELECT name, year, rank,
+                GROUP_CONCAT (genre) as "genres"
+                FROM movies JOIN movies_directors ON movie_id = movies.id
+                JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
+                WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
+                GROUP BY name, year, rank
+                ORDER BY year
+                """
+        }
+    }
+}
+```
+
+Find full example code [here](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/src/main/kotlin/Example_3_Import_schema_via_Gradle.kt).
+
+After importing the data schema, you can use the generated schemas to load data from an SQL table
+or from the result of an SQL query.
+
+Now you will have a correctly typed [`DataFrame`](DataFrame.md)!
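+
+A minimal sketch of that last step is shown below. It assumes that the `Actors` schema from the Gradle example above has been generated, that the JDBC module of the library and a MariaDB driver are on the classpath, and that the database is reachable with the credentials used on this page; treat it as an outline rather than a verified recipe:
+
+```kotlin
+import org.example.imdb.Actors
+import org.jetbrains.kotlinx.dataframe.DataFrame
+import org.jetbrains.kotlinx.dataframe.api.cast
+import org.jetbrains.kotlinx.dataframe.api.print
+import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
+import org.jetbrains.kotlinx.dataframe.io.readSqlTable
+
+fun main() {
+    // Same connection parameters as in the schema definition above
+    val dbConfig = DbConnectionConfig("jdbc:mariadb://localhost:3306/imdb", "root", "pass")
+
+    // Read the table and attach the generated schema to the result
+    val actors = DataFrame.readSqlTable(dbConfig, "actors").cast<Actors>()
+    actors.print()
+}
+```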
+
+If you experience any issues with SQL database support (since there are many edge cases when converting
+SQL types from different databases to Kotlin types), please open an issue on
+the [GitHub repo](https://github.com/Kotlin/dataframe/issues), specifying the database and the problem.