init research

This commit is contained in:
2026-02-08 11:20:43 -10:00
commit bdf064f54d
3041 changed files with 1592200 additions and 0 deletions
@@ -0,0 +1,74 @@
# Examples of Kotlin DataFrame
### Idea examples
* [Gradle plugin example](kotlin-dataframe-plugin-gradle-example) An IDEA project with a
[Kotlin DataFrame Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html) example.
* [Maven plugin example](kotlin-dataframe-plugin-maven-example) An IDEA project with a
[Kotlin DataFrame Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html) example.
* [android example](android-example) A minimal Android project showcasing integration with Kotlin DataFrame.
Also includes [Kotlin DataFrame Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html).
* [movies](idea-examples/movies) Using extension properties [Access API](https://kotlin.github.io/dataframe/apilevels.html) to perform a data cleaning task
* [titanic](idea-examples/titanic)
* [youtube](idea-examples/youtube)
* [json](idea-examples/json) Using OpenAPI support in DataFrame's Gradle and KSP plugins to access data from [API guru](https://apis.guru/) in a type-safe manner
* [imdb sql database](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples) This project showcases how to convert data from an SQL table to a Kotlin DataFrame
and how to transform the result of an SQL query into a DataFrame.
* [spark-parquet-dataframe](idea-examples/spark-parquet-dataframe) This project showcases how to read data and ML models exported from Apache Spark via Parquet files.
* [unsupported-data-sources](idea-examples/unsupported-data-sources) Showcases how to use DataFrame with
(currently) unsupported data libraries such as [Spark](https://spark.apache.org/) and [Exposed](https://github.com/JetBrains/Exposed),
showing how to convert to and from Kotlin DataFrame and their respective tables.
* **JetBrains Exposed**: See the [exposed folder](./idea-examples/unsupported-data-sources/exposed)
for an example of using Kotlin DataFrame with [Exposed](https://github.com/JetBrains/Exposed).
* **Hibernate**: See the [hibernate folder](./idea-examples/unsupported-data-sources/hibernate)
for an example of using Kotlin DataFrame with [Hibernate](https://hibernate.org/orm/).
* **Apache Spark**: See the [spark folder](./idea-examples/unsupported-data-sources/spark)
for an example of using Kotlin DataFrame with [Spark](https://spark.apache.org/) and with the [Kotlin Spark API](https://github.com/JetBrains/kotlin-spark-api).
* **Multik**: See the [multik folder](./idea-examples/unsupported-data-sources/multik)
for an example of using Kotlin DataFrame with [Multik](https://github.com/Kotlin/multik).
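For reference, here is a minimal sketch of enabling the Compiler Plugin in a standalone Gradle project. The plugin id and versions mirror those used in the Android example in this commit; treat them as placeholders to adapt to your own Kotlin version.

```kotlin
// build.gradle.kts — minimal sketch of enabling the DataFrame Compiler Plugin.
// Versions are taken from the Android example in this commit and may need
// updating for your project.
plugins {
    kotlin("jvm") version "2.3.0-RC2"
    // The Compiler Plugin shares the Kotlin version.
    id("org.jetbrains.kotlin.plugin.dataframe") version "2.3.0-RC2"
}

dependencies {
    // The umbrella DataFrame artifact; individual modules (core, csv, json)
    // can be used instead, as the Android example does.
    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta4")
}
```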
### Notebook examples
* people ([Datalore](https://datalore.jetbrains.com/view/notebook/aOTioEClQQrsZZBKeUPAQj)) –
Small artificial dataset used in [DataFrame API examples](https://kotlin.github.io/dataframe/operations.html)
___
* puzzles ([notebook](notebooks/puzzles/40%20puzzles.ipynb)/[Datalore](https://datalore.jetbrains.com/view/notebook/CVp3br3CDXjUGaxxqfJjFF)) –
Inspired by [100 pandas puzzles](https://github.com/ajcr/100-pandas-puzzles). You will go from the simplest tasks to
complex problems that require some thought. This notebook shows how to solve these tasks with Kotlin
DataFrame in a concise, elegant style.
___
* movies ([notebook](notebooks/movies/movies.ipynb)/[Datalore](https://datalore.jetbrains.com/view/notebook/89IMYb1zbHZxHfwAta6eKP)) –
In this notebook you can see the basic operations of Kotlin DataFrame on data from [movielens](https://movielens.org/).
You can download the data from [this link](https://grouplens.org/datasets/movielens/latest/).
___
* netflix ([notebook](notebooks/netflix/netflix.ipynb)/[Datalore](https://datalore.jetbrains.com/view/notebook/xSJ4rx49hcH71pPnFgZBCq)) –
Explore TV shows and movies from Netflix with the powerful Kotlin DataFrame API and beautiful
visualizations from [lets-plot](https://github.com/JetBrains/lets-plot-kotlin).
___
* github ([notebook](notebooks/github/github.ipynb)/[Datalore](https://datalore.jetbrains.com/view/notebook/P9n6jYL4mmY1gx3phz5TsX)) –
This notebook shows what hierarchical dataframes look like and how to work with them.
___
* titanic ([notebook](notebooks/titanic/Titanic.ipynb)/[Datalore](https://datalore.jetbrains.com/view/notebook/B5YeMMONSAR78FgKQ9yJyW)) –
Let's see how the library fares on the famous Titanic dataset.
___
* Financial analysis of the top 12 German companies ([notebook](notebooks/top_12_german_companies)/[Datalore](https://datalore.jetbrains.com/report/static/KQKedA4jDrKu63O53gEN0z/MDg5pHcGvRdDVQnPLmwjuc)) –
Analyze key financial metrics for several major German companies.
___
* wine ([notebook](notebooks/wine/WineNetWIthKotlinDL.ipynb)/[Datalore](https://datalore.jetbrains.com/view/notebook/aK9vYHH8pCA8H1KbKB5WsI)) –
Wine, Kotlin DataFrame, and KotlinDL: what came out of this combination can be seen in this notebook.
___
* youtube ([notebook](notebooks/youtube/Youtube.ipynb)/[Datalore](https://datalore.jetbrains.com/view/notebook/uXH0VfIM6qrrmwPJnLBi0j)) –
Explore YouTube videos with the YouTube REST API and Kotlin DataFrame.
___
* imdb sql database ([notebook](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/notebooks/imdb.ipynb)) – In this notebook, we use Kotlin DataFrame and the Kandy library to analyze data from [IMDB](https://datasets.imdbws.com/) (an SQL dump for the MariaDB database named "imdb" can be downloaded from this [link](https://drive.google.com/file/d/10HnOu0Yem2Tkz_34SfvDoHTVqF_8b4N7/view?usp=sharing)).
---
* Feature Overviews ([notebook folder](notebooks/feature_overviews)) –
Overview of the new features available in a given version.
The example notebooks always target the latest stable version of the library.
Notebooks compatible with the latest dev/master version are located in the [dev](notebooks/dev) folder.
These [dev versions](notebooks/dev) are tested by the
[:dataframe-jupyter module](../dataframe-jupyter/src/test/kotlin/org/jetbrains/kotlinx/dataframe/jupyter).
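For a flavor of the API these notebooks use, here is a minimal, self-contained sketch (the column names and values are made up). It uses the string-based Access API, which works without the Compiler Plugin:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// A tiny artificial "people"-style dataset; names and ages are invented.
fun adultCount(): Int {
    val df = dataFrameOf("name", "age")(
        "Alice", 23,
        "Bob", 17,
        "Charlie", 31,
    )
    // String-based column accessor: no generated extension properties needed.
    val adults = df.filter { "age"<Int>() >= 18 }
    return adults.rowsCount()
}
```

With the Compiler Plugin and a `@DataSchema` (as in the Android example), the same filter can be written with extension properties: `df.filter { age >= 18 }`.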
@@ -0,0 +1,15 @@
*.iml
.gradle
/local.properties
/.idea/caches
/.idea/libraries
/.idea/modules.xml
/.idea/workspace.xml
/.idea/navEditor.xml
/.idea/assetWizardSettings.xml
.DS_Store
/build
/captures
.externalNativeBuild
.cxx
local.properties
@@ -0,0 +1,15 @@
# 📱 Android Example
A minimal Android project showcasing integration with **Kotlin DataFrame**.
<p align="center">
<img src="screen.jpg" alt="App screenshot" height="320"/>
</p>
It also includes the [Kotlin DataFrame Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html).
We recommend using an up-to-date Android Studio for the best experience.
Proper functionality in Android Studio requires version Otter | 2025.2.3 or newer.
[Download Android Example](https://github.com/Kotlin/dataframe/raw/example-projects-archives/android-example.zip)
@@ -0,0 +1 @@
/build
@@ -0,0 +1,74 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
plugins {
alias(libs.plugins.android.application)
alias(libs.plugins.kotlin.compose)
// DataFrame Compiler plugin, matching the Kotlin version
alias(libs.plugins.dataframe)
}
android {
namespace = "com.example.myapplication"
compileSdk = 36
defaultConfig {
applicationId = "com.example.myapplication"
minSdk = 21
targetSdk = 36
versionCode = 1
versionName = "1.0"
testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
}
buildTypes {
release {
isMinifyEnabled = true
proguardFiles(
getDefaultProguardFile("proguard-android-optimize.txt"),
"proguard-rules.pro",
)
}
}
compileOptions {
sourceCompatibility = JavaVersion.VERSION_1_8
targetCompatibility = JavaVersion.VERSION_1_8
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_1_8
}
}
buildFeatures {
compose = true
}
}
dependencies {
implementation(libs.androidx.core.ktx)
implementation(libs.androidx.lifecycle.runtime.ktx)
implementation(libs.androidx.activity.compose)
implementation(platform(libs.androidx.compose.bom))
implementation(libs.androidx.ui)
implementation(libs.androidx.ui.graphics)
implementation(libs.androidx.ui.tooling.preview)
implementation(libs.androidx.material3)
testImplementation(libs.junit)
androidTestImplementation(libs.androidx.junit)
androidTestImplementation(libs.androidx.espresso.core)
androidTestImplementation(platform(libs.androidx.compose.bom))
androidTestImplementation(libs.androidx.ui.test.junit4)
debugImplementation(libs.androidx.ui.tooling)
debugImplementation(libs.androidx.ui.test.manifest)
// Core Kotlin DataFrame API, JSON and CSV IO.
// See custom Gradle setup:
// https://kotlin.github.io/dataframe/setupcustomgradle.html
implementation("org.jetbrains.kotlinx:dataframe-core:1.0.0-Beta4")
implementation("org.jetbrains.kotlinx:dataframe-json:1.0.0-Beta4")
implementation("org.jetbrains.kotlinx:dataframe-csv:1.0.0-Beta4")
// You can add any additional IO modules you like, except for 'dataframe-arrow'.
// Apache Arrow is not supported well on Android.
}
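Given the `dataframe-csv` dependency above, reading a CSV looks roughly like the sketch below. The file contents and column names are invented for illustration, and `readCsv` is assumed to be the CSV module's reader in this version:

```kotlin
import java.io.File
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readCsv

// Self-contained sketch: write a tiny CSV, then read it back with
// dataframe-csv. The data is made up for illustration.
fun readTinyCsv(): Int {
    val csv = File.createTempFile("people", ".csv").apply {
        writeText("name,age\nAlice,23\nBob,17\n")
        deleteOnExit()
    }
    val df = DataFrame.readCsv(csv)
    return df.rowsCount() // number of data rows (the header is not counted)
}
```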
@@ -0,0 +1,21 @@
# Add project specific ProGuard rules here.
# You can control the set of applied configuration files using the
# proguardFiles setting in build.gradle.
#
# For more details, see
# http://developer.android.com/guide/developing/tools/proguard.html
# If your project uses WebView with JS, uncomment the following
# and specify the fully qualified class name to the JavaScript interface
# class:
#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
# public *;
#}
# Uncomment this to preserve the line number information for
# debugging stack traces.
#-keepattributes SourceFile,LineNumberTable
# If you keep the line number information, uncomment this to
# hide the original source file name.
#-renamesourcefileattribute SourceFile
@@ -0,0 +1,24 @@
package com.example.myapplication
import androidx.test.platform.app.InstrumentationRegistry
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Test
import org.junit.runner.RunWith
import org.junit.Assert.*
/**
* Instrumented test, which will execute on an Android device.
*
* See [testing documentation](http://d.android.com/tools/testing).
*/
@RunWith(AndroidJUnit4::class)
class ExampleInstrumentedTest {
@Test
fun useAppContext() {
// Context of the app under test.
val appContext = InstrumentationRegistry.getInstrumentation().targetContext
assertEquals("com.example.myapplication", appContext.packageName)
}
}
@@ -0,0 +1,27 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools" >
<application
android:allowBackup="true"
android:dataExtractionRules="@xml/data_extraction_rules"
android:fullBackupContent="@xml/backup_rules"
android:icon="@mipmap/ic_launcher"
android:label="@string/app_name"
android:roundIcon="@mipmap/ic_launcher_round"
android:supportsRtl="true"
android:theme="@style/Theme.MyApplication" >
<activity
android:name=".MainActivity"
android:exported="true"
android:label="@string/app_name"
android:theme="@style/Theme.MyApplication" >
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.LAUNCHER" />
</intent-filter>
</activity>
</application>
</manifest>
@@ -0,0 +1,148 @@
package com.example.myapplication
import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.activity.enableEdgeToEdge
import androidx.compose.foundation.background
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.Row
import androidx.compose.foundation.layout.Spacer
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.height
import androidx.compose.foundation.layout.padding
import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.items
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Surface
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.remember
import androidx.compose.ui.Modifier
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.tooling.preview.Preview
import androidx.compose.ui.unit.dp
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.filter
import org.jetbrains.kotlinx.dataframe.api.rows
@DataSchema
data class Person(
val age: Int,
val name: String
)
class MainActivity : ComponentActivity() {
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
enableEdgeToEdge()
val df = dataFrameOf(
"name" to listOf("Andrei", "Nikita", "Jolan"),
"age" to listOf(22, 16, 37)
).cast<Person>()
setContent {
MaterialTheme {
Surface(modifier = Modifier.fillMaxSize()) {
DataFrameScreen(df)
}
}
}
}
}
@Preview(showBackground = true)
@Composable
fun DefaultDataFrameScreenPreview() {
val df = dataFrameOf(
"name" to listOf("Andrei", "Nikita", "Jolan"),
"age" to listOf(22, 16, 37)
).cast<Person>()
DataFrameScreen(df)
}
@Composable
fun DataFrameScreen(df: DataFrame<Person>) {
val filtered = remember(df) { df.filter { age >= 20 } }
Column(
modifier = Modifier
.fillMaxSize()
.padding(top = 48.dp, start = 16.dp, end = 16.dp)
) {
Text(
text = "Kotlin DataFrame on Android",
style = MaterialTheme.typography.headlineSmall,
modifier = Modifier.padding(bottom = 16.dp)
)
Text(
text = "df",
modifier = Modifier
.background(color = Color.LightGray)
.padding(2.dp)
)
DataFrameTable(df)
Text(
text = "df.filter { age >= 20 }",
modifier = Modifier
.background(color = Color.LightGray)
.padding(2.dp)
)
DataFrameTable(filtered)
}
}
@Preview(showBackground = true)
@Composable
fun DefaultDataFrameTablePreview() {
val df = dataFrameOf(
"name" to listOf("Andrei", "Nikita", "Jolan"),
"age" to listOf(22, 16, 37)
).cast<Person>()
DataFrameTable(df)
}
@Composable
fun DataFrameTable(df: DataFrame<*>) {
val columnNames = remember(df) { df.columnNames() }
val rows = remember(df) { df.rows().toList() }
LazyColumn {
item {
// Header
Row {
for (name in columnNames) {
Text(
text = name,
modifier = Modifier
.weight(1f)
.padding(4.dp),
style = MaterialTheme.typography.labelLarge
)
}
}
Spacer(Modifier.height(4.dp))
}
// Rows
items(rows) { row ->
Row {
for (cell in row.values()) {
Text(
text = cell.toString(),
modifier = Modifier
.weight(1f)
.padding(4.dp)
)
}
}
}
}
}
@@ -0,0 +1,170 @@
<?xml version="1.0" encoding="utf-8"?>
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="108dp"
android:height="108dp"
android:viewportWidth="108"
android:viewportHeight="108">
<path
android:fillColor="#3DDC84"
android:pathData="M0,0h108v108h-108z" />
<path
android:fillColor="#00000000"
android:pathData="M9,0L9,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M19,0L19,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M29,0L29,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M39,0L39,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M49,0L49,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M59,0L59,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M69,0L69,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M79,0L79,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M89,0L89,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M99,0L99,108"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,9L108,9"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,19L108,19"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,29L108,29"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,39L108,39"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,49L108,49"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,59L108,59"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,69L108,69"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,79L108,79"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,89L108,89"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M0,99L108,99"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M19,29L89,29"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M19,39L89,39"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M19,49L89,49"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M19,59L89,59"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M19,69L89,69"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M19,79L89,79"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M29,19L29,89"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M39,19L39,89"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M49,19L49,89"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M59,19L59,89"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M69,19L69,89"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
<path
android:fillColor="#00000000"
android:pathData="M79,19L79,89"
android:strokeWidth="0.8"
android:strokeColor="#33FFFFFF" />
</vector>
@@ -0,0 +1,30 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:aapt="http://schemas.android.com/aapt"
android:width="108dp"
android:height="108dp"
android:viewportWidth="108"
android:viewportHeight="108">
<path android:pathData="M31,63.928c0,0 6.4,-11 12.1,-13.1c7.2,-2.6 26,-1.4 26,-1.4l38.1,38.1L107,108.928l-32,-1L31,63.928z">
<aapt:attr name="android:fillColor">
<gradient
android:endX="85.84757"
android:endY="92.4963"
android:startX="42.9492"
android:startY="49.59793"
android:type="linear">
<item
android:color="#44000000"
android:offset="0.0" />
<item
android:color="#00000000"
android:offset="1.0" />
</gradient>
</aapt:attr>
</path>
<path
android:fillColor="#FFFFFF"
android:fillType="nonZero"
android:pathData="M65.3,45.828l3.8,-6.6c0.2,-0.4 0.1,-0.9 -0.3,-1.1c-0.4,-0.2 -0.9,-0.1 -1.1,0.3l-3.9,6.7c-6.3,-2.8 -13.4,-2.8 -19.7,0l-3.9,-6.7c-0.2,-0.4 -0.7,-0.5 -1.1,-0.3C38.8,38.328 38.7,38.828 38.9,39.228l3.8,6.6C36.2,49.428 31.7,56.028 31,63.928h46C76.3,56.028 71.8,49.428 65.3,45.828zM43.4,57.328c-0.8,0 -1.5,-0.5 -1.8,-1.2c-0.3,-0.7 -0.1,-1.5 0.4,-2.1c0.5,-0.5 1.4,-0.7 2.1,-0.4c0.7,0.3 1.2,1 1.2,1.8C45.3,56.528 44.5,57.328 43.4,57.328L43.4,57.328zM64.6,57.328c-0.8,0 -1.5,-0.5 -1.8,-1.2s-0.1,-1.5 0.4,-2.1c0.5,-0.5 1.4,-0.7 2.1,-0.4c0.7,0.3 1.2,1 1.2,1.8C66.5,56.528 65.6,57.328 64.6,57.328L64.6,57.328z"
android:strokeWidth="1"
android:strokeColor="#00000000" />
</vector>
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="utf-8"?>
<adaptive-icon xmlns:android="http://schemas.android.com/apk/res/android">
<background android:drawable="@drawable/ic_launcher_background" />
<foreground android:drawable="@drawable/ic_launcher_foreground" />
<monochrome android:drawable="@drawable/ic_launcher_foreground" />
</adaptive-icon>
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="utf-8"?>
<adaptive-icon xmlns:android="http://schemas.android.com/apk/res/android">
<background android:drawable="@drawable/ic_launcher_background" />
<foreground android:drawable="@drawable/ic_launcher_foreground" />
<monochrome android:drawable="@drawable/ic_launcher_foreground" />
</adaptive-icon>
Binary launcher icon images (10 mipmap files, 982 B–7.6 KiB) not shown.
@@ -0,0 +1,10 @@
<?xml version="1.0" encoding="utf-8"?>
<resources>
<color name="purple_200">#FFBB86FC</color>
<color name="purple_500">#FF6200EE</color>
<color name="purple_700">#FF3700B3</color>
<color name="teal_200">#FF03DAC5</color>
<color name="teal_700">#FF018786</color>
<color name="black">#FF000000</color>
<color name="white">#FFFFFFFF</color>
</resources>
@@ -0,0 +1,3 @@
<resources>
<string name="app_name">Kotlin Dataframe Simple App</string>
</resources>
@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<resources>
<style name="Theme.MyApplication" parent="android:Theme.Material.Light.NoActionBar" />
</resources>
@@ -0,0 +1,13 @@
<?xml version="1.0" encoding="utf-8"?><!--
Sample backup rules file; uncomment and customize as necessary.
See https://developer.android.com/guide/topics/data/autobackup
for details.
Note: This file is ignored for devices older than API 31
See https://developer.android.com/about/versions/12/backup-restore
-->
<full-backup-content>
<!--
<include domain="sharedpref" path="."/>
<exclude domain="sharedpref" path="device.xml"/>
-->
</full-backup-content>
@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="utf-8"?><!--
Sample data extraction rules file; uncomment and customize as necessary.
See https://developer.android.com/about/versions/12/backup-restore#xml-changes
for details.
-->
<data-extraction-rules>
<cloud-backup>
<!-- TODO: Use <include> and <exclude> to control what is backed up.
<include .../>
<exclude .../>
-->
</cloud-backup>
<!--
<device-transfer>
<include .../>
<exclude .../>
</device-transfer>
-->
</data-extraction-rules>
@@ -0,0 +1,17 @@
package com.example.myapplication
import org.junit.Test
import org.junit.Assert.*
/**
* Example local unit test, which will execute on the development machine (host).
*
* See [testing documentation](http://d.android.com/tools/testing).
*/
class ExampleUnitTest {
@Test
fun addition_isCorrect() {
assertEquals(4, 2 + 2)
}
}
@@ -0,0 +1,6 @@
// Top-level build file where you can add configuration options common to all sub-projects/modules.
plugins {
alias(libs.plugins.android.application) apply false
alias(libs.plugins.kotlin.android) apply false
alias(libs.plugins.kotlin.compose) apply false
}
@@ -0,0 +1,25 @@
# Project-wide Gradle settings.
# IDE (e.g. Android Studio) users:
# Gradle settings configured through the IDE *will override*
# any settings specified in this file.
# For more details on how to configure your build environment visit
# http://www.gradle.org/docs/current/userguide/build_environment.html
# Specifies the JVM arguments used for the daemon process.
# The setting is particularly useful for tweaking memory settings.
org.gradle.jvmargs=-Xmx2048m -Dfile.encoding=UTF-8
# When configured, Gradle will run in incubating parallel mode.
# This option should only be used with decoupled projects. For more details, visit
# https://developer.android.com/r/tools/gradle-multi-project-decoupled-projects
# org.gradle.parallel=true
# AndroidX package structure to make it clearer which packages are bundled with the
# Android operating system, and which are packaged with your app's APK
# https://developer.android.com/topic/libraries/support-library/androidx-rn
android.useAndroidX=true
# Kotlin code style for this project: "official" or "obsolete":
kotlin.code.style=official
# Enables namespacing of each library's R class so that its R class includes only the
# resources declared in the library itself and none from the library's dependencies,
# thereby reducing the size of the R class for that library
android.nonTransitiveRClass=true
kotlin.incremental=false
@@ -0,0 +1,33 @@
[versions]
agp = "9.0.0"
kotlin = "2.3.0-RC2"
coreKtx = "1.10.1"
junit = "4.13.2"
junitVersion = "1.1.5"
espressoCore = "3.5.1"
lifecycleRuntimeKtx = "2.6.1"
activityCompose = "1.8.0"
composeBom = "2024.09.00"
[libraries]
androidx-core-ktx = { group = "androidx.core", name = "core-ktx", version.ref = "coreKtx" }
junit = { group = "junit", name = "junit", version.ref = "junit" }
androidx-junit = { group = "androidx.test.ext", name = "junit", version.ref = "junitVersion" }
androidx-espresso-core = { group = "androidx.test.espresso", name = "espresso-core", version.ref = "espressoCore" }
androidx-lifecycle-runtime-ktx = { group = "androidx.lifecycle", name = "lifecycle-runtime-ktx", version.ref = "lifecycleRuntimeKtx" }
androidx-activity-compose = { group = "androidx.activity", name = "activity-compose", version.ref = "activityCompose" }
androidx-compose-bom = { group = "androidx.compose", name = "compose-bom", version.ref = "composeBom" }
androidx-ui = { group = "androidx.compose.ui", name = "ui" }
androidx-ui-graphics = { group = "androidx.compose.ui", name = "ui-graphics" }
androidx-ui-tooling = { group = "androidx.compose.ui", name = "ui-tooling" }
androidx-ui-tooling-preview = { group = "androidx.compose.ui", name = "ui-tooling-preview" }
androidx-ui-test-manifest = { group = "androidx.compose.ui", name = "ui-test-manifest" }
androidx-ui-test-junit4 = { group = "androidx.compose.ui", name = "ui-test-junit4" }
androidx-material3 = { group = "androidx.compose.material3", name = "material3" }
[plugins]
android-application = { id = "com.android.application", version.ref = "agp" }
kotlin-android = { id = "org.jetbrains.kotlin.android", version.ref = "kotlin" }
kotlin-compose = { id = "org.jetbrains.kotlin.plugin.compose", version.ref = "kotlin" }
dataframe = { id = "org.jetbrains.kotlin.plugin.dataframe", version.ref = "kotlin" }
@@ -0,0 +1,7 @@
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-9.1.0-bin.zip
networkTimeout=10000
validateDistributionUrl=true
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
@@ -0,0 +1,185 @@
#!/usr/bin/env sh
#
# Copyright 2015 the original author or authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
##############################################################################
##
## Gradle start up script for UN*X
##
##############################################################################
# Attempt to set APP_HOME
# Resolve links: $0 may be a link
PRG="$0"
# Need this for relative symlinks.
while [ -h "$PRG" ] ; do
ls=`ls -ld "$PRG"`
link=`expr "$ls" : '.*-> \(.*\)$'`
if expr "$link" : '/.*' > /dev/null; then
PRG="$link"
else
PRG=`dirname "$PRG"`"/$link"
fi
done
SAVED="`pwd`"
cd "`dirname \"$PRG\"`/" >/dev/null
APP_HOME="`pwd -P`"
cd "$SAVED" >/dev/null
APP_NAME="Gradle"
APP_BASE_NAME=`basename "$0"`
# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"'
# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD="maximum"
warn () {
echo "$*"
}
die () {
echo
echo "$*"
echo
exit 1
}
# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
nonstop=false
case "`uname`" in
CYGWIN* )
cygwin=true
;;
Darwin* )
darwin=true
;;
MINGW* )
msys=true
;;
NONSTOP* )
nonstop=true
;;
esac
CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar
# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
# IBM's JDK on AIX uses strange locations for the executables
JAVACMD="$JAVA_HOME/jre/sh/java"
else
JAVACMD="$JAVA_HOME/bin/java"
fi
if [ ! -x "$JAVACMD" ] ; then
die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi
else
JAVACMD="java"
which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi
# Increase the maximum file descriptors if we can.
if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then
MAX_FD_LIMIT=`ulimit -H -n`
if [ $? -eq 0 ] ; then
if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
MAX_FD="$MAX_FD_LIMIT"
fi
ulimit -n $MAX_FD
if [ $? -ne 0 ] ; then
warn "Could not set maximum file descriptor limit: $MAX_FD"
fi
else
warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
fi
fi
# For Darwin, add options to specify how the application appears in the dock
if $darwin; then
GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
fi
# For Cygwin or MSYS, switch paths to Windows format before running java
if [ "$cygwin" = "true" -o "$msys" = "true" ] ; then
APP_HOME=`cygpath --path --mixed "$APP_HOME"`
CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
JAVACMD=`cygpath --unix "$JAVACMD"`
# We build the pattern for arguments to be converted via cygpath
ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
SEP=""
for dir in $ROOTDIRSRAW ; do
ROOTDIRS="$ROOTDIRS$SEP$dir"
SEP="|"
done
OURCYGPATTERN="(^($ROOTDIRS))"
# Add a user-defined pattern to the cygpath arguments
if [ "$GRADLE_CYGPATTERN" != "" ] ; then
OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
fi
# Now convert the arguments - kludge to limit ourselves to /bin/sh
i=0
for arg in "$@" ; do
CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option
if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition
eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
else
eval `echo args$i`="\"$arg\""
fi
i=`expr $i + 1`
done
case $i in
0) set -- ;;
1) set -- "$args0" ;;
2) set -- "$args0" "$args1" ;;
3) set -- "$args0" "$args1" "$args2" ;;
4) set -- "$args0" "$args1" "$args2" "$args3" ;;
5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
esac
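# Editorial sketch (comments only; not part of the stock wrapper): each iteration of
# the loop above effectively runs `eval argsN="\"$arg\""` to stash one (possibly
# cygpath-converted) argument in args0..args8, and the case statement then rebuilds
# the positional parameters from them, e.g. for two arguments:
#   set -- "$args0" "$args1"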
fi
# Escape application args
save () {
for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
echo " "
}
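# Editorial sketch (comments only; not part of the stock wrapper): `save` emits each
# argument single-quoted, escaping embedded quotes as '\'' and ending each line with
# a backslash, so a later `eval set --` reconstructs the original words exactly:
#   APP_ARGS=`save "a b" "it's"`
#   eval set -- "$APP_ARGS"    # $1 is "a b" again, $2 is "it's"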
APP_ARGS=`save "$@"`
# Collect all arguments for the java command, following the shell quoting and substitution rules
eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS"
exec "$JAVACMD" "$@"
+89
View File
@@ -0,0 +1,89 @@
@rem
@rem Copyright 2015 the original author or authors.
@rem
@rem Licensed under the Apache License, Version 2.0 (the "License");
@rem you may not use this file except in compliance with the License.
@rem You may obtain a copy of the License at
@rem
@rem https://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
@rem
@if "%DEBUG%" == "" @echo off
@rem ##########################################################################
@rem
@rem Gradle startup script for Windows
@rem
@rem ##########################################################################
@rem Set local scope for the variables with windows NT shell
if "%OS%"=="Windows_NT" setlocal
set DIRNAME=%~dp0
if "%DIRNAME%" == "" set DIRNAME=.
set APP_BASE_NAME=%~n0
set APP_HOME=%DIRNAME%
@rem Resolve any "." and ".." in APP_HOME to make it shorter.
for %%i in ("%APP_HOME%") do set APP_HOME=%%~fi
@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m"
@rem Find java.exe
if defined JAVA_HOME goto findJavaFromJavaHome
set JAVA_EXE=java.exe
%JAVA_EXE% -version >NUL 2>&1
if "%ERRORLEVEL%" == "0" goto execute
echo.
echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.
goto fail
:findJavaFromJavaHome
set JAVA_HOME=%JAVA_HOME:"=%
set JAVA_EXE=%JAVA_HOME%/bin/java.exe
if exist "%JAVA_EXE%" goto execute
echo.
echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.
goto fail
:execute
@rem Setup the command line
set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar
@rem Execute Gradle
"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %*
:end
@rem End local scope for the variables with windows NT shell
if "%ERRORLEVEL%"=="0" goto mainEnd
:fail
rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
rem the _cmd.exe /c_ return code!
if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
exit /b 1
:mainEnd
if "%OS%"=="Windows_NT" endlocal
:omega
Binary file not shown (image, 27 KiB).
+23
View File
@@ -0,0 +1,23 @@
pluginManagement {
repositories {
google {
content {
includeGroupByRegex("com\\.android.*")
includeGroupByRegex("com\\.google.*")
includeGroupByRegex("androidx.*")
}
}
mavenCentral()
gradlePluginPortal()
}
}
dependencyResolutionManagement {
repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
repositories {
google()
mavenCentral()
}
}
rootProject.name = "android-example"
include(":app")
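// Editorial note (sketch, not part of the build): `includeGroupByRegex` matches the
// full group id against the pattern, so "com\\.android.*" admits e.g.
// "com.android.tools.build" but not "org.androidx.foo":
//   Regex("com\\.android.*").matches("com.android.tools.build")  // true
//   Regex("com\\.android.*").matches("org.androidx.foo")         // false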
+69
View File
@@ -0,0 +1,69 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
import org.jetbrains.kotlinx.dataframe.api.JsonPath
plugins {
application
kotlin("jvm")
id("org.jetbrains.kotlinx.dataframe")
// only mandatory if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenLocal() // in case of local dataframe development
mavenCentral()
}
dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
// explicitly depend on openApi
implementation(projects.dataframeOpenapi)
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_1_8
freeCompilerArgs.add("-Xjdk-release=8")
}
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_1_8.toString()
targetCompatibility = JavaVersion.VERSION_1_8.toString()
options.release.set(8)
}
dataframes {
// Metrics, no key-value paths
schema {
data = "src/main/resources/apiGuruMetrics.json"
name = "org.jetbrains.kotlinx.dataframe.examples.openapi.gradle.noOpenApi.MetricsNoKeyValue"
}
// Metrics, with key-value paths
schema {
data = "src/main/resources/apiGuruMetrics.json"
name = "org.jetbrains.kotlinx.dataframe.examples.openapi.gradle.noOpenApi.MetricsKeyValue"
jsonOptions {
keyValuePaths = listOf(
JsonPath()
.append("datasets")
.appendArrayWithWildcard()
.append("data"),
)
}
}
// ApiGuru, OpenApi
schema {
data = "src/main/resources/ApiGuruOpenApi.yaml"
// name is still needed to get the full path
name = "org.jetbrains.kotlinx.dataframe.examples.openapi.ApiGuruOpenApiGradle"
}
enableExperimentalOpenApi = true
}
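// Editorial note (an assumption about the resulting path, not stated in the build
// file itself): the JsonPath built programmatically above appears to be the builder
// equivalent of the bracket-notation string form used in the KSP examples, i.e. the
// "data" objects inside each element of the "datasets" array:
//   $["datasets"][*]["data"]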
@@ -0,0 +1,77 @@
@file:ImportDataSchema(
// Using just a sample since the full file will cause OOM errors
path = "src/main/resources/ApiGuruSample.json",
name = "APIsNoKeyValue",
enableExperimentalOpenApi = true,
)
@file:ImportDataSchema(
// Now we can use the full file; either a URL or a local path
path = "src/main/resources/api_guru_list.json",
name = "APIsKeyValue",
jsonOptions = JsonOptions(
// paths in the json that should be converted to KeyValue columns
keyValuePaths = ["""$""", """$[*]["versions"]"""],
),
enableExperimentalOpenApi = true,
)
package org.jetbrains.kotlinx.dataframe.examples.openapi
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
import org.jetbrains.kotlinx.dataframe.annotations.JsonOptions
import org.jetbrains.kotlinx.dataframe.api.first
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.examples.openapi.gradle.noOpenApi.MetricsKeyValue
import org.jetbrains.kotlinx.dataframe.examples.openapi.gradle.noOpenApi.MetricsNoKeyValue
/**
* In this file, we demonstrate what the jsonOption `keyValuePaths` does
* and how to use it with both the Gradle and the KSP plugin.
*/
fun main() {
gradleNoKeyValue()
gradleKeyValue()
kspNoKeyValue()
kspKeyValue()
}
/**
* Gradle example of reading a JSON file with no key-value pairs.
* Ctrl+Click on [MetricsNoKeyValue] to see the generated code.
*/
private fun gradleNoKeyValue() {
val df = MetricsNoKeyValue.readJson("examples/idea-examples/json/src/main/resources/apiGuruMetrics.json")
df.print(columnTypes = true, title = true, borders = true)
}
/**
* Gradle example of reading a JSON file with key-value pairs.
* Ctrl+Click on [MetricsKeyValue] to see the generated code.
*/
private fun gradleKeyValue() {
val df = MetricsKeyValue.readJson("examples/idea-examples/json/src/main/resources/apiGuruMetrics.json")
df.print(columnTypes = true, title = true, borders = true)
}
/**
* KSP example of reading a JSON file with no key-value pairs.
* Ctrl+Click on [APIsNoKeyValue] to see the generated code.
*
* Note the many generated interfaces. You can imagine larger files crashing the code generator.
*/
private fun kspNoKeyValue() {
val df = APIsNoKeyValue.readJson("examples/idea-examples/json/src/main/resources/ApiGuruSample.json")
df.print(columnTypes = true, title = true, borders = true)
}
/**
* KSP example of reading a JSON file with key-value pairs.
* Ctrl+Click on [APIsKeyValue] to see the generated code.
*/
private fun kspKeyValue() {
val df = APIsKeyValue.readJson("examples/idea-examples/json/src/main/resources/ApiGuruSample.json")
.value.first()
df.print(columnTypes = true, title = true, borders = true)
}
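/**
 * Editorial sketch (not part of the original example): what a key-value conversion
 * does conceptually. A JSON object with arbitrary keys, such as a `versions` object
 * `{"v2": {...}, "v3": {...}}`, becomes rows of (key, value) pairs instead of one
 * generated column (and interface) per key, which keeps the schema small.
 * Plain-Kotlin illustration, no DataFrame dependency:
 */
private fun keyValueConcept() {
    val versions = mapOf("v2" to "2015-02-22", "v3" to "2015-12-12")
    // One (key, value) row per entry, instead of a "v2" column and a "v3" column:
    val rows = versions.map { (key, value) -> key to value }
    check(rows == listOf("v2" to "2015-02-22", "v3" to "2015-12-12"))
}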
@@ -0,0 +1,61 @@
@file:ImportDataSchema(
path = "src/main/resources/ApiGuruOpenApi.yaml",
name = "ApiGuruOpenApiKsp",
enableExperimentalOpenApi = true,
)
@file:ImportDataSchema(
path = "https://raw.githubusercontent.com/1Password/connect/aac5e44b27570036e6b56e9f5b2a398a824ae5fc/docs/openapi/spec.yaml",
name = "OnePassword",
enableExperimentalOpenApi = true,
)
package org.jetbrains.kotlinx.dataframe.examples.openapi
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
import org.jetbrains.kotlinx.dataframe.api.any
import org.jetbrains.kotlinx.dataframe.api.filter
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.value
/**
* In this file, we demonstrate how to generate DataSchemas
* from OpenApi schemas and how to use them.
*/
fun main() {
gradle()
ksp()
}
/**
* Gradle example of reading JSON files with OpenApi schemas.
* Ctrl+Click on [GradleAPIs] or [GradleMetrics] to see the generated code.
*
* (We use import aliases to avoid clashes with the KSP example)
*/
private fun gradle() {
val apis = ApiGuruOpenApiGradle.APIs.readJson("examples/idea-examples/json/src/main/resources/ApiGuruSample.json")
apis.print(columnTypes = true, title = true, borders = true)
apis.filter {
value.versions.value.any {
(it.updated ?: it.added).year >= 2021
}
}
val metrics =
ApiGuruOpenApiGradle.Metrics.readJson("examples/idea-examples/json/src/main/resources/apiGuruMetrics.json")
metrics.print(columnTypes = true, title = true, borders = true)
}
/**
* KSP example of reading JSON files with OpenApi schemas.
* Ctrl+Click on [APIs] or [Metrics] to see the generated code.
*/
private fun ksp() {
val apis = ApiGuruOpenApiKsp.APIs.readJson("examples/idea-examples/json/src/main/resources/ApiGuruSample.json")
apis.print(columnTypes = true, title = true, borders = true)
val metrics =
ApiGuruOpenApiKsp.Metrics.readJson("examples/idea-examples/json/src/main/resources/apiGuruMetrics.json")
metrics.print(columnTypes = true, title = true, borders = true)
}
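/**
 * Editorial sketch (not part of the original example): the `updated ?: added`
 * fallback used in the filter above, shown in plain Kotlin with java.time instead
 * of the generated schema types. When `updated` is null, the year check falls
 * back to the `added` timestamp.
 */
private fun updatedOrAddedConcept() {
    val added = java.time.LocalDateTime.parse("2015-02-22T20:00:45")
    val updated: java.time.LocalDateTime? = null
    // Elvis operator picks `added` because `updated` is absent:
    check((updated ?: added).year == 2015)
}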
@@ -0,0 +1,304 @@
# DEMO for DataFrame, this might differ from the actual API (it's updated a bit)
openapi: 3.0.0
info:
version: 2.0.2
title: APIs.guru
description: >
Wikipedia for Web APIs. Repository of API specs in OpenAPI format.
**Warning**: If you want to be notified about changes in advance please join our [Slack channel](https://join.slack.com/t/mermade/shared_invite/zt-g78g7xir-MLE_CTCcXCdfJfG3CJe9qA).
Client sample: [[Demo]](https://apis.guru/simple-ui) [[Repo]](https://github.com/APIs-guru/simple-ui)
contact:
name: APIs.guru
url: https://APIs.guru
email: mike.ralphson@gmail.com
license:
name: CC0 1.0
url: https://github.com/APIs-guru/openapi-directory#licenses
x-logo:
url: https://apis.guru/branding/logo_vertical.svg
externalDocs:
url: https://github.com/APIs-guru/openapi-directory/blob/master/API.md
security: [ ]
tags:
- name: APIs
description: Actions relating to APIs in the collection
paths:
/list.json:
get:
operationId: listAPIs
tags:
- APIs
summary: List all APIs
description: >
List all APIs in the directory.
Returns links to the OpenAPI specification for each API in the directory.
If an API exists in multiple versions, the `preferred` one is explicitly marked.
Some basic info from the OpenAPI spec is cached inside each object.
This allows generating some simple views without needing to fetch the OpenAPI spec for each API.
responses:
"200":
description: OK
content:
application/json; charset=utf-8:
schema:
$ref: "#/components/schemas/APIs"
application/json:
schema:
$ref: "#/components/schemas/APIs"
/metrics.json:
get:
operationId: getMetrics
summary: Get basic metrics
description: >
Some basic metrics for the entire directory.
Just stunning numbers to put on a front page, intended purely for the WoW effect :)
tags:
- APIs
responses:
"200":
description: OK
content:
application/json; charset=utf-8:
schema:
$ref: "#/components/schemas/Metrics"
application/json:
schema:
$ref: "#/components/schemas/Metrics"
components:
schemas:
APIs:
description: |
List of API details.
It is a JSON object with API IDs (`<provider>[:<service>]`) as keys.
type: object
additionalProperties:
$ref: "#/components/schemas/API"
minProperties: 1
example:
googleapis.com:drive:
added: 2015-02-22T20:00:45.000Z
preferred: v3
versions:
v2:
added: 2015-02-22T20:00:45.000Z
info:
title: Drive
version: v2
x-apiClientRegistration:
url: https://console.developers.google.com
x-logo:
url: https://api.apis.guru/v2/cache/logo/https_www.gstatic.com_images_icons_material_product_2x_drive_32dp.png
x-origin:
format: google
url: https://www.googleapis.com/discovery/v1/apis/drive/v2/rest
version: v1
x-preferred: false
x-providerName: googleapis.com
x-serviceName: drive
swaggerUrl: https://api.apis.guru/v2/specs/googleapis.com/drive/v2/swagger.json
swaggerYamlUrl: https://api.apis.guru/v2/specs/googleapis.com/drive/v2/swagger.yaml
updated: 2016-06-17T00:21:44.000Z
v3:
added: 2015-12-12T00:25:13.000Z
info:
title: Drive
version: v3
x-apiClientRegistration:
url: https://console.developers.google.com
x-logo:
url: https://api.apis.guru/v2/cache/logo/https_www.gstatic.com_images_icons_material_product_2x_drive_32dp.png
x-origin:
format: google
url: https://www.googleapis.com/discovery/v1/apis/drive/v3/rest
version: v1
x-preferred: true
x-providerName: googleapis.com
x-serviceName: drive
swaggerUrl: https://api.apis.guru/v2/specs/googleapis.com/drive/v3/swagger.json
swaggerYamlUrl: https://api.apis.guru/v2/specs/googleapis.com/drive/v3/swagger.yaml
updated: 2016-06-17T00:21:44.000Z
API:
description: Meta information about an API
type: object
required:
- added
- preferred
- versions
properties:
added:
description: Timestamp when the API was first added to the directory
type: string
format: date-time
preferred:
description: Recommended version
type: string
versions:
description: List of supported versions of the API
type: object
additionalProperties:
$ref: "#/components/schemas/ApiVersion"
minProperties: 1
additionalProperties: false
ApiVersion:
type: object
required:
- added
# - updated apparently not required!
- swaggerUrl
- swaggerYamlUrl
- info
- openapiVer
properties:
added:
description: Timestamp when the version was added
type: string
format: date-time
updated: # apparently not required!
description: Timestamp when the version was updated
type: string
format: date-time
swaggerUrl:
description: URL to OpenAPI definition in JSON format
type: string
format: url
swaggerYamlUrl:
description: URL to OpenAPI definition in YAML format
type: string
format: url
info:
description: Copy of `info` section from OpenAPI definition
type: object
minProperties: 1
externalDocs:
description: Copy of `externalDocs` section from OpenAPI definition
type: object
minProperties: 1
openapiVer:
description: OpenAPI version
type: string
additionalProperties: false
Metrics:
description: List of basic metrics
type: object
required:
- numSpecs
- numAPIs
- numEndpoints
- unreachable
- invalid
- unofficial
- fixes
- fixedPct
- datasets
- stars
- issues
- thisWeek
properties:
numSpecs:
description: Number of API specifications including different versions of the
same API
type: integer
minimum: 1
numAPIs:
description: Number of APIs
type: integer
minimum: 1
numEndpoints:
description: Total number of endpoints inside all specifications
type: integer
minimum: 1
unreachable:
description: Number of unreachable specifications
type: integer
minimum: 0
invalid:
description: Number of invalid specifications
type: integer
minimum: 0
unofficial:
description: Number of unofficial specifications
type: integer
minimum: 0
fixes:
description: Number of fixes applied to specifications
type: integer
minimum: 0
fixedPct:
description: Percentage of fixed specifications
type: number
minimum: 0
maximum: 100
datasets:
description: An overview of the datasets used to gather the APIs
type: array
items:
description: A single metric per dataset
type: object
required:
- title
- data
properties:
title:
description: Title of the metric
type: string
data:
description: Value of the metric per dataset
type: object
additionalProperties:
type: integer
minimum: 0
stars:
description: Number of stars on GitHub
type: integer
minimum: 0
issues:
description: Number of issues on GitHub
type: integer
minimum: 0
thisWeek:
description: Number of new specifications added/updated this week
type: object
required:
- added
- updated
properties:
added:
description: Number of new specifications added this week
type: integer
minimum: 0
updated:
description: Number of specifications updated this week
type: integer
minimum: 0
additionalProperties: false
example:
numSpecs: 1000
numAPIs: 100
numEndpoints: 10000
unreachable: 10
invalid: 10
unofficial: 10
fixes: 10
fixedPct: 10
datasets:
- title: providerCount
data:
"a.com": 10
"b.com": 20
"c.com": 30
stars: 1000
issues: 100
thisWeek:
added: 10
updated: 10
@@ -0,0 +1,717 @@
{
"1forge.com": {
"added": "2017-05-30T08:34:14.000Z",
"preferred": "0.0.1",
"versions": {
"0.0.1": {
"added": "2017-05-30T08:34:14.000Z",
"info": {
"contact": {
"email": "contact@1forge.com",
"name": "1Forge",
"url": "http://1forge.com"
},
"description": "Stock and Forex Data and Realtime Quotes",
"title": "1Forge Finance APIs",
"version": "0.0.1",
"x-apisguru-categories": [
"financial"
],
"x-logo": {
"backgroundColor": "#24292e",
"url": "https://api.apis.guru/v2/cache/logo/https_1forge.com_assets_images_f-blue.svg"
},
"x-origin": [
{
"format": "swagger",
"url": "http://1forge.com/openapi.json",
"version": "2.0"
}
],
"x-providerName": "1forge.com"
},
"updated": "2017-06-27T16:49:57.000Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/1forge.com/0.0.1/swagger.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/1forge.com/0.0.1/swagger.yaml",
"openapiVer": "2.0"
}
}
},
"1password.com:events": {
"added": "2021-07-19T10:17:09.188Z",
"preferred": "1.0.0",
"versions": {
"1.0.0": {
"added": "2021-07-19T10:17:09.188Z",
"info": {
"description": "1Password Events API Specification.",
"title": "Events API",
"version": "1.0.0",
"x-apisguru-categories": [
"security"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_upload.wikimedia.org_wikipedia_commons_thumb_e_e3_1password-logo.svg_1280px-1password-logo.svg.png"
},
"x-origin": [
{
"format": "openapi",
"url": "https://i.1password.com/media/1password-events-reporting/1password-events-api.yaml",
"version": "3.0"
}
],
"x-providerName": "1password.com",
"x-serviceName": "events"
},
"updated": "2021-07-22T10:32:52.774Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/1password.com/events/1.0.0/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/1password.com/events/1.0.0/openapi.yaml",
"openapiVer": "3.0.0"
}
}
},
"1password.local:connect": {
"added": "2021-04-16T15:56:45.939Z",
"preferred": "1.3.0",
"versions": {
"1.3.0": {
"added": "2021-04-16T15:56:45.939Z",
"info": {
"contact": {
"email": "support@1password.com",
"name": "1Password Integrations",
"url": "https://support.1password.com/"
},
"description": "REST API interface for 1Password Connect.",
"title": "1Password Connect",
"version": "1.3.0",
"x-apisguru-categories": [
"security"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_upload.wikimedia.org_wikipedia_commons_thumb_e_e3_1password-logo.svg_1280px-1password-logo.svg.png"
},
"x-origin": [
{
"format": "openapi",
"url": "https://i.1password.com/media/1password-connect/1password-connect-api.yaml",
"version": "3.0"
}
],
"x-providerName": "1password.local",
"x-serviceName": "connect"
},
"updated": "2021-07-26T08:51:53.432Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/1password.local/connect/1.3.0/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/1password.local/connect/1.3.0/openapi.yaml",
"openapiVer": "3.0.2"
}
}
},
"6-dot-authentiqio.appspot.com": {
"added": "2017-03-15T14:45:58.000Z",
"preferred": "6",
"versions": {
"6": {
"added": "2017-03-15T14:45:58.000Z",
"info": {
"contact": {
"email": "hello@authentiq.com",
"name": "Authentiq team",
"url": "http://authentiq.io/support"
},
"description": "Strong authentication, without the passwords.",
"license": {
"name": "Apache 2.0",
"url": "http://www.apache.org/licenses/LICENSE-2.0.html"
},
"termsOfService": "http://authentiq.com/terms/",
"title": "Authentiq API",
"version": "6",
"x-apisguru-categories": [
"security"
],
"x-logo": {
"backgroundColor": "#F26641",
"url": "https://api.apis.guru/v2/cache/logo/https_www.authentiq.com_theme_images_authentiq-logo-a-inverse.svg"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/AuthentiqID/authentiq-docs/master/docs/swagger/issuer.yaml",
"version": "3.0"
}
],
"x-providerName": "6-dot-authentiqio.appspot.com"
},
"updated": "2021-06-21T12:16:53.715Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/6-dot-authentiqio.appspot.com/6/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/6-dot-authentiqio.appspot.com/6/openapi.yaml",
"openapiVer": "3.0.0"
}
}
},
"ably.io:platform": {
"added": "2019-07-13T11:28:07.000Z",
"preferred": "1.1.0",
"versions": {
"1.1.0": {
"added": "2019-07-13T11:28:07.000Z",
"info": {
"contact": {
"email": "support@ably.io",
"name": "Ably Support",
"url": "https://www.ably.io/contact",
"x-twitter": "ablyrealtime"
},
"description": "The [REST API specification](https://www.ably.io/documentation/rest-api) for Ably.",
"title": "Platform API",
"version": "1.1.0",
"x-apisguru-categories": [
"cloud"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_ablyrealtime_profile_image"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/ably/open-specs/main/definitions/platform-v1.yaml",
"version": "3.0"
}
],
"x-providerName": "ably.io",
"x-serviceName": "platform"
},
"updated": "2021-07-26T09:42:14.653Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/ably.io/platform/1.1.0/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/ably.io/platform/1.1.0/openapi.yaml",
"openapiVer": "3.0.1"
}
}
},
"ably.net:control": {
"added": "2021-07-26T09:45:31.536Z",
"preferred": "1.0.14",
"versions": {
"1.0.14": {
"added": "2021-07-26T09:45:31.536Z",
"info": {
"contact": {
"x-twitter": "ablyrealtime"
},
"description": "Use the Control API to manage your applications, namespaces, keys, queues, rules, and more.\n\nDetailed information on using this API can be found in the Ably <a href=\"https://ably.com/documentation/control-api\">developer documentation</a>.\n\nControl API is currently in Beta.\n",
"title": "Control API v1",
"version": "1.0.14",
"x-apisguru-categories": [
"cloud"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_ablyrealtime_profile_image"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/ably/open-specs/main/definitions/control-v1.yaml",
"version": "3.0"
}
],
"x-providerName": "ably.net",
"x-serviceName": "control"
},
"updated": "2021-07-26T09:47:48.565Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/ably.net/control/1.0.14/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/ably.net/control/1.0.14/openapi.yaml",
"openapiVer": "3.0.1"
}
}
},
"abstractapi.com:geolocation": {
"added": "2021-04-14T17:12:40.648Z",
"preferred": "1.0.0",
"versions": {
"1.0.0": {
"added": "2021-04-14T17:12:40.648Z",
"info": {
"description": "Abstract IP geolocation API allows developers to retrieve the region, country and city behind any IP worldwide. The API covers the geolocation of IPv4 and IPv6 addresses in 180+ countries worldwide. Extra information can be retrieved like the currency, flag or language associated to an IP.",
"title": "IP geolocation API",
"version": "1.0.0",
"x-apisguru-categories": [
"location"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_global-uploads.webflow.com_5ebbd0a566a3996636e55959_5ec2ba29feeeb05d69160e7b_webclip.png"
},
"x-origin": [
{
"format": "openapi",
"url": "https://documentation.abstractapi.com/ip-geolocation-openapi.json",
"version": "3.0"
}
],
"x-providerName": "abstractapi.com",
"x-serviceName": "geolocation"
},
"externalDocs": {
"description": "API Documentation",
"url": "https://www.abstractapi.com/ip-geolocation-api#docs"
},
"updated": "2021-06-21T12:16:53.715Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/abstractapi.com/geolocation/1.0.0/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/abstractapi.com/geolocation/1.0.0/openapi.yaml",
"openapiVer": "3.0.1"
}
}
},
"adafruit.com": {
"added": "2018-02-10T10:41:43.000Z",
"preferred": "2.0.0",
"versions": {
"2.0.0": {
"added": "2018-02-10T10:41:43.000Z",
"info": {
"description": "### The Internet of Things for Everyone\n\nThe Adafruit IO HTTP API provides access to your Adafruit IO data from any programming language or hardware environment that can speak HTTP. The easiest way to get started is with [an Adafruit IO learn guide](https://learn.adafruit.com/series/adafruit-io-basics) and [a simple Internet of Things capable device like the Feather Huzzah](https://www.adafruit.com/product/2821).\n\nThis API documentation is hosted on GitHub Pages and is available at [https://github.com/adafruit/io-api](https://github.com/adafruit/io-api). For questions or comments visit the [Adafruit IO Forums](https://forums.adafruit.com/viewforum.php?f=56) or the [adafruit-io channel on the Adafruit Discord server](https://discord.gg/adafruit).\n\n#### Authentication\n\nAuthentication for every API request happens through the `X-AIO-Key` header or query parameter and your IO API key. A simple cURL request to get all available feeds for a user with the username \"io_username\" and the key \"io_key_12345\" could look like this:\n\n $ curl -H \"X-AIO-Key: io_key_12345\" https://io.adafruit.com/api/v2/io_username/feeds\n\nOr like this:\n\n $ curl \"https://io.adafruit.com/api/v2/io_username/feeds?X-AIO-Key=io_key_12345\n\nUsing the node.js [request](https://github.com/request/request) library, IO HTTP requests are as easy as:\n\n```js\nvar request = require('request');\n\nvar options = {\n url: 'https://io.adafruit.com/api/v2/io_username/feeds',\n headers: {\n 'X-AIO-Key': 'io_key_12345',\n 'Content-Type': 'application/json'\n }\n};\n\nfunction callback(error, response, body) {\n if (!error && response.statusCode == 200) {\n var feeds = JSON.parse(body);\n console.log(feeds.length + \" FEEDS AVAILABLE\");\n\n feeds.forEach(function (feed) {\n console.log(feed.name, feed.key);\n })\n }\n}\n\nrequest(options, callback);\n```\n\nUsing the ESP8266 Arduino HTTPClient library, an HTTPS GET request would look like this (replacing `---` with your own 
values in the appropriate locations):\n\n```arduino\n/// based on\n/// https://github.com/esp8266/Arduino/blob/master/libraries/ESP8266HTTPClient/examples/Authorization/Authorization.ino\n\n#include <Arduino.h>\n#include <ESP8266WiFi.h>\n#include <ESP8266WiFiMulti.h>\n#include <ESP8266HTTPClient.h>\n\nESP8266WiFiMulti WiFiMulti;\n\nconst char* ssid = \"---\";\nconst char* password = \"---\";\n\nconst char* host = \"io.adafruit.com\";\n\nconst char* io_key = \"---\";\nconst char* path_with_username = \"/api/v2/---/dashboards\";\n\n// Use web browser to view and copy\n// SHA1 fingerprint of the certificate\nconst char* fingerprint = \"77 00 54 2D DA E7 D8 03 27 31 23 99 EB 27 DB CB A5 4C 57 18\";\n\nvoid setup() {\n Serial.begin(115200);\n\n for(uint8_t t = 4; t > 0; t--) {\n Serial.printf(\"[SETUP] WAIT %d...\\n\", t);\n Serial.flush();\n delay(1000);\n }\n\n WiFi.mode(WIFI_STA);\n WiFiMulti.addAP(ssid, password);\n\n // wait for WiFi connection\n while(WiFiMulti.run() != WL_CONNECTED) {\n Serial.print('.');\n delay(1000);\n }\n\n Serial.println(\"[WIFI] connected!\");\n\n HTTPClient http;\n\n // start request with URL and TLS cert fingerprint for verification\n http.begin(\"https://\" + String(host) + String(path_with_username), fingerprint);\n\n // IO API authentication\n http.addHeader(\"X-AIO-Key\", io_key);\n\n // start connection and send HTTP header\n int httpCode = http.GET();\n\n // httpCode will be negative on error\n if(httpCode > 0) {\n // HTTP header has been send and Server response header has been handled\n Serial.printf(\"[HTTP] GET response: %d\\n\", httpCode);\n\n // HTTP 200 OK\n if(httpCode == HTTP_CODE_OK) {\n String payload = http.getString();\n Serial.println(payload);\n }\n\n http.end();\n }\n}\n\nvoid loop() {}\n```\n\n#### Client Libraries\n\nWe have client libraries to help you get started with your project: [Python](https://github.com/adafruit/io-client-python), [Ruby](https://github.com/adafruit/io-client-ruby), [Arduino 
C++](https://github.com/adafruit/Adafruit_IO_Arduino), [Javascript](https://github.com/adafruit/adafruit-io-node), and [Go](https://github.com/adafruit/io-client-go) are available. They're all open source, so if they don't already do what you want, you can fork and add any feature you'd like.\n\n",
"title": "Adafruit IO REST API",
"version": "2.0.0",
"x-apisguru-categories": [
"iot"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_adafruit_profile_image.jpeg"
},
"x-origin": [
{
"format": "swagger",
"url": "https://raw.githubusercontent.com/adafruit/io-api/gh-pages/v2.json",
"version": "2.0"
}
],
"x-providerName": "adafruit.com"
},
"updated": "2021-06-21T12:16:53.715Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adafruit.com/2.0.0/swagger.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adafruit.com/2.0.0/swagger.yaml",
"openapiVer": "2.0"
}
}
},
"adobe.com:aem": {
"added": "2019-01-03T07:01:34.000Z",
"preferred": "3.5.0-pre.0",
"versions": {
"3.5.0-pre.0": {
"added": "2019-01-03T07:01:34.000Z",
"info": {
"contact": {
"email": "opensource@shinesolutions.com",
"name": "Shine Solutions",
"url": "http://shinesolutions.com",
"x-twitter": "Adobe"
},
"description": "Swagger AEM is an OpenAPI specification for Adobe Experience Manager (AEM) API",
"title": "Adobe Experience Manager (AEM) API",
"version": "3.5.0-pre.0",
"x-apisguru-categories": [
"marketing"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_Adobe_profile_image.jpeg"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/shinesolutions/swagger-aem/master/conf/api.yml",
"version": "3.0"
}
],
"x-providerName": "adobe.com",
"x-serviceName": "aem",
"x-unofficialSpec": true
},
"updated": "2021-06-21T12:16:53.715Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adobe.com/aem/3.5.0-pre.0/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adobe.com/aem/3.5.0-pre.0/openapi.yaml",
"openapiVer": "3.0.0"
}
}
},
"adyen.com:AccountService": {
"added": "2020-11-03T12:51:40.318Z",
"preferred": "6",
"versions": {
"6": {
"added": "2020-11-03T12:51:40.318Z",
"info": {
"contact": {
"email": "developer-experience@adyen.com",
"name": "Adyen Developer Experience team",
"url": "https://www.adyen.help/hc/en-us/community/topics",
"x-twitter": "Adyen"
},
"description": "The Account API provides endpoints for managing account-related entities on your platform. These related entities include account holders, accounts, bank accounts, shareholders, and KYC-related documents. The management operations include actions such as creation, retrieval, updating, and deletion of them.\n\nFor more information, refer to our [documentation](https://docs.adyen.com/platforms).\n## Authentication\nTo connect to the Account API, you must use basic authentication credentials of your web service user. If you don't have one, contact the [Adyen Support Team](https://support.adyen.com/hc/en-us/requests/new). Then use its credentials to authenticate your request, for example:\n\n```\ncurl\n-U \"ws@MarketPlace.YourMarketPlace\":\"YourWsPassword\" \\\n-H \"Content-Type: application/json\" \\\n...\n```\nNote that when going live, you need to generate new web service user credentials to access the [live endpoints](https://docs.adyen.com/development-resources/live-endpoints).\n\n## Versioning\nThe Account API supports versioning of its endpoints through a version suffix in the endpoint URL. This suffix has the following format: \"vXX\", where XX is the version number.\n\nFor example:\n```\nhttps://cal-test.adyen.com/cal/services/Account/v6/createAccountHolder\n```",
"termsOfService": "https://www.adyen.com/legal/terms-and-conditions",
"title": "Adyen for Platforms: Account API",
"version": "6",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_Adyen_profile_image.jpeg"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/Adyen/adyen-openapi/master/json/AccountService-v6.json",
"version": "3.1"
}
],
"x-preferred": true,
"x-providerName": "adyen.com",
"x-publicVersion": true,
"x-serviceName": "AccountService"
},
"updated": "2021-11-12T23:18:19.544Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/AccountService/6/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/AccountService/6/openapi.yaml",
"openapiVer": "3.1.0"
}
}
},
"adyen.com:BalancePlatformService": {
"added": "2021-06-14T12:42:12.263Z",
"preferred": "1",
"versions": {
"1": {
"added": "2021-06-14T12:42:12.263Z",
"info": {
"contact": {
"email": "developer-experience@adyen.com",
"name": "Adyen Developer Experience team",
"url": "https://www.adyen.help/hc/en-us/community/topics",
"x-twitter": "Adyen"
},
"description": "The Balance Platform API enables you to create a platform, onboard users as account holders, create balance accounts, and issue cards.\n\nFor information about use cases, refer to [Adyen Issuing](https://docs.adyen.com/issuing).\n\n ## Authentication\nYour Adyen contact will provide your API credential and an API key. To connect to the API, add an `X-API-Key` header with the API key as the value, for example:\n\n ```\ncurl\n-H \"Content-Type: application/json\" \\\n-H \"X-API-Key: YOUR_API_KEY\" \\\n...\n```\n\nAlternatively, you can use the username and password to connect to the API using basic authentication. For example:\n\n```\ncurl\n-H \"Content-Type: application/json\" \\\n-U \"ws@BalancePlatform.YOUR_BALANCE_PLATFORM\":\"YOUR_WS_PASSWORD\" \\\n...\n```\n## Versioning\nBalance Platform API supports versioning of its endpoints through a version suffix in the endpoint URL. This suffix has the following format: \"vXX\", where XX is the version number.\n\nFor example:\n```\nhttps://balanceplatform-api-test.adyen.com/bcl/v1\n```\n## Going live\nWhen going live, your Adyen contact will provide your API credential for the live environment. You can then use the API key or the username and password to send requests to `https://balanceplatform-api-live.adyen.com/bcl/v1`.\n\nFor more information, refer to our [Going live documentation](https://docs.adyen.com/issuing/integration-checklist#going-live).",
"termsOfService": "https://www.adyen.com/legal/terms-and-conditions",
"title": "Issuing: Balance Platform API",
"version": "1",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_adyen.com_.resources_adyen-website_themes_images_apple-icon-180x180.png"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/Adyen/adyen-openapi/master/json/BalancePlatformService-v1.json",
"version": "3.1"
}
],
"x-providerName": "adyen.com",
"x-publicVersion": true,
"x-serviceName": "BalancePlatformService"
},
"updated": "2021-11-22T23:16:57.458Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/BalancePlatformService/1/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/BalancePlatformService/1/openapi.yaml",
"openapiVer": "3.1.0"
}
}
},
"adyen.com:BinLookupService": {
"added": "2020-11-03T12:51:40.318Z",
"preferred": "50",
"versions": {
"50": {
"added": "2020-11-03T12:51:40.318Z",
"info": {
"contact": {
"email": "developer-experience@adyen.com",
"name": "Adyen Developer Experience team",
"url": "https://www.adyen.help/hc/en-us/community/topics",
"x-twitter": "Adyen"
},
"description": "The BIN Lookup API provides endpoints for retrieving information, such as cost estimates, and 3D Secure supported version based on a given BIN.",
"termsOfService": "https://www.adyen.com/legal/terms-and-conditions",
"title": "Adyen BinLookup API",
"version": "50",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_Adyen_profile_image.jpeg"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/Adyen/adyen-openapi/master/json/BinLookupService-v50.json",
"version": "3.1"
}
],
"x-preferred": true,
"x-providerName": "adyen.com",
"x-publicVersion": true,
"x-serviceName": "BinLookupService"
},
"updated": "2021-11-01T23:17:40.475Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/BinLookupService/50/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/BinLookupService/50/openapi.yaml",
"openapiVer": "3.1.0"
}
}
},
"adyen.com:CheckoutService": {
"added": "2021-11-01T23:17:40.475Z",
"preferred": "68",
"versions": {
"68": {
"added": "2021-11-01T23:17:40.475Z",
"info": {
"contact": {
"email": "developer-experience@adyen.com",
"name": "Adyen Developer Experience team",
"url": "https://www.adyen.help/hc/en-us/community/topics",
"x-twitter": "Adyen"
},
"description": "Adyen Checkout API provides a simple and flexible way to initiate and authorise online payments. You can use the same integration for payments made with cards (including 3D Secure), mobile wallets, and local payment methods (for example, iDEAL and Sofort).\n\nThis API reference provides information on available endpoints and how to interact with them. To learn more about the API, visit [Checkout documentation](https://docs.adyen.com/online-payments).\n\n## Authentication\nEach request to the Checkout API must be signed with an API key. For this, obtain an API Key from your Customer Area, as described in [How to get the API key](https://docs.adyen.com/development-resources/api-credentials#generate-api-key). Then set this key to the `X-API-Key` header value, for example:\n\n```\ncurl\n-H \"Content-Type: application/json\" \\\n-H \"X-API-Key: Your_Checkout_API_key\" \\\n...\n```\nNote that when going live, you need to generate a new API Key to access the [live endpoints](https://docs.adyen.com/development-resources/live-endpoints).\n\n## Versioning\nCheckout API supports versioning of its endpoints through a version suffix in the endpoint URL. This suffix has the following format: \"vXX\", where XX is the version number.\n\nFor example:\n```\nhttps://checkout-test.adyen.com/v68/payments\n```",
"termsOfService": "https://www.adyen.com/legal/terms-and-conditions",
"title": "Adyen Checkout API",
"version": "68",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_adyen.com_.resources_adyen-website_themes_images_apple-icon-180x180.png"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/Adyen/adyen-openapi/master/json/CheckoutService-v68.json",
"version": "3.1"
}
],
"x-preferred": true,
"x-providerName": "adyen.com",
"x-publicVersion": true,
"x-serviceName": "CheckoutService"
},
"updated": "2021-11-12T23:18:19.544Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/CheckoutService/68/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/CheckoutService/68/openapi.yaml",
"openapiVer": "3.1.0"
}
}
},
"adyen.com:CheckoutUtilityService": {
"added": "2021-06-18T13:57:32.889Z",
"preferred": "1",
"versions": {
"1": {
"added": "2021-06-18T13:57:32.889Z",
"info": {
"contact": {
"email": "support@adyen.com",
"name": "Adyen Support",
"url": "https://support.adyen.com/",
"x-twitter": "Adyen"
},
"description": "A web service containing utility functions available for merchants integrating with Checkout APIs.\n## Authentication\nEach request to the Checkout Utility API must be signed with an API key. For this, obtain an API Key from your Customer Area, as described in [How to get the Checkout API key](https://docs.adyen.com/developers/user-management/how-to-get-the-checkout-api-key). Then set this key to the `X-API-Key` header value, for example:\n\n```\ncurl\n-H \"Content-Type: application/json\" \\\n-H \"X-API-Key: Your_Checkout_API_key\" \\\n...\n```\nNote that when going live, you need to generate a new API Key to access the [live endpoints](https://docs.adyen.com/developers/api-reference/live-endpoints).\n\n## Versioning\nCheckout API supports versioning of its endpoints through a version suffix in the endpoint URL. This suffix has the following format: \"vXX\", where XX is the version number.\n\nFor example:\n```\nhttps://checkout-test.adyen.com/v1/originKeys\n```",
"termsOfService": "https://docs.adyen.com/legal/terms-conditions",
"title": "Adyen Checkout Utility Service",
"version": "1",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_Adyen_profile_image.jpeg"
},
"x-origin": [
{
"converter": {
"url": "https://github.com/lucybot/api-spec-converter",
"version": "2.7.11"
},
"format": "openapi",
"url": "https://raw.githubusercontent.com/adyen/adyen-openapi/master/specs/3.0/CheckoutUtilityService-v1.json",
"version": "3.0"
}
],
"x-providerName": "adyen.com",
"x-serviceName": "CheckoutUtilityService"
},
"updated": "2021-06-18T13:57:32.889Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/CheckoutUtilityService/1/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/CheckoutUtilityService/1/openapi.yaml",
"openapiVer": "3.0.0"
}
}
},
"adyen.com:FundService": {
"added": "2020-11-03T12:51:40.318Z",
"preferred": "6",
"versions": {
"6": {
"added": "2020-11-03T12:51:40.318Z",
"info": {
"contact": {
"email": "developer-experience@adyen.com",
"name": "Adyen Developer Experience team",
"url": "https://www.adyen.help/hc/en-us/community/topics",
"x-twitter": "Adyen"
},
"description": "The Fund API provides endpoints for managing the funds in the accounts on your platform. These management operations include actions such as the transfer of funds from one account to another, the payout of funds to an account holder, and the retrieval of balances in an account.\n\nFor more information, refer to our [documentation](https://docs.adyen.com/platforms).\n## Authentication\nTo connect to the Fund API, you must use basic authentication credentials of your web service user. If you don't have one, please contact the [Adyen Support Team](https://support.adyen.com/hc/en-us/requests/new). Then use its credentials to authenticate your request, for example:\n\n```\ncurl\n-U \"ws@MarketPlace.YourMarketPlace\":\"YourWsPassword\" \\\n-H \"Content-Type: application/json\" \\\n...\n```\nNote that when going live, you need to generate new web service user credentials to access the [live endpoints](https://docs.adyen.com/development-resources/live-endpoints).\n\n## Versioning\nThe Fund API supports versioning of its endpoints through a version suffix in the endpoint URL. This suffix has the following format: \"vXX\", where XX is the version number.\n\nFor example:\n```\nhttps://cal-test.adyen.com/cal/services/Fund/v6/accountHolderBalance\n```",
"termsOfService": "https://www.adyen.com/legal/terms-and-conditions",
"title": "Adyen for Platforms: Fund API",
"version": "6",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_Adyen_profile_image.jpeg"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/Adyen/adyen-openapi/master/json/FundService-v6.json",
"version": "3.1"
}
],
"x-preferred": true,
"x-providerName": "adyen.com",
"x-publicVersion": true,
"x-serviceName": "FundService"
},
"updated": "2021-11-01T23:17:40.475Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/FundService/6/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/FundService/6/openapi.yaml",
"openapiVer": "3.1.0"
}
}
},
"adyen.com:HopService": {
"added": "2020-11-03T12:51:40.318Z",
"preferred": "6",
"versions": {
"6": {
"added": "2020-11-03T12:51:40.318Z",
"info": {
"contact": {
"email": "developer-experience@adyen.com",
"name": "Adyen Developer Experience team",
"url": "https://www.adyen.help/hc/en-us/community/topics",
"x-twitter": "Adyen"
},
"description": "The Hosted onboarding API provides endpoints that you can use to generate links to Adyen-hosted pages, such as an [onboarding page](https://docs.adyen.com/platforms/hosted-onboarding-page) or a [PCI compliance questionnaire](https://docs.adyen.com/platforms/platforms-for-partners). Then you can provide the link to your account holder so they can complete their onboarding.\n\n## Authentication\nTo connect to the Hosted onboarding API, you must use basic authentication credentials of your web service user. If you don't have one, contact our [Support Team](https://support.adyen.com/hc/en-us/requests/new). Then use your credentials to authenticate your request, for example:\n\n```\ncurl\n-U \"ws@MarketPlace.YourMarketPlace\":\"YourWsPassword\" \\\n-H \"Content-Type: application/json\" \\\n...\n```\nWhen going live, you need to generate new web service user credentials to access the [live endpoints](https://docs.adyen.com/development-resources/live-endpoints).\n\n## Versioning\nThe Hosted onboarding API supports versioning of its endpoints through a version suffix in the endpoint URL. This suffix has the following format: \"vXX\", where XX is the version number.\n\nFor example:\n```\nhttps://cal-test.adyen.com/cal/services/Hop/v6/getOnboardingUrl\n```",
"termsOfService": "https://www.adyen.com/legal/terms-and-conditions",
"title": "Adyen for Platforms: Hosted Onboarding",
"version": "6",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_Adyen_profile_image.jpeg"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/Adyen/adyen-openapi/master/json/HopService-v6.json",
"version": "3.1"
}
],
"x-preferred": true,
"x-providerName": "adyen.com",
"x-publicVersion": true,
"x-serviceName": "HopService"
},
"updated": "2021-11-01T23:17:40.475Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/HopService/6/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/HopService/6/openapi.yaml",
"openapiVer": "3.1.0"
}
}
},
"adyen.com:MarketPayNotificationService": {
"added": "2021-06-21T10:54:37.877Z",
"preferred": "6",
"versions": {
"6": {
"added": "2021-06-21T10:54:37.877Z",
"info": {
"contact": {
"email": "developer-experience@adyen.com",
"name": "Adyen Developer Experience team",
"url": "https://www.adyen.help/hc/en-us/community/topics",
"x-twitter": "Adyen"
},
"description": "The Notification API sends notifications to the endpoints specified in a given subscription. Subscriptions are managed through the Notification Configuration API. The API specifications listed here detail the format of each notification.\n\nFor more information, refer to our [documentation](https://docs.adyen.com/platforms/notifications).",
"termsOfService": "https://www.adyen.com/legal/terms-and-conditions",
"title": "Adyen for Platforms: Notifications",
"version": "6",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_Adyen_profile_image"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/Adyen/adyen-openapi/master/json/MarketPayNotificationService-v6.json",
"version": "3.1"
}
],
"x-preferred": true,
"x-providerName": "adyen.com",
"x-publicVersion": true,
"x-serviceName": "MarketPayNotificationService"
},
"updated": "2021-11-12T23:18:19.544Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/MarketPayNotificationService/6/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/MarketPayNotificationService/6/openapi.yaml",
"openapiVer": "3.1.0"
}
}
},
"adyen.com:NotificationConfigurationService": {
"added": "2020-11-03T12:51:40.318Z",
"preferred": "6",
"versions": {
"6": {
"added": "2020-11-03T12:51:40.318Z",
"info": {
"contact": {
"email": "developer-experience@adyen.com",
"name": "Adyen Developer Experience team",
"url": "https://www.adyen.help/hc/en-us/community/topics",
"x-twitter": "Adyen"
},
"description": "The Notification Configuration API provides endpoints for setting up and testing notifications that inform you of events on your platform, for example when a KYC check or a payout has been completed.\n\nFor more information, refer to our [documentation](https://docs.adyen.com/platforms/notifications).\n## Authentication\nTo connect to the Notification Configuration API, you must use basic authentication credentials of your web service user. If you don't have one, contact our [Adyen Support Team](https://support.adyen.com/hc/en-us/requests/new). Then use its credentials to authenticate your request, for example:\n\n```\ncurl\n-U \"ws@MarketPlace.YourMarketPlace\":\"YourWsPassword\" \\\n-H \"Content-Type: application/json\" \\\n...\n```\nNote that when going live, you need to generate new web service user credentials to access the [live endpoints](https://docs.adyen.com/development-resources/live-endpoints).\n\n## Versioning\nThe Notification Configuration API supports versioning of its endpoints through a version suffix in the endpoint URL. This suffix has the following format: \"vXX\", where XX is the version number.\n\nFor example:\n```\nhttps://cal-test.adyen.com/cal/services/Notification/v6/createNotificationConfiguration\n```",
"termsOfService": "https://www.adyen.com/legal/terms-and-conditions",
"title": "Adyen for Platforms: Notification Configuration API",
"version": "6",
"x-apisguru-categories": [
"payment"
],
"x-logo": {
"url": "https://api.apis.guru/v2/cache/logo/https_twitter.com_Adyen_profile_image.jpeg"
},
"x-origin": [
{
"format": "openapi",
"url": "https://raw.githubusercontent.com/Adyen/adyen-openapi/master/json/NotificationConfigurationService-v6.json",
"version": "3.1"
}
],
"x-preferred": true,
"x-providerName": "adyen.com",
"x-publicVersion": true,
"x-serviceName": "NotificationConfigurationService"
},
"updated": "2021-11-12T23:18:19.544Z",
"swaggerUrl": "https://api.apis.guru/v2/specs/adyen.com/NotificationConfigurationService/6/openapi.json",
"swaggerYamlUrl": "https://api.apis.guru/v2/specs/adyen.com/NotificationConfigurationService/6/openapi.yaml",
"openapiVer": "3.1.0"
}
}
}
}
@@ -0,0 +1,42 @@
{
"numSpecs": 3809,
"numAPIs": 2362,
"numEndpoints": 79405,
"unreachable": 138,
"invalid": 634,
"unofficial": 24,
"fixes": 34001,
"fixedPct": 21,
"datasets": [
{
"title": "providerCount",
"data": {
"adyen.com": 69,
"amazonaws.com": 295,
"apideck.com": 14,
"apisetu.gov.in": 181,
"azure.com": 1832,
"ebay.com": 20,
"fungenerators.com": 12,
"googleapis.com": 443,
"hubapi.com": 11,
"interzoid.com": 20,
"mastercard.com": 14,
"microsoft.com": 27,
"nexmo.com": 20,
"nytimes.com": 11,
"parliament.uk": 11,
"sportsdata.io": 35,
"twilio.com": 41,
"windows.net": 10,
"Others": 743
}
}
],
"stars": 2964,
"issues": 206,
"thisWeek": {
"added": 123,
"updated": 119
}
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,36 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
plugins {
application
kotlin("jvm")
id("org.jetbrains.kotlinx.dataframe")
// only required if `kotlin.dataframe.add.ksp=false` is set in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenCentral()
mavenLocal() // in case of local dataframe development
}
application.mainClass = "org.jetbrains.kotlinx.dataframe.examples.movies.MoviesWithDataClassKt"
dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_1_8
freeCompilerArgs.add("-Xjdk-release=8")
}
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_1_8.toString()
targetCompatibility = JavaVersion.VERSION_1_8.toString()
options.release.set(8)
}
@@ -0,0 +1,66 @@
package org.jetbrains.kotlinx.dataframe.examples.movies
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
/**
* movieId title genres
* 0 9b30aff7943f44579e92c261f3adc193 Women in Black (1997) Fantasy|Suspenseful|Comedy
* 1 2a1ba1fc5caf492a80188e032995843e Bumblebee Movie (2007) Comedy|Jazz|Family|Animation
*/
@DataSchema
interface Movie {
val movieId: String
val title: String
val genres: String
}
private const val pathToCsv = "examples/idea-examples/movies/src/main/resources/movies.csv"
// Uncomment this line if you want to copy-paste and run the code in your project without downloading the file
//private const val pathToCsv = "https://raw.githubusercontent.com/Kotlin/dataframe/master/examples/idea-examples/movies/src/main/resources/movies.csv"
fun main() {
// This example shows how to use the extension properties API to address columns in different operations
// https://kotlin.github.io/dataframe/apilevels.html
// Add the Gradle plugin and run `assemble` to generate the extension properties;
// see the README: https://github.com/Kotlin/dataframe?tab=readme-ov-file#setup
val step1 = DataFrame
.read(pathToCsv).convertTo<Movie>()
.split { genres }.by("|").inplace()
.split { title }.by {
listOf<Any>(
"""\s*\(\d{4}\)\s*$""".toRegex().replace(it, ""),
"\\d{4}".toRegex().findAll(it).lastOrNull()?.value?.toIntOrNull() ?: -1,
)
}.into("title", "year")
.explode("genres")
step1.print()
/**
* Data is parsed and prepared for aggregation
* movieId title year genres
* 0 9b30aff7943f44579e92c261f3adc193 Women in Black 1997 Fantasy
* 1 9b30aff7943f44579e92c261f3adc193 Women in Black 1997 Suspenseful
* 2 9b30aff7943f44579e92c261f3adc193 Women in Black 1997 Comedy
* 3 2a1ba1fc5caf492a80188e032995843e Bumblebee Movie 2007 Comedy
* 4 2a1ba1fc5caf492a80188e032995843e Bumblebee Movie 2007 Jazz
* 5 2a1ba1fc5caf492a80188e032995843e Bumblebee Movie 2007 Family
* 6 2a1ba1fc5caf492a80188e032995843e Bumblebee Movie 2007 Animation
*/
val step2 = step1
.filter { "year"<Int>() >= 0 && genres != "(no genres listed)" }
.groupBy("year")
.sortBy("year")
.pivot("genres", inward = false)
.aggregate {
count() into "count"
mean() into "mean"
}
step2.print(10)
// Discover the final reshaped data in an interactive HTML table
// step2.toStandaloneHTML().openInBrowser()
}
@@ -0,0 +1,21 @@
movieId,title,genres
9b30aff7943f44579e92c261f3adc193,Women in Black (1997),Fantasy|Suspenseful|Comedy
2a1ba1fc5caf492a80188e032995843e,Bumblebee Movie (2007),Comedy|Jazz|Family|Animation
f44ceb4771504342bb856d76c112d5a6,Magical School Boy and the Rock of Wise Men (2001),Fantasy|Growing up|Magic
43d02fb064514ff3bd30d1e3a7398357,Master of the Jewlery: The Company of the Jewel (2001),Fantasy|Magic|Suspenseful
6aa0d26a483148998c250b9c80ddf550,Sun Conflicts: Part IV: A Novel Espair (1977),Fantasy
eace16e59ce24eff90bf8924eb6a926c,The Outstanding Bulk (2008),Fantasy|Superhero|Family
ae916bc4844a4bb7b42b70d9573d05cd,In Automata (2014),Horror|Existential
c1f0a868aeb44c5ea8d154ec3ca295ac,Interplanetary (2014),Sci-fi|Futuristic
9595b771f87f42a3b8dd07d91e7cb328,Woods Run (1994),Family|Drama
aa9fc400e068443488b259ea0802a975,Anthropod-Dude (2002),Superhero|Fantasy|Family|Growing up
22d20c2ba11d44cab83aceea39dc00bd,The Chamber (2003),Comedy|Drama
8cf4d0c1bd7b41fab6af9d92c892141f,That Thing About an Iceberg (1997),Drama|History|Family|Romance
c2f3e7588da84684a7d78d6bd8d8e1f4,Vehicles (2006),Animation|Family
ce06175106af4105945f245161eac3c7,Playthings Tale (1995),Animation|Family
ee28d7e69103485c83e10b8055ef15fb,Metal Man 2 (2010),Fantasy|Superhero|Family
c32bdeed466f4ec09de828bb4b6fc649,Surgeon Odd in the Omniverse of Crazy (2022),Fantasy|Superhero|Family|Horror
d4a325ab648a42c4a2d6f35dfabb387f,Bad Dream on Pine Street (1984),Horror
60ebe74947234ddcab49dea1a958faed,The Shimmering (1980),Horror
f24327f2b05147b197ca34bf13ae3524,Krubit: Societal Teachings for Do Many Good Amazing Country of Uzbekistan (2006),Comedy
2bb29b3a245e434fa80542e711fd2cee,This is No Movie (1950),(no genres listed)
File diff suppressed because it is too large
@@ -0,0 +1,39 @@
userId,movieId,tag,timestamp
3,9595b771f87f42a3b8dd07d91e7cb328,classic,1439472355
3,6aa0d26a483148998c250b9c80ddf550,sci-fi,1439472256
4,f24327f2b05147b197ca34bf13ae3524,dark comedy,1573943598
4,ae916bc4844a4bb7b42b70d9573d05cd,great dialogue,1573943604
4,f24327f2b05147b197ca34bf13ae3524,so bad it's good,1573943455
4,d4a325ab648a42c4a2d6f35dfabb387f,tense,1573943077
4,ae916bc4844a4bb7b42b70d9573d05cd,artificial intelligence,1573942979
4,ae916bc4844a4bb7b42b70d9573d05cd,philosophical,1573943033
4,c1f0a868aeb44c5ea8d154ec3ca295ac,tense,1573943042
4,22d20c2ba11d44cab83aceea39dc00bd,so bad it's good,1573942965
4,8cf4d0c1bd7b41fab6af9d92c892141f,cliche,1573943721
4,2bb29b3a245e434fa80542e711fd2cee,musical,1573943714
4,60ebe74947234ddcab49dea1a958faed,horror,1573945163
4,2bb29b3a245e434fa80542e711fd2cee,unpredictable,1573945171
19,9b30aff7943f44579e92c261f3adc193,Oscar (Best Supporting Actress),1446909853
19,43d02fb064514ff3bd30d1e3a7398357,adventure,1445286141
19,f44ceb4771504342bb856d76c112d5a6,fantasy,1445286144
19,c1f0a868aeb44c5ea8d154ec3ca295ac,post-apocalyptic,1445286136
20,2a1ba1fc5caf492a80188e032995843e,bah,1155082282
84,f24327f2b05147b197ca34bf13ae3524,documentary,1549387432
87,c1f0a868aeb44c5ea8d154ec3ca295ac,sci-fi,1542308464
87,ae916bc4844a4bb7b42b70d9573d05cd,android(s)/cyborg(s),1542309549
87,c1f0a868aeb44c5ea8d154ec3ca295ac,apocalypse,1542309703
87,ae916bc4844a4bb7b42b70d9573d05cd,artificial intelligence,1542309599
87,ee28d7e69103485c83e10b8055ef15fb,franchise,1542309536
87,ee28d7e69103485c83e10b8055ef15fb,sci-fi,1542308408
87,ee28d7e69103485c83e10b8055ef15fb,science fiction,1542308395
87,eace16e59ce24eff90bf8924eb6a926c,bad science,1522676752
87,ae916bc4844a4bb7b42b70d9573d05cd,philosophical issues,1522676687
87,6aa0d26a483148998c250b9c80ddf550,sci-fi,1522676660
87,6aa0d26a483148998c250b9c80ddf550,science fiction,1522676703
87,6aa0d26a483148998c250b9c80ddf550,space,1522676664
87,c1f0a868aeb44c5ea8d154ec3ca295ac,space travel,1522676685
87,c1f0a868aeb44c5ea8d154ec3ca295ac,visually appealing,1522676682
91,aa9fc400e068443488b259ea0802a975,quirky,1415914797
91,8cf4d0c1bd7b41fab6af9d92c892141f,romantic,1415131173
91,ae916bc4844a4bb7b42b70d9573d05cd,thought-provoking,1415131203
91,f44ceb4771504342bb856d76c112d5a6,based on book,1414248543
@@ -0,0 +1,177 @@
# spark-parquet-dataframe
This example shows how to:
- Load a CSV (California Housing) with local Apache Spark
- Write it to Parquet, then read Parquet back with Kotlin DataFrame (Arrow-based reader)
- Train a simple Linear Regression model with Spark MLlib
- Export the model in two ways and explain why we do both
- Inspect the saved Spark model artifacts
- Build a 2D plot for a single model coefficient
Below is a faithful, step-by-step walkthrough matching the code in `SparkParquetDataframe.kt`.
## The 11 steps of the example (with explanations)
1. Start local Spark
- A local `SparkSession` is created. The example configures Spark to work against the local filesystem and sets Java options required by Arrow/Parquet.
2. Read `housing.csv` with Spark
- Spark loads the CSV with header and automatic schema inference into a Spark DataFrame.
3. Show the Spark DataFrame and write it to Parquet
- `show(10, false)` prints the first rows for inspection.
- The DataFrame is written to a temporary directory in Parquet format.
4. Read this Parquet with Kotlin DataFrame (Arrow backend)
- Kotlin DataFrame reads the concrete `part-*.parquet` files produced by Spark using the Arrow-based Parquet reader.
5. Print `head()` of the Kotlin DataFrame
- A quick glance at the loaded data in Kotlin DataFrame form.
6. Train a regression model with Spark MLlib
- Numeric features are assembled with `VectorAssembler` (the categorical `ocean_proximity` is excluded).
- A `LinearRegression` model (fit without an intercept, with elasticNet=0.5 and maxIter=10) is trained on the training split.
7. Export model summary to Parquet (tabular, portable)
- The learned coefficients are paired with their feature names, plus a special row for the intercept.
- This small, explicit summary table is written to Parquet. It's easy to exchange and read without Spark.
8. Read the model-summary Parquet with Kotlin DataFrame
- Kotlin DataFrame reads the summary Parquet and prints its head. This is the portable path for analytics/visualization.
9. Save the full fitted PipelineModel
- The entire fitted `PipelineModel` is saved using Spark's native ML writer. This produces a directory with both JSON metadata and Parquet data.
10. Inspect pipeline internals using Kotlin DataFrame
- For exploration, the example then reads some of those JSON and Parquet files back using Kotlin DataFrame.
- Notes:
- Internal folder names contain stage indices and UIDs (e.g., `0_...`, `1_...`) and may vary across Spark versions.
- This inspection method is for exploration only. For reuse in Spark, you should load using `PipelineModel.load(...)`.
- Sub-steps:
- 10.1 Root metadata (JSON): read each file under `.../metadata/` and print heads.
- 10.2 Stage 0 (VectorAssembler): read JSON metadata and Parquet data under `.../stages/0_*/{metadata,data}` if present.
- 10.3 Stage 1 (LinearRegressionModel): read JSON metadata and Parquet data under `.../stages/1_*/{metadata,data}` if present.
11. Build a 2D plot using one coefficient
- We choose the feature `median_income` and the label `median_house_value` to produce a 2D scatter plot.
- From the summary table, we extract the slope for `median_income` and the intercept, and draw the line `y = slope*x + intercept`.
- Sub-steps:
- 11.1 Concatenate any metadata JSON frames that were successfully read (optional, for inspection).
- 11.2 Use the model-summary table (coefficients + intercept) as the unified model data source.
- 11.3 Compute the slope/intercept for the chosen feature from the summary table.
- 11.4 Create a Kandy plot (points + abLine) and save it to `linear_model_plot.jpg`.
- The plot is saved as `linear_model_plot.jpg` (an example image is committed at `lets-plot-images/linear_model_plot.jpg`).
![Linear model plot](src/main/resources/linear_model_plot.jpg)
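Step 11.3 boils down to looking up two rows in the summary table and turning them into a line function. Here is a minimal sketch of that lookup in plain Kotlin; the `fittedLine` helper and the sample values are illustrative, not code from `SparkParquetDataframe.kt`:

```kotlin
// Hypothetical stand-in for the model-summary table: (term, coefficient) pairs,
// with the intercept stored as a special "intercept" row, as in step 7.
fun fittedLine(summary: List<Pair<String, Double>>, feature: String): (Double) -> Double {
    // Missing terms fall back to 0.0, mirroring the example's defensive lookups.
    val slope = summary.firstOrNull { it.first == feature }?.second ?: 0.0
    val intercept = summary.firstOrNull { it.first == "intercept" }?.second ?: 0.0
    return { x -> slope * x + intercept } // y = slope * x + intercept, as drawn by abLine
}
```

With `summary = listOf("median_income" to 2.0, "intercept" to 1.0)`, `fittedLine(summary, "median_income")(3.0)` evaluates to `7.0`.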
## Why two ways to serialize the model?
We deliberately show both because they serve different goals:
- Tabular summary (Parquet):
- A small, human- and tool-friendly table of coefficients + intercept.
- Portable across tools; easy to read directly in Kotlin DataFrame, pandas, SQL engines, etc.
- Great for analytics, reporting, and plotting.
- Full Spark ML writer (PipelineModel.save):
- Contains everything needed to reuse the trained model inside Spark (including metadata and internal data).
- Directory layout and file names aren't guaranteed to be stable across versions; the intended way to consume it is `PipelineModel.load(...)` in Spark.
- Not ideal as a cross-tool tabular export, but perfect for production use in Spark pipelines.
## Why do we plot only one coefficient?
The linear model has multiple coefficients (one per feature). A 2D chart can only show two axes. To visualize the learned relationship, we pick a single feature (here, `median_income`) and the target (`median_house_value`) and draw the corresponding fitted line. You can repeat the procedure with any other feature to obtain a different 2D projection of the multi-dimensional model.
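To make the projection idea concrete, here is a small self-contained sketch (the function names are illustrative, not from the example code): the full model predicts with a dot product over all coefficients, while the 2D line fixes every feature except the chosen one.

```kotlin
// Full multi-feature prediction: y = c . x + intercept
fun predict(coeffs: DoubleArray, x: DoubleArray, intercept: Double): Double {
    require(coeffs.size == x.size) { "one coefficient per feature" }
    var y = intercept
    for (i in coeffs.indices) y += coeffs[i] * x[i]
    return y
}

// 2D projection: vary only feature j (e.g., median_income) and drop the rest,
// which is exactly the line y = slope * x + intercept the example plots.
fun projectOnto(coeffs: DoubleArray, intercept: Double, j: Int): (Double) -> Double =
    { x -> coeffs[j] * x + intercept }
```

Dropping the other features (rather than holding them at their means) is a simplification; any baseline choice only shifts the plotted line vertically.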
## About the dataset (`housing.csv`)
1. __longitude:__ How far west a house is; higher values are farther west
2. __latitude:__ How far north a house is; higher values are farther north
3. __housingMedianAge:__ Median age of a house within a block; lower means newer
4. __totalRooms:__ Total number of rooms within a block
5. __totalBedrooms:__ Total number of bedrooms within a block
6. __population:__ Total number of people residing within a block
7. __households:__ Total number of households within a block
8. __medianIncome:__ Median household income (in tens of thousands of USD)
9. __medianHouseValue:__ Median house value (in USD)
10. __oceanProximity:__ Location of the house with respect to the ocean/sea
The CSV file is located at `examples/housing.csv` in the repository root.
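Note the mixed units: `medianIncome` is in tens of thousands of USD while `medianHouseValue` is in plain USD. A one-line helper (purely illustrative, not used by the example) makes the conversion explicit:

```kotlin
// A medianIncome of 2.5 means $25,000; medianHouseValue is already in USD.
fun medianIncomeToUsd(incomeInTensOfThousands: Double): Double =
    incomeInTensOfThousands * 10_000.0
```

Keeping this scale difference in mind helps when reading the fitted slope: it relates tens of thousands of income dollars to single house-value dollars.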
## Windows note
<details>
<summary>Running on Windows: install winutils and set Hadoop environment variables</summary>
On Windows, Spark may require Hadoop native helpers. If you see errors like "winutils.exe not found" or permission/FS issues, do the following:
1. Install winutils.exe that matches your Spark/Hadoop version and place it under a Hadoop directory, e.g. `C:\hadoop\bin\winutils.exe`.
2. Set environment variables:
- `HADOOP_HOME=C:\hadoop`
- Add `%HADOOP_HOME%\bin` to your `PATH`
3. Restart your IDE/terminal so the variables are picked up and re-run the example.
This ensures Spark can operate correctly with Hadoop on Windows.
</details>
## SparkSession configuration to bypass Hadoop/winutils and enable Arrow
Use the following SparkSession builder if you want to completely avoid native Hadoop libraries (including winutils on Windows) and enable Arrow-related add-opens:
```kotlin
val spark = SparkSession.builder()
.appName("spark-parquet-dataframe")
.master("local[*]")
.config("spark.sql.warehouse.dir", Files.createTempDirectory("spark-warehouse").toString())
// Completely bypass native Hadoop libraries and winutils
.config("spark.hadoop.fs.defaultFS", "file:///")
.config("spark.hadoop.fs.AbstractFileSystem.file.impl", "org.apache.hadoop.fs.local.LocalFs")
.config("spark.hadoop.fs.file.impl.disable.cache", "true")
// Disable Hadoop native library requirements and native warnings
.config("spark.hadoop.hadoop.native.lib", "false")
.config("spark.hadoop.io.native.lib.available", "false")
.config(
"spark.driver.extraJavaOptions",
"--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED"
)
.config(
"spark.executor.extraJavaOptions",
"--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED"
)
.getOrCreate()
```
Notes:
- This configuration uses the pure-Java local filesystem (file://) and disables Hadoop native library checks, making winutils unnecessary.
- If you rely on HDFS or native Hadoop tooling, omit these overrides and configure Hadoop as usual.
## What each Spark config does (and why it matters on JDK 21 and the Java module system)
- `spark.sql.warehouse.dir=Files.createTempDirectory("spark-warehouse").toString()`
- Points Spark SQL's warehouse to an ephemeral, writable temp directory.
- Avoids permission issues and clutter in the project directory, especially on Windows.
- `spark.hadoop.fs.defaultFS = file:///`
- Forces Hadoop to use the local filesystem instead of HDFS.
- Bypasses native Hadoop bits and makes winutils unnecessary on Windows for this example.
- `spark.hadoop.fs.AbstractFileSystem.file.impl = org.apache.hadoop.fs.local.LocalFs`
- Ensures the AbstractFileSystem implementation resolves to the pure-Java LocalFs.
- `spark.hadoop.fs.file.impl.disable.cache = true`
- Disables FS implementation caching so the LocalFs overrides are applied immediately within the current JVM.
- `spark.hadoop.hadoop.native.lib = false` and `spark.hadoop.io.native.lib.available = false`
- Tell Hadoop not to load native libraries and suppress related warnings.
- Prevents errors stemming from missing native binaries (e.g., winutils) when you only need local file IO.
- `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` with:
`--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED`
- Why needed: Starting with the Java Platform Module System (JDK 9+) and especially under JDK 17/21 (JEP 403 strong encapsulation), reflective access into JDK internals is restricted. Apache Arrow (used by the vectorized Parquet reader in Kotlin DataFrame) may need reflective access within java.nio for memory management and buffer internals. Without opening the package, you can get errors like:
- `java.lang.reflect.InaccessibleObjectException: module java.base does not open java.nio to org.apache.arrow.memory.core`
- ...does not open `java.nio` to unnamed module @xxxx
- What it does: Opens the `java.nio` package in module `java.base` at runtime to both the named module `org.apache.arrow.memory.core` (when Arrow is on the module path) and to `ALL-UNNAMED` (when Arrow is on the classpath). This enables Arrow's memory code to work on modern JDKs.
- Driver vs executor: In `local[*]` both apply to the same process, but keeping both symmetric makes this snippet cluster-ready (executors are separate JVMs).
- When you might not need it: On JDK 8 (no module system) or if your stack does not use Arrow's vectorized path. On JDK 17/21+, keep it if you see `InaccessibleObjectException` referencing `java.nio`.
- Other packages: Some environments/libraries (e.g., Netty) may require additional opens such as `--add-opens=java.base/sun.nio.ch=ALL-UNNAMED`. Only add the opens that your error messages explicitly mention.
- Security note: `add-opens` affects only the current JVM process at runtime; it doesn't change compile-time checks or system-wide settings.
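If you are unsure whether the flag actually reached your JVM, you can inspect the runtime arguments. A hedged sketch using only the JDK's `ManagementFactory` (the helper name is made up for this snippet):

```kotlin
import java.lang.management.ManagementFactory

// Returns true if some --add-opens argument mentioning java.nio was passed
// to the current JVM (e.g., via spark.driver.extraJavaOptions).
fun hasAddOpensForJavaNio(): Boolean =
    ManagementFactory.getRuntimeMXBean().inputArguments
        .any { it.startsWith("--add-opens") && "java.nio" in it }
```

Printing this at startup gives a quick sanity check before Arrow's Parquet reader is touched.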
## Troubleshooting on JDK 17+
- Symptom: `InaccessibleObjectException` mentioning `java.nio` or "illegal reflective access" warnings.
- Fix: Ensure both `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` include the exact `--add-opens` string shown above.
- Symptom: Works in IDE, fails with spark-submit.
- Fix: Pass the options with `--conf spark.driver.extraJavaOptions=...` and `--conf spark.executor.extraJavaOptions=...` (or via SPARK_SUBMIT_OPTS), not only in IDE settings.
- Symptom: On Windows, “winutils.exe not found”.
- Fix: Either use this configuration block (bypassing native Hadoop) or install winutils as described in the Windows note above.
@@ -0,0 +1,70 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
import org.jetbrains.kotlin.gradle.tasks.KotlinCompile
plugins {
application
kotlin("jvm")
id("org.jetbrains.kotlinx.dataframe")
// only mandatory if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenCentral()
mavenLocal() // in case of local dataframe development
}
application.mainClass = "org.jetbrains.kotlinx.dataframe.examples.spark.parquet.SparkParquetDataframeKt"
dependencies {
implementation(project(":"))
// Spark SQL + MLlib (Spark 4.0.0)
implementation("org.apache.spark:spark-sql_2.13:4.0.0")
implementation("org.apache.spark:spark-mllib_2.13:4.0.0")
// Kandy (Lets-Plot backend) for plotting
implementation(libs.kandy) {
// Avoid pulling transitive kotlinx-dataframe from Kandy — we use the monorepo modules
exclude("org.jetbrains.kotlinx", "dataframe")
}
// Logging to keep Spark quiet
implementation(libs.log4j.core)
implementation(libs.log4j.api)
}
// for Java 17+, and Arrow/Parquet support
application {
applicationDefaultJvmArgs = listOf(
"--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED",
)
}
java {
toolchain {
languageVersion.set(JavaLanguageVersion.of(11))
}
}
tasks.withType<JavaExec> {
jvmArgs(
"--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED",
)
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_11.toString()
targetCompatibility = JavaVersion.VERSION_11.toString()
options.release.set(11)
}
// Configure KSP tasks to use the same JVM target
kotlin {
compilerOptions {
jvmTarget.set(JvmTarget.JVM_11)
freeCompilerArgs.add("-Xjdk-release=11")
}
}
@@ -0,0 +1,333 @@
package org.jetbrains.kotlinx.dataframe.examples.spark.parquet
import org.apache.spark.ml.PipelineStage
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.add
import org.jetbrains.kotlinx.dataframe.api.concat
import org.jetbrains.kotlinx.dataframe.api.head
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dropNA
import org.jetbrains.kotlinx.dataframe.api.getColumn
import org.jetbrains.kotlinx.dataframe.io.readJson
import org.jetbrains.kotlinx.dataframe.io.readParquet
import org.jetbrains.kotlinx.kandy.dsl.plot
import org.jetbrains.kotlinx.kandy.letsplot.layers.points
import org.jetbrains.kotlinx.kandy.letsplot.layers.abLine
import org.jetbrains.kotlinx.kandy.letsplot.export.save
import org.jetbrains.kotlinx.kandy.util.color.Color
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.util.stream.Collectors
import kotlin.io.path.exists
import kotlin.io.path.isDirectory
import kotlin.io.path.notExists
import kotlin.jvm.java
/**
* Demonstrates reading CSV with Apache Spark, writing Parquet, and reading Parquet with Kotlin DataFrame via Arrow.
* Also trains a simple Spark ML regression model and exports a summary as Parquet, then reads it back with Kotlin DataFrame.
*
* NOTE: This example doesn't use Kotlin Apache Spark API, as it relies on the Java Spark API directly.
*/
fun main() {
// 1) Start local Spark
val spark = SparkSession.builder()
.appName("spark-parquet-dataframe")
.master("local[*]")
.config("spark.sql.warehouse.dir", Files.createTempDirectory("spark-warehouse").toString())
// Completely bypass native Hadoop libraries and winutils
.config("spark.hadoop.fs.defaultFS", "file:///")
.config("spark.hadoop.fs.AbstractFileSystem.file.impl", "org.apache.hadoop.fs.local.LocalFs")
.config("spark.hadoop.fs.file.impl.disable.cache", "true")
// Disable Hadoop native library requirements and native warnings
.config("spark.hadoop.hadoop.native.lib", false)
.config("spark.hadoop.io.native.lib.available", false)
.config(
"spark.driver.extraJavaOptions",
"--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED"
)
.config(
"spark.executor.extraJavaOptions",
"--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED"
)
.getOrCreate()
// Make Spark a bit quieter
spark.sparkContext().setLogLevel("WARN")
// 2) Read housing.csv (from a repo path) with Spark
val csvResource = object {}::class.java.getResource("/housing.csv")
?: throw IllegalStateException("housing.csv not found in classpath resources")
val csvPath = Paths.get(csvResource.toURI()).toAbsolutePath().toString()
val sdf = spark.read()
.option("header", "true")
.option("inferSchema", "true")
.csv(csvPath)
// 3) Print the Spark DataFrame and export to Parquet in a temp directory
println("Spark DataFrame (head):")
sdf.show(10, false)
val parquetDir: Path = Files.createTempDirectory("housing_spark_parquet_")
val parquetPath = parquetDir.toString()
sdf.write().mode("overwrite").parquet(parquetPath)
println("Saved Spark Parquet to: $parquetPath")
// 4) Read this Parquet with Kotlin DataFrame (Arrow backend)
// Pass the actual part-*.parquet files instead of the directory
val parquetFiles = listParquetFilesIfAny(parquetDir)
val kdf = DataFrame.readParquet(*parquetFiles)
// 5) Print out head() for this Kotlin DataFrame
println("Kotlin DataFrame (head):")
kdf.head().print()
// 6) Train a regression model with Spark MLlib
// Use numeric features only, drop the categorical 'ocean_proximity'
val labelCol = "median_house_value"
val candidateFeatureCols = listOf(
"longitude", "latitude", "housing_median_age", "total_rooms", "total_bedrooms",
"population", "households", "median_income"
)
val colsArray = (candidateFeatureCols + labelCol).map { col(it) }.toTypedArray()
val sdfNumeric = sdf.select(*colsArray)
.na().drop()
val assembler = VectorAssembler()
.setInputCols(candidateFeatureCols.toTypedArray())
.setOutputCol("features")
// Build Pipeline (VectorAssembler -> LinearRegression) and train/test split WITHOUT prebuilt 'features'
val lr = LinearRegression()
.setFeaturesCol("features")
.setLabelCol(labelCol)
.setFitIntercept(false)
.setElasticNetParam(0.5)
.setMaxIter(10)
val fullPipeline = org.apache.spark.ml.Pipeline().setStages(arrayOf<PipelineStage>(assembler, lr))
val fullPipelineModel = fullPipeline.fit(sdfNumeric)
val lrModel = fullPipelineModel.stages()[1] as LinearRegressionModel
val summary = lrModel.summary()
println("Training RMSE: ${summary.rootMeanSquaredError()}")
println("Training r2: ${summary.r2()}")
// 7) Export model information to Parquet (coefficients per feature + intercept row)
val coeffs = lrModel.coefficients().toArray()
val rows =
candidateFeatureCols.mapIndexed { idx, name -> org.apache.spark.sql.RowFactory.create(name, coeffs[idx]) } +
listOf(org.apache.spark.sql.RowFactory.create("intercept", lrModel.intercept()))
val schema = org.apache.spark.sql.types.StructType(
arrayOf(
org.apache.spark.sql.types.StructField(
"term",
org.apache.spark.sql.types.DataTypes.StringType,
false,
org.apache.spark.sql.types.Metadata.empty()
),
org.apache.spark.sql.types.StructField(
"coefficient",
org.apache.spark.sql.types.DataTypes.DoubleType,
false,
org.apache.spark.sql.types.Metadata.empty()
)
)
)
val modelDf = spark.createDataFrame(rows, schema)
val modelParquetDir = parquetDir.resolve("model")
modelDf.write().mode("overwrite").parquet(modelParquetDir.toString())
println("Saved model summary Parquet to: $modelParquetDir")
// 8) Read this model Parquet with Kotlin DataFrame and print
val modelParquetFiles = listParquetFilesIfAny(modelParquetDir)
val modelKdf = DataFrame.readParquet(*modelParquetFiles)
println("Model summary Kotlin DataFrame (head):")
modelKdf.head().print()
// 9) Save the entire PipelineModel using the standard Spark ML mechanism
// The model is already fitted above; just save it.
val pipelinePath = parquetDir.resolve("pipeline_model_spark").toString()
fullPipelineModel.write().overwrite().save(pipelinePath)
println("Step 9: Saved PipelineModel to: $pipelinePath")
// 10) Inspect pipeline internals using Kotlin DataFrame from concrete paths (no directory walking)
// IMPORTANT (why this is not the most convenient way to export/import):
// - The ML writer saves a directory with mixed JSON (metadata) and Parquet (model data).
// - Internal folder names for stages include indexes and algorithm/uids (e.g., "0_VectorAssembler_xxx", "1_LinearRegressionModel_xxx"),
// which are not guaranteed to be stable across Spark versions.
// - Reading internals is suitable only for inspection/exploration. For reuse, prefer PipelineModel.load();
// for portable/tabular exchange, write an explicit summary DataFrame.
//
// Concrete layout this demo relies on:
// $pipelinePath/metadata/
// $pipelinePath/stages/0_*/metadata/, $pipelinePath/stages/0_*/data/
// $pipelinePath/stages/1_*/metadata/, $pipelinePath/stages/1_*/data/
val pipelineRoot = Paths.get(pipelinePath)
val stagesDir = pipelineRoot.resolve("stages")
val stage0Dir = findStageDir(stagesDir, "0_")
val stage1Dir = findStageDir(stagesDir, "1_")
// Accumulate Kotlin DataFrames found in step 9 so we can optionally join only existing ones in step 10
val metaKdfs = mutableListOf<DataFrame<*>>()
val stageDataKdfs = mutableListOf<DataFrame<*>>()
// 10.1) Root metadata (JSON) -> read each file one-by-one
val rootMetaDir = pipelineRoot.resolve("metadata")
val rootMetaFiles = listTextOrJsonFiles(rootMetaDir)
for (file in rootMetaFiles) {
val df = DataFrame.readJson(file.toFile())
println("Step 10: Pipeline root metadata JSON (${file.fileName}) head:")
df.head().print()
metaKdfs += df
}
// 10.2) Stage 0 (VectorAssembler) metadata/data
val stage0MetaDir = stage0Dir.resolve("metadata")
for (file in listTextOrJsonFiles(stage0MetaDir)) {
val df = DataFrame.readJson(file.toFile())
println("Step 10: Stage 0 metadata (${file.fileName}) head:")
df.head().print()
metaKdfs += df
}
val stage0DataDir = stage0Dir.resolve("data")
val stage0ParquetFiles = listParquetFilesIfAny(stage0DataDir)
if (stage0ParquetFiles.isNotEmpty()) {
val stage0Kdf = DataFrame.readParquet(*stage0ParquetFiles)
println("Step 10: Stage 0 data (Parquet) head:")
stage0Kdf.head().print()
stageDataKdfs += stage0Kdf
} else {
println("Step 10: Stage 0 data directory is missing or has no .parquet files, skipping.")
}
// 10.3) Stage 1 (LinearRegressionModel) metadata/data
val stage1MetaDir = stage1Dir.resolve("metadata")
for (file in listTextOrJsonFiles(stage1MetaDir)) {
val df = DataFrame.readJson(file.toFile())
println("Step 10: Stage 1 metadata (${file.fileName}) head:")
df.head().print()
metaKdfs += df
}
val stage1DataDir = stage1Dir.resolve("data")
val stage1ParquetFiles = listParquetFilesIfAny(stage1DataDir)
if (stage1ParquetFiles.isNotEmpty()) {
val stage1Kdf = DataFrame.readParquet(*stage1ParquetFiles)
println("Step 10: Stage 1 data (Parquet) head:")
stage1Kdf.head().print()
stageDataKdfs += stage1Kdf
} else {
println("Step 10: Stage 1 data directory is missing or has no .parquet files, skipping.")
}
// 11) Join only existing Kotlin DataFrames and build a plot from the linear model
// 11.1) Unified metadata from any JSON files we successfully parsed above
val unifiedMeta = if (metaKdfs.isNotEmpty()) metaKdfs.concat() else null
if (unifiedMeta != null) {
println("Step 11: Unified metadata head:")
unifiedMeta.head().print()
} else {
println("Step 11: No metadata DataFrames were found to unify.")
}
// 11.2) Unified model data: in this demo we already have a single modelKdf (coefficients + intercept)
val unifiedModelDf = modelKdf
println("Step 11: Unified model data (coefficients) head:")
unifiedModelDf.head().print()
// 11.3) Build a linear plot: dataset points and model line y = a*x + b for the chosen feature
// Choose feature 'median_income' vs. label 'median_house_value'
val pointsDf = kdf.dropNA("median_income", "median_house_value")
// Extract slope (coefficient for 'median_income') and intercept from modelKdf
val terms = unifiedModelDf.getColumn("term").cast<String>().toList()
val coefs = unifiedModelDf.getColumn("coefficient").cast<Double>().toList()
val slopeIdx = terms.indexOf("median_income")
val interceptIdx = terms.indexOf("intercept")
val slopeValue = if (slopeIdx >= 0) coefs[slopeIdx] else 0.0
val interceptValue = if (interceptIdx >= 0) coefs[interceptIdx] else 0.0
println("slope: $slopeValue intercept: $interceptValue")
// Prepare DF for plotting: add constant columns for abLine mapping
val dfForPlot = pointsDf
.add("slope_const") { slopeValue }
.add("intercept_const") { interceptValue }
// 11.4) Create Kandy plot using abLine (slope/intercept) and export to a .jpg file
val plot = dfForPlot.plot {
points {
x("median_income")
y("median_house_value")
// Visual hint: small circles
color = Color.LIGHT_BLUE
size = 2.0
}
abLine {
// Use linear model parameters: y = slope * x + intercept
slope.constant(slopeValue)
intercept.constant(interceptValue)
color = Color.RED
width = 2.0
}
}
val targetDir = Paths.get("").normalize()
Files.createDirectories(targetDir)
val plotPath = targetDir.resolve("linear_model_plot.jpg").toAbsolutePath().toString()
plot.save(plotPath)
println("Step 11: Saved plot to: $plotPath")
spark.stop()
}
/**
* Returns .parquet files if the directory exists and contains any; otherwise returns an empty array.
* Safe to use for Spark ML stage "data" subfolders that may be absent.
*/
private fun listParquetFilesIfAny(dir: Path): Array<Path> {
if (dir.notExists() || !dir.isDirectory()) return emptyArray()
val files: List<Path> = Files.list(dir).use { stream ->
stream
.filter { Files.isRegularFile(it) && it.fileName.toString().endsWith(".parquet", ignoreCase = true) }
.collect(Collectors.toList())
}
return files.toTypedArray()
}
/**
* Finds a stage directory inside 'stagesDir' by prefix (e.g., "0_", "1_").
* No extra checks: assumes such a directory exists.
*/
private fun findStageDir(stagesDir: Path, prefix: String): Path {
return Files.list(stagesDir).use { s ->
s.filter { Files.isDirectory(it) && it.fileName.toString().startsWith(prefix) }
.findFirst().get()
}
}
private fun listTextOrJsonFiles(dir: Path): List<Path> {
return Files.list(dir).use { s ->
s.filter {
Files.isRegularFile(it) &&
(it.fileName.toString().endsWith(".json", ignoreCase = true) ||
it.fileName.toString().endsWith(".txt", ignoreCase = true))
}.collect(Collectors.toList())
}
}
File diff suppressed because it is too large
Binary file not shown.
@@ -0,0 +1,52 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
plugins {
application
kotlin("jvm")
id("org.jetbrains.kotlinx.dataframe")
// only mandatory if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenCentral()
mavenLocal() // in case of local dataframe development
}
application.mainClass = "org.jetbrains.kotlinx.dataframe.examples.titanic.ml.TitanicKt"
dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
// note: needs to target java 11 for these dependencies
implementation("org.jetbrains.kotlinx:kotlin-deeplearning-api:0.5.2")
implementation("org.jetbrains.kotlinx:kotlin-deeplearning-impl:0.5.2")
implementation("org.jetbrains.kotlinx:kotlin-deeplearning-tensorflow:0.5.2")
implementation("org.jetbrains.kotlinx:kotlin-deeplearning-dataset:0.5.2")
}
dataframes {
schema {
data = "src/main/resources/titanic.csv"
name = "org.jetbrains.kotlinx.dataframe.examples.titanic.ml.Passenger"
csvOptions {
delimiter = ';'
}
}
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_11
freeCompilerArgs.add("-Xjdk-release=11")
}
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_11.toString()
targetCompatibility = JavaVersion.VERSION_11.toString()
options.release.set(11)
}
@@ -0,0 +1,95 @@
package org.jetbrains.kotlinx.dataframe.examples.titanic.ml
import org.jetbrains.kotlinx.dataframe.ColumnSelector
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dl.api.core.Sequential
import org.jetbrains.kotlinx.dl.api.core.activation.Activations
import org.jetbrains.kotlinx.dl.api.core.initializer.HeNormal
import org.jetbrains.kotlinx.dl.api.core.initializer.Zeros
import org.jetbrains.kotlinx.dl.api.core.layer.core.Dense
import org.jetbrains.kotlinx.dl.api.core.layer.core.Input
import org.jetbrains.kotlinx.dl.api.core.loss.Losses
import org.jetbrains.kotlinx.dl.api.core.metric.Metrics
import org.jetbrains.kotlinx.dl.api.core.optimizer.Adam
import org.jetbrains.kotlinx.dl.dataset.OnHeapDataset
import java.util.Locale
private const val SEED = 12L
private const val TEST_BATCH_SIZE = 100
private const val EPOCHS = 50
private const val TRAINING_BATCH_SIZE = 50
private val model = Sequential.of(
Input(9),
Dense(50, Activations.Relu, kernelInitializer = HeNormal(SEED), biasInitializer = Zeros()),
Dense(50, Activations.Relu, kernelInitializer = HeNormal(SEED), biasInitializer = Zeros()),
Dense(2, Activations.Linear, kernelInitializer = HeNormal(SEED), biasInitializer = Zeros())
)
fun main() {
// Set Locale for correct number parsing
Locale.setDefault(Locale.FRANCE)
val df = Passenger.readCsv()
// Calculating imputing values
val (train, test) = df
// imputing
.fillNulls { sibsp and parch and age and fare }.perCol { it.mean() }
.fillNulls { sex }.with { "female" }
// one hot encoding
.pivotMatches { pclass and sex }
// feature extraction
.select { survived and pclass and sibsp and parch and age and fare and sex }
.shuffle()
.toTrainTest(0.7) { survived }
model.use {
it.compile(
optimizer = Adam(),
loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
metric = Metrics.ACCURACY
)
it.summary()
it.fit(dataset = train, epochs = EPOCHS, batchSize = TRAINING_BATCH_SIZE)
val accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]
println("Accuracy: $accuracy")
}
}
fun <T> DataFrame<T>.toTrainTest(
trainRatio: Double,
yColumn: ColumnSelector<T, Number>,
): Pair<OnHeapDataset, OnHeapDataset> =
toOnHeapDataset(yColumn)
.split(trainRatio)
private fun <T> DataFrame<T>.toOnHeapDataset(yColumn: ColumnSelector<T, Number>): OnHeapDataset =
OnHeapDataset.create(
dataframe = this,
yColumn = yColumn,
)
private fun <T> OnHeapDataset.Companion.create(
dataframe: DataFrame<T>,
yColumn: ColumnSelector<T, Number>,
): OnHeapDataset {
fun extractX(): Array<FloatArray> =
dataframe.remove(yColumn)
.convert { colsAtAnyDepth().filter { !it.isColumnGroup() } }.toFloat()
.merge { colsAtAnyDepth().colsOf<Float>() }.by { it.toFloatArray() }.into("X")
.getColumn("X").cast<FloatArray>().toTypedArray()
fun extractY(): FloatArray = dataframe.get(yColumn).toFloatArray()
return create(
::extractX,
::extractY,
)
}
File diff suppressed because it is too large
@@ -0,0 +1,43 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
plugins {
application
kotlin("jvm")
// uses the 'old' Gradle plugin instead of the compiler plugin for now
id("org.jetbrains.kotlinx.dataframe")
// only mandatory if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenLocal() // in case of local dataframe development
mavenCentral()
}
dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
// exposed + sqlite database support
implementation(libs.sqlite)
implementation(libs.exposed.core)
implementation(libs.exposed.kotlin.datetime)
implementation(libs.exposed.jdbc)
implementation(libs.exposed.json)
implementation(libs.exposed.money)
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_1_8
freeCompilerArgs.add("-Xjdk-release=8")
}
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_1_8.toString()
targetCompatibility = JavaVersion.VERSION_1_8.toString()
options.release.set(8)
}
@@ -0,0 +1,107 @@
package org.jetbrains.kotlinx.dataframe.examples.exposed
import org.jetbrains.exposed.v1.core.BiCompositeColumn
import org.jetbrains.exposed.v1.core.Column
import org.jetbrains.exposed.v1.core.Expression
import org.jetbrains.exposed.v1.core.ExpressionAlias
import org.jetbrains.exposed.v1.core.ResultRow
import org.jetbrains.exposed.v1.core.Table
import org.jetbrains.exposed.v1.jdbc.Query
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.convertTo
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.codeGen.NameNormalizer
import org.jetbrains.kotlinx.dataframe.impl.schema.DataFrameSchemaImpl
import org.jetbrains.kotlinx.dataframe.schema.ColumnSchema
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import kotlin.reflect.KProperty1
import kotlin.reflect.full.isSubtypeOf
import kotlin.reflect.full.memberProperties
import kotlin.reflect.typeOf
/**
* Retrieves all columns of any [Iterable][Iterable]`<`[ResultRow][ResultRow]`>`, like [Query][Query],
* from Exposed row by row and converts the resulting [Map] into a [DataFrame], cast to type [T].
*
* In notebooks, the untyped version works just as well due to runtime inference :)
*/
inline fun <reified T : Any> Iterable<ResultRow>.convertToDataFrame(): DataFrame<T> =
convertToDataFrame().convertTo<T>()
/**
* Retrieves all columns of an [Iterable][Iterable]`<`[ResultRow][ResultRow]`>` from Exposed, like [Query][Query],
* row by row and converts the resulting [Map] of lists into a [DataFrame] by calling
* [Map.toDataFrame].
*/
@JvmName("convertToAnyFrame")
fun Iterable<ResultRow>.convertToDataFrame(): AnyFrame {
val map = mutableMapOf<String, MutableList<Any?>>()
for (row in this) {
for (expression in row.fieldIndex.keys) {
map.getOrPut(expression.readableName) {
mutableListOf()
} += row[expression]
}
}
return map.toDataFrame()
}
/**
* Retrieves a simple column name from [this] [Expression].
*
* Might need to be expanded with multiple types of [Expression].
*/
val Expression<*>.readableName: String
get() = when (this) {
is Column<*> -> name
is ExpressionAlias<*> -> alias
is BiCompositeColumn<*, *, *> -> getRealColumns().joinToString("_") { it.readableName }
else -> toString()
}
/**
* Creates a [DataFrameSchema] from the declared [Table] instance.
*
* This is not needed for conversion, but it can be useful to create a DataFrame [@DataSchema][DataSchema] instance.
*
* @param columnNameToAccessor Optional [MutableMap] which will be filled with entries mapping
* the SQL column name to the accessor name from the [Table].
* This can be used to define a [NameNormalizer] later.
* @see toDataFrameSchemaWithNameNormalizer
*/
@Suppress("UNCHECKED_CAST")
fun Table.toDataFrameSchema(columnNameToAccessor: MutableMap<String, String> = mutableMapOf()): DataFrameSchema {
// we use reflection to go over all `Column<*>` properties in the Table object
val columns = this::class.memberProperties
.filter { it.returnType.isSubtypeOf(typeOf<Column<*>>()) }
.associate { prop ->
prop as KProperty1<Table, Column<*>>
// retrieve the SQL column name
val columnName = prop.get(this).name
// store the SQL column name together with the accessor name in the map
columnNameToAccessor[columnName] = prop.name
// get the column type from `val a: Column<Type>`
val type = prop.returnType.arguments.first().type!!
// and we add the name and column schema type to the `columns` map :)
columnName to ColumnSchema.Value(type)
}
return DataFrameSchemaImpl(columns)
}
/**
* Creates a [DataFrameSchema] from the declared [Table] instance with a [NameNormalizer] to
* convert the SQL column names to the corresponding Kotlin property names.
*
 * This is not needed for conversion, but it can be useful for creating a [@DataSchema][DataSchema] data class for the DataFrame.
*
* @see toDataFrameSchema
*/
fun Table.toDataFrameSchemaWithNameNormalizer(): Pair<DataFrameSchema, NameNormalizer> {
val columnNameToAccessor = mutableMapOf<String, String>()
return Pair(toDataFrameSchema(columnNameToAccessor), NameNormalizer { columnNameToAccessor[it] ?: it })
}
@@ -0,0 +1,96 @@
package org.jetbrains.kotlinx.dataframe.examples.exposed
import org.jetbrains.exposed.v1.core.Column
import org.jetbrains.exposed.v1.core.SortOrder
import org.jetbrains.exposed.v1.core.count
import org.jetbrains.exposed.v1.jdbc.Database
import org.jetbrains.exposed.v1.jdbc.SchemaUtils
import org.jetbrains.exposed.v1.jdbc.batchInsert
import org.jetbrains.exposed.v1.jdbc.deleteAll
import org.jetbrains.exposed.v1.jdbc.select
import org.jetbrains.exposed.v1.jdbc.selectAll
import org.jetbrains.exposed.v1.jdbc.transactions.transaction
import org.jetbrains.kotlinx.dataframe.api.asSequence
import org.jetbrains.kotlinx.dataframe.api.count
import org.jetbrains.kotlinx.dataframe.api.describe
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.sortByDesc
import org.jetbrains.kotlinx.dataframe.size
import java.io.File
/**
* Describes a simple bridge between [Exposed](https://www.jetbrains.com/exposed/) and DataFrame!
*/
fun main() {
// defining where to find our SQLite database for Exposed
val resourceDb = "chinook.db"
val dbPath = File(object {}.javaClass.classLoader.getResource(resourceDb)!!.toURI()).absolutePath
val db = Database.connect(url = "jdbc:sqlite:$dbPath", driver = "org.sqlite.JDBC")
// let's read the database!
val df = transaction(db) {
// addLogger(StdOutSqlLogger) // enable if you want to see verbose logs
// tables in Exposed need to be defined, see tables.kt
SchemaUtils.create(Customers, Artists, Albums)
println()
// In Exposed, we can write queries like this.
// Here, we count per country how many customers there are and print the results:
Customers
.select(Customers.country, Customers.customerId.count())
.groupBy(Customers.country)
.orderBy(Customers.customerId.count() to SortOrder.DESC)
.forEach {
println("${it[Customers.country]}: ${it[Customers.customerId.count()]} customers")
}
println()
// Perform the specific query you want to read into the DataFrame.
// Note: DataFrames are in-memory structures, so don't make it too large if you don't have the RAM ;)
val query = Customers.selectAll() // .where { Customers.company.isNotNull() }
println()
// read and convert the query to a typed DataFrame
// see compatibilityLayer.kt for how we created convertToDataFrame<>()
// and see tables.kt for how we created DfCustomers!
query.convertToDataFrame<DfCustomers>()
}
println(df.size())
// now that we have a DataFrame, we can perform DataFrame operations,
// like repeating the same aggregation we did with Exposed above
df.groupBy { country }.count()
.sortByDesc { "count"<Int>() }
.print(columnTypes = true, borders = true)
// or just general statistics
df.describe()
.print(columnTypes = true, borders = true)
// or make plots using Kandy! It's all up to you
// writing a DataFrame back into an SQL database with Exposed can also be done easily!
transaction(db) {
// addLogger(StdOutSqlLogger) // enable if you want to see verbose logs
// first delete the original contents
Customers.deleteAll()
println()
// batch-insert our dataframe back into the SQL database as a sequence of rows
Customers.batchInsert(df.asSequence()) { dfRow ->
// we simply go over each value in the row and put it in the right place in the Exposed statement
for (column in Customers.columns) {
@Suppress("UNCHECKED_CAST")
this[column as Column<Any?>] = dfRow[column.name]
}
}
}
}
@@ -0,0 +1,97 @@
package org.jetbrains.kotlinx.dataframe.examples.exposed
import org.jetbrains.exposed.v1.core.Column
import org.jetbrains.exposed.v1.core.Table
import org.jetbrains.kotlinx.dataframe.annotations.ColumnName
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.generateDataClasses
import org.jetbrains.kotlinx.dataframe.api.print
object Albums : Table() {
val albumId: Column<Int> = integer("AlbumId").autoIncrement()
val title: Column<String> = varchar("Title", 160)
val artistId: Column<Int> = integer("ArtistId")
override val primaryKey = PrimaryKey(albumId)
}
object Artists : Table() {
val artistId: Column<Int> = integer("ArtistId").autoIncrement()
val name: Column<String> = varchar("Name", 120)
override val primaryKey = PrimaryKey(artistId)
}
object Customers : Table() {
val customerId: Column<Int> = integer("CustomerId").autoIncrement()
val firstName: Column<String> = varchar("FirstName", 40)
val lastName: Column<String> = varchar("LastName", 20)
val company: Column<String?> = varchar("Company", 80).nullable()
val address: Column<String?> = varchar("Address", 70).nullable()
val city: Column<String?> = varchar("City", 40).nullable()
val state: Column<String?> = varchar("State", 40).nullable()
val country: Column<String?> = varchar("Country", 40).nullable()
val postalCode: Column<String?> = varchar("PostalCode", 10).nullable()
val phone: Column<String?> = varchar("Phone", 24).nullable()
val fax: Column<String?> = varchar("Fax", 24).nullable()
val email: Column<String> = varchar("Email", 60)
val supportRepId: Column<Int?> = integer("SupportRepId").nullable()
override val primaryKey = PrimaryKey(customerId)
}
/**
 * Exposed requires you to define [Table] instances to
 * get type-safe access to your columns and data.
*
 * While DataFrame can infer types at runtime, which is enough for Kotlin Notebook,
 * we need to define a [@DataSchema][DataSchema] to get type-safe access at compile time.
*
* This is what we created the [toDataFrameSchema] function for!
*/
fun main() {
val (schema, nameNormalizer) = Customers.toDataFrameSchemaWithNameNormalizer()
// checking whether the schema is converted correctly.
// schema.print()
// printing a @DataSchema data class to copy-paste into the code.
// we use a NameNormalizer to let DataFrame generate the same accessors as in the Table
// while keeping the correct column names
schema.generateDataClasses(
markerName = "DfCustomers",
nameNormalizer = nameNormalizer,
).print()
}
// created by Customers.toDataFrameSchema()
// The same can be done for the other tables
@DataSchema
data class DfCustomers(
@ColumnName("Address")
val address: String?,
@ColumnName("City")
val city: String?,
@ColumnName("Company")
val company: String?,
@ColumnName("Country")
val country: String?,
@ColumnName("CustomerId")
val customerId: Int,
@ColumnName("Email")
val email: String,
@ColumnName("Fax")
val fax: String?,
@ColumnName("FirstName")
val firstName: String,
@ColumnName("LastName")
val lastName: String,
@ColumnName("Phone")
val phone: String?,
@ColumnName("PostalCode")
val postalCode: String?,
@ColumnName("State")
val state: String?,
@ColumnName("SupportRepId")
val supportRepId: Int?,
)
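// With this @DataSchema in place, the DataFrame plugins generate typed extension
// properties. A hedged sketch of what access then looks like, given some
// `df: DataFrame<DfCustomers>`:
//
//     val usaCustomers = df.filter { country == "USA" }
//     usaCustomers.select { firstName and lastName and email }.print()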
@@ -0,0 +1,43 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
plugins {
application
kotlin("jvm")
// uses the 'old' Gradle plugin instead of the compiler plugin for now
id("org.jetbrains.kotlinx.dataframe")
// only needed if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenLocal() // in case of local dataframe development
mavenCentral()
}
dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
// Hibernate + H2 + HikariCP (for Hibernate example)
implementation(libs.hibernate.core)
implementation(libs.hibernate.hikaricp)
implementation(libs.hikaricp)
implementation(libs.h2db)
implementation(libs.sl4jsimple)
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_11
freeCompilerArgs.add("-Xjdk-release=11")
}
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_11.toString()
targetCompatibility = JavaVersion.VERSION_11.toString()
options.release.set(11)
}
@@ -0,0 +1,100 @@
package org.jetbrains.kotlinx.dataframe.examples.hibernate
import jakarta.persistence.Column
import jakarta.persistence.Entity
import jakarta.persistence.GeneratedValue
import jakarta.persistence.GenerationType
import jakarta.persistence.Id
import jakarta.persistence.Table
import org.jetbrains.kotlinx.dataframe.annotations.ColumnName
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
@Entity
@Table(name = "Albums")
class AlbumsEntity(
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "AlbumId")
var albumId: Int? = null,
@Column(name = "Title", length = 160, nullable = false)
var title: String = "",
@Column(name = "ArtistId", nullable = false)
var artistId: Int = 0,
)
@Entity
@Table(name = "Artists")
class ArtistsEntity(
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "ArtistId")
var artistId: Int? = null,
@Column(name = "Name", length = 120, nullable = false)
var name: String = "",
)
@Entity
@Table(name = "Customers")
class CustomersEntity(
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "CustomerId")
var customerId: Int? = null,
@Column(name = "FirstName", length = 40, nullable = false)
var firstName: String = "",
@Column(name = "LastName", length = 20, nullable = false)
var lastName: String = "",
@Column(name = "Company", length = 80)
var company: String? = null,
@Column(name = "Address", length = 70)
var address: String? = null,
@Column(name = "City", length = 40)
var city: String? = null,
@Column(name = "State", length = 40)
var state: String? = null,
@Column(name = "Country", length = 40)
var country: String? = null,
@Column(name = "PostalCode", length = 10)
var postalCode: String? = null,
@Column(name = "Phone", length = 24)
var phone: String? = null,
@Column(name = "Fax", length = 24)
var fax: String? = null,
@Column(name = "Email", length = 60, nullable = false)
var email: String = "",
@Column(name = "SupportRepId")
var supportRepId: Int? = null,
)
// DataFrame schema to get typed accessors similar to Exposed example
@DataSchema
data class DfCustomers(
@ColumnName("Address") val address: String?,
@ColumnName("City") val city: String?,
@ColumnName("Company") val company: String?,
@ColumnName("Country") val country: String?,
@ColumnName("CustomerId") val customerId: Int,
@ColumnName("Email") val email: String,
@ColumnName("Fax") val fax: String?,
@ColumnName("FirstName") val firstName: String,
@ColumnName("LastName") val lastName: String,
@ColumnName("Phone") val phone: String?,
@ColumnName("PostalCode") val postalCode: String?,
@ColumnName("State") val state: String?,
@ColumnName("SupportRepId") val supportRepId: Int?,
)
@@ -0,0 +1,251 @@
package org.jetbrains.kotlinx.dataframe.examples.hibernate
import jakarta.persistence.criteria.CriteriaBuilder
import jakarta.persistence.criteria.CriteriaDelete
import jakarta.persistence.criteria.CriteriaQuery
import jakarta.persistence.criteria.Root
import org.hibernate.FlushMode
import org.hibernate.SessionFactory
import org.hibernate.cfg.Configuration
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.api.asSequence
import org.jetbrains.kotlinx.dataframe.api.count
import org.jetbrains.kotlinx.dataframe.api.describe
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.sortByDesc
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.size
/**
* Example showing Kotlin DataFrame with Hibernate ORM + H2 in-memory DB.
* Mirrors logic from the Exposed example: load data, convert to DataFrame, group/describe, write back.
*/
fun main() {
val sessionFactory: SessionFactory = buildSessionFactory()
sessionFactory.insertSampleData()
val df = sessionFactory.loadCustomersAsDataFrame()
// Pure Hibernate + Criteria API approach for counting customers per country
println("=== Hibernate + Criteria API Approach ===")
sessionFactory.countCustomersPerCountryWithHibernate()
println("\n=== DataFrame Approach ===")
df.analyzeAndPrintResults()
sessionFactory.replaceCustomersFromDataFrame(df)
sessionFactory.close()
}
private fun SessionFactory.insertSampleData() {
withTransaction { session ->
// a few artists and albums (minimal; not used further, just to demo the schema)
val artist1 = ArtistsEntity(name = "AC/DC")
val artist2 = ArtistsEntity(name = "Queen")
session.persist(artist1)
session.persist(artist2)
session.flush()
session.persist(AlbumsEntity(title = "High Voltage", artistId = artist1.artistId!!))
session.persist(AlbumsEntity(title = "Back in Black", artistId = artist1.artistId!!))
session.persist(AlbumsEntity(title = "A Night at the Opera", artistId = artist2.artistId!!))
// customers we'll analyze using DataFrame
session.persist(
CustomersEntity(
firstName = "John",
lastName = "Doe",
email = "john.doe@example.com",
country = "USA",
),
)
session.persist(
CustomersEntity(
firstName = "Jane",
lastName = "Smith",
email = "jane.smith@example.com",
country = "USA",
),
)
session.persist(
CustomersEntity(
firstName = "Alice",
lastName = "Wang",
email = "alice.wang@example.com",
country = "Canada",
),
)
}
}
private fun SessionFactory.loadCustomersAsDataFrame(): DataFrame<DfCustomers> {
return withReadOnlyTransaction { session ->
val criteriaBuilder: CriteriaBuilder = session.criteriaBuilder
val criteriaQuery: CriteriaQuery<CustomersEntity> = criteriaBuilder.createQuery(CustomersEntity::class.java)
val root: Root<CustomersEntity> = criteriaQuery.from(CustomersEntity::class.java)
criteriaQuery.select(root)
session.createQuery(criteriaQuery)
.resultList
.map { c ->
DfCustomers(
address = c.address,
city = c.city,
company = c.company,
country = c.country,
customerId = c.customerId ?: -1,
email = c.email,
fax = c.fax,
firstName = c.firstName,
lastName = c.lastName,
phone = c.phone,
postalCode = c.postalCode,
state = c.state,
supportRepId = c.supportRepId,
)
}
.toDataFrame()
}
}
/** DTO used for aggregation projection. */
private data class CountryCountDto(
val country: String,
val customerCount: Long,
)
/**
* **Hibernate + Criteria API:**
* - ✅ Database-level aggregation (efficient)
* - ✅ Type-safe queries
* - ❌ Verbose syntax
* - ❌ Limited to SQL-like operations
*/
private fun SessionFactory.countCustomersPerCountryWithHibernate() {
withReadOnlyTransaction { session ->
val cb = session.criteriaBuilder
val cq: CriteriaQuery<CountryCountDto> = cb.createQuery(CountryCountDto::class.java)
val root: Root<CustomersEntity> = cq.from(CustomersEntity::class.java)
val countryPath = root.get<String>("country")
val idPath = root.get<Int>("customerId")
val countExpr = cb.count(idPath)
cq.select(
cb.construct(
CountryCountDto::class.java,
countryPath, // country
countExpr, // customerCount
),
)
cq.groupBy(countryPath)
cq.orderBy(cb.desc(countExpr))
val results = session.createQuery(cq).resultList
results.forEach { dto ->
println("${dto.country}: ${dto.customerCount} customers")
}
}
}
/**
 * **DataFrame approach:**
* - ✅ Rich analytical operations
* - ✅ Fluent, readable API
* - ✅ Flexible data transformations
* - ❌ In-memory processing (less efficient for large datasets)
*/
private fun DataFrame<DfCustomers>.analyzeAndPrintResults() {
println(size())
// same operation as Exposed example: customers per country
groupBy { country }.count()
.sortByDesc { "count"<Int>() }
.print(columnTypes = true, borders = true)
// general statistics
describe()
.print(columnTypes = true, borders = true)
}
private fun SessionFactory.replaceCustomersFromDataFrame(df: DataFrame<DfCustomers>) {
withTransaction { session ->
val criteriaBuilder: CriteriaBuilder = session.criteriaBuilder
val criteriaDelete: CriteriaDelete<CustomersEntity> =
criteriaBuilder.createCriteriaDelete(CustomersEntity::class.java)
criteriaDelete.from(CustomersEntity::class.java)
session.createMutationQuery(criteriaDelete).executeUpdate()
}
withTransaction { session ->
df.asSequence().forEach { row ->
session.persist(row.toCustomersEntity())
}
}
}
private fun DataRow<DfCustomers>.toCustomersEntity(): CustomersEntity {
return CustomersEntity(
customerId = null, // let DB generate
firstName = this.firstName,
lastName = this.lastName,
company = this.company,
address = this.address,
city = this.city,
state = this.state,
country = this.country,
postalCode = this.postalCode,
phone = this.phone,
fax = this.fax,
email = this.email,
supportRepId = this.supportRepId,
)
}
private inline fun <T> SessionFactory.withSession(block: (session: org.hibernate.Session) -> T): T {
return openSession().use(block)
}
private inline fun SessionFactory.withTransaction(block: (session: org.hibernate.Session) -> Unit) {
withSession { session ->
session.beginTransaction()
try {
block(session)
session.transaction.commit()
} catch (e: Exception) {
session.transaction.rollback()
throw e
}
}
}
/** Read-only transaction helper for SELECT queries to minimize overhead. */
private inline fun <T> SessionFactory.withReadOnlyTransaction(block: (session: org.hibernate.Session) -> T): T {
return withSession { session ->
session.beginTransaction()
// Minimize overhead for read operations
session.isDefaultReadOnly = true
session.hibernateFlushMode = FlushMode.MANUAL
try {
val result = block(session)
session.transaction.commit()
result
} catch (e: Exception) {
session.transaction.rollback()
throw e
}
}
}
private fun buildSessionFactory(): SessionFactory {
// Load configuration from resources/hibernate/hibernate.cfg.xml
return Configuration().configure("hibernate/hibernate.cfg.xml").buildSessionFactory()
}
@@ -0,0 +1,32 @@
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE hibernate-configuration PUBLIC
"-//Hibernate/Hibernate Configuration DTD 5.3//EN"
"http://hibernate.org/dtd/hibernate-configuration-5.3.dtd">
<hibernate-configuration>
<session-factory>
<!-- H2 in-memory -->
<property name="hibernate.connection.driver_class">org.h2.Driver</property>
<property name="hibernate.connection.url">jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1</property>
<property name="hibernate.connection.username">sa</property>
<property name="hibernate.connection.password"></property>
<!-- Connection pool: HikariCP via Hibernate integration -->
<property name="hibernate.connection.provider_class">org.hibernate.hikaricp.internal.HikariCPConnectionProvider</property>
<property name="hibernate.hikari.maximumPoolSize">5</property>
<!-- Hibernate Dialect -->
<property name="hibernate.dialect">org.hibernate.dialect.H2Dialect</property>
<!-- Automatic schema generation -->
<property name="hibernate.hbm2ddl.auto">create-drop</property>
<!-- Logging -->
<property name="hibernate.show_sql">true</property>
<property name="hibernate.format_sql">true</property>
<!-- Mappings -->
<mapping class="org.jetbrains.kotlinx.dataframe.examples.hibernate.CustomersEntity"/>
<mapping class="org.jetbrains.kotlinx.dataframe.examples.hibernate.ArtistsEntity"/>
<mapping class="org.jetbrains.kotlinx.dataframe.examples.hibernate.AlbumsEntity"/>
</session-factory>
</hibernate-configuration>
@@ -0,0 +1,39 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
plugins {
application
kotlin("jvm")
// uses the 'old' Gradle plugin instead of the compiler plugin for now
id("org.jetbrains.kotlinx.dataframe")
// only needed if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenLocal() // in case of local dataframe development
mavenCentral()
}
dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
// multik support
implementation(libs.multik.core)
implementation(libs.multik.default)
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_1_8
freeCompilerArgs.add("-Xjdk-release=8")
}
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_1_8.toString()
targetCompatibility = JavaVersion.VERSION_1_8.toString()
options.release.set(8)
}
@@ -0,0 +1,374 @@
@file:OptIn(ExperimentalTypeInference::class)
package org.jetbrains.kotlinx.dataframe.examples.multik
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.ColumnSelector
import org.jetbrains.kotlinx.dataframe.ColumnsSelector
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.ValueProperty
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.colsOf
import org.jetbrains.kotlinx.dataframe.api.column
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.getColumn
import org.jetbrains.kotlinx.dataframe.api.getColumns
import org.jetbrains.kotlinx.dataframe.api.map
import org.jetbrains.kotlinx.dataframe.api.named
import org.jetbrains.kotlinx.dataframe.api.toColumn
import org.jetbrains.kotlinx.dataframe.api.toColumnGroup
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.columns.BaseColumn
import org.jetbrains.kotlinx.dataframe.columns.ColumnGroup
import org.jetbrains.kotlinx.multik.api.mk
import org.jetbrains.kotlinx.multik.api.ndarray
import org.jetbrains.kotlinx.multik.ndarray.complex.Complex
import org.jetbrains.kotlinx.multik.ndarray.data.D1Array
import org.jetbrains.kotlinx.multik.ndarray.data.D2Array
import org.jetbrains.kotlinx.multik.ndarray.data.D3Array
import org.jetbrains.kotlinx.multik.ndarray.data.MultiArray
import org.jetbrains.kotlinx.multik.ndarray.data.NDArray
import org.jetbrains.kotlinx.multik.ndarray.data.get
import org.jetbrains.kotlinx.multik.ndarray.operations.toList
import org.jetbrains.kotlinx.multik.ndarray.operations.toListD2
import kotlin.experimental.ExperimentalTypeInference
import kotlin.reflect.KClass
import kotlin.reflect.KType
import kotlin.reflect.full.isSubtypeOf
import kotlin.reflect.typeOf
// region 1D
/** Converts a one-dimensional array ([D1Array]) to a [DataColumn] with optional [name]. */
inline fun <reified N> D1Array<N>.convertToColumn(name: String = ""): DataColumn<N> {
// we can simply convert the 1D array to a typed list and create a typed column from it
// by using the reified type parameter, DataFrame needs to do no inference :)
val values = this.toList()
return column<N>(values) named name
}
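// A small round-trip sketch for the 1D case (hypothetical values):
//
//     val arr: D1Array<Int> = mk.ndarray(mk[1, 2, 3])
//     val col: DataColumn<Int> = arr.convertToColumn("numbers")
//     val back: D1Array<Int> = col.convertToMultik()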
/**
* Converts a one-dimensional array ([D1Array]) of type [N] into a DataFrame.
* The resulting DataFrame contains a single column named "value", where each element of the array becomes a row in the DataFrame.
*
* @return a DataFrame where each element of the source array is represented as a row in a column named "value" under the schema [ValueProperty].
*/
@JvmName("convert1dArrayToDataFrame")
inline fun <reified N> D1Array<N>.convertToDataFrame(): DataFrame<ValueProperty<N>> {
// do the conversion like above, but name the column "value"...
val column = this.convertToColumn(ValueProperty<*>::value.name)
// ...so we can cast it to a ValueProperty DataFrame
return dataFrameOf(column).cast<ValueProperty<N>>()
}
/** Converts a [DataColumn] to a one-dimensional array ([D1Array]). */
@JvmName("convertNumberColumnToMultik")
inline fun <reified N> DataColumn<N>.convertToMultik(): D1Array<N> where N : Number, N : Comparable<N> {
// we can convert our column to a typed list again to convert it to a multik array
val values = this.toList()
return mk.ndarray(values)
}
/** Converts a [DataColumn] to a one-dimensional array ([D1Array]). */
@JvmName("convertComplexColumnToMultik")
inline fun <reified N : Complex> DataColumn<N>.convertToMultik(): D1Array<N> {
// we can convert our column to a typed list again to convert it to a multik array
val values = this.toList()
return mk.ndarray(values)
}
/** Converts a [DataColumn] selected by [column] to a one-dimensional array ([D1Array]). */
@JvmName("convertNumberColumnFromDfToMultik")
@OverloadResolutionByLambdaReturnType
inline fun <T, reified N> DataFrame<T>.convertToMultik(
crossinline column: ColumnSelector<T, N>,
): D1Array<N>
where N : Number, N : Comparable<N> {
// use the selector to get the column from this DataFrame and convert it
val col = this.getColumn { column(it) }
return col.convertToMultik()
}
/** Converts a [DataColumn] selected by [column] to a one-dimensional array ([D1Array]). */
@JvmName("convertComplexColumnFromDfToMultik")
@OverloadResolutionByLambdaReturnType
inline fun <T, reified N : Complex> DataFrame<T>.convertToMultik(crossinline column: ColumnSelector<T, N>): D1Array<N> {
// use the selector to get the column from this DataFrame and convert it
val col = this.getColumn { column(it) }
return col.convertToMultik()
}
// endregion
// region 2D
/**
* Converts a two-dimensional array ([D2Array]) to a DataFrame.
* It will contain `shape[0]` rows and `shape[1]` columns.
*
* Column names can be specified using the [columnNameGenerator] lambda.
*
 * The conversion ensures that `multikArray[x][y] == dataframe[x][y]`
*/
@JvmName("convert2dArrayToDataFrame")
inline fun <reified N> D2Array<N>.convertToDataFrame(columnNameGenerator: (Int) -> String = { "col$it" }): AnyFrame {
// Turning the 2D array into a list of typed columns first, no inference needed
val columns: List<DataColumn<N>> = List(shape[1]) { i ->
this[0..<shape[0], i] // get all cells of column i
.toList()
.toColumn<N>(name = columnNameGenerator(i))
}
// and make a DataFrame from it
return columns.toDataFrame()
}
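// For example (hypothetical values), a 2x3 array becomes a DataFrame with
// 2 rows and 3 columns:
//
//     val m: D2Array<Double> = mk.ndarray(mk[mk[1.0, 2.0, 3.0], mk[4.0, 5.0, 6.0]])
//     val df = m.convertToDataFrame { "feature$it" }
//     // df["feature1"][0] == m[0, 1]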
/**
* Converts a [DataFrame] to a two-dimensional array ([D2Array]).
* You'll need to specify which columns to convert using the [columns] selector.
*
 * All selected columns need to be of the same type [N].
*
* @see convertToMultikOf
*/
@JvmName("convertNumberColumnsFromDfToMultik")
@OverloadResolutionByLambdaReturnType
inline fun <T, reified N> DataFrame<T>.convertToMultik(
crossinline columns: ColumnsSelector<T, N>,
): D2Array<N>
where N : Number, N : Comparable<N> {
// use the selector to get the columns from this DataFrame and convert them
val cols = this.getColumns { columns(it) }
return cols.convertToMultik()
}
/**
* Converts a [DataFrame] to a two-dimensional array ([D2Array]).
* You'll need to specify which columns to convert using the [columns] selector.
*
 * All selected columns need to be of the same type [N].
*
* @see convertToMultikOf
*/
@JvmName("convertComplexColumnsFromDfToMultik")
@OverloadResolutionByLambdaReturnType
inline fun <T, reified N : Complex> DataFrame<T>.convertToMultik(
crossinline columns: ColumnsSelector<T, N>,
): D2Array<N> {
// use the selector to get the columns from this DataFrame and convert them
val cols = this.getColumns { columns(it) }
return cols.convertToMultik()
}
/**
* Converts a [DataFrame] to a two-dimensional array ([D2Array]).
 *
 * The function will only succeed if all columns in [this] are of the same type.
*
* @see convertToMultikOf
*/
@JvmName("convertToMultikGuess")
fun AnyFrame.convertToMultik(): D2Array<*> {
val columnTypes = this.columnTypes().distinct()
val type = columnTypes.singleOrNull() ?: error("found multiple column types: $columnTypes")
return when {
type == typeOf<Complex>() -> convertToMultik { colsOf<Complex>() }
type.isSubtypeOf(typeOf<Byte>()) -> convertToMultik { colsOf<Byte>() }
type.isSubtypeOf(typeOf<Short>()) -> convertToMultik { colsOf<Short>() }
type.isSubtypeOf(typeOf<Int>()) -> convertToMultik { colsOf<Int>() }
type.isSubtypeOf(typeOf<Long>()) -> convertToMultik { colsOf<Long>() }
type.isSubtypeOf(typeOf<Float>()) -> convertToMultik { colsOf<Float>() }
type.isSubtypeOf(typeOf<Double>()) -> convertToMultik { colsOf<Double>() }
else -> error("unsupported column type: $type")
}
}
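// A sketch of when the untyped conversion succeeds (hypothetical values;
// both columns share the type Int):
//
//     val df = dataFrameOf("a", "b")(1, 2, 3, 4)
//     val nd: D2Array<*> = df.convertToMultik() // a 2x2 Int array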
/**
* Converts a [DataFrame] to a two-dimensional array ([D2Array]) by taking all
* columns of type [N].
*
* Allows you to write `df.convertToMultikOf<Complex>()`.
*
* @see convertToMultik
*/
@JvmName("convertToMultikOfComplex")
@Suppress("LocalVariableName")
inline fun <reified N : Complex> AnyFrame.convertToMultikOf(
// unused param to avoid overload resolution ambiguity
_klass: KClass<Complex> = Complex::class,
): D2Array<N> =
convertToMultik { colsOf<N>() }
/**
* Converts a [DataFrame] to a two-dimensional array ([D2Array]) by taking all
* columns of type [N].
*
* Allows you to write `df.convertToMultikOf<Int>()`.
*
* @see convertToMultik
*/
@JvmName("convertToMultikOfNumber")
@Suppress("LocalVariableName")
inline fun <reified N> AnyFrame.convertToMultikOf(
// unused param to avoid overload resolution ambiguity
_klass: KClass<Number> = Number::class,
): D2Array<N> where N : Number, N : Comparable<N> = convertToMultik { colsOf<N>() }
/**
* Helper function to convert a list of same-typed [DataColumn]s to a two-dimensional array ([D2Array]).
 * We cannot enforce that all columns have the same type if we only require a [DataFrame].
*/
@Suppress("UNCHECKED_CAST")
@JvmName("convertNumberColumnsToMultik")
inline fun <reified N> List<DataColumn<N>>.convertToMultik(): D2Array<N> where N : Number, N : Comparable<N> {
// to get the list of columns as a list of rows, we need to convert them back to a dataframe first,
// then we can get the values of each row
val rows = this.toDataFrame().map { row -> row.values() as List<N> }
return mk.ndarray(rows)
}
/**
* Helper function to convert a list of same-typed [DataColumn]s to a two-dimensional array ([D2Array]).
 * We cannot enforce that all columns have the same type if we only require a [DataFrame].
*/
@Suppress("UNCHECKED_CAST")
@JvmName("convertComplexColumnsToMultik")
inline fun <reified N : Complex> List<DataColumn<N>>.convertToMultik(): D2Array<N> {
// to get the list of columns as a list of rows, we need to convert them back to a dataframe first,
// then we can get the values of each row
val rows = this.toDataFrame().map { row -> row.values() as List<N> }
return mk.ndarray(rows)
}
// endregion
// region higher dimensions
/**
* Converts a three-dimensional array ([D3Array]) to a DataFrame.
* It will contain `shape[0]` rows and `shape[1]` columns containing lists of size `shape[2]`.
*
* Column names can be specified using the [columnNameGenerator] lambda.
*
 * The conversion ensures that `multikArray[x][y][z] == dataframe[x][y][z]`
*/
inline fun <reified N> D3Array<N>.convertToDataFrameWithLists(
columnNameGenerator: (Int) -> String = { "col$it" },
): AnyFrame {
val columns: List<DataColumn<List<N>>> = List(shape[1]) { y ->
this[0..<shape[0], y, 0..<shape[2]] // get all cells of column y, each is a 2d array of size shape[0] x shape[2]
.toListD2() // get a shape[0]-sized list/column filled with lists of size shape[2]
.toColumn<List<N>>(name = columnNameGenerator(y))
}
return columns.toDataFrame()
}
/**
* Converts a three-dimensional array ([D3Array]) to a DataFrame.
* It will contain `shape[0]` rows and `shape[1]` column groups containing `shape[2]` columns each.
*
* Column names can be specified using the [columnNameGenerator] lambda.
*
* The conversion enforces that `multikArray[x][y][z] == dataframe[x][y][z]`
*/
@JvmName("convert3dArrayToDataFrame")
inline fun <reified N> D3Array<N>.convertToDataFrame(columnNameGenerator: (Int) -> String = { "col$it" }): AnyFrame {
val columns: List<ColumnGroup<*>> = List(shape[1]) { y ->
this[0..<shape[0], y, 0..<shape[2]] // get all cells of column y, each is a 2d array of size shape[0] x shape[2]
.transpose(1, 0) // flip, so we get shape[2] x shape[0]
.toListD2() // get a shape[2]-sized list filled with lists of size shape[0]
.mapIndexed { z, list ->
list.toColumn<N>(name = columnNameGenerator(z))
} // we get shape[2] columns inside each column group
.toColumnGroup(name = columnNameGenerator(y))
}
return columns.toDataFrame()
}
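The transpose step in the column-group conversion above is easier to see on plain nested lists. A minimal sketch of the same flip (the `transpose2d` helper is hypothetical, not part of Multik or DataFrame):

```kotlin
// Transposes a rectangular List<List<T>>: result[y][x] == input[x][y].
fun <T> transpose2d(rows: List<List<T>>): List<List<T>> {
    if (rows.isEmpty()) return emptyList()
    return List(rows[0].size) { y -> List(rows.size) { x -> rows[x][y] } }
}

fun main() {
    // one "column slice" of size shape[0] x shape[2] = 2 x 3
    val slice = listOf(listOf(1, 2, 3), listOf(4, 5, 6))
    // flipped to shape[2] x shape[0] = 3 x 2, ready to become 3 columns of 2 values each
    println(transpose2d(slice)) // [[1, 4], [2, 5], [3, 6]]
}
```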
/**
* Exploratory recursive function to convert a [MultiArray] of any number of dimensions
* to a `List<List<...>>` of the same number of dimensions.
*/
fun <T> MultiArray<T, *>.toListDn(): List<*> {
// Recursive helper function to handle traversal across dimensions
fun toListRecursive(indices: IntArray): List<*> {
// If we are at the last dimension (1D case)
if (indices.size == shape.lastIndex) {
return List(shape[indices.size]) { i ->
this[intArrayOf(*indices, i)] // Collect values for this dimension
}
}
// For higher dimensions, recursively process smaller dimensions
return List(shape[indices.size]) { i ->
toListRecursive(indices + i) // Add `i` to the current index array
}
}
return toListRecursive(intArrayOf())
}
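The same recursion can be sketched self-contained on a flat row-major array paired with a shape; the index arithmetic below is an illustrative assumption, not Multik's internal layout:

```kotlin
// Nests a flat row-major IntArray into List<List<...>> according to `shape`.
fun nest(flat: IntArray, shape: IntArray): List<*> {
    if (shape.size == 1) return flat.toList()
    // stride of the first dimension = product of the remaining dimensions
    val stride = shape.drop(1).fold(1) { acc, d -> acc * d }
    return List(shape[0]) { i ->
        // each slice of the first dimension recurses with one dimension fewer
        nest(flat.copyOfRange(i * stride, (i + 1) * stride), shape.drop(1).toIntArray())
    }
}

fun main() {
    val flat = intArrayOf(0, 1, 2, 3, 4, 5)
    println(nest(flat, intArrayOf(2, 3))) // [[0, 1, 2], [3, 4, 5]]
}
```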
/**
* Converts a multidimensional array ([NDArray]) to a DataFrame.
* Inspired by [toListDn].
*
* For a single-dimensional array, it will call [D1Array.convertToDataFrame].
*
* Column names can be specified using the [columnNameGenerator] lambda.
*
* The conversion enforces that `multikArray[a][b][c][d]... == dataframe[a][b][c][d]...`
*/
@Suppress("UNCHECKED_CAST")
inline fun <reified N> NDArray<N, *>.convertToDataFrameNestedGroups(
noinline columnNameGenerator: (Int) -> String = { "col$it" },
): AnyFrame {
if (shape.size == 1) return (this as D1Array<N>).convertToDataFrame()
// push the first dimension to the end, because it represents the rows of the DataFrame,
// and rows are what the first [] accesses
return transpose(*(1..<dim.d).toList().toIntArray(), 0)
.convertToDataFrameNestedGroupsRecursive(
indices = intArrayOf(),
type = typeOf<N>(), // cannot inline a recursive function, so pass the type explicitly
columnNameGenerator = columnNameGenerator,
).let {
// we could just cast this to a DataFrame<*>, since a ColumnGroup<*> is a DataFrame
// however, this can sometimes cause issues where instance checks are done at runtime
// this converts it to an actual DataFrame instance
dataFrameOf((it as ColumnGroup<*>).columns())
}
}
/**
* Recursive helper function to handle traversal across dimensions. Do not call directly,
* use [convertToDataFrameNestedGroups] instead.
*/
@PublishedApi
internal fun NDArray<*, *>.convertToDataFrameNestedGroupsRecursive(
indices: IntArray,
type: KType,
columnNameGenerator: (Int) -> String,
): BaseColumn<*> {
// If we are at the last dimension (1D case)
if (indices.size == shape.lastIndex) {
return List(shape[indices.size]) { i ->
this[intArrayOf(*indices, i)] // Collect values for this dimension
}.let {
DataColumn.createByType(name = "", values = it, type = type)
}
}
// For higher dimensions, recursively process smaller dimensions
return List(shape[indices.size]) { i ->
convertToDataFrameNestedGroupsRecursive(
indices = indices + i, // Add `i` to the current index array
type = type,
columnNameGenerator = columnNameGenerator,
).rename(columnNameGenerator(i))
}.toColumnGroup("")
}
// endregion
@@ -0,0 +1,23 @@
package org.jetbrains.kotlinx.dataframe.examples.multik
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.multik.api.io.readNPY
import org.jetbrains.kotlinx.multik.api.mk
import org.jetbrains.kotlinx.multik.ndarray.data.D1
import java.io.File
/**
* Multik can read/write data from NPY/NPZ files.
* We can use this from DataFrame too!
*
* We use compatibilityLayer.kt for the conversions; check it out for the implementation details!
*/
fun main() {
val npyFilename = "a1d.npy"
val npyFile = File(object {}.javaClass.classLoader.getResource(npyFilename)!!.toURI())
val mk1 = mk.readNPY<Long, D1>(npyFile)
val df1 = mk1.convertToDataFrame()
df1.print(borders = true, columnTypes = true)
}
@@ -0,0 +1,99 @@
package org.jetbrains.kotlinx.dataframe.examples.multik
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.colsOf
import org.jetbrains.kotlinx.dataframe.api.describe
import org.jetbrains.kotlinx.dataframe.api.mean
import org.jetbrains.kotlinx.dataframe.api.meanFor
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.value
import org.jetbrains.kotlinx.multik.api.mk
import org.jetbrains.kotlinx.multik.api.rand
import org.jetbrains.kotlinx.multik.ndarray.data.get
/**
* Let's explore some ways we can combine Multik with Kotlin DataFrame.
*
* We will use compatibilityLayer.kt for the conversions.
* Take a look at that file for the implementation details!
*/
fun main() {
oneDimension()
twoDimensions()
higherDimensions()
}
fun oneDimension() {
// we can convert a 1D ndarray to a column of a DataFrame:
val mk1 = mk.rand<Double>(50)
val col1 by mk1.convertToColumn()
println(col1)
// or straight to a DataFrame. It will become the `value` column.
val df1 = mk1.convertToDataFrame()
println(df1)
// this allows us to perform any DF operation:
println(df1.mean { value })
df1.describe().print(borders = true)
// we can convert back to Multik:
val mk2 = df1.convertToMultik { value }
// or
df1.value.convertToMultik()
println(mk2)
}
fun twoDimensions() {
// we can also convert a 2D ndarray to a DataFrame
// This conversion will create columns like "col0", "col1", etc.
// (careful, when the number of columns is too large, this can cause problems)
// but will allow for access similar to Multik's
// aka: `multikArray[x][y] == dataframe[x][y]`
val mk1 = mk.rand<Int>(5, 10)
println(mk1)
val df = mk1.convertToDataFrame()
df.print()
// this allows us to perform any DF operation:
val means = df.meanFor { ("col0".."col9").cast<Int>() }
means.print()
// we can convert back to Multik in multiple ways.
// Multik can only store one type of data, so we need to specify the type or select
// only the columns we want:
val mk2 = df.convertToMultik { colsOf<Int>() }
// or
df.convertToMultikOf<Int>()
// or if all columns are of the same type:
df.convertToMultik()
println(mk2)
}
fun higherDimensions() {
// Multik can store higher dimensions as well
// however, to convert it to a DataFrame, we need to specify how the conversion should be done
// for instance, for 3d, we could store a list in each cell of the DF to represent the extra dimension:
val mk1 = mk.rand<Int>(5, 4, 3)
println(mk1)
val df1 = mk1.convertToDataFrameWithLists()
df1.print()
// Alternatively, this could be solved using column groups.
// This subdivides each column into more columns, while ensuring `multikArray[x][y][z] == dataframe[x][y][z]`
val df2 = mk1.convertToDataFrame()
df2.print()
// For even higher dimensions, we can keep adding more column groups
val mk2 = mk.rand<Int>(5, 4, 3, 2)
val df3 = mk2.convertToDataFrameNestedGroups()
df3.print()
// ...or use nested DataFrames (in FrameColumns)
// (for instance, a 4D matrix could be stored in a 2D DataFrame where each cell is another DataFrame)
// but, we'll leave that as an exercise for the reader :)
}
@@ -0,0 +1,115 @@
package org.jetbrains.kotlinx.dataframe.examples.multik
import kotlinx.datetime.LocalDate
import kotlinx.datetime.Month
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.append
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.mapToFrame
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.single
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.multik.api.mk
import org.jetbrains.kotlinx.multik.api.rand
import org.jetbrains.kotlinx.multik.ndarray.data.D3Array
import org.jetbrains.kotlinx.multik.ndarray.data.D4Array
/**
* DataFrames can store anything inside, including Multik ndarrays.
* This can be useful for storing matrices for easier access later or to simply organize data read from other files.
* For example, MRI data is often stored as 3D arrays and sometimes even 4D arrays.
*/
fun main() {
// imaginary list of patient data
@Suppress("ktlint:standard:argument-list-wrapping")
val metadata = listOf(
MriMetadata(10012L, 25, "Healthy", LocalDate(2023, 1, 1)),
MriMetadata(10013L, 45, "Tuberculosis", LocalDate(2023, 2, 15)),
MriMetadata(10014L, 32, "Healthy", LocalDate(2023, 3, 22)),
MriMetadata(10015L, 58, "Pneumonia", LocalDate(2023, 4, 8)),
MriMetadata(10016L, 29, "Tuberculosis", LocalDate(2023, 5, 30)),
MriMetadata(10017L, 42, "Healthy", LocalDate(2023, 6, 15)),
MriMetadata(10018L, 37, "Healthy", LocalDate(2023, 7, 1)),
MriMetadata(10019L, 55, "Healthy", LocalDate(2023, 8, 15)),
MriMetadata(10020L, 28, "Healthy", LocalDate(2023, 9, 1)),
MriMetadata(10021L, 44, "Healthy", LocalDate(2023, 10, 15)),
MriMetadata(10022L, 31, "Healthy", LocalDate(2023, 11, 1)),
).toDataFrame()
// "reading" the results from "files"
val results = metadata.mapToFrame {
+patientId
+age
+diagnosis
+scanDate
"t1WeightedMri" from { readT1WeightedMri(patientId) }
"fMriBoldSeries" from { readFMRiBoldSeries(patientId) }
}.cast<MriResults>(verify = true)
results.print(borders = true)
// now when we want to check and visualize the T1-weighted MRI scan
// for that one healthy patient in July, we can do:
val scan = results
.single { scanDate.month == Month.JULY && diagnosis == "Healthy" }
.t1WeightedMri
// easy :)
visualize(scan)
}
@DataSchema
data class MriMetadata(
/** Unique patient ID. */
val patientId: Long,
/** Patient age. */
val age: Int,
/** Clinical diagnosis (e.g. "Healthy", "Tuberculosis") */
val diagnosis: String,
/** Date of the scan */
val scanDate: LocalDate,
)
@DataSchema
data class MriResults(
/** Unique patient ID. */
val patientId: Long,
/** Patient age. */
val age: Int,
/** Clinical diagnosis (e.g. "Healthy", "Tuberculosis") */
val diagnosis: String,
/** Date of the scan */
val scanDate: LocalDate,
/**
* T1-weighted anatomical MRI scan.
*
* Dimensions: (256 x 256 x 180)
* - 256 width x 256 height
* - 180 slices
*/
val t1WeightedMri: D3Array<Float>,
/**
* Blood oxygenation level-dependent (BOLD) time series from an fMRI scan.
*
* Dimensions: (64 x 64 x 30 x 200)
* - 64 width x 64 height
* - 30 slices
* - 200 timepoints
*/
val fMriBoldSeries: D4Array<Float>,
)
fun readT1WeightedMri(id: Long): D3Array<Float> {
// In practice this would, of course, read the actual data, but for this example we just return a dummy array
return mk.rand(256, 256, 180)
}
fun readFMRiBoldSeries(id: Long): D4Array<Float> {
// In practice this would, of course, read the actual data, but for this example we just return a dummy array
return mk.rand(64, 64, 30, 200)
}
fun visualize(scan: D3Array<Float>) {
// This would then actually visualize the scan
}
@@ -0,0 +1,77 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
plugins {
application
kotlin("jvm")
// uses the 'old' Gradle plugin instead of the compiler plugin for now
id("org.jetbrains.kotlinx.dataframe")
// only mandatory if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenLocal() // in case of local dataframe development
mavenCentral()
}
dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
// (kotlin) spark support
implementation(libs.kotlin.spark)
compileOnly(libs.spark)
implementation(libs.log4j.core)
implementation(libs.log4j.api)
}
/**
* Runs the kotlinSpark/typedDataset example with Java 11.
*/
val runKotlinSparkTypedDataset by tasks.registering(JavaExec::class) {
classpath = sourceSets["main"].runtimeClasspath
javaLauncher = javaToolchains.launcherFor { languageVersion = JavaLanguageVersion.of(11) }
mainClass = "org.jetbrains.kotlinx.dataframe.examples.kotlinSpark.TypedDatasetKt"
}
/**
* Runs the kotlinSpark/untypedDataset example with Java 11.
*/
val runKotlinSparkUntypedDataset by tasks.registering(JavaExec::class) {
classpath = sourceSets["main"].runtimeClasspath
javaLauncher = javaToolchains.launcherFor { languageVersion = JavaLanguageVersion.of(11) }
mainClass = "org.jetbrains.kotlinx.dataframe.examples.kotlinSpark.UntypedDatasetKt"
}
/**
* Runs the spark/typedDataset example with Java 11.
*/
val runSparkTypedDataset by tasks.registering(JavaExec::class) {
classpath = sourceSets["main"].runtimeClasspath
javaLauncher = javaToolchains.launcherFor { languageVersion = JavaLanguageVersion.of(11) }
mainClass = "org.jetbrains.kotlinx.dataframe.examples.spark.TypedDatasetKt"
}
/**
* Runs the spark/untypedDataset example with Java 11.
*/
val runSparkUntypedDataset by tasks.registering(JavaExec::class) {
classpath = sourceSets["main"].runtimeClasspath
javaLauncher = javaToolchains.launcherFor { languageVersion = JavaLanguageVersion.of(11) }
mainClass = "org.jetbrains.kotlinx.dataframe.examples.spark.UntypedDatasetKt"
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_11
freeCompilerArgs.add("-Xjdk-release=11")
}
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_11.toString()
targetCompatibility = JavaVersion.VERSION_11.toString()
options.release.set(11)
}
@@ -0,0 +1,8 @@
@file:Suppress("ktlint:standard:no-empty-file")
package org.jetbrains.kotlinx.dataframe.examples.kotlinSpark
/*
* See ../spark/compatibilityLayer.kt for the implementation.
* It's the same with and without the Kotlin Spark API.
*/
@@ -0,0 +1,78 @@
@file:Suppress("ktlint:standard:function-signature")
package org.jetbrains.kotlinx.dataframe.examples.kotlinSpark
import org.apache.spark.sql.Dataset
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.aggregate
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.max
import org.jetbrains.kotlinx.dataframe.api.mean
import org.jetbrains.kotlinx.dataframe.api.min
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.api.std
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.api.toList
import org.jetbrains.kotlinx.spark.api.withSpark
/**
* With the Kotlin Spark API, normal Kotlin data classes are supported,
* meaning we can reuse the same class for Spark and DataFrame!
*
* Also, since we use an actual class to define the schema, we need no type conversion!
*
* See [Person] and [Name] for an example.
*
* NOTE: You will likely need to run this function with Java 8 or 11 for it to work correctly.
* Use the `runKotlinSparkTypedDataset` Gradle task to do so.
*/
fun main() = withSpark {
// Creating a Spark Dataset. Usually, this is loaded from some server or database.
val rawDataset: Dataset<Person> = listOf(
Person(Name("Alice", "Cooper"), 15, "London", 54, true),
Person(Name("Bob", "Dylan"), 45, "Dubai", 87, true),
Person(Name("Charlie", "Daniels"), 20, "Moscow", null, false),
Person(Name("Charlie", "Chaplin"), 40, "Milan", null, true),
Person(Name("Bob", "Marley"), 30, "Tokyo", 68, true),
Person(Name("Alice", "Wolf"), 20, null, 55, false),
Person(Name("Charlie", "Byrd"), 30, "Moscow", 90, true),
).toDS()
// we can perform large operations in Spark.
// DataFrames are in-memory structures, so this is a good place to limit the number of rows if you don't have the RAM ;)
val dataset = rawDataset.filter { it.age > 17 }
// and convert it to DataFrame via a typed List
val dataframe = dataset.collectAsList().toDataFrame()
dataframe.schema().print()
dataframe.print(columnTypes = true, borders = true)
// now we can use DataFrame-specific functions
val ageStats = dataframe
.groupBy { city }.aggregate {
mean { age } into "meanAge"
std { age } into "stdAge"
min { age } into "minAge"
max { age } into "maxAge"
}
ageStats.print(columnTypes = true, borders = true)
// and when we want to convert a DataFrame back to Spark, we can do the same trick via a typed List
val sparkDatasetAgain = dataframe.toList().toDS()
sparkDatasetAgain.printSchema()
sparkDatasetAgain.show()
}
@DataSchema
data class Name(val firstName: String, val lastName: String)
@DataSchema
data class Person(
val name: Name,
val age: Int,
val city: String?,
val weight: Int?,
val isHappy: Boolean,
)
@@ -0,0 +1,74 @@
@file:Suppress("ktlint:standard:function-signature")
package org.jetbrains.kotlinx.dataframe.examples.kotlinSpark
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.jetbrains.kotlinx.dataframe.api.aggregate
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.max
import org.jetbrains.kotlinx.dataframe.api.mean
import org.jetbrains.kotlinx.dataframe.api.min
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.api.std
import org.jetbrains.kotlinx.dataframe.examples.spark.convertToDataFrame
import org.jetbrains.kotlinx.dataframe.examples.spark.convertToDataFrameByInference
import org.jetbrains.kotlinx.dataframe.examples.spark.convertToSpark
import org.jetbrains.kotlinx.spark.api.col
import org.jetbrains.kotlinx.spark.api.gt
import org.jetbrains.kotlinx.spark.api.withSpark
/**
* Since we don't know the schema at compile time this time, we need to do
* some schema mapping in between Spark and DataFrame.
*
* We will use spark/compatibilityLayer.kt to do this.
* Take a look at that file for the implementation details!
*
* NOTE: You will likely need to run this function with Java 8 or 11 for it to work correctly.
* Use the `runKotlinSparkUntypedDataset` Gradle task to do so.
*/
fun main() = withSpark {
// Creating a Spark Dataframe (untyped Dataset). Usually, this is loaded from some server or database.
val rawDataset: Dataset<Row> = listOf(
Person(Name("Alice", "Cooper"), 15, "London", 54, true),
Person(Name("Bob", "Dylan"), 45, "Dubai", 87, true),
Person(Name("Charlie", "Daniels"), 20, "Moscow", null, false),
Person(Name("Charlie", "Chaplin"), 40, "Milan", null, true),
Person(Name("Bob", "Marley"), 30, "Tokyo", 68, true),
Person(Name("Alice", "Wolf"), 20, null, 55, false),
Person(Name("Charlie", "Byrd"), 30, "Moscow", 90, true),
).toDF()
// we can perform large operations in Spark.
// DataFrames are in-memory structures, so this is a good place to limit the number of rows if you don't have the RAM ;)
val dataset = rawDataset.filter(col("age") gt 17)
// Using inference
val df1 = dataset.convertToDataFrameByInference()
df1.schema().print()
df1.print(columnTypes = true, borders = true)
// Using full schema mapping
val df2 = dataset.convertToDataFrame()
df2.schema().print()
df2.print(columnTypes = true, borders = true)
// now we can use DataFrame-specific functions
val ageStats = df1
.groupBy("city").aggregate {
mean("age") into "meanAge"
std("age") into "stdAge"
min("age") into "minAge"
max("age") into "maxAge"
}
ageStats.print(columnTypes = true, borders = true)
// and when we want to convert a DataFrame back to Spark, we will use the `convertToSpark()` extension function
// This performs the necessary schema mapping under the hood.
val sparkDataset = df2.convertToSpark(spark, sc)
sparkDataset.printSchema()
sparkDataset.show()
}
@@ -0,0 +1,330 @@
package org.jetbrains.kotlinx.dataframe.examples.spark
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.RowFactory
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.ArrayType
import org.apache.spark.sql.types.DataType
import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.Decimal
import org.apache.spark.sql.types.DecimalType
import org.apache.spark.sql.types.MapType
import org.apache.spark.sql.types.StructType
import org.apache.spark.unsafe.types.CalendarInterval
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.api.rows
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.columns.ColumnGroup
import org.jetbrains.kotlinx.dataframe.columns.TypeSuggestion
import org.jetbrains.kotlinx.dataframe.schema.ColumnSchema
import org.jetbrains.kotlinx.dataframe.schema.DataFrameSchema
import java.math.BigDecimal
import java.math.BigInteger
import java.sql.Date
import java.sql.Timestamp
import java.time.Instant
import java.time.LocalDate
import kotlin.reflect.KType
import kotlin.reflect.KTypeProjection
import kotlin.reflect.full.createType
import kotlin.reflect.full.isSubtypeOf
import kotlin.reflect.full.withNullability
import kotlin.reflect.typeOf
// region Spark to DataFrame
/**
* Converts an untyped Spark [Dataset] (Dataframe) to a Kotlin [DataFrame].
* [StructTypes][StructType] are converted to [ColumnGroups][ColumnGroup].
*
* DataFrame supports type inference to do the conversion automatically.
* This is usually fine for smaller datasets, but for larger ones, a type map might be a good idea.
* See [convertToDataFrame] for more information.
*/
fun Dataset<Row>.convertToDataFrameByInference(
schema: StructType = schema(),
prefix: List<String> = emptyList(),
): AnyFrame {
val columns = schema.fields().map { field ->
val name = field.name()
when (val dataType = field.dataType()) {
is StructType ->
// a column group can be easily created from a dataframe and a name
DataColumn.createColumnGroup(
name = name,
df = this.convertToDataFrameByInference(dataType, prefix + name),
)
else ->
// we can use DataFrame type inference to create a column with the correct type
// from Spark we use `select()` to select a single column
// and `collectAsList()` to get all the values in a list of single-celled rows
DataColumn.createByInference(
name = name,
values = this.select((prefix + name).joinToString("."))
.collectAsList()
.map { it[0] },
suggestedType = TypeSuggestion.Infer,
// Spark provides nullability :) you can leave this out if you want this to be inferred too
nullable = field.nullable(),
)
}
}
return columns.toDataFrame()
}
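The per-column extraction above ("select one column, collect its cells") can be sketched without Spark, using a list of maps as a stand-in for `Dataset<Row>` (the `collectColumn` helper is made up for illustration):

```kotlin
// Rows as maps: each map is one row, keyed by column name.
fun collectColumn(rows: List<Map<String, Any?>>, name: String): List<Any?> =
    rows.map { it[name] } // one cell per row, like select(name).collectAsList().map { it[0] }

fun main() {
    val rows = listOf(
        mapOf("name" to "Alice", "age" to 15),
        mapOf("name" to "Bob", "age" to 45),
    )
    println(collectColumn(rows, "age")) // [15, 45]
}
```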
/**
* Converts an untyped Spark [Dataset] (Dataframe) to a Kotlin [DataFrame].
* [StructTypes][StructType] are converted to [ColumnGroups][ColumnGroup].
*
* This version uses a [type-map][DataType.convertToDataFrame] to convert the schemas with a fallback to inference.
* For smaller datasets, inference is usually fine too.
* See [convertToDataFrameByInference] for more information.
*/
fun Dataset<Row>.convertToDataFrame(schema: StructType = schema(), prefix: List<String> = emptyList()): AnyFrame {
val columns = schema.fields().map { field ->
val name = field.name()
when (val dataType = field.dataType()) {
is StructType ->
// a column group can be easily created from a dataframe and a name
DataColumn.createColumnGroup(
name = name,
df = convertToDataFrame(dataType, prefix + name),
)
else ->
// we create a column with the correct type using our type-map with fallback to inference
// from Spark we use `select()` to select a single column
// and `collectAsList()` to get all the values in a list of single-celled rows
DataColumn.createByInference(
name = name,
values = select((prefix + name).joinToString("."))
.collectAsList()
.map { it[0] },
suggestedType =
dataType.convertToDataFrame()
?.let(TypeSuggestion::Use)
?: TypeSuggestion.Infer, // fallback to inference if needed
nullable = field.nullable(),
)
}
}
return columns.toDataFrame()
}
/**
* Returns the corresponding [Kotlin type][KType] for a given Spark [DataType].
*
* This list may be incomplete, but it can at least give you a good start.
*
* @return The [KType] that corresponds to the Spark [DataType], or null if no matching [KType] is found.
*/
fun DataType.convertToDataFrame(): KType? =
when {
this == DataTypes.ByteType -> typeOf<Byte>()
this == DataTypes.ShortType -> typeOf<Short>()
this == DataTypes.IntegerType -> typeOf<Int>()
this == DataTypes.LongType -> typeOf<Long>()
this == DataTypes.BooleanType -> typeOf<Boolean>()
this == DataTypes.FloatType -> typeOf<Float>()
this == DataTypes.DoubleType -> typeOf<Double>()
this == DataTypes.StringType -> typeOf<String>()
this == DataTypes.DateType -> typeOf<Date>()
this == DataTypes.TimestampType -> typeOf<Timestamp>()
this is DecimalType -> typeOf<Decimal>()
this == DataTypes.CalendarIntervalType -> typeOf<CalendarInterval>()
this == DataTypes.NullType -> nullableNothingType
this == DataTypes.BinaryType -> typeOf<ByteArray>()
this is ArrayType -> {
when (elementType()) {
DataTypes.ShortType -> typeOf<ShortArray>()
DataTypes.IntegerType -> typeOf<IntArray>()
DataTypes.LongType -> typeOf<LongArray>()
DataTypes.FloatType -> typeOf<FloatArray>()
DataTypes.DoubleType -> typeOf<DoubleArray>()
DataTypes.BooleanType -> typeOf<BooleanArray>()
else -> null
}
}
this is MapType -> {
val key = keyType().convertToDataFrame() ?: return null
val value = valueType().convertToDataFrame() ?: return null
Map::class.createType(
listOf(
KTypeProjection.invariant(key),
KTypeProjection.invariant(value.withNullability(valueContainsNull())),
),
)
}
else -> null
}
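The map-with-nullable-fallback pattern above is not Spark-specific. A minimal sketch of the same shape, using string type names as stand-ins for Spark's `DataType`s (the names are purely illustrative):

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Maps a few "external" type names to Kotlin types; returns null when unknown,
// so callers can fall back to inference (like TypeSuggestion.Infer above).
fun mapExternalType(name: String): KType? =
    when (name) {
        "integer" -> typeOf<Int>()
        "long" -> typeOf<Long>()
        "string" -> typeOf<String>()
        else -> null // unknown: let the caller infer
    }

fun main() {
    println(mapExternalType("long"))    // kotlin.Long
    println(mapExternalType("variant")) // null
}
```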
// endregion
// region DataFrame to Spark
/**
* Converts the [DataFrame] to a Spark [Dataset] of [Rows][Row] using the provided [SparkSession] and [JavaSparkContext].
*
* Spark needs both the data and the schema to be converted to create a correct [Dataset],
* so we need to map our types somehow.
*
* @param spark The [SparkSession] to use for creating the Spark [Dataset].
* @param sc The [JavaSparkContext] object to use for converting the [DataFrame] to [RDD][JavaRDD].
* @return A [Dataset] of [Rows][Row] representing the converted DataFrame.
*/
fun DataFrame<*>.convertToSpark(spark: SparkSession, sc: JavaSparkContext): Dataset<Row> {
// Convert each row to spark rows
val rows = sc.parallelize(this.rows().map { it.convertToSpark() })
// convert the data schema to a spark StructType
val schema = this.schema().convertToSpark()
return spark.createDataFrame(rows, schema)
}
/**
* Converts a [DataRow] to a Spark [Row] object.
*
* @return The converted Spark [Row].
*/
fun DataRow<*>.convertToSpark(): Row =
RowFactory.create(
*values().map {
when (it) {
// a row can be nested inside another row if it's a column group
is DataRow<*> -> it.convertToSpark()
is DataFrame<*> -> error("nested dataframes are not supported")
else -> it
}
}.toTypedArray(),
)
/**
* Converts a [DataFrameSchema] to a Spark [StructType].
*
* @return The converted Spark [StructType].
*/
fun DataFrameSchema.convertToSpark(): StructType =
DataTypes.createStructType(
this.columns.map { (name, schema) ->
DataTypes.createStructField(name, schema.convertToSpark(), schema.nullable)
},
)
/**
* Converts a [ColumnSchema] object to Spark [DataType].
*
* @return The Spark [DataType] corresponding to the given [ColumnSchema] object.
* @throws IllegalArgumentException if the column type or kind is unknown.
*/
fun ColumnSchema.convertToSpark(): DataType =
when (this) {
is ColumnSchema.Value -> type.convertToSpark() ?: error("unknown data type: $type")
is ColumnSchema.Group -> schema.convertToSpark()
is ColumnSchema.Frame -> error("nested dataframes are not supported")
else -> error("unknown column kind: $this")
}
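The recursion over value and group schemas above follows a common sealed-hierarchy pattern. The same shape in miniature, with a hypothetical mini-schema standing in for `ColumnSchema` (none of these names exist in the DataFrame API):

```kotlin
// A tiny stand-in for ColumnSchema: values are leaves, groups recurse.
sealed interface MiniSchema
data class Value(val typeName: String) : MiniSchema
data class Group(val children: Map<String, MiniSchema>) : MiniSchema

// Renders the schema as a nested struct description, recursing into groups.
fun MiniSchema.render(): String =
    when (this) {
        is Value -> typeName
        is Group -> children.entries.joinToString(", ", "struct<", ">") { (name, child) ->
            "$name: ${child.render()}"
        }
    }

fun main() {
    val schema = Group(mapOf("id" to Value("long"), "name" to Group(mapOf("first" to Value("string")))))
    println(schema.render()) // struct<id: long, name: struct<first: string>>
}
```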
/**
* Returns the corresponding Spark [DataType] for a given [Kotlin type][KType].
*
* This list may be incomplete, but it can at least give you a good start.
*
* @return The Spark [DataType] that corresponds to the [Kotlin type][KType], or null if no matching [DataType] is found.
*/
fun KType.convertToSpark(): DataType? =
when {
isSubtypeOf(typeOf<Byte?>()) -> DataTypes.ByteType
isSubtypeOf(typeOf<Short?>()) -> DataTypes.ShortType
isSubtypeOf(typeOf<Int?>()) -> DataTypes.IntegerType
isSubtypeOf(typeOf<Long?>()) -> DataTypes.LongType
isSubtypeOf(typeOf<Boolean?>()) -> DataTypes.BooleanType
isSubtypeOf(typeOf<Float?>()) -> DataTypes.FloatType
isSubtypeOf(typeOf<Double?>()) -> DataTypes.DoubleType
isSubtypeOf(typeOf<String?>()) -> DataTypes.StringType
isSubtypeOf(typeOf<LocalDate?>()) -> DataTypes.DateType
isSubtypeOf(typeOf<Date?>()) -> DataTypes.DateType
isSubtypeOf(typeOf<Timestamp?>()) -> DataTypes.TimestampType
isSubtypeOf(typeOf<Instant?>()) -> DataTypes.TimestampType
isSubtypeOf(typeOf<Decimal?>()) -> DecimalType.SYSTEM_DEFAULT()
isSubtypeOf(typeOf<BigDecimal?>()) -> DecimalType.SYSTEM_DEFAULT()
isSubtypeOf(typeOf<BigInteger?>()) -> DecimalType.SYSTEM_DEFAULT()
isSubtypeOf(typeOf<CalendarInterval?>()) -> DataTypes.CalendarIntervalType
isSubtypeOf(nullableNothingType) -> DataTypes.NullType
isSubtypeOf(typeOf<ByteArray?>()) -> DataTypes.BinaryType
isSubtypeOf(typeOf<ShortArray?>()) -> DataTypes.createArrayType(DataTypes.ShortType, false)
isSubtypeOf(typeOf<IntArray?>()) -> DataTypes.createArrayType(DataTypes.IntegerType, false)
isSubtypeOf(typeOf<LongArray?>()) -> DataTypes.createArrayType(DataTypes.LongType, false)
isSubtypeOf(typeOf<FloatArray?>()) -> DataTypes.createArrayType(DataTypes.FloatType, false)
isSubtypeOf(typeOf<DoubleArray?>()) -> DataTypes.createArrayType(DataTypes.DoubleType, false)
isSubtypeOf(typeOf<BooleanArray?>()) -> DataTypes.createArrayType(DataTypes.BooleanType, false)
isSubtypeOf(typeOf<Array<*>>()) ->
error("non-primitive arrays are not supported for now, you can add it yourself")
isSubtypeOf(typeOf<List<*>>()) -> error("lists are not supported for now, you can add it yourself")
isSubtypeOf(typeOf<Set<*>>()) -> error("sets are not supported for now, you can add it yourself")
classifier == Map::class -> {
val (key, value) = arguments
DataTypes.createMapType(
key.type?.convertToSpark(),
value.type?.convertToSpark(),
value.type?.isMarkedNullable ?: true,
)
}
else -> null
}
private val nullableNothingType: KType = typeOf<List<Nothing?>>().arguments.first().type!!
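The `nullableNothingType` extraction above works around the fact that `Nothing` cannot be used directly as a reified type argument; instead, the type is pulled out of `List<Nothing?>`. A quick stdlib-only demonstration:

```kotlin
import kotlin.reflect.typeOf

fun main() {
    // `typeOf<Nothing?>()` is rejected by the compiler, so extract the type
    // from a type argument of an enclosing generic type instead:
    val nullableNothing = typeOf<List<Nothing?>>().arguments.first().type!!
    println(nullableNothing.isMarkedNullable) // true
}
```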
// endregion
@@ -0,0 +1,105 @@
@file:Suppress("ktlint:standard:function-signature")
package org.jetbrains.kotlinx.dataframe.examples.spark
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.SparkSession
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.aggregate
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.max
import org.jetbrains.kotlinx.dataframe.api.mean
import org.jetbrains.kotlinx.dataframe.api.min
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.api.std
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.api.toList
import java.io.Serializable
/**
* For Spark, Kotlin data classes are supported if we:
* - Add [@JvmOverloads][JvmOverloads] to the constructor
* - Make all constructor parameters mutable (`var`) with default values
* - Make them [Serializable]
*
* But by adding [@DataSchema][DataSchema] we can reuse the same class for Spark and DataFrame!
*
* See [Person] and [Name] for an example.
*
* Also, since we use an actual class to define the schema, we need no type conversion!
*
* NOTE: You will likely need to run this function with Java 8 or 11 for it to work correctly.
* Use the `runSparkTypedDataset` Gradle task to do so.
*/
fun main() {
val spark = SparkSession.builder()
.master(SparkConf().get("spark.master", "local[*]"))
.appName("Kotlin Spark Sample")
.getOrCreate()
val sc = JavaSparkContext(spark.sparkContext())
// Creating a Spark Dataset. Usually, this is loaded from some server or database.
val rawDataset: Dataset<Person> = spark.createDataset(
listOf(
Person(Name("Alice", "Cooper"), 15, "London", 54, true),
Person(Name("Bob", "Dylan"), 45, "Dubai", 87, true),
Person(Name("Charlie", "Daniels"), 20, "Moscow", null, false),
Person(Name("Charlie", "Chaplin"), 40, "Milan", null, true),
Person(Name("Bob", "Marley"), 30, "Tokyo", 68, true),
Person(Name("Alice", "Wolf"), 20, null, 55, false),
Person(Name("Charlie", "Byrd"), 30, "Moscow", 90, true),
),
beanEncoderOf(),
)
// we can perform large operations in Spark.
// DataFrames are in-memory structures, so this is a good place to limit the number of rows if you don't have the RAM ;)
val dataset = rawDataset.filter { it.age > 17 }
// and convert it to DataFrame via a typed List
val dataframe = dataset.collectAsList().toDataFrame()
dataframe.schema().print()
dataframe.print(columnTypes = true, borders = true)
// now we can use DataFrame-specific functions
val ageStats = dataframe
.groupBy { city }.aggregate {
mean { age } into "meanAge"
std { age } into "stdAge"
min { age } into "minAge"
max { age } into "maxAge"
}
ageStats.print(columnTypes = true, borders = true)
// and when we want to convert a DataFrame back to Spark, we can do the same trick via a typed List
val sparkDatasetAgain = spark.createDataset(dataframe.toList(), beanEncoderOf())
sparkDatasetAgain.printSchema()
sparkDatasetAgain.show()
spark.stop()
}
/** Creates a [bean encoder][Encoders.bean] for the given [T] instance. */
inline fun <reified T : Serializable> beanEncoderOf(): Encoder<T> = Encoders.bean(T::class.java)
@DataSchema
data class Name
@JvmOverloads
constructor(var firstName: String = "", var lastName: String = "") : Serializable
@DataSchema
data class Person
@JvmOverloads
constructor(
var name: Name = Name(),
var age: Int = -1,
var city: String? = null,
var weight: Int? = null,
var isHappy: Boolean = false,
) : Serializable
@@ -0,0 +1,87 @@
@file:Suppress("ktlint:standard:function-signature")
package org.jetbrains.kotlinx.dataframe.examples.spark
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.jetbrains.kotlinx.dataframe.api.aggregate
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.max
import org.jetbrains.kotlinx.dataframe.api.mean
import org.jetbrains.kotlinx.dataframe.api.min
import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.api.std
import org.jetbrains.kotlinx.dataframe.examples.spark.convertToDataFrame
import org.jetbrains.kotlinx.dataframe.examples.spark.convertToDataFrameByInference
import org.jetbrains.kotlinx.dataframe.examples.spark.convertToSpark
import org.jetbrains.kotlinx.spark.api.col
import org.jetbrains.kotlinx.spark.api.gt
/**
* Since we don't know the schema at compile time this time, we need to do
* some schema mapping in between Spark and DataFrame.
*
* We will use spark/compatibilityLayer.kt to do this.
* Take a look at that file for the implementation details!
*
* NOTE: You will likely need to run this function with Java 8 or 11 for it to work correctly.
* Use the `runSparkUntypedDataset` Gradle task to do so.
*/
fun main() {
val spark = SparkSession.builder()
.master(SparkConf().get("spark.master", "local[*]"))
.appName("Kotlin Spark Sample")
.getOrCreate()
val sc = JavaSparkContext(spark.sparkContext())
// Creating a Spark Dataframe (untyped Dataset). Usually, this is loaded from some server or database.
val rawDataset: Dataset<Row> = spark.createDataset(
listOf(
Person(Name("Alice", "Cooper"), 15, "London", 54, true),
Person(Name("Bob", "Dylan"), 45, "Dubai", 87, true),
Person(Name("Charlie", "Daniels"), 20, "Moscow", null, false),
Person(Name("Charlie", "Chaplin"), 40, "Milan", null, true),
Person(Name("Bob", "Marley"), 30, "Tokyo", 68, true),
Person(Name("Alice", "Wolf"), 20, null, 55, false),
Person(Name("Charlie", "Byrd"), 30, "Moscow", 90, true),
),
beanEncoderOf<Person>(),
).toDF()
// we can perform large operations in Spark.
// DataFrames are in-memory structures, so this is a good place to limit the number of rows if you don't have the RAM ;)
val dataset = rawDataset.filter(col("age") gt 17)
// Using inference
val df1 = dataset.convertToDataFrameByInference()
df1.schema().print()
df1.print(columnTypes = true, borders = true)
// Using full schema mapping
val df2 = dataset.convertToDataFrame()
df2.schema().print()
df2.print(columnTypes = true, borders = true)
// now we can use DataFrame-specific functions
val ageStats = df1
.groupBy("city").aggregate {
mean("age") into "meanAge"
std("age") into "stdAge"
min("age") into "minAge"
max("age") into "maxAge"
}
ageStats.print(columnTypes = true, borders = true)
// and when we want to convert a DataFrame back to Spark, we will use the `convertToSpark()` extension function
// This performs the necessary schema mapping under the hood.
val sparkDataset = df2.convertToSpark(spark, sc)
sparkDataset.printSchema()
sparkDataset.show()
spark.stop()
}
@@ -0,0 +1,42 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget
import org.jetbrains.kotlin.gradle.tasks.KotlinCompile
plugins {
application
kotlin("jvm")
id("org.jetbrains.kotlinx.dataframe")
// only mandatory if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
}
repositories {
mavenCentral()
mavenLocal() // in case of local dataframe development
}
application.mainClass = "org.jetbrains.kotlinx.dataframe.examples.youtube.YoutubeKt"
dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
implementation(libs.kotlin.datetimeJvm)
}
tasks.withType<KotlinCompile> {
compilerOptions.jvmTarget = JvmTarget.JVM_1_8
}
kotlin {
compilerOptions {
jvmTarget = JvmTarget.JVM_1_8
freeCompilerArgs.add("-Xjdk-release=8")
}
}
tasks.withType<JavaCompile> {
sourceCompatibility = JavaVersion.VERSION_1_8.toString()
targetCompatibility = JavaVersion.VERSION_1_8.toString()
options.release.set(8)
}
@@ -0,0 +1,4 @@
package org.jetbrains.kotlinx.dataframe.examples.youtube
val apiKey: String = TODO("Insert your API key here")
const val basePath = "https://www.googleapis.com/youtube/v3"
@@ -0,0 +1,94 @@
@file:ImportDataSchema(
"SearchResponse",
"src/main/resources/searchResponse.json",
)
@file:ImportDataSchema(
"StatisticsResponse",
"src/main/resources/statisticsResponse.json",
)
package org.jetbrains.kotlinx.dataframe.examples.youtube
import kotlinx.datetime.Instant
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.AnyRow
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.dataTypes.IFRAME
import org.jetbrains.kotlinx.dataframe.dataTypes.IMG
import org.jetbrains.kotlinx.dataframe.io.read
import java.net.URL
fun load(path: String) = DataRow.read("$basePath/$path&key=$apiKey")
fun load(path: String, maxPages: Int): AnyFrame {
val rows = mutableListOf<AnyRow>()
var pagePath = path
do {
val row = load(pagePath)
rows.add(row)
val next = row.getValueOrNull<String>("nextPageToken")
pagePath = "$path&pageToken=$next"
} while (next != null && rows.size < maxPages)
return rows.concat()
}
fun main() {
val searchRequest = "cute%20cats"
val resultsPerPage = 50
val maxPages = 5
val videoId by column<String>("id")
val channel by columnGroup()
val videos = load("search?q=$searchRequest&maxResults=$resultsPerPage&part=snippet", maxPages)
.convertTo<SearchResponse> {
convert<String?>().with { it.toString() }
convert<Int?>().with { it ?: 0 }
}
.items.concat()
.dropNulls { id.videoId }
.select { id.videoId into videoId and snippet }
.distinct()
.parse()
.convert { colsAtAnyDepth().colsOf<URL>() }.with {
IMG(it, maxHeight = 150)
}.add("video") {
val id = videoId()
IFRAME("http://www.youtube.com/embed/$id")
}.move { snippet.title and snippet.publishTime }.toTop()
.move { snippet.channelId and snippet.channelTitle }.under(channel)
.remove { snippet }
val stats = videos[videoId]
.chunked(50)
.map {
val ids = it.joinToString("%2C")
load("videos?part=statistics&id=$ids").cast<StatisticsResponse>()
}.asColumnGroup()
.items.concat()
.select { id and statistics.allCols() }
.parse()
val withStat = videos.join(stats) { videoId match right.id }
val viewCount by column<Int>()
val publishTime by column<Instant>()
val channels = withStat
.groupBy { channel }.sum { viewCount }
.sortByDesc { viewCount }
.flatten()
channels.print(borders = true, columnTypes = true)
val growth = withStat
.select { publishTime and viewCount }
.convert { publishTime and viewCount }.toLong()
.sortBy { publishTime }
.cumSum { viewCount }
growth.print(borders = true, columnTypes = true)
}
@@ -0,0 +1,182 @@
{
"kind": "youtube#searchListResponse",
"etag": "nl77cg-yrK-TW2q2RtoGXrdkkfo",
"nextPageToken": "CAUQAA",
"regionCode": "NL",
"pageInfo": {
"totalResults": 1000000,
"resultsPerPage": 5
},
"items": [
{
"kind": "youtube#searchResult",
"etag": "gsRtDXx5RZlp-qILhP65o2oF-go",
"id": {
"kind": "youtube#video",
"videoId": "Dix58mO0Pbc"
},
"snippet": {
"publishedAt": "2022-08-10T14:30:04Z",
"channelId": "UC7wafFu5c8AO0YF5U7R7xFA",
"title": "Cat TV for Cats to Watch 😺 Summer birds and ducks by the lake 🐦 Cute squirrels 🐿 8 Hours(4K HDR)",
"description": "8 hours of pleasing video for cats, dogs, parrots, or other nature lovers to enjoy. It can relax your kitten or puppy and minimize ...",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/Dix58mO0Pbc/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/Dix58mO0Pbc/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/Dix58mO0Pbc/hqdefault.jpg",
"width": 480,
"height": 360
}
},
"channelTitle": "Birder King",
"liveBroadcastContent": "none",
"publishTime": "2022-08-10T14:30:04Z"
}
},
{
"kind": "youtube#searchResult",
"etag": "_7QEwCZHKtgnPTcYmsxNaol-I0Q",
"id": {
"kind": "youtube#video",
"videoId": "bGsN7jzp5DE"
},
"snippet": {
"publishedAt": "2022-08-09T17:00:30Z",
"channelId": "UCINb0wqPz-A0dV9nARjJlOQ",
"title": "Cat Is Obsessed With His Tiny Love Bird | The Dodo Odd Couples",
"description": "This cat is glued to his favorite little love bird and even climbs inside her cage to hang out longer This video is dedicated to ...",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/bGsN7jzp5DE/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/bGsN7jzp5DE/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/bGsN7jzp5DE/hqdefault.jpg",
"width": 480,
"height": 360
}
},
"channelTitle": "The Dodo",
"liveBroadcastContent": "none",
"publishTime": "2022-08-09T17:00:30Z"
}
},
{
"kind": "youtube#searchResult",
"etag": "IHNyBgppiApI3KGzkUV5AuPMftM",
"id": {
"kind": "youtube#video",
"videoId": "U1OxDRxNEMM"
},
"snippet": {
"publishedAt": "2022-08-10T14:45:00Z",
"channelId": "UCcnThqTwvub5ykbII9WkR5g",
"title": "Funny animals - Funny cats / dogs - Funny animal videos 218",
"description": "Funny animals! Compilation number 218. Only the best! Sit back and charge positively Funny animal videos (funny cats, dogs ...",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/U1OxDRxNEMM/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/U1OxDRxNEMM/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/U1OxDRxNEMM/hqdefault.jpg",
"width": 480,
"height": 360
}
},
"channelTitle": "Happy Dog",
"liveBroadcastContent": "none",
"publishTime": "2022-08-10T14:45:00Z"
}
},
{
"kind": "youtube#searchResult",
"etag": "0OYjMrtwAzJH7yo_jOPvF2hwSao",
"id": {
"kind": "youtube#video",
"videoId": "ByH9LuSILxU"
},
"snippet": {
"publishedAt": "2020-06-19T02:18:53Z",
"channelId": "UC8hC-augAnujJeprhjI0YkA",
"title": "Baby Cats - Cute and Funny Cat Videos Compilation #34 | Aww Animals",
"description": "Baby cats are amazing creature because they are the cutest and most funny. Watching funny baby cats is the hardest try not to ...",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/ByH9LuSILxU/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/ByH9LuSILxU/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/ByH9LuSILxU/hqdefault.jpg",
"width": 480,
"height": 360
}
},
"channelTitle": "Aww Animals",
"liveBroadcastContent": "none",
"publishTime": "2020-06-19T02:18:53Z"
}
},
{
"kind": "youtube#searchResult",
"etag": "S1UukOVi_sofJQLHSU0jX5GSv2M",
"id": {
"kind": "youtube#video",
"videoId": "VkqVsCPAIag"
},
"snippet": {
"publishedAt": "2022-08-10T11:03:15Z",
"channelId": "UCHBnS9TR-4h2nvuiiq3XCAA",
"title": "Awesome SO Cute Cat ! Cute and Funny Cat Videos to Keep You Smiling! 🐱",
"description": "The featured clips in our video are used with permission from the original video owners. The highlight clips can be done by our ...",
"thumbnails": {
"default": {
"url": "https://i.ytimg.com/vi/VkqVsCPAIag/default.jpg",
"width": 120,
"height": 90
},
"medium": {
"url": "https://i.ytimg.com/vi/VkqVsCPAIag/mqdefault.jpg",
"width": 320,
"height": 180
},
"high": {
"url": "https://i.ytimg.com/vi/VkqVsCPAIag/hqdefault.jpg",
"width": 480,
"height": 360
}
},
"channelTitle": "Best awesome",
"liveBroadcastContent": "none",
"publishTime": "2022-08-10T11:03:15Z"
}
}
]
}
@@ -0,0 +1,21 @@
{
"kind": "youtube#videoListResponse",
"etag": "rHk7psLWXLIjjx8rGeiNKUxrD-s",
"items": [
{
"kind": "youtube#video",
"etag": "hiKGiry1Gc19FmHigb3sMfjnzP8",
"id": "uHKfrz65KSU",
"statistics": {
"viewCount": "67715094",
"likeCount": "641192",
"favoriteCount": "0",
"commentCount": "22174"
}
}
],
"pageInfo": {
"totalResults": 1,
"resultsPerPage": 1
}
}
@@ -0,0 +1,14 @@
# Kotlin DataFrame Compiler Gradle Plugin Example
An IntelliJ IDEA Gradle Kotlin project demonstrating the use of the
[Kotlin DataFrame Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html).
We recommend using an up-to-date IntelliJ IDEA for the best experience,
as well as the latest Kotlin plugin version.
> [!WARNING]
> Proper functionality requires IntelliJ IDEA version 2025.2 or newer.
[Download Kotlin DataFrame Compiler Plugin Gradle Example](https://github.com/Kotlin/dataframe/raw/example-projects-archives/kotlin-dataframe-plugin-gradle-example.zip)
See also [Kotlin DataFrame Compiler Maven Plugin Example](../kotlin-dataframe-plugin-maven-example)
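The key part of the setup is applying the DataFrame compiler plugin at exactly the same version as the Kotlin plugin. A minimal sketch of the relevant `build.gradle.kts` section (versions are illustrative; see the full build script in this project):

```kotlin
plugins {
    val kotlinVersion = "2.3.0-RC3"
    kotlin("jvm") version kotlinVersion
    // The compiler plugin must match the Kotlin plugin version exactly.
    kotlin("plugin.dataframe") version kotlinVersion
}

dependencies {
    // The runtime library is still needed alongside the compiler plugin.
    implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta4")
}
```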
@@ -0,0 +1,39 @@
import org.jlleitschuh.gradle.ktlint.KtlintExtension
plugins {
id("org.jlleitschuh.gradle.ktlint") version "12.3.0"
val kotlinVersion = "2.3.0-RC3"
kotlin("jvm") version kotlinVersion
// Add the Kotlin DataFrame Compiler plugin of the same version as the Kotlin plugin.
kotlin("plugin.dataframe") version kotlinVersion
application
}
group = "org.example"
version = "1.0-SNAPSHOT"
repositories {
mavenCentral()
}
dependencies {
// Add general `dataframe` dependency
implementation("org.jetbrains.kotlinx:dataframe:1.0.0-Beta4")
// Add `kandy` dependency
implementation("org.jetbrains.kotlinx:kandy-lets-plot:0.8.3")
testImplementation(kotlin("test"))
}
tasks.test {
useJUnitPlatform()
}
kotlin {
jvmToolchain(11)
}
configure<KtlintExtension> {
version = "1.6.0"
// rules are set up through .editorconfig
}
@@ -0,0 +1,4 @@
kotlin.code.style=official
# Disabling incremental compilation will no longer be necessary
# when https://youtrack.jetbrains.com/issue/KT-66735 is resolved.
kotlin.incremental=false
@@ -0,0 +1,7 @@
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-9.1.0-bin.zip
networkTimeout=10000
validateDistributionUrl=true
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
@@ -0,0 +1,11 @@
pluginManagement {
repositories {
maven("https://packages.jetbrains.team/maven/p/kt/dev/")
mavenCentral()
gradlePluginPortal()
}
}
plugins {
id("org.gradle.toolchains.foojay-resolver-convention") version "1.0.0"
}
rootProject.name = "kotlin-dataframe-plugin-gradle-example"
@@ -0,0 +1,110 @@
package org.jetbrains.kotlinx.dataframe.examples.plugin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.add
import org.jetbrains.kotlinx.dataframe.api.aggregate
import org.jetbrains.kotlinx.dataframe.api.convert
import org.jetbrains.kotlinx.dataframe.api.convertTo
import org.jetbrains.kotlinx.dataframe.api.filter
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.into
import org.jetbrains.kotlinx.dataframe.api.max
import org.jetbrains.kotlinx.dataframe.api.rename
import org.jetbrains.kotlinx.dataframe.api.renameToCamelCase
import org.jetbrains.kotlinx.dataframe.api.with
import org.jetbrains.kotlinx.dataframe.io.readCsv
import org.jetbrains.kotlinx.dataframe.io.writeCsv
import org.jetbrains.kotlinx.kandy.dsl.plot
import org.jetbrains.kotlinx.kandy.letsplot.export.save
import org.jetbrains.kotlinx.kandy.letsplot.feature.layout
import org.jetbrains.kotlinx.kandy.letsplot.layers.bars
import java.net.URL
// Declare data schema for the DataFrame from jetbrains_repositories.csv.
@DataSchema
data class Repositories(
val full_name: String,
val html_url: URL,
val stargazers_count: Int,
val topics: String,
val watchers: Int,
)
// Define kinds of repositories.
enum class RepoKind {
Kotlin,
IntelliJ,
Other,
}
// A rule for determining the kind of repository based on its name and topics.
fun getKind(fullName: String, topics: List<String>): RepoKind {
fun checkContains(name: String) = name in topics || fullName.lowercase().contains(name)
return when {
checkContains("kotlin") -> RepoKind.Kotlin
checkContains("idea") || checkContains("intellij") -> RepoKind.IntelliJ
else -> RepoKind.Other
}
}
fun main() {
val repos = DataFrame
// Read DataFrame from the CSV file.
.readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
// And convert it to match the `Repositories` schema.
.convertTo<Repositories>()
// With Compiler Plugin, the DataFrame schema changes immediately after each operation:
// For example, if a new column is added or the old one is renamed (or its type is changed)
// during the operation, you can use the new name immediately in the following operations:
repos
// Add a new "name" column...
.add("name") { full_name.substringAfterLast("/") }
// ... and now we can use "name" extension in DataFrame operations, such as `filter`.
.filter { name.lowercase().contains("kotlin") }
// Let's update the DataFrame with some operations using these features.
val reposUpdated = repos
// Rename columns to CamelCase.
// Note that after that, in the following operations, extension properties will have
// new names corresponding to the column names.
.renameToCamelCase()
// Rename "stargazersCount" column to "stars".
.rename { stargazersCount }.into("stars")
// And we can immediately use the updated name in the filtering.
.filter { stars > 50 }
// Convert values in the "topic" column (which were `String` initially)
// to the list of topics.
.convert { topics }.with {
val inner = it.removeSurrounding("[", "]")
if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
}
// Now "topics" is a `List<String>` column.
// Add a new column with the number of topics.
.add("topicCount") { topics.size }
// Add a new column with the kind of repository.
.add("kind") { getKind(fullName, topics) }
// Write the updated DataFrame to a CSV file.
reposUpdated.writeCsv("jetbrains_repositories_new.csv")
reposUpdated
// Group repositories by kind
.groupBy { kind }
// And then compute the maximum stars in each group.
.aggregate {
max { stars } into "maxStars"
}
// Build a bar plot showing the maximum number of stars per repository kind.
.plot {
bars {
x(kind)
y(maxStars)
}
layout.title = "Max stars per repo kind"
}
// Save the plot to an SVG file.
.save("kindToStars.svg")
}
@@ -0,0 +1,39 @@
target/
!.mvn/wrapper/maven-wrapper.jar
!**/src/main/**/target/
!**/src/test/**/target/
.kotlin
### IntelliJ IDEA ###
.idea/modules.xml
.idea/jarRepositories.xml
.idea/compiler.xml
.idea/libraries/
*.iws
*.iml
*.ipr
### Eclipse ###
.apt_generated
.classpath
.factorypath
.project
.settings
.springBeans
.sts4-cache
### NetBeans ###
/nbproject/private/
/nbbuild/
/dist/
/nbdist/
/.nb-gradle/
build/
!**/src/main/**/build/
!**/src/test/**/build/
### VS Code ###
.vscode/
### Mac OS ###
.DS_Store
@@ -0,0 +1,14 @@
# Kotlin DataFrame Compiler Maven Plugin Example
An IntelliJ IDEA Maven Kotlin project demonstrating the use of the
[Kotlin DataFrame Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html).
We recommend using an up-to-date IntelliJ IDEA for the best experience,
as well as the latest Kotlin plugin version.
> [!WARNING]
> Proper functionality requires IntelliJ IDEA version 2025.3 or newer.
[Download Kotlin DataFrame Compiler Plugin Maven Example](https://github.com/Kotlin/dataframe/raw/example-projects-archives/kotlin-dataframe-plugin-maven-example.zip)
See also [Kotlin DataFrame Compiler Gradle Plugin Example](../kotlin-dataframe-plugin-gradle-example)
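On the Maven side, the compiler plugin is enabled through the `kotlin-maven-plugin` configuration. A minimal sketch of the relevant `pom.xml` section (versions are illustrative; see the full pom in this project):

```xml
<plugin>
    <groupId>org.jetbrains.kotlin</groupId>
    <artifactId>kotlin-maven-plugin</artifactId>
    <version>2.3.0-RC3</version>
    <configuration>
        <compilerPlugins>
            <plugin>kotlin-dataframe</plugin>
        </compilerPlugins>
    </configuration>
    <dependencies>
        <!-- Supplies the DataFrame compiler plugin to kotlinc; must match the Kotlin version. -->
        <dependency>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-maven-dataframe</artifactId>
            <version>2.3.0-RC3</version>
        </dependency>
    </dependencies>
</plugin>
```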
@@ -0,0 +1,112 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.example</groupId>
<artifactId>dataframe_maven</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<kotlin.code.style>official</kotlin.code.style>
<kotlin.compiler.jvmTarget>11</kotlin.compiler.jvmTarget>
</properties>
<repositories>
<repository>
<id>mavenCentral</id>
<url>https://repo1.maven.org/maven2/</url>
</repository>
</repositories>
<build>
<sourceDirectory>src/main/kotlin</sourceDirectory>
<testSourceDirectory>src/test/kotlin</testSourceDirectory>
<plugins>
<plugin>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-maven-plugin</artifactId>
<version>2.3.0-RC3</version>
<configuration>
<compilerPlugins>
<plugin>kotlin-dataframe</plugin>
</compilerPlugins>
</configuration>
<dependencies>
<dependency>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-maven-dataframe</artifactId>
<version>2.3.0-RC3</version>
</dependency>
</dependencies>
<executions>
<execution>
<id>compile</id>
<phase>compile</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>test-compile</id>
<phase>test-compile</phase>
<goals>
<goal>test-compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.2</version>
</plugin>
<plugin>
<artifactId>maven-failsafe-plugin</artifactId>
<version>2.22.2</version>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.6.0</version>
<configuration>
<mainClass>MainKt</mainClass>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-test-junit5</artifactId>
<version>2.3.0-RC3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.10.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-stdlib</artifactId>
<version>2.3.0-RC3</version>
</dependency>
<!-- DataFrame and Kandy dependencies -->
<dependency>
<groupId>org.jetbrains.kotlinx</groupId>
<artifactId>dataframe</artifactId>
<version>1.0.0-Beta4</version>
</dependency>
<dependency>
<groupId>org.jetbrains.kotlinx</groupId>
<artifactId>kandy-lets-plot</artifactId>
<version>0.8.3</version>
</dependency>
</dependencies>
</project>
@@ -0,0 +1,110 @@
package org.jetbrains.kotlinx.dataframe.examples.plugin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.add
import org.jetbrains.kotlinx.dataframe.api.aggregate
import org.jetbrains.kotlinx.dataframe.api.convert
import org.jetbrains.kotlinx.dataframe.api.convertTo
import org.jetbrains.kotlinx.dataframe.api.filter
import org.jetbrains.kotlinx.dataframe.api.groupBy
import org.jetbrains.kotlinx.dataframe.api.into
import org.jetbrains.kotlinx.dataframe.api.max
import org.jetbrains.kotlinx.dataframe.api.rename
import org.jetbrains.kotlinx.dataframe.api.renameToCamelCase
import org.jetbrains.kotlinx.dataframe.api.with
import org.jetbrains.kotlinx.dataframe.io.readCsv
import org.jetbrains.kotlinx.dataframe.io.writeCsv
import org.jetbrains.kotlinx.kandy.dsl.plot
import org.jetbrains.kotlinx.kandy.letsplot.export.save
import org.jetbrains.kotlinx.kandy.letsplot.feature.layout
import org.jetbrains.kotlinx.kandy.letsplot.layers.bars
import java.net.URL
// Declare data schema for the DataFrame from jetbrains_repositories.csv.
@DataSchema
data class Repositories(
val full_name: String,
val html_url: URL,
val stargazers_count: Int,
val topics: String,
val watchers: Int,
)
// Define kinds of repositories.
enum class RepoKind {
Kotlin,
IntelliJ,
Other,
}
// A rule for determining the kind of repository based on its name and topics.
fun getKind(fullName: String, topics: List<String>): RepoKind {
fun checkContains(name: String) = name in topics || fullName.lowercase().contains(name)
return when {
checkContains("kotlin") -> RepoKind.Kotlin
checkContains("idea") || checkContains("intellij") -> RepoKind.IntelliJ
else -> RepoKind.Other
}
}
fun main() {
val repos = DataFrame
// Read DataFrame from the CSV file.
.readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
// And convert it to match the `Repositories` schema.
.convertTo<Repositories>()
// With Compiler Plugin, the DataFrame schema changes immediately after each operation:
// For example, if a new column is added or the old one is renamed (or its type is changed)
// during the operation, you can use the new name immediately in the following operations:
repos
// Add a new "name" column...
.add("name") { full_name.substringAfterLast("/") }
// ... and now we can use "name" extension in DataFrame operations, such as `filter`.
.filter { name.lowercase().contains("kotlin") }
// Let's update the DataFrame with some operations using these features.
val reposUpdated = repos
// Rename columns to CamelCase.
// Note that after that, in the following operations, extension properties will have
// new names corresponding to the column names.
.renameToCamelCase()
// Rename "stargazersCount" column to "stars".
.rename { stargazersCount }.into("stars")
// And we can immediately use the updated name in the filtering.
.filter { stars > 50 }
// Convert values in the "topic" column (which were `String` initially)
// to the list of topics.
.convert { topics }.with {
val inner = it.removeSurrounding("[", "]")
if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
}
// Now "topics" is a `List<String>` column.
// Add a new column with the number of topics.
.add("topicCount") { topics.size }
// Add a new column with the kind of repository.
.add("kind") { getKind(fullName, topics) }
// Write the updated DataFrame to a CSV file.
reposUpdated.writeCsv("jetbrains_repositories_new.csv")
reposUpdated
// Group repositories by kind
.groupBy { kind }
// And then compute the maximum stars in each group.
.aggregate {
max { stars } into "maxStars"
}
// Build a bar plot showing the maximum number of stars per repository kind.
.plot {
bars {
x(kind)
y(maxStars)
}
layout.title = "Max stars per repo kind"
}
// Save the plot to an SVG file.
.save("kindToStars.svg")
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,304 @@
# DEMO for DataFrame; this may differ from the actual API (it has been updated a bit)
openapi: 3.0.0
info:
version: 2.0.2
title: APIs.guru
description: >
Wikipedia for Web APIs. Repository of API specs in OpenAPI format.
**Warning**: If you want to be notified about changes in advance please join our [Slack channel](https://join.slack.com/t/mermade/shared_invite/zt-g78g7xir-MLE_CTCcXCdfJfG3CJe9qA).
Client sample: [[Demo]](https://apis.guru/simple-ui) [[Repo]](https://github.com/APIs-guru/simple-ui)
contact:
name: APIs.guru
url: https://APIs.guru
email: mike.ralphson@gmail.com
license:
name: CC0 1.0
url: https://github.com/APIs-guru/openapi-directory#licenses
x-logo:
url: https://apis.guru/branding/logo_vertical.svg
externalDocs:
url: https://github.com/APIs-guru/openapi-directory/blob/master/API.md
security: [ ]
tags:
- name: APIs
description: Actions relating to APIs in the collection
paths:
/list.json:
get:
operationId: listAPIs
tags:
- APIs
summary: List all APIs
description: >
List all APIs in the directory.
Returns links to the OpenAPI specification for each API in the directory.
If an API exists in multiple versions, the `preferred` one is explicitly marked.
Some basic info from the OpenAPI spec is cached inside each object.
This allows generating some simple views without needing to fetch the OpenAPI spec for each API.
responses:
"200":
description: OK
content:
application/json; charset=utf-8:
schema:
$ref: "#/components/schemas/APIs"
application/json:
schema:
$ref: "#/components/schemas/APIs"
/metrics.json:
get:
operationId: getMetrics
summary: Get basic metrics
description: >
Some basic metrics for the entire directory.
Just stunning numbers to put on a front page, intended purely for the wow effect :)
tags:
- APIs
responses:
"200":
description: OK
content:
application/json; charset=utf-8:
schema:
$ref: "#/components/schemas/Metrics"
application/json:
schema:
$ref: "#/components/schemas/Metrics"
components:
schemas:
APIs:
description: |
List of API details.
It is a JSON object with API IDs (`<provider>[:<service>]`) as keys.
type: object
additionalProperties:
$ref: "#/components/schemas/API"
minProperties: 1
example:
googleapis.com:drive:
added: 2015-02-22T20:00:45.000Z
preferred: v3
versions:
v2:
added: 2015-02-22T20:00:45.000Z
info:
title: Drive
version: v2
x-apiClientRegistration:
url: https://console.developers.google.com
x-logo:
url: https://api.apis.guru/v2/cache/logo/https_www.gstatic.com_images_icons_material_product_2x_drive_32dp.png
x-origin:
format: google
url: https://www.googleapis.com/discovery/v1/apis/drive/v2/rest
version: v1
x-preferred: false
x-providerName: googleapis.com
x-serviceName: drive
swaggerUrl: https://api.apis.guru/v2/specs/googleapis.com/drive/v2/swagger.json
swaggerYamlUrl: https://api.apis.guru/v2/specs/googleapis.com/drive/v2/swagger.yaml
updated: 2016-06-17T00:21:44.000Z
v3:
added: 2015-12-12T00:25:13.000Z
info:
title: Drive
version: v3
x-apiClientRegistration:
url: https://console.developers.google.com
x-logo:
url: https://api.apis.guru/v2/cache/logo/https_www.gstatic.com_images_icons_material_product_2x_drive_32dp.png
x-origin:
format: google
url: https://www.googleapis.com/discovery/v1/apis/drive/v3/rest
version: v1
x-preferred: true
x-providerName: googleapis.com
x-serviceName: drive
swaggerUrl: https://api.apis.guru/v2/specs/googleapis.com/drive/v3/swagger.json
swaggerYamlUrl: https://api.apis.guru/v2/specs/googleapis.com/drive/v3/swagger.yaml
updated: 2016-06-17T00:21:44.000Z
API:
description: Meta information about an API
type: object
required:
- added
- preferred
- versions
properties:
added:
description: Timestamp when the API was first added to the directory
type: string
format: date-time
preferred:
description: Recommended version
type: string
versions:
description: List of supported versions of the API
type: object
additionalProperties:
$ref: "#/components/schemas/ApiVersion"
minProperties: 1
additionalProperties: false
ApiVersion:
type: object
required:
- added
# - updated apparently not required!
- swaggerUrl
- swaggerYamlUrl
- info
- openapiVer
properties:
added:
description: Timestamp when the version was added
type: string
format: date-time
updated: # apparently not required!
description: Timestamp when the version was updated
type: string
format: date-time
swaggerUrl:
description: URL to OpenAPI definition in JSON format
type: string
format: url
swaggerYamlUrl:
description: URL to OpenAPI definition in YAML format
type: string
format: url
info:
description: Copy of `info` section from OpenAPI definition
type: object
minProperties: 1
externalDocs:
description: Copy of `externalDocs` section from OpenAPI definition
type: object
minProperties: 1
openapiVer:
description: OpenAPI version
type: string
additionalProperties: false
Metrics:
description: List of basic metrics
type: object
required:
- numSpecs
- numAPIs
- numEndpoints
- unreachable
- invalid
- unofficial
- fixes
- fixedPct
- datasets
- stars
- issues
- thisWeek
properties:
numSpecs:
description: Number of API specifications including different versions of the
same API
type: integer
minimum: 1
numAPIs:
description: Number of APIs
type: integer
minimum: 1
numEndpoints:
description: Total number of endpoints inside all specifications
type: integer
minimum: 1
unreachable:
description: Number of unreachable specifications
type: integer
minimum: 0
invalid:
description: Number of invalid specifications
type: integer
minimum: 0
unofficial:
description: Number of unofficial specifications
type: integer
minimum: 0
fixes:
description: Number of fixes applied to specifications
type: integer
minimum: 0
fixedPct:
description: Percentage of fixed specifications
type: number
minimum: 0
maximum: 100
datasets:
description: An overview of the datasets used to gather the APIs
type: array
items:
description: A single metric per dataset
type: object
required:
- title
- data
properties:
title:
description: Title of the metric
type: string
data:
description: Value of the metric per dataset
type: object
additionalProperties:
type: integer
minimum: 0
stars:
description: Number of stars on GitHub
type: integer
minimum: 0
issues:
description: Number of issues on GitHub
type: integer
minimum: 0
thisWeek:
description: Number of new specifications added/updated this week
type: object
required:
- added
- updated
properties:
added:
description: Number of new specifications added this week
type: integer
minimum: 0
updated:
description: Number of specifications updated this week
type: integer
minimum: 0
additionalProperties: false
example:
numSpecs: 1000
numAPIs: 100
numEndpoints: 10000
unreachable: 10
invalid: 10
unofficial: 10
fixes: 10
fixedPct: 10
datasets:
- title: providerCount
data:
"a.com": 10
"b.com": 20
"c.com": 30
stars: 1000
issues: 100
thisWeek:
added: 10
updated: 10
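The `Metrics` schema above marks every listed property as `required` and sets `additionalProperties: false`, so a valid payload must carry exactly those keys. A quick standalone check of the schema's own `example` object (sketched in Python here for convenience; the repo's json example consumes this spec through Kotlin DataFrame's OpenAPI support):

```python
# Required fields copied from the Metrics schema above.
REQUIRED = [
    "numSpecs", "numAPIs", "numEndpoints", "unreachable", "invalid",
    "unofficial", "fixes", "fixedPct", "datasets", "stars", "issues", "thisWeek",
]

# The schema's own example object, transcribed from the YAML above.
example = {
    "numSpecs": 1000, "numAPIs": 100, "numEndpoints": 10000,
    "unreachable": 10, "invalid": 10, "unofficial": 10,
    "fixes": 10, "fixedPct": 10,
    "datasets": [{"title": "providerCount",
                  "data": {"a.com": 10, "b.com": 20, "c.com": 30}}],
    "stars": 1000, "issues": 100,
    "thisWeek": {"added": 10, "updated": 10},
}

missing = [k for k in REQUIRED if k not in example]
# additionalProperties: false means no keys beyond the declared properties.
extra = [k for k in example if k not in REQUIRED]
print(missing, extra)  # → [] []
```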
@@ -0,0 +1,42 @@
{
"numSpecs": 3809,
"numAPIs": 2362,
"numEndpoints": 79405,
"unreachable": 138,
"invalid": 634,
"unofficial": 24,
"fixes": 34001,
"fixedPct": 21,
"datasets": [
{
"title": "providerCount",
"data": {
"adyen.com": 69,
"amazonaws.com": 295,
"apideck.com": 14,
"apisetu.gov.in": 181,
"azure.com": 1832,
"ebay.com": 20,
"fungenerators.com": 12,
"googleapis.com": 443,
"hubapi.com": 11,
"interzoid.com": 20,
"mastercard.com": 14,
"microsoft.com": 27,
"nexmo.com": 20,
"nytimes.com": 11,
"parliament.uk": 11,
"sportsdata.io": 35,
"twilio.com": 41,
"windows.net": 10,
"Others": 743
}
}
],
"stars": 2964,
"issues": 206,
"thisWeek": {
"added": 123,
"updated": 119
}
}
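A useful property of this metrics payload: the per-provider counts in the `providerCount` dataset sum to `numSpecs`. A standalone Python sketch verifying that against the figures above (abridged to the fields used; the repo itself would read this JSON via Kotlin DataFrame):

```python
import json

# The metrics payload from above, abridged to numSpecs and the datasets array.
metrics_json = """
{
  "numSpecs": 3809,
  "datasets": [
    {
      "title": "providerCount",
      "data": {
        "adyen.com": 69, "amazonaws.com": 295, "apideck.com": 14,
        "apisetu.gov.in": 181, "azure.com": 1832, "ebay.com": 20,
        "fungenerators.com": 12, "googleapis.com": 443, "hubapi.com": 11,
        "interzoid.com": 20, "mastercard.com": 14, "microsoft.com": 27,
        "nexmo.com": 20, "nytimes.com": 11, "parliament.uk": 11,
        "sportsdata.io": 35, "twilio.com": 41, "windows.net": 10,
        "Others": 743
      }
    }
  ]
}
"""

metrics = json.loads(metrics_json)
provider_count = next(d for d in metrics["datasets"]
                      if d["title"] == "providerCount")
total = sum(provider_count["data"].values())
print(total, metrics["numSpecs"])  # → 3809 3809
```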
+21
View File
@@ -0,0 +1,21 @@
movieId,title,genres
9b30aff7943f44579e92c261f3adc193,Women in Black (1997),Fantasy|Suspenseful|Comedy
2a1ba1fc5caf492a80188e032995843e,Bumblebee Movie (2007),Comedy|Jazz|Family|Animation
f44ceb4771504342bb856d76c112d5a6,Magical School Boy and the Rock of Wise Men (2001),Fantasy|Growing up|Magic
43d02fb064514ff3bd30d1e3a7398357,Master of the Jewlery: The Company of the Jewel (2001),Fantasy|Magic|Suspenseful
6aa0d26a483148998c250b9c80ddf550,Sun Conflicts: Part IV: A Novel Espair (1977),Fantasy
eace16e59ce24eff90bf8924eb6a926c,The Outstanding Bulk (2008),Fantasy|Superhero|Family
ae916bc4844a4bb7b42b70d9573d05cd,In Automata (2014),Horror|Existential
c1f0a868aeb44c5ea8d154ec3ca295ac,Interplanetary (2014),Sci-fi|Futuristic
9595b771f87f42a3b8dd07d91e7cb328,Woods Run (1994),Family|Drama
aa9fc400e068443488b259ea0802a975,Anthropod-Dude (2002),Superhero|Fantasy|Family|Growing up
22d20c2ba11d44cab83aceea39dc00bd,The Chamber (2003),Comedy|Drama
8cf4d0c1bd7b41fab6af9d92c892141f,That Thing About an Iceberg (1997),Drama|History|Family|Romance
c2f3e7588da84684a7d78d6bd8d8e1f4,Vehicles (2006),Animation|Family
ce06175106af4105945f245161eac3c7,Playthings Tale (1995),Animation|Family
ee28d7e69103485c83e10b8055ef15fb,Metal Man 2 (2010),Fantasy|Superhero|Family
c32bdeed466f4ec09de828bb4b6fc649,Surgeon Odd in the Omniverse of Crazy (2022),Fantasy|Superhero|Family|Horror
d4a325ab648a42c4a2d6f35dfabb387f,Bad Dream on Pine Street (1984),Horror
60ebe74947234ddcab49dea1a958faed,The Shimmering (1980),Horror
f24327f2b05147b197ca34bf13ae3524,Krubit: Societal Teachings for Do Many Good Amazing Country of Uzbekistan (2006),Comedy
2bb29b3a245e434fa80542e711fd2cee,This is No Movie (1950),(no genres listed)
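The `genres` column in this CSV is a `|`-delimited multi-value field, with `(no genres listed)` as a sentinel for missing data — exactly the kind of cleaning task the movies example in the README tackles with Kotlin DataFrame. A minimal standalone sketch of the same split-and-count step, in Python over the first few rows above:

```python
import csv
import io
from collections import Counter

# First few rows of movies.csv above; genres is '|'-delimited.
movies_csv = """movieId,title,genres
9b30aff7943f44579e92c261f3adc193,Women in Black (1997),Fantasy|Suspenseful|Comedy
2a1ba1fc5caf492a80188e032995843e,Bumblebee Movie (2007),Comedy|Jazz|Family|Animation
ae916bc4844a4bb7b42b70d9573d05cd,In Automata (2014),Horror|Existential
2bb29b3a245e434fa80542e711fd2cee,This is No Movie (1950),(no genres listed)
"""

counts = Counter()
for row in csv.DictReader(io.StringIO(movies_csv)):
    # Skip the sentinel so it is not counted as a genre.
    if row["genres"] != "(no genres listed)":
        counts.update(row["genres"].split("|"))

print(counts["Comedy"])  # → 2
```

In the repo's Kotlin DataFrame version, the same step is a `split { genres }.by("|")` style transformation; the Python form is only a stand-in to show the shape of the data.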