init research
+245
@@ -0,0 +1,245 @@
|
||||
# Kotlin DataFrame for SQL & Backend Developers
|
||||
|
||||
<web-summary>
|
||||
Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results — directly within a Kotlin Notebook.
|
||||
</web-summary>
|
||||
|
||||
<card-summary>
|
||||
Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data — fully interactive and type-safe!
|
||||
</card-summary>
|
||||
|
||||
<link-summary>
|
||||
Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook.
|
||||
</link-summary>
|
||||
|
||||
This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar
|
||||
SQL and ORM operations to DataFrame concepts.
|
||||
|
||||
If you plan to work on a Gradle project without a Kotlin Notebook,
|
||||
we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since version 2.2.*).
|
||||
This plugin generates type-safe schemas at compile time,
|
||||
tracking schema changes throughout your data pipeline.
|
||||
|
||||
## Add Kotlin DataFrame Gradle dependency
|
||||
|
||||
You can read more about setting up the Gradle build in the [Gradle Setup Guide](SetupGradle.md).
|
||||
|
||||
In your Gradle build file (`build.gradle` or `build.gradle.kts`), add the Kotlin DataFrame library as a dependency:
|
||||
|
||||
<tabs>
|
||||
<tab title="Kotlin DSL">
|
||||
|
||||
```kotlin
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}
```
|
||||
|
||||
</tab>
|
||||
|
||||
<tab title="Groovy DSL">
|
||||
|
||||
```groovy
dependencies {
    implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
}
```
|
||||
|
||||
</tab>
|
||||
</tabs>
|
||||
|
||||
---
|
||||
|
||||
## 1. What is a dataframe?
|
||||
|
||||
If you’re used to SQL, a **dataframe** is conceptually like a **table**:
|
||||
|
||||
- **Rows**: ordered records of data
|
||||
- **Columns**: named, typed fields
|
||||
- **Schema**: a mapping of column names to types
|
||||
|
||||
Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) —
|
||||
columns can contain *[nested dataframes](DataColumn.md#framecolumn)* or *column groups*,
|
||||
allowing you to represent and transform tree-like structures without flattening.
|
||||
|
||||
Unlike a relational DB table:
|
||||
|
||||
- A DataFrame object **lives in memory** — there’s no storage engine or transaction log
|
||||
- It’s **immutable** — each operation produces a *new* DataFrame
|
||||
- There is **no concept of foreign keys or relations** between DataFrames
|
||||
- It can be created from *any* [source](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md),
  and in-memory objects
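
As a minimal sketch of the in-memory and immutability points above, the snippet below builds a dataframe from literal values and filters it. The column names and values are made up for illustration, and the String API (`"age"<Int>()`) is used so that the code compiles without a notebook or the compiler plugin.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, created directly from in-memory values
fun people(): AnyFrame = dataFrameOf("name", "age")(
    "Alice", 31,
    "Bob", 27,
    "Carol", 45,
)

fun main() {
    val all = people()
    // filter does not modify `all`; it returns a new DataFrame
    val over30 = all.filter { "age"<Int>() > 30 }
    println(all.rowsCount())    // 3
    println(over30.rowsCount()) // 2
}
```

The original dataframe still has all three rows after the call, which is what "each operation produces a new DataFrame" means in practice.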
|
||||
|
||||
---
|
||||
|
||||
## 2. Reading Data From SQL
|
||||
|
||||
Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.
|
||||
|
||||
| Approach                         | Example                                                             |
|----------------------------------|---------------------------------------------------------------------|
| **From a table**                 | `val df = DataFrame.readSqlTable(dbConfig, "customers")`            |
| **From a SQL query**             | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` |
| **From a JDBC Connection**       | `val df = connection.readDataFrame("SELECT * FROM orders")`         |
| **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)`                      |
|
||||
|
||||
```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readDataFrame
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import java.sql.DriverManager

val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/mydb",
    user = "postgres",
    password = "secret",
)

// Read a whole table
val customers = DataFrame.readSqlTable(dbConfig, "customers")

// Read the result of a query
val salesByRegion = DataFrame.readSqlQuery(
    dbConfig,
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    """.trimIndent(),
)

// Read through an existing JDBC connection (`use` closes it afterwards)
DriverManager.getConnection(dbConfig.url, dbConfig.user, dbConfig.password).use { connection ->
    val orders = connection.readDataFrame("SELECT * FROM orders")

    // Read from an already executed ResultSet
    val rs = connection.createStatement().executeQuery("SELECT * FROM orders")
    val ordersFromResultSet = rs.readDataFrame(connection)
}
```
|
||||
|
||||
For more information, see the [guide to reading from SQL databases](readSqlDatabases.md).
|
||||
|
||||
## 3. Why It’s Not an ORM
|
||||
|
||||
Frameworks like **[Hibernate](https://hibernate.org/orm/)** or **[Exposed](https://github.com/JetBrains/Exposed)**:
|
||||
|
||||
- Map DB tables to Kotlin objects (entities)
|
||||
- Track object changes and sync them back to the database
|
||||
- Focus on **persistence** and **transactions**
|
||||
|
||||
Kotlin DataFrame:
|
||||
|
||||
- Has no persistence layer
|
||||
- Doesn’t try to map rows to mutable entities
|
||||
- Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines**
|
||||
- The **main idea** is that the schema *changes together with your transformations*, and the [**Compiler Plugin**](Compiler-Plugin.md) updates the type-safe API automatically under the hood.
- You don’t have to manually define or recreate schemas every time: the plugin infers them dynamically from the data or
  transformations.
|
||||
- In ORMs, the mapping layer is **frozen** — schema changes require manual model edits and migrations.
|
||||
|
||||
Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM.
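
To illustrate how the schema follows the transformations, here is a small sketch on hypothetical `sales` data. It uses the String API so that it runs without the compiler plugin; with the plugin enabled, the same change would also be reflected in the generated type-safe properties.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; column names are chosen for illustration only
fun sales(): AnyFrame = dataFrameOf("region", "revenue", "cost")(
    "EU", 120.0, 80.0,
    "US", 200.0, 150.0,
)

// Adding a column yields a new DataFrame whose schema has one more entry
fun withProfit(df: AnyFrame): AnyFrame =
    df.add("profit") { "revenue"<Double>() - "cost"<Double>() }

fun main() {
    val original = sales()
    val extended = withProfit(original)
    println(original.schema()) // region, revenue, cost
    println(extended.schema()) // region, revenue, cost, profit
}
```

No entity class had to be edited for the new `profit` column to exist; the schema is simply a property of the resulting dataframe.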
|
||||
|
||||
---
|
||||
|
||||
## 4. Key Differences from SQL & ORMs
|
||||
|
||||
| Feature / Concept          | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…)          | Kotlin DataFrame                                                    |
|----------------------------|------------------------------------|------------------------------------|---------------------------------------------------------------------|
| **Storage**                | Persistent                         | Persistent                         | In-memory only                                                      |
| **Schema definition**      | `CREATE TABLE` DDL                 | Defined in entity classes          | Derived from data or transformations or defined manually            |
| **Schema change**          | `ALTER TABLE`                      | Manual migration of entity classes | Automatic via transformations + Compiler Plugin or defined manually |
| **Relations**              | Foreign keys                       | Mapped via annotations             | Not applicable                                                      |
| **Transactions**           | Yes                                | Yes                                | Not applicable                                                      |
| **DB Indexes**             | Yes                                | Yes (via DB)                       | Not applicable                                                      |
| **Data manipulation**      | SQL DML (`INSERT`, `UPDATE`)       | CRUD mapped to DB                  | Transformations only (immutable)                                    |
| **Joins**                  | `JOIN` keyword                     | Eager/lazy loading                 | [`.join()` / `.leftJoin()` DSL](join.md)                            |
| **Grouping & aggregation** | `GROUP BY`                         | DB query with groupBy              | [`.groupBy().aggregate()`](groupBy.md)                              |
| **Filtering**              | `WHERE`                            | Criteria API / query DSL           | [`.filter { ... }`](filter.md)                                      |
| **Permissions**            | `GRANT` / `REVOKE`                 | DB-level permissions               | Not applicable                                                      |
| **Execution**              | On DB engine                       | On DB engine                       | In JVM process                                                      |
|
||||
|
||||
---
|
||||
|
||||
## 5. SQL → Kotlin DataFrame Cheatsheet
|
||||
|
||||
### DDL Analogues
|
||||
|
||||
| SQL DDL Command / Example                                                                                     | Kotlin DataFrame Equivalent                                                                  |
|---------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| **Create table:**<br>`CREATE TABLE person (name text, age int);`                                              | `@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}`       |
| **Add column:**<br>`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }`                                                          |
| **Rename column:**<br>`ALTER TABLE sales RENAME COLUMN old_name TO new_name;`                                 | `.rename { old_name }.into("new_name")`                                                      |
| **Drop column:**<br>`ALTER TABLE sales DROP COLUMN old_col;`                                                  | `.remove { old_col }`                                                                        |
| **Modify column type:**<br>`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;`                              | `.convert { amount }.to<Double>()`                                                           |
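
The DDL analogues above can be chained into one pipeline. The following is a sketch on hypothetical data that reuses the column names from the table; it uses the String API overloads (`rename("...")`, `remove("...")`, `convert("...")`) instead of generated properties, so it should compile in a plain Gradle project without the compiler plugin.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data matching the column names used in the table above
fun rawSales(): AnyFrame = dataFrameOf("old_name", "old_col", "amount", "revenue", "cost")(
    "a", "x", 10, 120.0, 80.0,
    "b", "y", 20, 200.0, 150.0,
)

fun reshape(df: AnyFrame): AnyFrame = df
    .add("profit") { "revenue"<Double>() - "cost"<Double>() } // ADD COLUMN
    .rename("old_name").into("new_name")                      // RENAME COLUMN
    .remove("old_col")                                        // DROP COLUMN
    .convert("amount").to<Double>()                           // ALTER COLUMN ... TYPE

fun main() {
    println(reshape(rawSales()).columnNames())
    // [new_name, amount, revenue, cost, profit]
}
```

Each step returns a new dataframe, so `rawSales()` itself is never altered, unlike an `ALTER TABLE` statement.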
|
||||
|
||||
---
|
||||
|
||||
### DML Analogues
|
||||
|
||||
| SQL DML Command / Example                                                                                                                              | Kotlin DataFrame Equivalent                          |
|--------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------|
| `SELECT col1, col2`                                                                                                                                    | `df.select { col1 and col2 }`                        |
| `WHERE amount > 100`                                                                                                                                   | `df.filter { amount > 100 }`                         |
| `ORDER BY amount DESC`                                                                                                                                 | `df.sortByDesc { amount }`                           |
| `GROUP BY region`                                                                                                                                      | `df.groupBy { region }`                              |
| `SUM(amount)`                                                                                                                                          | `.aggregate { sum { amount } }`                      |
| `JOIN`                                                                                                                                                 | `.join(otherDf) { id match right.id }`               |
| `LIMIT 5`                                                                                                                                              | `.take(5)`                                           |
| **Pivot:** <br>`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `.groupBy { region }.pivot { year }.sum { amount }`  |
| **Explode array column:** <br>`SELECT id, unnest(tags) AS tag FROM products;`                                                                          | `.explode { tags }`                                  |
| **Update column:** <br>`UPDATE sales SET amount = amount * 1.2;`                                                                                       | `.update { amount }.with { it * 1.2 }`               |
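
As a sketch of two rows from the table, the example below combines `explode` (the `unnest` analogue) with an inner `join`. The `products` and `prices` frames are hypothetical, and the String API overloads are used so that no generated properties are required.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

fun taggedPrices(): AnyFrame {
    // Hypothetical data: each product carries a list of tags
    val products = dataFrameOf("id", "tags")(
        1, listOf("new", "sale"),
        2, listOf("basic"),
    )
    val prices = dataFrameOf("id", "price")(
        1, 10.0,
        2, 20.0,
    )
    return products
        .explode("tags")    // one row per tag, like unnest(tags)
        .join(prices, "id") // inner join on the shared "id" column
}

fun main() {
    val result = taggedPrices()
    println(result.rowsCount()) // 3
    println(result)
}
```

Product `1` has two tags, so it contributes two rows after `explode`, and the join then attaches its price to both.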
|
||||
|
||||
## 6. Example: SQL vs. DataFrame Side-by-Side
|
||||
|
||||
**SQL (PostgreSQL):**

```sql
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY region
ORDER BY total DESC
LIMIT 5;
```

**Kotlin DataFrame:**

```kotlin
sales.filter { amount > 0 }
    .groupBy { region }
    .aggregate { sum { amount } into "total" }
    .sortByDesc { total }
    .take(5)
```
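
The Kotlin snippet above relies on generated extension properties (`amount`, `region`, `total`), which are available in Kotlin Notebook or with the compiler plugin. As a hedged, self-contained variant, the sketch below runs the same pipeline with the String API on hypothetical `sales` data; `sumOf` with a row expression stands in for `sum { amount }`.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; one negative amount is filtered out
fun sales(): AnyFrame = dataFrameOf("region", "amount")(
    "EU", 100.0,
    "US", 250.0,
    "EU", 50.0,
    "APAC", -10.0,
)

fun topRegions(df: AnyFrame): AnyFrame = df
    .filter { "amount"<Double>() > 0 }                       // WHERE amount > 0
    .groupBy("region")                                       // GROUP BY region
    .aggregate { sumOf { "amount"<Double>() } into "total" } // SUM(amount) AS total
    .sortByDesc("total")                                     // ORDER BY total DESC
    .take(5)                                                 // LIMIT 5

fun main() {
    println(topRegions(sales()))
}
```

With this input, the result should contain two rows: `US` with a total of 250.0, followed by `EU` with 150.0.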
|
||||
|
||||
## In Conclusion
|
||||
|
||||
- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe**
  and fully integrated into Kotlin.
|
||||
- The main focus is **readability** and schema change safety via
|
||||
the [Compiler Plugin](Compiler-Plugin.md).
|
||||
- It is neither a database nor an ORM: the Kotlin DataFrame library does not store data or manage transactions but works as an in-memory
  layer for analytics and transformations.
|
||||
- It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working
|
||||
with JSON-like structures and combining multiple data sources.
|
||||
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the
|
||||
JVM, while keeping your code easily refactorable and IDE-assisted.
|
||||
- Use Kotlin DataFrame for small and medium-sized datasets that fit in memory; for larger datasets, consider using a more
  **performant** database engine.
|
||||
|
||||
## What's Next?
|
||||
|
||||
If you're ready to go through a complete example, we recommend our **[Quickstart Guide](quickstart.md)**:
you'll learn the basics of reading data, transforming it, and creating visualizations step by step.
|
||||
|
||||
Ready to go deeper? Check out what’s next:
|
||||
|
||||
- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
|
||||
API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
|
||||
|
||||
- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.
|
||||
|
||||
- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).
|
||||
|
||||
- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
|
||||
and make working with your data both convenient and type-safe.
|
||||
|
||||
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
|
||||
for auto-generated column access in your IntelliJ IDEA projects.
|
||||
|
||||
- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations:
  see the [Kandy Documentation](https://kotlin.github.io/kandy).
|
||||
@@ -0,0 +1,120 @@
|
||||
# Guides And Examples
|
||||
|
||||
<web-summary>
|
||||
Browse a collection of guides and examples covering key features and real-world use cases of Kotlin DataFrame — from basics to advanced data analysis.
|
||||
</web-summary>
|
||||
|
||||
<card-summary>
|
||||
Explore Kotlin DataFrame with detailed user guides and real-world examples,
|
||||
showcasing practical use cases and data workflows.
|
||||
</card-summary>
|
||||
|
||||
<link-summary>
|
||||
A curated list of Kotlin DataFrame guides and examples that walk you through common operations and data analysis patterns step by step.
|
||||
</link-summary>
|
||||
|
||||
<!--- TODO: add more guides (migration from pandas and others) and replace GH notebooks with topics --->
|
||||
|
||||
## Guides
|
||||
|
||||
Explore our structured, in-depth guides to steadily improve your Kotlin DataFrame skills — step by step.
|
||||
|
||||
* [](quickstart.md) — get started with Kotlin DataFrame in a few simple steps:
|
||||
load data, transform it, and visualize it.
|
||||
|
||||
<img src="quickstart_preview.png" border-effect="rounded" width="705"/>
|
||||
|
||||
* [](Guide-for-backend-SQL-developers.md) — migration guide for backend developers with SQL/ORM experience moving to Kotlin DataFrame
|
||||
|
||||
* [](extensionPropertiesApi.md) — learn about extension properties for [`DataFrame`](DataFrame.md)
|
||||
and make working with your data both convenient and type-safe.
|
||||
|
||||
* [Enhanced Column Selection DSL](https://blog.jetbrains.com/kotlin/2024/07/enhanced-column-selection-dsl-in-kotlin-dataframe/)
|
||||
— explore powerful DSL for typesafe and flexible column selection in Kotlin DataFrame.
|
||||
* [](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md)
|
||||
— discover interactive Kotlin DataFrame outputs in
|
||||
[Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html).
|
||||
|
||||
<img src="ktnb_features_preview.png" border-effect="rounded" width="705"/>
|
||||
|
||||
* [40 Puzzles](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/puzzles/40%20puzzles.ipynb)
|
||||
— inspired by [100 pandas puzzles](https://github.com/ajcr/100-pandas-puzzles).
|
||||
An interactive guide that takes you from simple tasks to complex challenges,
|
||||
teaching you how to solve them using Kotlin DataFrame in a concise and elegant style.
|
||||
* [Reading from files: CSV, JSON, ApacheArrow](read.md)
|
||||
— read your data from various formats into `DataFrame`.
|
||||
* [SQL Databases Interaction](readSqlDatabases.md)
|
||||
— set up SQL database access and read query results efficiently into `DataFrame`.
|
||||
* [Custom SQL Database Support](readSqlFromCustomDatabase.md)
|
||||
— extend DataFrame library for custom SQL database support.
|
||||
* [GeoDataFrame Guide](https://kotlin.github.io/kandy/geo-plotting-guide.html)
|
||||
— explore the GeoDataFrame module that brings a convenient Kotlin DataFrame API to geospatial workflows,
|
||||
enhanced with beautiful Kandy-Geo visualizations (*experimental*).
|
||||
|
||||
|
||||
<img src="geoguide_preview.png" border-effect="rounded" width="705"/>
|
||||
|
||||
|
||||
* [Using Unsupported Data Sources](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples)
  — a guide by example. While these might one day become proper integrations of DataFrame, for now,
  we provide them as examples of how to build such integrations yourself.
    * [Apache Spark Interop (With and Without Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/spark)
    * [Multik Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik)
    * [JetBrains Exposed Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed)
    * [Hibernate ORM](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/hibernate)
|
||||
* [OpenAPI Guide](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/json/KeyValueAndOpenApi.ipynb)
|
||||
— learn how to parse and explore [OpenAPI](https://swagger.io) JSON structures using Kotlin DataFrame,
|
||||
enabling structured access and intuitive analysis of complex API schemas (*experimental*, supports OpenAPI 3.0.0).
|
||||
|
||||
## Examples
|
||||
|
||||
Explore our extensive collection of practical examples and real-world analytics workflows.
|
||||
|
||||
* [Kotlin DataFrame Compiler Plugin Gradle Example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-gradle-example)
|
||||
— a simple Gradle project demonstrating the usage of the [compiler plugin](Compiler-Plugin.md),
|
||||
showcasing DataFrame expressions with [extension properties](extensionPropertiesApi.md)
|
||||
that are generated on-the-fly in the IDEA project.
|
||||
|
||||
* [Kotlin DataFrame Compiler Plugin Maven Example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-maven-example)
|
||||
— a simple Maven project demonstrating the usage of the [compiler plugin](Compiler-Plugin.md),
|
||||
showcasing DataFrame expressions with [extension properties](extensionPropertiesApi.md)
|
||||
that are generated on-the-fly in the IDEA project.
|
||||
|
||||
* [Titanic Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/titanic/Titanic.ipynb)
|
||||
— explore the famous "Titanic" dataset with the Kotlin DataFrame analysis toolkit
and [Kandy](https://kotlin.github.io/kandy/) visualizations.
|
||||
|
||||
* [Track and Analyze GitHub Star Growth](https://blog.jetbrains.com/kotlin/2024/08/track-and-analyze-github-star-growth-with-kandy-and-kotlin-dataframe/)
|
||||
— query GitHub’s API with the Kotlin Notebook Ktor client,
|
||||
then analyze and visualize the data using Kotlin DataFrame and [Kandy](https://kotlin.github.io/kandy/).
|
||||
|
||||
* [GitHub Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/github/github.ipynb)
|
||||
— a practical example of working with deeply nested, hierarchical DataFrames using GitHub data.
|
||||
|
||||
* [Netflix Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/netflix/netflix.ipynb)
|
||||
— explore TV shows and movies from Netflix with the powerful Kotlin DataFrame API and beautiful
|
||||
[Kandy](https://kotlin.github.io/kandy/) visualizations.
|
||||
|
||||
* [Top-12 German Companies Financial Analysis](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/top_12_german_companies)
|
||||
— analyze key financial metrics for several major German companies.
|
||||
|
||||
* [Movies Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb)
|
||||
— basic Kotlin DataFrame operations on data from [movielens](https://movielens.org/).
|
||||
|
||||
* [YouTube Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/youtube/Youtube.ipynb)
|
||||
— explore YouTube videos with YouTube REST API and Kotlin DataFrame.
|
||||
|
||||
* [IMDb SQL Database Example](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/notebooks/imdb.ipynb)
|
||||
— analyze IMDb data stored in MariaDB using Kotlin DataFrame
|
||||
and visualize with [Kandy](https://kotlin.github.io/kandy/).
|
||||
|
||||
* [Reading Parquet files from Apache Spark](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/spark-parquet-dataframe)
|
||||
— this project showcases how to export data and ML models from Apache Spark by reading them from Parquet files.
[Kandy](https://kotlin.github.io/kandy/) is also used to visualize the exported data and the Linear Regression model.
|
||||
|
||||
See also [Kandy User Guides](https://kotlin.github.io/kandy/user-guide.html)
|
||||
and [Examples Gallery](https://kotlin.github.io/kandy/examples.html)
|
||||
for the best data visualizations using Kotlin DataFrame and Kandy together!
|
||||
|
||||
<img src="kandy_gallery_preview.png" border-effect="rounded" width="705"/>
|
||||
+108
@@ -0,0 +1,108 @@
|
||||
# Kotlin DataFrame Features in Kotlin Notebook
|
||||
|
||||
<web-summary>
|
||||
Discover how Kotlin DataFrame integrates with Kotlin Notebook for seamless interactive data analysis in IntelliJ IDEA.
|
||||
</web-summary>
|
||||
|
||||
<card-summary>
|
||||
Load, explore, and export your data interactively using Kotlin DataFrame in Kotlin Notebook.
|
||||
</card-summary>
|
||||
|
||||
<link-summary>
|
||||
Learn how to load, explore, drill into, export, and interact with data using Kotlin DataFrame in Kotlin Notebook.
|
||||
</link-summary>
|
||||
|
||||
|
||||
The [Kotlin Notebook Plugin for IntelliJ IDEA](https://plugins.jetbrains.com/plugin/16340-kotlin-notebook),
|
||||
combined with Kotlin DataFrame, offers powerful data analysis capabilities within an interactive environment.
|
||||
Here are the key features:
|
||||
|
||||
### Drag-and-Drop Data Files
|
||||
|
||||
You can quickly load data into a `DataFrame` in a notebook by simply dragging and dropping a file
(.csv/.json/.xlsx and .geojson/.shp) directly into the notebook editor:
|
||||
|
||||
<video src="ktnb_drag_n_drop.mp4" controls=""/>
|
||||
|
||||
### Visual Data Exploration
|
||||
**Page through your data**:
|
||||
The pagination feature lets you move through your data one page at a time, making it possible to view large datasets.
|
||||
|
||||
**Sort by column with a single click**:
|
||||
You can sort any column with a click.
|
||||
This is a convenient alternative to using `sortBy` in separate cells.
|
||||
|
||||
**Go straight to the data you need**:
|
||||
You can jump directly to a particular row or column if you want something specific.
|
||||
This makes working with large datasets more straightforward.
|
||||
|
||||
|
||||
<video src="https://github.com/user-attachments/assets/aeae1c79-9755-4558-bac4-420bf1331f39" controls=""/>
|
||||
|
||||
|
||||
### Drill down into nested data
|
||||
When your data has multiple layers, like a table within a table,
|
||||
you can now click on a cell containing a nested table to view these details directly.
|
||||
This makes it easy to go deeper into your data and then return to where you were.
|
||||
|
||||
|
||||
<video src="https://github.com/user-attachments/assets/ef9509be-e19b-469c-9bad-0ce81eec36b0" controls=""/>
|
||||
|
||||
|
||||
### Visualize multiple tables via tabs
|
||||
You can open and visualize multiple tables in separate tabs.
|
||||
This feature is tailored to those who need to compare, contrast, or monitor different datasets simultaneously.
|
||||
|
||||
|
||||
<video src="https://github.com/user-attachments/assets/51b7a6e3-0187-49b3-bf5e-0c4d60f8b769" controls=""/>
|
||||
|
||||
|
||||
### Exporting to files
|
||||
|
||||
You can export data directly from the dataframe into various file formats.
|
||||
This simplifies sharing and further analysis.
|
||||
The interface supports exporting data to JSON for web applications,
|
||||
CSV for spreadsheet tools, and XML for data interchange.
|
||||
|
||||
|
||||
<video src="https://github.com/user-attachments/assets/ec28c59a-1555-44ce-98f6-a60d8feae347" controls=""/>
|
||||
|
||||
|
||||
### Convenient copying of data from tables
|
||||
You can click and drag to select the data you need,
|
||||
or you can use keyboard shortcuts for quicker selection
|
||||
and then copy what’s needed with a simple right-click or another shortcut.
|
||||
It’s designed to feel intuitive,
|
||||
like copying text from a document, but with the structure and format of your data preserved.
|
||||
|
||||
|
||||
<video src="https://github.com/user-attachments/assets/88e53dfb-361f-40f8-bffb-52a512cdd3cd" controls=""/>
|
||||
|
||||
### Rendering of images in the cell
|
||||
|
||||
The table widget can render `BufferedImage`s.
|
||||
Given a column of images, right-click on the cell and click `View Image` in the context menu.
|
||||
|
||||

|
||||
|
||||
### Clickable URI links
|
||||
|
||||
String values starting with `http://`, `https://`, or `file:/` are treated as clickable links that open, for example, your browser or file manager.
|
||||
Click on the cell to trigger a toolbar to appear.
|
||||
|
||||

|
||||
|
||||
Clicking on `Open URL` or `Open File URI` for the first time triggers a notification with a link to `Settings` → `URL Click Settings`.
|
||||
Choose what protocols should be allowed.
|
||||
|
||||

|
||||
|
||||
To get started, ensure you have the latest version of the Kotlin Notebook Plugin installed in IntelliJ IDEA,
|
||||
and begin exploring your data using Kotlin DataFrame in your notebook cells.
|
||||
|
||||
## Related documentation
|
||||
|
||||
- [Kotlin for Data Analysis in notebooks](https://kotlinlang.org/docs/kotlin-notebook-overview.html):
|
||||
Learn more about Kotlin Notebook capabilities for data analysis.
|
||||
- [Kotlin Notebooks in IntelliJ IDEA](https://www.jetbrains.com/help/idea/kotlin-notebook.html):
|
||||
Detailed documentation on working with Kotlin Notebooks in the IDE.
|
||||
@@ -0,0 +1,361 @@
|
||||
# Quickstart Guide
|
||||
|
||||
<web-summary>
|
||||
Get started with Kotlin DataFrame in a few simple steps: load data, transform it, and visualize it — all in an interactive Kotlin Notebook.
|
||||
</web-summary>
|
||||
|
||||
<card-summary>
|
||||
Get started with Kotlin DataFrame right away: integrate it seamlessly, then load, process, analyze, and visualize some data!
|
||||
</card-summary>
|
||||
|
||||
<link-summary>
|
||||
Learn the basics of Kotlin DataFrame: reading data, applying transformations, and building plots — with full interactivity in Kotlin Notebook.
|
||||
</link-summary>
|
||||
|
||||
This guide shows how to quickly get started with **Kotlin DataFrame**:
|
||||
you'll learn how to load data, perform basic transformations, and build a simple plot using Kandy.
|
||||
|
||||
We recommend [starting with **Kotlin Notebook**](SetupKotlinNotebook.md) for the best beginner experience —
|
||||
everything works out of the box,
|
||||
including interactivity and rich DataFrame and plots rendering.
|
||||
You can instantly see the results of each operation: view the contents of your DataFrames after every transformation,
|
||||
inspect individual rows and columns, and explore data step-by-step in a live and interactive way.
|
||||
|
||||
You can view this guide as a
|
||||
[notebook on GitHub](https://github.com/Kotlin/dataframe/tree/master/examples/notebooks/quickstart/quickstart.ipynb)
|
||||
or download <resource src="quickstart.ipynb"></resource>.
|
||||
|
||||
|
||||
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.guides.QuickStartGuide-->
|
||||
|
||||
To start working with Kotlin DataFrame in a notebook, run a cell with the following code:
|
||||
|
||||
```kotlin
|
||||
%useLatestDescriptors
|
||||
%use dataframe
|
||||
```
|
||||
|
||||
This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame
|
||||
rendering. Learn more [here](SetupKotlinNotebook.md#integrate-kotlin-dataframe).
|
||||
|
||||
## Read DataFrame
|
||||
|
||||
Kotlin DataFrame supports many popular data formats, including CSV, JSON, and Excel, as well as reading from various
databases. Read a CSV file with the "JetBrains Repositories" dataset into the `df` variable:
|
||||
|
||||
<!---FUN notebook_test_quickstart_2-->
|
||||
|
||||
```kotlin
|
||||
val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
## Display And Explore
|
||||
|
||||
To display your dataframe as a cell output, place it in the last line of the cell:
|
||||
|
||||
<!---FUN notebook_test_quickstart_3-->
|
||||
|
||||
```kotlin
|
||||
df
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
<inline-frame src="./resources/notebook_test_quickstart_3.html" width="705px" height="500px"></inline-frame>
|
||||
|
||||
Kotlin Notebook has special interactive outputs for `DataFrame`. Learn more about them in [](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md).
|
||||
|
||||
Use the `.describe()` method to get a summary of the dataset: column types, number of nulls, and simple statistics.
|
||||
|
||||
<!---FUN notebook_test_quickstart_4-->
|
||||
|
||||
```kotlin
|
||||
df.describe()
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
<inline-frame src="./resources/notebook_test_quickstart_4.html" width="705px" height="500px"></inline-frame>
|
||||
|
||||
## Select Columns
|
||||
|
||||
Kotlin DataFrame features a typesafe Columns Selection DSL, enabling flexible and safe selection of any combination of
|
||||
columns.
|
||||
Column selectors are widely used across operations. One of the simplest examples is `.select { }`, which returns a new
DataFrame with only the columns chosen in the Columns Selection expression.
|
||||
|
||||
*After executing the cell* where a `DataFrame` variable is declared,
|
||||
[extension properties](extensionPropertiesApi.md) for its columns are automatically generated.
|
||||
These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.
|
||||
|
||||
Select some columns:
|
||||
|
||||
<!---FUN notebook_test_quickstart_5-->
|
||||
|
||||
```kotlin
|
||||
// Select "full_name", "stargazers_count" and "topics" columns
|
||||
val dfSelected = df.select { full_name and stargazers_count and topics }
|
||||
dfSelected
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
<inline-frame src="./resources/notebook_test_quickstart_5.html" width="705px" height="500px"></inline-frame>
|
||||
|
||||
> With a [Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md) enabled,
|
||||
> you can use auto-generated properties in your IntelliJ IDEA projects.
|
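
Outside a notebook and without the compiler plugin, columns can still be selected by name. The sketch below assumes a small hypothetical dataframe with the same column names as the dataset above (the values are invented) and uses the `select` overload that takes column names as strings.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical rows that mimic the shape of the repositories dataset
fun repos(): AnyFrame = dataFrameOf("full_name", "stargazers_count", "topics")(
    "JetBrains/kotlin", 40000, "[kotlin, compiler]",
    "JetBrains/ideavim", 7000, "[vim, ide]",
)

// Select columns by name; a misspelled name fails at runtime, not at compile time
fun selected(df: AnyFrame): AnyFrame = df.select("full_name", "stargazers_count")

fun main() {
    println(selected(repos()).columnNames()) // [full_name, stargazers_count]
}
```

The trade-off is that string names are only checked when the code runs, which is exactly the gap the generated extension properties are meant to close.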
||||
|
||||
## Row Filtering
|
||||
|
||||
Some operations use the [DataRow API](DataRow.md), with expressions and conditions
that apply to each `DataFrame` row.
For example, `.filter { }` returns a new `DataFrame` containing only the rows that satisfy the condition given by a row expression.
|
||||
|
||||
Inside a row expression, you can access the values of the current row by column name through auto-generated properties.
This is similar to the [Columns Selection DSL](ColumnSelectors.md),
but here the properties represent actual values, not column references.
|
||||
|
||||
Filter rows by "stargazers_count" value:
|
||||
|
||||
<!---FUN notebook_test_quickstart_6-->
|
||||
|
||||
```kotlin
|
||||
// Keep only rows where "stargazers_count" value is at least 1000
|
||||
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
|
||||
dfFiltered
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
<inline-frame src="./resources/notebook_test_quickstart_6.html" width="705px" height="500px"></inline-frame>
|
||||
|
||||
## Columns Rename
|
||||
|
||||
Columns can be renamed using the `.rename { }` operation, which also uses the Columns Selection DSL to select a column
|
||||
to rename.
|
||||
The `rename` operation does not perform the renaming immediately; instead, it creates an intermediate object that must
|
||||
be finalized into a new `DataFrame` by calling the `.into()` function with the new column name.
|
||||
|
||||
Rename "full_name" and "stargazers_count" columns:
|
||||
|
||||
<!---FUN notebook_test_quickstart_7-->
|
||||
|
||||
```kotlin
|
||||
// Rename "full_name" column into "name"
val dfRenamed = dfFiltered.rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
|
||||
dfRenamed
|
||||
```
|
||||
|
||||
<!---END-->
|
||||
|
||||
<inline-frame src="./resources/notebook_test_quickstart_7.html" width="705px" height="500px"></inline-frame>
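
The two-step shape of `rename` (select first, then finalize with `into`) also allows several columns to be renamed in one call. The sketch below assumes the String overloads of `rename` and `into`. The helper names `rawRepos` and `renameRepoColumns` and the single data row are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical one-row frame with the two columns renamed in the cell above.
fun rawRepos(): DataFrame<*> =
    dataFrameOf("full_name", "stargazers_count")("JetBrains/kotlin", 45000)

// `rename(...)` only selects the columns; `into(...)` supplies the new names
// in the same order and produces the new DataFrame.
fun renameRepoColumns(df: DataFrame<*>): DataFrame<*> =
    df.rename("full_name", "stargazers_count").into("name", "starsCount")

fun main() {
    println(renameRepoColumns(rawRepos()).columnNames())
}
```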

## Modify Columns

Columns can be modified using the `update { }` and `convert { }` operations.
Both operations select columns to modify via the Columns Selection DSL and, similar to `rename`, create an intermediate
object that must be finalized to produce a new `DataFrame`.

The `update` operation preserves the original column types, while `convert` allows changing the type.
In both cases, column names and their positions remain unchanged.

Update "name" and convert "topics":

<!---FUN notebook_test_quickstart_8-->

```kotlin
val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removePrefix("[").removeSuffix("]").split(", ") }
dfUpdated
```

<!---END-->

<inline-frame src="./resources/notebook_test_quickstart_8.html" width="705px" height="500px"></inline-frame>
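
The difference between `update` and `convert` can be sketched on a small hand-built frame using the String API. This is an illustration under assumptions, not the notebook's code: the helper names `renamedSample` and `modifyRepos` and the sample row are hypothetical, and the columns are selected with the `"column"<Type>()` form rather than generated properties.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample row; "topics" is a bracketed string, as in the dataset above.
fun renamedSample(): DataFrame<*> =
    dataFrameOf("name", "topics")("JetBrains/kotlin", "[kotlin, compiler]")

fun modifyRepos(df: DataFrame<*>): DataFrame<*> =
    df
        // `update` keeps the column type: String in, String out.
        .update { "name"<String>() }.with { it.substringAfter("/") }
        // `convert` may change the column type: String in, List<String> out.
        .convert { "topics"<String>() }.with { it.removeSurrounding("[", "]").split(", ") }

fun main() {
    val result = modifyRepos(renamedSample())
    println(result["name"].type())
    println(result["topics"].type())
}
```

Returning a non-`String` value from the `update` lambda here would not compile, which is how the operation preserves the column type.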

Check the new type of the "topics" column in `dfUpdated`:

<!---FUN notebook_test_quickstart_9-->

```kotlin
dfUpdated.topics.type()
```

<!---END-->

Output:

```
kotlin.collections.List<kotlin.String>
```

## Adding New Columns

The `.add { }` function allows creating a `DataFrame` with a new column, where the value for each row is computed based
on the existing values in that row. These values can be accessed within the row expressions.

Add a new `Boolean` column "isIntellij":

<!---FUN notebook_test_quickstart_10-->

```kotlin
// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij
```

<!---END-->

<inline-frame src="./resources/notebook_test_quickstart_10.html" width="705px" height="500px"></inline-frame>
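
To see how the row expression is evaluated once per row, the same computed column can be sketched on a few hand-written rows. The helper names `updatedSample` and `withIsIntellij` and the sample names and topics are hypothetical, chosen so that each branch of the condition is exercised.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample rows: one matches by name, one not at all, one by topic.
fun updatedSample(): DataFrame<*> =
    dataFrameOf("name", "topics")(
        "intellij-community", listOf("ide"),
        "kotlin", listOf("kotlin", "compiler"),
        "ideavim", listOf("intellij", "vim"),
    )

// The lambda runs once per row; "name" and "topics" refer to that row's values.
fun withIsIntellij(df: DataFrame<*>): DataFrame<*> =
    df.add("isIntellij") {
        "name"<String>().contains("intellij") || "intellij" in "topics"<List<String>>()
    }

fun main() {
    println(withIsIntellij(updatedSample()))
}
```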

## Grouping And Aggregating

A `DataFrame` can be grouped by column keys, meaning its rows are split into groups based on the values in the key
columns.
The `.groupBy { }` operation selects columns and groups the `DataFrame` by their values, using them as grouping keys.

The result is a `GroupBy`, a `DataFrame`-like structure that associates each key with the corresponding subset of the
original `DataFrame`.

Group `dfWithIsIntellij` by "isIntellij":

<!---FUN notebook_test_quickstart_11-->

```kotlin
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij
```

<!---END-->

<inline-frame src="./resources/notebook_test_quickstart_11.html" width="705px" height="500px"></inline-frame>

A `GroupBy` can be aggregated: you can compute one or several summary statistics for each group.
The result of the aggregation is a `DataFrame` containing the key columns along with new columns holding the computed
statistics for the corresponding group.

For example, `count()` computes the size of each group:

<!---FUN notebook_test_quickstart_12-->

```kotlin
groupedByIsIntellij.count()
```

<!---END-->

<inline-frame src="./resources/notebook_test_quickstart_12.html" width="705px" height="500px"></inline-frame>

Compute several statistics at once with `.aggregate { }`, which takes an expression describing the aggregations:

<!---FUN notebook_test_quickstart_13-->

```kotlin
groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
```

<!---END-->

<inline-frame src="./resources/notebook_test_quickstart_13.html" width="705px" height="500px"></inline-frame>
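
For readers coming from SQL, the grouping and aggregation steps above roughly correspond to a `GROUP BY` query. The sketch below combines `count`, `sumOf`, and `maxOf` in a single `aggregate` call using the String API. The helper names `groupingSample` and `starsByGroup`, the `"count"` column name, and the data values are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; the star counts are made up for illustration.
fun groupingSample(): DataFrame<*> =
    dataFrameOf("name", "starsCount", "isIntellij")(
        "intellij-community", 16000, true,
        "ideavim", 8000, true,
        "kotlin", 45000, false,
    )

// Roughly comparable to:
//   SELECT isIntellij, COUNT(*), SUM(starsCount), MAX(starsCount)
//   FROM repos GROUP BY isIntellij
fun starsByGroup(df: DataFrame<*>): DataFrame<*> =
    df.groupBy("isIntellij").aggregate {
        count() into "count"
        sumOf { "starsCount"<Int>() } into "sumStars"
        maxOf { "starsCount"<Int>() } into "maxStars"
    }

fun main() {
    println(starsByGroup(groupingSample()))
}
```

Each statement inside `aggregate` is evaluated on one group at a time, and `into` names the resulting column.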

## Sorting Rows

`.sortBy { }` and `.sortByDesc { }` sort rows by the values in the selected columns, returning a `DataFrame` with
sorted rows. `take(n)` returns a new `DataFrame` with the first `n` rows.

Combine them to get the top 10 repositories by number of stars:

<!---FUN notebook_test_quickstart_14-->

```kotlin
val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }.take(10)
dfTop10
```

<!---END-->

<inline-frame src="./resources/notebook_test_quickstart_14.html" width="705px" height="500px"></inline-frame>
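
The `sortByDesc` plus `take` combination plays the role of `ORDER BY ... DESC LIMIT n` in SQL. A minimal sketch with the String API follows. The helper names `sortingSample` and `topByStars` and the data values are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, deliberately unsorted.
fun sortingSample(): DataFrame<*> =
    dataFrameOf("name", "starsCount")(
        "ideavim", 8000,
        "kotlin", 45000,
        "intellij-community", 16000,
    )

// Roughly comparable to: SELECT * FROM repos ORDER BY starsCount DESC LIMIT n
fun topByStars(df: DataFrame<*>, n: Int): DataFrame<*> =
    df.sortByDesc("starsCount").take(n)

fun main() {
    println(topByStars(sortingSample(), 2))
}
```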

## Plotting With Kandy

Kandy is a Kotlin plotting library designed to bring Kotlin DataFrame features into chart creation, providing a
convenient and type-safe way to build data visualizations.

Kandy can be loaded into a notebook using `%use kandy`:

```kotlin
%use kandy
```

Build a simple bar chart with the `.plot { }` extension for `DataFrame`, which lets you use extension properties inside
the Kandy plotting DSL (the plot is rendered as the cell output after execution):

<!---FUN notebook_test_quickstart_16-->

```kotlin
dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }

    layout.title = "Top 10 JetBrains repositories by stars count"
}
```

<!---END-->



## Write DataFrame

A `DataFrame` supports writing to all formats that it is capable of reading.

Write into Excel:

<!---FUN notebook_test_quickstart_17-->

```kotlin
dfWithIsIntellij.writeExcel("jb_repos.xlsx")
```

<!---END-->

## What's Next?

In this quickstart, we covered the basics: reading data, transforming it, and building a simple visualization.
Ready to go deeper? Check out what’s next:

- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
  API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.

- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.

- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).

- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
  and make working with your data both convenient and type-safe.

- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
  for auto-generated column access in your IntelliJ IDEA projects.

- 📊 **Master Kandy** for expressive DataFrame visualizations by exploring the
  [Kandy Documentation](https://kotlin.github.io/kandy).