init research

This commit is contained in:
2026-02-08 11:20:43 -10:00
commit bdf064f54d
3041 changed files with 1592200 additions and 0 deletions
@@ -0,0 +1,245 @@
# Kotlin DataFrame for SQL & Backend Developers
<web-summary>
Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results — directly within a Kotlin Notebook.
</web-summary>
<card-summary>
Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data — fully interactive and type-safe!
</card-summary>
<link-summary>
Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook.
</link-summary>
This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar
SQL and ORM operations to DataFrame concepts.
If you plan to work on a Gradle project without a Kotlin Notebook,
we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since version 2.2.*).
This plugin generates type-safe schemas at compile time,
tracking schema changes throughout your data pipeline.
## Add Kotlin DataFrame Gradle dependency
You can read more about setting up the Gradle build in the [Gradle Setup Guide](SetupGradle.md).
In your Gradle build file (`build.gradle` or `build.gradle.kts`), add the Kotlin DataFrame library as a dependency:
<tabs>
<tab title="Kotlin DSL">
```kotlin
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}
```
</tab>
<tab title="Groovy DSL">
```groovy
dependencies {
    implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
}
```
</tab>
</tabs>
---
## 1. What is a dataframe?
If you're used to SQL, a **dataframe** is conceptually like a **table**:
- **Rows**: ordered records of data
- **Columns**: named, typed fields
- **Schema**: a mapping of column names to types
Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) —
columns can contain *[nested dataframes](DataColumn.md#framecolumn)* or *column groups*,
allowing you to represent and transform tree-like structures without flattening.
Unlike a relational DB table:
- A DataFrame object **lives in memory**: there's no storage engine or transaction log
- It's **immutable**: each operation produces a *new* DataFrame
- There is **no concept of foreign keys or relations** between DataFrames
- It can be created from
*any* [source](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md),
in-memory objects
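
The immutability point can be checked with a few lines. The following is a minimal sketch, assuming the `dataframe` dependency from the setup section is on the classpath. The `people` data and its column names are made up for illustration, and columns are addressed by name (String API) so the snippet does not rely on generated extension properties.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical in-memory data; names and values are illustrative only.
val people = dataFrameOf("name", "age")(
    "Alice", 30,
    "Bob", 17,
)

// filter returns a new DataFrame; `people` itself is left unchanged.
val adults = people.filter { "age"<Int>() >= 18 }

fun main() {
    println(people.rowsCount()) // prints 2
    println(adults.rowsCount()) // prints 1
}
```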
---
## 2. Reading Data From SQL
Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.
| Approach | Example |
|----------------------------------|---------------------------------------------------------------------|
| **From a table** | `val df = DataFrame.readSqlTable(dbConfig, "customers")` |
| **From a SQL query** | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` |
| **From a JDBC Connection** | `val df = connection.readDataFrame("SELECT * FROM orders")` |
| **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)` |
```kotlin
import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.*

val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/mydb",
    user = "postgres",
    password = "secret",
)

// Table
val customers = DataFrame.readSqlTable(dbConfig, "customers")

// Query
val salesByRegion = DataFrame.readSqlQuery(
    dbConfig,
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    """.trimIndent(),
)

// From a JDBC connection (opened here with the standard java.sql API)
val connection = DriverManager.getConnection(dbConfig.url, dbConfig.user, dbConfig.password)
val orders = connection.readDataFrame("SELECT * FROM orders")

// From a ResultSet
val rs = connection.createStatement().executeQuery("SELECT * FROM orders")
val ordersFromResultSet = rs.readDataFrame(connection)
```
More information can be found [here](readSqlDatabases.md).
## 3. Why It's Not an ORM
Frameworks like **[Hibernate](https://hibernate.org/orm/)** or **[Exposed](https://github.com/JetBrains/Exposed)**:
- Map DB tables to Kotlin objects (entities)
- Track object changes and sync them back to the database
- Focus on **persistence** and **transactions**
Kotlin DataFrame:
- Has no persistence layer
- Doesn't try to map rows to mutable entities
- Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines**
- The **main idea** is that the schema *changes together with your transformations*, and the
[**Compiler Plugin**](Compiler-Plugin.md) updates the type-safe API automatically under the hood.
- You don't have to manually define or recreate schemas every time; the plugin infers them dynamically from the data or
transformations.
- In ORMs, the mapping layer is **frozen** — schema changes require manual model edits and migrations.
Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM.
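
To make "the schema changes together with your transformations" concrete, here is a small sketch that prints the schema before and after a transformation. It assumes only the core library; the `orders` data and the `isLarge` column are hypothetical, and columns are accessed by name rather than through plugin-generated properties.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical order data, used only for illustration.
val orders = dataFrameOf("id", "amount")(
    1, 120,
    2, 80,
)

// Adding a column yields a new DataFrame whose schema has one more column;
// no entity class or migration has to be edited by hand.
val enriched = orders.add("isLarge") { "amount"<Int>() > 100 }

fun main() {
    println(orders.schema())   // lists id and amount
    println(enriched.schema()) // lists id, amount and isLarge
}
```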
---
## 4. Key Differences from SQL & ORMs
| Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame |
|----------------------------|------------------------------------|------------------------------------|---------------------------------------------------------------------|
| **Storage** | Persistent | Persistent | In-memory only |
| **Schema definition** | `CREATE TABLE` DDL | Defined in entity classes | Derived from data or transformations or defined manually |
| **Schema change** | `ALTER TABLE` | Manual migration of entity classes | Automatic via transformations + Compiler Plugin or defined manually |
| **Relations** | Foreign keys | Mapped via annotations | Not applicable |
| **Transactions** | Yes | Yes | Not applicable |
| **DB Indexes** | Yes | Yes (via DB) | Not applicable |
| **Data manipulation** | SQL DML (`INSERT`, `UPDATE`) | CRUD mapped to DB | Transformations only (immutable) |
| **Joins** | `JOIN` keyword | Eager/lazy loading | [`.join()` / `.leftJoin()` DSL](join.md) |
| **Grouping & aggregation** | `GROUP BY` | DB query with groupBy | [`.groupBy().aggregate()`](groupBy.md) |
| **Filtering** | `WHERE` | Criteria API / query DSL | [`.filter { ... }`](filter.md) |
| **Permissions** | `GRANT` / `REVOKE` | DB-level permissions | Not applicable |
| **Execution** | On DB engine | On DB engine | In JVM process |
---
## 5. SQL → Kotlin DataFrame Cheatsheet
### DDL Analogues
| SQL DDL Command / Example | Kotlin DataFrame Equivalent |
|---------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| **Create table:**<br>`CREATE TABLE person (name text, age int);` | `@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` |
| **Add column:**<br>`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }` |
| **Rename column:**<br>`ALTER TABLE sales RENAME COLUMN old_name TO new_name;` | `.rename { old_name }.into("new_name")` |
| **Drop column:**<br>`ALTER TABLE sales DROP COLUMN old_col;` | `.remove { old_col }` |
| **Modify column type:**<br>`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;` | `.convert { amount }.to<Double>()` |
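
The right-hand column above uses extension properties such as `revenue` and `cost`, which exist only in a Kotlin Notebook or with the compiler plugin. As a rough sketch of the same operations in a plain project, the snippet below uses the String API instead. The `sales` data is made up, and the chain is only one possible ordering of the operations.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sales data; column names mirror the table above.
val sales = dataFrameOf("revenue", "cost", "old_name")(
    100, 60, "a",
    80, 50, "b",
)

val reshaped = sales
    .add("profit") { "revenue"<Int>() - "cost"<Int>() } // ADD COLUMN ... GENERATED
    .rename("old_name").into("new_name")                // RENAME COLUMN
    .convert("revenue").to<Double>()                    // ALTER COLUMN ... TYPE
    .remove("cost")                                     // DROP COLUMN

fun main() {
    println(reshaped)
}
```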
---
### DML Analogues
| SQL DML Command / Example | Kotlin DataFrame Equivalent |
|--------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|
| `SELECT col1, col2` | `df.select { col1 and col2 }` |
| `WHERE amount > 100` | `df.filter { amount > 100 }` |
| `ORDER BY amount DESC` | `df.sortByDesc { amount }` |
| `GROUP BY region` | `df.groupBy { region }` |
| `SUM(amount)` | `.aggregate { sum { amount } }` |
| `JOIN` | `.join(otherDf) { id match right.id }` |
| `LIMIT 5` | `.take(5)` |
| **Pivot:** <br>`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `.groupBy { region }.pivot { year }.sum { amount }` |
| **Explode array column:** <br>`SELECT id, unnest(tags) AS tag FROM products;` | `.explode { tags }` |
| **Update column:** <br>`UPDATE sales SET amount = amount * 1.2;` | `.update { amount }.with { it * 1.2 }` |
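
The `JOIN` row is the one that differs most from SQL in practice, so here is a minimal sketch of an inner join. It assumes both frames share a key column with the same name and passes that name as a string; the `customers` and `orders` data are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical data for illustration.
val customers = dataFrameOf("customer_id", "name")(
    1, "Alice",
    2, "Bob",
)
val orders = dataFrameOf("customer_id", "amount")(
    1, 120,
    1, 80,
    2, 40,
)

// Inner join on the column both frames share, comparable to:
// SELECT * FROM customers JOIN orders USING (customer_id)
val joined = customers.join(orders, "customer_id")

fun main() {
    println(joined)
}
```

When the key columns have different names, the `{ id match right.id }` form from the table is the one to reach for.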
## 6. Example: SQL vs. DataFrame Side-by-Side
**SQL (PostgreSQL):**
```sql
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY region
ORDER BY total DESC LIMIT 5;
```
**Kotlin DataFrame:**
```kotlin
sales.filter { amount > 0 }
    .groupBy { region }
    .aggregate { sum { amount } into "total" }
    .sortByDesc { total }
    .take(5)
```
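
The snippet above relies on generated properties (`amount`, `region`, `total`). For a version that can be pasted into a plain Kotlin file, the following sketch runs the same pipeline with the String API on made-up in-memory data. The `sales` rows are hypothetical and stand in for the SQL table.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical stand-in for the `sales` table.
val sales = dataFrameOf("region", "amount")(
    "EU", 100,
    "EU", 50,
    "US", 200,
    "APAC", -10,
)

val topRegions = sales
    .filter { "amount"<Int>() > 0 }                      // WHERE amount > 0
    .groupBy("region")                                   // GROUP BY region
    .aggregate { sumOf { "amount"<Int>() } into "total" } // SUM(amount) AS total
    .sortByDesc("total")                                 // ORDER BY total DESC
    .take(5)                                             // LIMIT 5

fun main() {
    println(topRegions)
}
```

The APAC row is dropped by the filter, so the result has one row per remaining region, ordered by `total`.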
## In Conclusion
- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it
**type-safe** and fully integrated into Kotlin.
- The main focus is **readability** and schema change safety via
the [Compiler Plugin](Compiler-Plugin.md).
- It is neither a database nor an ORM: the Kotlin DataFrame library does not store data or manage transactions but works as an in-memory
layer for analytics and transformations.
- It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working
with JSON-like structures and combining multiple data sources.
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the
JVM, while keeping your code easily refactorable and IDE-assisted.
- Use Kotlin DataFrame for small and medium-sized datasets; for large datasets, consider using a more
**performant** database engine.
## What's Next?
If you're ready to go through a complete example, we recommend our **[Quickstart Guide](quickstart.md)**,
where you'll learn the basics of reading data, transforming it, and creating visualizations step by step.
Ready to go deeper? Check out what's next:
- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.
- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).
- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
and make working with your data both convenient and type-safe.
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
for auto-generated column access in your IntelliJ IDEA projects.
- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations with the
[Kandy Documentation](https://kotlin.github.io/kandy).
@@ -0,0 +1,120 @@
# Guides And Examples
<web-summary>
Browse a collection of guides and examples covering key features and real-world use cases of Kotlin DataFrame — from basics to advanced data analysis.
</web-summary>
<card-summary>
Explore Kotlin DataFrame with detailed user guides and real-world examples,
showcasing practical use cases and data workflows.
</card-summary>
<link-summary>
A curated list of Kotlin DataFrame guides and examples that walk you through common operations and data analysis patterns step by step.
</link-summary>
<!--- TODO: add more guides (migration from pandas and others) and replace GH notebooks with topics --->
## Guides
Explore our structured, in-depth guides to steadily improve your Kotlin DataFrame skills — step by step.
* [](quickstart.md) — get started with Kotlin DataFrame in a few simple steps:
load data, transform it, and visualize it.
<img src="quickstart_preview.png" border-effect="rounded" width="705"/>
* [](Guide-for-backend-SQL-developers.md) — migration guide for backend developers with SQL/ORM experience moving to Kotlin DataFrame
* [](extensionPropertiesApi.md) — learn about extension properties for [`DataFrame`](DataFrame.md)
and make working with your data both convenient and type-safe.
* [Enhanced Column Selection DSL](https://blog.jetbrains.com/kotlin/2024/07/enhanced-column-selection-dsl-in-kotlin-dataframe/)
— explore a powerful DSL for typesafe and flexible column selection in Kotlin DataFrame.
* [](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md)
— discover interactive Kotlin DataFrame outputs in
[Kotlin Notebook](https://kotlinlang.org/docs/kotlin-notebook-overview.html).
<img src="ktnb_features_preview.png" border-effect="rounded" width="705"/>
* [40 Puzzles](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/puzzles/40%20puzzles.ipynb)
— inspired by [100 pandas puzzles](https://github.com/ajcr/100-pandas-puzzles).
An interactive guide that takes you from simple tasks to complex challenges,
teaching you how to solve them using Kotlin DataFrame in a concise and elegant style.
* [Reading from files: CSV, JSON, ApacheArrow](read.md)
— read your data from various formats into `DataFrame`.
* [SQL Databases Interaction](readSqlDatabases.md)
— set up SQL database access and read query results efficiently into `DataFrame`.
* [Custom SQL Database Support](readSqlFromCustomDatabase.md)
— extend the DataFrame library for custom SQL database support.
* [GeoDataFrame Guide](https://kotlin.github.io/kandy/geo-plotting-guide.html)
— explore the GeoDataFrame module that brings a convenient Kotlin DataFrame API to geospatial workflows,
enhanced with beautiful Kandy-Geo visualizations (*experimental*).
<img src="geoguide_preview.png" border-effect="rounded" width="705"/>
* [Using Unsupported Data Sources](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples)
— a guide by examples. While these might one day become proper integrations of DataFrame, for now,
we provide them as examples of how to make such integrations yourself.
    * [Apache Spark Interop (With and Without Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/spark)
    * [Multik Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik)
    * [JetBrains Exposed Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed)
    * [Hibernate ORM](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/hibernate)
* [OpenAPI Guide](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/json/KeyValueAndOpenApi.ipynb)
— learn how to parse and explore [OpenAPI](https://swagger.io) JSON structures using Kotlin DataFrame,
enabling structured access and intuitive analysis of complex API schemas (*experimental*, supports OpenAPI 3.0.0).
## Examples
Explore our extensive collection of practical examples and real-world analytics workflows.
* [Kotlin DataFrame Compiler Plugin Gradle Example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-gradle-example)
— a simple Gradle project demonstrating the usage of the [compiler plugin](Compiler-Plugin.md),
showcasing DataFrame expressions with [extension properties](extensionPropertiesApi.md)
that are generated on-the-fly in the IDEA project.
* [Kotlin DataFrame Compiler Plugin Maven Example](https://github.com/Kotlin/dataframe/blob/master/examples/kotlin-dataframe-plugin-maven-example)
— a simple Maven project demonstrating the usage of the [compiler plugin](Compiler-Plugin.md),
showcasing DataFrame expressions with [extension properties](extensionPropertiesApi.md)
that are generated on-the-fly in the IDEA project.
* [Titanic Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/titanic/Titanic.ipynb)
— discover the famous "Titanic"
dataset with the Kotlin DataFrame analysis toolkit
and [Kandy](https://kotlin.github.io/kandy/) visualizations.
* [Track and Analyze GitHub Star Growth](https://blog.jetbrains.com/kotlin/2024/08/track-and-analyze-github-star-growth-with-kandy-and-kotlin-dataframe/)
— query GitHub's API with the Kotlin Notebook Ktor client,
then analyze and visualize the data using Kotlin DataFrame and [Kandy](https://kotlin.github.io/kandy/).
* [GitHub Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/github/github.ipynb)
— a practical example of working with deeply nested, hierarchical DataFrames using GitHub data.
* [Netflix Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/netflix/netflix.ipynb)
— explore TV shows and movies from Netflix with the powerful Kotlin DataFrame API and beautiful
[Kandy](https://kotlin.github.io/kandy/) visualizations.
* [Top-12 German Companies Financial Analysis](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/top_12_german_companies)
— analyze key financial metrics for several major German companies.
* [Movies Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/movies/movies.ipynb)
— basic Kotlin DataFrame operations on data from [movielens](https://movielens.org/).
* [YouTube Example](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/youtube/Youtube.ipynb)
— explore YouTube videos with YouTube REST API and Kotlin DataFrame.
* [IMDb SQL Database Example](https://github.com/zaleslaw/KotlinDataFrame-SQL-Examples/blob/master/notebooks/imdb.ipynb)
— analyze IMDb data stored in MariaDB using Kotlin DataFrame
and visualize with [Kandy](https://kotlin.github.io/kandy/).
* [Reading Parquet files from Apache Spark](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/spark-parquet-dataframe)
— this project showcases how to export data and ML models from Apache Spark by reading them from Parquet files.
[Kandy](https://kotlin.github.io/kandy/) is also used to visualize the exported data and the linear regression model.
See also [Kandy User Guides](https://kotlin.github.io/kandy/user-guide.html)
and [Examples Gallery](https://kotlin.github.io/kandy/examples.html)
for the best data visualizations using Kotlin DataFrame and Kandy together!
<img src="kandy_gallery_preview.png" border-effect="rounded" width="705"/>
@@ -0,0 +1,108 @@
# Kotlin DataFrame Features in Kotlin Notebook
<web-summary>
Discover how Kotlin DataFrame integrates with Kotlin Notebook for seamless interactive data analysis in IntelliJ IDEA.
</web-summary>
<card-summary>
Load, explore, and export your data interactively using Kotlin DataFrame in Kotlin Notebook.
</card-summary>
<link-summary>
Learn how to load, explore, drill into, export, and interact with data using Kotlin DataFrame in Kotlin Notebook.
</link-summary>
The [Kotlin Notebook Plugin for IntelliJ IDEA](https://plugins.jetbrains.com/plugin/16340-kotlin-notebook),
combined with Kotlin DataFrame, offers powerful data analysis capabilities within an interactive environment.
Here are the key features:
### Drag-and-Drop Data Files
You can quickly load data into a `DataFrame` in a notebook by simply dragging and dropping a file
(.csv/.json/.xlsx and .geojson/.shp) directly into the notebook editor:
<video src="ktnb_drag_n_drop.mp4" controls=""/>
### Visual Data Exploration
**Page through your data**:
The pagination feature lets you move through your data one page at a time, making it possible to view large datasets.
**Sort by column with a single click**:
You can sort any column with a click.
This is a convenient alternative to using `sortBy` in separate cells.
**Go straight to the data you need**:
You can jump directly to a particular row or column if you want something specific.
This makes working with large datasets more straightforward.
<video src="https://github.com/user-attachments/assets/aeae1c79-9755-4558-bac4-420bf1331f39" controls=""/>
### Drill down into nested data
When your data has multiple layers, like a table within a table,
you can now click on a cell containing a nested table to view these details directly.
This makes it easy to go deeper into your data and then return to where you were.
<video src="https://github.com/user-attachments/assets/ef9509be-e19b-469c-9bad-0ce81eec36b0" controls=""/>
### Visualize multiple tables via tabs
You can open and visualize multiple tables in separate tabs.
This feature is tailored to those who need to compare, contrast, or monitor different datasets simultaneously.
<video src="https://github.com/user-attachments/assets/51b7a6e3-0187-49b3-bf5e-0c4d60f8b769" controls=""/>
### Exporting to files
You can export data directly from the dataframe into various file formats.
This simplifies sharing and further analysis.
The interface supports exporting data to JSON for web applications,
CSV for spreadsheet tools, and XML for data interchange.
<video src="https://github.com/user-attachments/assets/ec28c59a-1555-44ce-98f6-a60d8feae347" controls=""/>
### Convenient copying of data from tables
You can click and drag to select the data you need,
or you can use keyboard shortcuts for quicker selection
and then copy what's needed with a simple right-click or another shortcut.
It's designed to feel intuitive,
like copying text from a document, but with the structure and format of your data preserved.
<video src="https://github.com/user-attachments/assets/88e53dfb-361f-40f8-bffb-52a512cdd3cd" controls=""/>
### Rendering of images in the cell
The table widget can render `BufferedImage`s.
Given a column of images, right-click on the cell and click `View Image` in the context menu.
![ktnb_cell_image.png](ktnb_cell_image.png)
### Clickable URI links
String values starting with `http://`, `https://`, or `file:/` are treated as clickable links that open, for example, in your browser or file manager.
Click on the cell to trigger a toolbar to appear.
![ktnb_clickable_link.png](ktnb_clickable_link.png)
Clicking on `Open URL` or `Open File URI` for the first time triggers a notification with a link to `Settings` → `URL Click Settings`.
Choose what protocols should be allowed.
![ktnb_link_settings.png](ktnb_link_settings.png)
To get started, ensure you have the latest version of the Kotlin Notebook Plugin installed in IntelliJ IDEA,
and begin exploring your data using Kotlin DataFrame in your notebook cells.
## Related documentation
- [Kotlin for Data Analysis in notebooks](https://kotlinlang.org/docs/kotlin-notebook-overview.html):
Learn more about Kotlin Notebook capabilities for data analysis.
- [Kotlin Notebooks in IntelliJ IDEA](https://www.jetbrains.com/help/idea/kotlin-notebook.html):
Detailed documentation on working with Kotlin Notebooks in the IDE.
@@ -0,0 +1,361 @@
# Quickstart Guide
<web-summary>
Get started with Kotlin DataFrame in a few simple steps: load data, transform it, and visualize it — all in an interactive Kotlin Notebook.
</web-summary>
<card-summary>
Get started with Kotlin DataFrame right away: integrate it seamlessly, then load, process, analyze, and visualize some data!
</card-summary>
<link-summary>
Learn the basics of Kotlin DataFrame: reading data, applying transformations, and building plots — with full interactivity in Kotlin Notebook.
</link-summary>
This guide shows how to quickly get started with **Kotlin DataFrame**:
you'll learn how to load data, perform basic transformations, and build a simple plot using Kandy.
We recommend [starting with **Kotlin Notebook**](SetupKotlinNotebook.md) for the best beginner experience —
everything works out of the box,
including interactivity and rich rendering of DataFrames and plots.
You can instantly see the results of each operation: view the contents of your DataFrames after every transformation,
inspect individual rows and columns, and explore data step-by-step in a live and interactive way.
You can view this guide as a
[notebook on GitHub](https://github.com/Kotlin/dataframe/tree/master/examples/notebooks/quickstart/quickstart.ipynb)
or download <resource src="quickstart.ipynb"></resource>.
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.guides.QuickStartGuide-->
To start working with Kotlin DataFrame in a notebook, run a cell with the following code:
```kotlin
%useLatestDescriptors
%use dataframe
```
This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame
rendering. Learn more [here](SetupKotlinNotebook.md#integrate-kotlin-dataframe).
## Read DataFrame
Kotlin DataFrame supports all popular data formats, including CSV, JSON, and Excel, as well as reading from various
databases. Read a CSV with the "JetBrains Repositories" dataset into the `df` variable:
<!---FUN notebook_test_quickstart_2-->
```kotlin
val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)
```
<!---END-->
## Display And Explore
To display your dataframe as a cell output, place it in the last line of the cell:
<!---FUN notebook_test_quickstart_3-->
```kotlin
df
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_3.html" width="705px" height="500px"></inline-frame>
Kotlin Notebook has special interactive outputs for `DataFrame`. Learn more about them [here](Kotlin-DataFrame-Features-in-Kotlin-Notebook.md).
Use the `.describe()` method to get dataset summaries: column types, number of nulls, and simple statistics.
<!---FUN notebook_test_quickstart_4-->
```kotlin
df.describe()
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_4.html" width="705px" height="500px"></inline-frame>
## Select Columns
Kotlin DataFrame features a typesafe Columns Selection DSL, enabling flexible and safe selection of any combination of
columns.
Column selectors are widely used across operations. One of the simplest examples is `.select { }`, which returns a new
DataFrame with only the columns chosen in the Columns Selection expression.
*After executing the cell* where a `DataFrame` variable is declared,
[extension properties](extensionPropertiesApi.md) for its columns are automatically generated.
These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.
Select some columns:
<!---FUN notebook_test_quickstart_5-->
```kotlin
// Select "full_name", "stargazers_count" and "topics" columns
val dfSelected = df.select { full_name and stargazers_count and topics }
dfSelected
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_5.html" width="705px" height="500px"></inline-frame>
> With the [Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md) enabled,
> you can use auto-generated properties in your IntelliJ IDEA projects.
## Row Filtering
Some operations use the [DataRow API](DataRow.md), with expressions and conditions
that apply to all `DataFrame` rows.
For example, `.filter { }` returns a new `DataFrame` with the rows that satisfy a condition given by a row expression.
Inside a row expression, you can access the values of the current row by column names through auto-generated properties.
This is similar to the [Columns Selection DSL](ColumnSelectors.md),
but in this case the properties represent actual values, not column references.
Filter rows by "stargazers_count" value:
<!---FUN notebook_test_quickstart_6-->
```kotlin
// Keep only rows where "stargazers_count" value is at least 1000
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
dfFiltered
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_6.html" width="705px" height="500px"></inline-frame>
## Columns Rename
Columns can be renamed using the `.rename { }` operation, which also uses the Columns Selection DSL to select a column
to rename.
The `rename` operation does not perform the renaming immediately; instead, it creates an intermediate object that must
be finalized into a new `DataFrame` by calling the `.into()` function with the new column name.
Rename "full_name" and "stargazers_count" columns:
<!---FUN notebook_test_quickstart_7-->
```kotlin
// Rename "full_name" column into "name"
val dfRenamed = dfFiltered.rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
dfRenamed
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_7.html" width="705px" height="500px"></inline-frame>
## Modify Columns
Columns can be modified using the `update { }` and `convert { }` operations.
Both operations select columns to modify via the Columns Selection DSL and, similar to `rename`, create an intermediate
object that must be finalized to produce a new `DataFrame`.
The `update` operation preserves the original column types, while `convert` allows changing the type.
In both cases, column names and their positions remain unchanged.
Update "name" and convert "topics":
<!---FUN notebook_test_quickstart_8-->
```kotlin
val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removePrefix("[").removeSuffix("]").split(", ") }
dfUpdated
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_8.html" width="705px" height="500px"></inline-frame>
Check out the new "topics" type:
<!---FUN notebook_test_quickstart_9-->
```kotlin
dfUpdated.topics.type()
```
<!---END-->
Output:
```
kotlin.collections.List<kotlin.String>
```
## Adding New Columns
The `.add { }` function allows creating a `DataFrame` with a new column, where the value for each row is computed based
on the existing values in that row. These values can be accessed within the row expressions.
Add a new `Boolean` column "isIntellij":
<!---FUN notebook_test_quickstart_10-->
```kotlin
// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_10.html" width="705px" height="500px"></inline-frame>
## Grouping And Aggregating
A `DataFrame` can be grouped by column keys, meaning its rows are split into groups based on the values in the key
columns.
The `.groupBy { }` operation selects columns and groups the `DataFrame` by their values, using them as grouping keys.
The result is a `GroupBy` — a `DataFrame`-like structure that associates each key with the corresponding subset of the
original `DataFrame`.
Group `dfWithIsIntellij` by "isIntellij":
<!---FUN notebook_test_quickstart_11-->
```kotlin
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_11.html" width="705px" height="500px"></inline-frame>
A `GroupBy` can be aggregated — that is, you can compute one or several summary statistics for each group.
The result of the aggregation is a `DataFrame` containing the key columns along with new columns holding the computed
statistics for the corresponding group.
For example, `count()` computes the size of each group:
<!---FUN notebook_test_quickstart_12-->
```kotlin
groupedByIsIntellij.count()
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_12.html" width="705px" height="500px"></inline-frame>
Compute several statistics at once with `.aggregate { }`, which takes an aggregation expression:
<!---FUN notebook_test_quickstart_13-->
```kotlin
groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_13.html" width="705px" height="500px"></inline-frame>
## Sorting Rows
`.sortBy { }`/`.sortByDesc { }` sort rows by the values in the selected columns, returning a `DataFrame` with sorted rows. `take(n)`
returns a new `DataFrame` with the first `n` rows.
Combine them to get Top-10 repositories by number of stars:
<!---FUN notebook_test_quickstart_14-->
```kotlin
val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }.take(10)
dfTop10
```
<!---END-->
<inline-frame src="./resources/notebook_test_quickstart_14.html" width="705px" height="500px"></inline-frame>
## Plotting With Kandy
Kandy is a Kotlin plotting library designed to bring Kotlin DataFrame features into chart creation, providing a
convenient and typesafe way to build data visualizations.
Kandy can be loaded into a notebook using `%use kandy`:
```kotlin
%use kandy
```
Build a simple bar chart with the `.plot { }` extension for DataFrame, which lets you use extension properties inside the Kandy
plotting DSL (the plot is rendered as an output after cell execution):
<!---FUN notebook_test_quickstart_16-->
```kotlin
dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }
    layout.title = "Top 10 JetBrains repositories by stars count"
}
```
<!---END-->
![notebook_test_quickstart_16](notebook_test_quickstart_16.svg)
## Write DataFrame
A `DataFrame` supports writing to all formats that it is capable of reading.
Write into Excel:
<!---FUN notebook_test_quickstart_17-->
```kotlin
dfWithIsIntellij.writeExcel("jb_repos.xlsx")
```
<!---END-->
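
Other formats follow the same pattern. As a hedged sketch, the snippet below writes a small frame to CSV and JSON and reads the CSV back. It assumes the CSV and JSON modules are on the classpath (they are part of the general `dataframe` artifact); the `repos` data and the temporary file names are made up for illustration.

```kotlin
import java.io.File
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*

// Hypothetical data standing in for the quickstart result.
val repos = dataFrameOf("name", "starsCount")(
    "repo-a", 10,
    "repo-b", 20,
)

// Temporary files keep the sketch from touching the working directory.
val csvFile: File = File.createTempFile("jb_repos", ".csv")
val jsonFile: File = File.createTempFile("jb_repos", ".json")

fun main() {
    repos.writeCsv(csvFile)
    repos.writeJson(jsonFile)

    // Reading the CSV back gives a DataFrame with the same rows and columns.
    val restored = DataFrame.readCsv(csvFile)
    println(restored)
}
```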
## What's Next?
In this quickstart, we covered the basics — reading data, transforming it, and building a simple visualization.
Ready to go deeper? Check out what's next:
- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.
- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).
- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
and make working with your data both convenient and type-safe.
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
for auto-generated column access in your IntelliJ IDEA projects.
- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations with the
[Kandy Documentation](https://kotlin.github.io/kandy).