# Quickstart Guide

Get started with Kotlin DataFrame in a few simple steps: load data, transform it, and visualize it, all in an interactive Kotlin Notebook.


This guide shows how to quickly get started with **Kotlin DataFrame**: you'll learn how to load data, perform basic transformations, and build a simple plot using Kandy.

We recommend [starting with **Kotlin Notebook**](SetupKotlinNotebook.md) for the best beginner experience: everything works out of the box, including interactivity and rich rendering of DataFrames and plots. You can instantly see the results of each operation: view the contents of your DataFrames after every transformation, inspect individual rows and columns, and explore data step by step in a live, interactive way.

You can also view this guide as a [notebook on GitHub](https://github.com/Kotlin/dataframe/tree/master/examples/notebooks/quickstart/quickstart.ipynb).

To start working with Kotlin DataFrame in a notebook, run a cell with the following code:

```kotlin
%useLatestDescriptors
%use dataframe
```

This loads all necessary DataFrame dependencies (of the latest stable version) and all imports, and enables DataFrame rendering. Learn more [here](SetupKotlinNotebook.md#integrate-kotlin-dataframe).

## Read DataFrame

Kotlin DataFrame supports popular data formats, including CSV, JSON, and Excel, as well as reading from various databases.

Read a CSV file with the "JetBrains Repositories" dataset into the `df` variable:

```kotlin
val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)
```

## Display And Explore

To display your dataframe as a cell output, place it in the last line of the cell:

```kotlin
df
```

Kotlin Notebook has special interactive outputs for `DataFrame`.

Use the `.describe()` method to get a dataset summary: column types, number of nulls, and simple statistics.

```kotlin
df.describe()
```

## Select Columns

Kotlin DataFrame features a typesafe Columns Selection DSL, enabling flexible and safe selection of any combination of columns. Column selectors are widely used across operations. One of the simplest examples is `.select { }`, which returns a new `DataFrame` with only the columns chosen in the Columns Selection expression.

*After executing the cell* where a `DataFrame` variable is declared, [extension properties](extensionPropertiesApi.md) for its columns are automatically generated. These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.

Select some columns:

```kotlin
// Select "full_name", "stargazers_count" and "topics" columns
val dfSelected = df.select { full_name and stargazers_count and topics }
dfSelected
```

> With a [Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md) enabled,
> you can use auto-generated properties in your IntelliJ IDEA projects.

## Row Filtering

Some operations use the [DataRow API](DataRow.md), with expressions and conditions that are evaluated for every row of a `DataFrame`. For example, `.filter { }` returns a new `DataFrame` containing only the rows that satisfy a condition given by a row expression.

Inside a row expression, you can access the values of the current row by column name through auto-generated properties. This is similar to the [Columns Selection DSL](ColumnSelectors.md), but here the properties represent actual values, not column references.

Filter rows by "stargazers_count" value:

```kotlin
// Keep only rows where "stargazers_count" value is at least 1000
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
dfFiltered
```

## Columns Rename

Columns can be renamed using the `.rename { }` operation, which also uses the Columns Selection DSL to select a column to rename.
The `rename` operation does not perform the renaming immediately; instead, it creates an intermediate object that must be finalized into a new `DataFrame` by calling the `.into()` function with the new column name.

Rename "full_name" and "stargazers_count" columns:

```kotlin
// Rename "full_name" column into "name"
val dfRenamed = dfFiltered.rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
dfRenamed
```

## Modify Columns

Columns can be modified using the `update { }` and `convert { }` operations. Both operations select columns to modify via the Columns Selection DSL and, similar to `rename`, create an intermediate object that must be finalized to produce a new `DataFrame`.

The `update` operation preserves the original column types, while `convert` allows changing the type. In both cases, column names and their positions remain unchanged.

Update "name" and convert "topics":

```kotlin
val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removePrefix("[").removeSuffix("]").split(", ") }
dfUpdated
```

Check the new type of "topics":

```kotlin
dfUpdated.topics.type()
```

Output:

```
kotlin.collections.List<kotlin.String>
```

## Adding New Columns

The `.add { }` function creates a `DataFrame` with a new column, where the value for each row is computed from the existing values in that row. These values can be accessed within the row expression.

Add a new `Boolean` column "isIntellij":

```kotlin
// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij
```

## Grouping And Aggregating

A `DataFrame` can be grouped by column keys, meaning its rows are split into groups based on the values in the key columns. The `.groupBy { }` operation selects columns and groups the `DataFrame` by their values, using them as grouping keys.

The result is a `GroupBy`, a `DataFrame`-like structure that associates each key with the corresponding subset of the original `DataFrame`.

Group `dfWithIsIntellij` by "isIntellij":

```kotlin
val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij
```

A `GroupBy` can be aggregated: you can compute one or several summary statistics for each group. The result of the aggregation is a `DataFrame` containing the key columns along with new columns holding the computed statistics for the corresponding group.

For example, `.count()` computes the size of each group:

```kotlin
groupedByIsIntellij.count()
```

Compute several statistics with `.aggregate { }`, which takes an aggregation expression:

```kotlin
groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
```

## Sorting Rows

`.sortBy { }` and `.sortByDesc { }` sort rows by the values in the selected columns (ascending and descending, respectively), returning a `DataFrame` with sorted rows. `.take(n)` returns a new `DataFrame` with the first `n` rows.

Combine them to get the top 10 repositories by number of stars:

```kotlin
val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }.take(10)
dfTop10
```

## Plotting With Kandy

Kandy is a Kotlin plotting library designed to bring Kotlin DataFrame features into chart creation, providing a convenient and typesafe way to build data visualizations.
Kandy can be loaded into a notebook using `%use kandy`:

```kotlin
%use kandy
```

Build a simple bar chart with the `.plot { }` extension for `DataFrame`, which lets you use extension properties inside the Kandy plotting DSL (the plot is rendered as the cell output after execution):

```kotlin
dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }

    layout.title = "Top 10 JetBrains repositories by stars count"
}
```

![notebook_test_quickstart_16](notebook_test_quickstart_16.svg)

## Write DataFrame

A `DataFrame` supports writing to all formats that it is capable of reading.

Write into Excel:

```kotlin
dfWithIsIntellij.writeExcel("jb_repos.xlsx")
```

## What's Next?

In this quickstart, we covered the basics: reading data, transforming it, and building a simple visualization. Ready to go deeper? Check out what’s next:

- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets, API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.
- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).
- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)** and make working with your data both convenient and type-safe.
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)** for auto-generated column access in your IntelliJ IDEA projects.
- 📊 **Master Kandy** for expressive DataFrame visualizations by reading the [Kandy documentation](https://kotlin.github.io/kandy).