
[//]: # (title: Extension Properties API)

When working with a [`DataFrame`](DataFrame.md), the most convenient and reliable way
to access its columns — including for operations and retrieving column values
in row expressions — is through *auto-generated extension properties*.
They are generated based on a [dataframe schema](schemas.md),
with the name and type of properties inferred from the name and type of the corresponding columns.
Extension properties are also generated for all types of hierarchical dataframes.
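
To make the idea concrete, here is a minimal, self-contained sketch of how such properties can work. It does not use the library: `ToyFrame`, `ToyRow`, and the `Person` marker are hypothetical stand-ins for the real `DataFrame`, `DataRow`, and data schema types, and the code that the plugin or notebook actually generates may differ in its details.

```kotlin
// Toy stand-ins for the library's DataRow and DataFrame; NOT the real classes.
class ToyRow<T>(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? = values.getValue(column)
}

class ToyFrame<T>(private val columns: Map<String, List<Any?>>) {
    operator fun get(column: String): List<Any?> = columns.getValue(column)

    val rowCount: Int get() = columns.values.firstOrNull()?.size ?: 0

    fun row(index: Int): ToyRow<T> = ToyRow(columns.mapValues { it.value[index] })

    // Keeps the rows for which the row expression returns true.
    fun filter(predicate: ToyRow<T>.() -> Boolean): List<ToyRow<T>> =
        (0 until rowCount).map(::row).filter { it.predicate() }
}

// Marker type playing the role of a data schema.
interface Person

// Hand-written equivalents of "generated" accessors:
// one property per column for the frame, and one for the row.
@Suppress("UNCHECKED_CAST")
val ToyFrame<Person>.name: List<String> get() = this["name"] as List<String>
val ToyRow<Person>.name: String get() = this["name"] as String

@Suppress("UNCHECKED_CAST")
val ToyFrame<Person>.age: List<Int> get() = this["age"] as List<Int>
val ToyRow<Person>.age: Int get() = this["age"] as Int

fun main() {
    val df = ToyFrame<Person>(
        mapOf("name" to listOf("Alice", "Bob"), "age" to listOf(23, 27))
    )
    println(df.age) // [23, 27]
    println(df.filter { name.startsWith("A") && age >= 16 }.map { it.name }) // [Alice]
}
```

Because the properties are declared on `ToyFrame<Person>` and `ToyRow<Person>` rather than inside the classes, they are visible only for values typed with that schema. In this sketch, that is what lets the compiler check column names and types.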

> The behavior of data schema generation differs between the
> [Compiler Plugin](Compiler-Plugin.md) and [Kotlin Notebook](SetupKotlinNotebook.md).
>
> * In **Kotlin Notebook**, a schema is generated **only after cell execution** for
> `DataFrame` variables defined within that cell.
> * With the **Compiler Plugin**, a new schema is generated **after every operation**,
> although not all operations are supported yet.
> Inferring the schema of a `DataFrame` read from a file or URL is also **not yet supported**.
>
> This behavior may change in future releases. See the [example](#example) below that demonstrates these differences.
{style="warning"}

## Example

Consider a simple hierarchical dataframe from
<resource src="example.csv"></resource>.

This table consists of two columns: `name`, which is a `String` column, and `info`,
which is a [**column group**](DataColumn.md#columngroup) containing two nested
[value columns](DataColumn.md#valuecolumn) —
`age` of type `Int`, and `height` of type `Double`.

<table width="705">
  <thead>
    <tr>
      <th>name</th>
      <th colspan="2">info</th>
    </tr>
    <tr>
      <th></th>
      <th>age</th>
      <th>height</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Alice</td>
      <td>23</td>
      <td>175.5</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>27</td>
      <td>160.2</td>
    </tr>
  </tbody>
</table>
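
If the CSV file is not at hand, a dataframe with the same shape can be sketched directly in code. This is only an illustration: `buildExample` is a hypothetical helper name, and the snippet assumes the Kotlin DataFrame library is on the classpath. It creates three flat columns and then groups `age` and `height` under `info`.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: builds the example table without reading example.csv.
fun buildExample(): AnyFrame =
    dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    )
        // Move the two flat columns into a column group named "info".
        .group("age", "height").into("info")

fun main() {
    val df = buildExample()
    // Print the inferred schema and the table itself.
    println(df.schema())
    df.print()
}
```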

<tabs>
<tab title="Kotlin Notebook">

Read the [`DataFrame`](DataFrame.md) from the CSV file:

```kotlin
val df = DataFrame.readCsv("example.csv")
```

**After the cell is executed**, the data schema and extension properties for this `DataFrame` are generated,
so you can use them to access columns directly,
in operations inside the [Column Selector DSL](ColumnSelectors.md),
and in the [DataRow API](DataRow.md):


```kotlin
// Get nested column
df.info.age
// Sort by multiple columns
df.sortBy { name and info.height }
// Filter rows using a row condition.
// These extensions express the exact value in the row
// with the corresponding type:
df.filter { name.startsWith("A") && info.age >= 16 }
```

If you change the dataframe's schema by [renaming](rename.md) a column,
[converting](convert.md) its type, or [adding](add.md) a new column, you first need to
run a cell that declares the resulting [`DataFrame`](DataFrame.md) before you can use the new extension properties.
For example, rename the `name` column to `firstName`:

```kotlin
val dfRenamed = df.rename { name }.into("firstName")
```

After running the cell with the code above, you can use `firstName` extensions in the following cells:

```kotlin
dfRenamed.firstName
dfRenamed.rename { firstName }.into("name")
dfRenamed.filter { firstName == "Nikita" }
```

See the [](quickstart.md) in Kotlin Notebook for basic Extension Properties API examples.

</tab>
<tab title="Compiler Plugin">

For now, if you read a [`DataFrame`](DataFrame.md) from a file or URL, you need to define its schema manually.
You can do it quickly with [`generate..()` methods](DataSchemaGenerationMethods.md).

Define schemas:
```kotlin
@DataSchema
data class PersonInfo(
    val age: Int,
    val height: Double
)

@DataSchema
data class Person(
    val info: PersonInfo,
    val name: String
)
```

Read the [`DataFrame`](DataFrame.md) from the CSV file and specify the schema with
[`convertTo()`](convertTo.md) or [`cast()`](cast.md):

```kotlin
val df = DataFrame.readCsv("example.csv").convertTo<Person>()
```

Extension properties for this `DataFrame` are generated automatically by the plugin,
so you can use them to access columns directly,
in operations inside the [Column Selector DSL](ColumnSelectors.md),
and in the [DataRow API](DataRow.md):


```kotlin
// Get nested column
df.info.age
// Sort by multiple columns
df.sortBy { name and info.height }
// Filter rows using a row condition.
// These extensions express the exact value in the row
// with the corresponding type:
df.filter { name.startsWith("A") && info.age >= 16 }
```

Moreover, new extension properties are generated on the fly after each schema change, such as
[renaming](rename.md) a column,
[converting](convert.md) its type, or [adding](add.md) a new column.
For example, after renaming the `name` column to `firstName`, you can use the `firstName` extension property
in the operations that follow:

```kotlin
// Rename "name" column into "firstName"
df.rename { name }.into("firstName")
    // Can use `firstName` extension in the row condition
    // right after renaming
    .filter { firstName == "Nikita" }
```

See the [Compiler Plugin Example](https://github.com/Kotlin/dataframe/tree/plugin_example/examples/kotlin-dataframe-plugin-gradle-example)
IDEA project for basic Extension Properties API examples.
</tab>
</tabs>

## Properties name generation

By default, each extension property is generated with a name equal to the original column name.

```kotlin
val df = dataFrameOf("size_in_inches" to listOf(..))
df.size_in_inches
```

If the original column name cannot be used as a property name as is (for example, if it contains spaces
or coincides with a Kotlin keyword),
the property name is enclosed in backticks.

```kotlin
val df = dataFrameOf("size in inches" to listOf(..))
df.`size in inches`
```

However, some column names contain special symbols
that are not allowed in a property name even inside backticks.
In such cases, these symbols are replaced in the auto-generated property name.

```kotlin
val df = dataFrameOf("size\nin:inches" to listOf(..))
df.`size in - inches`
```

> In such cases, use [**`rename`**](rename.md) to update column names,
> or [**`renameToCamelCase`**](rename.md#renametocamelcase) to convert all column names
> in a `DataFrame` to `camelCase`, which is the idiomatic and widely preferred naming style in Kotlin.
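
As a rough sketch of the renaming route, the column can be addressed by its exact original name through the String API and given an identifier-friendly name. `renamedExample` is a hypothetical helper and the values `5.5` and `6.1` are made up for illustration; the snippet assumes the Kotlin DataFrame library is on the classpath.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper; the column values are illustrative only.
fun renamedExample(): AnyFrame {
    // One column whose name contains a line break and a colon.
    val df = dataFrameOf("size\nin:inches")(5.5, 6.1)
    // Refer to the column by its original name and assign a plain camelCase name.
    return df.rename("size\nin:inches").into("sizeInInches")
}

fun main() {
    println(renamedExample().columnNames())
}
```

Once the column carries a plain name such as `sizeInInches`, the generated property name should no longer need backticks or symbol replacement.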

If you don't want to change the actual column name, but you need a convenient accessor for this column,
you can use the `@ColumnName` annotation in a manually declared [data schema](schemas.md).
It allows you to use a property name different
from the original column name without changing the column's actual name:

```kotlin
@DataSchema
interface Info {
    @ColumnName("size\nin:inches")
    val sizeInInches: Double
}
```

```kotlin
val df = dataFrameOf("size\nin:inches" to listOf(..)).cast<Info>()
df.sizeInInches
```