@DataSchema Declarations

A @DataSchema declaration can be used as the type argument of the cast and convertTo functions. It provides typed data access for raw dataframes you read from I/O sources and serves as the starting point from which the compiler plugin derives schema changes.
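As a rough sketch of the cast use case, assuming the Kotlin DataFrame library is on the classpath: the snippet below builds a small untyped dataframe by hand (the sample values and the castExample function name are made up for illustration) and casts it to a schema with a single firstName column.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf

// Schema declaration describing one String column named "firstName".
@DataSchema
interface Person {
    val firstName: String
}

// Hypothetical helper: builds an untyped dataframe and casts it to DataFrame<Person>.
fun castExample(): DataFrame<Person> {
    // Illustrative data standing in for something read from an I/O source.
    val raw = dataFrameOf("firstName")("Alice", "Bob")
    return raw.cast<Person>()
}

fun main() {
    val typed = castExample()
    println(typed.rowsCount())
}
```

The typed result is what the generated extension properties described in this document attach to.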

Example 1:

@DataSchema
interface Person { 
  val firstName: String
}

Generated code:

val DataRow<Person>.firstName: String get() = this["firstName"] as String
val ColumnsScope<Person>.firstName: DataColumn<String> get() = this["firstName"] as DataColumn<String>

Example 2:

@DataSchema
interface Person {
    @ColumnName("first_name")
    val firstName: String
}

The ColumnName annotation changes which column name the generated extension properties use to pull data from a dataframe:

Generated code:

val DataRow<Person>.firstName: String get() = this["first_name"] as String
val ColumnsScope<Person>.firstName: DataColumn<String> get() = this["first_name"] as DataColumn<String>
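To make the mechanism concrete, here is a minimal stdlib-only sketch. Row and PersonSchema are simplified stand-ins invented for this illustration, not the library's DataRow or a real @DataSchema type. It shows how an extension property scoped by a type parameter can expose a Kotlin-friendly name while looking up a differently named column.

```kotlin
// Simplified stand-in for a row, NOT the library's DataRow.
// T is a phantom type: it is never stored and only scopes extension properties.
class Row<T>(private val values: Map<String, Any?>) {
    // Throws NoSuchElementException if the column is absent.
    operator fun get(columnName: String): Any? = values.getValue(columnName)
}

// Marker type playing the role of a schema declaration (hypothetical name).
interface PersonSchema

// Hand-written equivalent of a generated accessor: the property is called
// firstName, but the lookup uses the column name "first_name".
val Row<PersonSchema>.firstName: String
    get() = this["first_name"] as String

fun main() {
    val row = Row<PersonSchema>(mapOf("first_name" to "Linda"))
    println(row.firstName) // prints "Linda"
}
```

Because the property is declared on Row<PersonSchema> only, it does not resolve on rows typed with a different schema, which is the same idea that lets typed access follow the schema in the real API.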

Generated extension properties are used to access values in a DataRow and to access columns in a ColumnsScope, which is either a DataFrame or a ColumnSelectionDsl.

DataRow:

val row = df[0]
row.firstName
df.filter { firstName.startsWith("L") }
df.add("newCol") { firstName }

DataFrame:

val col = df.firstName
val value = col[0]

ColumnSelectionDsl:

df.convert { firstName }.with { it.uppercase() }
df.select { firstName }
df.rename { firstName }.into("name")

Data Class

A DataSchema can also be a top-level data class, in which case two additional APIs become available:

@DataSchema
data class WikiData(val name: String, val paradigms: List<String>)
  1. dataFrameOf overload that creates a dataframe instance from objects
val languages = dataFrameOf(
    WikiData("Kotlin", listOf("object-oriented", "functional", "imperative")), 
    WikiData("Haskell", listOf("Purely functional")),
    WikiData("C", listOf("imperative")),
    WikiData("Pascal", listOf("imperative")),
    WikiData("Idris", listOf("functional")),
)
  2. append overload that takes an object and appends it as a row
val ocaml = WikiData("OCaml", listOf("functional", "imperative", "modular", "object-oriented"))
val languages1 = languages.append(ocaml)

Schemas for nested structures

A nested structure can come from, for example, a JSON file that you read:

[
    {
        "id": "1",
        "participants": [
            {
                "name": {
                    "firstName": "Alice",
                    "lastName": "Cooper"
                },
                "age": 15,
                "city": "London"
            },
            {
                "name": {
                    "firstName": "Bob",
                    "lastName": "Dylan"
                },
                "age": 45,
                "city": "Dubai"
            }
        ]
    },
    {
        "id": "2",
        "participants": [
            {
                "name": {
                    "firstName": "Charlie",
                    "lastName": "Daniels"
                },
                "age": 20,
                "city": "Moscow"
            },
            {
                "name": {
                    "firstName": "Charlie",
                    "lastName": "Chaplin"
                },
                "age": 40,
                "city": "Milan"
            }
        ]
    }
]

Reading it gives you a dataframe with this schema:

id: String
participants: *
    name:
        firstName: String
        lastName: String
    age: Int
    city: String
  • participants is a FrameColumn (each cell holds a nested dataframe)
  • name is a ColumnGroup (a column made of nested columns)

Here's the data schema that matches it:

@DataSchema
data class Group(
    val id: String, 
    val participants: List<Person>
)

@DataSchema
data class Person(
    val name: Name,
    val age: Int, 
    val city: String?
)

@DataSchema
data class Name(
    val firstName: String,
    val lastName: String,
)

Read the JSON and cast the resulting dataframe to the schema:

val url = "https://raw.githubusercontent.com/Kotlin/dataframe/refs/heads/master/data/participants.json"
val df = DataFrame.readJson(url).cast<Group>()