2026-02-08 11:20:43 -10:00

Data Schema Generation From an Existing DataFrame

Generate useful Kotlin definitions based on your DataFrame structure.

These utility functions generate the source code of useful Kotlin definitions based on the current DataFrame schema. They return that code as a String wrapped in CodeString.

generateDataClasses

inline fun <reified T> DataFrame<T>.generateDataClasses(
    markerName: String? = null,
    extensionProperties: Boolean = false,
    visibility: MarkerVisibility = MarkerVisibility.IMPLICIT_PUBLIC,
    useFqNames: Boolean = false,
    nameNormalizer: NameNormalizer = NameNormalizer.default,
): CodeString

Generates Kotlin data classes corresponding to the DataFrame schema (including all nested DataFrame columns and column groups).

Useful when you want to:

  • Work with the data as regular Kotlin data classes.
  • Convert a DataFrame to a list of data class instances with df.toListOf<DataClassType>().
  • Work with data class serialization.
  • Extract structured types for further use in your application.

Arguments

  • markerName: String? - The base name to use for the generated data classes.
    If null, the simple name of the DataFrame's type argument T is used.
    Default: null.
  • extensionProperties: Boolean - Whether to generate extension properties in addition to the data class declarations.
    This is useful only if you don't use the compiler plugin. The compiler plugin, notebooks, and the older Gradle/KSP plugin generate these properties automatically. Default: false.
  • visibility: MarkerVisibility - Visibility modifier for the generated declarations.
    Default: MarkerVisibility.IMPLICIT_PUBLIC.
  • useFqNames: Boolean - If true, the generated code uses fully qualified type names.
    Default: false.
  • nameNormalizer: NameNormalizer - Strategy for converting column names (with spaces, underscores, etc.) to Kotlin-style identifiers. Generated properties still refer to columns by their actual names through the @ColumnName annotation. Default: NameNormalizer.default.
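The following library-free sketch makes these arguments more concrete. It shows the general technique behind generateDataClasses: walk a schema and emit data class source text, normalizing column names along the way and keeping the original names in @ColumnName. Everything in it is hypothetical and is not the library's implementation. That includes the ColumnSchema/Leaf/Nested model, the toCamelCase normalizer (an approximation only, not NameNormalizer.default) and the numeric suffix for nested classes, which was chosen to mirror the Customer1 naming in the Examples section.

```kotlin
// Hypothetical, simplified schema model: a leaf column with a type name,
// or a nested list of rows (similar to a nested DataFrame column).
sealed interface ColumnSchema
data class Leaf(val typeName: String) : ColumnSchema
data class Nested(val columns: Map<String, ColumnSchema>) : ColumnSchema

// Hypothetical normalizer: split on spaces, underscores and hyphens, then join in camelCase.
fun toCamelCase(name: String): String {
    val parts = name.split(' ', '_', '-').filter { it.isNotEmpty() }
    if (parts.isEmpty()) return name
    return parts.first().replaceFirstChar { it.lowercase() } +
        parts.drop(1).joinToString("") { p -> p.replaceFirstChar { it.uppercase() } }
}

// Emits one data class per nesting level. A nested class is emitted before the class that
// references it and is named markerName + a running counter (Customer1, Customer2, ...).
fun generateSketch(markerName: String, columns: Map<String, ColumnSchema>): String {
    val out = StringBuilder()
    var counter = 0
    fun emit(className: String, cols: Map<String, ColumnSchema>) {
        val props = cols.map { (columnName, schema) ->
            val type = when (schema) {
                is Leaf -> schema.typeName
                is Nested -> {
                    val nestedName = markerName + (++counter)
                    emit(nestedName, schema.columns)
                    "List<$nestedName>"
                }
            }
            val propName = toCamelCase(columnName)
            // Keep the real column name when the property name differs from it.
            val annotation = if (propName != columnName) "@ColumnName(\"$columnName\") " else ""
            "    ${annotation}val $propName: $type"
        }
        out.append("@DataSchema\ndata class $className(\n")
        out.append(props.joinToString(",\n"))
        out.append("\n)\n\n")
    }
    emit(markerName, columns)
    return out.toString().trimEnd()
}
```

For example, calling generateSketch("Customer", ...) with a nested "orders" column (containing "amount" and "order id") and a "user" column emits Customer1 first, then Customer. In Customer1, the column "order id" becomes `@ColumnName("order id") val orderId: Int`.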

Returns

  • CodeString - A value class wrapper for String containing
    the generated Kotlin code of the data class declarations and, optionally, extension properties.
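The text describes CodeString only as a value class wrapping String. This sketch shows what that pattern typically looks like, using a hypothetical GeneratedCode class in place of CodeString. It assumes that, as here, toString() returns the raw code. It also adds a hypothetical writeTo helper that saves the code to a .kt file you can then add to your project. Check the actual CodeString API in your library version before relying on any member other than toString().

```kotlin
import java.io.File

// Hypothetical stand-in for a String-wrapping value class such as CodeString.
@JvmInline
value class GeneratedCode(val value: String) {
    // Printing the wrapper or using it in a string template yields the source text itself.
    override fun toString(): String = value
}

// Hypothetical helper: writes the generated code to dir/fileName and returns the file.
fun GeneratedCode.writeTo(dir: File, fileName: String): File {
    val file = File(dir, fileName)
    file.parentFile?.mkdirs()
    file.writeText("$this\n")
    return file
}
```

Wrapping the String in a value class gives the generated code a distinct type at compile time, while on the JVM it is usually represented as a plain String at runtime.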

Examples

df.generateDataClasses("Customer")

Output:

@DataSchema
data class Customer1(
    val amount: Double,
    val orderId: Int
)

@DataSchema
data class Customer(
    val orders: List<Customer1>,
    val user: String
)

Add these classes to your project and convert the DataFrame to a list of typed objects:

val customers: List<Customer> = df.cast<Customer>().toList()
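To show what "a list of typed objects" means, the following conceptual model performs the conversion by hand on plain Kotlin data. It reads each column by name and builds the matching data class, recursing into the nested orders. The Row type alias and the toCustomer1/toCustomer/toCustomers functions are hypothetical, and the @DataSchema annotation is omitted so the snippet compiles without the library. This is only a model of the result, not how DataFrame implements toList() or toListOf().

```kotlin
// The generated classes from the example above, without @DataSchema.
data class Customer1(val amount: Double, val orderId: Int)
data class Customer(val orders: List<Customer1>, val user: String)

// Hypothetical row representation: column name -> value; nested rows are lists of maps.
typealias Row = Map<String, Any?>

// Builds a Customer1 from one nested row by reading its columns by name.
fun Row.toCustomer1(): Customer1 = Customer1(
    amount = this["amount"] as Double,
    orderId = this["orderId"] as Int,
)

// Builds a Customer, converting the nested "orders" rows recursively.
@Suppress("UNCHECKED_CAST")
fun Row.toCustomer(): Customer = Customer(
    orders = (this["orders"] as List<Row>).map { it.toCustomer1() },
    user = this["user"] as String,
)

fun List<Row>.toCustomers(): List<Customer> = map { it.toCustomer() }
```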

generateInterfaces

inline fun <reified T> DataFrame<T>.generateInterfaces(): CodeString

fun <T> DataFrame<T>.generateInterfaces(markerName: String): CodeString

Generates Kotlin @DataSchema interfaces for this DataFrame, including all nested DataFrame columns and column groups.

This is useful when working with the compiler plugin in cases where the schema cannot be inferred automatically from the source.

Arguments

  • markerName: String - The base name to use for the generated interfaces.
    Only the non-reified overload accepts it. The reified overload takes no arguments and uses the simple name of the DataFrame's type argument T instead.
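The two generateInterfaces overloads differ in where the marker name comes from. The reified one can derive it from the type argument, while the other takes it as a parameter. The sketch below models this overload pattern on a hypothetical Table<T> class with a hypothetical describeSchema function. It is not DataFrame code. The T : Any bound and the "Unnamed" fallback are assumptions of the sketch.

```kotlin
// Hypothetical typed container standing in for DataFrame<T>.
class Table<T>(val columns: List<String>)

// Explicit-name overload: the caller supplies the marker name.
fun <T> Table<T>.describeSchema(markerName: String): String =
    "interface $markerName { ${columns.joinToString()} }"

// Reified overload: T is known at the call site after inlining, so its simple name
// can serve as the marker name. "Unnamed" covers types without a simple name.
inline fun <reified T : Any> Table<T>.describeSchema(): String =
    describeSchema(T::class.simpleName ?: "Unnamed")
```

With `interface Order`, `Table<Order>(listOf("amount", "orderId")).describeSchema()` uses "Order" as the name, while `describeSchema("Purchase")` overrides it.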

Returns

  • CodeString - A value class wrapper for String containing
    the generated Kotlin code of the @DataSchema interfaces, without extension properties.

Examples

df.generateInterfaces()

Output:

@DataSchema(isOpen = false)
interface _DataFrameType11 {
    val amount: kotlin.Double
    val orderId: kotlin.Int
}

@DataSchema
interface _DataFrameType1 {
    val orders: List<_DataFrameType11>
    val user: kotlin.String
}

By adding these interfaces to your project with the compiler plugin enabled,
you'll gain full support for the extension properties API and type-safe operations.

Use cast to apply the generated schema to a DataFrame:

df.cast<_DataFrameType1>().filter { orders.all { orderId >= 102 } }
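To clarify what the typed filter above selects, the sketch below applies the same predicate to plain Kotlin objects. It keeps a row only if every nested order has orderId >= 102. Order, CustomerRow and keepRecentOrders are hypothetical names, modeled as data classes so the snippet runs without the DataFrame library.

```kotlin
// Plain-Kotlin counterparts of the generated interfaces (hypothetical names).
data class Order(val amount: Double, val orderId: Int)
data class CustomerRow(val orders: List<Order>, val user: String)

// Same predicate as the DataFrame filter: keep rows whose orders all have orderId >= 102.
fun keepRecentOrders(rows: List<CustomerRow>): List<CustomerRow> =
    rows.filter { row -> row.orders.all { it.orderId >= 102 } }
```

In this model, `all` returns true for an empty list, so a customer with no orders is kept. Check whether that matches what you want from the filter.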