init research

2026-02-08 11:20:43 -10:00
commit bdf064f54d
3041 changed files with 1592200 additions and 0 deletions
@@ -0,0 +1,22 @@
# Custom SQL Source
<web-summary>
Connect Kotlin DataFrame to any JDBC-compatible database using a custom SQL source configuration.
</web-summary>
<card-summary>
Easily integrate unsupported SQL databases in Kotlin DataFrame using a flexible custom source setup.
</card-summary>
<link-summary>
Define a custom SQL source in Kotlin DataFrame to work with any JDBC-based database.
</link-summary>
If your SQL database is not officially supported, you can either
[create an issue](https://github.com/Kotlin/dataframe/issues)
or define a simple, configurable custom SQL source.
See the [How to Extend DataFrame Library for Custom SQL Database Support guide](readSqlFromCustomDatabase.md)
for detailed instructions and an example with HSQLDB.
@@ -0,0 +1,107 @@
# DuckDB
<web-summary>
Work with DuckDB databases in Kotlin — read tables and queries into DataFrames using JDBC.
</web-summary>
<card-summary>
Use Kotlin DataFrame to query and transform DuckDB data directly via JDBC.
</card-summary>
<link-summary>
Read DuckDB data into Kotlin DataFrame with JDBC support.
</link-summary>
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.io.DuckDb-->
Kotlin DataFrame supports reading from [DuckDB](https://duckdb.org/) databases using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official DuckDB JDBC driver](https://duckdb.org/docs/stable/clients/java):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
    implementation("org.duckdb:duckdb_jdbc:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
    dependencies("org.duckdb:duckdb_jdbc:$version")
}
```
</tab>
</tabs>
The latest driver version published on Maven Central can be found
[here](https://mvnrepository.com/artifact/org.duckdb/duckdb_jdbc).
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame`
([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
<!---FUN readSqlTable-->
```kotlin
val url = "jdbc:duckdb:/testDatabase"
val username = "duckdb"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
<!---END-->
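To illustrate `readSqlQuery` as well, here is a minimal sketch that runs against an in-memory DuckDB database, so it needs no existing database file.
The `Customer` table, its columns, and the sample rows are made up for this example.
```kotlin
import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery

// "jdbc:duckdb:" opens an in-memory database that is discarded when the connection closes
val olderCustomers = DriverManager.getConnection("jdbc:duckdb:").use { connection ->
    connection.createStatement().use { statement ->
        // Hypothetical table and rows, created only for this sketch
        statement.execute("CREATE TABLE Customer (id INTEGER, name VARCHAR, age INTEGER)")
        statement.execute("INSERT INTO Customer VALUES (1, 'John', 40), (2, 'Alice', 25)")
    }
    // Only the rows and columns selected by the query end up in the DataFrame
    DataFrame.readSqlQuery(
        connection = connection,
        sqlQuery = "SELECT name, age FROM Customer WHERE age > 30",
    )
}
```
The data is read while the connection is still open, so `olderCustomers` stays usable after the `use` block has closed it.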
### Extensions
DuckDB has a special trick up its sleeve: it has support
for [extensions](https://duckdb.org/docs/stable/extensions/overview).
These can be installed, loaded, and used to connect to a different database via DuckDB.
See [Core Extensions](https://duckdb.org/docs/stable/core_extensions/overview) for a list of available extensions.
For example, let's load a dataframe
from [Apache Iceberg via DuckDB](https://duckdb.org/docs/stable/core_extensions/iceberg/overview.html),
as Iceberg is an unsupported data source in DataFrame at the moment:
<!---FUN readIcebergExtension-->
```kotlin
// Creating an in-memory DuckDB database
val connection = DriverManager.getConnection("jdbc:duckdb:")
val df = connection.use { connection ->
    // install and load Iceberg
    connection.createStatement().execute("INSTALL iceberg; LOAD iceberg;")
    // query a table from Iceberg using a specific SQL query
    DataFrame.readSqlQuery(
        connection = connection,
        sqlQuery = "SELECT * FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);",
    )
}
```
<!---END-->
As you can see, the process is very similar to reading from any other JDBC database,
just without needing explicit DataFrame support.
@@ -0,0 +1,98 @@
# H2
<web-summary>
Use Kotlin DataFrame to query H2 databases via JDBC — read tables, run SQL queries, or fetch result sets directly.
</web-summary>
<card-summary>
Connect to H2 databases in Kotlin DataFrame and load data using simple JDBC configurations.
</card-summary>
<link-summary>
Read from H2 databases in Kotlin DataFrame using built-in SQL reading methods.
</link-summary>
Kotlin DataFrame supports reading from an [H2](https://www.h2database.com/html/main.html) database using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official H2 JDBC driver](https://www.h2database.com/html/main.html):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
    implementation("com.h2database:h2:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
    dependencies("com.h2database:h2:$version")
}
```
</tab>
</tabs>
The latest driver version published on Maven Central can be found
[here](https://mvnrepository.com/artifact/com.h2database/h2).
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
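One thing to keep in mind with in-memory H2 databases: by default, H2 drops such a database as soon as its last connection closes.
The sketch below assumes the H2 `DB_CLOSE_DELAY=-1` URL setting to keep the data alive between connections; the database name `sketchDatabase`, the table, and the rows are made up for the example.
```kotlin
import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

// DB_CLOSE_DELAY=-1 keeps the in-memory database until the JVM exits
val h2Url = "jdbc:h2:mem:sketchDatabase;DB_CLOSE_DELAY=-1"

// Fill the database over a short-lived setup connection
DriverManager.getConnection(h2Url, "sa", "").use { connection ->
    connection.createStatement().use { statement ->
        statement.execute("CREATE TABLE Customer (id INT PRIMARY KEY, name VARCHAR(50), age INT)")
        statement.execute("INSERT INTO Customer VALUES (1, 'John', 40), (2, 'Alice', 25)")
    }
}

// The library opens its own connection from the config and still sees the table
val customersDf = DataFrame.readSqlTable(DbConnectionConfig(h2Url, "sa", ""), "Customer")
```
Without `DB_CLOSE_DELAY=-1`, the second connection would find an empty database, because the first one has already been closed.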
### H2 Compatibility Modes
When working with an H2 database, the library automatically detects the compatibility mode from the connection.
If no `MODE` is specified in the JDBC URL, the default `Regular` mode is used.
The following H2 compatibility modes are supported: `MySQL`, `PostgreSQL`, `MSSQLServer`, `MariaDB`, and `Regular`.
```kotlin
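import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
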
// Basic H2 connection (uses Regular mode by default)
val url = "jdbc:h2:mem:testDatabase"
val username = "sa"
val password = ""
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
```kotlin
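import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
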
// H2 with PostgreSQL compatibility mode
val postgresUrl = "jdbc:h2:mem:testDatabase;MODE=PostgreSQL"
val username = "sa"
val password = ""
val postgresConfig = DbConnectionConfig(postgresUrl, username, password)
val tableName = "Customer"
val dfPostgres = DataFrame.readSqlTable(postgresConfig, tableName)
```
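In the URLs above, `MODE` is one of the semicolon-separated settings that follow the database path.
Purely as an illustration of that URL format, the hypothetical helper below extracts the mode from a URL string and falls back to `Regular`.
It is not part of Kotlin DataFrame and not how the library performs its detection.
```kotlin
// Hypothetical helper: returns the value of the MODE setting in an H2 JDBC URL,
// or `default` when the URL does not specify one
fun h2ModeFromUrl(url: String, default: String = "Regular"): String =
    url.split(';')
        .drop(1) // the first segment is the database path, not a setting
        .map { it.split('=', limit = 2) }
        .firstOrNull { it.size == 2 && it[0].trim().equals("MODE", ignoreCase = true) }
        ?.get(1)
        ?.trim()
        ?: default
```
For example, `h2ModeFromUrl("jdbc:h2:mem:testDatabase;MODE=PostgreSQL")` returns `"PostgreSQL"`, while `h2ModeFromUrl("jdbc:h2:mem:testDatabase")` returns `"Regular"`.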
@@ -0,0 +1,72 @@
# MariaDB
<web-summary>
Access MariaDB databases using Kotlin DataFrame and JDBC — fetch data from tables or custom SQL queries with ease.
</web-summary>
<card-summary>
Seamlessly integrate MariaDB with Kotlin DataFrame — load data using JDBC and analyze it in Kotlin.
</card-summary>
<link-summary>
Read data from MariaDB into Kotlin DataFrame using standard JDBC configurations.
</link-summary>
Kotlin DataFrame supports reading from a [MariaDB](https://mariadb.org) database using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official MariaDB JDBC driver](https://mariadb.com/docs/connectors/mariadb-connector-j):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
    implementation("org.mariadb.jdbc:mariadb-java-client:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
    dependencies("org.mariadb.jdbc:mariadb-java-client:$version")
}
```
</tab>
</tabs>
The latest driver version published on Maven Central can be found
[here](https://mvnrepository.com/artifact/org.mariadb.jdbc/mariadb-java-client).
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
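import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
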
val url = "jdbc:mariadb://localhost:3306/testDatabase"
val username = "root"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
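Reading the result of a query instead of a whole table follows the same pattern.
A minimal sketch, assuming the `Customer` table has `name` and `age` columns (both made up here); `readOlderCustomers` is a hypothetical helper name.
```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery

// Hypothetical query: the `name` and `age` columns are assumed to exist in `Customer`
val sqlQuery = "SELECT name, age FROM Customer WHERE age > 30"

// Runs the query through a connection created from the config
// and returns the result as a DataFrame
fun readOlderCustomers(dbConfig: DbConnectionConfig): AnyFrame =
    DataFrame.readSqlQuery(dbConfig, sqlQuery)
```
With the `dbConfig` from the example above and a running MariaDB server, `readOlderCustomers(dbConfig)` would return only the selected columns and matching rows.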
@@ -0,0 +1,74 @@
# Microsoft SQL Server (MS SQL)
<web-summary>
Connect to Microsoft SQL Server using Kotlin DataFrame and JDBC — load structured data directly into your Kotlin workflow.
</web-summary>
<card-summary>
Use Kotlin DataFrame to read from Microsoft SQL Server — run queries or load entire tables via JDBC.
</card-summary>
<link-summary>
Fetch data from Microsoft SQL Server into Kotlin DataFrame using JDBC configuration.
</link-summary>
Kotlin DataFrame supports reading from a [Microsoft SQL Server (MS SQL)](https://www.microsoft.com/en-us/sql-server)
database using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need
[the official MS SQL JDBC driver](https://learn.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver17):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
    implementation("com.microsoft.sqlserver:mssql-jdbc:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
    dependencies("com.microsoft.sqlserver:mssql-jdbc:$version")
}
```
</tab>
</tabs>
The latest driver version published on Maven Central can be found
[here](https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc).
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
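import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
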
val url = "jdbc:sqlserver://localhost:1433;databaseName=testDatabase"
val username = "sa"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
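If you already work with plain JDBC, a `ResultSet` you obtained yourself can be converted as well.
The sketch below assumes the `readResultSet(resultSet, connection)` overload; the query and the `readTopCustomers` helper are hypothetical.
```kotlin
import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readResultSet

// Hypothetical query; TOP is the SQL Server way to cap the number of returned rows
val topCustomersQuery = "SELECT TOP 10 * FROM Customer"

fun readTopCustomers(url: String, username: String, password: String): AnyFrame =
    DriverManager.getConnection(url, username, password).use { connection ->
        connection.createStatement().use { statement ->
            statement.executeQuery(topCustomersQuery).use { resultSet ->
                // The connection is passed along with the ResultSet,
                // presumably so the library can determine the database type
                DataFrame.readResultSet(resultSet, connection)
            }
        }
    }
```
The statement, result set, and connection are all closed by the nested `use` blocks once the rows have been read into the `DataFrame`.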
@@ -0,0 +1,72 @@
# MySQL
<web-summary>
Connect to MySQL databases and load data into Kotlin DataFrame using JDBC — query, analyze, and transform SQL data in Kotlin.
</web-summary>
<card-summary>
Use Kotlin DataFrame with MySQL — easily read tables and queries over JDBC into powerful data structures.
</card-summary>
<link-summary>
Read data from MySQL into Kotlin DataFrame using JDBC configuration.
</link-summary>
Kotlin DataFrame supports reading from a [MySQL](https://www.mysql.com) database using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official MySQL JDBC driver](https://dev.mysql.com/downloads/connector/j/):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
    implementation("com.mysql:mysql-connector-j:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
    dependencies("com.mysql:mysql-connector-j:$version")
}
```
</tab>
</tabs>
The latest driver version published on Maven Central can be found
[here](https://mvnrepository.com/artifact/com.mysql/mysql-connector-j).
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
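import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
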
val url = "jdbc:mysql://localhost:3306/testDatabase"
val username = "root"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
@@ -0,0 +1,71 @@
# PostgreSQL
<web-summary>
Work with PostgreSQL databases in Kotlin — read tables and queries into DataFrames using JDBC.
</web-summary>
<card-summary>
Use Kotlin DataFrame to query and transform PostgreSQL data directly via JDBC.
</card-summary>
<link-summary>
Read PostgreSQL data into Kotlin DataFrame with JDBC support.
</link-summary>
Kotlin DataFrame supports reading from a [PostgreSQL](https://www.postgresql.org) database using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need [the official PostgreSQL JDBC driver](https://jdbc.postgresql.org):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
    implementation("org.postgresql:postgresql:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
    dependencies("org.postgresql:postgresql:$version")
}
```
</tab>
</tabs>
The latest driver version published on Maven Central can be found
[here](https://mvnrepository.com/artifact/org.postgresql/postgresql).
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
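import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
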
val url = "jdbc:postgresql://localhost:5432/testDatabase"
val username = "postgres"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
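For a large table, it can be useful to look at a sample before loading everything.
The sketch below assumes the optional `limit` parameter of `readSqlTable`; `previewTable` and `previewRows` are hypothetical names chosen for this example.
```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

// Hypothetical preview size for this sketch
val previewRows = 100

// Reads at most `previewRows` rows of the given table
fun previewTable(dbConfig: DbConnectionConfig, tableName: String): AnyFrame =
    DataFrame.readSqlTable(dbConfig, tableName, limit = previewRows)
```
With the values from the example above and a running PostgreSQL server, `previewTable(dbConfig, tableName)` would return no more than 100 rows of `Customer`.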
@@ -0,0 +1,46 @@
# SQL
<web-summary>
Work with SQL databases in Kotlin using DataFrame and JDBC — read tables and queries with ease.
</web-summary>
<card-summary>
Connect to PostgreSQL, MySQL, SQLite, and other SQL databases using Kotlin DataFrame's JDBC support.
</card-summary>
<link-summary>
Load data from SQL databases into Kotlin DataFrame using JDBC and built-in reading functions.
</link-summary>
Kotlin DataFrame supports reading from SQL databases using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need a JDBC driver for the specific database.
## Supported databases
Kotlin DataFrame provides out-of-the-box support for the most common SQL databases:
- [PostgreSQL](PostgreSQL.md)
- [MySQL](MySQL.md)
- [Microsoft SQL Server](Microsoft-SQL-Server.md)
- [SQLite](SQLite.md)
- [H2](H2.md)
- [MariaDB](MariaDB.md)
- [DuckDB](DuckDB.md)
You can also define a [Custom SQL Source](Custom-SQL-Source.md)
to work with any other JDBC-compatible database.
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame`
([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
@@ -0,0 +1,70 @@
# SQLite
<web-summary>
Use Kotlin DataFrame to read data from SQLite databases with minimal setup via JDBC.
</web-summary>
<card-summary>
Query and transform SQLite data directly in Kotlin using DataFrame and JDBC.
</card-summary>
<link-summary>
Read SQLite tables into Kotlin DataFrame using the built-in JDBC integration.
</link-summary>
Kotlin DataFrame supports reading from a [SQLite](https://www.sqlite.org) database using JDBC.
This requires the [`dataframe-jdbc` module](Modules.md#dataframe-jdbc),
which is included by default in the general [`dataframe` artifact](Modules.md#dataframe-general)
and in [`%use dataframe`](SetupKotlinNotebook.md#integrate-kotlin-dataframe) for Kotlin Notebook.
You'll also need the [SQLite JDBC driver](https://github.com/xerial/sqlite-jdbc):
<tabs>
<tab title="Gradle project">
```kotlin
dependencies {
    implementation("org.xerial:sqlite-jdbc:$version")
}
```
</tab>
<tab title="Kotlin Notebook">
```kotlin
USE {
    dependencies("org.xerial:sqlite-jdbc:$version")
}
```
</tab>
</tabs>
The latest driver version published on Maven Central can be found
[here](https://mvnrepository.com/artifact/org.xerial/sqlite-jdbc).
## Read
A [`DataFrame`](DataFrame.md) instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name ([`readSqlTable`](readSqlDatabases.md)),
as the result of a user-defined SQL query ([`readSqlQuery`](readSqlDatabases.md)),
or from a given `ResultSet` ([`readResultSet`](readSqlDatabases.md)).
It is also possible to load all data from non-system tables, each into a separate `DataFrame` ([`readAllSqlTables`](readSqlDatabases.md)).
See [](readSqlDatabases.md) for more details.
```kotlin
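import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
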
val url = "jdbc:sqlite:testDatabase.db"
val dbConfig = DbConnectionConfig(url)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
```
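To try this without a database file, the SQLite JDBC driver also accepts the `jdbc:sqlite::memory:` URL.
Here is a minimal sketch that passes an open `Connection` to `readSqlTable` instead of a `DbConnectionConfig`; the table definition and sample rows are made up for the example.
```kotlin
import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readSqlTable

// "jdbc:sqlite::memory:" creates an in-memory database private to this connection
val customers = DriverManager.getConnection("jdbc:sqlite::memory:").use { connection ->
    connection.createStatement().use { statement ->
        // Hypothetical table and rows, created only for this sketch
        statement.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
        statement.execute("INSERT INTO Customer VALUES (1, 'John', 40), (2, 'Alice', 25)")
    }
    // Reuse the same connection: a new one would see a different, empty database
    DataFrame.readSqlTable(connection, "Customer")
}
```
Because each in-memory SQLite database belongs to a single connection, the table has to be read over the connection that created it.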