3.2 KiB
Vendored
DuckDB
Work with DuckDB databases in Kotlin — read tables and queries into DataFrames using JDBC. Use Kotlin DataFrame to query and transform DuckDB data directly via JDBC. Read DuckDB data into Kotlin DataFrame with JDBC support.Kotlin DataFrame supports reading from DuckDB databases using JDBC.
This requires the dataframe-jdbc module,
which is included by default in the general dataframe artifact
and in %use dataframe for Kotlin Notebook.
You’ll also need the official DuckDB JDBC driver:
dependencies {
implementation("org.duckdb:duckdb_jdbc:$version")
}
USE {
dependencies("org.duckdb:duckdb_jdbc:$version")
}
The actual Maven Central driver version can be found here.
Read
A DataFrame instance can be loaded from a database in several ways:
a user can read data from a SQL table by a given name (readSqlTable),
as the result of a user-defined SQL query (readSqlQuery),
or from a given ResultSet (readResultSet).
It is also possible to load all data from non-system tables, each into a separate DataFrame (
readAllSqlTables).
val url = "jdbc:duckdb:/testDatabase"
val username = "duckdb"
val password = "password"
val dbConfig = DbConnectionConfig(url, username, password)
val tableName = "Customer"
val df = DataFrame.readSqlTable(dbConfig, tableName)
Extensions
DuckDB has a special trick up its sleeve: it has support for extensions. These can be installed, loaded, and used to connect to a different database via DuckDB. See Core Extensions for a list of available extensions.
For example, let's load a dataframe from Apache Iceberg via DuckDB, as Iceberg is an unsupported data source in DataFrame at the moment:
// Creating an in-memory DuckDB database
val connection = DriverManager.getConnection("jdbc:duckdb:")
val df = connection.use { connection ->
// install and load Iceberg
connection.createStatement().execute("INSTALL iceberg; LOAD iceberg;")
// query a table from Iceberg using a specific SQL query
DataFrame.readSqlQuery(
connection = connection,
sqlQuery = "SELECT * FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);",
)
}
As you can see, the process is very similar to reading from any other JDBC database, just without needing explicit DataFrame support.