Working with DuckDB¶
Using DuckDB with Python, Power BI, and Spark SQL
DuckDB is a modern open-source analytical database designed for fast OLAP workloads. It is lightweight, cross-platform, and easy to embed inside Python notebooks and scripts.
DuckDB files (.duckdb) behave similarly to SQLite files (.db):
no server, no configuration, and everything runs in-process.
Python¶
DuckDB integrates tightly with Python using the standard duckdb package.
import duckdb
conn = duckdb.connect("data/warehouse/smart_sales.duckdb")
conn.execute("SELECT COUNT(*) FROM fact_sale").fetchall()
DuckDB can query Pandas DataFrames directly, join across CSVs, and write Parquet files.
Power BI Desktop (Windows)¶
We can connect PowerBI to DuckDB using the DuckDB ODBC Driver.
- Download it from: https://duckdb.org/docs/api/odbc/overview.
- Install the 64-bit ODBC driver.
- Create a Windows DSN (Data Source Name):
- Open Windows Start menu and type ODBC Data Sources (64-bit)
- Click the app when it appears to launch it
- Select the "System DSN" tab.
- Click Add… and choose DuckDB ODBC Driver
- Click Finish
- Name it SmartSalesDuckDB
- Click Browse… and select your file, for example:
data/warehouse/smart_sales.duckdb
In PowerBI, select this DNS from Get Data / ODBC.
Apache Spark¶
Apache Spark can connect to DuckDB using the DuckDB JDBC Driver.
- Download the DuckDB JDBC
.jarfrom: https://github.com/duckdb/duckdb/releases - Save the
.jarfile into your repo at:lib/duckdb-jdbc.jar - Configure Spark:
spark = (
SparkSession.builder
.config("spark.driver.extraClassPath", "lib/duckdb-jdbc.jar")
.getOrCreate()
)
- Load a table:
df = spark.read.format("jdbc").options(
url="jdbc:duckdb:data/warehouse/smart_sales.duckdb",
dbtable="fact_sale"
).load()
OPTIONAL: DuckDB CLI (Command Line Interface)¶
This is not required.
DuckDB provides a simple, fast SQL shell for working with .duckdb files from the terminal.
To use it:
- Download the CLI for your OS from: https://duckdb.org/docs/installation/index
- Unzip the package
- Add the unzipped folder to your PATH
Once installed, we can open a database from the terminal:
duckdb data/warehouse/smart_sales.duckdb
Once inside the shell:
.tables
SELECT COUNT(*) FROM fact_sale;
Learn more at: https://duckdb.org/docs/sql/introduction.html