Skip to main content

Query databases using Polars syntax — the Python equivalent of R's dbplyr

Project description

polars-db

CI Python 3.13+ codecov License: MIT

A Python library that lets you query relational databases using Polars syntax. Write expressions like pdb.col("age") > 30 and polars-db translates them into the appropriate SQL dialect for your backend.

Motivation

In Python, there is no standard way to manipulate database tables using DataFrame syntax. You either write raw SQL strings or learn a library-specific API like SQLAlchemy or Ibis — none of which feel like the DataFrame code you already write. R solved this years ago with dbplyr, which lets dplyr code run transparently against databases. polars-db brings the same idea to the Polars ecosystem.

Library API style Target audience
SQLAlchemy SQLAlchemy's own API General Python → SQL
Ibis Ibis's own API General DataFrame → SQL
SQLFrame PySpark API PySpark users → SQL
polars-db Polars API Polars users → SQL

Zero learning cost — If you already know Polars, you can query databases immediately. No new API to learn.

Type-safe unified pipelines — Local processing (Polars) and database queries (polars-db) share the same API, eliminating the implicit type coercion, NaN/None confusion, and dtype mismatches that plague pandas-based workflows.

Installation

pip install polars-db

DuckDB works out of the box. For other databases, install the corresponding extras:

# PostgreSQL
pip install polars-db[postgres]

# SQLite
pip install polars-db[sqlite]

# MySQL
pip install polars-db[mysql]

# SQL Server
pip install polars-db[sqlserver]

# BigQuery
pip install polars-db[bigquery]

# Multiple backends at once
pip install polars-db[postgres,mysql]

Supported Databases

Database Extras Driver Connection string example
DuckDB (none) duckdb duckdb:///:memory:
SQLite sqlite adbc-driver-sqlite sqlite:///path/to/db.sqlite
PostgreSQL postgres adbc-driver-postgresql postgresql://user:pass@host:5432/dbname
MySQL mysql PyMySQL mysql://user:pass@host:3306/dbname
SQL Server sqlserver pymssql mssql://user:pass@host:1433/dbname
BigQuery bigquery google-cloud-bigquery bigquery://project/dataset

Usage

Connecting

import polars_db as pdb

conn = pdb.connect("postgresql://user:pass@localhost:5432/mydb")

SELECT Queries

Build queries with the same API as Polars LazyFrame, then call collect() to execute.

# Basic select / filter / sort / limit
df = (
    conn.table("users")
    .filter(pdb.col("age") > 30)
    .select("name", "age")
    .sort("age", descending=True)
    .limit(10)
    .collect()
)
# GROUP BY with aggregation
df = (
    conn.table("sales")
    .group_by("product_id")
    .agg(
        pdb.col("amount").sum().alias("total"),
        pdb.col("id").count().alias("num_sales"),
    )
    .sort("total", descending=True)
    .collect()
)
# JOIN
users = conn.table("users")
orders = conn.table("orders")

df = (
    users.join(orders, on="user_id", how="left")
    .select("name", "amount")
    .collect()
)
# Window functions
df = (
    conn.table("sales")
    .with_columns(
        pdb.col("amount").sum().over("dept").alias("dept_total"),
    )
    .collect()
)
# Cumulative sum with ordering
df = (
    conn.table("sales")
    .with_columns(
        pdb.col("amount")
        .cum_sum()
        .over("dept", order_by="date")
        .alias("running_total"),
    )
    .collect()
)
# CASE WHEN
df = (
    conn.table("users")
    .with_columns(
        pdb.when(pdb.col("age") >= 18)
        .then(pdb.lit("adult"))
        .otherwise(pdb.lit("minor"))
        .alias("category"),
    )
    .collect()
)

Inspecting Generated SQL

query = conn.table("users").filter(pdb.col("age") > 30)
print(query.show_query())
# SELECT * FROM users WHERE age > 30

DDL / Raw SQL

The query builder focuses on SELECT statements. For DDL (CREATE TABLE, etc.) and DML (INSERT, etc.), use execute_raw():

conn.execute_raw("CREATE TABLE users (id INT, name TEXT, age INT)")
conn.execute_raw("INSERT INTO users VALUES (1, 'Alice', 30)")
conn.execute_raw("DROP TABLE IF EXISTS users")

Note: execute_raw() executes SQL as-is. Never pass unsanitized external input via string concatenation.

Development

Prerequisites

  • Python 3.13+
  • uv
  • Docker (for integration tests)

Setup

git clone https://github.com/b-trout/polars-db.git
cd polars-db
uv sync --all-groups --all-extras

Common Commands

This project uses Poe the Poet as a task runner.

# Lint & format
uv run poe lint          # ruff check
uv run poe format        # ruff format
uv run poe type-check    # ty check

# Unit tests (no database required)
uv run poe test-unit

# Integration tests (per backend)
docker compose up -d
POLARS_DB_TEST_BACKEND=postgres uv run poe test-integration

# Run all pre-commit checks (format -> lint -> type-check -> yaml -> docker-lint)
uv run poe pre-commit

License

This project is licensed under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polars_db-0.1.1.tar.gz (79.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

polars_db-0.1.1-py3-none-any.whl (40.7 kB view details)

Uploaded Python 3

File details

Details for the file polars_db-0.1.1.tar.gz.

File metadata

  • Download URL: polars_db-0.1.1.tar.gz
  • Upload date:
  • Size: 79.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for polars_db-0.1.1.tar.gz
Algorithm Hash digest
SHA256 2a844128aef6b2d1ee3b2b7d87ae8ebe0c6450f8ba29ab40d0e4bee4243e1258
MD5 0a5eaefaf4248703896d8bce934b9273
BLAKE2b-256 41125eeb46378424b4f6279b36dfc069806f36a0d2e5ef4bda34a96ce4007ccb

See more details on using hashes here.

Provenance

The following attestation bundles were made for polars_db-0.1.1.tar.gz:

Publisher: release.yml on b-trout/polars-db

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polars_db-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: polars_db-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 40.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for polars_db-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 51e23916065beb1da3ea26736b887a6b6c100bd986f7a20392a0106d88c5325d
MD5 5df160a6f33cca496c17d43a1ecde0d4
BLAKE2b-256 7662fbfbb9a8e3824695183356a50fa4bfa8854b57ad6012f180a7253c1bafcd

See more details on using hashes here.

Provenance

The following attestation bundles were made for polars_db-0.1.1-py3-none-any.whl:

Publisher: release.yml on b-trout/polars-db

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page