polars-db

Query databases using Polars syntax — the Python equivalent of R's dbplyr.

A Python library that lets you query relational databases using Polars syntax. Write expressions like pdb.col("age") > 30 and polars-db translates them into the appropriate SQL dialect for your backend.

Motivation

In Python, there is no standard way to manipulate database tables with DataFrame syntax. You either write raw SQL strings or learn a library-specific API such as SQLAlchemy's or Ibis's — neither of which feels like the DataFrame code you already write. R solved this years ago with dbplyr, which lets ordinary dplyr code run transparently against databases. polars-db brings the same idea to the Polars ecosystem.

Library     API style             Target audience
----------  --------------------  -----------------------
SQLAlchemy  SQLAlchemy's own API  General Python → SQL
Ibis        Ibis's own API        General DataFrame → SQL
SQLFrame    PySpark API           PySpark users → SQL
polars-db   Polars API            Polars users → SQL

Zero learning cost — If you already know Polars, you can query databases immediately. No new API to learn.

Type-safe unified pipelines — Local processing (Polars) and database queries (polars-db) share the same API, eliminating the implicit type coercion, NaN/None confusion, and dtype mismatches that plague pandas-based workflows.

Installation

pip install polars-db

DuckDB and SQLite work out of the box. For other databases, install the corresponding extras:

# PostgreSQL
pip install "polars-db[postgres]"

# MySQL
pip install "polars-db[mysql]"

# SQL Server
pip install "polars-db[sqlserver]"

# BigQuery
pip install "polars-db[bigquery]"

# Multiple backends at once
pip install "polars-db[postgres,mysql]"

Supported Databases

Database    Extras     Driver                 Connection string example
----------  ---------  ---------------------  ---------------------------------------
DuckDB      (none)     duckdb                 duckdb:///:memory:
SQLite      (none)     sqlite3 (stdlib)       sqlite:///path/to/db.sqlite
PostgreSQL  postgres   psycopg2               postgresql://user:pass@host:5432/dbname
MySQL       mysql      PyMySQL                mysql://user:pass@host:3306/dbname
SQL Server  sqlserver  pymssql                mssql://user:pass@host:1433/dbname
BigQuery    bigquery   google-cloud-bigquery  bigquery://project/dataset

Usage

Connecting

import polars_db as pdb

conn = pdb.connect("postgresql://user:pass@localhost:5432/mydb")

SELECT Queries

Build queries with the same API as Polars LazyFrame, then call collect() to execute.

# Basic select / filter / sort / limit
df = (
    conn.table("users")
    .filter(pdb.col("age") > 30)
    .select("name", "age")
    .sort("age", descending=True)
    .limit(10)
    .collect()
)

# GROUP BY with aggregation
df = (
    conn.table("sales")
    .group_by("product_id")
    .agg(
        pdb.col("amount").sum().alias("total"),
        pdb.col("id").count().alias("num_sales"),
    )
    .sort("total", descending=True)
    .collect()
)
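
For reference, a group_by/agg pipeline like this corresponds to a SQL GROUP BY with aggregate functions. The exact SQL polars-db emits is backend-specific, so the following shows only the general shape, stand-alone via the stdlib sqlite3 module with made-up data:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INT, product_id INT, amount INT)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 250), (3, 20, 400)],
)

# Roughly what group_by("product_id").agg(...).sort("total", descending=True)
# translates to:
rows = db.execute(
    """
    SELECT product_id, SUM(amount) AS total, COUNT(id) AS num_sales
    FROM sales
    GROUP BY product_id
    ORDER BY total DESC
    """
).fetchall()
print(rows)  # [(20, 400, 1), (10, 350, 2)]
```
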
# JOIN
users = conn.table("users")
orders = conn.table("orders")

df = (
    users.join(orders, on="user_id", how="left")
    .select("name", "amount")
    .collect()
)
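
The how="left" join above maps to a SQL LEFT JOIN, which keeps every row from the left table and fills unmatched columns with NULL. A stand-alone sketch of that behavior with the stdlib sqlite3 module (schema and data are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id INT, name TEXT)")
db.execute("CREATE TABLE orders (user_id INT, amount INT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
db.execute("INSERT INTO orders VALUES (1, 99)")

# Bob has no orders, so his amount comes back as NULL (None in Python):
rows = db.execute(
    """
    SELECT u.name, o.amount
    FROM users AS u
    LEFT JOIN orders AS o ON u.user_id = o.user_id
    ORDER BY u.name
    """
).fetchall()
print(rows)  # [('Alice', 99), ('Bob', None)]
```
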
# Window functions
df = (
    conn.table("sales")
    .with_columns(
        pdb.col("amount").sum().over("dept").alias("dept_total"),
    )
    .collect()
)

# Cumulative sum with ordering
df = (
    conn.table("sales")
    .with_columns(
        pdb.col("amount")
        .cum_sum()
        .over("dept", order_by="date")
        .alias("running_total"),
    )
    .collect()
)
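
Both over() examples map to SQL window functions: over("dept") becomes PARTITION BY dept, and adding order_by="date" turns the sum into a running total. SQLite has supported window functions since 3.25, so the shape can be demonstrated stand-alone with made-up data:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (dept TEXT, date TEXT, amount INT)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("a", "2024-01-01", 10), ("a", "2024-01-02", 20), ("b", "2024-01-01", 5)],
)

# Roughly what cum_sum().over("dept", order_by="date") translates to:
rows = db.execute(
    """
    SELECT dept, date, amount,
           SUM(amount) OVER (PARTITION BY dept ORDER BY date) AS running_total
    FROM sales
    ORDER BY dept, date
    """
).fetchall()
print(rows)
# [('a', '2024-01-01', 10, 10), ('a', '2024-01-02', 20, 30), ('b', '2024-01-01', 5, 5)]
```
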
# CASE WHEN
df = (
    conn.table("users")
    .with_columns(
        pdb.when(pdb.col("age") >= 18)
        .then(pdb.lit("adult"))
        .otherwise(pdb.lit("minor"))
        .alias("category"),
    )
    .collect()
)

Inspecting Generated SQL

query = conn.table("users").filter(pdb.col("age") > 30)
print(query.show_query())
# SELECT * FROM users WHERE age > 30
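
Since show_query() yields plain SQL text, the output can be run against any engine that speaks the same dialect. As a stand-alone illustration with the stdlib sqlite3 module (table contents are made up for the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, age INT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [("Alice", 35), ("Bob", 25)])

# The SQL shown above, executed as-is:
rows = db.execute("SELECT * FROM users WHERE age > 30").fetchall()
print(rows)  # [('Alice', 35)]
```
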

DDL / Raw SQL

The query builder focuses on SELECT statements. For DDL (CREATE TABLE, etc.) and DML (INSERT, etc.), use execute_raw():

conn.execute_raw("CREATE TABLE users (id INT, name TEXT, age INT)")
conn.execute_raw("INSERT INTO users VALUES (1, 'Alice', 30)")
conn.execute_raw("DROP TABLE IF EXISTS users")

Note: execute_raw() executes SQL as-is. Never pass unsanitized external input via string concatenation.
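
The DB-API drivers listed above (sqlite3, psycopg2, PyMySQL, pymssql) all support placeholder binding, which keeps user input out of the SQL text entirely. Whether execute_raw() forwards bind parameters is not documented here, so this sketch uses the stdlib sqlite3 driver directly to show the safe pattern:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INT, name TEXT, age INT)")

# Unsafe pattern: building SQL by concatenating user input invites injection.
# Safe pattern: let the driver bind the values; they are never parsed as SQL.
db.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "Alice", 30))
rows = db.execute("SELECT name FROM users WHERE age > ?", (20,)).fetchall()
print(rows)  # [('Alice',)]
```
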

Development

Prerequisites

  • Python 3.13+
  • uv
  • Docker (for integration tests)

Setup

git clone https://github.com/b-trout/polars-db.git
cd polars-db
uv sync --all-groups --all-extras

Common Commands

This project uses Poe the Poet as a task runner.

# Lint & format
uv run poe lint          # ruff check
uv run poe format        # ruff format
uv run poe type-check    # ty check

# Unit tests (no database required)
uv run poe test-unit

# Integration tests (per backend)
docker compose up -d
POLARS_DB_TEST_BACKEND=postgres uv run poe test-integration

# Run all pre-commit checks (format -> lint -> type-check -> yaml -> docker-lint)
uv run poe pre-commit

License

This project is licensed under the MIT License.
