Skip to main content

SQLAlchemy driver for duckdb

Project description

duckdb_driver

This project is a fork of duckdb-engine.

Supported Python Versions PyPI version PyPI Downloads codecov

Basic SQLAlchemy driver for DuckDB

Installation

$ pip install duckdb-driver

Usage

Once you've installed this package, you should be able to just use it, as SQLAlchemy does a python path search

from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session

Base = declarative_base()


class FakeModel(Base):  # type: ignore
    __tablename__ = "fake"

    id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
    name = Column(String)


eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)

session.add(FakeModel(name="Frank"))
session.commit()

frank = session.query(FakeModel).one()

assert frank.name == "Frank"

Usage in IPython/Jupyter

With IPython-SQL and DuckDB-Driver you can query DuckDB natively in your notebook! Check out DuckDB's documentation or Alex Monahan's great demo of this on his blog.

Configuration

You can configure DuckDB by passing connect_args to the create_engine function

create_engine(
    'duckdb:///:memory:',
    connect_args={
        'read_only': False,
        'config': {
            'memory_limit': '500mb'
        }
    }
)

The supported configuration parameters are listed in the DuckDB docs

How to register a pandas DataFrame

conn = create_engine("duckdb:///:memory:").connect()

# with SQLAlchemy 1.3
conn.execute("register", ("dataframe_name", pd.DataFrame(...)))

# with SQLAlchemy 1.4+
conn.execute(text("register(:name, :df)"), {"name": "test_df", "df": df})

conn.execute("select * from dataframe_name")

Things to keep in mind

Duckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the duckdb_driver dialect is derived from the postgresql dialect, SQLAlchemy may try to use PostgreSQL-only features. Below are some caveats to look out for.

Auto-incrementing ID columns

When defining an Integer column as a primary key, SQLAlchemy uses the SERIAL datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the SQLAlchemy.Sequence() object to auto-increment the key. For more information on sequences, you can find the SQLAlchemy Sequence documentation here.

The following example demonstrates how to create an auto-incrementing ID column for a simple table:

>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> metadata = sqlalchemy.MetaData(engine)
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
...     'users',
...     metadata,
...     sqlalchemy.Column(
...         'id',
...         sqlalchemy.Integer,
...         user_id_seq,
...         server_default=user_id_seq.next_value(),
...         primary_key=True,
...     ),
... )
>>> metadata.create_all(bind=engine)

Pandas read_sql() chunksize

NOTE: this is no longer an issue in versions >=0.5.0 of duckdb

The pandas.read_sql() method can read tables from duckdb_engine into DataFrames, but the sqlalchemy.engine.result.ResultProxy trips up when fetchmany() is called. Therefore, for now chunksize=None (default) is necessary when reading duckdb tables into DataFrames. For example:

>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine)                ### Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25)  ### Throws an exception

Unsigned integer support

Unsigned integers are supported by DuckDB, and are available in duckdb_engine.datatypes.

Alembic Integration

SQLAlchemy's companion library alembic can optionally be used to manage database migrations.

This support can be enabling by adding an Alembic implementation class for the duckdb dialect.

from alembic.ddl.impl import DefaultImpl

class AlembicDuckDBImpl(DefaultImpl):
    """Alembic implementation for DuckDB."""

    __dialect__ = "duckdb"

After loading this class with your program, Alembic will no longer raise an error when generating or applying migrations.

Preloading extensions (experimental)

DuckDB 0.9.0+ includes builtin support for autoinstalling and autoloading of extensions, see the extension documentation for more information.

Until the DuckDB python client allows you to natively preload extensions, I've added experimental support via a connect_args parameter

from sqlalchemy import create_engine

create_engine(
    'duckdb:///:memory:',
    connect_args={
        'preload_extensions': ['https'],
        'config': {
            's3_region': 'ap-southeast-1'
        }
    }
)

Registering Filesystems

DuckDB allows registering filesystems from fsspec, see documentation for more information.

Support is provided under connect_args parameter

from sqlalchemy import create_engine
from fsspec import filesystem

create_engine(
    'duckdb:///:memory:',
    connect_args={
        'register_filesystems': [filesystem('gcs')],
    }
)

The name

Yes, i thought forking and looking to maintain this was a good moment to also rename it to duckdb-driver.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duckdb_driver-0.16.0.tar.gz (22.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

duckdb_driver-0.16.0-py3-none-any.whl (29.6 kB view details)

Uploaded Python 3

File details

Details for the file duckdb_driver-0.16.0.tar.gz.

File metadata

  • Download URL: duckdb_driver-0.16.0.tar.gz
  • Upload date:
  • Size: 22.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for duckdb_driver-0.16.0.tar.gz
Algorithm Hash digest
SHA256 342886695f0cf0938b3e7a4442e871caedef431ba667b29bc5e4da754341be26
MD5 619e8aca97a6d3bdab73af918b89ec1c
BLAKE2b-256 26542a90eb6c1127492c73e377a11dd31dec4302ee972a3fe89bfdf480b130b9

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckdb_driver-0.16.0.tar.gz:

Publisher: publish.yaml on wolkwork/duckdb_driver

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file duckdb_driver-0.16.0-py3-none-any.whl.

File metadata

  • Download URL: duckdb_driver-0.16.0-py3-none-any.whl
  • Upload date:
  • Size: 29.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for duckdb_driver-0.16.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2c9bafcf6e11f2564a6ea38fab50e3ddae3764ed4400ff3f440688bc10e63511
MD5 bfb721bd9ac1390ecde23a263049d81d
BLAKE2b-256 20fdbe6c663500e9a5e178a58bdb35becfe73c1244808c4225ebe11f34ada664

See more details on using hashes here.

Provenance

The following attestation bundles were made for duckdb_driver-0.16.0-py3-none-any.whl:

Publisher: publish.yaml on wolkwork/duckdb_driver

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page