No project description provided
Project description
duckdb_engine
Very very very basic sqlalchemy driver for duckdb
Once you install this package, you should be able to just use it, as sqlalchemy does a python path search
from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session
Base = declarative_base()
class FakeModel(Base): # type: ignore
__tablename__ = "fake"
id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
name = Column(String)
eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)
session.add(FakeModel(name="Frank"))
session.commit()
frank = session.query(FakeModel).one()
assert frank.name == "Frank"
How to register a pandas DataFrame
eng = create_engine("duckdb:///:memory:")
eng.execute("register", "dataframe_name", pd.DataFrame(...))
eng.execute("select * from dataframe_name")
Things to keep in mind
Duckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the duckdb_engine
dialect is derived from the postgresql
dialect, sqlalchemy
may try to use PostgreSQL-only features. Below are some caveats to look out for.
Auto-incrementing ID columns
When defining an Integer column as a primary key, sqlalchemy
uses the SERIAL
datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the sqlalchemy.Sequence()
object to auto-increment the key. For more information on sequences, you can find the sqlalchemy Sequence
documentation here.
The following example demonstrates how to create an auto-incrementing ID column for a simple table:
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> metadata = sqlalchemy.MetaData(engine)
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
... 'users',
... metadata,
... sqlalchemy.Column(
... 'id',
... sqlalchemy.Integer,
... user_id_seq,
... server_default=user_id_seq.next_value(),
... primary_key=True,
... ),
... )
>>> metadata.create_all(bind=engine)
Pandas read_sql()
chunksize
The pandas.read_sql()
method can read tables from duckdb_engine
into DataFrames, but the sqlalchemy.engine.result.ResultProxy
trips up when fetchmany()
is called. Therefore, for now chunksize=None
(default) is necessary when reading duckdb tables into DataFrames. For example:
>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine) ### Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25) ### Throws an exception
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for duckdb_engine-0.1.8rc4-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2da87850685a866c4339e5403853e467896bce3ab021203d84294c835132588b |
|
MD5 | cdf62206561e8811d5a8fede3451856c |
|
BLAKE2b-256 | a09a78e05f3d06e374aea2675a46e88b834604422c1221cba8c368571404af9c |