Package to make working with duckdb easier
Project description
ezduckdb
Tools to make working with duckdb easier for codenym.
Not intended for general use, but feel free to steal code or ideas.
If you'd be super excited about this being made for general use, reach out.
Acknowledgements
There's a ton code and ideas in here from a dagster blog post
Installation
pip install ezduckdb
Usage
There are 3 classes in this library:
S3AwarePath:pathlib.Path+ s3 pathsSQL: Work with sql files programatically via templating.DuckDB: Connection and Query manager
S3AwarePath
S3AwarePath adds functionality to the pathlib.Path class.
is_s3: Is path an s3 path (ies3://....)get_s3_bucketandget_s3_prefix: Break path for use with boto3- Retain
s3://when cast to string (ie in f strings) get_table_name: Get db table name from file name based on codenym convention<schema>_<table>.<extension>
from ezduckdb import S3AwarePath
s3_path = S3AwarePath("s3://bucket/curated/s1chema_table1.csv")
assert inp.get_s3_bucket() == "bucket"
assert inp.get_s3_prefix() == "curated/s1chema_table1.csv"
assert str(inp) == "s3://bucket/curated/s1chema_table1.csv"
assert inp.is_s3()
assert inp.get_table_name() == ("s1chema", "table1")
SQL
SQL enable type based templating for programatical sql query generation for duckdb.
Non-exhaustive list of replacements:
pd.DataFrameis converted todf_<id>in the query to enable pandas queryingStrare replaced with the string value enclosed in single quotesIntare replaced with the value without quotesSQLreplaces recusively for nested querying
Basic
from ezduckdb import SQL
example = SQL("SELECT * FROM $table WHERE id = $id", table="foo", id=1)
assert inp.to_string() == "SELECT * FROM 'foo' WHERE id = 1"
Pandas
from ezduckdb import SQL
import pandas as pd
df = pd.DataFrame({"id": [1, 2, 3]})
inp = SQL("SELECT * FROM $table", table=df)
assert inp.to_string() == "SELECT * FROM df_" + str(id(df))
Nested
from ezduckdb import SQL
example = SQL("SELECT * FROM $table", table=SQL("SELECT * FROM $table", table="foo"))
assert inp.to_string() == "SELECT * FROM (SELECT * FROM 'foo')"
DuckDB
DuckDB is a connection manager for duckdb that has some convenience methods for querying.
- If
s3_storage_used=Truethenquerymethod will:- Load
httpfsandawsduckdb extensions - call
load_aws_credentialspassing theaws_profile.
- Load
querymethod will:- Do all sql templating for
SQLobject. - Return a
pd.DataFrameof the results if applicable
- Do all sql templating for
- Provide a context manager for pure sql querying with strings
Templated Querying (Querying with SQL objects)
Basic Querying
from ezduckdb import DuckDB
import pandas as pd
db = DuckDB(s3_storage_used=False)
assert db.query(SQL("select 1")).values == pd.DataFrame([(1,)]).values
Pandas Querying
from ezduckdb import DuckDB
import pandas as pd
db = DuckDB(s3_storage_used=False)
df = pd.DataFrame({"id": [1, 2, 3]})
actual = db.query(SQL("SELECT * FROM $table", table=df))
expected = pd.DataFrame([(1,), (2,), (3,)])
assert (actual.values == expected.values).all()
S3 querying
from ezduckdb import DuckDB
import pandas as pd
db = DuckDB(s3_storage_used=True)
s3_path = "s3://codenym-automated-testing/ezduckdb/parquet/schema1_table1.parquet"
actual = db.query(SQL("SELECT * FROM read_parquet($s3_path)", s3_path=s3_path))
expected = pd.DataFrame([[1, 4], [2, 5], [3, 6]])
assert (actual.values == expected.values).all()
Context Manager (Querying with Strings)
from ezduckdb import DuckDB
import pandas as pd
with DuckDB(s3_storage_used=False) as conn:
assert conn.query("select 1").df().values == pd.DataFrame([(1,)]).values
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ezduckdb-0.0.8.tar.gz.
File metadata
- Download URL: ezduckdb-0.0.8.tar.gz
- Upload date:
- Size: 10.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
37500810eb9f98b89118ea1ad7d858a514e6f9d21570cc2fa21bf214af47583b
|
|
| MD5 |
389b403bce809f9cc762b2c9635cda08
|
|
| BLAKE2b-256 |
3b86f78d8417c6d9c6794364f9401e4de2e315febd3536c3d426dd1cedbbeab7
|
File details
Details for the file ezduckdb-0.0.8-py3-none-any.whl.
File metadata
- Download URL: ezduckdb-0.0.8-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8ce3debef3b3f484cca994b52d216a8ca053efbcc792fb45d5d700b688c35f44
|
|
| MD5 |
d6d72c982d3f3c2ffdc0de3c44efffd8
|
|
| BLAKE2b-256 |
be46e22b44e7d65d395292733a6da3e6e1444138e1ea3b01fd28342734dc805b
|