Dask + Deltalake
Dask Deltalake
Reads from and writes to Delta Lake from Dask, leveraging delta-rs
Dask Deltalake Reader
Reads data from Deltalake with Dask
To try out the package:
pip install dask_deltalake
Features:
- Reads the Parquet files listed in the Delta log in parallel using the Dask engine
- Supports multiple filesystems such as s3, azurefs, gcsfs
- Supports some Delta features, such as:
  - Time travel
  - Schema evolution
  - Parquet filters
    - row filter
    - partition filter
- Query Delta commit info (history)
- Vacuum old/unused Parquet files
- Load different versions of data using a datetime
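To make the "reads Parquet files based on delta logs" and time travel features concrete, here is a minimal standard-library sketch of how a Delta transaction log can be replayed to find the data files that are live at a given version. It is an illustration only, not this package's implementation: `active_files` and `_write_commit` are hypothetical names, and the sketch ignores checkpoints, schema metadata, and partition values, which a real reader has to handle.

```python
import json
import tempfile
from pathlib import Path


def active_files(table_path, version=None):
    """Replay _delta_log commits up to `version` and return the live data files."""
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit files are named by zero-padded version number, so sorting by
    # name replays them in commit order. Non-numeric names are skipped.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        if version is not None and int(commit.stem) > version:
            break  # time travel: stop once we pass the requested version
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one JSON action per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def _write_commit(log_dir, version, actions):
    """Demo helper (hypothetical): write one newline-delimited JSON commit."""
    name = f"{version:020d}.json"
    (log_dir / name).write_text("\n".join(json.dumps(a) for a in actions))


# Build a toy table log with three commits.
demo_table = Path(tempfile.mkdtemp())
demo_log = demo_table / "_delta_log"
demo_log.mkdir()
_write_commit(demo_log, 0, [{"add": {"path": "part-0.parquet"}}])
_write_commit(demo_log, 1, [{"add": {"path": "part-1.parquet"}}])
_write_commit(
    demo_log,
    2,
    [{"remove": {"path": "part-0.parquet"}}, {"add": {"path": "part-2.parquet"}}],
)

print(active_files(demo_table))             # ['part-1.parquet', 'part-2.parquet']
print(active_files(demo_table, version=1))  # ['part-0.parquet', 'part-1.parquet']
```

Once the list of live files is known, each file can be read as an independent task, which is what makes the parallel read with Dask possible.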
Usage:
import dask_deltalake as ddl
# read delta table
ddl.read_delta("delta_path")
# read delta table for specific version
ddl.read_delta("delta_path",version=3)
# read delta table for specific datetime
ddl.read_delta("delta_path",datetime="2018-12-19T16:39:57-08:00")
# read delta complete history
ddl.read_delta_history("delta_path")
# read delta history up to a given limit
ddl.read_delta_history("delta_path",limit=5)
# vacuum the old/unused parquet files (dry_run=False deletes them)
ddl.vacuum("delta_path",dry_run=False)
# can read from S3, Azure, GCS, etc.
ddl.read_delta("s3://bucket_name/delta_path",version=3)
# please ensure the credentials are properly configured as environment variables
# or in ~/.aws/credentials
# can connect to the AWS Glue catalog and read the complete delta table
# (currently only the AWS catalog is available)
# uses explicit AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from environment
# variables if available; otherwise falls back to ~/.aws/credentials
ddl.read_delta(catalog="glue", database_name="science", table_name="physics")
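The credential lookup order described in the comments above (environment variables first, then the shared credentials file) can be sketched with the standard library as follows. This is a hedged illustration of the fallback logic only: `resolve_aws_credentials` is a hypothetical helper, not part of dask_deltalake, and real AWS tooling also supports profiles, session tokens, and other providers that are left out here.

```python
import configparser
import os
from pathlib import Path


def resolve_aws_credentials(env=None, credentials_file=None, profile="default"):
    """Return (access_key, secret_key) from env vars, else the credentials file."""
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if access_key and secret_key:
        # Explicit environment variables take priority.
        return access_key, secret_key

    # Fall back to the shared credentials file (INI format).
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file simply yields no sections
    if parser.has_section(profile):
        section = parser[profile]
        access_key = section.get("aws_access_key_id")
        secret_key = section.get("aws_secret_access_key")
        if access_key and secret_key:
            return access_key, secret_key
    raise RuntimeError("No AWS credentials found in environment or credentials file")


# Usage with an explicit environment mapping (placeholder values, not real keys).
fake_env = {"AWS_ACCESS_KEY_ID": "ENVKEY", "AWS_SECRET_ACCESS_KEY": "ENVSECRET"}
print(resolve_aws_credentials(env=fake_env))  # ('ENVKEY', 'ENVSECRET')
```

Passing `env` and `credentials_file` explicitly keeps the lookup easy to test without touching the real environment or home directory.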