Disdat: data versioning
Project description
Note: Disdat 1.0 no longer contains the instrumented form of Luigi, which now resides in the separate Disdat-Luigi package. To build versioned pipelines, run pip install disdat-luigi. To use only the Disdat API, run pip install disdat.
Disdat is a Python (3.9+) package for data versioning that allows data scientists to create, share, and track data products. Disdat organizes data into bundles: collections of literal values and files. Bundles are the unit at which data is versioned and shared. Disdat provides an API for creating, finding, and publishing bundles to cloud storage (e.g., AWS S3).
Disdat-Luigi uses this API to instrument Spotify’s Luigi, so you can build pipelines that automatically create bundles, making it easy to share the latest outputs with other users and pipelines. Instead of trading lengthy emails with multiple file attachments or searching through Slack for the most recent S3 file path, users can run dsdt pull awesome_data to get the latest ‘awesome_data’.
Disdat’s bundle API and pipelines provide:
- Simplified pipelines – Users implement two functions per task: requires and run.
- Enhanced re-execution logic – Disdat re-runs processing steps when code or data changes.
- Data versioning/lineage – Disdat records code and data versions for each output data set.
- Share data sets – Users may push and pull data to remote contexts hosted in AWS S3.
- Auto-docking – Disdat dockerizes pipelines so that they can run locally or execute on the cloud.
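The re-execution and lineage ideas in the list above can be illustrated with a small, library-free sketch. It is not Disdat's implementation and does not use the Disdat API: `ToyPipeline`, `fingerprint`, and the `code_version` string are hypothetical names introduced only for illustration. The sketch assumes that task inputs are JSON-serializable and that a version string stands in for "the code".

```python
import hashlib
import json


def fingerprint(code_version, inputs):
    """Hash the code version together with the inputs (hypothetical helper)."""
    payload = json.dumps({"code": code_version, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyPipeline:
    """Toy model: re-run a task only when its code version or inputs change."""

    def __init__(self):
        self.records = {}   # task name -> lineage record of the last run
        self.run_count = 0  # how many times a task body actually executed

    def run_task(self, name, func, code_version, inputs):
        fp = fingerprint(code_version, inputs)
        record = self.records.get(name)
        if record is not None and record["fingerprint"] == fp:
            # Same code and same data: reuse the stored output.
            return record["output"]
        output = func(**inputs)
        self.run_count += 1
        # Keep the code version and inputs next to the output as lineage.
        self.records[name] = {
            "fingerprint": fp,
            "code_version": code_version,
            "inputs": inputs,
            "output": output,
        }
        return output
```

Calling `run_task` twice with the same `code_version` and `inputs` executes the function once; changing either one triggers a new run and overwrites the stored record.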
Our latest documentation is available on GitBook.
Download files
File details
Details for the file disdat-1.1.5.tar.gz.
File metadata
- Download URL: disdat-1.1.5.tar.gz
- Upload date:
- Size: 712.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d5eb731037fb385b9284e919b4e2f3afb7d1b4b547231db2f08f8fb76e095fe |
| MD5 | d97cacf7ad259db040fc587a041371c2 |
| BLAKE2b-256 | 51906fc869f430705154c2dc016afd9f91573cc8a42f222fa2806fbc931505e8 |
File details
Details for the file disdat-1.1.5-py3-none-any.whl.
File metadata
- Download URL: disdat-1.1.5-py3-none-any.whl
- Upload date:
- Size: 90.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91e9b1a1e79d5b7180c8c23826ebbccb9189ffca88c4f22548e324ba16dafdee |
| MD5 | 4a5afafff86b4d69b01e883324b962f7 |
| BLAKE2b-256 | a666e46e114ec88afd830125f76bbcf34596fe92ce09d5f73c5ce975845efa86 |
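A downloaded file can be checked against the digests listed above using only Python's standard `hashlib` module. The following is a minimal sketch: the helper names `file_digests` and `matches_published` are assumptions made for illustration, and the local file path is whatever location the file was saved to.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Compute the three digest types shown in the tables for one file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, published):
    """Return True if every published digest matches the file on disk."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in published.items())


# Digests copied from the table for disdat-1.1.5.tar.gz.
PUBLISHED_SDIST = {
    "SHA256": "8d5eb731037fb385b9284e919b4e2f3afb7d1b4b547231db2f08f8fb76e095fe",
    "MD5": "d97cacf7ad259db040fc587a041371c2",
    "BLAKE2b-256": "51906fc869f430705154c2dc016afd9f91573cc8a42f222fa2806fbc931505e8",
}
```

For example, `matches_published("disdat-1.1.5.tar.gz", PUBLISHED_SDIST)` would return `True` only if the local copy has exactly the published digests. On Python builds that disable MD5 (some FIPS configurations), the `"MD5"` entry would need to be removed from both dictionaries.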