Disdat: data versioning

Project description

Note: Disdat 1.0 no longer contains the instrumented form of Luigi, which now resides in the separate Disdat-Luigi package. To build versioned pipelines, run pip install disdat-luigi. To use only the Disdat API, run pip install disdat.

Disdat is a Python (3.9+) package for data versioning that allows data scientists to create, share, and track data products. Disdat organizes data into bundles: collections of literal values and files. Bundles are the unit at which data is versioned and shared. Disdat provides an API for creating, finding, and publishing bundles to cloud storage (e.g., AWS S3).
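
To make the bundle idea concrete, here is a minimal sketch in plain Python. It does not use the Disdat API: ToyBundle, ToyContext, and their methods are hypothetical names invented for illustration. It only models the concept described above, where every bundle is an immutable, named collection of literal values and file paths, and each new version is kept alongside the old ones.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple, Union

Value = Union[str, int, float, bool]


@dataclass(frozen=True)
class ToyBundle:
    """Hypothetical bundle: a named, immutable set of literals and file paths."""
    name: str
    literals: Tuple[Tuple[str, Value], ...]
    files: Tuple[str, ...]
    # Each version gets its own identifier, so versions never overwrite each other.
    version_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ToyContext:
    """Hypothetical in-memory store that keeps every version of every bundle."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ToyBundle]] = {}

    def create(self, name: str, literals: Dict[str, Value],
               files: Iterable[str] = ()) -> ToyBundle:
        """Record a new version under `name` and return it."""
        bundle = ToyBundle(name, tuple(sorted(literals.items())), tuple(files))
        self._versions.setdefault(name, []).append(bundle)
        return bundle

    def latest(self, name: str) -> Optional[ToyBundle]:
        """Return the most recently created version, or None if there is none."""
        versions = self._versions.get(name, [])
        return versions[-1] if versions else None

    def history(self, name: str) -> List[ToyBundle]:
        """Return all versions of `name`, oldest first."""
        return list(self._versions.get(name, []))
```

Creating "awesome_data" twice in a ToyContext leaves two entries in history("awesome_data"), and latest("awesome_data") returns the second one.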

Disdat-Luigi uses this API to instrument Spotify’s Luigi, so you can build pipelines that automatically create bundles, making it easy to share the latest outputs with other users and pipelines. Instead of holding lengthy email conversations with multiple file attachments or searching through Slack for the most recent S3 file path, users can run dsdt pull awesome_data to get the latest ‘awesome_data.’

Disdat’s bundle API and pipelines provide:

  • Simplified pipelines – Users implement two functions per task: requires and run.

  • Enhanced re-execution logic – Disdat re-runs processing steps when code or data changes.

  • Data versioning/lineage – Disdat records code and data versions for each output data set.

  • Share data sets – Users may push and pull data to remote contexts hosted in AWS S3.

  • Auto-docking – Disdat dockerizes pipelines so that they can run locally or execute on the cloud.
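
The re-execution item in the list above can be illustrated with a small sketch. This is not Disdat's implementation: ToyRunner and fingerprint are hypothetical names, and hashing a code string together with the input bytes is an assumed stand-in for however Disdat actually tracks code and data versions. The sketch only shows the general idea of skipping a step when neither its code nor its data has changed.

```python
import hashlib
from typing import Callable, Dict, Tuple


def fingerprint(code: str, inputs: bytes) -> str:
    """Hash a task's code text together with its input bytes."""
    digest = hashlib.sha256()
    digest.update(code.encode("utf-8"))
    digest.update(b"\x00")  # separator between the code and the data
    digest.update(inputs)
    return digest.hexdigest()


class ToyRunner:
    """Hypothetical runner that re-executes a task only when code or data changes."""

    def __init__(self) -> None:
        # task name -> (fingerprint of last run, result of last run)
        self._cache: Dict[str, Tuple[str, object]] = {}
        self.executions = 0

    def run(self, task_name: str, code: str, inputs: bytes,
            func: Callable[[bytes], object]) -> object:
        key = fingerprint(code, inputs)
        cached = self._cache.get(task_name)
        if cached is not None and cached[0] == key:
            return cached[1]  # same code and data: reuse the stored output
        self.executions += 1
        result = func(inputs)
        self._cache[task_name] = (key, result)
        return result
```

Calling run twice with the same code string and inputs executes func once; changing either the code string or the inputs triggers a new execution.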

Find our latest documentation on GitBook.

Authors

Disdat could not have come to be without the support of Human Longevity, Inc. It has benefited from numerous discussions, code contributions, and emotional support from Sean Rowan, Ted Wong, Jonathon Lunt, Jason Knight, Axel Bernel, and Intuit, Inc.

Download files

Download the file for your platform.

Source Distribution

disdat-1.1.5.tar.gz (712.8 kB)

Uploaded Source

Built Distribution

disdat-1.1.5-py3-none-any.whl (90.4 kB)

Uploaded Python 3

File details

Details for the file disdat-1.1.5.tar.gz.

File metadata

  • Download URL: disdat-1.1.5.tar.gz
  • Upload date:
  • Size: 712.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.14

File hashes

Hashes for disdat-1.1.5.tar.gz
  • SHA256: 8d5eb731037fb385b9284e919b4e2f3afb7d1b4b547231db2f08f8fb76e095fe
  • MD5: d97cacf7ad259db040fc587a041371c2
  • BLAKE2b-256: 51906fc869f430705154c2dc016afd9f91573cc8a42f222fa2806fbc931505e8
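
A downloaded file can be checked against the SHA256 digest listed above with Python's standard library. The following is a minimal sketch: the helper names sha256_of and matches are illustrative, and the local file path you pass in is an assumption about where you saved the download.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Union

# SHA256 digest published above for disdat-1.1.5.tar.gz.
EXPECTED_SHA256 = "8d5eb731037fb385b9284e919b4e2f3afb7d1b4b547231db2f08f8fb76e095fe"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare the file's digest with the expected hex digest, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

For example, matches("disdat-1.1.5.tar.gz", EXPECTED_SHA256) should return True if the archive in the current directory is the one described here.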

File details

Details for the file disdat-1.1.5-py3-none-any.whl.

File metadata

  • Download URL: disdat-1.1.5-py3-none-any.whl
  • Upload date:
  • Size: 90.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.14

File hashes

Hashes for disdat-1.1.5-py3-none-any.whl
  • SHA256: 91e9b1a1e79d5b7180c8c23826ebbccb9189ffca88c4f22548e324ba16dafdee
  • MD5: 4a5afafff86b4d69b01e883324b962f7
  • BLAKE2b-256: a666e46e114ec88afd830125f76bbcf34596fe92ce09d5f73c5ce975845efa86
