pipeline library

Project description

Drain is a lightweight framework for writing reproducible data science workflows in Python. The core features are:

  • Turn a Python workflow (DAG) into steps that can be run by a tool like make.
  • Transparently pass the results of one step as the input to another, handling any caching that the user requests using efficient tools like HDF and joblib.
  • Enable easy parallel execution of workflows.
  • Execute only those steps that are determined to be necessary based on timestamps (both source code and data) and dependencies, virtually guaranteeing reproducibility of results and efficient development.
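
The timestamp-based execution described above can be illustrated with a minimal sketch. This is not drain's actual API: the function names `is_stale` and `run_workflow`, and the representation of a workflow as a dictionary of file paths, are assumptions made for illustration. It uses only the Python standard library (`graphlib` requires Python 3.9 or later) and rebuilds a target only when it is missing or older than one of its dependencies, much as make does.

```python
import os
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def is_stale(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def run_workflow(graph, actions):
    """Run only the stale steps of a workflow, in dependency order.

    graph   -- hypothetical format: maps each target path to the list of
               paths it depends on (this forms the DAG)
    actions -- maps each target path to a zero-argument callable that
               writes that target to disk

    Returns the list of targets that were rebuilt.
    """
    rebuilt = []
    # static_order() yields every node after all of its dependencies.
    for target in TopologicalSorter(graph).static_order():
        # Paths without an action are plain inputs (e.g. raw data files).
        if target in actions and is_stale(target, graph.get(target, [])):
            actions[target]()
            rebuilt.append(target)
    return rebuilt
```

Calling `run_workflow` twice in a row on unchanged inputs should rebuild nothing the second time; touching an input file makes every step downstream of it stale again. Unlike the behavior described above, this sketch tracks only data timestamps, not changes to source code.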

Drain is designed around these principles:

  • Simplicity: Drain is lightweight and easy to use. The core is just a few hundred lines of code. The steps you write in drain are executed with minimal overhead, which makes drain workflows easy to debug and manage.
  • Reusability: Drain leverages mature tools such as drake to execute workflows. It also provides a library of steps for data science workflows, including feature generation and selection, and model fitting and comparison.
  • Generality: Virtually any workflow can be realized in drain. The core was written with extensibility in mind so new storage backends and job schedulers, for example, will be easy to incorporate.

Download files

Filename (size) | File type | Python version
drain-0.0.6-py2.py3-none-any.whl (49.0 kB) | Wheel | py2.py3
drain-0.0.6.tar.gz (117.5 kB) | Source | None
