
Data science collaboration tool based on IPython notebooks.

Project description

DAGpy is a data science collaboration tool based on IPython notebooks. It enables data science teams to:

  • easily collaborate by branching out of others’ notebooks
  • minimize code duplication
  • give a clean overview of the project
  • cache intermediate outputs so team members can use them without re-evaluation
  • automate the process of code execution upon data changes or on schedule
  • provide a clean interface for data visualization dashboard designers and developers

DAGpy manages a DAG (directed acyclic graph) of blocks of code. Each block is a sequence of IPython notebook cells together with their outputs. DAGpy is designed to work seamlessly with popular version control systems such as git, and it can be run locally or as a server application.
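To make the DAG idea concrete, here is a minimal sketch of how blocks and their parents could be ordered for execution, and how the set of blocks to re-run after a change could be found. It is not DAGpy's own code: the block names, the `parents` mapping, and both helper functions are hypothetical, and it relies on the standard library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical example DAG: each block maps to the set of its parent blocks.
parents = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}


def execution_order(parents):
    """Return block names so that every block comes after all of its parents."""
    return list(TopologicalSorter(parents).static_order())


def blocks_to_rerun(parents, changed):
    """Return the changed block plus every block downstream of it, in run order."""
    stale = {changed}
    order = execution_order(parents)
    for block in order:
        # A block is stale if any of its parents is stale.
        if parents[block] & stale:
            stale.add(block)
    return [block for block in order if block in stale]


order = execution_order(parents)
rerun = blocks_to_rerun(parents, "B")
print("execution order:", order)
print("re-run after changing B:", rerun)
```

In this sketch, blocks outside the returned list (here A and C) could keep their cached outputs, which is the kind of saving the caching feature above aims for.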

GitHub: github.com/ibestvina/dagpy/.

Author: Ivan Bestvina

Example project

To play around with the example project, you can:

  • view the project DAG: dagpy view
  • run all the blocks: dagpy execute -a
  • add blocks through flows (with block B as a parent) and run them automatically: dagpy makeflow B -r
  • commit the changes: dagpy submitflow dagpy_flow.ipynb
  • explore other DAGpy options with dagpy -h

Please note that executing a notebook carries a significant overhead of over a second, because a kernel must be started for each one. In the future, we plan to add support for non-notebook, plain Python blocks. These would also be edited through a flow notebook view, but would be saved as .py scripts and executed without noticeable overhead.

Dependencies:

  • python 3
  • jupyter
  • dill
  • networkx, matplotlib (for DAG view)
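The dependencies above can be checked before running DAGpy with a short script. This is only a sketch: the `REQUIRED` and `OPTIONAL` lists and the `missing_modules` helper are hypothetical names, and it assumes each package's import name matches the name listed above.

```python
import importlib.util
import sys

# Assumption: import names are the same as the package names in the list above.
REQUIRED = ["jupyter", "dill"]
OPTIONAL = ["networkx", "matplotlib"]  # only needed for the DAG view


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info[0] < 3:
    print("Python 3 is required")
print("missing required packages:", missing_modules(REQUIRED))
print("missing optional packages:", missing_modules(OPTIONAL))
```

An empty list means every package in that group was found.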


Files for dagpy, version 0.3.2: dagpy-0.3.2.zip (17.5 kB), source distribution.
