Data science collaboration tool based on IPython notebooks.
DAGpy is a data science collaboration tool based on IPython notebooks that enables data science teams to:
- easily collaborate by branching off from each other's notebooks
- minimize code duplication
- give a clean overview of the project
- cache intermediate outputs so team members can use them without re-evaluation
- automate code execution when data changes or on a schedule
- provide a clean interface for data visualization dashboard designers and developers
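Caching intermediate outputs could work roughly as follows. This is a minimal sketch, not DAGpy's actual implementation: the one-pickle-file-per-block layout and the function names cache_path, save_output and load_output are hypothetical. It uses the standard library's pickle; dill, one of DAGpy's dependencies, offers the same dump/load interface and can serialize a wider range of objects, so it could be swapped in.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache layout: one pickle file per block, named after the block.
# dill exposes the same dump/load functions as pickle and could replace it here.


def cache_path(cache_dir: Path, block_id: str) -> Path:
    """Location of the cached output for a given block."""
    return cache_dir / f"{block_id}.pkl"


def save_output(cache_dir: Path, block_id: str, namespace: dict) -> None:
    """Persist a block's output variables so other blocks can reuse them."""
    with open(cache_path(cache_dir, block_id), "wb") as f:
        pickle.dump(namespace, f)


def load_output(cache_dir: Path, block_id: str) -> dict:
    """Load a cached block output instead of re-running the block."""
    with open(cache_path(cache_dir, block_id), "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)
        # Block "A" computes its (possibly expensive) result once...
        save_output(cache_dir, "A", {"cleaned": [1, 2, 3]})
        # ...and a teammate's block reads it without re-evaluating A.
        inputs = load_output(cache_dir, "A")
        print(sum(inputs["cleaned"]))
```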
DAGpy manages a DAG (directed acyclic graph) of code blocks, where each block is a sequence of IPython notebook cells together with their outputs. It is designed to work seamlessly with popular version control systems such as git, and can be run locally or as a server application.
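To illustrate the DAG idea, here is a hypothetical sketch (not DAGpy's API) that derives an execution order for blocks and determines which blocks must be re-run after one of them changes. The block names and the PARENTS mapping are made up, and the sketch uses Python's standard graphlib module (3.9+) rather than networkx, which DAGpy lists as a dependency for its DAG view.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each block maps to the set of blocks it depends on.
PARENTS = {
    "load": set(),
    "clean": {"load"},
    "features": {"clean"},
    "report": {"clean", "features"},
}


def execution_order(parents):
    """Return an order in which every block runs after all of its parents."""
    return list(TopologicalSorter(parents).static_order())


def stale_blocks(parents, changed):
    """Blocks to re-run when `changed` changes: the block itself plus all descendants."""
    # Invert the parent mapping into a child mapping.
    children = {block: set() for block in parents}
    for block, block_parents in parents.items():
        for parent in block_parents:
            children.setdefault(parent, set()).add(block)

    # Depth-first walk from the changed block collects every descendant.
    stale, stack = set(), [changed]
    while stack:
        block = stack.pop()
        if block not in stale:
            stale.add(block)
            stack.extend(children.get(block, ()))

    # Keep topological order so parents are refreshed before their children.
    return [b for b in execution_order(parents) if b in stale]


if __name__ == "__main__":
    print(execution_order(PARENTS))
    print(stale_blocks(PARENTS, "features"))
```

Running blocks in topological order guarantees that a block's parent outputs are up to date before it executes. TopologicalSorter raises graphlib.CycleError if the dependencies contain a cycle, i.e. if the graph is not a DAG. With networkx, nx.topological_sort and nx.descendants provide equivalent building blocks.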
Author: Ivan Bestvina
Example project
To play around with the example project, you can:
- view the project DAG: python program.py view
- run all the blocks: python program.py execute -a
- add blocks through flows (with block B as a parent) and run them automatically: python program.py makeflow B -r
- commit the changes: python program.py submitflow dagpy_flow.ipynb
- explore other DAGpy options with python program.py -h
Please note that notebook execution carries a significant overhead of over a second per notebook, because a kernel must be started for each one. In the future, we plan to add support for non-notebook, plain Python blocks. These would also be edited through a flow notebook view, but would be saved as .py scripts and executed without noticeable overhead.
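Plain .py blocks are not part of DAGpy yet; the following is only a hypothetical sketch of why they could avoid the kernel start-up cost. It runs a script in the current interpreter with the standard library's runpy.run_path and injects inputs (for example, cached parent outputs) as globals. The function name run_py_block and the convention of passing inputs as globals are assumptions.

```python
import runpy
import tempfile
import time
from pathlib import Path


def run_py_block(script: Path, inputs: dict) -> dict:
    """Execute a plain .py block in the current interpreter (no kernel start-up).

    `inputs` are injected as global variables; the returned dict contains the
    block's resulting global namespace.
    """
    return runpy.run_path(str(script), init_globals=inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        block = Path(tmp) / "block_c.py"
        block.write_text("total = sum(values)\n")

        start = time.perf_counter()
        result = run_py_block(block, {"values": [1, 2, 3]})
        elapsed = time.perf_counter() - start

        # Timing varies by machine; no kernel process is started here.
        print(result["total"], f"{elapsed * 1000:.2f} ms")
```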
Dependencies:
- Python 3
- jupyter
- dill
- networkx, matplotlib (for DAG view)