An extensible ML workflow framework built for data scientists and ML engineers.
Graphbook
The ML workflow framework
Overview
Graphbook is a framework for building efficient, visual DAG-structured ML workflows composed of nodes written in Python. Graphbook provides common ML processing features such as multiprocessing I/O and automatic batching, and it features a web-based UI to assemble, monitor, and execute data processing workflows. It can be used to prepare training data for custom ML models, to experiment with custom-trained or off-the-shelf models, and to build ML-based ETL applications. Custom nodes are written in Python, and Graphbook, acting as the framework, calls lifecycle methods on those nodes during execution.
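To make the "framework calls lifecycle methods on nodes" idea concrete, here is a minimal, self-contained sketch in plain Python. It does not use Graphbook's actual API: the `Node` base class, the `on_start`/`process`/`on_end` hooks, and the `run` function are hypothetical names chosen only for illustration. See the docs for the real node interface.

```python
from graphlib import TopologicalSorter


class Node:
    """Hypothetical base class for illustration; not Graphbook's API."""

    def on_start(self):
        """Hook called once before any node processes data."""

    def process(self, inputs):
        """Receive a list of upstream outputs and return this node's output."""
        raise NotImplementedError

    def on_end(self):
        """Hook called once after every node has finished."""


class Source(Node):
    def __init__(self, values):
        self.values = values

    def process(self, inputs):
        return list(self.values)


class Scale(Node):
    def __init__(self, factor):
        self.factor = factor
        self.started = False

    def on_start(self):
        self.started = True

    def process(self, inputs):
        (upstream,) = inputs  # this node expects exactly one upstream node
        return [v * self.factor for v in upstream]


def run(nodes, deps):
    """Run a DAG. nodes: name -> Node; deps: name -> ordered upstream names."""
    graph = {name: set(deps.get(name, [])) for name in nodes}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        nodes[name].on_start()
    outputs = {}
    for name in order:
        inputs = [outputs[up] for up in deps.get(name, [])]
        outputs[name] = nodes[name].process(inputs)
    for name in order:
        nodes[name].on_end()
    return outputs


nodes = {"load": Source([1, 2, 3]), "double": Scale(2)}
deps = {"double": ["load"]}
result = run(nodes, deps)
print(result["double"])  # [2, 4, 6]
```

The point of the sketch is the inversion of control: node authors only fill in the hooks, and the runner decides when and in what order they are called.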
Status
Graphbook is in a very early stage of development, so expect minor bugs and rapid design changes through the coming releases. If you would like to report a bug or request a feature, please feel free to do so. We aim to make Graphbook serve our users in the best way possible.
Current Features
- Graph-based visual editor to experiment with and create complex ML workflows
- Caches outputs and only re-executes the parts of the workflow that change between executions
- UI monitoring components for logs and outputs per node
- Custom buildable nodes with Python
- Automatic batching for PyTorch tensors
- Multiprocessing I/O to and from disk and network
- Customizable multiprocessing functions
- Ability to execute entire graphs, or individual subgraphs/nodes
- Ability to execute singular batches of data
- Ability to pause graph execution
- Basic nodes for filtering, loading, and saving outputs
- Node grouping and subflows
- Autosaving and shareable serialized workflow files
- Registers node code changes without needing a restart
- Monitorable CPU and GPU resource usage
- (BETA) Remote subgraphs for scaling workflows on other Graphbook services
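As an illustration of the output caching listed above, the following sketch shows one common way to re-execute only the parts of a DAG affected by a change. It is an assumption about the general technique, not a description of Graphbook's internals: `cache_key` and `execute` are hypothetical helpers, and each node's key is derived from its own parameters plus the keys of its upstream nodes.

```python
import hashlib
import json


def cache_key(name, params, upstream_keys):
    """Hash a node's name, parameters, and upstream keys into one key."""
    payload = json.dumps([name, params, upstream_keys], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def execute(order, params, deps, funcs, cache):
    """Run nodes in topological `order`; return (outputs, executed names)."""
    keys, outputs, executed = {}, {}, []
    for name in order:
        ups = deps.get(name, [])
        keys[name] = cache_key(name, params[name], [keys[u] for u in ups])
        if keys[name] not in cache:
            # Cache miss: this node or something upstream of it changed.
            args = [outputs[u] for u in ups]
            cache[keys[name]] = funcs[name](params[name], *args)
            executed.append(name)
        outputs[name] = cache[keys[name]]
    return outputs, executed


funcs = {
    "load": lambda p: list(range(p["n"])),
    "scale": lambda p, xs: [x * p["factor"] for x in xs],
    "total": lambda p, xs: sum(xs),
}
order = ["load", "scale", "total"]
deps = {"scale": ["load"], "total": ["scale"]}
params = {"load": {"n": 4}, "scale": {"factor": 2}, "total": {}}

cache = {}
out1, ran1 = execute(order, params, deps, funcs, cache)
params["scale"]["factor"] = 3  # change one node's parameters
out2, ran2 = execute(order, params, deps, funcs, cache)

print(ran1)           # ['load', 'scale', 'total']
print(ran2)           # ['scale', 'total']
print(out2["total"])  # 18
```

Because a node's key includes its upstream keys, changing `scale` invalidates `total` as well, while `load` is served from the cache.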
Planned Features
- A `graphbook run` command to execute workflows in a CLI
- Step/Resource functions with decorators to reduce verbosity
- Human-in-the-loop Steps for manual feedback/control during DAG execution
- All-code workflows, so users never have to leave their IDE
- UI extensibility
- And many optimizations for large data processing workloads
Getting Started
Install from PyPI
- Run `pip install graphbook`
- Run `graphbook`
- Visit http://localhost:8005
Install with Docker
- Pull and run the image:
docker run --rm -p 8005:8005 -v $PWD/workflows:/app/workflows rsamf/graphbook:latest
- Visit http://localhost:8005
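If the page does not load right away, the server may still be starting. The snippet below is a small standard-library check that something is accepting connections on the port used above. The function name `ui_is_reachable` is a hypothetical helper, not part of Graphbook, and the check only confirms that the port is open, not that the UI is fully ready.

```python
import socket


def ui_is_reachable(host="localhost", port=8005, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


if __name__ == "__main__":
    print("reachable" if ui_is_reachable() else "not reachable yet")
```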
Visit the docs to learn more about creating custom nodes and workflows with Graphbook.
Examples
We continually post examples of workflows and custom nodes in our examples repo.
Collaboration
Graphbook is in active development and very much welcomes contributors. This is a guide on how to run Graphbook in development mode. If you are simply using Graphbook, view the Getting Started section.
Run Graphbook in Development Mode
You can use any other virtual environment solution, but Poetry is highly advised because our dependencies are specified in Poetry's format.
- Clone the repo and `cd graphbook`
- `poetry install --with dev`
- `poetry shell`
- `python graphbook/main.py`
- `cd web`
- `npm install`
- `npm run dev`
Built Distribution
Hashes for graphbook-0.5.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c52254f91ede212da2d31ab9c062ea7c516b07f73d84a298f9862f45f34de635
MD5 | 647c32ca63b859e2c7cf70a18852c6a0
BLAKE2b-256 | eba2eb37910b4fcbeaeb662a9db2e9a2cfa764c6bf4a48df55776907136fdc57