
Easy management of source data, intermediate data, and results for data science projects

Project description

Data Workspaces is an open source framework for maintaining the state of a data science project, including data sets, intermediate data, results, and code. It supports reproducibility through snapshotting and lineage models, and collaboration through a push/pull model inspired by source control systems like Git.
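The general idea behind snapshotting can be illustrated with a small, self-contained sketch. This is not Data Workspaces' own implementation or API; the function names snapshot, snapshot_id, and diff are hypothetical. The sketch assumes a snapshot is simply a manifest that maps each file under a directory to the SHA-256 hash of its contents, similar in spirit to how Git identifies content by hash. Comparing two manifests is then enough to see exactly which files were added, removed, or changed.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict:
    """Hypothetical manifest: relative POSIX path -> content hash, for every file under root."""
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()  # rglob also yields directories; skip them
    }


def snapshot_id(manifest: dict) -> str:
    """Derive a single ID for the whole snapshot from its sorted entries."""
    digest = hashlib.sha256()
    for rel_path, file_hash in sorted(manifest.items()):
        digest.update(f"{rel_path}\0{file_hash}\n".encode())
    return digest.hexdigest()


def diff(old: dict, new: dict) -> dict:
    """List files added, removed, or changed between two manifests."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data.csv").write_text("x,y\n1,2\n")
        before = snapshot(root)
        (root / "data.csv").write_text("x,y\n1,3\n")  # modify an input
        (root / "results.txt").write_text("accuracy: 0.9\n")  # add a result
        after = snapshot(root)
        print(diff(before, after))
        # prints {'added': ['results.txt'], 'removed': [], 'changed': ['data.csv']}
```

Because the snapshot ID depends only on paths and content hashes, restoring the same files always reproduces the same ID. That property is what makes a hash-based snapshot useful for reproducibility.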

Data Workspaces is installed as a Python 3 package and provides a Git-like command line interface and programming APIs. Specific data science tools and workflows are supported through extensions called kits. Currently, kits are available for scikit-learn, TensorFlow, and Jupyter notebooks. The goal is to provide these reproducibility and collaboration benefits with minimal changes to your current projects and processes.

Data Workspaces runs on Unix-like systems, including Linux and macOS, and on Windows via the Windows Subsystem for Linux.

Quick Start

Please see the Quickstart section of the documentation.

Documentation

The documentation is available here:

The source for the documentation is in the docs directory. To build it locally, install Sphinx and run the following:

cd docs
pip install -r requirements.txt # extras needed to build the docs
make html

To view the local documentation, open the file docs/_build/html/index.html in your browser.

License

This code is copyright 2018 - 2021 by the Max Planck Institute for Software Systems and Benedat LLC. It is licensed under the Apache 2.0 license. See the file LICENSE.txt for details.


Download files


Files for dataworkspaces, version 1.5.2

Filename                                 Size       File type   Python version
dataworkspaces-1.5.2-py3-none-any.whl    184.1 kB   Wheel       py3
dataworkspaces-1.5.2.tar.gz              154.7 kB   Source      None
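PyPI lists a SHA-256 digest for each distribution file it hosts. A minimal sketch for checking a downloaded file against that digest needs only Python's standard library. The helper names sha256_of and verify are illustrative; they are not part of dataworkspaces or pip.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA-256 hex digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True if the file's digest matches the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify(Path("dataworkspaces-1.5.2-py3-none-any.whl"), expected) returns True only when the file is intact, where expected is the SHA-256 string copied from PyPI. pip can also compute the digest with pip hash FILE, and it can enforce digests at install time through --hash options in a requirements file used with --require-hashes.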
