Easy management of source data, intermediate data, and results for data science projects

Project description

Data Workspaces is an open source framework for maintaining the state of a data science project, including data sets, intermediate data, results, and code. It supports reproducibility through snapshotting and lineage models, and collaboration through a push/pull model inspired by source control systems like Git.

Data Workspaces is installed as a Python 3 package and provides a Git-like command line interface and programming APIs. Specific data science tools and workflows are supported through extensions called kits. Currently, this includes Scikit-learn, TensorFlow, and Jupyter Notebooks. The goal is to provide the reproducibility and collaboration benefits with minimal changes to your current projects and processes.

Data Workspaces runs on Unix-like systems, including Linux and macOS, and on Windows via the Windows Subsystem for Linux.

Quick Start

Please see the Quickstart Section of the documentation.

Documentation

The documentation is available at https://data-workspaces-core.readthedocs.io/en/latest/. Its source is in the docs directory. To build it locally, install Sphinx and run the following:

cd docs
pip install -r requirements.txt # extras needed to build the docs
make html

To view the local documentation, open the file docs/_build/html/index.html in your browser.

License

This code is copyright 2018 - 2021 by the Max Planck Institute for Software Systems and Benedat LLC. It is licensed under the Apache 2.0 license. See the file LICENSE.txt for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataworkspaces-1.6.0.tar.gz (154.8 kB)

Uploaded Source

Built Distribution

dataworkspaces-1.6.0-py3-none-any.whl (184.2 kB)

Uploaded Python 3

File details

Details for the file dataworkspaces-1.6.0.tar.gz.

File metadata

  • Download URL: dataworkspaces-1.6.0.tar.gz
  • Upload date:
  • Size: 154.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/24.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.31.1 importlib-metadata/4.8.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.7.11

File hashes

Hashes for dataworkspaces-1.6.0.tar.gz
Algorithm    Hash digest
SHA256       e6057958784abaab0dacfbd6314a8484e8a7e2f678a6fe08e577052f343da76c
MD5          258e675da6ede21084fc9b1053ab6a6b
BLAKE2b-256  74ab5eefe538b100ccebe97838d2c8eeda120332c4efbd30bd31a56b96fb64af
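
The published digests can be used to check that a downloaded file is intact. The following is a minimal sketch using only the Python standard library; the helper names sha256_of and matches_published_hash are illustrative and not part of Data Workspaces, and the script assumes the source distribution was saved in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumption: the sdist was downloaded into the current working directory.
    archive = Path("dataworkspaces-1.6.0.tar.gz")
    # SHA256 digest as listed for this file above.
    published = "e6057958784abaab0dacfbd6314a8484e8a7e2f678a6fe08e577052f343da76c"
    if archive.exists():
        print("OK" if matches_published_hash(archive, published) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check works for the wheel by substituting its file name and its SHA256 digest.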

File details

Details for the file dataworkspaces-1.6.0-py3-none-any.whl.

File metadata

  • Download URL: dataworkspaces-1.6.0-py3-none-any.whl
  • Upload date:
  • Size: 184.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/24.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.31.1 importlib-metadata/4.8.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.7.11

File hashes

Hashes for dataworkspaces-1.6.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       1e9266388725313a135b4ed9bdcea93c7e4e1e9910d812799b18303ae62e6a9f
MD5          d2363a4515694225dfd701f3ccb9e8db
BLAKE2b-256  d359929cb77053d7f38fef9fa29d26b3bde491227377adc48ec133ef92fe028e
