
This is a benchmark that tests various data-centric aspects of improving the quality of machine learning workflows.


A benchmark of data-centric tasks from across the machine learning lifecycle.


⚡️ Quickstart

pip install dcbench

Optional: some parts of dcbench rely on optional dependencies. If you know which optional dependencies you'd like to install, you can request them as extras, for example pip install "dcbench[dev]". See setup.py for a full list of optional dependencies.

Installing the development version from GitHub: pip install "dcbench[dev] @ git+https://github.com/data-centric-ai/dcbench@main"
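To check which of the two installs above is active in the current environment, a small standard-library check along these lines may help. This is only a sketch: the helper name installed_version is made up for illustration and is not part of dcbench.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("dcbench")
    if found is None:
        print("dcbench is not installed; try `pip install dcbench`.")
    else:
        print(f"dcbench version: {found}")
```

Because the check reads metadata only, it works even when an optional dependency needed at import time is missing.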

Using a Jupyter notebook or some other interactive environment, you can import the library and explore the data-centric problems in the benchmark:

import dcbench
dcbench.tasks

To learn more, follow the walkthrough in the docs.
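In a notebook, a bare dcbench.tasks expression is displayed automatically. In a plain script nothing is shown unless the value is printed. The sketch below shows one way to do that while failing gracefully when the package is missing; the function name show_tasks is an assumption made for this example, and the only dcbench attribute it touches is dcbench.tasks from the snippet above.

```python
import importlib.util


def show_tasks() -> bool:
    """Print dcbench.tasks if dcbench can be found; return whether it was."""
    # find_spec returns None when no top-level package of that name exists.
    if importlib.util.find_spec("dcbench") is None:
        print("dcbench is not installed; run `pip install dcbench` first.")
        return False
    import dcbench  # imported lazily so the script still runs without it

    print(dcbench.tasks)
    return True


if __name__ == "__main__":
    show_tasks()
```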

💡 What is dcbench?

This benchmark evaluates the steps in your machine learning workflow beyond model training and tuning, including feature cleaning, slice discovery, and coreset selection. We call these “data-centric” tasks because they focus on exploring and manipulating data rather than training models. dcbench supports a growing list of them.

dcbench includes tasks that look very different from one another: the inputs and outputs of the slice discovery task are not the same as those of the minimal data cleaning task. However, we think it is important that researchers and practitioners be able to run evaluations on data-centric tasks across the ML lifecycle without having to learn a different API for each task or rewrite evaluation scripts.

So, dcbench is designed to be a common home for these diverse, but related, tasks. In dcbench all of these tasks are structured in a similar manner and they are supported by a common Python API that makes it easy to download data, run evaluations, and compare methods.
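To make the idea of a shared structure concrete, here is a toy sketch of how two tasks with different inputs and outputs could sit behind one evaluation loop. Every name in it (Problem, ToySliceDiscovery, ToyCleaning, run_benchmark) is hypothetical and chosen for illustration, and the data is made up. It is not dcbench's actual API, which is described in the docs.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Problem(ABC):
    """Hypothetical common interface: each problem exposes inputs and a score."""

    name: str

    @abstractmethod
    def inputs(self) -> dict:
        """Return whatever a method needs in order to produce a solution."""

    @abstractmethod
    def evaluate(self, solution: List[int]) -> float:
        """Score a solution; higher is better."""


class ToySliceDiscovery(Problem):
    name = "toy_slice_discovery"

    def __init__(self) -> None:
        self._errors = [0, 1, 1, 0, 1]  # made-up data: 1 marks a model error

    def inputs(self) -> dict:
        return {"num_examples": len(self._errors)}

    def evaluate(self, solution: List[int]) -> float:
        # Precision: fraction of the proposed slice that is a model error.
        if not solution:
            return 0.0
        return sum(self._errors[i] for i in solution) / len(solution)


class ToyCleaning(Problem):
    name = "toy_cleaning"

    def __init__(self) -> None:
        self._dirty = {2, 3}  # made-up ids of rows that need cleaning
        self._budget = 2

    def inputs(self) -> dict:
        return {"num_rows": 5, "budget": self._budget}

    def evaluate(self, solution: List[int]) -> float:
        # Recall of dirty rows among the first `budget` rows chosen.
        chosen = set(solution[: self._budget])
        return len(chosen & self._dirty) / len(self._dirty)


def run_benchmark(
    problems: List[Problem],
    methods: Dict[str, Callable[[dict], List[int]]],
) -> Dict[str, float]:
    """Run the method registered for each problem and collect the scores."""
    return {p.name: p.evaluate(methods[p.name](p.inputs())) for p in problems}


if __name__ == "__main__":
    methods = {
        "toy_slice_discovery": lambda inp: [1, 2],
        "toy_cleaning": lambda inp: [2, 4],
    }
    print(run_benchmark([ToySliceDiscovery(), ToyCleaning()], methods))
```

The two toy problems score solutions in different ways (precision and recall under a budget), yet the loop in run_benchmark never needs to know which one it is handling. That is the kind of uniformity the paragraph above describes.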

✉️ About

dcbench is being developed alongside the data-centric-ai benchmark. If you would like to get involved or contribute, reach out to Bojan Karlaš (karlasb [at] inf [dot] ethz [dot] ch) and Sabri Eyuboglu (eyuboglu [at] stanford [dot] edu).
