This is a benchmark that tests various data-centric aspects of improving the quality of machine learning workflows.
A benchmark of data-centric tasks from across the machine learning lifecycle.
⚡️ Quickstart
pip install dcbench
Optional: some parts of dcbench rely on optional dependencies. If you know which ones you need, you can install them with an extras specifier such as
pip install dcbench[dev]
instead. See setup.py for a full list of optional dependencies.
Installing from dev:
pip install "dcbench[dev] @ git+https://github.com/data-centric-ai/dcbench@main"
Using a Jupyter notebook or some other interactive environment, you can import the library and explore the data-centric problems in the benchmark:
import dcbench
dcbench.tasks
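Before opening a notebook, it can help to confirm that the package is actually installed in the current environment. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is made up for this illustration and is not part of dcbench.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of the dcbench API.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("dcbench")
    if version is None:
        print("dcbench is not installed; run `pip install dcbench` first")
    else:
        print(f"dcbench {version} is installed")
```

If the check reports that dcbench is missing, the most likely cause is that the notebook kernel runs in a different environment from the one where pip was invoked.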
To learn more, follow the walkthrough in the docs.
💡 What is dcbench?
This benchmark evaluates the steps in your machine learning workflow beyond model training and tuning, such as feature cleaning, slice discovery, and coreset selection. We call these “data-centric” tasks because they focus on exploring and manipulating data, not on training models. dcbench supports a growing list of them.
dcbench includes tasks that look very different from one another: the inputs and outputs of the slice discovery task are not the same as those of the minimal data cleaning task. However, we think it important that researchers and practitioners be able to run evaluations on data-centric tasks across the ML lifecycle without having to learn a bunch of different APIs or rewrite evaluation scripts.
So, dcbench is designed to be a common home for these diverse, but related, tasks. In dcbench, all of these tasks are structured in a similar manner, and they are supported by a common Python API that makes it easy to download data, run evaluations, and compare methods.
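To make the idea of a shared structure concrete, here is a toy sketch of how tasks with different inputs and outputs could sit behind one evaluation interface. It is not dcbench's actual API: the names Task, Problem, evaluate, and jaccard, as well as the toy data, are assumptions invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Problem:
    """One instance of a task: named inputs plus a reference answer."""
    problem_id: str
    inputs: Dict[str, Any]
    reference: Any


@dataclass
class Task:
    """A family of problems that share one scoring function (hypothetical)."""
    name: str
    score: Callable[[Any, Any], float]  # (prediction, reference) -> metric
    problems: List[Problem] = field(default_factory=list)

    def evaluate(self, method: Callable[[Dict[str, Any]], Any]) -> Dict[str, float]:
        """Run `method` on every problem and score each output."""
        return {
            p.problem_id: self.score(method(p.inputs), p.reference)
            for p in self.problems
        }


def jaccard(predicted: set, reference: set) -> float:
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = predicted | reference
    return len(predicted & reference) / len(union) if union else 1.0


# Toy task: return the indices of the negative values in each problem.
toy_task = Task(
    name="toy_cleaning",
    score=jaccard,
    problems=[
        Problem("p1", {"values": [3, -1, 4, -5]}, reference={1, 3}),
        Problem("p2", {"values": [2, 7, -8]}, reference={2}),
    ],
)


def flag_negatives(inputs: Dict[str, Any]) -> set:
    """Method under test: flag every index holding a negative value."""
    return {i for i, v in enumerate(inputs["values"]) if v < 0}


def flag_first(inputs: Dict[str, Any]) -> set:
    """Naive baseline: always flag index 0."""
    return {0}


# The same evaluation loop compares both methods on every problem.
results = {m.__name__: toy_task.evaluate(m) for m in (flag_negatives, flag_first)}
print(results)
```

Because the scoring function lives on the task rather than in the method, a second task with entirely different inputs could reuse the same evaluate loop by supplying its own score callable.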
✉️ About
dcbench is being developed alongside the data-centric-ai benchmark. Reach out to Bojan Karlaš (karlasb [at] inf [dot] ethz [dot] ch) and Sabri Eyuboglu (eyuboglu [at] stanford [dot] edu) if you would like to get involved or contribute!