A Hugging Face community-driven, open-source library for dataset disaggregation

Disaggregators

The disaggregators library allows you to easily add new features to your datasets to enable disaggregated data exploration and disaggregated model evaluation. disaggregators is preloaded with disaggregation modules intended for text and image data.

disaggregators is intended to be used with 🤗 Datasets, but should work with any other "mappable" interface to a dataset.
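
As a rough illustration of what a "mappable" interface means here, the sketch below applies a per-row function to a plain list of dictionaries and merges the returned fields into each row. Everything in it is hypothetical: `toy_pronoun_flags`, `map_rows`, and the column names are stand-ins invented for this example, not the library's implementation. It also assumes that a disaggregation function takes one row and returns a dictionary of new columns.

```python
import re
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def toy_pronoun_flags(row: Row) -> Row:
    """Hypothetical stand-in for a disaggregation function (not the library's own).

    Takes one row and returns a dict of new boolean columns.
    """
    text = str(row["text"]).lower()
    return {
        "has_she_her": bool(re.search(r"\b(she|her|hers)\b", text)),
        "has_he_him": bool(re.search(r"\b(he|him|his)\b", text)),
    }


def map_rows(rows: Iterable[Row], fn: Callable[[Row], Row]) -> List[Row]:
    """Apply fn to each row and merge the returned columns into a copy of the row."""
    return [{**row, **fn(row)} for row in rows]


rows = [
    {"text": "She said the film was great."},
    {"text": "The plot was thin."},
]
flagged = map_rows(rows, toy_pronoun_flags)
```

Any container that can run a function of this shape over its rows (a list, a generator pipeline, a DataFrame via a row-wise apply) would count as "mappable" under this reading.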

Requirements and Installation

disaggregators has been tested on Python 3.10.8.

Running pip install disaggregators fetches the latest release from PyPI.

To install directly from this GitHub repo, use the following command:

pip install git+https://github.com/huggingface/disaggregators.git

Usage

You will likely want to use 🤗 Datasets with disaggregators.

pip install datasets

The snippet below loads the IMDB dataset from the Hugging Face Hub and initializes a disaggregator for "pronouns" that runs on the dataset's "text" column. To run multiple disaggregations, pass a list to the Disaggregator constructor (e.g. Disaggregator(["pronouns", "sentiment"], column="text")). The 🤗 Datasets map method then applies the disaggregation to the dataset.

from disaggregators import Disaggregator
from datasets import load_dataset

# Load the IMDB training split from the Hugging Face Hub.
dataset = load_dataset("imdb", split="train")
# Run the "pronouns" disaggregation module on the "text" column.
disaggregator = Disaggregator("pronouns", column="text")

# New boolean columns are added for she/her, he/him, and they/them.
ds = dataset.map(disaggregator.get_function())

The resulting dataset can now be used for data exploration and disaggregated model evaluation.
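
One way to picture disaggregated model evaluation is to compute a metric separately over the rows flagged by each new boolean column. The sketch below does this for accuracy on plain Python dictionaries. The flag column names (`she_her`, `he_him`), the `prediction` column, and the helper `accuracy_by_flag` are all assumptions made for illustration; check the columns the library actually adds to your dataset before adapting it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def accuracy_by_flag(
    rows: Iterable[dict],
    flag_columns: List[str],
    label_column: str = "label",
    prediction_column: str = "prediction",  # hypothetical column holding model outputs
) -> Dict[str, float]:
    """Return accuracy computed only over the rows where each flag is True."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        for flag in flag_columns:
            if row[flag]:
                totals[flag] += 1
                correct[flag] += int(row[label_column] == row[prediction_column])
    # Flags that never fire are omitted rather than reported as 0.0.
    return {flag: correct[flag] / totals[flag] for flag in flag_columns if totals[flag]}


# Toy rows with hypothetical flag columns and predictions.
example_rows = [
    {"label": 1, "prediction": 1, "she_her": True, "he_him": False},
    {"label": 0, "prediction": 1, "she_her": True, "he_him": False},
    {"label": 0, "prediction": 0, "she_her": False, "he_him": True},
    {"label": 1, "prediction": 1, "she_her": False, "he_him": True},
]
scores = accuracy_by_flag(example_rows, ["she_her", "he_him"])
print(scores)  # {'she_her': 0.5, 'he_him': 1.0}
```

A gap between the per-group scores, as in this toy data, is the kind of signal disaggregated evaluation is meant to surface. A row can match several flags, so the groups may overlap.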

Contact

Nima Boscarino – nima <at> huggingface <dot> co

