dataset-manager-py

Python wrapper over the HuggingFace datasets library that makes it easier to load and convert datasets.

# Import the DatasetManager class
from dataset.manager import DatasetManager

# Instantiate a new DatasetManager object
manager = DatasetManager()

# Download a dataset from the HuggingFace Hub
dataset = manager.load_from_hub(dataset_name="cuad")

# Evaluating dataset in a REPL or notebook prints a top-level summary of its splits
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 22450
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 4182
    })
})

# You can also save the dataset to disk
manager.save_to_disk(path="cuad-dataset")

# And reload the dataset from disk
reloaded_dataset = manager.load_from_disk(path="cuad-dataset")

# The saved dataset directory can also be compressed into a zip file or a tarball
# archive_format defaults to 'zip'
manager.archive_dataset(dataset_dir="cuad-dataset", archive_path=".", archive_format="zip")
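
The archiving step above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation: it assumes the archive step amounts to compressing a saved dataset directory with `shutil.make_archive`. The helper name `archive_directory`, its parameters, and the accepted format names (`zip` and `gztar`, which are `shutil`'s names) are hypothetical and chosen for illustration.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def archive_directory(dataset_dir: str, archive_path: str = ".", archive_format: str = "zip") -> str:
    """Hypothetical helper: compress dataset_dir into an archive under archive_path.

    Returns the path of the archive file that was written.
    """
    # Assumption: only these two shutil format names are supported in this sketch.
    if archive_format not in ("zip", "gztar"):
        raise ValueError(f"unsupported archive format: {archive_format!r}")

    source = Path(dataset_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"not a directory: {dataset_dir}")

    # make_archive appends the extension itself, so base_name has no suffix.
    base_name = str(Path(archive_path).resolve() / source.name)

    # root_dir makes the archive contain the directory's contents at its top level.
    return shutil.make_archive(base_name, archive_format, root_dir=str(source.resolve()))


if __name__ == "__main__":
    # Demonstrate on a throwaway directory standing in for a saved dataset.
    with tempfile.TemporaryDirectory() as tmp:
        saved = Path(tmp) / "cuad-dataset"
        saved.mkdir()
        (saved / "placeholder.txt").write_text("stand-in for dataset files")

        archive = archive_directory(str(saved), archive_path=tmp, archive_format="zip")
        with zipfile.ZipFile(archive) as zf:
            print(archive, zf.namelist())
```

Passing `archive_format="gztar"` in this sketch would produce a gzip-compressed tarball instead of a zip file.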

Release files (version 0.1.1): source distribution dataset-manager-py-0.1.1.tar.gz (6.8 kB) and built distribution dataset_manager_py-0.1.1-py3-none-any.whl (6.8 kB, Python 3).
