dataset-manager-py
Python wrapper over the HuggingFace datasets library that makes it easier to load and convert datasets.
# Import the DatasetManager class
from dataset.manager import DatasetManager
# Instantiate a new DatasetManager object
manager = DatasetManager()
# Download a dataset from the HuggingFace Hub
dataset = manager.load_from_hub(dataset_name="cuad")
# In a REPL or notebook, evaluating dataset on its own prints a top-level summary of its splits
dataset
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 22450
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 4182
    })
})
# You can also save the dataset to disk
manager.save_to_disk(path="cuad-dataset")
# And reload the dataset from disk
reloaded_dataset = manager.load_from_disk(path="cuad-dataset")
# It's also possible to compress the dataset into either a zip file or a tarball
# Defaults to the 'zip' format
manager.archive_dataset(dataset_dir="cuad-dataset", archive_path=".", archive_format="zip")
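For readers curious what the archiving step amounts to, here is a minimal standard-library sketch of packing a saved dataset directory into a zip file or tarball. It is not the library's implementation: archive_directory and demo are hypothetical names introduced for illustration, and the parameter names only mirror the archive_dataset call above. It assumes the dataset has already been written to a directory on disk.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def archive_directory(dataset_dir: str, archive_path: str = ".", archive_format: str = "zip") -> str:
    """Hypothetical helper: pack dataset_dir into an archive under archive_path.

    archive_format is passed straight to shutil.make_archive, so "zip",
    "tar" and "gztar" are all accepted. Returns the path of the new archive.
    """
    source = Path(dataset_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")

    # make_archive appends the extension itself, so base_name has none.
    base_name = Path(archive_path).resolve() / source.name

    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(
        str(base_name),
        archive_format,
        root_dir=str(source.parent),
        base_dir=source.name,
    )


def demo() -> list:
    """Archive a throwaway directory and return the names stored in the zip."""
    with tempfile.TemporaryDirectory() as tmp:
        dataset_dir = Path(tmp) / "cuad-dataset"
        dataset_dir.mkdir()
        # Stand-in for the files a real save_to_disk call would write.
        (dataset_dir / "placeholder.txt").write_text("stand-in for dataset files")

        archive = archive_directory(str(dataset_dir), archive_path=tmp, archive_format="zip")
        with zipfile.ZipFile(archive) as zf:
            return sorted(zf.namelist())
```

Calling demo() lists the archive members, each prefixed with the cuad-dataset folder name. Passing archive_format="gztar" instead would produce a .tar.gz tarball from the same directory.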