Fast, parallel upload of large datasets to the Hugging Face Datasets hub.

Project description

HF-fastup

Pushes a Hugging Face (HF) dataset to the HF Hub as a Parquet dataset, which allows streaming. The dataset is split into shards that are uploaded in parallel. This is useful for large datasets, for example those with embedded data.
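
The shard-then-upload-in-parallel idea described above can be sketched with the standard library alone. This is an illustrative sketch, not hf-fastup's actual implementation: the helper names plan_shards and upload_in_parallel, the rows_per_shard parameter, and the data/train-XXXXX-of-YYYYY.parquet file naming are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

# (file name in the repo, first row, one past the last row)
Shard = Tuple[str, int, int]
T = TypeVar("T")


def plan_shards(num_rows: int, rows_per_shard: int) -> List[Shard]:
    """Split num_rows into contiguous shards of at most rows_per_shard rows.

    Hypothetical helper: the file naming pattern is an assumption.
    """
    if rows_per_shard <= 0:
        raise ValueError("rows_per_shard must be positive")
    # Ceiling division; always plan at least one shard.
    num_shards = max(1, -(-num_rows // rows_per_shard))
    return [
        (
            f"data/train-{i:05d}-of-{num_shards:05d}.parquet",
            i * rows_per_shard,
            min((i + 1) * rows_per_shard, num_rows),
        )
        for i in range(num_shards)
    ]


def upload_in_parallel(
    shards: List[Shard],
    upload_one: Callable[[Shard], T],
    max_workers: int = 4,
) -> List[T]:
    """Run upload_one on every shard using a thread pool.

    Results come back in the same order as the input shards.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(upload_one, shards))


if __name__ == "__main__":
    shards = plan_shards(num_rows=10, rows_per_shard=4)
    # Stand-in for a real upload: just report how many rows each shard holds.
    print(upload_in_parallel(shards, lambda shard: shard[2] - shard[1]))
```

In a real upload, upload_one would write rows start to stop of the dataset to a Parquet file and send it to the Hub, for instance with huggingface_hub's HfApi.upload_file. Threads are a reasonable fit here because the work is dominated by network I/O.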

Usage

Make sure hf_transfer is installed and HF_HUB_ENABLE_HF_TRANSFER is set to 1.
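
A small pre-flight check for the two requirements above might look like the following. The function name check_hf_transfer_setup and its parameters are hypothetical and not part of hf-fastup; the sketch only tests whether the module can be imported and whether the environment variable equals 1.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def check_hf_transfer_setup(
    env: Optional[Mapping[str, str]] = None,
    module_name: str = "hf_transfer",
) -> List[str]:
    """Return a list of setup problems; an empty list means all checks passed.

    Hypothetical helper, not part of hf-fastup.
    """
    env = os.environ if env is None else env
    problems = []
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        problems.append(f"{module_name} is not installed (try: pip install {module_name})")
    if env.get("HF_HUB_ENABLE_HF_TRANSFER") != "1":
        problems.append("HF_HUB_ENABLE_HF_TRANSFER is not set to 1")
    return problems


if __name__ == "__main__":
    for problem in check_hf_transfer_setup():
        print("Setup problem:", problem)
```

It is safest to set the variable in the shell before starting Python, since huggingface_hub is believed to read it when the library is first imported.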

import hffastup
import datasets
datasets.logging.set_verbosity_info()

# load any HF dataset
dataset = datasets.load_dataset("my_large_dataset.py")

# upload the dataset to the HF Hub as Parquet shards
hffastup.upload_to_hf_hub(dataset, "Org/repo")
# create a dataset card and push it to the HF Hub
hffastup.push_dataset_card(dataset, "Org/repo")

Download files

Source Distribution

hf-fastup-0.0.5.tar.gz (5.5 kB)

Uploaded Source

Built Distribution

hf_fastup-0.0.5-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file hf-fastup-0.0.5.tar.gz.

File metadata

  • Download URL: hf-fastup-0.0.5.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for hf-fastup-0.0.5.tar.gz
Algorithm Hash digest
SHA256 842b238d8afbb3ab46d8ea7c4afa6172b887cf2e86841f35f0b277b716462ae5
MD5 9e883a52fb92d2c877711bb8095e9bc5
BLAKE2b-256 13fa548dc216ce6e3e90c2bae08e1d0e58061d63e849b4b040606f8fce9c7ecc

File details

Details for the file hf_fastup-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: hf_fastup-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for hf_fastup-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 3444daeee89f367e3d7fe117bfa84486ff6dfd8dd5c8d02bdeaf51c2c8c2472a
MD5 fb9cc4b78cfe9d30d978d068872fbaf3
BLAKE2b-256 7bf7374cb757ea0f87019e624664eddb76a0dbc5e21dcf2ae997471038a0e2cc
