Fast parallel upload of large datasets to the Hugging Face Datasets Hub.
HF-fastup
Project description
Pushes a HF dataset to the HF Hub as a Parquet dataset, allowing streaming. The dataset is processed into shards and uploaded in parallel. This is useful for large datasets, for example those with embedded data.
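The shard-and-upload approach can be sketched conceptually. This is an illustration only, not the library's actual implementation; `shard_rows` and `upload_shard` are hypothetical helpers standing in for the real Parquet-writing and Hub-upload steps:

```python
from concurrent.futures import ThreadPoolExecutor

def shard_rows(rows, num_shards):
    # Split the dataset rows into roughly equal contiguous shards.
    shard_size = (len(rows) + num_shards - 1) // num_shards
    return [rows[i:i + shard_size] for i in range(0, len(rows), shard_size)]

def upload_shard(shard):
    # Placeholder for writing one shard to Parquet and pushing it to the Hub.
    return len(shard)

rows = list(range(100))
shards = shard_rows(rows, num_shards=4)

# Upload all shards concurrently instead of one large sequential push.
with ThreadPoolExecutor(max_workers=4) as pool:
    uploaded = list(pool.map(upload_shard, shards))
```

Because each shard is an independent Parquet file, consumers can later stream the dataset shard by shard instead of downloading it whole.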
Usage
Make sure hf_transfer is installed and HF_HUB_ENABLE_HF_TRANSFER is set to 1.
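For example, in the shell session that will run the upload (assuming hf_transfer is installed from PyPI):

```shell
# Requires the hf_transfer package: pip install hf_transfer
# Enable the Rust-based transfer backend for the HF Hub client.
export HF_HUB_ENABLE_HF_TRANSFER=1
```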
import hffastup
import datasets
datasets.logging.set_verbosity_info()
# load any HF dataset
dataset = datasets.load_dataset("my_large_dataset.py")
hffastup.upload_to_hf_hub(dataset, "Org/repo") # upload to HF Hub
hffastup.push_dataset_card(dataset, "Org/repo") # Makes a dataset card and pushes it to HF Hub
Project details
Download files
Source Distribution
hf-fastup-0.0.7.tar.gz (5.9 kB)
Built Distribution
hf_fastup-0.0.7-py3-none-any.whl (6.2 kB)
File details
Details for the file hf-fastup-0.0.7.tar.gz.
File metadata
- Download URL: hf-fastup-0.0.7.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | fda4046498680ab173ed5147d847b85657434331900e787d9da73f297c3bca10
MD5 | 7abaa48912c08f4419535fce1fd85d33
BLAKE2b-256 | eca539d0568aae1a34384011294f041d4e1b967d1c97f2f65f5734f2c58d5bac
File details
Details for the file hf_fastup-0.0.7-py3-none-any.whl.
File metadata
- Download URL: hf_fastup-0.0.7-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 861a57cc1b690de39ffdbdda1d77b3c3f28beb180a7e88df560cf5e51eb87e6f
MD5 | 10ea5a9042bb85627075d5084cfef120
BLAKE2b-256 | 2466c04abf09fa7a2945f83d449b7a51bc6658852563af91e77787ef24604287