Fast parallel upload of large datasets to the Hugging Face Datasets Hub.
HF-fastup
Pushes a Hugging Face dataset to the Hugging Face Hub as a Parquet dataset, so that it can be streamed. The dataset is split into shards, which are uploaded in parallel. This is useful for large datasets, for example datasets with embedded data.
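The shard-then-upload-in-parallel approach described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not hf-fastup's actual implementation. The helper names (`shard_ranges`, `shard_filename`, `upload_shards_in_parallel`) and the `data/<split>-XXXXX-of-YYYYY.parquet` naming scheme are hypothetical, and the upload itself is a caller-supplied function. In a real uploader that function might select its row range with `Dataset.select`, write it with `Dataset.to_parquet`, and send it with `huggingface_hub`'s `HfApi.upload_file(..., repo_type="dataset")`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def shard_ranges(num_rows: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split the row indices [0, num_rows) into contiguous, near-equal ranges."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    base, extra = divmod(num_rows, num_shards)
    ranges = []
    start = 0
    for i in range(num_shards):
        # The first `extra` shards each receive one additional row.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def shard_filename(split: str, index: int, num_shards: int) -> str:
    # Hypothetical naming scheme: split name first, then zero-padded shard counters.
    return f"data/{split}-{index:05d}-of-{num_shards:05d}.parquet"


def upload_shards_in_parallel(
    num_rows: int,
    num_shards: int,
    split: str,
    upload_shard: Callable[[str, int, int], str],  # hypothetical: (path_in_repo, start, end) -> path
    max_workers: int = 8,
) -> List[str]:
    """Submit one upload task per shard and return the uploaded paths in shard order."""
    ranges = shard_ranges(num_rows, num_shards)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_shard, shard_filename(split, i, num_shards), start, end)
            for i, (start, end) in enumerate(ranges)
        ]
        # .result() blocks until each shard finishes and re-raises any upload error.
        return [f.result() for f in futures]
```

Threads are a reasonable fit here because uploads are I/O-bound rather than CPU-bound. Results are collected in shard order regardless of which upload finishes first.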
Usage
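Make sure the hf_transfer package is installed (pip install hf_transfer) and the HF_HUB_ENABLE_HF_TRANSFER environment variable is set to 1 (export HF_HUB_ENABLE_HF_TRANSFER=1).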
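If you prefer to configure this from inside a script, a small helper like the one below can set the variable and check that `hf_transfer` is importable. The function name `enable_hf_transfer` is hypothetical. It relies on the assumption that `huggingface_hub` reads `HF_HUB_ENABLE_HF_TRANSFER` once, when it is first imported, which is how the versions we are aware of behave. The variable therefore has to be set before `datasets`, `huggingface_hub`, or `hffastup` is imported.

```python
import importlib.util
import os


def enable_hf_transfer() -> bool:
    """Set HF_HUB_ENABLE_HF_TRANSFER=1 and report whether hf_transfer is installed."""
    # Must run before huggingface_hub is imported, since it reads the
    # variable at import time.
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    # find_spec returns None when the package cannot be found.
    return importlib.util.find_spec("hf_transfer") is not None
```

Call `enable_hf_transfer()` at the very top of your script, before `import datasets` and `import hffastup`. If it returns `False`, install the package with `pip install hf_transfer` first.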
```python
import datasets
import hffastup

datasets.logging.set_verbosity_info()  # show info-level log messages from datasets

# Load any HF dataset (here via a local loading script)
dataset = datasets.load_dataset("my_large_dataset.py")

hffastup.upload_to_hf_hub(dataset, "Org/repo")  # shard the dataset and upload it as Parquet
hffastup.push_dataset_card(dataset, "Org/repo")  # generate a dataset card and push it to the Hub
```
File details
Details for the file hf-fastup-0.0.7.tar.gz.
File metadata
- Download URL: hf-fastup-0.0.7.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fda4046498680ab173ed5147d847b85657434331900e787d9da73f297c3bca10 |
| MD5 | 7abaa48912c08f4419535fce1fd85d33 |
| BLAKE2b-256 | eca539d0568aae1a34384011294f041d4e1b967d1c97f2f65f5734f2c58d5bac |
File details
Details for the file hf_fastup-0.0.7-py3-none-any.whl.
File metadata
- Download URL: hf_fastup-0.0.7-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 861a57cc1b690de39ffdbdda1d77b3c3f28beb180a7e88df560cf5e51eb87e6f |
| MD5 | 10ea5a9042bb85627075d5084cfef120 |
| BLAKE2b-256 | 2466c04abf09fa7a2945f83d449b7a51bc6658852563af91e77787ef24604287 |