
Fast parallel upload of large datasets to the Hugging Face Datasets Hub.


HF-fastup

Pushes a HF dataset to the HF Hub as a Parquet dataset, enabling streaming. The dataset is split into shards and uploaded in parallel. It is useful for large datasets, for example those with embedded binary data.
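
Conceptually, the shard-and-upload flow resembles the sketch below. This is an illustration of the technique only, not hf-fastup's actual implementation; the shard count, paths, and worker count are arbitrary assumptions.

import concurrent.futures
import datasets
from huggingface_hub import HfApi

def upload_in_shards(dataset: datasets.Dataset, repo_id: str, num_shards: int = 8):
    # Illustration only: hf-fastup's real internals may differ.
    api = HfApi()
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)

    def push_one(index: int):
        # Write one contiguous shard to a local Parquet file...
        shard = dataset.shard(num_shards=num_shards, index=index, contiguous=True)
        path = f"shard-{index:05d}.parquet"
        shard.to_parquet(path)
        # ...then upload it; with HF_HUB_ENABLE_HF_TRANSFER=1 the transfer
        # itself is accelerated by the hf_transfer backend.
        api.upload_file(
            path_or_fileobj=path,
            path_in_repo=f"data/{path}",
            repo_id=repo_id,
            repo_type="dataset",
        )

    # Upload several shards concurrently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(push_one, range(num_shards)))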

Usage

Make sure hf_transfer is installed and the HF_HUB_ENABLE_HF_TRANSFER environment variable is set to 1.
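
The variable can be exported in the shell before launching Python, or, as a minimal sketch, set from Python itself, provided this runs before huggingface_hub is first imported (the flag is read when huggingface_hub loads):

import os

# Assumption: must run before huggingface_hub/datasets are imported,
# since the flag is read at import time.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"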

import hffastup
import datasets
datasets.logging.set_verbosity_info()

# load any HF dataset
dataset = datasets.load_dataset("my_large_dataset.py")

hffastup.upload_to_hf_hub(dataset, "Org/repo") # upload the dataset to the HF Hub
hffastup.push_dataset_card(dataset, "Org/repo") # build a dataset card and push it to the HF Hub
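
Because the upload produces a Parquet dataset, it can be read back with the standard streaming support in the datasets library. "Org/repo" and the "train" split below are placeholders matching the example above:

streamed = datasets.load_dataset("Org/repo", split="train", streaming=True)
for example in streamed:
    print(example)  # rows arrive incrementally; no full download required
    break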
