CLI tool for uploading HuggingFace datasets and GitHub repos to Load S3 with tags
Project description
Load Pools CLI
A command-line tool for uploading HuggingFace datasets and GitHub repositories to Load S3 and tagging them for use with Load's data pools.
Features
- GitHub repository upload: Clone and upload entire repos with file-by-file tagging
- HuggingFace integration: Upload datasets by path
- Automatic tagging: Data-Protocol, Path, Filename, and Content-Type to make data discoverable
- Query support: All uploads are queryable via s3-agent's tag query API
Installation
Prerequisites
- A Load Network account API key from cloud.load.network
- Optionally, a HuggingFace API token (free for registered users), needed for private datasets and higher rate limits

Install from PyPI:

pip install load-pools
Usage
Upload a GitHub Repository
Upload all files from a GitHub repository with proper folder structure tagging:
load-pools create --github https://github.com/owner/repo --auth YOUR_LOAD_API_KEY
What happens:
- Repository is cloned to a temporary directory
- Each file is uploaded individually to s3-agent
- Files are tagged with:
  - Data-Protocol: "owner/repo"
  - Path: "folder/subfolder" (relative path from repo root)
  - Filename: "file.ext"
  - Content-Type: "mime/type"
Example tags for file images/grayscale/9582.png:
[
{"key": "Data-Protocol", "value": "owner/repo"},
{"key": "Path", "value": "images/grayscale"},
{"key": "Filename", "value": "9582.png"},
{"key": "Content-Type", "value": "image/png"}
]
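The tag derivation above can be sketched in a few lines of Python. This is an illustrative helper, not the tool's actual code; it assumes the Content-Type is guessed from the file extension with the standard library's mimetypes module.

```python
import mimetypes
from pathlib import PurePosixPath

def build_tags(protocol: str, rel_path: str) -> list[dict]:
    """Derive the four upload tags from a repo-relative file path (sketch)."""
    p = PurePosixPath(rel_path)
    mime, _ = mimetypes.guess_type(p.name)
    return [
        {"key": "Data-Protocol", "value": protocol},
        {"key": "Path", "value": str(p.parent)},       # folder relative to repo root
        {"key": "Filename", "value": p.name},
        {"key": "Content-Type", "value": mime or "application/octet-stream"},
    ]

tags = build_tags("owner/repo", "images/grayscale/9582.png")
```

For a file at the repo root, `p.parent` evaluates to ".", which a real implementation would likely normalize.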
Upload a HuggingFace Dataset
Upload a HuggingFace dataset by its slug. Dataset tables are split and uploaded row by row, with each row tagged for discovery.
load-pools create --hugging-face username/dataset-name --auth YOUR_LOAD_API_KEY
For private datasets or to bypass anonymous rate limits, pass --hf-auth <YOUR_TOKEN>.
What happens:
- HuggingFace dataset is downloaded
- Table rows are extracted into individual dataitems
- Rows are uploaded to s3-agent with metadata tags
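The row-extraction step above can be sketched as follows. The field names ("body", "tags") and the per-row Filename scheme are assumptions for illustration; the source does not document the exact dataitem schema the tool produces.

```python
import json

def rows_to_dataitems(rows: list[dict], dataset_slug: str) -> list[dict]:
    """Turn table rows into individually tagged dataitems (illustrative sketch)."""
    items = []
    for i, row in enumerate(rows):
        items.append({
            "body": json.dumps(row),  # each row serialized as its own JSON dataitem
            "tags": [
                {"key": "Data-Protocol", "value": dataset_slug},
                {"key": "Filename", "value": f"row-{i}.json"},
                {"key": "Content-Type", "value": "application/json"},
            ],
        })
    return items

items = rows_to_dataitems([{"text": "hello"}, {"text": "world"}], "username/dataset-name")
```

Each dataitem would then be uploaded to s3-agent exactly like an individual file from a GitHub repo.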
Command Options
load-pools create [OPTIONS]
Options:
--github TEXT GitHub repository URL
--hugging-face TEXT HuggingFace repository slug (user/repo)
--auth TEXT Load account API key [required]
-v, --verbose Show detailed upload progress
--help Show help message
Querying Uploaded Data
After uploading, you can query your data using the s3-agent tags API.
Query all files from a GitHub repository:
curl -X POST https://load-s3-agent.load.network/tags/query \
-H "Content-Type: application/json" \
-d '{
"filters": [
{"key": "Data-Protocol", "value": "owner/repo"}
]
}'
Query files in a specific folder:
curl -X POST https://load-s3-agent.load.network/tags/query \
-H "Content-Type: application/json" \
-d '{
"filters": [
{"key": "Data-Protocol", "value": "owner/repo"},
{"key": "Path", "value": "images/grayscale"}
]
}'
Query by filename:
curl -X POST https://load-s3-agent.load.network/tags/query \
-H "Content-Type: application/json" \
-d '{
"filters": [
{"key": "Data-Protocol", "value": "owner/repo"},
{"key": "Filename", "value": "9582.png"}
]
}'
Query by content type:
curl -X POST https://load-s3-agent.load.network/tags/query \
-H "Content-Type: application/json" \
-d '{
"filters": [
{"key": "Data-Protocol", "value": "owner/repo"},
{"key": "Content-Type", "value": "image/png"}
]
}'
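The same queries can be issued from Python. The sketch below only builds the JSON body shown in the curl examples; actually POSTing it (e.g. with requests.post) is left out, and the underscore-to-hyphen keyword trick is a convenience assumption, not part of the API.

```python
import json

# Endpoint taken from the curl examples above
S3_AGENT_URL = "https://load-s3-agent.load.network/tags/query"

def build_query(**filters: str) -> dict:
    """Build the tag-query request body from key=value pairs.

    Keyword names use underscores, which are mapped to the hyphenated
    tag keys (Data_Protocol -> Data-Protocol).
    """
    return {
        "filters": [
            {"key": k.replace("_", "-"), "value": v} for k, v in filters.items()
        ]
    }

payload = build_query(Data_Protocol="owner/repo", Content_Type="image/png")
body = json.dumps(payload)
```

The resulting body is identical to the last curl example, so it can be sent with any HTTP client.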
Examples
Upload a dataset repository:
load-pools create \
--github https://github.com/username/my-dataset \
--auth load_acc_xxxxxxxxxxxxx
Upload with verbose output:
load-pools create \
--github https://github.com/ml-datasets/images \
--auth load_acc_xxxxxxxxxxxxx \
--verbose
Upload a HuggingFace dataset:
load-pools create \
--hugging-face openai/graphwalks \
--auth load_acc_xxxxxxxxxxxxx
License
MIT License - see LICENSE file for details.
Related Projects
- xans104 - HuggingFace model uploader with ANS-104
- Load Network - Decentralized data storage network
- s3-agent - Load S3 Agent API
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file load_pools-0.1.16.tar.gz.
File metadata
- Download URL: load_pools-0.1.16.tar.gz
- Upload date:
- Size: 10.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 049d8b343c590c828887ccb3be6f269c88d892ea1aaab265e0b7e14b1266e34d |
| MD5 | 47d0f61c2fe601d7a9a105b9be89cc09 |
| BLAKE2b-256 | 836d76b9d7f8fd2c6653af880ea4c3ff1f9c5b5656f6be5de862a2cda1fd927a |
File details
Details for the file load_pools-0.1.16-py3-none-any.whl.
File metadata
- Download URL: load_pools-0.1.16-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8dd35f3182edfedefe074cbd3d6fbf36ae05d192aace43bc71a608089bff27a4 |
| MD5 | 33801a06c5366d219b3e3ad58340765b |
| BLAKE2b-256 | 61642ff2fca22651de08157faf552c19083856872964ce1619832440939de365 |