Hugging Face dataset-backed cloud file storage library

HuggingFaceStorage

Python library for cloud-style file storage backed by a private Hugging Face dataset repository.

Features

  • Immutable file version history per logical remote path
  • Soft delete via tombstone versions
  • Content-addressed blob storage (sha256) to avoid duplicate uploads
  • HF_TOKEN-based authentication
  • Public API: put, put_zip, get, list, delete, history
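
The content-addressed storage feature can be illustrated with a minimal sketch. It assumes blobs are keyed by the SHA-256 of their bytes; the `blobs/<sha256>` key layout, the `BlobStore` class, and the in-memory dictionary are hypothetical stand-ins for illustration, not the library's actual internals.

```python
import hashlib


def blob_key(data: bytes) -> str:
    """Derive a storage key from the content itself (hypothetical 'blobs/' layout)."""
    return "blobs/" + hashlib.sha256(data).hexdigest()


class BlobStore:
    """In-memory stand-in for the remote repository, for illustration only."""

    def __init__(self) -> None:
        self.objects = {}   # key -> bytes
        self.uploads = 0    # counts real writes, to make deduplication visible

    def put(self, data: bytes) -> str:
        key = blob_key(data)
        if key not in self.objects:
            # Only content that has never been seen triggers an upload.
            self.objects[key] = data
            self.uploads += 1
        return key


store = BlobStore()
first = store.put(b"hello")
second = store.put(b"hello")  # same bytes, same key, no second upload
third = store.put(b"world")
print(first == second, store.uploads)
```

Because the key depends only on the bytes, two logical paths with identical content can point at the same stored object.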

Project Structure

HuggingFaceStorage/
  src/hf_storage/
  tests/unit/
  tests/integration/
  requirements.txt
  pyproject.toml

Setup

  1. Create the virtual environment (Python 3.11):
py -3.11 -m venv .venv
  2. Activate it:
.\.venv\Scripts\Activate.ps1
  3. Install dependencies:
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e .
  4. Deactivate when finished:
deactivate

Authentication

Set your Hugging Face token before using the library:

$env:HF_TOKEN = "hf_xxx"
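
A script that uses the library may want to fail early when the token is missing. The helper below is a small sketch of such a check; `require_hf_token` is a hypothetical name, and whether the library performs a similar check itself is not stated here.

```python
import os


def require_hf_token(env=os.environ) -> str:
    """Return the value of HF_TOKEN, or raise a clear error if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before using the library.")
    return token
```

Calling `require_hf_token()` at the top of a script surfaces a missing token before any network request is attempted.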

Quick Example

from hf_storage import HFStorage, StorageConfig

storage = HFStorage(StorageConfig(repo_id="your-namespace/your-private-dataset"))
storage.setup(create_if_missing=True, private=True)

version = storage.put("local.txt", "docs/local.txt")
zip_version = storage.put_zip("my_folder", "archives/my_folder")
storage.get("docs/local.txt", "restored.txt")
entries = storage.list(prefix="docs/")
history = storage.history("docs/local.txt")
deleted = storage.delete("docs/local.txt")

Running Tests

Unit tests:

pytest tests/unit

Integration tests (real HF repo):

$env:HF_STORAGE_INTEGRATION = "1"
$env:HF_STORAGE_TEST_REPO = "your-namespace/your-private-dataset"
pytest tests/integration
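
The two environment variables above suggest an opt-in gate for the integration suite. The sketch below shows one way such a gate could be written; `integration_repo` is a hypothetical helper, and the project's own test setup may differ.

```python
import os
from typing import Mapping, Optional


def integration_repo(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the test repo id when integration tests are enabled, else None."""
    if env.get("HF_STORAGE_INTEGRATION") != "1":
        return None  # integration tests were not explicitly requested
    # An empty or missing repo id is treated the same as "disabled".
    return env.get("HF_STORAGE_TEST_REPO") or None
```

In a test module, the result could feed `pytest.mark.skipif(integration_repo() is None, reason="integration tests disabled")`.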

Publishing (Maintainers)

publish.bat is maintainer tooling for the package release workflow only. It is not part of the public runtime API.

Set Twine credentials via environment variables:

$env:TWINE_USERNAME = "__token__"
$env:TWINE_TEST_PASSWORD = "pypi-<testpypi-token>"
$env:TWINE_PASSWORD = "pypi-<pypi-production-token>"

The default release target is TestPyPI:

.\publish.bat

Publish to production PyPI explicitly:

.\publish.bat pypi

What publish.bat does:

  • Runs unit tests (pytest tests/unit)
  • Builds wheel + sdist (python -m build)
  • Validates artifacts (twine check dist/*)
  • Uploads to TestPyPI by default, or PyPI when pypi is passed

Credential behavior:

  • .\publish.bat (default TestPyPI) uses TWINE_TEST_PASSWORD
  • .\publish.bat pypi (production) uses TWINE_PASSWORD
  • TWINE_USERNAME must be __token__ for both
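
The credential rules above can be restated as a short Python sketch. This is not the content of publish.bat (a batch script); `twine_credentials` is a hypothetical function that only mirrors the documented selection logic.

```python
import os
from typing import Mapping, Tuple


def twine_credentials(
    target: str = "testpypi", env: Mapping[str, str] = os.environ
) -> Tuple[str, str]:
    """Pick (username, password) for the release target, per the rules above."""
    # Production uses TWINE_PASSWORD; the default TestPyPI target uses TWINE_TEST_PASSWORD.
    password_var = "TWINE_PASSWORD" if target == "pypi" else "TWINE_TEST_PASSWORD"
    username = env.get("TWINE_USERNAME")
    if username != "__token__":
        raise ValueError("TWINE_USERNAME must be __token__")
    password = env.get(password_var)
    if not password:
        raise ValueError(f"{password_var} is not set")
    return username, password
```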

Important: bump the package version before each release. PyPI and TestPyPI reject uploads of a version that already exists.

One-command Zip Upload (Windows)

Use the batch wrapper to zip and upload a file or directory:

.\put_zip.bat "C:\path\to\folder_or_file" "backups/my_archive"

Notes:

  • HF_TOKEN and HF_STORAGE_REPO_ID are read from a .env file (or from the current environment variables).
  • If the remote path does not end with .zip, .zip is appended automatically.
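
The suffix rule in the notes above amounts to a one-line normalization. The sketch below assumes a case-sensitive comparison, which is not confirmed here; `normalize_zip_path` is a hypothetical name.

```python
def normalize_zip_path(remote_path: str) -> str:
    """Append '.zip' unless the remote path already ends with it."""
    # Assumption: the check is case-sensitive, so 'archive.ZIP' would gain a suffix.
    if remote_path.endswith(".zip"):
        return remote_path
    return remote_path + ".zip"
```

With this rule, `backups/my_archive` becomes `backups/my_archive.zip`, while a path that already ends in `.zip` is left unchanged.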

List, Download, and Delete (Windows)

List stored logical paths:

.\list_files.bat

Include soft-deleted entries too:

.\list_files.bat 1

Download latest version by logical path:

.\get_file.bat "backup/venv.zip" ".\downloads\venv.zip"

Download a specific version:

.\get_file.bat "backup/venv.zip" ".\downloads\venv.zip" "version_id_here"

Soft delete a logical path:

.\delete_file.bat "backup/venv.zip"

Hard delete a logical path (removes manifest entry and unreferenced blob objects):

.\delete_file.bat "backup/venv.zip" hard
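
The difference between soft and hard delete is easier to see on a toy model. The sketch below assumes a manifest that maps each logical path to an append-only version list, with a tombstone represented as a version without a blob. `Manifest`, `Version`, and their methods are hypothetical and are not the library's classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: int
    blob: Optional[str]  # content key; None marks a tombstone


@dataclass
class Manifest:
    """Toy manifest: logical path -> append-only list of versions."""
    paths: Dict[str, List[Version]] = field(default_factory=dict)
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def put(self, path: str, blob_key: str, data: bytes) -> Version:
        self.blobs.setdefault(blob_key, data)  # identical content is stored once
        versions = self.paths.setdefault(path, [])
        version = Version(len(versions) + 1, blob_key)
        versions.append(version)
        return version

    def soft_delete(self, path: str) -> Version:
        # Append a tombstone; earlier versions remain in the history.
        versions = self.paths[path]
        tombstone = Version(len(versions) + 1, None)
        versions.append(tombstone)
        return tombstone

    def hard_delete(self, path: str) -> None:
        # Drop the manifest entry, then remove blobs no other path references.
        removed = self.paths.pop(path)
        still_used = {v.blob for vs in self.paths.values() for v in vs}
        for v in removed:
            if v.blob is not None and v.blob not in still_used:
                self.blobs.pop(v.blob, None)

    def list_paths(self, include_deleted: bool = False) -> List[str]:
        # A path counts as deleted when its latest version is a tombstone.
        return sorted(
            p for p, vs in self.paths.items()
            if include_deleted or vs[-1].blob is not None
        )


m = Manifest()
m.put("backup/venv.zip", "aaa", b"v1")
m.put("backup/other.zip", "aaa", b"v1")  # shares the blob with venv.zip
m.put("backup/venv.zip", "bbb", b"v2")
m.soft_delete("backup/venv.zip")
```

After the soft delete, `backup/venv.zip` is hidden from the default listing but keeps all three versions. A later `hard_delete` in this model would remove the entry and the blob `bbb`, while `aaa` survives because `backup/other.zip` still references it.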

Download files

Source Distribution

hf_storage-0.1.0.tar.gz (10.1 kB)

Built Distribution

hf_storage-0.1.0-py3-none-any.whl (10.2 kB)

File details

Details for the file hf_storage-0.1.0.tar.gz.

File metadata

  • Download URL: hf_storage-0.1.0.tar.gz
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for hf_storage-0.1.0.tar.gz
Algorithm Hash digest
SHA256 004a68dc5394800e1b3f84e4c726746db9bea7b3133f589b2c89bb6cb78e14ee
MD5 8686e8b30c759f9479cdf2514da06f79
BLAKE2b-256 555cef609de228fca5eeb4a11550910daf55b38a7d4c3e19bd124d04c09cfccb

File details

Details for the file hf_storage-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: hf_storage-0.1.0-py3-none-any.whl
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for hf_storage-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bf25f9f55fc79c457d0d1a0f3cec70aa00f3ca5d58b883d273834fc2e7aa3b26
MD5 db0c82dda2cd3c8e0744a9479f117750
BLAKE2b-256 d9782804e4f8c19d3ce786318e0376d294349b5829612dd990a42ff6a15aea8e
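
A downloaded distribution file can be checked against the SHA256 digests listed above before it is installed. The sketch below uses only the standard library; `sha256_of` and `matches_published` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published(Path("hf_storage-0.1.0-py3-none-any.whl"), "bf25f9f55fc79c457d0d1a0f3cec70aa00f3ca5d58b883d273834fc2e7aa3b26")` should return True for an intact copy of the wheel.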
