ds-cache-cleaner

Clean up cached data from ML/data science libraries.

Supported Caches

  • HuggingFace Models - ~/.cache/huggingface/hub (models)
  • HuggingFace Datasets (Hub) - ~/.cache/huggingface/hub (datasets)
  • Transformers - ~/.cache/huggingface/transformers
  • HF Datasets - ~/.cache/huggingface/datasets
  • ir_datasets - ~/.ir_datasets
  • datamaestro (cache) - ~/datamaestro/cache (partial downloads, processing)
  • datamaestro (data) - ~/datamaestro/data (downloaded datasets)
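
To get a rough idea of which of these default locations exist on a machine and how much space they take, a small standard-library sketch like the following may help. It is not part of ds-cache-cleaner and does not claim to mirror how the tool computes sizes. The names DEFAULT_CACHE_DIRS, directory_size, and existing_caches are hypothetical, and the paths are simply copied from the list above (the two HuggingFace Hub entries share one directory, so they appear once).

```python
import os
from pathlib import Path

# Default locations copied from the list above, relative to the home directory.
# The labels are illustrative and are not guaranteed to match the CLI's names.
DEFAULT_CACHE_DIRS = {
    "HuggingFace Hub (models and datasets)": ".cache/huggingface/hub",
    "Transformers": ".cache/huggingface/transformers",
    "HF Datasets": ".cache/huggingface/datasets",
    "ir_datasets": ".ir_datasets",
    "datamaestro (cache)": "datamaestro/cache",
    "datamaestro (data)": "datamaestro/data",
}


def directory_size(path: Path) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            # Symlinks are skipped so that linked files are not counted twice.
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def existing_caches(home: Path) -> dict[str, int]:
    """Map each cache label to its size in bytes, for directories that exist."""
    found = {}
    for label, relative in DEFAULT_CACHE_DIRS.items():
        directory = home / relative
        if directory.is_dir():
            found[label] = directory_size(directory)
    return found


if __name__ == "__main__":
    for label, size in existing_caches(Path.home()).items():
        print(f"{label}: {size / 1e6:.1f} MB")
```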

Installation

pip install ds-cache-cleaner

Or with uv:

uv pip install ds-cache-cleaner

Usage

List caches

ds-cache-cleaner list

Show cache entries

ds-cache-cleaner show
ds-cache-cleaner show -c "HuggingFace Hub"

Clean caches

# Interactive mode
ds-cache-cleaner clean

# Clean specific cache
ds-cache-cleaner clean -c "HuggingFace Hub"

# Clean all without prompting
ds-cache-cleaner clean --all

# Dry run: report what would be removed without deleting anything
ds-cache-cleaner clean --dry-run

Interactive TUI

ds-cache-cleaner tui

Library Integration

ML libraries can integrate with ds-cache-cleaner by recording metadata about their cached data. With this metadata, the tool can show better descriptions and accurate last-access times for each entry.

Metadata Format

The metadata is stored in a ds-cache-cleaner/ folder inside each cache directory:

~/.cache/mylib/
├── ds-cache-cleaner/
│   ├── lock                    # Lock file for concurrent access
│   ├── information.json        # Cache info and parts list
│   └── part_models.json        # Entries for "models" part
└── ... (actual cache data)
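
As an illustration of this layout, the sketch below checks whether a cache directory already contains the metadata folder and lists the parts it appears to describe. It relies only on the file names shown in the tree above. The function name find_metadata is hypothetical, and deriving a part name from a `part_<name>.json` file name is an assumption generalized from the single `part_models.json` example. The JSON contents are not parsed, since their schema is not described here.

```python
from pathlib import Path

# Folder name taken from the layout shown above.
METADATA_DIRNAME = "ds-cache-cleaner"


def find_metadata(cache_path):
    """Return the metadata files found in cache_path, or None if absent."""
    metadata_dir = Path(cache_path).expanduser() / METADATA_DIRNAME
    info = metadata_dir / "information.json"
    if not info.is_file():
        return None
    # Assumption: each part is stored in a file named part_<name>.json.
    parts = sorted(
        p.stem.removeprefix("part_") for p in metadata_dir.glob("part_*.json")
    )
    return {"information": info, "parts": parts}
```

A call such as `find_metadata("~/.cache/mylib")` would return None for a library that has not integrated with ds-cache-cleaner.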

Using the CacheRegistry API

from ds_cache_cleaner import CacheRegistry

# Initialize once for your library
registry = CacheRegistry(
    cache_path="~/.cache/mylib",
    library="mylib",
    description="My ML Library cache",
)

# Register a part (e.g., models, datasets)
registry.register_part("models", "Downloaded model weights")

# When downloading a new model
registry.register_entry(
    part="models",
    path="bert-base",  # relative path within cache
    description="BERT base model",
    size=438_000_000,
)

# When accessing an existing entry (updates last_access time)
registry.touch("models", "bert-base")

# When deleting an entry (removes from metadata)
registry.remove("models", "bert-base")
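
One way a library might combine these calls is a "fetch or touch" helper: register the entry the first time it is downloaded, and only update its access time afterwards. The sketch below is an assumption about usage, not code from ds-cache-cleaner. The helper names entry_size and get_or_download are hypothetical, and the registry argument is any object exposing the touch and register_entry methods with the arguments shown above.

```python
import os
from pathlib import Path
from typing import Callable


def entry_size(path: Path) -> int:
    """Return the size in bytes of a file, or of all files under a directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def get_or_download(
    registry,
    cache_root: Path,
    part: str,
    name: str,
    description: str,
    download: Callable[[Path], None],
) -> Path:
    """Return the entry path, downloading and registering it if missing."""
    target = cache_root / name  # entries are addressed relative to the cache
    if target.exists():
        # Already cached: only refresh the last-access time.
        registry.touch(part, name)
    else:
        # Hypothetical callback that writes the data to `target`.
        download(target)
        registry.register_entry(
            part=part,
            path=name,
            description=description,
            size=entry_size(target),
        )
    return target
```

Passing the registry in as an argument keeps the helper easy to test with a stand-in object and avoids a hard dependency on ds-cache-cleaner when it is not installed.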

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
hatch run test

# Lint
hatch run lint:check

# Format
hatch run lint:fix

License

MIT
