ds-cache-cleaner

Clean up cached data from ML/data science libraries.

Supported Caches

  • HuggingFace Models - ~/.cache/huggingface/hub (models)
  • HuggingFace Datasets (Hub) - ~/.cache/huggingface/hub (datasets)
  • Transformers - ~/.cache/huggingface/transformers
  • HF Datasets - ~/.cache/huggingface/datasets
  • ir_datasets - ~/.ir_datasets
  • datamaestro (cache) - ~/datamaestro/cache (partial downloads, processing)
  • datamaestro (data) - ~/datamaestro/data (downloaded datasets)
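
To see how much space these directories take before cleaning anything, a short standalone script is enough. The sketch below is not part of ds-cache-cleaner and does not reflect how it measures caches; it only walks the default paths listed above with the standard library. The names `CACHE_DIRS`, `dir_size` and `report` are illustrative, and a library configured to use a non-default cache location would not be found.

```python
from pathlib import Path

# Default locations copied from the list above (assumption: no library has
# been configured to use a different cache directory).
CACHE_DIRS = [
    "~/.cache/huggingface/hub",
    "~/.cache/huggingface/transformers",
    "~/.cache/huggingface/datasets",
    "~/.ir_datasets",
    "~/datamaestro/cache",
    "~/datamaestro/data",
]


def dir_size(path: Path) -> int:
    """Total size in bytes of regular files under path (0 if it is missing)."""
    if not path.is_dir():
        return 0
    total = 0
    for p in path.rglob("*"):
        # Skip symlinks so a file reachable through a link is counted once
        if p.is_file() and not p.is_symlink():
            total += p.stat().st_size
    return total


def report(dirs=CACHE_DIRS) -> dict:
    """Map each cache directory to its size in bytes."""
    return {d: dir_size(Path(d).expanduser()) for d in dirs}


if __name__ == "__main__":
    for d, size in report().items():
        print(f"{size / 1e6:10.1f} MB  {d}")
```

Symlinks are skipped because some caches link the same file from several places, which would otherwise inflate the total.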

Installation

pip install ds-cache-cleaner

Or with uv:

uv pip install ds-cache-cleaner

Usage

List caches

ds-cache-cleaner list

Show cache entries

ds-cache-cleaner show
ds-cache-cleaner show -c "HuggingFace Hub"

Clean caches

# Interactive mode
ds-cache-cleaner clean

# Clean specific cache
ds-cache-cleaner clean -c "HuggingFace Hub"

# Clean all without prompting
ds-cache-cleaner clean --all

# Dry run
ds-cache-cleaner clean --dry-run

Interactive TUI

ds-cache-cleaner tui

Library Integration

ML libraries can integrate with ds-cache-cleaner to provide rich metadata about their cached data. With this metadata, ds-cache-cleaner can show better descriptions and accurate last-access times for each entry.

Metadata Format

The metadata is stored in a ds-cache-cleaner/ folder inside each cache directory:

~/.cache/mylib/
├── ds-cache-cleaner/
│   ├── lock                    # Lock file for concurrent access
│   ├── information.json        # Cache info and parts list
│   └── part_models.json        # Entries for "models" part
└── ... (actual cache data)
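
A quick way to check whether a cache directory already carries this metadata is to look for the folder and its files. The following is a minimal sketch that relies only on the file names shown in the tree above; `find_metadata` is a hypothetical helper, not a ds-cache-cleaner function, and it assumes that every part file follows the `part_<name>.json` pattern. It does not read the JSON contents, since their schema is not described here.

```python
from pathlib import Path


def find_metadata(cache_path):
    """Locate ds-cache-cleaner metadata files inside a cache directory.

    Returns None when the ds-cache-cleaner/ folder is absent.
    """
    meta = Path(cache_path).expanduser() / "ds-cache-cleaner"
    if not meta.is_dir():
        return None

    parts = {}
    for f in sorted(meta.glob("part_*.json")):
        # Assumption: the part name is the file stem without the "part_" prefix
        parts[f.stem[len("part_"):]] = f

    info = meta / "information.json"
    return {
        "information": info if info.is_file() else None,
        "parts": parts,
    }
```

For the layout above, `find_metadata("~/.cache/mylib")` would return the path to `information.json` and a `parts` mapping with a single `"models"` key.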

Using the CacheRegistry API

from ds_cache_cleaner import CacheRegistry

# Initialize once for your library
registry = CacheRegistry(
    cache_path="~/.cache/mylib",
    library="mylib",
    description="My ML Library cache",
)

# Register a part (e.g., models, datasets)
registry.register_part("models", "Downloaded model weights")

# When downloading a new model
registry.register_entry(
    part="models",
    path="bert-base",  # relative path within cache
    description="BERT base model",
    size=438_000_000,
)

# When accessing an existing entry (updates last_access time)
registry.touch("models", "bert-base")

# When deleting an entry (removes from metadata)
registry.remove("models", "bert-base")
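
In a library, these calls usually sit inside the function that resolves a cached item. The sketch below shows one possible shape for that, using only the `register_entry` and `touch` calls from the example above. `get_model` and its `download` argument are hypothetical, the registry is passed in rather than imported so the snippet runs on its own, and `size` is assumed to be a byte count.

```python
from pathlib import Path
from typing import Callable


def get_model(registry, cache_root: Path, name: str,
              download: Callable[[Path], None]) -> Path:
    """Return the local path of a model, downloading it on a cache miss.

    `registry` is expected to behave like the CacheRegistry shown above;
    `download` is a hypothetical callable that writes the model files
    into the directory it receives.
    """
    target = Path(cache_root).expanduser() / name
    if target.exists():
        # Cache hit: record the access so last-access times stay accurate
        registry.touch("models", name)
        return target

    # Cache miss: fetch the files, then describe the new entry
    target.mkdir(parents=True)
    download(target)
    size = sum(f.stat().st_size for f in target.rglob("*") if f.is_file())
    registry.register_entry(
        part="models",
        path=name,  # relative path within the cache
        description=f"Model {name}",
        size=size,  # assumed to be in bytes
    )
    return target
```

Registering only after the download finishes keeps the metadata from listing entries that were never fully written. A real integration would also need to clean up `target` when `download` raises.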

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
hatch run test

# Lint
hatch run lint:check

# Format
hatch run lint:fix

License

MIT
