ds-cache-cleaner

Clean up cached data from ML/data science libraries.

Supported Caches

  • HuggingFace Models - ~/.cache/huggingface/hub (models)
  • HuggingFace Datasets (Hub) - ~/.cache/huggingface/hub (datasets)
  • Transformers - ~/.cache/huggingface/transformers
  • HF Datasets - ~/.cache/huggingface/datasets
  • ir_datasets - ~/.ir_datasets
  • datamaestro (cache) - ~/datamaestro/cache (partial downloads, processing)
  • datamaestro (data) - ~/datamaestro/data (downloaded datasets)
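
To get a rough idea of what these locations hold before running the tool, the default paths listed above can be inspected directly. The sketch below is not part of ds-cache-cleaner: the `DEFAULT_CACHES` mapping and the `directory_size` and `summarize` helpers are illustrative names. It only looks at the default paths, so it ignores environment variables (such as `HF_HOME`) that may relocate a cache.

```python
from pathlib import Path

# Default locations copied from the list above (illustrative mapping,
# not an API of ds-cache-cleaner).
DEFAULT_CACHES = {
    "HuggingFace Hub (models and datasets)": "~/.cache/huggingface/hub",
    "Transformers": "~/.cache/huggingface/transformers",
    "HF Datasets": "~/.cache/huggingface/datasets",
    "ir_datasets": "~/.ir_datasets",
    "datamaestro (cache)": "~/datamaestro/cache",
    "datamaestro (data)": "~/datamaestro/data",
}


def directory_size(root: Path) -> int:
    """Return the total size in bytes of regular files under root."""
    total = 0
    for path in root.rglob("*"):
        # Skip symlinks so that linked files are not counted twice.
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total


def summarize(caches: dict[str, str]) -> dict[str, int]:
    """Map each cache name to its size in bytes, skipping missing directories."""
    sizes = {}
    for name, location in caches.items():
        root = Path(location).expanduser()
        if root.is_dir():
            sizes[name] = directory_size(root)
    return sizes
```

Calling `summarize(DEFAULT_CACHES)` returns a dictionary containing only the caches that exist on the current machine, which makes it easy to see which of the supported caches are worth cleaning.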

Installation

pip install ds-cache-cleaner

Or with uv:

uv pip install ds-cache-cleaner

Usage

List caches

ds-cache-cleaner list

Show cache entries

ds-cache-cleaner show
ds-cache-cleaner show -c "HuggingFace Hub"

Clean caches

# Interactive mode
ds-cache-cleaner clean

# Clean specific cache
ds-cache-cleaner clean -c "HuggingFace Hub"

# Clean all without prompting
ds-cache-cleaner clean --all

# Dry run
ds-cache-cleaner clean --dry-run

Interactive TUI

ds-cache-cleaner tui

Library Integration

ML libraries can integrate with ds-cache-cleaner by recording metadata about the data they cache. With this metadata, the tool can show a meaningful description and an accurate last-access time for each entry.

Metadata Format

The metadata is stored in a ds-cache-cleaner/ folder inside each cache directory:

~/.cache/mylib/
├── ds-cache-cleaner/
│   ├── lock                    # Lock file for concurrent access
│   ├── information.json        # Cache info and parts list
│   └── part_models.json        # Entries for "models" part
└── ... (actual cache data)
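
As an illustration of this layout, the following sketch locates the metadata folder inside a cache directory and loads whatever JSON files it finds. Only the folder and file names come from the layout above. The `load_metadata` helper is hypothetical, and because the JSON schema is not described here, the contents are returned as parsed JSON without interpretation. The sketch does not take the lock, so it is only suitable for read-only inspection while nothing else is writing.

```python
import json
from pathlib import Path

METADATA_DIR = "ds-cache-cleaner"  # folder name from the layout above


def load_metadata(cache_path: str) -> dict:
    """Load information.json and every part_*.json in the metadata folder.

    Returns an empty dict when the cache has no metadata folder.
    """
    meta_dir = Path(cache_path).expanduser() / METADATA_DIR
    if not meta_dir.is_dir():
        return {}

    result = {"information": None, "parts": {}}
    info_file = meta_dir / "information.json"
    if info_file.is_file():
        result["information"] = json.loads(info_file.read_text(encoding="utf-8"))

    # A file named part_models.json is stored under the part name "models".
    for part_file in sorted(meta_dir.glob("part_*.json")):
        part_name = part_file.stem[len("part_"):]
        result["parts"][part_name] = json.loads(
            part_file.read_text(encoding="utf-8")
        )
    return result
```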

Using the CacheRegistry API

from ds_cache_cleaner import CacheRegistry

# Initialize once for your library
registry = CacheRegistry(
    cache_path="~/.cache/mylib",
    library="mylib",
    description="My ML Library cache",
)

# Register a part (e.g., models, datasets)
registry.register_part("models", "Downloaded model weights")

# When downloading a new model
registry.register_entry(
    part="models",
    path="bert-base",  # relative path within cache
    description="BERT base model",
    size=438_000_000,
)

# When accessing an existing entry (updates last_access time)
registry.touch("models", "bert-base")

# When deleting an entry (removes from metadata)
registry.remove("models", "bert-base")
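
In practice these calls usually sit inside a library's own download path. The sketch below shows one way to wire them together: register an entry on a cache miss and touch it on a hit. The `fetch_model` function and its `download` callback are hypothetical. Only the `register_entry` and `touch` calls, with the arguments shown above, are taken from this README. The `registry` argument is expected to be a `CacheRegistry` instance such as the one created above.

```python
from pathlib import Path
from typing import Callable


def fetch_model(
    registry,
    cache_root: Path,
    name: str,
    download: Callable[[Path], None],
) -> Path:
    """Return the local path of a model, downloading it on a cache miss.

    `registry` is any object exposing register_entry() and touch() as
    used above; `download` is a hypothetical callback that writes the
    model files into the given directory.
    """
    target = cache_root / name
    if target.exists():
        # Cache hit: record the access so the last-access time stays accurate.
        registry.touch("models", name)
        return target

    # Cache miss: download, then record the new entry with its size.
    target.mkdir(parents=True)
    download(target)
    size = sum(f.stat().st_size for f in target.rglob("*") if f.is_file())
    registry.register_entry(
        part="models",
        path=name,  # relative path within the cache
        description=f"Model {name}",
        size=size,
    )
    return target
```

Passing the registry in as an argument keeps the download logic independent of ds-cache-cleaner, so the integration can be swapped out or stubbed in tests.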

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
hatch run test

# Lint
hatch run lint:check

# Format
hatch run lint:fix

License

MIT
