ds-cache-cleaner

Clean up cached data from ML/data science libraries.

Supported Caches

  • HuggingFace Hub - ~/.cache/huggingface/hub
  • Transformers - ~/.cache/huggingface/transformers
  • HF Datasets - ~/.cache/huggingface/datasets
  • ir_datasets - ~/.ir_datasets
  • datamaestro (cache) - ~/datamaestro/cache (partial downloads, processing)
  • datamaestro (data) - ~/datamaestro/data (downloaded datasets)
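
To check how much space these locations currently occupy before cleaning anything, a minimal sketch along the following lines may help. It uses only the standard library and the default paths listed above. The names `DEFAULT_CACHES`, `dir_size`, and `report` are hypothetical and not part of ds-cache-cleaner, and caches moved to a custom location will not be found.

```python
import os
from pathlib import Path

# Default locations copied from the list above; custom locations are not detected.
DEFAULT_CACHES = {
    "HuggingFace Hub": "~/.cache/huggingface/hub",
    "Transformers": "~/.cache/huggingface/transformers",
    "HF Datasets": "~/.cache/huggingface/datasets",
    "ir_datasets": "~/.ir_datasets",
    "datamaestro (cache)": "~/datamaestro/cache",
    "datamaestro (data)": "~/datamaestro/data",
}


def dir_size(path: Path) -> int:
    """Return the total size in bytes of regular files under path.

    Symlinks are skipped so that linked files are not counted twice.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if file_path.is_symlink():
                continue
            try:
                total += file_path.stat().st_size
            except OSError:
                # The file disappeared or is unreadable; ignore it.
                pass
    return total


def report(caches: dict[str, str] = DEFAULT_CACHES) -> dict[str, int]:
    """Map each cache name to its size in bytes, skipping missing directories."""
    sizes = {}
    for name, raw_path in caches.items():
        path = Path(raw_path).expanduser()
        if path.is_dir():
            sizes[name] = dir_size(path)
    return sizes
```

Calling `report()` returns one entry per cache directory that exists on the machine, so an empty dictionary simply means none of the default locations are present.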

Installation

pip install ds-cache-cleaner

Or with uv:

uv pip install ds-cache-cleaner

Usage

List caches

ds-cache-cleaner list

Show cache entries

ds-cache-cleaner show
# Show entries for a specific cache
ds-cache-cleaner show -c "HuggingFace Hub"

Clean caches

# Interactive mode
ds-cache-cleaner clean

# Clean specific cache
ds-cache-cleaner clean -c "HuggingFace Hub"

# Clean all without prompting
ds-cache-cleaner clean --all

# Dry run
ds-cache-cleaner clean --dry-run
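
When cleaning from a script, it can be safer to preview first. The sketch below only assembles the commands shown above and hands them to `subprocess`. The helper names `clean_command` and `preview` are hypothetical, and combining `-c` with `--dry-run` in one call is an assumption that the examples above do not confirm.

```python
import shutil
import subprocess
from typing import Optional


def clean_command(cache: Optional[str] = None, dry_run: bool = False) -> list[str]:
    """Build a `ds-cache-cleaner clean` command from the flags shown above."""
    cmd = ["ds-cache-cleaner", "clean"]
    if cache is not None:
        cmd += ["-c", cache]
    if dry_run:
        # Assumption: --dry-run can be combined with -c.
        cmd.append("--dry-run")
    return cmd


def preview(cache: Optional[str] = None) -> int:
    """Run a dry run and return its exit code.

    Raises FileNotFoundError if the CLI is not installed.
    """
    if shutil.which("ds-cache-cleaner") is None:
        raise FileNotFoundError("ds-cache-cleaner is not on PATH")
    completed = subprocess.run(clean_command(cache, dry_run=True), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems with cache names that contain spaces, such as "HuggingFace Hub".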

Interactive TUI

ds-cache-cleaner tui

Library Integration

ML libraries can integrate with ds-cache-cleaner by recording metadata about the data they cache. With this metadata, ds-cache-cleaner can show a meaningful description for each entry and report accurate last-access times.

Metadata Format

The metadata is stored in a ds-cache-cleaner/ folder inside each cache directory:

~/.cache/mylib/
├── ds-cache-cleaner/
│   ├── lock                    # Lock file for concurrent access
│   ├── information.json        # Cache info and parts list
│   └── part_models.json        # Entries for "models" part
└── ... (actual cache data)
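
The layout above can be inspected with a short script. The following sketch assumes only what the tree shows, namely that `information.json` and the `part_*.json` files are JSON documents. It makes no assumption about the fields they contain, and the function name `load_metadata` is hypothetical.

```python
import json
from pathlib import Path


def load_metadata(cache_dir: str) -> dict[str, object]:
    """Load every JSON file found in <cache_dir>/ds-cache-cleaner/.

    Returns a mapping from file name to parsed content. An empty dict
    means no metadata folder (or no JSON file) was found.
    """
    meta_dir = Path(cache_dir).expanduser() / "ds-cache-cleaner"
    if not meta_dir.is_dir():
        return {}
    loaded = {}
    for json_file in sorted(meta_dir.glob("*.json")):
        with json_file.open(encoding="utf-8") as handle:
            loaded[json_file.name] = json.load(handle)
    return loaded
```

This reads the files without taking the lock, so it is suited to inspection only, not to use while a library is writing its metadata.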

Using the CacheRegistry API

from ds_cache_cleaner import CacheRegistry

# Initialize once for your library
registry = CacheRegistry(
    cache_path="~/.cache/mylib",
    library="mylib",
    description="My ML Library cache",
)

# Register a part (e.g., models, datasets)
registry.register_part("models", "Downloaded model weights")

# When downloading a new model
registry.register_entry(
    part="models",
    path="bert-base",  # relative path within cache
    description="BERT base model",
    size=438_000_000,
)

# When accessing an existing entry (updates last_access time)
registry.touch("models", "bert-base")

# When deleting an entry (removes from metadata)
registry.remove("models", "bert-base")
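
The example above hard-codes `size`. A library would more likely measure the entry on disk before registering it. The sketch below does that with the standard library and then calls `register_entry` with the same keyword arguments as above. `entry_size` and `register_measured` are hypothetical helper names, `size` is assumed to be in bytes, and `registry` can be any object exposing `register_entry` with that signature.

```python
import os
from pathlib import Path


def entry_size(path: Path) -> int:
    """Size in bytes of a file, or of all regular files under a directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def register_measured(registry, cache_path: str, part: str,
                      rel_path: str, description: str) -> int:
    """Measure <cache_path>/<rel_path>, register it, and return the size."""
    size = entry_size(Path(cache_path).expanduser() / rel_path)
    # Same keyword arguments as the register_entry call shown above.
    registry.register_entry(
        part=part,
        path=rel_path,
        description=description,
        size=size,  # assumption: expressed in bytes
    )
    return size
```

With the registry from the example above, a call would look like `register_measured(registry, "~/.cache/mylib", "models", "bert-base", "BERT base model")`.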

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
hatch run test

# Lint
hatch run lint:check

# Format
hatch run lint:fix

License

MIT
