ds-cache-cleaner
Clean up cached data from ML/data science libraries.
Supported Caches
- HuggingFace Models - ~/.cache/huggingface/hub (models)
- HuggingFace Datasets (Hub) - ~/.cache/huggingface/hub (datasets)
- Transformers - ~/.cache/huggingface/transformers
- HF Datasets - ~/.cache/huggingface/datasets
- ir_datasets - ~/.ir_datasets
- datamaestro (cache) - ~/datamaestro/cache (partial downloads, processing)
- datamaestro (data) - ~/datamaestro/data (downloaded datasets)
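To get a feel for how much space these locations take before cleaning anything, the paths above can be measured directly. The following is a minimal sketch and not how ds-cache-cleaner itself computes sizes; the names DEFAULT_CACHES, directory_size and report are made up for this example, and the two HuggingFace Hub entries are merged because they share one directory.

```python
import os
from pathlib import Path

# Default locations copied from the list above; labels are display names only.
DEFAULT_CACHES = {
    "HuggingFace Hub": "~/.cache/huggingface/hub",
    "Transformers": "~/.cache/huggingface/transformers",
    "HF Datasets": "~/.cache/huggingface/datasets",
    "ir_datasets": "~/.ir_datasets",
    "datamaestro (cache)": "~/datamaestro/cache",
    "datamaestro (data)": "~/datamaestro/data",
}


def directory_size(path: Path) -> int:
    """Return the total size in bytes of regular files under path (0 if missing)."""
    total = 0
    # os.walk yields nothing for a missing directory and does not follow
    # directory symlinks by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            # Skip file symlinks so linked files are not counted twice.
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def report(caches: dict) -> dict:
    """Map each cache label to its on-disk size in bytes."""
    return {
        label: directory_size(Path(location).expanduser())
        for label, location in caches.items()
    }
```

Calling report(DEFAULT_CACHES) returns a dictionary of label to byte count, with 0 for any cache that does not exist on the machine.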
Installation
pip install ds-cache-cleaner
Or with uv:
uv pip install ds-cache-cleaner
Usage
List caches
ds-cache-cleaner list
Show cache entries
ds-cache-cleaner show
ds-cache-cleaner show -c "HuggingFace Hub"
Clean caches
# Interactive mode
ds-cache-cleaner clean
# Clean specific cache
ds-cache-cleaner clean -c "HuggingFace Hub"
# Clean all without prompting
ds-cache-cleaner clean --all
# Dry run
ds-cache-cleaner clean --dry-run
Interactive TUI
ds-cache-cleaner tui
Library Integration
ML libraries can integrate with ds-cache-cleaner by publishing metadata about their cached data. With this metadata, the tool can show better descriptions and accurate last-access times for each entry.
Metadata Format
The metadata is stored in a ds-cache-cleaner/ folder inside each cache directory:
~/.cache/mylib/
├── ds-cache-cleaner/
│   ├── lock               # Lock file for concurrent access
│   ├── information.json   # Cache info and parts list
│   └── part_models.json   # Entries for "models" part
└── ... (actual cache data)
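The layout above is enough to check whether a cache directory carries this metadata at all. The sketch below only inspects file names and does not parse the JSON, since the schema is not described here. It assumes that every part file follows the part_<name>.json pattern suggested by part_models.json; the function name find_part_files is hypothetical.

```python
from pathlib import Path

METADATA_DIR = "ds-cache-cleaner"  # folder name taken from the layout above
INFO_FILE = "information.json"     # cache info and parts list


def find_part_files(cache_path: str) -> dict:
    """Map part names to their part_<name>.json files.

    Returns an empty dict when the cache has no metadata folder.
    Assumption: part files are named part_<name>.json.
    """
    meta = Path(cache_path).expanduser() / METADATA_DIR
    if not (meta / INFO_FILE).is_file():
        return {}
    return {
        part_file.stem[len("part_"):]: part_file
        for part_file in sorted(meta.glob("part_*.json"))
    }
```

For the example tree, find_part_files("~/.cache/mylib") would return a single "models" key pointing at part_models.json.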
Using the CacheRegistry API
from ds_cache_cleaner import CacheRegistry

# Initialize once for your library
registry = CacheRegistry(
    cache_path="~/.cache/mylib",
    library="mylib",
    description="My ML Library cache",
)

# Register a part (e.g., models, datasets)
registry.register_part("models", "Downloaded model weights")

# When downloading a new model
registry.register_entry(
    part="models",
    path="bert-base",  # relative path within cache
    description="BERT base model",
    size=438_000_000,
)

# When accessing an existing entry (updates last_access time)
registry.touch("models", "bert-base")

# When deleting an entry (removes from metadata)
registry.remove("models", "bert-base")
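In a real library these calls usually sit inside the download path, so that a cache hit refreshes the access time and a cache miss records a new entry. The following is one possible sketch of that wiring. The helper get_or_download, the RegistryLike protocol and the download callback are all hypothetical names for this example; only the register_entry and touch calls mirror the API shown above.

```python
from pathlib import Path
from typing import Callable, Protocol


class RegistryLike(Protocol):
    """The subset of CacheRegistry methods this sketch relies on."""

    def register_entry(self, part: str, path: str, description: str, size: int) -> None: ...

    def touch(self, part: str, path: str) -> None: ...


def get_or_download(
    registry: RegistryLike,
    cache_root: Path,
    name: str,
    download: Callable[[Path], None],
    part: str = "models",
) -> Path:
    """Return the cached directory for name, downloading it on a cache miss."""
    target = cache_root / name
    if target.exists():
        # Cache hit: refresh the last-access time in the metadata.
        registry.touch(part, name)
        return target

    # Cache miss: fetch the files, then record the entry with its real size.
    target.mkdir(parents=True)
    download(target)
    size = sum(f.stat().st_size for f in target.rglob("*") if f.is_file())
    registry.register_entry(
        part=part,
        path=name,
        description=f"{name} ({part})",
        size=size,
    )
    return target
```

Typing the argument as a protocol rather than importing CacheRegistry keeps the helper testable with a stub, and a real CacheRegistry instance can be passed in unchanged as long as its methods accept these arguments.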
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
hatch run test
# Lint
hatch run lint:check
# Format
hatch run lint:fix
License
MIT
File details
Details for the file ds_cache_cleaner-0.3.1.tar.gz.
File metadata
- Download URL: ds_cache_cleaner-0.3.1.tar.gz
- Upload date:
- Size: 19.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e40a436ae724846a4a05fc33ecbb5d2f363ff97767a9f6b135ac0b2d50aafb3 |
| MD5 | 47db2dc8b17f798ed74d4482f83caac9 |
| BLAKE2b-256 | daee9f63fabf21da56178baf1bcdb0a6f9e7db52bc67dedd5e3400bbeb5afd74 |
Provenance
The following attestation bundles were made for ds_cache_cleaner-0.3.1.tar.gz:
Publisher: upload-to-pypi.yaml on bpiwowar/ds-cache-cleaner
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ds_cache_cleaner-0.3.1.tar.gz
- Subject digest: 3e40a436ae724846a4a05fc33ecbb5d2f363ff97767a9f6b135ac0b2d50aafb3
- Sigstore transparency entry: 796560411
- Sigstore integration time:
- Permalink: bpiwowar/ds-cache-cleaner@5d83b32b7d999867ff85423397b9ab62a3224c6c
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/bpiwowar
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: upload-to-pypi.yaml@5d83b32b7d999867ff85423397b9ab62a3224c6c
- Trigger Event: release
File details
Details for the file ds_cache_cleaner-0.3.1-py3-none-any.whl.
File metadata
- Download URL: ds_cache_cleaner-0.3.1-py3-none-any.whl
- Upload date:
- Size: 25.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e156f73c4f58f0a50ea72f57bbce4018fb90197a7ec19d0695a7d8b6c477f52e |
| MD5 | 89434830a89b5319b326befd2346f4d3 |
| BLAKE2b-256 | c542e7ae64d5db84e973f0e9bbacb61501eafe8dcf4fef40aed1628313333ee1 |
Provenance
The following attestation bundles were made for ds_cache_cleaner-0.3.1-py3-none-any.whl:
Publisher: upload-to-pypi.yaml on bpiwowar/ds-cache-cleaner
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ds_cache_cleaner-0.3.1-py3-none-any.whl
- Subject digest: e156f73c4f58f0a50ea72f57bbce4018fb90197a7ec19d0695a7d8b6c477f52e
- Sigstore transparency entry: 796560415
- Sigstore integration time:
- Permalink: bpiwowar/ds-cache-cleaner@5d83b32b7d999867ff85423397b9ab62a3224c6c
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/bpiwowar
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: upload-to-pypi.yaml@5d83b32b7d999867ff85423397b9ab62a3224c6c
- Trigger Event: release