SHA256 hash-based file renaming for privacy and deduplication
namecrawler
Rename files using their SHA256 content hash, creating deterministic, collision-resistant, privacy-preserving filenames.
Installation
pip install namecrawler
Quick Start
# Rename single file
namecrawler document.pdf
# Rename multiple files
namecrawler *.jpg
# Rename files in a directory
namecrawler ~/Documents/*.pdf
Features
- Deterministic: Same content = same filename (every time)
- Collision-Resistant: SHA256 makes accidental collisions virtually impossible
- Privacy-Preserving: Original filenames not exposed
- Deduplication-Friendly: Identical files get same hash (easy to find duplicates)
- Format-Preserving: Original file extensions maintained
- Fast: Efficient chunk-based hashing (8KB chunks)
- Safe: Only renames files that exist
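To put a number on "virtually impossible", here is a rough sketch of the birthday bound for accidental collisions. It assumes SHA256 output behaves like a uniformly random 256-bit value; the function name `collision_probability` is made up for this illustration and is not part of namecrawler.

```python
def collision_probability(num_files: int, hash_bits: int = 256) -> float:
    """Upper bound on the chance that any two of num_files distinct
    contents share a digest: (number of pairs) / 2**hash_bits."""
    pairs = num_files * (num_files - 1) // 2
    # Python integers are arbitrary precision, so 2**256 is exact here
    return pairs / 2 ** hash_bits


if __name__ == "__main__":
    # One trillion distinct files
    print(collision_probability(10 ** 12))
```

Even for a trillion distinct files the bound is on the order of 1e-54, far below the odds of hardware or disk errors.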
Use Cases
1. Privacy Protection
Hide sensitive information in original filenames:
# Before: SSN_123-45-6789_tax_return_2024.pdf
# After:  <64 hexadecimal characters>.pdf
namecrawler SSN_123-45-6789_tax_return_2024.pdf
2. Deduplication
Find duplicate files easily:
namecrawler ~/Downloads/*.jpg
# Byte-identical files map to the same hash name
# Look for repeated names (for example, across different folders)
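Because two files in one directory cannot share a name, it can help to list duplicates before renaming anything. The following is a minimal sketch using only the standard library; `file_digest` and `find_duplicates` are hypothetical helpers written for this example, not functions exported by namecrawler.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 8192) -> str:
    """SHA256 hex digest of a file, read in fixed-size chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(directory: Path) -> Dict[str, List[Path]]:
    """Map each digest to the files sharing it, keeping only real duplicates."""
    groups = defaultdict(list)
    for path in sorted(directory.iterdir()):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates(Path.home() / "Downloads")` returns one entry per group of byte-identical files, which you can review before deciding which copies to keep.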
3. Content-Based Organization
Files with the same content are grouped automatically:
namecrawler backup_folder/*
# Copies saved as v1, v2, v3 with byte-identical content all get the same hash
# Any change to the content produces a different hash
4. Archival Storage
Create immutable, content-addressed archives:
namecrawler archive/*.*
# Filenames never change if content doesn't change
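A side benefit of content-addressed names is that an archive can be checked for corruption later: if a file's digest no longer matches its name, the content has changed. This is a sketch under the assumption that every archived filename starts with the full 64-character hex digest followed by the extension; `find_mismatches` is a hypothetical helper, not part of namecrawler.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path, chunk_size: int = 8192) -> str:
    """SHA256 hex digest of a file, read in fixed-size chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_mismatches(directory: Path) -> List[Path]:
    """Return files whose name no longer matches their content digest."""
    mismatched = []
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        # Hex digests contain no dots, so the digest is everything
        # before the first dot in the filename
        digest_in_name = path.name.split(".", 1)[0]
        if file_digest(path) != digest_in_name:
            mismatched.append(path)
    return mismatched
```

An empty list from `find_mismatches(Path("archive"))` means every file still matches the name it was given.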
How It Works
- Reads file content in 8KB chunks (memory efficient)
- Computes SHA256 hash of the entire content
- Preserves file extension from original filename
- Renames the file to {hash}{extension}
Example:
# Original file: "meeting_notes_2024.txt"
# Content hash: "a1b2c3d4e5f6..."
# New filename: "a1b2c3d4e5f6....txt" (the full 64-character digest, then ".txt")
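The steps above can be sketched in a few lines of standard-library Python. This is an illustration of the technique, not namecrawler's actual source: the names `content_digest` and `rename_to_digest` are invented here, and the refusal to overwrite an existing target is an assumption of this sketch.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 8 * 1024  # 8 KB, matching the chunk size described above


def content_digest(path: Path) -> str:
    """Return the SHA256 hex digest of a file, read chunk by chunk."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def rename_to_digest(path: Path) -> Path:
    """Rename path to {hash}{extension} in the same directory."""
    target = path.with_name(content_digest(path) + path.suffix)
    if target == path:
        return path  # already named after its content
    if target.exists():
        # Assumption of this sketch: refuse to overwrite an existing file
        raise FileExistsError(f"{target} already exists")
    path.rename(target)
    return target
```

Reading in chunks keeps memory use constant regardless of file size, and `path.suffix` keeps only the final extension (for example `.gz` from `backup.tar.gz`).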
API Usage
Use as a Python library:
from namecrawler.cli import sha256sum, rename_file
from pathlib import Path
# Get hash of a file
file_path = Path("document.pdf")
file_hash = sha256sum(file_path)
print(f"SHA256: {file_hash}")
# Rename using hash
new_path = rename_file(file_path)
print(f"Renamed to: {new_path}")
Comparison with Other Tools
| Tool | Method | Reversible | Privacy | Speed |
|---|---|---|---|---|
| namecrawler | SHA256 hash | No | High | Fast |
| Manual rename | User input | Yes | ❌ Low | ❌ Slow |
| UUID tools | Random UUID | No | High | Fast |
| Timestamp tools | Current time | No | ❌ Low | Fast |
Advantages over alternatives:
- More meaningful than UUIDs (hash reveals if content changed)
- More private than timestamps (no metadata leakage)
- Deterministic (unlike random UUIDs)
- Built-in deduplication (same content = same hash)
Requirements
- Python 3.8+
- No external dependencies (uses stdlib only)
Limitations
- Not reversible: You cannot recover the original filename from the hash
- Same content = same name: Files with identical content get identical names
- No metadata preservation: Original filename lost (keep a mapping if needed)
Advanced Usage
Keep a rename log
# Create a simple mapping log
for file in *.pdf; do
    echo "$file -> $(namecrawler "$file")" >> rename_log.txt
done
Undo by using a log
namecrawler doesn't include undo (by design - hashes are one-way), but you can create your own:
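import json
from pathlib import Path
from namecrawler.cli import sha256sum

# Before renaming, save a log of hash name -> original name(s)
log = {}
for file in Path('.').glob('*.pdf'):
    hash_name = sha256sum(file) + file.suffix
    # Identical files share a hash name, so keep every original name
    log.setdefault(hash_name, []).append(str(file))
with open('rename_map.json', 'w') as f:
    json.dump(log, f, indent=2)

# Later, restore using the log
with open('rename_map.json') as f:
    log = json.load(f)
for hash_name, originals in log.items():
    # Skip entries whose renamed file is missing; restore under the first original name
    if Path(hash_name).exists():
        Path(hash_name).rename(originals[0])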
Security Note
SHA256 hashes are cryptographically secure but not secret. If someone has the original file, they can compute the same hash. Use namecrawler for:
- Privacy (hiding original filenames)
- Deduplication (finding identical files)
- Content-addressing (organizing by content)
Don't use for:
- Security (anyone who has, or can guess, the original content can compute the same hash and confirm a match)
- Encryption (filenames are not encrypted)
- Authentication (hashes alone don't prove ownership)
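To make the "not secret" point concrete, the sketch below shows how someone who only sees a hashed filename can confirm a guess about its content. The function `name_matches_content` is hypothetical and assumes the filename begins with the full hex digest.

```python
import hashlib


def name_matches_content(hashed_filename: str, candidate: bytes) -> bool:
    """True if candidate bytes hash to the digest embedded in the filename."""
    # Hex digests contain no dots, so split off the extension at the first dot
    digest_in_name = hashed_filename.split(".", 1)[0]
    return hashlib.sha256(candidate).hexdigest() == digest_in_name
```

This means a hashed name hides the original filename, but it does not hide which widely available or easily guessed file it is.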
License
MIT License - see LICENSE file
Author
Luke Steuber
- Website: lukesteuber.com
- GitHub: @lukeslp
- Bluesky: @lukesteuber.com
Fun fact: The name "namecrawler" reflects how the tool "crawls" through file content to generate a name, rather than using metadata or user input.