namecrawler

SHA256 hash-based file renaming for privacy and deduplication

Rename files using their SHA256 content hash, creating deterministic, collision-resistant, privacy-preserving filenames.

Installation

pip install namecrawler

Quick Start

# Rename single file
namecrawler document.pdf

# Rename multiple files
namecrawler *.jpg

# Rename files in a directory
namecrawler ~/Documents/*.pdf

Features

Deterministic: Same content = same filename (every time)
Collision-Resistant: SHA256 makes accidental collisions virtually impossible
Privacy-Preserving: Original filenames not exposed
Deduplication-Friendly: Identical files get same hash (easy to find duplicates)
Format-Preserving: Original file extensions maintained
Fast: Efficient chunk-based hashing (8KB chunks)
Safe: Only renames files that exist

Use Cases

1. Privacy Protection

Hide sensitive information in original filenames:

# Before: SSN_123-45-6789_tax_return_2024.pdf
# After:  <64-character SHA256 hex digest>.pdf
namecrawler SSN_123-45-6789_tax_return_2024.pdf

2. Deduplication

Find duplicate files easily:

namecrawler ~/Downloads/*.jpg
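# Byte-identical files map to the same hash name.
# Compare names across folders to spot duplicates; within one folder,
# two files cannot share a name, so a duplicate's target will already exist.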
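
Because identical files map to the same name, it can help to list duplicates before renaming anything. The following is a minimal sketch of that idea, not part of namecrawler: `hash_file` and `find_duplicates` are hypothetical helper names, and the 8192-byte chunk size is an assumption taken from the "8KB chunks" mentioned in this README.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def hash_file(path: Path, chunk_size: int = 8192) -> str:
    """Hypothetical helper: SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory: Path) -> Dict[str, List[Path]]:
    """Group the files in `directory` by content hash; keep groups of 2+."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(directory.iterdir()):
        if path.is_file():
            groups[hash_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


# Demonstration in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "holiday.jpg").write_bytes(b"same bytes")
    (folder / "holiday_copy.jpg").write_bytes(b"same bytes")
    (folder / "other.jpg").write_bytes(b"different bytes")
    for digest, paths in find_duplicates(folder).items():
        print(digest[:12], [p.name for p in paths])
```

Each group in the result holds files with identical content, so all but one file per group can be reviewed or removed before running the rename.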

3. Content-Based Organization

Files with the same content automatically get the same name:

namecrawler backup_folder/*
# Byte-identical copies all get the same hash name; an edited version gets a different one

4. Archival Storage

Create immutable, content-addressed archives:

namecrawler archive/*.* 
# Filenames never change if content doesn't change
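
One consequence of content-addressed names is that an archive can be checked later by recomputing each hash and comparing it with the filename. The sketch below illustrates this under the assumption that files are named `{hash}{extension}` with a single extension, as described in this README; `hash_file` and `find_modified` are hypothetical names, not namecrawler functions.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def hash_file(path: Path, chunk_size: int = 8192) -> str:
    """Hypothetical helper: SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified(archive: Path) -> List[Path]:
    """Return files whose name (minus extension) no longer matches their content."""
    return [
        path
        for path in sorted(archive.iterdir())
        if path.is_file() and path.stem != hash_file(path)
    ]


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    content = b"quarterly report"
    stored = archive / (hashlib.sha256(content).hexdigest() + ".pdf")
    stored.write_bytes(content)
    print(len(find_modified(archive)))  # prints 0: name and content agree
    stored.write_bytes(b"tampered")     # simulate corruption or an edit
    print(len(find_modified(archive)))  # prints 1: content no longer matches the name
```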

How It Works

Reads file content in 8KB chunks (memory efficient)
Computes SHA256 hash of the entire content
Preserves file extension from original filename
Renames file to {hash}{extension}

Example:

# Original file: "meeting_notes_2024.txt"
# Content hash: "a1b2c3d4e5f6..."
# New filename: "a1b2c3d4e5f6..." + ".txt" (the full 64-character hash, then the extension)
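
The steps above can be sketched with the standard library alone. This is an illustration of the technique, not namecrawler's actual source: `hash_file` and `rename_to_hash` are hypothetical names, and `CHUNK_SIZE` assumes that "8KB" means 8192 bytes.

```python
import hashlib
import tempfile
from pathlib import Path

CHUNK_SIZE = 8 * 1024  # assumed meaning of the "8KB chunks" described above


def hash_file(path: Path) -> str:
    """Return the SHA256 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF)
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_to_hash(path: Path) -> Path:
    """Rename `path` to {hash}{extension} in the same directory."""
    target = path.with_name(hash_file(path) + path.suffix)
    if target != path:
        path.rename(target)
    return target


# Demonstration in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "meeting_notes_2024.txt"
    original.write_bytes(b"example meeting notes\n")
    renamed = rename_to_hash(original)
    print(renamed.name)  # 64 hex characters followed by ".txt"
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which is why the whole file never needs to be loaded at once.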

API Usage

Use as a Python library:

from namecrawler.cli import sha256sum, rename_file
from pathlib import Path

# Get hash of a file
file_path = Path("document.pdf")
file_hash = sha256sum(file_path)
print(f"SHA256: {file_hash}")

# Rename using hash
new_path = rename_file(file_path)
print(f"Renamed to: {new_path}")

Comparison with Other Tools

Tool	Method	Reversible	Privacy	Speed
namecrawler	SHA256 hash	No	High	Fast
Manual rename	User input	Yes	❌ Low	❌ Slow
UUID tools	Random UUID	No	High	Fast
Timestamp tools	Current time	No	❌ Low	Fast

Advantages over alternatives:

More meaningful than UUIDs (hash reveals if content changed)
More private than timestamps (no metadata leakage)
Deterministic (unlike random UUIDs)
Built-in deduplication (same content = same hash)

Requirements

Python 3.8+
No external dependencies (uses stdlib only)

Limitations

Not reversible: You cannot recover the original filename from the hash
Same content = same name: Files with identical content and the same extension get identical names, so only one of them can hold that name within a single directory
No metadata preservation: Original filename lost (keep a mapping if needed)
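
How namecrawler behaves when the target name already exists is not stated here. As a hedged illustration of one way to handle the "same content = same name" case, the sketch below skips a file rather than overwrite an existing target. The names `hash_file` and `rename_without_overwrite` are hypothetical and not part of the package.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def hash_file(path: Path, chunk_size: int = 8192) -> str:
    """Hypothetical helper: SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_without_overwrite(path: Path) -> Optional[Path]:
    """Rename to {hash}{extension}; return None if that name is already taken."""
    target = path.with_name(hash_file(path) + path.suffix)
    if target == path:
        return path  # already named after its content
    if target.exists():
        return None  # same content is already stored under this name
    path.rename(target)
    return target


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "report.txt").write_bytes(b"same content")
    (folder / "report_copy.txt").write_bytes(b"same content")
    for path in sorted(folder.glob("*.txt")):
        result = rename_without_overwrite(path)
        print(path.name, "->", result.name if result else "skipped (duplicate)")
```

The existence check and the rename are two separate steps, so this sketch is not safe against another process creating the target in between; it is meant for single-user, local use.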

Advanced Usage

Keep a rename log

# Create a simple mapping log
# (assumes namecrawler prints the new filename to stdout)
for file in *.pdf; do
  echo "$file -> $(namecrawler "$file")" >> rename_log.txt
done

Undo by using a log

namecrawler doesn't include undo (by design - hashes are one-way), but you can create your own:

import json
from pathlib import Path

from namecrawler.cli import sha256sum

# Before renaming, record which hash name each file will receive
log = {}
for file in Path('.').glob('*.pdf'):
    hash_name = sha256sum(file) + file.suffix
    # Files with identical content share a hash name, so the last one wins
    log[hash_name] = str(file)

with open('rename_map.json', 'w') as f:
    json.dump(log, f, indent=2)

# Later, restore the original names from the log
with open('rename_map.json') as f:
    log = json.load(f)

for hash_name, original in log.items():
    hashed = Path(hash_name)
    if hashed.exists():  # skip entries whose file was moved or deleted
        hashed.rename(original)

Security Note

SHA256 hashes are cryptographically secure but not secret: anyone who has a copy of the original file can compute the same hash and match it to the renamed file. Use namecrawler for:

Privacy (hiding original filenames)
Deduplication (finding identical files)
Content-addressing (organizing by content)

Don't use for:

Security (anyone with original can verify hash)
Encryption (filenames are not encrypted)
Authentication (hashes alone don't prove ownership)

License

MIT License - see LICENSE file

Author

Luke Steuber

Website: lukesteuber.com
GitHub: @lukeslp
Bluesky: @lukesteuber.com

Fun fact: The name "namecrawler" reflects how the tool "crawls" through file content to generate a name, rather than using metadata or user input.

Version 1.0.0 (Jun 9, 2026)

