Deterministic integrity verification tool using Merkle trees and cryptographic hashing
MerkleWatch
Deterministic. Cryptographic. Tamper-Evident.
A CLI-first integrity verification tool that creates tamper-evident snapshots of directory structures using Merkle trees and cryptographic hashing.
Features • Installation • Quick Start • Commands • Ignore Rules • Documentation
Features
- Merkle Tree Integrity: Creates a single cryptographic root hash representing your entire directory
- Tamper Detection: Detects any file modifications, additions, removals, or reorderings
- Detailed Diff Views: Visual comparison of changes with color-coded output
- Flexible Ignore Rules: .merkleignore support with gitignore-like syntax
- Streaming Support: Efficiently handles large files with chunked reading (64KB chunks)
- Deterministic: Same directory always produces the same hash (cross-platform)
- Modular Design: Clean separation between hashing, tree construction, and filesystem operations
- JSON Manifests: Human-readable snapshots with complete metadata
- Domain Separation: Distinct prefix bytes for file leaves, internal nodes, and directory nodes, so one kind of node cannot be passed off as another
- Snapshot Comparison: Compare two snapshots to see what changed over time
- Interactive Setup: Guided ignore rule configuration
Installation
Prerequisites
- Python 3.10 or higher
Install from Source
git clone https://github.com/ADPer0705/MerkleWatch.git
cd MerkleWatch
pip install -e .
Quick Start
1. Create a Snapshot
merklewatch snapshot ./my_project --out baseline.json
2. Verify Integrity Later
merklewatch verify baseline.json ./my_project
3. Compare Two Snapshots
merklewatch diff baseline.json latest.json
Commands
snapshot - Create a Cryptographic Snapshot
Generate a tamper-evident snapshot of any directory:
merklewatch snapshot <directory> --out <manifest.json>
Examples:
# Snapshot your project
merklewatch snapshot ./my_project --out snapshot.json
# Snapshot with ignore rules (create .merkleignore first)
echo "node_modules/" > ./my_project/.merkleignore
echo "__pycache__/" >> ./my_project/.merkleignore
merklewatch snapshot ./my_project --out clean_snapshot.json
Output:
Snapshoting /path/to/directory...
Snapshot created successfully!
Root Hash: a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9
Manifest saved to: snapshot.json
verify - Verify Directory Integrity
Check if a directory matches a previous snapshot:
merklewatch verify <manifest.json> <directory>
Successful Verification:
merklewatch verify baseline.json ./my_project
Verifying /path/to/directory against baseline.json...
Verification SUCCESSFUL!
Root Hash matches: a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9
Failed Verification (Tampering Detected):
merklewatch verify baseline.json ./my_project
Verifying /path/to/directory against baseline.json...
Verification FAILED!
Root Hash Mismatch:
Expected: a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9
Actual: 94eee32191b256f2fdd489422beed8b7f1220e388d95d19002d7d4881c2f5fc7
Summary: 3 changes: 1 added, 1 removed, 1 modified
Added files:
+ new_suspicious_file.txt
Removed files:
- important_config.txt
Modified files:
M critical_data.json
Old: 516ad7b388b21e05e8c56229f063d112e70a2fea45fdd357e8ff44e6a5bce689
New: 52b3272721ffd27d6300389fb9b01a86148447fc78c14f7afde337854cc0860e
diff - Compare Two Snapshots
Compare two manifest files to see what changed between snapshots:
merklewatch diff <old_manifest.json> <new_manifest.json>
Example:
merklewatch diff snapshot_jan.json snapshot_feb.json
Comparing snapshot_jan.json → snapshot_feb.json...
Old manifest: 2025-01-15T10:30:00Z
Root Hash: a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9
New manifest: 2025-02-15T14:45:00Z
Root Hash: 94eee32191b256f2fdd489422beed8b7f1220e388d95d19002d7d4881c2f5fc7
Summary: 5 changes: 2 added, 1 removed, 2 modified
Added files:
+ src/new_feature.py
+ docs/api.md
Removed files:
- deprecated/old_code.py
Modified files:
M src/main.py
Old: 516ad7b388b21e05e8c56229f063d112e70a2fea45fdd357e8ff44e6a5bce689
New: 52b3272721ffd27d6300389fb9b01a86148447fc78c14f7afde337854cc0860e
M README.md
Old: 8f4d3a1c9e7b...
New: 1a2b3c4d5e6f...
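The added/removed/modified breakdown above can be derived from two manifests alone. A minimal sketch, assuming the manifest layout shown under Manifest Format (a "files" map keyed by relative path, each entry carrying a "content_hash"); load_manifest, diff_manifests, and summarize are hypothetical names, not MerkleWatch's diff.py.

```python
import json
from pathlib import Path


def load_manifest(path: str) -> dict:
    """Read a manifest JSON file (layout assumed from the Manifest Format section)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def diff_manifests(old: dict, new: dict) -> dict[str, list[str]]:
    """Classify paths as added, removed, or modified by comparing content hashes."""
    old_files, new_files = old["files"], new["files"]
    added = sorted(new_files.keys() - old_files.keys())
    removed = sorted(old_files.keys() - new_files.keys())
    modified = sorted(
        path
        for path in old_files.keys() & new_files.keys()
        if old_files[path]["content_hash"] != new_files[path]["content_hash"]
    )
    return {"added": added, "removed": removed, "modified": modified}


def summarize(changes: dict[str, list[str]]) -> str:
    """Format a summary line in the style of the CLI output shown above."""
    total = sum(len(paths) for paths in changes.values())
    return (
        f"{total} changes: {len(changes['added'])} added, "
        f"{len(changes['removed'])} removed, {len(changes['modified'])} modified"
    )


# Usage with the file names from the example above:
# changes = diff_manifests(load_manifest("snapshot_jan.json"), load_manifest("snapshot_feb.json"))
# print(summarize(changes))
```

Comparing per-file content hashes is what lets a diff name the exact files that changed; the root hash alone only signals that something changed.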
ignore - Configure Ignore Rules
Interactively configure the .merkleignore file with a guided interface:
merklewatch ignore <directory>
Interactive Prompts:
- Suggests Common Patterns: Automatically finds node_modules/, .git/, __pycache__/, etc.
- Checkbox Selection: Check/uncheck patterns to add
- Browse All Files: Optional fuzzy-searchable list of all files and directories
- Save: Writes selected patterns to .merkleignore
Example Session:
merklewatch ignore ./my_project
Configuring ignores for /path/to/my_project
? Found common ignore candidates. Select ones to add: (Use arrow keys to move, space to select, type to filter)
» node_modules/
  __pycache__/
  .git/
  .DS_Store
? Do you want to browse and ignore other files/directories? (Y/n)
Updated .merkleignore with 3 new patterns.
Ignore Rules
MerkleWatch supports .merkleignore files with gitignore-like syntax. Place a .merkleignore file in the root of the directory you want to snapshot.
.merkleignore Syntax
# Comments start with #
# Ignore specific files
.DS_Store
secrets.txt
config.local.json
# Ignore file patterns (glob matching)
*.log
*.tmp
*.pyc
*.swp
# Ignore directories (trailing slash recommended)
node_modules/
__pycache__/
.git/
dist/
build/
venv/
# Names without a trailing slash match files or directories
.env
cache
temp
Pattern Matching Rules
| Pattern Type | Example | Matches |
|---|---|---|
| Directory (with /) | node_modules/ | Directory and all its contents |
| Directory (without /) | build | Any file/directory named build |
| Glob pattern | *.log | All files ending with .log anywhere |
| Specific file | .DS_Store | Exact filename match anywhere |
| Comment | # ignore logs | Ignored (documentation) |
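The matching rules in the table can be approximated with Python's fnmatch module. This is a minimal sketch under stated assumptions, not MerkleWatch's ignore.py: each pattern is matched case-sensitively against every path component, a trailing / restricts a match on the final component to directories, and patterns with an inner slash (such as src/generated/) are not handled because the table does not describe them. parse_merkleignore and is_ignored are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_merkleignore(text: str) -> list[str]:
    """Keep non-empty, non-comment lines as patterns."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: str, is_dir: bool, patterns: list[str]) -> bool:
    """Check a forward-slash path relative to the snapshot root against the patterns."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        dir_only = pattern.endswith("/")
        name_pattern = pattern.rstrip("/")
        for i, part in enumerate(parts):
            if not fnmatchcase(part, name_pattern):  # case-sensitive glob match
                continue
            is_last = i == len(parts) - 1
            # A matching ancestor means the path sits inside an ignored directory;
            # a matching last component ignores the entry itself, unless the
            # pattern ends in "/" and the entry is a file.
            if not is_last or not dir_only or is_dir:
                return True
    return False
```

Checking directories with is_ignored during traversal lets a walker prune whole subtrees such as node_modules/ without descending into them.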
Important Notes
- No Built-in Ignores: MerkleWatch has NO default ignore patterns. Only patterns in .merkleignore are used.
- Create Before Snapshot: Place .merkleignore before running snapshot.
- Applies to Snapshot and Verify: Both snapshot and verify respect ignore rules.
- Case Sensitive: Pattern matching is case-sensitive.
How It Works
MerkleWatch creates a cryptographically secure fingerprint of your directory structure using Merkle trees:
1. Hash Every File
Files are hashed using SHA-256 with chunked reading (64KB chunks) to handle large files efficiently:
file_hash = SHA256(file_contents)
leaf_hash = SHA256(0x00 || file_hash)
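The two formulas above map directly onto Python's hashlib. A minimal sketch, assuming the raw 32-byte digest (not its hex string) is what gets prefixed with 0x00; file_hash and leaf_hash here are illustrative names, not MerkleWatch's hashing.py API.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # 64KB chunks, as described above
LEAF_PREFIX = b"\x00"   # domain-separation byte for file leaves


def file_hash(path: Path) -> bytes:
    """file_hash = SHA256(file_contents), streamed so large files never load fully."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.digest()


def leaf_hash(content_digest: bytes) -> bytes:
    """leaf_hash = SHA256(0x00 || file_hash), over the raw digest (assumption)."""
    return hashlib.sha256(LEAF_PREFIX + content_digest).digest()


# Usage (hypothetical path):
# print(leaf_hash(file_hash(Path("README.md"))).hex())
```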
2. Build Merkle Trees
Each directory becomes a Merkle tree where:
- Files are leaf nodes (prefixed with 0x00)
- Subdirectories are represented by their root hash (prefixed with 0x02)
- All children are sorted alphabetically and paired
internal_hash = SHA256(0x01 || left || right)
dir_node = SHA256(0x02 || subdirectory_root_hash)
3. Compute Root Hash
The entire directory structure collapses into a single root hash: your tamper-evident seal.
Root Hash = MerkleRoot(all children)
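Steps 2 and 3 can be sketched end to end over an in-memory directory model (a dict mapping names to file bytes or to nested dicts), a hypothetical stand-in for the real filesystem walk. Three details the description leaves open are filled in as labeled assumptions: an odd child left over at any level is carried up unchanged, an empty directory hashes to SHA256 of empty input, and "alphabetically" is read as Python's default code-point string order. MerkleWatch itself may differ on any of these.

```python
import hashlib

LEAF, INTERNAL, DIRECTORY = b"\x00", b"\x01", b"\x02"  # domain-separation prefixes


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(children: list[bytes]) -> bytes:
    """Pair child hashes level by level: internal = SHA256(0x01 || left || right)."""
    if not children:
        return sha256(b"")  # assumption: empty directory -> SHA256 of empty input
    level = children
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(sha256(INTERNAL + level[i] + level[i + 1]))
            else:
                next_level.append(level[i])  # assumption: odd node carried up unchanged
        level = next_level
    return level[0]


def directory_root(tree: dict) -> bytes:
    """Root hash of an in-memory tree: names map to bytes (files) or dicts (subdirectories)."""
    children = []
    for name in sorted(tree):  # deterministic order; assumed code-point sort
        value = tree[name]
        if isinstance(value, dict):
            children.append(sha256(DIRECTORY + directory_root(value)))  # dir_node
        else:
            children.append(sha256(LEAF + sha256(value)))  # leaf over the file hash
    return merkle_root(children)


# Usage with a hypothetical two-level project:
# print(directory_root({"README.md": b"# Demo\n", "src": {"main.py": b"print('hi')\n"}}).hex())
```

Sorting before pairing is what makes the root independent of the order in which the filesystem happens to list entries. Because the formulas as written hash contents only, names in this sketch affect the root solely through that sort order.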
Domain Separation
| Type | Prefix | Purpose |
|---|---|---|
| File leaf | 0x00 | Content leaf |
| Internal node | 0x01 | Combines two children |
| Directory node | 0x02 | Represents subdirectory root |
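Because every node type is hashed with a different prefix byte, a leaf can never be reinterpreted as an internal node or a directory node, or vice versa. This closes off the classic second-preimage attack on Merkle trees, in which an internal node's children are presented as if they were leaf data.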
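The attack that prefixes rule out is easiest to see with the textbook construction. The snippet compares a tree with no prefixes against one using the 0x00/0x01 bytes from the table; leaf_plain, node_plain, leaf, and node are hypothetical helper names, and the snippet illustrates the general Merkle-tree principle rather than MerkleWatch's own code.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# Without domain separation: leaves and internal nodes share one hash function.
def leaf_plain(data: bytes) -> bytes:
    return sha256(data)


def node_plain(left: bytes, right: bytes) -> bytes:
    return sha256(left + right)


# With domain separation: each node type gets its own prefix byte.
def leaf(data: bytes) -> bytes:
    return sha256(b"\x00" + data)


def node(left: bytes, right: bytes) -> bytes:
    return sha256(b"\x01" + left + right)


left, right = sha256(b"left child"), sha256(b"right child")

# Unprefixed: a 64-byte "leaf" equal to left || right collides with the internal node.
print(node_plain(left, right) == leaf_plain(left + right))  # True
# Prefixed: the two preimages start with different bytes, so the hashes differ.
print(node(left, right) == leaf(left + right))  # False
```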
Manifest Format
Manifests are JSON files containing:
{
  "merklewatch_version": "1.0.0",
  "algorithm": "sha256",
  "timestamp": 1763995842.123456,
  "timestamp_iso": "2025-11-24T14:50:42Z",
  "root_hash": "a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9",
  "files": {
    "README.md": {
      "size": 1234,
      "mtime": 1732464000.0,
      "content_hash": "516ad7b388b21...",
      "leaf_hash": "8a9f3c12d45..."
    }
  },
  "directories": {
    "src": {
      "root_hash": "94eee32191b256...",
      "node_hash": "1a2b3c4d5e6f..."
    }
  }
}
See docs/manifest-format.md for the full specification.
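Because each files entry records a content_hash, a manifest can be spot-checked against a directory without rebuilding the whole tree. A minimal sketch, assuming content_hash holds the lowercase hex SHA-256 of the file's bytes and paths are forward-slash and relative to the snapshot root; sha256_hex and check_manifest_files are hypothetical names, not part of MerkleWatch.

```python
import hashlib
import json
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # same 64KB chunking described under How It Works


def sha256_hex(path: Path) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest_files(manifest_path, root) -> list[str]:
    """Return manifest paths that are now missing or whose contents changed."""
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    root = Path(root)
    problems = []
    for rel_path, entry in sorted(manifest["files"].items()):
        target = root / rel_path
        if not target.is_file() or sha256_hex(target) != entry["content_hash"]:
            problems.append(rel_path)
    return problems


# Usage with the names from the Quick Start:
# print(check_manifest_files("baseline.json", "./my_project"))
```

This only re-checks files the manifest already lists; reporting added files still requires walking the directory, as verify does.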
Project Structure
merklewatch/
├── src/merklewatch/
│   ├── __init__.py          # Package initialization
│   ├── __main__.py          # Entry point
│   ├── cli.py               # Typer-based CLI interface
│   ├── hashing.py           # SHA-256 primitives with domain separation
│   ├── merkle.py            # Merkle tree construction logic
│   ├── filesystem.py        # Directory traversal & scanning
│   ├── manifest.py          # JSON manifest generation
│   ├── verification.py      # Verification logic
│   ├── diff.py              # Diff formatting and display
│   └── ignore.py            # Ignore rules handling
├── docs/                    # Documentation
│   ├── architecture.md      # System architecture
│   ├── manifest-format.md   # Manifest specification
│   ├── ignore-rules.md      # Ignore rules guide
│   └── examples.md          # Usage examples
├── test/                    # Test data
├── pyproject.toml           # Project metadata & dependencies
├── Makefile                 # Development automation
├── CHANGELOG.md             # Version history
├── CONTRIBUTING.md          # Contribution guidelines
├── LICENSE                  # MIT License
└── README.md                # This file
Error Handling
MerkleWatch gracefully handles common filesystem issues:
- Permission Errors: Warns and skips inaccessible files/directories
- Symlinks: Skips symbolic links to avoid loops and security issues
- Empty Directories: Handles empty directories correctly
- Large Files: Uses chunked reading (64KB) to avoid memory issues
- Missing Files: During verification, clearly reports added/removed files
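The symlink and permission rules above can be sketched as a sorted directory walk that never follows links. A minimal sketch, assuming warnings go to stderr and paths are reported with forward slashes; iter_files is a hypothetical helper, not MerkleWatch's filesystem.py.

```python
import os
import sys
from pathlib import Path
from typing import Iterator


def iter_files(root: Path, rel: str = "") -> Iterator[str]:
    """Yield forward-slash relative paths of regular files under root, sorted by name."""
    directory = root / rel if rel else root
    try:
        with os.scandir(directory) as it:
            entries = sorted(it, key=lambda entry: entry.name)
    except PermissionError as exc:
        # Warn and skip unreadable directories instead of aborting the snapshot.
        print(f"warning: skipping {directory}: {exc}", file=sys.stderr)
        return
    for entry in entries:
        child = f"{rel}/{entry.name}" if rel else entry.name
        if entry.is_symlink():
            continue  # never follow links: avoids cycles and escaping the root
        if entry.is_dir(follow_symlinks=False):
            yield from iter_files(root, child)
        elif entry.is_file(follow_symlinks=False):
            yield child


# Usage (hypothetical directory):
# for rel in iter_files(Path("./my_project")):
#     print(rel)
```

An empty directory simply yields nothing here, and sorting each listing keeps the output order identical across platforms.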
Use Cases
- Digital Forensics: Chain-of-custody documentation with tamper-evident snapshots
- Security Audits: Verify configuration integrity across systems
- Backup Verification: Ensure backup completeness and detect corruption
- Reproducible Builds: Verify build outputs match expected state
- File System Monitoring: Detect unauthorized changes in critical directories
- Software Distribution: Verify package integrity before deployment
- Change Tracking: Track changes between versions with detailed diffs
Documentation
- Installation Guide
- Quick Start
- Commands Reference
- Ignore Rules Guide
- Architecture Documentation
- Manifest Format Specification
- Contributing Guide
- Changelog
- License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Inspired by the need for deterministic, cryptographically secure directory integrity verification in:
- Digital forensics workflows
- Secure backup systems
- Configuration management
- Reproducible build systems
Download files
File details
Details for the file merklewatch-1.0.0.tar.gz.
File metadata
- Download URL: merklewatch-1.0.0.tar.gz
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00399dbe0f546084639c3611a4c68b095501c645da76588adff9bdf5f187484d |
| MD5 | 32b7084ff7cc302feee0f2fbafb207be |
| BLAKE2b-256 | 6f4c72bbbcfdac0d28262a28146a7357ea9157656034ca876d408b35cd73a1fe |
File details
Details for the file merklewatch-1.0.0-py3-none-any.whl.
File metadata
- Download URL: merklewatch-1.0.0-py3-none-any.whl
- Upload date:
- Size: 20.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ddb9580fded60a4c889d24caa132d6ad0f42a32fec418b1508f07055691cff65 |
| MD5 | d572261a89692f0dcd4ce828fc774fc5 |
| BLAKE2b-256 | 2b81f8cf641664d5c1d6f97fed349ad825a78ced9387b1f9af6b93ff8ab8b800 |