
Deterministic integrity verification tool using Merkle trees and cryptographic hashing


🔐 MerkleWatch

Deterministic. Cryptographic. Tamper-Evident.

License: MIT • Python 3.10+

A CLI-first integrity verification tool that creates tamper-evident snapshots of directory structures using Merkle trees and cryptographic hashing.

Features • Installation • Quick Start • Commands • Ignore Rules • Documentation


🚀 Features

  • 🌲 Merkle Tree Integrity: Creates a single cryptographic root hash representing your entire directory
  • 🔒 Tamper Detection: Detects any file modifications, additions, removals, or reorderings
  • 📊 Detailed Diff Views: Visual comparison of changes with color-coded output
  • 🚫 Flexible Ignore Rules: .merkleignore support with gitignore-like syntax
  • ⚡ Streaming Support: Efficiently handles large files with chunked reading (64KB chunks)
  • 🎯 Deterministic: Same directory always produces the same hash (cross-platform)
  • 🧩 Modular Design: Clean separation between hashing, tree construction, and filesystem operations
  • 📋 JSON Manifests: Human-readable snapshots with complete metadata
  • 🛡️ Domain Separation: Cryptographically safe hashing with prefix-based separation
  • 🔄 Snapshot Comparison: Compare two snapshots to see what changed over time
  • 🛠️ Interactive Setup: Guided ignore rule configuration

📦 Installation

Prerequisites

  • Python 3.10 or higher

Install from Source

git clone https://github.com/ADPer0705/MerkleWatch.git
cd MerkleWatch
pip install -e .

🚀 Quick Start

1. Create a Snapshot

merklewatch snapshot ./my_project --out baseline.json

2. Verify Integrity Later

merklewatch verify baseline.json ./my_project

3. Compare Two Snapshots

merklewatch diff baseline.json latest.json

💻 Commands

snapshot - Create a Cryptographic Snapshot

Generate a tamper-evident snapshot of any directory:

merklewatch snapshot <directory> --out <manifest.json>

Examples:

# Snapshot your project
merklewatch snapshot ./my_project --out snapshot.json

# Snapshot with ignore rules (create .merkleignore first)
echo "node_modules/" > ./my_project/.merkleignore
echo "__pycache__/" >> ./my_project/.merkleignore
merklewatch snapshot ./my_project --out clean_snapshot.json

Output:

Snapshoting /path/to/directory...
Snapshot created successfully!
Root Hash: a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9
Manifest saved to: snapshot.json

verify - Verify Directory Integrity

Check if a directory matches a previous snapshot:

merklewatch verify <manifest.json> <directory>

Successful Verification:

merklewatch verify baseline.json ./my_project
Verifying /path/to/directory against baseline.json...

✓ Verification SUCCESSFUL!
Root Hash matches: a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9

Failed Verification (Tampering Detected):

merklewatch verify baseline.json ./my_project
Verifying /path/to/directory against baseline.json...

✗ Verification FAILED!

Root Hash Mismatch:
  Expected: a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9
  Actual:   94eee32191b256f2fdd489422beed8b7f1220e388d95d19002d7d4881c2f5fc7

Summary: 3 changes: 1 added, 1 removed, 1 modified

✓ Added files:
  + new_suspicious_file.txt

✗ Removed files:
  - important_config.txt

⚠ Modified files:
  M critical_data.json
      Old: 516ad7b388b21e05e8c56229f063d112e70a2fea45fdd357e8ff44e6a5bce689
      New: 52b3272721ffd27d6300389fb9b01a86148447fc78c14f7afde337854cc0860e

diff - Compare Two Snapshots

Compare two manifest files to see what changed between snapshots:

merklewatch diff <old_manifest.json> <new_manifest.json>

Example:

merklewatch diff snapshot_jan.json snapshot_feb.json
Comparing snapshot_jan.json → snapshot_feb.json...

Old manifest: 2025-01-15T10:30:00Z
  Root Hash: a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9

New manifest: 2025-02-15T14:45:00Z
  Root Hash: 94eee32191b256f2fdd489422beed8b7f1220e388d95d19002d7d4881c2f5fc7

Summary: 5 changes: 2 added, 1 removed, 2 modified

✓ Added files:
  + src/new_feature.py
  + docs/api.md

✗ Removed files:
  - deprecated/old_code.py

⚠ Modified files:
  M src/main.py
      Old: 516ad7b388b21e05e8c56229f063d112e70a2fea45fdd357e8ff44e6a5bce689
      New: 52b3272721ffd27d6300389fb9b01a86148447fc78c14f7afde337854cc0860e
  M README.md
      Old: 8f4d3a1c9e7b2f6a5d0c8e1b4a7d3f9c2e5b8a1d4c7f0e3b6a9d2c5f8e1b4a7
      New: 1a2b3c4d5e6f7g8h9i0j1k2l3m4n5o6p7q8r9s0t1u2v3w4x5y6z7a8b9c0d1e2f
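
The comparison above can be approximated directly from two manifest files. The following is a minimal sketch, not MerkleWatch's own implementation: the function names `diff_manifests` and `summarize` are hypothetical, and it assumes each manifest has a "files" mapping whose entries carry a "content_hash" field, as in the Manifest Format section.

```python
def diff_manifests(old: dict, new: dict) -> dict:
    """Compare the "files" sections of two manifests by content hash."""
    old_files = old.get("files", {})
    new_files = new.get("files", {})
    added = sorted(set(new_files) - set(old_files))
    removed = sorted(set(old_files) - set(new_files))
    # A file counts as modified when it exists in both manifests
    # but its content hash differs.
    modified = sorted(
        path
        for path in set(old_files) & set(new_files)
        if old_files[path]["content_hash"] != new_files[path]["content_hash"]
    )
    return {"added": added, "removed": removed, "modified": modified}


def summarize(changes: dict) -> str:
    """Format a one-line summary in the style of the output above."""
    total = sum(len(paths) for paths in changes.values())
    return (
        f"{total} changes: {len(changes['added'])} added, "
        f"{len(changes['removed'])} removed, {len(changes['modified'])} modified"
    )
```

Loading both files with `json.load` and passing the resulting dictionaries to `diff_manifests` yields the three lists that the summary line counts.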

ignore - Configure Ignore Rules

Interactively configure .merkleignore file with a guided interface:

merklewatch ignore <directory>

Interactive Prompts:

  1. Suggests Common Patterns: Automatically finds node_modules/, .git/, __pycache__/, etc.
  2. Checkbox Selection: Check/uncheck patterns to add
  3. Browse All Files: Optional fuzzy-searchable list of all files and directories
  4. Save: Writes selected patterns to .merkleignore

Example Session:

merklewatch ignore ./my_project
Configuring ignores for /path/to/my_project

? Found common ignore candidates. Select ones to add: (Use arrow keys to move, space to select, type to filter)
 » ✓ node_modules/
   ✓ __pycache__/
   ✓ .git/
   ○ .DS_Store

? Do you want to browse and ignore other files/directories? (Y/n)

Updated .merkleignore with 3 new patterns.
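
The pattern-suggestion step can be pictured as a scan for well-known names. This is only a sketch under stated assumptions: `COMMON_CANDIDATES` and `suggest_ignores` are hypothetical names, the candidate list contains just the four patterns mentioned in this section, and the real command may detect candidates differently.

```python
from pathlib import Path

# Hypothetical candidate list, taken from the patterns named in this section.
COMMON_CANDIDATES = ("node_modules/", "__pycache__/", ".git/", ".DS_Store")


def suggest_ignores(root: Path) -> list[str]:
    """Return the candidate patterns that actually occur somewhere under root."""
    found = set()
    for path in root.rglob("*"):
        for pattern in COMMON_CANDIDATES:
            wants_directory = pattern.endswith("/")
            # Match on the entry name, and require the entry kind
            # (directory or file) to agree with the pattern.
            if path.name == pattern.rstrip("/") and path.is_dir() == wants_directory:
                found.add(pattern)
    # Preserve the candidate order so the suggestions are stable.
    return [pattern for pattern in COMMON_CANDIDATES if pattern in found]
```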

🚫 Ignore Rules

MerkleWatch supports .merkleignore files with gitignore-like syntax. Place a .merkleignore file in the root of the directory you want to snapshot.

.merkleignore Syntax

# Comments start with #

# Ignore specific files
.DS_Store
secrets.txt
config.local.json

# Ignore file patterns (glob matching)
*.log
*.tmp
*.pyc
*.swp

# Ignore directories (trailing slash recommended)
node_modules/
__pycache__/
.git/
dist/
build/
venv/

# Ignore directories (without slash also works)
.env
cache
temp

Pattern Matching Rules

| Pattern Type | Example | Matches |
|---|---|---|
| Directory (with /) | node_modules/ | Directory and all its contents |
| Directory (without /) | build | Any file/directory named build |
| Glob pattern | *.log | All files ending with .log anywhere |
| Specific file | .DS_Store | Exact filename match anywhere |
| Comment | # ignore logs | Ignored (documentation) |
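
The rules in the table can be sketched with the standard library's `fnmatch` module. This is an illustration only: `parse_ignore` and `is_ignored` are hypothetical names, paths are assumed to be POSIX-style file paths relative to the snapshot root, and MerkleWatch's actual matcher may handle edge cases differently.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_ignore(text: str) -> list[str]:
    """Return the patterns in a .merkleignore body, skipping blanks and # comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Decide whether a file path (relative to the snapshot root) is ignored."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore everything below a matching directory.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Name or glob pattern: match any path component, case-sensitively.
            return True
    return False
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.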

Important Notes

  • ⚠️ No Built-in Ignores: MerkleWatch has NO default ignore patterns. Only patterns in .merkleignore are used.
  • 📝 Create Before Snapshot: Place .merkleignore before running snapshot
  • 🔄 Applies to All Commands: Both snapshot and verify respect ignore rules
  • 🎯 Case Sensitive: Pattern matching is case-sensitive

🏗️ How It Works

MerkleWatch creates a cryptographically secure fingerprint of your directory structure using Merkle trees:

1. Hash Every File

Files are hashed using SHA-256 with chunked reading (64KB chunks) to handle large files efficiently:

file_hash = SHA256(file_contents)
leaf_hash = SHA256(0x00 || file_hash)
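
A minimal sketch of this step, using only `hashlib`. The names `hash_file` and `leaf_hash` are illustrative, and the sketch assumes the 0x00 prefix is concatenated with the raw 32-byte digest rather than its hex string, which the formulas above do not specify.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # 64KB chunks, as described above
LEAF_PREFIX = b"\x00"


def hash_file(path: Path) -> bytes:
    """Return SHA256(file_contents) without loading the whole file into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.digest()


def leaf_hash(file_hash: bytes) -> bytes:
    """Return SHA256(0x00 || file_hash); assumes file_hash is the raw digest."""
    return hashlib.sha256(LEAF_PREFIX + file_hash).digest()
```

Feeding the file to the hash in chunks produces the same digest as hashing it in one call, so the chunk size affects memory use but not the result.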

2. Build Merkle Trees

Each directory becomes a Merkle tree where:

  • Files are leaf nodes (prefixed with 0x00)
  • Subdirectories are represented by their root hash (prefixed with 0x02)
  • All children are sorted alphabetically and paired

internal_hash = SHA256(0x01 || left || right)
dir_node = SHA256(0x02 || subdirectory_root_hash)

3. Compute Root Hash

The entire directory structure collapses into a single root hash, which serves as your tamper-evident seal.

Root Hash = MerkleRoot(all children)
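
Steps 2 and 3 can be sketched over an in-memory tree. This is a sketch under stated assumptions, not the tool's actual code: `merkle_root` and `directory_root` are hypothetical names, an unpaired last node is assumed to be promoted unchanged, and an empty directory is assumed to hash to SHA-256 of empty input. The formulas above do not specify either case. Following those formulas as written, names only determine the order of children.

```python
import hashlib

LEAF, INTERNAL, DIRECTORY = b"\x00", b"\x01", b"\x02"


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(nodes: list[bytes]) -> bytes:
    """Pair adjacent nodes level by level until a single hash remains."""
    if not nodes:
        # Assumption: an empty directory hashes to SHA256 of empty input.
        return sha256(b"")
    while len(nodes) > 1:
        paired = [
            sha256(INTERNAL + nodes[i] + nodes[i + 1])
            for i in range(0, len(nodes) - 1, 2)
        ]
        if len(nodes) % 2 == 1:
            # Assumption: an unpaired last node is promoted unchanged.
            paired.append(nodes[-1])
        nodes = paired
    return nodes[0]


def directory_root(tree: dict) -> bytes:
    """tree maps names to file contents (bytes) or nested dicts (subdirectories)."""
    nodes = []
    for name in sorted(tree):  # alphabetical order keeps the result deterministic
        child = tree[name]
        if isinstance(child, dict):
            nodes.append(sha256(DIRECTORY + directory_root(child)))
        else:
            nodes.append(sha256(LEAF + sha256(child)))
    return merkle_root(nodes)
```

Calling `directory_root({"README.md": b"hello", "src": {"main.py": b"print(1)"}}).hex()` gives a 64-character root hash that does not depend on the insertion order of the dictionary keys.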

Domain Separation

| Type | Prefix | Purpose |
|---|---|---|
| File leaf | 0x00 | Content leaf |
| Internal node | 0x01 | Combines two children |
| Directory node | 0x02 | Represents subdirectory root |

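Because each node type is hashed with its own prefix, a file leaf can never be mistaken for an internal or directory node. This prevents second-preimage attacks in which crafted file contents imitate the hash of an entire subtree.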
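
The effect of the prefixes can be seen by comparing a tree without them to the scheme in the table. The sketch below is illustrative only; the `naive_*` functions describe a hypothetical unprefixed design, not anything MerkleWatch does.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# Hypothetical design WITHOUT domain separation.
def naive_leaf(contents: bytes) -> bytes:
    return sha256(contents)


def naive_parent(left: bytes, right: bytes) -> bytes:
    return sha256(left + right)


# Prefixed scheme, following the table above.
def leaf(contents: bytes) -> bytes:
    return sha256(b"\x00" + sha256(contents))


def parent(left: bytes, right: bytes) -> bytes:
    return sha256(b"\x01" + left + right)


left, right = sha256(b"file A"), sha256(b"file B")
forged_contents = left + right  # a single file crafted to imitate two children

# Without prefixes the forged file hashes to the same value as the parent node.
naive_collision = naive_leaf(forged_contents) == naive_parent(left, right)
# With prefixes the two values differ, so the forgery is detected.
prefixed_collision = leaf(forged_contents) == parent(left, right)
```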


📋 Manifest Format

Manifests are JSON files containing:

{
  "merklewatch_version": "1.0.0",
  "algorithm": "sha256",
  "timestamp": 1732464642.123456,
  "timestamp_iso": "2024-11-24T16:10:42Z",
  "root_hash": "a7304db0e614521b6cd9c79bfaa8707f845c5f9f509bbc8286f040461b0820b9",
  "files": {
    "README.md": {
      "size": 1234,
      "mtime": 1732464000.0,
      "content_hash": "516ad7b388b21...",
      "leaf_hash": "8a9f3c12d45..."
    }
  },
  "directories": {
    "src": {
      "root_hash": "94eee32191b256...",
      "node_hash": "1a2b3c4d5e6f..."
    }
  }
}

See docs/manifest-format.md for full specification.
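
A manifest of this shape can be sanity-checked before it is trusted. The following sketch is an assumption-laden illustration: `check_manifest` is a hypothetical helper, the key list is taken from the example above (the full specification may mark some fields optional), and hashes are assumed to be 64 lowercase hexadecimal characters.

```python
import re

# Keys taken from the example manifest above; whether all are mandatory is an assumption.
EXPECTED_KEYS = {
    "merklewatch_version", "algorithm", "timestamp",
    "timestamp_iso", "root_hash", "files", "directories",
}
HEX64 = re.compile(r"[0-9a-f]{64}")


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - set(manifest))]
    if manifest.get("algorithm") != "sha256":
        problems.append("unexpected algorithm")
    if not HEX64.fullmatch(str(manifest.get("root_hash", ""))):
        problems.append("root_hash is not 64 lowercase hex characters")
    for path, entry in manifest.get("files", {}).items():
        for field in ("content_hash", "leaf_hash"):
            if not HEX64.fullmatch(str(entry.get(field, ""))):
                problems.append(f"{path}: bad {field}")
    return problems
```

The manifest itself can be loaded with `json.load` and passed to `check_manifest` as a dictionary.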


🗂️ Project Structure

merklewatch/
├── src/merklewatch/
│   ├── __init__.py         # Package initialization
│   ├── __main__.py         # Entry point
│   ├── cli.py              # Typer-based CLI interface
│   ├── hashing.py          # SHA-256 primitives with domain separation
│   ├── merkle.py           # Merkle tree construction logic
│   ├── filesystem.py       # Directory traversal & scanning
│   ├── manifest.py         # JSON manifest generation
│   ├── verification.py     # Verification logic
│   ├── diff.py             # Diff formatting and display
│   └── ignore.py           # Ignore rules handling
├── docs/                   # Documentation
│   ├── architecture.md     # System architecture
│   ├── manifest-format.md  # Manifest specification
│   ├── ignore-rules.md     # Ignore rules guide
│   └── examples.md         # Usage examples
├── test/                   # Test data
├── pyproject.toml          # Project metadata & dependencies
├── Makefile                # Development automation
├── CHANGELOG.md            # Version history
├── CONTRIBUTING.md         # Contribution guidelines
├── LICENSE                 # MIT License
└── README.md               # This file

🛠️ Error Handling

MerkleWatch gracefully handles common filesystem issues:

  • Permission Errors: Warns and skips inaccessible files/directories
  • Symlinks: Skips symbolic links to avoid loops and security issues
  • Empty Directories: Handles empty directories correctly
  • Large Files: Uses chunked reading (64KB) to avoid memory issues
  • Missing Files: During verification, clearly reports added/removed files
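
The traversal behaviour described in this list might look roughly like the sketch below. `iter_files` is a hypothetical name, and the warning format is illustrative rather than the tool's real output.

```python
import os
import sys
from pathlib import Path
from typing import Iterator


def iter_files(root: Path) -> Iterator[Path]:
    """Yield regular files under root, skipping symlinks and unreadable directories."""
    try:
        entries = sorted(os.scandir(root), key=lambda entry: entry.name)
    except PermissionError as exc:
        # Warn and skip instead of aborting the whole scan.
        print(f"warning: skipping {root}: {exc}", file=sys.stderr)
        return
    for entry in entries:
        if entry.is_symlink():
            continue  # never follow links, which avoids loops
        if entry.is_dir(follow_symlinks=False):
            yield from iter_files(Path(entry.path))
        elif entry.is_file(follow_symlinks=False):
            yield Path(entry.path)
```

Empty directories simply yield nothing, and sorting the entries by name keeps the traversal order stable between runs.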

🎯 Use Cases

  • 🔍 Digital Forensics: Chain-of-custody documentation with tamper-evident snapshots
  • 🔐 Security Audits: Verify configuration integrity across systems
  • 💾 Backup Verification: Ensure backup completeness and detect corruption
  • 🏗️ Reproducible Builds: Verify build outputs match expected state
  • 📊 File System Monitoring: Detect unauthorized changes in critical directories
  • 📦 Software Distribution: Verify package integrity before deployment
  • 🔄 Change Tracking: Track changes between versions with detailed diffs

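📚 Documentation

  • docs/architecture.md: System architecture
  • docs/manifest-format.md: Manifest specification
  • docs/ignore-rules.md: Ignore rules guide
  • docs/examples.md: Usage examples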


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

Inspired by the need for deterministic, cryptographically secure directory integrity verification in:

  • Digital forensics workflows
  • Secure backup systems
  • Configuration management
  • Reproducible build systems

Made with ❤️ by ADPer

⭐ Star this repo if you find it useful!

Report Bug · Request Feature
