Skip to main content

Merkle tree and hash chain utilities for immutable audit logs

Project description

merkle-tree

Merkle tree and hash chain utilities for immutable audit logs.

Features

  • Hash Chaining: Link records together with SHA-256 hashes to create tamper-evident chains
  • Merkle Trees: Compute Merkle roots for efficient batch verification
  • Batch Processing: Process multiple records as a batch with automatic sequencing
  • Chain Verification: Detect tampering by verifying hash chain integrity
  • Protocol-Based: Flexible design using Python protocols for easy integration

Installation

pip install lokryn-merkle-tree

Or with uv:

uv add lokryn-merkle-tree

Quick Start

1. Define Your Record Type

Any class can be hashed as long as it implements the HashableRecord protocol. You control exactly what data is included in the hash by implementing get_hash_content():

from dataclasses import dataclass
from merkle_tree import HashableRecord, Hasher


@dataclass
class AuditEntry:
    """An audit log entry."""

    timestamp: str
    user_id: str
    action: str
    details: str

    # Required protocol fields
    record_hash: str = ""
    prev_hash: str = ""
    batch_id: str = ""
    batch_sequence: int = 0
    batch_merkle_root: str = ""

    def get_hash_content(self) -> bytes:
        """Define what data is included in the hash."""
        return f"{self.timestamp}|{self.user_id}|{self.action}|{self.details}".encode()

2. Hash Individual Records

hasher = Hasher()

entry1 = AuditEntry(
    timestamp="2024-01-15T10:30:00Z",
    user_id="user_123",
    action="LOGIN",
    details="Successful login from 192.168.1.1"
)

entry2 = AuditEntry(
    timestamp="2024-01-15T10:31:00Z",
    user_id="user_123",
    action="VIEW",
    details="Viewed document doc_456"
)

hasher.hash_record(entry1)
hasher.hash_record(entry2)

# Records are now chained
print(entry1.record_hash)  # SHA-256 hash of entry1
print(entry2.prev_hash)    # Points to entry1's hash

3. Process Batches with Merkle Roots

hasher = Hasher()

entries = [
    AuditEntry(timestamp="2024-01-15T10:30:00Z", user_id="user_1", action="CREATE", details="..."),
    AuditEntry(timestamp="2024-01-15T10:30:01Z", user_id="user_2", action="UPDATE", details="..."),
    AuditEntry(timestamp="2024-01-15T10:30:02Z", user_id="user_1", action="DELETE", details="..."),
]

merkle_root = hasher.hash_batch(entries)

# All entries share the same batch_id
# The last entry contains the merkle_root
print(entries[-1].batch_merkle_root)

4. Verify Integrity

# Verify hash chain continuity
is_valid, breaks = Hasher.verify_chain(entries)
if not is_valid:
    print(f"Chain broken at indices: {[b['record_index'] for b in breaks]}")

# Verify batch Merkle root
is_valid, stored_root, computed_root = Hasher.verify_batch(entries)
if not is_valid:
    print(f"Merkle root mismatch: stored={stored_root}, computed={computed_root}")

API Reference

HashableRecord Protocol

A protocol defining the interface for hashable records:

Attribute Type Description
record_hash str SHA-256 hash of this record's content
prev_hash str Hash of the previous record in the chain
batch_id str UUID identifying the batch this record belongs to
batch_sequence int Zero-indexed position within the batch
batch_merkle_root str Merkle root (set only on the last record of a batch)
Method Returns Description
get_hash_content() bytes The bytes to be hashed for this record

Hasher Class

Constructor

Hasher(last_hash: str = "")

Create a new hasher. Optionally provide the last hash from an existing chain to continue it.

Properties

Property Type Description
last_hash str The hash of the most recently processed record

Methods

Method Returns Description
set_last_hash(last_hash: str) None Update the chain state
hash_record(record: T) str Hash a record and link it to the chain
hash_batch(records: list[T]) str Hash a batch and compute Merkle root
compute_merkle_root(hashes: list[str]) str Static: compute Merkle root from hashes
verify_chain(records: list[T]) tuple[bool, list[dict]] Static: verify chain continuity
verify_batch(records: list[T]) tuple[bool, str, str] Static: verify batch Merkle root

Flexible Record Types

The protocol-based design means you can hash any data structure. Here are a few examples:

# A simple log entry
@dataclass
class LogEntry:
    message: str
    level: str
    # ... protocol fields ...

    def get_hash_content(self) -> bytes:
        return f"{self.level}:{self.message}".encode()


# A financial transaction
@dataclass
class Transaction:
    from_account: str
    to_account: str
    amount: Decimal
    currency: str
    # ... protocol fields ...

    def get_hash_content(self) -> bytes:
        return f"{self.from_account}>{self.to_account}:{self.amount}{self.currency}".encode()


# A file with binary content
@dataclass
class FileRecord:
    filename: str
    content: bytes
    # ... protocol fields ...

    def get_hash_content(self) -> bytes:
        return self.filename.encode() + self.content

The get_hash_content() method gives you full control over what data contributes to the hash. Include fields that should be immutable; exclude fields that may change (like status or metadata).

How It Works

Hash Chaining

Each record's prev_hash points to the previous record's record_hash, forming a chain:

Record 1          Record 2          Record 3
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│ record_hash │◄──│ prev_hash   │   │             │
│     A       │   │     A       │◄──│ prev_hash   │
│             │   │ record_hash │   │     B       │
│ prev_hash   │   │     B       │   │ record_hash │
│     A*      │   │             │   │     C       │
└─────────────┘   └─────────────┘   └─────────────┘

* First record: prev_hash = record_hash

Merkle Tree

Batch processing computes a Merkle root for efficient verification:

                    Root
                   /    \
                  /      \
               H(AB)    H(CD)
               /  \      /  \
              A    B    C    D

A, B, C, D = record hashes
H(AB) = SHA-256(A + B)
Root = SHA-256(H(AB) + H(CD))

For odd numbers of hashes, the last hash is carried up unchanged.

Continuing an Existing Chain

To append to an existing chain, initialize Hasher with the last known hash:

# Get the last hash from your database
last_hash = db.get_last_record_hash()

# Continue the chain
hasher = Hasher(last_hash=last_hash)
hasher.hash_record(new_entry)  # Links to existing chain

Use Cases

  • Audit Logs: Create tamper-evident logs for compliance and security
  • Data Integrity: Verify that historical data hasn't been modified
  • Blockchain-like Structures: Build lightweight chain structures without full blockchain overhead
  • Document Versioning: Track document changes with cryptographic proof
  • Event Sourcing: Ensure event stream integrity

Development

Setup

# Clone the repository
git clone https://github.com/lokryn-llc/merkle-tree.git
cd merkle-tree

# Install dependencies with uv
uv sync

Running Tests

# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run only unit tests
uv run pytest tests/test_hasher.py

# Run only integration tests
uv run pytest tests/test_integration.py

Linting

uv run ruff check .
uv run ruff format .

License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lokryn_merkle_tree-0.1.0.tar.gz (29.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lokryn_merkle_tree-0.1.0-py3-none-any.whl (18.1 kB view details)

Uploaded Python 3

File details

Details for the file lokryn_merkle_tree-0.1.0.tar.gz.

File metadata

  • Download URL: lokryn_merkle_tree-0.1.0.tar.gz
  • Upload date:
  • Size: 29.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lokryn_merkle_tree-0.1.0.tar.gz
Algorithm Hash digest
SHA256 4c00ca39b13a21d1f358d8bca8304bbafcff5cb226fa521a3f0ada988eb34e7d
MD5 228d0b2c569c34ef18f6ead0a1e55244
BLAKE2b-256 7ef6f0a014f3291e13fdd7467019694e9ecc7ece005a3f6ff954f30811919eaa

See more details on using hashes here.

Provenance

The following attestation bundles were made for lokryn_merkle_tree-0.1.0.tar.gz:

Publisher: publish.yml on lokryn-llc/merkle-tree

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lokryn_merkle_tree-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for lokryn_merkle_tree-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 571676594ac0e2b2bb7afd816314239b7dd42a212d2f7b418ffdb716d3352e71
MD5 4d992258784ffc4886dbbe7617328f2a
BLAKE2b-256 eccd66aba6bbf96fd40fcf3d36d7c95be671258c5b47e7c665814875bfbb0bcc

See more details on using hashes here.

Provenance

The following attestation bundles were made for lokryn_merkle_tree-0.1.0-py3-none-any.whl:

Publisher: publish.yml on lokryn-llc/merkle-tree

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page