
hanfei-fa 法

Verify ML model weights. Know exactly what changed.


> 法不阿贵,绳不挠曲。 "The law does not favor the noble; the plumb line does not bend for the crooked." — Han Feizi (韩非子)


Hierarchical Merkle tree verification for ML model weights. Answers three questions no other tool can:

  1. "Is this model exactly what I expect?" — O(1) root hash comparison
  2. "Which layers changed after fine-tuning?" — O(k log C) tree-walk diff with layer/tensor/chunk granularity
  3. "How much bandwidth can I save with incremental sync?" — estimates show 50-70% savings for typical fine-tuning

Zero runtime dependencies. Pure Python standard library. Optional integrations with safetensors, PyTorch, HuggingFace Hub, and BLAKE3.

Why this exists

Every existing tool hashes model files as opaque blobs:

| Tool | Granularity | Diff capability | Knows model structure? |
|---|---|---|---|
| HuggingFace Hub | Whole-file SHA256 | No | No |
| HuggingFace Xet | Byte-level CDC chunks | Implicit (dedup) | No |
| Sigstore Model Signing | Whole-file SHA256 | No | No |
| DVC | Whole-file MD5 | No | No |
| PyTorch torch.save | None (CRC disabled) | No | No |
| safetensors | None (Issue #220 closed "not planned") | No | No |
| hanfei-fa | Chunk → Tensor → Layer → Model | O(k log C) tree-walk | Yes |

hanfei-fa is the only tool that understands model structure. When you fine-tune 2 of 12 transformer layers, it tells you which 2 layers changed, which tensors within them, and which chunks — without scanning the unchanged 80%.
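The structure awareness boils down to grouping tensor keys by their layer prefix. A minimal sketch of the idea (the regex and the `extract_layer_name` helper here are hypothetical stand-ins, not hanfei-fa's actual `_extract_layer_name()`):

```python
import re

def extract_layer_name(tensor_key: str) -> str:
    """Group a tensor key like 'blocks.4.attn.weight' under its layer,
    e.g. 'blocks.4'. Illustrative only: real architectures (BERT, GPT-2,
    ResNet, ...) need more naming patterns than this one regex."""
    m = re.match(r"^([A-Za-z_]+\.\d+)\.", tensor_key)
    return m.group(1) if m else tensor_key.rsplit(".", 1)[0]

keys = ["blocks.4.attn.weight", "blocks.4.attn.bias", "blocks.5.mlp.weight"]
layers = sorted({extract_layer_name(k) for k in keys})
print(layers)  # ['blocks.4', 'blocks.5']
```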

Install

```bash
pip install hanfei-fa                    # core (zero deps)
pip install hanfei-fa[safetensors]       # + safetensors support
pip install hanfei-fa[huggingface]       # + HuggingFace Hub integration
pip install hanfei-fa[torch]             # + PyTorch checkpoint support
pip install hanfei-fa[fast]              # + BLAKE3 (5-10x faster hashing)
pip install hanfei-fa[all]               # everything
```

Quick Start

Sign and verify a safetensors model

```python
from merkle_verify.safetensors_adapter import sign, verify

# Sign: builds Merkle tree, writes .merkle.json sidecar
tree = sign("model.safetensors")
print(tree.model_root)  # e14b10a8ce78b70...

# Verify: re-hashes and compares against manifest
is_valid, details = verify("model.safetensors")
# True — all tensors intact
```

Diff two model versions

```python
from merkle_verify.safetensors_adapter import diff

result = diff("base_model.safetensors", "finetuned_model.safetensors")
print(result["changed_layers"])     # ['blocks.4', 'blocks.5']
print(result["changed_params"])     # ['blocks.4.attn.weight', ...]
print(result["change_percentage"])  # 33.2%
print(result["hash_comparisons"])   # 2066 (vs 21811 total chunks)
```

Verify a single tensor (without loading the full model)

```python
from merkle_verify.safetensors_adapter import verify_tensor

is_valid, details = verify_tensor("model.safetensors", "blocks.0.attn.weight")
# Loads and hashes only this one tensor — O(tensor_size), not O(model_size)
```

Sign a HuggingFace Hub model from local cache

```python
from merkle_verify.safetensors_adapter import from_hf_repo

tree = from_hf_repo("bert-base-uncased")
# Automatically finds cached safetensors, handles sharded models
print(tree.model_root)  # golden fingerprint
print(f"{len(tree.layer_trees)} layers, verified")
```

PyTorch checkpoints

```python
from merkle_verify.pytorch_adapter import merkle_save, merkle_load

# Save with integrity manifest
merkle_save(model, "checkpoint.pt")

# Load with automatic verification
state_dict, details = merkle_load("checkpoint.pt")
assert details["verified"]  # weights match manifest
```

Use BLAKE3 for faster hashing

```python
from merkle_verify import set_default_algorithm, HashAlgorithm

set_default_algorithm(HashAlgorithm.BLAKE3)  # 5-10x faster than SHA-256
# All subsequent operations use BLAKE3 automatically
```
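Outside the library, the same graceful-degradation pattern looks like this (a generic sketch; `set_default_algorithm` handles the selection inside hanfei-fa, and the `digest` helper below is illustrative):

```python
import hashlib

# Prefer BLAKE3 when installed, fall back to SHA-256 otherwise.
try:
    from blake3 import blake3 as _fast_hash  # pip install blake3

    def digest(data: bytes) -> str:
        return _fast_hash(data).hexdigest()
except ImportError:
    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

print(len(digest(b"model chunk")))  # 64 — both algorithms emit 64 hex chars
```

Because both algorithms produce 256-bit digests, downstream code that stores or compares hex strings works unchanged whichever branch is taken.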

Stream-hash a large file (constant memory)

```python
from merkle_verify import build_file_merkle_tree

tree = build_file_merkle_tree("70b-model.safetensors")
# O(chunk_size) memory, not O(file_size). Works on multi-GB files.
```
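The constant-memory behavior comes from updating the hasher one fixed-size chunk at a time instead of reading the whole file. A generic sketch with `hashlib` (the `stream_sha256` helper is illustrative; `build_file_merkle_tree` additionally records per-chunk hashes to build the tree):

```python
import hashlib

def stream_sha256(path: str, chunk_size: int = 16 * 1024) -> str:
    """Hash a file in 16 KB chunks: memory use is O(chunk_size)
    regardless of file size."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

The result is byte-for-byte identical to hashing the whole file in one call, so streaming costs nothing in fidelity.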

CLI

```bash
merkle-verify hash model.safetensors          # Merkle root hash
merkle-verify sign model.safetensors          # Build tree + write .merkle.json
merkle-verify verify model.safetensors        # Check against manifest (exit 0/1)
merkle-verify diff base.safetensors ft.safetensors  # Layer-aware diff
merkle-verify info model.merkle.json          # Show manifest details
merkle-verify hf-sign bert-base-uncased       # Sign from HF cache
```

How it works

A 4-level hierarchical Merkle tree mirrors the structure of a neural network:

```text
                Model Root
               /          \
        Layer 0            Layer 1         ...  Layer N
       /       \          /       \
  attn.weight  attn.bias  mlp.weight  mlp.bias
   /  |  \       |         /  |  \      |
  c0  c1  c2    c0       c0  c1  c2    c0       ← 16KB chunks
```
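In miniature, that bottom-up computation can be sketched with nothing but `hashlib` (a toy illustration with made-up tensor bytes; hanfei-fa's real tree construction and manifest format differ):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(hashes: list[bytes]) -> bytes:
    """Combine a level's hashes pairwise until one root remains."""
    while len(hashes) > 1:
        if len(hashes) % 2:  # odd count: promote the last hash
            hashes.append(hashes[-1])
        hashes = [h(hashes[i] + hashes[i + 1])
                  for i in range(0, len(hashes), 2)]
    return hashes[0]

# Toy model: layer -> tensor -> raw bytes (stand-ins for weight data)
model = {
    "blocks.0": {"attn.weight": b"\x01" * 40_000, "attn.bias": b"\x02" * 100},
    "blocks.1": {"mlp.weight": b"\x03" * 40_000, "mlp.bias": b"\x04" * 100},
}

CHUNK = 16 * 1024  # 16 KB leaf chunks, as in the diagram

layer_roots = []
for layer in model.values():
    tensor_roots = []
    for data in layer.values():
        chunks = [h(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
        tensor_roots.append(merkle_root(chunks))
    layer_roots.append(merkle_root(tensor_roots))

model_root = merkle_root(layer_roots).hex()
print(model_root)  # deterministic 64-char fingerprint
```

Any flipped byte in any chunk changes its leaf hash and propagates up through tensor, layer, and model roots, which is why a single root comparison suffices for verification.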

Verification: Compare root hashes — O(1).

Diff: Walk both trees in parallel. If a subtree's hash matches, skip it entirely. Only descend into subtrees that differ. Complexity: O(k log C) where k = changed chunks, C = total chunks.

Pruning in practice: Fine-tune 1 of ResNet-18's 60 parameter tensors and the diff performs 264 hash comparisons instead of scanning all 2,953 chunks — a 91% reduction.
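The pruning walk itself is a short recursion: compare hashes, stop at matching subtrees, descend only into mismatches. A toy sketch (the node layout and the `diff` helper are illustrative, not hanfei-fa's API):

```python
import hashlib

def node(data=None, children=None):
    """Minimal tree node: a hash plus named children (leaf = no children)."""
    if children:
        payload = b"".join(c["hash"] for c in children.values())
        return {"hash": hashlib.sha256(payload).digest(), "children": children}
    return {"hash": hashlib.sha256(data).digest(), "children": {}}

def diff(a, b, path=""):
    """Walk two trees in parallel; skip any subtree whose hashes match."""
    if a["hash"] == b["hash"]:
        return []                    # identical subtree: prune, never descend
    if not a["children"]:
        return [path]                # differing leaf chunk
    changed = []
    for name in a["children"]:
        changed += diff(a["children"][name], b["children"][name],
                        f"{path}/{name}" if path else name)
    return changed

leaves_a = {k: node(data=v)
            for k, v in {"c0": b"a", "c1": b"b", "c2": b"c", "c3": b"d"}.items()}
leaves_b = dict(leaves_a, c2=node(data=b"X"))   # one changed chunk
print(diff(node(children=leaves_a), node(children=leaves_b)))  # ['c2']
```

With one changed chunk out of four, the walk touches the root plus its four children and never hashes the unchanged leaves' data, which is exactly the O(k log C) behavior at miniature scale.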

Performance

Tested on real models:

| Model | Params | Build time | Diff (1% change) | Hash comparisons |
|---|---|---|---|---|
| ResNet-18 | 11.7M | 0.03s | 0.1ms | 264 / 2,953 |
| BERT-base | 110M | 1.2s | | |
| GPT-2 scale (340MB) | | 0.8s | 1.0ms | 2,066 / 21,811 |
| Streaming 512MB | | 0.9s | | |

Supported hash algorithms

| Algorithm | Output | Speed | Install |
|---|---|---|---|
| SHA-256 (default) | 64 hex | Baseline | Built-in |
| SHA-512 | 128 hex | ~Same | Built-in |
| SHA3-256 | 64 hex | ~Same | Built-in |
| BLAKE2b | 128 hex | ~Same | Built-in |
| BLAKE3 | 64 hex | 5-10x faster | pip install hanfei-fa[fast] |
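The built-in rows map directly onto Python's `hashlib`; a quick way to confirm the output widths:

```python
import hashlib

for name in ("sha256", "sha512", "sha3_256", "blake2b"):
    digest = hashlib.new(name, b"model chunk").hexdigest()
    print(f"{name}: {len(digest)} hex chars")
# sha256: 64, sha512: 128, sha3_256: 64, blake2b: 128
```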

Part of the HanFei (韩非) series

This project is part of a family of open-source tools for verifiable AI:

| Project | Role | Language | Install |
|---|---|---|---|
| hanfei-shu 术 | GPU-accelerated MSM for ZK proofs | Rust + CUDA | cargo add hanfei-shu |
| hanfei-fa 法 (this) | Model weight integrity verification | Python | pip install hanfei-fa |

The names come from Han Feizi's (韩非子) political philosophy:

  • 法 (fa) — Law: objective, deterministic verification. A hash doesn't lie.
  • 术 (shu) — Technique: the computational machinery that makes proofs fast.

Contributing

Contributions are welcome and appreciated. This project grows through community involvement.

How to contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-idea)
  3. Make your changes with tests
  4. Submit a pull request

All PRs are reviewed and merged regularly. We especially welcome:

  • New model architecture support in _extract_layer_name()
  • Chunking strategy improvements (content-defined chunking, etc.)
  • Performance optimizations
  • Documentation and examples
  • Integration with other ML frameworks (JAX, TensorFlow, ONNX)

Found this useful? Please consider:

  • Giving a star on GitHub
  • Citing the project if you use it in your work:
```bibtex
@software{hanfei_fa,
  author = {Geoffrey Wang},
  title = {hanfei-fa: ML Model Weight Integrity Verification via Hierarchical Merkle Trees},
  year = {2026},
  url = {https://github.com/GeoffreyWang1117/hanfei-fa},
}
```

License

Apache-2.0 — Copyright 2026 Geoffrey Wang
