Easy-to-use Merkle tree library.
Project description
Merkle Structures
This is a simple-to-use implementation of the concept of Merklized data
structures, e.g. the Merkle Tree and the Merkle Mountain Range. There are
several main classes: Tree, XorHashTree, VirtualMachine, and OpCode; two
error classes; two configuration functions; and a few miscellaneous helper
functions. See the Usage section for details. This uses sha256 as the default
hash algorithm, but it can use any in theory.
This package uses a virtual machine for proof verification: proofs are created in a bytecode form, and the bytecode is then fed through a virtual machine with several registers: left, right, path, bit, final, size, and return.
Status
Open issues planned for future releases can be found here. Historical changes can be found in the changelog.
Installation
pip install merkleasy
Classes
ImplementationError(BaseException)SecurityError(BaseException)TreeXorHashTreeVirtualMachineOpCode
Functions and Methods
set_hash_function(hash_function: Callable[[bytes], bytes]) -> Noneget_hash_function() -> Callablecompile(proof: list[bytes]) -> bytesdecompile(proof: bytes) -> list[bytes]hash_node(left: bytes, right: bytes) -> byteshash_leaf(leaf: bytes) -> bytes
Tree
__init__(self, left: Tree | bytes, right: Tree | bytes) -> None__str__(self) -> str__repr__(self) -> str__eq__(self, other: object) -> bool__hash__(self) -> intto_dict(self) -> dictto_json(self, pretty: bool = False) -> str@classmethod from_leaves(cls, leaves: list[bytes]) -> Tree@classmethod from_dict(cls, data: dict) -> Tree@classmethod from_json(cls, data: str) -> Treeprove(self, leaf: bytes, verbose: bool = False) -> bytes@staticmethod verify(root: bytes, leaf: bytes, proof: bytes, report_errors: bool = False) -> bool|tuple[bool, list[BaseException]]
XorHashTree
__init__(self, left: XorHashTree | bytes, right: XorHashTree | bytes) -> None__str__(self) -> str__repr__(self) -> str__eq__(self, other: object) -> bool__hash__(self) -> intto_dict(self) -> dictto_json(self, pretty: bool = False) -> str@classmethod from_leaves(cls, leaves: list[bytes]) -> XorHashTree@classmethod from_dict(cls, data: dict) -> XorHashTree@classmethod from_json(cls, data: str) -> XorHashTreeprove(self, leaf: bytes, verbose: bool = False) -> bytes@staticmethod verify(root: bytes, leaf: bytes, proof: bytes, report_errors: bool = False) -> bool|tuple[bool, list[BaseException]]
Usage
Usage examples are shown below. The Tree class is a classic Merkle tree. The
XorHashTree class is a variant of the Tree class that joins branches by
XORing their hashes, but its use is practically identical to the Tree class.
More thorough documentation can be found in the dox.md file generated by autodox.
Tree.from_leaves
The easiest way to use this to create a Merkle Tree is with from_leaves:
from merkleasy import Tree
leaves = [b'leaf1', b'leaf2', b'leaf3', b'leaf4', b'etc']
tree = Tree.from_leaves(leaves)
Note that all leaves are hashed by the from_leaves method.
Raises ValueError or TypeError upon invalid input.
Tree.__init__
To make custom Merklized data structures, use the __init__ method:
from merkleasy import Tree
leaf1 = b'leaf1'
leaf2 = b'leaf2'
leaf3 = b'leaf3'
leaf4 = b'leaf4'
leaf5 = b'leaf5'
another_whole_tree = Tree.from_leaves([b'123', b'456', b'789'])
tree = Tree(
Tree(
leaf1,
Tree(
Tree(leaf2, leaf3),
Tree(leaf4, leaf5)
)
),
another_whole_tree
)
Raises TypeError upon invalid input.
Tree.to_dict and Tree.from_dict
A Tree structure can be converted to a dict and back.
from merkleasy import Tree
tree = Tree.from_leaves([b'leaf1', b'leaf2', b'leaf3'])
serialized = tree.to_dict()
deserialized = Tree.from_dict(serialized)
assert deserialized == tree
Tree.from_dict raises any of the following upon invalid input:
TypeErrorValueErrorSecurityError
Tree.to_json and Tree.from_json
Serialization and deserialization of a structure uses to_json and from_json:
from merkleasy import Tree
tree = Tree.from_leaves([b'leaf1', b'leaf2', b'leaf3'])
serialized = tree.to_json()
pretty = tree.to_json(pretty=True)
deserialized = Tree.from_json(serialized)
assert deserialized == Tree.from_json(pretty)
Tree.from_json raises any of the following upon invalid input:
json.decoder.JSONDecodeErrorTypeErrorValueErrorSecurityError
Tree.prove
Inclusion proofs can be generated using the prove method:
from merkleasy import Tree
tree = Tree.from_leaves([b'leaf1', b'leaf2', b'leaf3'])
proof = tree.prove(b'leaf2')
Each inclusion proof is a sequence of bytes, which are the compiled byte codes
of OpCodes and their arguments to be executed by the proof-verifying Virtual
Machine. An optional parameter, verbose, can be set to True in the call to
prove if the proof should include checks for intermediate values; if verbose
is left to the default False value, a shorter proof that checks only the final
hash will be produced. There are no security advantages to using verbose proofs;
it is primarily useful for manual inspection by including intermediate,
calculated values. However, the functionality exists as a side-effect of
preventing malicious proofs from tricking the validator --
test_Tree_verify_does_not_validate_malicious_proof contains an example attack.
Raises TypeError or ValueError upon invalid input.
Tree.verify
Inclusion proofs can be verified using Tree.verify:
from merkleasy import Tree, SecurityError
tree = Tree.from_leaves([b'leaf1', b'leaf2', b'leaf3'])
leaf = b'leaf1'
proof1 = tree.prove(leaf)
proof2 = tree.prove(b'leaf2')
# Example 1: type error
try:
if Tree.verify('some string', leaf, proof1):
print(f'verified proof {proof1.hex()} for {leaf=}')
else:
print(f'proof {proof1.hex()} for {leaf=} is invalid')
except TypeError as e:
# expected result
print(f'invalid type supplied as an input: {e}')
# Example 2: valid proof
if Tree.verify(tree.root, leaf, proof1):
# expected result
print(f'verified proof {proof1.hex()} for {leaf=}')
else:
print(f'proof {proof1.hex()} for {leaf=} is invalid')
# Example 3: invalid (malformed) proof
result = Tree.verify(tree.root, leaf, b'\x99' + proof2, report_errors=True)
if result[0]:
print(f'verified proof {proof2.hex()} for {leaf=}')
else:
# expected result
print(f'errors encountered: {result[1]}')
assert type(result[1][0]) is ValueError, type(result[1][0])
# note that for XorHashTree, the error type will be SyntaxError
# Example 4: invalid proof (wrong leaf)
result = Tree.verify(tree.root, leaf, proof2, report_errors=True)
if result[0]:
print(f'verified proof {proof2.hex()} for {leaf=}')
else:
# expected result
print(f'errors encountered: {result[1]}')
assert type(result[1][0]) is SecurityError
This static method parses the proof, interpreting as a sequence of byte codes
containing OpCodes and their arguments. It ensures that the proof starts
with the leaf and ends with the root, and then it follows the proof operations.
Raises TypeError when provided invalid parameters. If all type checks pass, it
executes without error and returns True or False. If report_errors is set
to True, then it returns a tuple of (status, errors), and those errors will
be those encountered by the virtual machine while attempting to execute the
proof.
get_hash_function
To access the currently-set hash function, use the following:
from merkleasy import get_hash_function
hash_function = get_hash_function()
print(hash_function(b'abc').hex())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
set_hash_function
The package uses sha256 by default, but it can be used with any hash function. This is accomplished by passing a Callable that takes a bytes parameter, applies a hash algorithm, and returns a bytes value. For example, to use sha3_256:
from hashlib import sha3_256
from merkleasy import set_hash_function, get_hash_function
set_hash_function(lambda preimage: sha3_256(preimage).digest())
print(get_hash_function()(b'abc'))
# 3a985da74fe225b2045c172d6bd390bd855f086e3e9d525b46bfe24511431532
Raises ImplementationError if the Callable parameter passed in does not meet
the requirements or TypeError if the parameter is not a Callable.
Note that calling set_hash_function will have no effect on any Trees created
prior. However, it will affect any calls to Tree.verify with proofs from
those Trees. If you plan to use the library with a custom hash function, then
set_hash_function should be called during a setup routine.
If you want to handle multiple Trees created with different hash algorithms,
then a context handler like the below might be useful:
from hashlib import sha3_256
from merkleasy import set_hash_function, get_hash_function, Tree
class HashAlgoSwitch:
"""Context manager for switching out algorithms for Tree use."""
def __init__(self, new_hash_function) -> None:
self.original_hash_function = get_hash_function()
set_hash_function(new_hash_function)
def __enter__(self) -> None:
return
def __exit__(self, __exc_type, __exc_value, __traceback) -> None:
set_hash_function(self.original_hash_function)
def alt_create_tree(leaves) -> Tree:
with HashAlgoSwitch(lambda data: sha3_256(data).digest()):
return Tree.from_leaves(leaves)
leaves = [b'one', b'two', b'three']
tree1 = Tree.from_leaves(leaves)
tree2 = alt_create_tree(leaves)
assert tree1 != tree2
Security / Usage Notes
Second-preimage attack
Any application/library that uses this package should use a schema for leaves that is anything except exactly 32 bytes. This prevents the second-preimage attack whereby the application is tricked into thinking that an intermediate node in the tree is a leaf. It is hard to envision a scenario in which this could actually become a serious security issue, but it is worth keeping in mind during application development.
So in addition to verifying an inclusion proof, verify that the data fits the leaf schema. For example, leaf schema could be non-bytes, and a serializer to bytes from the schema could be used on the leaf before verifying the inclusion proof. Another option is to hash each leaf and prepend any arbitrary byte to make each leaf 33 bytes long, allowing for the hash to be verified as a leaf without requiring the full leaf, which will maintain concise proofs.
Mirror trees (XorHashTree)
All XorHashTrees with identical left and right branches ("mirror trees") will
have the same root, regardless of what those branches are, which means that
inclusion of any arbitrary branch can be proved for any mirror tree. This is
because xor(left, right) == 0 when left == right. If you use XorHashTree
mirror trees, they will be exploitable. This is not the case for the basic
Merkle Tree class. If checking to ensure that mirror trees are not created
within an application is not desirable, use the Tree class instead.
(This package does not generate an error when a mirror tree is created becase it may have uses in some niche scenarios.)
Testing
To develop or test, fork or clone the repo.
Windows Setup
python -m venv venv/
source venv/Scripts/activate
*nix Setup
python -m venv venv/
source venv/bin/activate
Running Tests
There are several test files. Run them with the following:
python tests/test_classes.py
python tests/test_vm.py
python tests/test_xorhashtree.py
These files demonstrate all the intended behaviors of the class and rule out
many unintended behaviors. They use randint and many repetitions to ensure
that the test suite is thorough. The tests are also a form of technical
documentation; any questions about the code can likely be answered by reading
through them.
There are a total of 52 tests currently for completed features.
There are several additional test files that are a combination of an unfinished feature and leftovers from an old project that requires updates. Once those are resolved, this notice will be removed.
ISC License
Copyleft (c) 2025 Jonathan Voss (k98kurz)
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyleft notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file merkleasy-0.1.1.tar.gz.
File metadata
- Download URL: merkleasy-0.1.1.tar.gz
- Upload date:
- Size: 31.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.0rc1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d90030f3def392707da2c17109a29225a6ea3805ec1004627abe4561130514a9
|
|
| MD5 |
c84fb5847ece0cf952699097e297f7a2
|
|
| BLAKE2b-256 |
740886724b415cbfa882ebd06794edc9321c624283034fcf143d16e43a46af55
|
File details
Details for the file merkleasy-0.1.1-py3-none-any.whl.
File metadata
- Download URL: merkleasy-0.1.1-py3-none-any.whl
- Upload date:
- Size: 24.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.0rc1
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8506d5053a4d78d1c0780785e1fa5546eb88b45eb57e03925c6396bbecd44934
|
|
| MD5 |
5d1a759c1043600c0520dc4adf5dd8b6
|
|
| BLAKE2b-256 |
a841d82e09634ef9f26f768b806fca3a3a689660be6541028bc269e2e024b7b3
|