Skip to main content

Merkle-tree cryptography in python

Project description

pymerkle

Merkle-tree in Python

Build Status Docs Status PyPI version Python >= 3.7

Documentation at pymerkle.readthedocs.org.

Storage agnostic implementation capable of generating inclusion and consistency proofs.

Install

pip3 install pymerkle

This will also install cachetools as a dependency.

Basic API

Let MerkleTree be any class implementing the BaseMerkleTree interface; e.g.,

from pymerkle import InmemoryTree as MerkleTree

tree = MerkleTree(algorithm='sha256')

Append data into the tree and retrieve the corresponding hash value:

index = tree.append_entry(b'foo')   # leaf index

value = tree.get_leaf(index)        # leaf hash

Current tree size:

size = tree.get_size()    # number of leaves

Current and intermediate states:

state = tree.get_state()    # current root-hash

state = tree.get_state(5)   # root-hash of size 5 subtree

Inclusion proof

Prove inclusion of the 3-rd leaf hash in the subtree of size 5:

proof = tree.prove_inclusion(3, 5)

Verify the proof against the base hash and the subtree root:

from pymerkle import verify_inclusion

base = tree.get_leaf(3)
root = tree.get_state(5)

verify_inclusion(base, root, proof)

Consistency proof

Prove consistency between the states with size 3 and 5:

proof = tree.prove_consistency(3, 5)

Verify the proof against the respective root-hashes:

from pymerkle import verify_consistency

state1 = tree.get_state(3)
state2 = tree.get_state(5)

verify_consistency(state1, state2, proof)

Supported hash functions

sha224, sha256, sha384, sha512, sha3_224, sha3_256, sha3_384, sha3_512

Support for Keccak beyond SHA3

Installing pysha3==1.0.2 makes available the following hash functions:

keccak_224, keccak_256, keccak_384, keccak_512

Security

This library requires security review.

Resistance against second-preimage attack

This consists in the following standard technique:

  • Upon computing the hash of a leaf node, prepend 0x00 to the payload
  • Upon computing the hash of an interior node, prepend 0x01 to the payload

Note: For, say, testing purposes, you can disable this feature by passing disable_security=True when initializing the BaseMerkleTree superclass. Refer here to see how to perform second-preimage attack against the present implementation.

Resistance against CVE-2012-2459 DOS

Contrary to the bitcoin specification, lonely leaves are not duplicated while the tree is growing. Instead, a bifurcation node is created at the rightmost branch (see next section). As a consequence, the present implementation should be invulnerable to the CVE-2012-2459 DOS attack (see also here for insight).

Topology

Interior nodes are not assumed to be stored anywhere and no concrete links are created between them. The tree structure is determined by the recursive function which computes intermediate states on the fly and is essentially the same as RFC 9162 (Section 2). It turns out to be that of a binary Sakura tree (Section 5.4).

Optimizations

The performance of a Merkle-tree depends on how efficiently it computes the root-hash for arbitrary leaf ranges on the fly. The recursive version of this operation (e.g., RFC 9162, Section 2) is slow.

A key remark is that the above operation can be made iterative by combining the root-hashes for ranges whose size is a power of two ("subroots") and can as such be computed efficiently. Subroot computation has significant impact on performance (>500% speedup) while keeping peak memory usage reasonably low (e.g., 200 MiB for a tree with several tens of millions of entries) and linear with respect to tree size.

Note: For, say, comparison purposes, you can disable this feature by passing disable_optimizations=True when initializing the BaseMerkleTree superclass.

Caching

In view of the above technique, subroot computation is the only massively repeated and relatively costly operation. It thus makes sense to apply memoization for ranges whose size exceeds a certain threshold (128 leaves by default). For example, after sufficiently many cache hits (e.g. 2MiB cache memory), proof generation becomes 5 times faster for a tree with several tens of million of entries. Practically, a pretty big tree with sufficiently long uptime will respond instantly with negligible penalty in memory usage.

Note: For, say, comparison purposes, you can disable this feature by passing disable_cache=True when initializing the BaseMerkleTree superclass.

Storage

This library is unopinionated on how leaves are appended to the tree, i.e., how data is stored in concrete. Cryptographic functionality is encapsulated in the BaseMerkleTree abstract class, which admits pluggable storage backends through subclassing. It is the the developer's choice to decide how to store data by implementing the interior storage interface of this class. Any contiguously indexed dataset should do the job. Conversely, given any such dataset, we should be able to trivially implement a Merkle-tree that is operable with it.

Example

This is a simple non-persistent implementation utilizing a list as storage. It expects entries to be strings, which it encodes in utf-8 before hashing.

from pymerkle import BaseMerkleTree


class MerkleTree(BaseMerkleTree):

    def __init__(self, algorithm='sha256'):
        """
        Storage setup and superclass initialization
        """
        self.hashes = []

        super().__init__(algorithm)


    def _encode_entry(self, data):
        """
        Prepares data entry for hashing
        """
        return data.encode('utf-8')


    def _store_leaf(self, data, digest):
        """
        Stores data hash in a new leaf and returns index
        """
        self.hashes += [digest]

        return len(self.hashes)


    def _get_leaf(self, index):
        """
        Returns the hash stored by the leaf specified
        """
        value = self.hashes[index - 1]

        return value


    def _get_leaves(self, offset, width):
        """
        Returns hashes corresponding to the specified leaf range
        """
        values = self.hashes[offset: offset + width]

        return values


    def _get_size(self):
        """
        Returns the current number of leaves
        """
        return len(self.hashes)

Development

In what follows, you need to have locally installed dev requirements:

pip3 install -r requirements-dev.txt

Tests

./test.sh [--help]

Performance

In order to capture the effect of I/O operations, performance measurements are run against a SQLite database as leaf storage. Create it using the following script:

python benchmarks/init_db.py [--help]

Benchmarks

./benchmark.sh [--help]

Profiling

Assuming valgrind and massif-visualizer are installed, use

./profile.sh [--help]

to do memory profiling. Pass --time to profile execution times instead of memory allocations.

Documentation

pymerkle.readthedocs.org.

Build locally

Documentation is built with sphinx.

Assuming dev requirements have been installed, build the docs with

./build-docs.sh [--help]

and browse at

docs/target/build/html/index.html

to view them.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pymerkle-6.1.0.tar.gz (36.2 kB view details)

Uploaded Source

Built Distribution

pymerkle-6.1.0-py3-none-any.whl (34.3 kB view details)

Uploaded Python 3

File details

Details for the file pymerkle-6.1.0.tar.gz.

File metadata

  • Download URL: pymerkle-6.1.0.tar.gz
  • Upload date:
  • Size: 36.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for pymerkle-6.1.0.tar.gz
Algorithm Hash digest
SHA256 bf43addba1c49da4f7fd905a6836f58f9e99294b2b34c6460b24f53156ed285a
MD5 b6cf7d1d5362b2e96e988667828ef1d3
BLAKE2b-256 6190b96b3af828426da138107270e6942b5aac9525c6b1eddec73c9ecd63134f

See more details on using hashes here.

File details

Details for the file pymerkle-6.1.0-py3-none-any.whl.

File metadata

  • Download URL: pymerkle-6.1.0-py3-none-any.whl
  • Upload date:
  • Size: 34.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for pymerkle-6.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9ba32e71fc9bd53a65c2815414aca9e3c43b9d0376b3c73d41cb7b3814267a0a
MD5 387345f53311fa75b597c0e005961101
BLAKE2b-256 1c460b5c80ee9912fcd174d112c00f95d12f19e88dec00177c6b3a51b1330d4f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page