Skip to main content

A Python library for constructing Merkle Trees and validating Log Proofs

Project description

pymerkle: A Python library for constructing Merkle Trees and validating Proofs

Build Status Docs Status PyPI version Python >= 3.6

Complete documentation can be found at pymerkle.readthedocs.org.

This library implements a class for binary balanced Merkle-trees (with possibly odd number of leaves) capable of generating audit-proofs, consistency-proofs and inclusion-tests. It supports all hash functions (including SHA3 variations) and encoding types, whereas defense against second-preimage attack is by default activated. It further provides flexible mechanisms for validating the generated proofs and thus easy verification of encrypted data.

It is a zero dependency library (with the inessential exception of tqdm for displaying progress bars).

Installation

pip3 install pymerkle --pre

Quick example

See also Usage and API

from pymerkle import *                    # Import MerkleTree, validateProof and ProofValidator

tree = MerkleTree()                       # Create empty SHA256/UTF-8 Merkle-tree with
                                          # defense against second-preimage attack

# Successively update the Merkle-tree with one hundred records

for i in range(100):
    tree.encryptRecord(bytes('{}-th record'.format(i), 'utf-8'))

# Generate some audit-proofs

p = tree.auditProof(b'12-th record')      # Audit proof based on a given record
q = tree.auditProof(55)                   # Audit-proof based upon the 56-th leaf

# Quick validation of the above proofs

validateProof(target_hash=tree.rootHash(), proof=p) # True
validateProof(target_hash=tree.rootHash(), proof=q) # True

# Store the tree's current state (root-hash and length) for later use

old_hash = tree.rootHash()
sublength = tree.length()

# Further encryption of files and objects

tree.encryptObject({'a': 0, 'b': 1})      # One new leaf storing the digest of the given object
tree.encryptFileContent('path_to_file')   # One new leaf storing the digest of the given file's content
tree.encryptFilePerLog('logs/sample_log') # Encrypt file per log (one new leaf for each line)

# Generate consistency-proof for the state before the above encryptions

r = tree.consistencyProof(old_hash, sublength)

# Create object for refined proof validation, validate proof and generate receipt

validator = ProofValidator()     
validation_receipt = validator.validate(target_hash=tree.rootHash(), proof=r)

Encryption modes

Encryption of plain text (string, bytes, bytearray), JSON objects (dict) and files is supported. Use according to convenience any of the following methods of the MerkleTree class (all of them invoking internally the .update() method for appending newly created leaves):

.encryptRecord(), .encryptFileConent(), .encryptFilePerLog(), .encryptObject(), .encryptObjectFromFile(), .encryptFilePerObject()

See API or Usage for details about arguments and precise functionality.

Proof validation

Direct validation of a Merkle-proof is performed usind the validateProof() function, modifying the status of the inserted proof appropriately and returning the corresponding boolean. A more elaborate validation procedure includes generating a receipt with the validation result and storing at will the generated receipt as a .json file. This is achieved using the .validate() method of the ProofValidator like in the above quick example.

See API or Usage for details about arguments and precise functionality.

Exporting and reloading the tree from a file

Given an instance of the MekleTree class, the minimum required information can be exported using the .export() method into a .json file, so that the Merkle-tree can be reloaded in its current state from that file using the .loadFromFile() static method. This can be useful for transmitting the tree's current state to a trusted party or retrieving the tree from a backup file. Reconstruction of the tree is uniquely determined by the sequence of stored hashes (see the next section Tree structure to understand why).

See API or Usage for details about arguments and precise functionality.

Tree structure

Contrary to most implementations, the Merkle-tree is here always binary balanced, with all nodes except for the exterior ones (leaves) having two parents. This is achieved as follows: upon appending a block of new leaves, instead of promoting a lonely leaf to the next level or duplicating it, a bifurcation node gets created so that trees with the same number of leaves have always identical structure and input clashes among growing strategies be avoided (independently of the configured hash and encoding types). This standardization is further crucial for:

  • fast generation of consistency-proof paths (based on additive decompositions in decreasing powers of 2)
  • fast recalculation of the root-hash after appending a new leaf, since only the hashes at the tree's left-most branch need be recalculated
  • memory efficiency, since the height as well as total number of nodes with respect to the tree's length is controlled to the minimum. For example, a tree with 9 leaves has 17 nodes in the present implementation, whereas the total number of nodes in the structure described here is 20.

The topology is namely identical to that of a binary Sekura tree, depicted in Section 5.4 of this paper. Follow the straightforward algorithm of the MerkleTree.update() method for further insight.

Deviation from bitcoin specification

In contrast to the bitcoin specification for Merkle-trees, lonely leaves are not duplicated in order for the tree to remain genuinely binary. Instead, creating bifurcation nodes at the rightmost branch allows the tree to remain balanced upon any update. As a consequence, even if security against second-preimage attack (see below) were deactivated, the current implementation is by structure invulnerable to length-extension attacks due to the vulnerability described here (reported as CVE-2012-2459). Using Merkle-trees without duplicate entries further reduces the risk of bugs in protocols based upon them.

Defense against second-preimage attack

Defense against second-preimage attack is by default activated. Roughly speaking, it consists in the following security measures:

  • Before calculating the hash of a leaf, prepend the corresponding record with the null hexadecimal 0x00

  • Before calculating the hash any interior node, prepend both of its parents' hashes with the unit hexadecimal 0x01

(See here or here for some insight). In order to deactivate defense against second-preimage attack, set the security kwarg equal to False at construction:

tree = MerkleTree(security=False)

Read the tests/test_defense.py file inside the project's repository to see how to perform second-preimage attacks against the current implementation.

Running tests

You need to have installed pytest. From inside the root directory run the command

pytest tests/

to run all tests. This might take up to 2-4 minutes, since crypto parts of the code are tested against all possible combinations of hash algorithm and encoding type. You can run only a specific test file, e.g., test_encryption.py, with the command

pytest tests/test_encryption.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pymerkle-3.0.1b0.tar.gz (31.5 kB view hashes)

Uploaded Source

Built Distribution

pymerkle-3.0.1b0-py3-none-any.whl (40.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page