Binary Merkle Tree
Usage
The binary hash tree facilitates equality checks over a list of arbitrary data blobs.
Each tree anchors in a 32 byte root hash which is constructed by recursively hashing pairs of two tree nodes into one.
Merkle proofs, which equate arbitrary data against entries in the tree, are of succinct size irrespective of the input data.
Structure
- Each tree node is identified by a SHA-256 hash.
- Two types of nodes exist: Leaf and intermediate nodes.
- The pre-image of a leaf node is the byte
0x00
followed by an arbitrary amount of data. - The pre-image of an intermediate node is the byte
0x01
followed by two 32 byte hashes, each referring to a node.
- The pre-image of a leaf node is the byte
- Each node has one or zero parent intermediate nodes.
- Referring to the same node twice in the same intermediate node is permitted.
- The graph of nodes represents a binary tree, thus is acyclic and has one root node.
Algorithms
Leaf Order
The leaf order of a tree is defined by depth-first search traversal starting at the root, counting only leaf nodes.
Example
The leaf order of the tree in figure 1 is [Lβ, Lα, Lδ]
.
Figure 1: Non-canonical tree with three leaf nodes and two intermediate nodes
Iε
/ \
Iγ Lδ
/ \
Lβ Lα
Level Order
The level order of a tree is defined by depth-first search starting at the root, counting only nodes with a given level.
Example
The tree in figure 1 has the following level orders:
0: [Iε]
1: [Iγ, Lδ]
2: [Lβ, Lα]
Canonical Construction
The construction algorithm deterministically creates a tree structure over a list of items.
Determinism ensures that independently constructed trees over the same items are identical. This is required for equality and membership checks.
The canonical tree layout for any arbitrary list of items is defined by the following invariants:
- Each list item corresponds to one leaf node.
- The ordering of list items matches the order of leaf nodes.
- Each leaf node is in the deepest tree level.
- For any level
l
with number of nodesn(l)
, ifn(l) % 2 == 1 and n(l) > 1
, then the last node in levell-1
is an intermediate node that contains the hash of the last node inl
twice.
Example
Figure 2 shows the canonical construction of items [L0, L1, L2, L3, L4]
.
Figure 2: Canonical tree with 5 items
Iζ
/ \
/ \
Iδ Iε
/ \ \\
/ \ \\
Iα Iβ Iγ
/ \ / \ ||
L0 L1 L2 L3 L4
Contents of nodes:
L0 := sha256(concat(0x00, data[0]))
L1 := sha256(concat(0x00, data[1]))
L2 := sha256(concat(0x00, data[2]))
L3 := sha256(concat(0x00, data[3]))
L4 := sha256(concat(0x00, data[4]))
Iα := sha256(concat(0x01, hash(L0), hash(L1)))
Iβ := sha256(concat(0x01, hash(L2), hash(L3)))
Iγ := sha256(concat(0x01, hash(L4), hash(L4)))
Iδ := sha256(concat(0x01, hash(Iα), hash(Iβ)))
Iε := sha256(concat(0x01, hash(Iγ), hash(Iγ)))
Iζ := sha256(concat(0x01, hash(Iδ), hash(Iε)))
List Equality Check
Checking the equality of two merkle trees is trivial: Comparing the hash of the roots of either tree.
Security
No practical collision attacks against SHA-256 are known as of Oct 2022.
Collision resistance is vital to ensure that the graph of nodes remains acyclic and that each hash unambiguously refers to one logical node.
Test Vectors
Items (UTF-8) | Root Hash (Hex) |
---|---|
['test'] | dbebd10e61bc8c28591273feafbbef95d544f874693301d8f7f8e54c6e30058e |
['my', 'very', 'eager', 'mother', 'just, 'served', 'us', 'nine', 'pizzas', 'make', 'prime'] | b40c847546fdceea166f927fc46c5ca33c3638236a36275c1346d3dffb84e1bc |