Python implementation of multiformats protocols.

multiformats: A Python implementation of multiformat protocols

This is a fully compliant Python implementation of the multiformat protocols.

Install

You can install the latest release from PyPI as follows:

pip install --upgrade multiformats

Usage

Varint

The varint module implements the unsigned-varint spec. Functionality is provided by the encode and decode functions, converting between non-negative int values and the corresponding varint bytes:

>>> from multiformats import varint
>>> varint.encode(128)
b'\x80\x01'
>>> varint.decode(b'\x80\x01')
128
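To make the byte layout concrete, here is a minimal pure-Python sketch of unsigned varint encoding and decoding. It is not the library's implementation: the names encode_varint and decode_varint are hypothetical, and the sketch does not enforce the maximum length that the spec imposes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as an unsigned varint (7 bits per byte, lowest group first)."""
    if value < 0:
        raise ValueError("unsigned varints cannot encode negative integers")
    out = bytearray()
    while True:
        group = value & 0x7F          # lowest 7 bits of what remains
        value >>= 7
        if value:
            out.append(group | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(group)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a byte string that consists of exactly one unsigned varint."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            if index != len(data) - 1:
                raise ValueError("unexpected bytes after the end of the varint")
            return value
    raise ValueError("truncated varint")


encoded = encode_varint(128)   # same bytes as varint.encode(128) above
decoded = decode_varint(encoded)
```

The value 128 does not fit in 7 bits, so the first byte carries the low 7 bits (all zero) with the continuation bit set, and the second byte carries the remaining bit.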

For advanced usage, see the API documentation.

Multicodec

The multicodec module implements the multicodec spec. The Multicodec class provides a container for multicodec data:

>>> from multiformats.multicodec import Multicodec
>>> Multicodec("identity", "multihash", 0x00, "permanent", "raw binary")
Multicodec(name='identity', tag='multihash', code=0,
           status='permanent', description='raw binary')

Core functionality is provided by the get, exists, wrap and unwrap functions. The get and exists functions can be used to check whether a multicodec with given name or code is known, and if so to get the corresponding object:

>>> from multiformats import multicodec
>>> multicodec.exists("identity")
True
>>> multicodec.exists(code=0x01)
True
>>> multicodec.get("identity")
Multicodec(name='identity', tag='multihash', code=0,
           status='permanent', description='raw binary')
>>> multicodec.get(code=0x01)
Multicodec(name='cidv1', tag='cid', code=1,
           status='permanent', description='CIDv1')

The wrap and unwrap functions can be used to wrap raw binary data into multicodec data (prepending the varint-encoded multicodec code) and to unwrap multicodec data into a pair of multicodec code and raw binary data:

>>> raw_data = bytes([192, 168, 0, 254])
>>> multicodec_data = multicodec.wrap("ip4", raw_data)
>>> raw_data.hex()
  'c0a800fe'
>>> multicodec_data.hex()
'04c0a800fe'
>>> varint.encode(0x04).hex()
'04' #       0x04 ^^^^ is the multicodec code for 'ip4'
>>> codec, raw_data = multicodec.unwrap(multicodec_data)
>>> raw_data.hex()
  'c0a800fe'
>>> codec
Multicodec(name='ip4', tag='multiaddr', code='0x04', status='permanent', description='')
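The wrapping step itself is small: the varint-encoded code is prepended to the raw bytes, and unwrapping reads it back off. The following stdlib-only sketch illustrates this under the assumption that codes are passed as plain integers; the helper names are hypothetical and no multicodec table lookup is performed.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Unsigned varint encoding of a non-negative integer."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def read_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of data."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return value, index + 1
    raise ValueError("truncated varint")


def wrap_code(code: int, raw_data: bytes) -> bytes:
    """Prepend the varint-encoded multicodec code to the raw data."""
    return encode_varint(code) + raw_data


def unwrap_code(multicodec_data: bytes) -> Tuple[int, bytes]:
    """Split multicodec data into (code, raw data)."""
    code, consumed = read_varint(multicodec_data)
    return code, multicodec_data[consumed:]


raw = bytes([192, 168, 0, 254])
wrapped = wrap_code(0x04, raw)       # 0x04 is the 'ip4' code used above
code, unwrapped = unwrap_code(wrapped)
```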

The Multicodec.wrap and Multicodec.unwrap methods perform analogous functionality with an object-oriented API, additionally enforcing that the unwrapped code is actually the code of the multicodec being used:

>>> ip4 = multicodec.get("ip4")
>>> ip4
Multicodec(name='ip4', tag='multiaddr', code='0x04', status='permanent', description='')
>>> raw_data = bytes([192, 168, 0, 254])
>>> multicodec_data = ip4.wrap(raw_data)
>>> raw_data.hex()
  'c0a800fe'
>>> multicodec_data.hex()
'04c0a800fe'
>>> varint.encode(0x04).hex()
'04' #       0x04 ^^^^ is the multicodec code for 'ip4'
>>> ip4.unwrap(multicodec_data).hex()
  'c0a800fe'
>>> ip4.unwrap(bytes.fromhex('00c0a800fe')) # 'identity' multicodec data
multiformats.multicodec.err.ValueError: Found code 0x00 when unwrapping data, expected code 0x04.

The table function can be used to iterate through known multicodecs, optionally restricting to one or more tags and/or statuses:

>>> len(list(multicodec.table())) # multicodec.table() returns an iterator
482
>>> selected = multicodec.table(tag=["cid", "ipld", "multiaddr"], status="permanent")
>>> [m.code for m in selected]
[1, 4, 6, 41, 53, 54, 55, 56, 81, 85, 112, 113, 114, 120,
 144, 145, 146, 147, 148, 149, 150, 151, 152, 176, 177,
 178, 192, 193, 290, 297, 400, 421, 460, 477, 478, 479, 512]

For advanced usage, see the API documentation.

Multibase

The multibase module implements the multibase spec. The Multibase class provides a container for multibase data:

>>> Multibase(name="base16", code="f",
              status="default", description="hexadecimal")
    Multibase(name='base16', code='f', status='default', description='hexadecimal')

Core functionality is provided by the encode and decode functions, which can be used to encode a bytestring into a string using a chosen multibase encoding and to decode a string into a bytestring using the multibase encoding specified by its first character:

>>> from multiformats import multibase
>>> multibase.encode(b"Hello World!", "base32")
'bjbswy3dpeblw64tmmqqq'
>>> multibase.decode('bjbswy3dpeblw64tmmqqq')
b'Hello World!'
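For the base32 case, the encoded string is the code character 'b' followed by the lowercase, unpadded RFC 4648 base32 text. A minimal sketch using only the standard library (the function names are hypothetical, and only this one encoding is handled):

```python
import base64


def encode_base32_multibase(data: bytes) -> str:
    """Multibase 'base32': prefix 'b' + lowercase RFC 4648 base32 without padding."""
    body = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return "b" + body


def decode_base32_multibase(text: str) -> bytes:
    """Inverse of encode_base32_multibase; rejects other multibase prefixes."""
    if not text.startswith("b"):
        raise ValueError("not a base32 multibase string")
    body = text[1:].upper()
    body += "=" * (-len(body) % 8)   # restore the padding that b32decode expects
    return base64.b32decode(body)


encoded = encode_base32_multibase(b"Hello World!")
decoded = decode_base32_multibase(encoded)
```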

The multibase encoding specified by a given string is accessible using the from_str function:

>>> multibase.from_str('bjbswy3dpeblw64tmmqqq')
Multibase(encoding='base32', code='b',
          status='default',
          description='rfc4648 case-insensitive - no padding')

The exists and get functions can be used to check whether a multibase with given name or code is known, and if so to get the corresponding object:

>>> multibase.exists("base32")
True
>>> multibase.get("base32")
Multibase(encoding='base32', code='b',
          status='default',
          description='rfc4648 case-insensitive - no padding')
>>> multibase.exists(code="f")
True
>>> multibase.get(code="f")
Multibase(encoding='base16', code='f',
          status='default', description='hexadecimal')

For advanced usage, see the API documentation.

Multihash

The multihash module implements the multihash spec.

Core functionality is provided by the digest, wrap and unwrap functions, or by the correspondingly named methods Multihash.digest, Multihash.wrap and Multihash.unwrap of the Multihash class. The digest function and Multihash.digest method can be used to create a multihash digest directly from data:

>>> from multiformats import multihash
>>> data = b"Hello world!"
>>> digest = multihash.digest(data, "sha2-256")
>>> digest.hex()
'1220c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31ad9e51a'
>>> sha2_256 = multihash.get("sha2-256")
>>> digest = sha2_256.digest(data)
>>> digest.hex()
'1220c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31ad9e51a'

By default, the full digest produced by the hash function is used. Optionally, a smaller digest size can be specified to produce truncated hashes:

>>> digest = multihash.digest(data, "sha2-256", size=20)
#        optional truncated hash size, in bytes ^^^^^^^
>>> digest.hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f' # 20-byte truncated hash
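The structure of these digests can be reproduced with hashlib: the multihash code, then the digest length, then the (possibly truncated) raw hash. This is a sketch for sha2-256 only, with a hypothetical function name; it relies on the code 0x12 and any length up to 32 both being below 128, so each fits in a single varint byte.

```python
import hashlib
from typing import Optional

SHA2_256_CODE = 0x12  # multihash code for sha2-256, as shown in the digests above


def sha2_256_multihash(data: bytes, size: Optional[int] = None) -> bytes:
    """Build <code><length><raw digest>, optionally truncating the raw digest."""
    raw_digest = hashlib.sha256(data).digest()
    if size is not None:
        if not 0 < size <= len(raw_digest):
            raise ValueError("size must be between 1 and 32 bytes for sha2-256")
        raw_digest = raw_digest[:size]
    # Both values are < 128, so each varint is exactly one byte.
    return bytes([SHA2_256_CODE, len(raw_digest)]) + raw_digest


full = sha2_256_multihash(b"Hello world!")
truncated = sha2_256_multihash(b"Hello world!", size=20)
```

The full digest starts with 0x12 0x20 (32 bytes follow), the truncated one with 0x12 0x14 (20 bytes follow).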

The unwrap function can be used to extract the raw digest from a multihash digest:

>>> digest.hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> raw_digest = multihash.unwrap(digest)
>>> raw_digest.hex()
    'c0535e4be2b79ffd93291305436bf889314e4a3f'

The Multihash.unwrap method performs the same functionality, but additionally checks that the multihash digest is valid for the multihash:

>>> raw_digest = sha2_256.unwrap(digest)
>>> raw_digest.hex()
    'c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> sha1 = multihash.get("sha1")
>>> (sha2_256.code, sha1.code)
(18, 17)
>>> sha1.unwrap(digest)
err.ValueError: Decoded code 18 differs from multihash code 17.
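The check performed when unwrapping can be sketched as follows: read the code and the length as varints, verify that the remaining bytes match the declared length, and optionally compare the code against an expected one. The names below are hypothetical and the error messages only imitate the one shown above.

```python
from typing import Optional, Tuple


def read_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of data."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return value, index + 1
    raise ValueError("truncated varint")


def unwrap_multihash(digest: bytes, expected_code: Optional[int] = None) -> Tuple[int, bytes]:
    """Split a multihash digest into (code, raw digest), validating its length."""
    code, code_len = read_varint(digest)
    length, length_len = read_varint(digest[code_len:])
    raw_digest = digest[code_len + length_len:]
    if len(raw_digest) != length:
        raise ValueError(f"Declared length {length}, found {len(raw_digest)} bytes.")
    if expected_code is not None and code != expected_code:
        raise ValueError(f"Decoded code {code} differs from multihash code {expected_code}.")
    return code, raw_digest


digest = bytes.fromhex("1214c0535e4be2b79ffd93291305436bf889314e4a3f")
code, raw_digest = unwrap_multihash(digest, expected_code=0x12)
```

Passing expected_code=0x11 (the sha1 code) for the same digest raises ValueError, mirroring the behaviour of the method shown above.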

The wrap function and Multihash.wrap method can be used to wrap a raw digest into a multihash digest:

>>> raw_digest.hex()
    'c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> multihash.wrap(raw_digest, "sha2-256").hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> sha2_256.wrap(raw_digest).hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'

The multihash multicodec specified by a given multihash digest is accessible using the from_digest function:

>>> multihash.from_digest(digest)
Multicodec(name='sha2-256', tag='multihash', code='0x12',
           status='permanent', description='')

Note that both the multihash code and the digest length are encoded as varints (see varint usage above) and can span multiple bytes:

>>> multihash.get("skein1024-1024")
Multicodec(name='skein1024-1024', tag='multihash', code='0xb3e0',
           status='draft', description='')
>>> multihash.digest(data, "skein1024-1024").hex()
'e0e702800192e08f5143...' # 3+2+128 = 133 bytes in total
#^^^^^^     3-byte varint for hash function code 0xb3e0
#      ^^^^ 2-byte varint for hash digest length 128
>>> from multiformats import varint
>>> hex(varint.decode(bytes.fromhex("e0e702")))
'0xb3e0'
>>> varint.decode(bytes.fromhex("8001"))
128
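The byte count in the comment above follows directly from the varint lengths. A short worked sketch, using a hypothetical encode_varint helper rather than the library's varint module:

```python
def encode_varint(value: int) -> bytes:
    """Unsigned varint encoding of a non-negative integer."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


code = 0xB3E0          # skein1024-1024, as reported by multihash.get above
digest_length = 128    # 1024 bits

header = encode_varint(code) + encode_varint(digest_length)
total_length = len(header) + digest_length
```

The header is 3 + 2 = 5 bytes, so the complete multihash digest is 133 bytes long.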

Data and digests are all bytes objects (above, we represented them as hex strings for clarity):

>>> raw_digest
        b'\xc0S^K\xe2\xb7\x9f\xfd\x93)\x13\x05Ck\xf8\x891NJ?'
>>> digest
b'\x12\x14\xc0S^K\xe2\xb7\x9f\xfd\x93)\x13\x05Ck\xf8\x891NJ?'
# ^^^^     0x12 -> multihash multicodec "sha2-256"
#     ^^^^ 0x14 -> truncated hash length of 20 bytes

If you wish to produce digests for objects of other types, you should encode them into bytes first. For example, the int.to_bytes(length, byteorder) method returns a bytes representation of an integer with a given number of bytes and byte ordering, while the str.encode(encoding) method returns a bytes representation of a string in a given encoding:

>>> (400).to_bytes(4, byteorder="big")
b'\x00\x00\x01\x90'
>>> (400).to_bytes(4, byteorder="little")
b'\x90\x01\x00\x00'
>>> "Hello world!".encode("utf-8")
b'Hello world!'
>>> "Hello world!".encode("utf-16")
b'\xff\xfeH\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00'
>>> "Hello world!".encode("utf-16-le")
b'H\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00'
>>> "Hello world!".encode("utf-16-be")
b'\x00H\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00!'

For advanced usage, see the API documentation.

CID

The cid module implements the CID spec.

Core functionality is provided by the CID class, which can be imported directly from multiformats:

>>> from multiformats import CID

CIDs can be decoded from bytestrings or (multi)base encoded strings:

>>> cid = CID.decode("zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA")
>>> cid
CID('base58btc', 1, 'raw',
    '12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95')

CIDs can be created programmatically, and their fields accessed individually:

>>> cid = CID("base58btc", 1, "raw",
... "12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95")
>>> cid.base
Multibase(name='base58btc', code='z',
          status='default', description='base58 bitcoin')
>>> cid.codec
Multicodec(name='raw', tag='ipld', code='0x55',
           status='permanent', description='raw binary')
>>> cid.hashfun
Multicodec(name='sha2-256', tag='multihash', code='0x12',
           status='permanent', description='')
>>> cid.digest.hex()
'12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95'
>>> cid.raw_digest.hex()
    '6e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95'

CIDs can be converted to bytestrings or (multi)base encoded strings:

>>> str(cid)
'zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA'
>>> bytes(cid).hex()
'015512206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95'
>>> cid.encode("base32") # encode with different multibase
'bafkreidon73zkcrwdb5iafqtijxildoonbwnpv7dyd6ef3qdgads2jc4su'
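The binary form shown by bytes(cid) can be read field by field: CID version, content codec, then a multihash (hash code, digest length, raw digest). The sketch below parses that layout for a CIDv1 with the standard library only; the function name is hypothetical and no validation against the multicodec table is attempted.

```python
from typing import Tuple


def read_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of data."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return value, index + 1
    raise ValueError("truncated varint")


def split_cidv1(cid_bytes: bytes) -> Tuple[int, int, int, bytes]:
    """Return (version, codec code, hash function code, raw digest)."""
    fields = []
    pos = 0
    for _ in range(4):  # version, codec, hash code, digest length
        value, consumed = read_varint(cid_bytes[pos:])
        fields.append(value)
        pos += consumed
    version, codec, hash_code, length = fields
    raw_digest = cid_bytes[pos:]
    if len(raw_digest) != length:
        raise ValueError("digest length does not match the declared length")
    return version, codec, hash_code, raw_digest


cid_bytes = bytes.fromhex(
    "015512206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95")
version, codec, hash_code, raw_digest = split_cidv1(cid_bytes)
```

For the CID above this yields version 1, codec 0x55 ('raw'), hash code 0x12 ('sha2-256') and a 32-byte raw digest.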

Additionally, the CID.peer_id static method can be used to pack the raw hash of a public key into a CIDv1 PeerID, according to the PeerID spec:

>>> pk_bytes = bytes.fromhex(
... "1498b5467a63dffa2dc9d9e069caf075d16fc33fdd4c3b01bfadae6433767d93")
... # a 32-byte Ed25519 public key
>>> peer_id = CID.peer_id(pk_bytes)
>>> peer_id
CID('base32', 1, 'libp2p-key',
'00201498b5467a63dffa2dc9d9e069caf075d16fc33fdd4c3b01bfadae6433767d93')
#^^   0x00 = 'identity' multihash used (public key length <= 42)
#  ^^ 0x20 = raw hash digest length of 32 bytes
>>> str(peer_id)
'bafzaaiautc2um6td375c3soz4bu4v4dv2fx4gp65jq5qdp5nvzsdg5t5sm'
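The bytes behind this PeerID can be assembled by hand, which makes the annotated prefix easier to follow. The sketch below mirrors the layout shown in the example only: it assumes 0x72 as the 'libp2p-key' multicodec code and 0x00 as the 'identity' multihash code, and it does not implement the rest of the PeerID spec (for instance, the rule for longer keys).

```python
import base64

pk_bytes = bytes.fromhex(
    "1498b5467a63dffa2dc9d9e069caf075d16fc33fdd4c3b01bfadae6433767d93")

# 'identity' multihash: code 0x00, length 0x20, then the key bytes unchanged.
identity_multihash = bytes([0x00, len(pk_bytes)]) + pk_bytes

# CIDv1 bytes: version 0x01, codec 0x72 (assumed 'libp2p-key'), then the multihash.
cid_bytes = bytes([0x01, 0x72]) + identity_multihash

# Multibase 'base32': prefix 'b' + lowercase RFC 4648 base32 without padding.
peer_id_str = "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
```

Since every value involved is below 128, each varint here is a single byte, and the resulting string begins with 'bafzaaiau', as in the output of str(peer_id) above.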

For advanced usage, see the API documentation.

Multiaddr

The multiaddr module implements the multiaddr spec.

Core functionality is provided by the Proto class:

>>> from multiformats import Proto
>>> ip4 = Proto("ip4")
>>> ip4
Proto("ip4")
>>> str(ip4)
'/ip4'
>>> ip4.codec
Multicodec(name='ip4', tag='multiaddr', code='0x04',
           status='permanent', description='')

Slash notation is used to attach address values to protocols:

>>> a = ip4/"192.168.1.1"
>>> a
Addr('ip4', '192.168.1.1')
>>> str(a)
'/ip4/192.168.1.1'
>>> bytes(a).hex()
'04c0a80101'

Address values can be specified as strings, integers, or bytes-like objects:

>>> ip4/"192.168.1.1"
Addr('ip4', '192.168.1.1')
>>> ip4/bytes([192, 168, 1, 1])
Addr('ip4', '192.168.1.1')
>>> udp = Proto("udp")
>>> udp/9090 # int 9090 is converted to str "9090"
Addr('udp', '9090')

Slash notation is also used to encapsulate multiple protocol/address segments into a multiaddr:

>>> quic = Proto("quic") # no addr required
>>> ma = ip4/"127.0.0.1"/udp/9090/quic
>>> ma
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))
>>> str(ma)
'/ip4/127.0.0.1/udp/9090/quic'

Bytes for multiaddrs are computed according to the (TLV)+ multiaddr encoding:

>>> bytes(ip4/"127.0.0.1").hex()
'047f000001'
>>> bytes(udp/9090).hex()
          '91022382'
>>> bytes(quic).hex()
                  'cc03'
>>> bytes(ma).hex()
'047f00000191022382cc03'
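The same bytes can be built without the library, which shows where each piece comes from: every segment is the varint-encoded protocol code followed by the address value, if any. This sketch hard-codes the three protocol codes used above (0x04, 0x0111 and 0x01cc) and covers only fixed-size values, so no length prefix appears; the helper names are hypothetical.

```python
import ipaddress


def encode_varint(value: int) -> bytes:
    """Unsigned varint encoding of a non-negative integer."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


IP4, UDP, QUIC = 0x04, 0x0111, 0x01CC  # codes taken from the hex output above


def encode_segment(code: int, value: bytes = b"") -> bytes:
    """One multiaddr segment: varint protocol code followed by the raw value."""
    return encode_varint(code) + value


ip4_segment = encode_segment(IP4, ipaddress.IPv4Address("127.0.0.1").packed)
udp_segment = encode_segment(UDP, (9090).to_bytes(2, byteorder="big"))
quic_segment = encode_segment(QUIC)  # no address value

ma_bytes = ip4_segment + udp_segment + quic_segment
```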

The parse and decode functions create multiaddrs from their human-readable strings and encoded bytes respectively:

>>> from multiformats import multiaddr
>>> s = '/ip4/127.0.0.1/udp/9090/quic'
>>> multiaddr.parse(s)
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))
>>> b = bytes.fromhex('047f00000191022382cc03')
>>> multiaddr.decode(b)
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))
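Decoding reverses the encoding: read a varint protocol code, look up how many value bytes that protocol carries, and repeat until the input is exhausted. The sketch below uses a small hypothetical protocol table limited to the three protocols in this example; a real decoder needs the full table and support for variable-length values.

```python
import ipaddress
from typing import List, Optional, Tuple


def read_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of data."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return value, index + 1
    raise ValueError("truncated varint")


# code -> (name, value size in bytes, bytes-to-string converter or None)
PROTOCOLS = {
    0x04: ("ip4", 4, lambda b: str(ipaddress.IPv4Address(b))),
    0x0111: ("udp", 2, lambda b: str(int.from_bytes(b, "big"))),
    0x01CC: ("quic", 0, None),
}


def decode_multiaddr(data: bytes) -> List[Tuple[str, Optional[str]]]:
    """Decode multiaddr bytes into a list of (protocol name, value or None)."""
    segments = []
    pos = 0
    while pos < len(data):
        code, consumed = read_varint(data[pos:])
        pos += consumed
        name, size, to_str = PROTOCOLS[code]  # KeyError for unknown protocols
        value = data[pos:pos + size]
        if len(value) != size:
            raise ValueError(f"truncated value for protocol {name!r}")
        pos += size
        segments.append((name, to_str(value) if to_str else None))
    return segments


def to_string(segments: List[Tuple[str, Optional[str]]]) -> str:
    """Render decoded segments in the human-readable slash notation."""
    return "".join(f"/{n}" if v is None else f"/{n}/{v}" for n, v in segments)


segments = decode_multiaddr(bytes.fromhex("047f00000191022382cc03"))
text = to_string(segments)
```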

For uniformity of API, the same functionality as the Proto class is provided by the proto function:

>>> ip4 = multiaddr.proto("ip4")
>>> ip4
Proto("ip4")

For advanced usage, see the API documentation.

API

The API documentation for this package is automatically generated by pdoc.

Contributing

Please see the contributing file.

License

MIT © Hashberg Ltd.
