multiformats: A Python implementation of multiformat protocols
This is a fully compliant Python implementation of the multiformat protocols.
Install
You can install the latest release from PyPI as follows:
pip install --upgrade multiformats
Usage
Varint
The varint module implements the unsigned-varint spec. Functionality is provided by the encode and decode functions, which convert between non-negative int values and the corresponding varint bytes:
>>> from multiformats import varint
>>> varint.encode(128)
b'\x80\x01'
>>> varint.decode(b'\x80\x01')
128
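To make the byte layout concrete, here is a minimal pure-Python sketch of unsigned-varint encoding and decoding. It is not the library's implementation: the names encode_varint and decode_varint are hypothetical, and the sketch skips the spec's limits on maximum length and minimal encoding.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F              # take the 7 least significant bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)         # last byte: continuation bit stays clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a bytestring that consists of exactly one unsigned varint."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            if i != len(data) - 1:
                raise ValueError("trailing bytes after varint")
            return value
    raise ValueError("truncated varint")


print(encode_varint(128))         # b'\x80\x01'
print(decode_varint(b"\x80\x01"))  # 128
```

The value 128 needs two bytes because each byte carries only 7 bits of payload; the least significant group is written first.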
For advanced usage, see the API documentation.
Multicodec
The multicodec module implements the multicodec spec. The Multicodec class provides a container for multicodec data:
>>> from multiformats import multicodec
>>> from multiformats.multicodec import Multicodec
>>> Multicodec("identity", "multihash", 0x00, "permanent", "raw binary")
Multicodec(name='identity', tag='multihash', code=0,
status='permanent', description='raw binary')
Core functionality is provided by the get, exists, wrap and unwrap functions.
The get and exists functions can be used to check whether a multicodec with given name or code is known, and if so to get the corresponding object:
>>> multicodec.exists("identity")
True
>>> multicodec.exists(code=0x01)
True
>>> multicodec.get("identity")
Multicodec(name='identity', tag='multihash', code=0,
status='permanent', description='raw binary')
>>> multicodec.get(code=0x01)
Multicodec(name='cidv1', tag='cid', code=1,
status='permanent', description='CIDv1')
The wrap and unwrap functions can be used to wrap raw binary data into multicodec data (prepending the varint-encoded multicodec code) and to unwrap multicodec data into a pair of multicodec code and raw binary data:
>>> raw_data = bytes([192, 168, 0, 254])
>>> multicodec_data = multicodec.wrap("ip4", raw_data)
>>> raw_data.hex()
'c0a800fe'
>>> multicodec_data.hex()
'04c0a800fe'
>>> varint.encode(0x04).hex()
'04' # 0x04 ^^^^ is the multicodec code for 'ip4'
>>> codec, raw_data = multicodec.unwrap(multicodec_data)
>>> raw_data.hex()
'c0a800fe'
>>> codec
Multicodec(name='ip4', tag='multiaddr', code='0x04', status='permanent', description='')
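The wrapping step is just a varint prefix followed by the payload. The sketch below illustrates that idea with plain integers for codes. The helper names wrap_code and unwrap_code are hypothetical and are not part of the multiformats API.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 payload bits per byte, least significant group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def wrap_code(code: int, raw_data: bytes) -> bytes:
    """Prepend the varint-encoded multicodec code to the raw data."""
    return encode_varint(code) + raw_data


def unwrap_code(multicodec_data: bytes) -> Tuple[int, bytes]:
    """Split multicodec data into (code, raw data) by reading the leading varint."""
    code = 0
    for i, byte in enumerate(multicodec_data):
        code |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return code, multicodec_data[i + 1:]
    raise ValueError("truncated varint prefix")


raw = bytes([192, 168, 0, 254])
wrapped = wrap_code(0x04, raw)   # 0x04 is the multicodec code for 'ip4'
print(wrapped.hex())             # 04c0a800fe
print(unwrap_code(wrapped))
```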
The Multicodec.wrap and Multicodec.unwrap methods perform analogous functionality with an object-oriented API, additionally enforcing that the unwrapped code is actually the code of the multicodec being used:
>>> ip4 = multicodec.get("ip4")
>>> ip4
Multicodec(name='ip4', tag='multiaddr', code='0x04', status='permanent', description='')
>>> raw_data = bytes([192, 168, 0, 254])
>>> multicodec_data = ip4.wrap(raw_data)
>>> raw_data.hex()
'c0a800fe'
>>> multicodec_data.hex()
'04c0a800fe'
>>> varint.encode(0x04).hex()
'04' # 0x04 ^^^^ is the multicodec code for 'ip4'
>>> ip4.unwrap(multicodec_data).hex()
'c0a800fe'
>>> ip4.unwrap(bytes.fromhex('00c0a800fe')) # 'identity' multicodec data
multiformats.multicodec.err.ValueError: Found code 0x00 when unwrapping data, expected code 0x04.
The table function can be used to iterate through known multicodecs, optionally restricting to one or more tags and/or statuses:
>>> len(list(multicodec.table())) # multicodec.table() returns an iterator
482
>>> selected = multicodec.table(tag=["cid", "ipld", "multiaddr"], status="permanent")
>>> [m.code for m in selected]
[1, 4, 6, 41, 53, 54, 55, 56, 81, 85, 112, 113, 114, 120,
144, 145, 146, 147, 148, 149, 150, 151, 152, 176, 177,
178, 192, 193, 290, 297, 400, 421, 460, 477, 478, 479, 512]
For advanced usage, see the API documentation.
Multibase
The multibase module implements the multibase spec. The Multibase class provides a container for multibase data:
>>> from multiformats import multibase
>>> from multiformats.multibase import Multibase
>>> Multibase(name="base16", code="f",
status="default", description="hexadecimal")
Multibase(name='base16', code='f', status='default', description='hexadecimal')
Core functionality is provided by the encode and decode functions, which can be used to encode a bytestring into a string using a chosen multibase encoding and to decode a string into a bytestring using the multibase encoding specified by its first character:
>>> multibase.encode(b"Hello World!", "base32")
'bjbswy3dpeblw64tmmqqq'
>>> multibase.decode('bjbswy3dpeblw64tmmqqq')
b'Hello World!'
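For the base32 case, the encoding can be reproduced with the standard library alone: the multibase string is the code character 'b' followed by lowercase RFC 4648 base32 without padding. This is a sketch covering only that one encoding, with hypothetical helper names; the library itself supports many more bases.

```python
import base64


def encode_base32_multibase(data: bytes) -> str:
    """Multibase 'base32': prefix 'b' + lowercase RFC 4648 base32, padding removed."""
    body = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return "b" + body


def decode_base32_multibase(s: str) -> bytes:
    """Inverse of the above: check the prefix, restore padding, then decode."""
    if not s.startswith("b"):
        raise ValueError("not a base32 multibase string (expected prefix 'b')")
    body = s[1:].upper()
    body += "=" * (-len(body) % 8)   # base32 input length must be a multiple of 8
    return base64.b32decode(body)


print(encode_base32_multibase(b"Hello World!"))          # bjbswy3dpeblw64tmmqqq
print(decode_base32_multibase("bjbswy3dpeblw64tmmqqq"))  # b'Hello World!'
```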
The multibase encoding specified by a given string is accessible using the from_str function:
>>> multibase.from_str('bjbswy3dpeblw64tmmqqq')
Multibase(encoding='base32', code='b',
status='default',
description='rfc4648 case-insensitive - no padding')
The exists and get functions can be used to check whether a multibase with given name or code is known, and if so to get the corresponding object:
>>> multibase.exists("base32")
True
>>> multibase.get("base32")
Multibase(encoding='base32', code='b',
status='default',
description='rfc4648 case-insensitive - no padding')
>>> multibase.exists(code="f")
True
>>> multibase.get(code="f")
Multibase(encoding="base16", code="f",
status="default", description="hexadecimal")
For advanced usage, see the API documentation.
Multihash
The multihash module implements the multihash spec.
Core functionality is provided by the digest, wrap and unwrap functions, or by the correspondingly named methods Multihash.digest, Multihash.wrap and Multihash.unwrap of the Multihash class.
The digest function and Multihash.digest method can be used to create a multihash digest directly from data:
>>> from multiformats import multihash
>>> data = b"Hello world!"
>>> digest = multihash.digest(data, "sha2-256")
>>> digest.hex()
'1220c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31ad9e51a'
>>> sha2_256 = multihash.get("sha2-256")
>>> digest = sha2_256.digest(data)
>>> digest.hex()
'1220c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31ad9e51a'
By default, the full digest produced by the hash function is used. Optionally, a smaller digest size can be specified to produce truncated hashes:
>>> digest = multihash.digest(data, "sha2-256", size=20) # size: optional truncated hash size, in bytes
>>> digest.hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f' # 20-byte truncated hash
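The structure of these digests can be reproduced with hashlib: a varint for the hash function code, a varint for the digest length, then the (possibly truncated) raw digest. The sketch below handles only sha2-256 (code 0x12) and uses a hypothetical function name; it is an illustration of the layout, not the library's code.

```python
import hashlib
from typing import Optional

SHA2_256_CODE = 0x12  # multicodec code for 'sha2-256'


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 payload bits per byte, least significant group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def sha2_256_multihash(data: bytes, size: Optional[int] = None) -> bytes:
    """Build <varint code><varint length><raw digest>, optionally truncated."""
    raw = hashlib.sha256(data).digest()
    if size is not None:
        if not 0 < size <= len(raw):
            raise ValueError("size must be between 1 and 32 bytes")
        raw = raw[:size]             # truncation keeps the leading bytes
    return encode_varint(SHA2_256_CODE) + encode_varint(len(raw)) + raw


full = sha2_256_multihash(b"Hello world!")
short = sha2_256_multihash(b"Hello world!", size=20)
print(full[:2].hex(), len(full))     # 1220 34
print(short[:2].hex(), len(short))   # 1214 22
```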
The unwrap function can be used to extract the raw digest from a multihash digest:
>>> digest.hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> raw_digest = multihash.unwrap(digest)
>>> raw_digest.hex()
'c0535e4be2b79ffd93291305436bf889314e4a3f'
The Multihash.unwrap method performs the same functionality, but additionally checks that the multihash digest is valid for the multihash:
>>> raw_digest = sha2_256.unwrap(digest)
>>> raw_digest.hex()
'c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> sha1 = multihash.get("sha1")
>>> (sha2_256.code, sha1.code)
(18, 17)
>>> sha1.unwrap(digest)
err.ValueError: Decoded code 18 differs from multihash code 17.
The wrap function and Multihash.wrap method can be used to wrap a raw digest into a multihash digest:
>>> raw_digest.hex()
'c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> multihash.wrap(raw_digest, "sha2-256").hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'
>>> sha2_256.wrap(raw_digest).hex()
'1214c0535e4be2b79ffd93291305436bf889314e4a3f'
The multihash multicodec specified by a given multihash digest is accessible using the from_digest function:
>>> multihash.from_digest(digest)
Multicodec(name='sha2-256', tag='multihash', code='0x12',
status='permanent', description='')
Note that both the multihash code and the digest length are encoded as varints (see varint usage above) and can span multiple bytes:
>>> multihash.get("skein1024-1024")
Multicodec(name='skein1024-1024', tag='multihash', code='0xb3e0',
status='draft', description='')
>>> multihash.digest(data, "skein1024-1024").hex()
'e0e702800192e08f5143...' # 3+2+128 = 133 bytes in total
# 'e0e702': 3-byte varint for hash function code 0xb3e0
# '8001': 2-byte varint for hash digest length 128
>>> from multiformats import varint
>>> hex(varint.decode(bytes.fromhex("e0e702")))
'0xb3e0'
>>> varint.decode(bytes.fromhex("8001"))
128
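Because both prefixes are variable-length, a parser has to read the two varints in sequence before it knows where the raw digest starts. A minimal sketch of that parsing, with hypothetical helper names read_varint and split_multihash:

```python
from typing import Tuple


def read_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read one unsigned varint at offset; return (value, offset of next byte)."""
    value = 0
    shift = 0
    pos = offset
    while pos < len(data):
        byte = data[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:          # continuation bit clear: varint ends here
            return value, pos
        shift += 7
    raise ValueError("truncated varint")


def split_multihash(mh: bytes) -> Tuple[int, int, bytes]:
    """Split a multihash digest into (hash function code, length, raw digest)."""
    code, pos = read_varint(mh)
    length, pos = read_varint(mh, pos)
    raw = mh[pos:]
    if len(raw) != length:
        raise ValueError("declared digest length does not match payload")
    return code, length, raw


# Placeholder payload of 128 zero bytes, standing in for a real skein1024-1024 digest.
mh = bytes.fromhex("e0e7028001") + bytes(128)
code, length, raw = split_multihash(mh)
print(hex(code), length, len(mh))    # 0xb3e0 128 133
```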
Data and digests are all bytes objects (above, we represented them as hex strings for clarity):
>>> raw_digest
b'\xc0S^K\xe2\xb7\x9f\xfd\x93)\x13\x05Ck\xf8\x891NJ?'
>>> digest
b'\x12\x14\xc0S^K\xe2\xb7\x9f\xfd\x93)\x13\x05Ck\xf8\x891NJ?'
# \x12: 0x12 -> multihash multicodec "sha2-256"
# \x14: 0x14 -> truncated hash length of 20 bytes
If you wish to produce digests for objects of other types, you should encode them into bytes first.
For example, the to_bytes(length, byteorder) method can be used to obtain a bytes representation of an integer with given number of bytes and byte ordering, while the encode(encoding) method can be used to obtain a bytes representation of a string with given encoding:
>>> (400).to_bytes(4, byteorder="big")
b'\x00\x00\x01\x90'
>>> (400).to_bytes(4, byteorder="little")
b'\x90\x01\x00\x00'
>>> "Hello world!".encode("utf-8")
b'Hello world!'
>>> "Hello world!".encode("utf-16")
b'\xff\xfeH\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00'
>>> "Hello world!".encode("utf-16-le")
b'H\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00'
>>> "Hello world!".encode("utf-16-be")
b'\x00H\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00!'
For advanced usage, see the API documentation.
CID
The cid module implements the CID spec.
Core functionality is provided by the CID class, which can be imported directly from multiformats:
>>> from multiformats import CID
CIDs can be decoded from bytestrings or (multi)base encoded strings:
>>> cid = CID.decode("zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA")
>>> cid
CID('base58btc', 1, 'raw',
'12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95')
CIDs can be created programmatically, and their fields accessed individually:
>>> cid = CID("base58btc", 1, "raw",
... "12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95")
>>> cid.base
Multibase(name='base58btc', code='z',
status='default', description='base58 bitcoin')
>>> cid.codec
Multicodec(name='raw', tag='ipld', code='0x55',
status='permanent', description='raw binary')
>>> cid.hashfun
Multicodec(name='sha2-256', tag='multihash', code='0x12',
status='permanent', description='')
>>> cid.digest.hex()
'12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95'
>>> cid.raw_digest.hex()
'6e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95'
CIDs can be converted to bytestrings or (multi)base encoded strings:
>>> str(cid)
'zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA'
>>> bytes(cid).hex()
'015512206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95'
>>> cid.encode("base32") # encode with different multibase
'bafkreidon73zkcrwdb5iafqtijxildoonbwnpv7dyd6ef3qdgads2jc4su'
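The binary form above is the concatenation of three parts: the CID version, the content codec code, and the multihash digest. The sketch below rebuilds it for this example and re-encodes it in base32. It assumes a version and a codec code below 0x80, so that each varint is a single byte; the function names are hypothetical.

```python
import base64


def cidv1_bytes(codec_code: int, multihash_digest: bytes) -> bytes:
    """CIDv1 binary layout: <version=1><codec code><multihash digest>."""
    if not 0 <= codec_code < 0x80:
        raise ValueError("this sketch only handles single-byte varint codes")
    return bytes([0x01, codec_code]) + multihash_digest


def to_base32_multibase(data: bytes) -> str:
    """Multibase 'base32': prefix 'b' + lowercase base32 without padding."""
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")


mh = bytes.fromhex(
    "12206e6ff7950a36187a801613426e858dce686cd7d7e3c0fc42ee0330072d245c95")
cid_bytes = cidv1_bytes(0x55, mh)    # 0x55 is the multicodec code for 'raw'
cid_str = to_base32_multibase(cid_bytes)
print(cid_bytes[:4].hex())           # 01551220
print(cid_str[:10])                  # bafkreidon
```

The familiar "bafkrei" prefix of many CIDs is simply the base32 rendering of the leading bytes 01 55 12 20.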
Additionally, the CID.peer_id static method can be used to pack the raw hash of a public key into a CIDv1 PeerID, according to the PeerID spec:
>>> pk_bytes = bytes.fromhex(
... "1498b5467a63dffa2dc9d9e069caf075d16fc33fdd4c3b01bfadae6433767d93")
... # a 32-byte Ed25519 public key
>>> peer_id = CID.peer_id(pk_bytes)
>>> peer_id
CID('base32', 1, 'libp2p-key',
'00201498b5467a63dffa2dc9d9e069caf075d16fc33fdd4c3b01bfadae6433767d93')
# '00': 0x00 = 'identity' multihash used (public key length <= 42)
# '20': 0x20 = 32 bytes of raw hash digest length
>>> str(peer_id)
'bafzaaiautc2um6td375c3soz4bu4v4dv2fx4gp65jq5qdp5nvzsdg5t5sm'
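The packing in this example can be traced by hand: the key bytes are wrapped in an 'identity' multihash (code 0x00, length 0x20), then prefixed with CID version 1 and the 'libp2p-key' codec code 0x72. The sketch below follows those steps for the 32-byte input shown above. It is an illustration of the byte layout only, with a hypothetical function name, and does not reproduce any key serialisation the PeerID spec may require.

```python
import base64


def peer_id_bytes(key_bytes: bytes) -> bytes:
    """<version=1><libp2p-key=0x72><identity=0x00><length><key bytes>."""
    if len(key_bytes) >= 0x80:
        raise ValueError("this sketch only handles single-byte varint lengths")
    identity_multihash = bytes([0x00, len(key_bytes)]) + key_bytes
    return bytes([0x01, 0x72]) + identity_multihash


pk_bytes = bytes.fromhex(
    "1498b5467a63dffa2dc9d9e069caf075d16fc33fdd4c3b01bfadae6433767d93")
pid = peer_id_bytes(pk_bytes)
pid_str = "b" + base64.b32encode(pid).decode("ascii").lower().rstrip("=")
print(pid[:4].hex())                 # 01720020
print(pid_str[:10])                  # bafzaaiaut
```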
For advanced usage, see the API documentation.
Multiaddr
The multiaddr module implements the multiaddr spec.
Core functionality is provided by the Proto class:
>>> from multiformats import Proto
>>> ip4 = Proto("ip4")
>>> ip4
Proto("ip4")
>>> str(ip4)
'/ip4'
>>> ip4.codec
Multicodec(name='ip4', tag='multiaddr', code='0x04',
status='permanent', description='')
Slash notation is used to attach address values to protocols:
>>> a = ip4/"192.168.1.1"
>>> a
Addr('ip4', '192.168.1.1')
>>> str(a)
'/ip4/192.168.1.1'
>>> bytes(a).hex()
'04c0a80101'
Address values can be specified as strings, integers, or bytes-like objects:
>>> ip4/"192.168.1.1"
Addr('ip4', '192.168.1.1')
>>> ip4/bytes([192, 168, 1, 1])
Addr('ip4', '192.168.1.1')
>>> udp = Proto("udp")
>>> udp/9090 # int 9090 is converted to str "9090"
Addr('udp', '9090')
Slash notation is also used to encapsulate multiple protocol/address segments into a multiaddr:
>>> quic = Proto("quic") # no addr required
>>> ma = ip4/"127.0.0.1"/udp/9090/quic
>>> ma
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))
>>> str(ma)
'/ip4/127.0.0.1/udp/9090/quic'
Bytes for multiaddrs are computed according to the (TLV)+ multiaddr encoding:
>>> bytes(ip4/"127.0.0.1").hex()
'047f000001'
>>> bytes(udp/9090).hex()
'91022382'
>>> bytes(quic).hex()
'cc03'
>>> bytes(ma).hex()
'047f00000191022382cc03'
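These bytes can be reconstructed segment by segment: each segment is the varint-encoded protocol code followed by the address value, if the protocol takes one. The sketch below covers only the three protocols in this example, whose values have fixed sizes and therefore carry no length prefix (ip4: 4 bytes, udp: 2-byte big-endian port, quic: no value). The helper names are hypothetical.

```python
import ipaddress

IP4, UDP, QUIC = 0x04, 0x0111, 0x01CC   # multicodec codes for the three protocols


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 payload bits per byte, least significant group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_ip4(addr: str) -> bytes:
    """Protocol code followed by the 4 packed address bytes."""
    return encode_varint(IP4) + ipaddress.IPv4Address(addr).packed


def encode_udp(port: int) -> bytes:
    """Protocol code followed by the port as 2 big-endian bytes."""
    return encode_varint(UDP) + port.to_bytes(2, "big")


def encode_quic() -> bytes:
    """Protocol code only: quic takes no address value."""
    return encode_varint(QUIC)


ma_bytes = encode_ip4("127.0.0.1") + encode_udp(9090) + encode_quic()
print(ma_bytes.hex())                   # 047f00000191022382cc03
```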
The parse and decode functions create multiaddrs from their human-readable strings and encoded bytes respectively:
>>> from multiformats import multiaddr
>>> s = '/ip4/127.0.0.1/udp/9090/quic'
>>> multiaddr.parse(s)
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))
>>> b = bytes.fromhex('047f00000191022382cc03')
>>> multiaddr.decode(b)
Multiaddr(Addr('ip4', '127.0.0.1'), Addr('udp', '9090'), Proto('quic'))
For uniformity of API, the same functionality as the Proto class is provided by the proto function:
>>> ip4 = multiaddr.proto("ip4")
>>> ip4
Proto("ip4")
For advanced usage, see the API documentation.
API
The API documentation for this package is automatically generated by pdoc.
Contributing
Please see the contributing file.
License