A fast and flexible binary format serialization library for Python

fmtspec

fmtspec is a Python library for binary encoding and decoding built around composable format objects.

The only dependency is the excellent msgspec package.

Installation

$ pip install fmtspec

Or add it as a dependency in pyproject.toml with uv:

$ uv add fmtspec

Core Features

  • encode(...) and decode(...) for in-memory byte buffers
  • encode_stream(...) and decode_stream(...) for files, sockets, and BytesIO
  • Type-safe support for annotated dataclasses and msgspec.Struct
  • fmtspec.types for reusable primitives such as integers, enums, arrays, sized fields, bitfields, and tagged layouts
  • fmtspec.lib for a collection of ready-made embeddable formats (e.g. MessagePack)
  • encode_inspect(...), decode_inspect(...), and format_tree(...) for inspecting parse trees during encoding and decoding
  • Informative exceptions with failure paths and format context
  • Context and fmtspec.stream for implementing custom Type classes on the public API

Typical Workflow

  1. Describe the wire format with fmtspec.types and plain Python containers.
  2. Encode Python values with encode(...) or encode_stream(...).
  3. Decode the bytes back into builtins, dataclasses, or msgspec.Struct shapes.

Specify format with composable types

from fmtspec import decode, encode, types

packet_fmt = {
    "name": types.TakeUntil(types.str_utf8, b"\0"),
    "count": types.u32le,
}

packet = {
    "name": "widget",
    "count": 3,
}

data = encode(packet, packet_fmt)
assert data == b"widget\0\x03\x00\x00\x00"

decoded = decode(data, packet_fmt)
assert decoded == packet

# Or, without keys, use an iterable of formats

no_map_packet_fmt = (
    types.TakeUntil(types.str_utf8, b"\0"),
    types.u32le,
)

packet_tuple = ("widget", 3)

data = encode(packet_tuple, no_map_packet_fmt)
assert data == b"widget\0\x03\x00\x00\x00"

decoded = decode(data, no_map_packet_fmt)
assert tuple(decoded) == packet_tuple

This is the core fmtspec style: combine primitive format objects into mappings, tuples, or custom types, then round-trip ordinary Python values.
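To make the byte layout in the assertions above concrete, the same 11 bytes can be built by hand with the standard library's struct module. This is only an illustration of the wire format, not a description of how fmtspec works internally; pack_packet is a hypothetical helper name introduced for this sketch.

```python
import struct


def pack_packet(name: str, count: int) -> bytes:
    """Hand-rolled layout: NUL-terminated UTF-8 name, then a u32 little-endian count."""
    encoded_name = name.encode("utf-8")
    if b"\0" in encoded_name:
        # A NUL inside the name would be mistaken for the terminator on decode.
        raise ValueError("name must not contain the NUL terminator")
    return encoded_name + b"\0" + struct.pack("<I", count)


manual = pack_packet("widget", 3)
assert manual == b"widget\0\x03\x00\x00\x00"
assert len(manual) == 11  # 6 name bytes + 1 terminator + 4 count bytes
```

The terminator check shows one reason a delimiter-based field needs care: the delimiter itself cannot appear in the value.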

For streaming, use encode_stream(...) and decode_stream(...). This example reuses packet and packet_fmt from above:

from io import BytesIO

from fmtspec import decode_stream, encode_stream

stream = BytesIO()
encode_stream(stream, packet, packet_fmt)

# rewind to the start before decoding the data we just wrote
stream.seek(0)
assert decode_stream(stream, packet_fmt) == packet
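The rewind step matters because a stream keeps a single cursor that sits at the end after writing. The sketch below shows this with a hand-written reader for the same packet layout, using only the standard library. read_packet is a hypothetical helper for illustration and is not part of fmtspec.

```python
import struct
from io import BytesIO


def read_packet(stream) -> dict:
    """Read one NUL-terminated UTF-8 name followed by a u32 little-endian count."""
    name = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended before the NUL terminator")
        if byte == b"\0":
            break
        name += byte
    raw_count = stream.read(4)
    if len(raw_count) != 4:
        raise EOFError("stream ended inside the count field")
    (count,) = struct.unpack("<I", raw_count)
    return {"name": name.decode("utf-8"), "count": count}


stream = BytesIO()
stream.write(b"widget\0\x03\x00\x00\x00")
assert stream.read() == b""  # the cursor sits at the end after writing

stream.seek(0)  # rewind before reading back
assert read_packet(stream) == {"name": "widget", "count": 3}
assert stream.tell() == 11  # the reader consumed exactly one packet
```

Because the reader consumes only the bytes it needs, any data after the packet stays in the stream for the next read.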

Derive the format for type-safe support

If class fields are annotated with typing.Annotated[..., fmt], fmtspec can recursively derive the mapping format for you.

from dataclasses import dataclass
from typing import Annotated

from fmtspec import decode, encode, types


@dataclass(frozen=True, slots=True)
class Record:
    name: Annotated[str, types.TakeUntil(types.str_utf8, b"\0")]
    count: Annotated[int, types.u32le]


record = Record(name="widget", count=3)
data = encode(record)
roundtripped = decode(data, type=Record)
assert roundtripped == record

This is the most ergonomic path when your wire layout already matches a dataclass or msgspec.Struct.
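The derivation relies on metadata attached with typing.Annotated, which the standard library exposes at runtime. The sketch below shows only that standard mechanism. How fmtspec actually walks annotations is not shown in this document, derive_mapping is a hypothetical helper, and plain strings stand in for format objects so the example needs no third-party imports.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Record:
    # Placeholder strings stand in for real format objects.
    name: Annotated[str, "cstring-utf8"]
    count: Annotated[int, "u32le"]


def derive_mapping(cls) -> dict:
    """Collect the first Annotated metadata item of each dataclass field, in field order."""
    hints = get_type_hints(cls, include_extras=True)
    return {f.name: hints[f.name].__metadata__[0] for f in fields(cls)}


assert derive_mapping(Record) == {"name": "cstring-utf8", "count": "u32le"}
```

Passing include_extras=True is what keeps the metadata; without it, get_type_hints returns the bare str and int types.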

Inspect Layouts While Debugging

Inspection shows offsets, sizes, values, and child structure for each encoding or decoding step. It is intended for debugging, tooling, and protocol exploration rather than performance.

For simple structures:

from fmtspec import encode_inspect, format_tree, types

fmt = {
    "x": types.u8,
    "y": types.u16,
}

data, tree = encode_inspect({"x": 1, "y": 0x0203}, fmt)
print(format_tree(tree))

Output:

* Mapping @ [0:3] (3 bytes) (2 items)
├─ [x] Int @ [0:1] (1 bytes)
│    value: 1
│    data: 01
└─ [y] Int @ [1:3] (2 bytes)
     value: 515
     data: 02 03
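The tree can be cross-checked by hand. The data line for y shows 02 03 for the value 0x0203, which is most-significant-byte-first, so the sketch below packs the same two fields as big-endian with the standard library's struct module. It reproduces the bytes shown above and makes no claim about fmtspec internals.

```python
import struct

# ">BH" = big-endian, one unsigned byte (x) then one unsigned 16-bit integer (y).
manual = struct.pack(">BH", 1, 0x0203)

assert manual == bytes.fromhex("01 02 03")  # matches the "data" lines in the tree
assert manual[0:1] == b"\x01"  # [x] occupies the slice [0:1]
assert manual[1:3] == b"\x02\x03"  # [y] occupies the slice [1:3]
assert int.from_bytes(manual[1:3], "big") == 515  # the "value" shown for y
```

The [start:end] ranges in the tree read like Python slices: the end offset is exclusive, so [1:3] covers two bytes.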

For complex protocols like TLS Client Hello with nested structures and dynamic fields:

from fmtspec import encode_inspect, format_tree, types

version_fmt = {"major": types.u8, "minor": types.u8}
random_fmt = {"gmt_unix_time": types.u32, "random": types.Bytes(28)}
session_id_fmt = types.Sized(length=types.u8, fmt=types.Bytes())
cipher_suites_fmt = types.Sized(length=types.u16, fmt=types.array(types.u16))
compression_methods_fmt = types.Sized(length=types.u8, fmt=types.Bytes())

server_name_fmt = {
    "name_type": types.u8,
    "host_name": types.Sized(length=types.u16, fmt=types.Bytes()),
}
sni_fmt = types.Sized(length=types.u16, fmt=types.array(server_name_fmt))
protocol_fmt = types.Sized(length=types.u8, fmt=types.Bytes())
alpn_fmt = types.Sized(length=types.u16, fmt=types.array(protocol_fmt))

extension_fmt = {
    "type": types.u16,
    "body": types.Sized(
        length=types.u16,
        fmt=types.Switch(
            key=types.Ref("type"),
            cases={
                0x0000: sni_fmt,
                0x0010: alpn_fmt,
            },
            default=types.bytes_,
        ),
    ),
}
extensions_fmt = types.Sized(length=types.u16, fmt=types.array(extension_fmt))

tls_client_hello_fmt = {
    "version": version_fmt,
    "random": random_fmt,
    "session_id": session_id_fmt,
    "cipher_suites": cipher_suites_fmt,
    "compression_methods": compression_methods_fmt,
    "extensions": extensions_fmt,
}

client_hello = {
    "version": {"major": 3, "minor": 3},
    "random": {
        "gmt_unix_time": 0,
        "random": b"abcdefghijklmnopqrstuvwxyz_-",
    },
    "session_id": b"",
    "cipher_suites": [0x1301],
    "compression_methods": b"\x00",
    "extensions": [
        # SNI extension with one server name entry
        {"type": 0, "body": [{"name_type": 0, "host_name": b"example.com"}]},
    ],
}

data, tree = encode_inspect(client_hello, tls_client_hello_fmt)
print(format_tree(tree))

Output (truncated for brevity):

* Mapping @ [0:63] (63 bytes) (6 items)
├─ [version] Mapping @ [0:2] (2 bytes) (2 items)
│  ├─ [major] Int @ [0:1] (1 bytes)
│  │    value: 3
│  │    data: 03
│  └─ [minor] Int @ [1:2] (1 bytes)
│       value: 3
│       data: 03
├─ [random] Mapping @ [2:34] (32 bytes) (2 items)
│  ├─ [gmt_unix_time] Int @ [2:6] (4 bytes)
│  │    value: 0
│  │    data: 00 00 00 00
│  └─ [random] Bytes @ [6:34] (28 bytes)
│       value: b'abcdefghijklmnopqrstuvwxyz_-'
│       data: 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 78 ... (truncated)
├─ [session_id] Sized @ [34:35] (1 bytes) (2 items)
│  ├─ [--size--] Int @ [34:35] (1 bytes)
│  │    value: 0
│  │    data: 00
│  └─ [None] Bytes @ [35:35] (0 bytes)
│       value: b''
├─ [cipher_suites] Sized @ [35:39] (4 bytes) (2 items)
│  ├─ [--size--] Int @ [35:37] (2 bytes)
│  │    value: 2
│  │    data: 00 02
│  └─ [None] Array @ [37:39] (2 bytes) (1 items)
│     └─ [0] Int @ [37:39] (2 bytes)
│          value: 4865
│          data: 13 01
├─ [compression_methods] Sized @ [39:41] (2 bytes) (2 items)
│  ├─ [--size--] Int @ [39:40] (1 bytes)
│  │    value: 1
│  │    data: 01
│  └─ [None] Bytes @ [40:41] (1 bytes)
│       value: b'\x00'
│       data: 00
└─ [extensions] Sized @ [41:63] (22 bytes) (2 items)
   ├─ [--size--] Int @ [41:43] (2 bytes)
   │    value: 20
   │    data: 00 14
   └─ [None] Array @ [43:63] (20 bytes) (1 items)
      └─ [0] Mapping @ [43:63] (20 bytes) (2 items)
         ├─ [type] Int @ [43:45] (2 bytes)
         │    value: 0
         │    data: 00 00
         └─ [body] Sized @ [45:63] (18 bytes) (2 items)
            ... (truncated for brevity)

The inspection tree provides visibility into the full parse structure, including offsets and sizes at each nesting level.
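The length prefixes in the tree can be reproduced by hand, which helps when checking that a nested Sized layout adds up. The sketch below rebuilds the same Client Hello with the standard library only, assuming big-endian length prefixes as the data lines above indicate. sized is a hypothetical helper, and the layout of the truncated [body] node is inferred from the format definitions rather than from the printed tree.

```python
import struct


def sized(length_format: str, body: bytes) -> bytes:
    """Prefix body with its byte length, packed with the given struct format."""
    return struct.pack(length_format, len(body)) + body


host_name = b"example.com"
server_name = struct.pack(">B", 0) + sized(">H", host_name)  # 1 + 2 + 11 = 14 bytes
sni_body = sized(">H", server_name)                          # 2 + 14 = 16 bytes
extension = struct.pack(">H", 0) + sized(">H", sni_body)     # 2 + 2 + 16 = 20 bytes
extensions = sized(">H", extension)                          # 2 + 20 = 22 bytes

hello = b"".join([
    bytes([3, 3]),                                            # version
    struct.pack(">I", 0) + b"abcdefghijklmnopqrstuvwxyz_-",   # random
    sized(">B", b""),                                         # session_id
    sized(">H", struct.pack(">H", 0x1301)),                   # cipher_suites
    sized(">B", b"\x00"),                                     # compression_methods
    extensions,
])

assert len(hello) == 63                            # root Mapping @ [0:63]
assert hello[35:39] == bytes.fromhex("0002 1301")  # cipher_suites @ [35:39]
assert hello[41:43] == b"\x00\x14"                 # extensions size prefix = 20
assert len(sized(">H", sni_body)) == 18            # [body] Sized @ [45:63]
```

Each prefix counts only the bytes that follow it, so every outer size is the inner size plus the width of the inner prefix and any sibling fields.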

The same format also works with the stream-based encode_stream and decode_stream functions:

from io import BytesIO

from fmtspec import decode_stream, encode_stream

stream = BytesIO()
encode_stream(stream, client_hello, tls_client_hello_fmt)

stream.seek(0)
assert decode_stream(stream, tls_client_hello_fmt) == client_hello

Error Model

The public API raises structured exceptions instead of only raw ValueError instances.

  • EncodeError: a Python value could not be serialized with the chosen format
  • DecodeError: the incoming bytes did not match the format
  • ExcessDecodeError: decoding succeeded, but trailing bytes remained; distinct from DecodeError. exc.stream is positioned at the excess data for further decoding, and exc.remaining holds a typed byte count
  • TypeConversionError: decoding succeeded, but the result could not be converted into the requested type

These exceptions preserve context such as the active format, object, path, cause, and optional inspection node.

from fmtspec import DecodeError, ExcessDecodeError, decode, types

fmt = {
    "kind": types.u8,
    "payload": types.Sized(length=types.u8, fmt=types.Bytes()),
}

try:
    decode(b"\x01\x05abc", fmt)
except ExcessDecodeError as exc:
    # remaining bytes are accessible for further parsing
    print(exc.remaining)  # typed byte count
    print(exc.stream.read())  # or decode_stream(exc.stream, next_fmt)
    print(exc.path)
    print(exc.fmt)
except DecodeError as exc:
    # real decode failure (truncation, bad tag, etc.)
    print(exc)

This context is especially useful with nested mappings, arrays, Switch(...), and custom types, where the failing field path matters as much as the raw message. See docs/core-api.md for more detail on errors and inspection.
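To see which branch an input should reach, it helps to do the length arithmetic by hand. The sketch below walks the kind/payload layout from the example with plain Python. classify is a hypothetical helper; it models only the arithmetic, and the mapping of each outcome to a fmtspec exception is inferred from the descriptions above rather than verified here.

```python
def classify(data: bytes) -> str:
    """Hand-walk the layout: one u8 kind, one u8 payload length, then the payload."""
    if len(data) < 2:
        return "truncated"
    declared = data[1]         # payload length announced on the wire
    available = len(data) - 2  # bytes actually present after the two header bytes
    if available < declared:
        return "truncated"
    if available > declared:
        return f"excess: {available - declared} trailing byte(s)"
    return "exact"


# The input from the example declares 5 payload bytes but supplies only 3.
assert classify(b"\x01\x05abc") == "truncated"
assert classify(b"\x01\x03abc") == "exact"
assert classify(b"\x01\x03abcXY") == "excess: 2 trailing byte(s)"
```

By this arithmetic, the example input is a truncation case, while an input such as b"\x01\x03abcXY" would decode fully and leave two trailing bytes.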
