Binary deserialization

Project description

"Let me get that for you."

Bellhop provides deserialization of binary data according to a model.

For example,

import bellhop

class Model(bellhop.Model):
    num: int = bellhop.Field(length=4, endian=bellhop.Endian.big)
    flag: bool
    data: bytes = bellhop.Field(length=3)

obj = Model(b"\x00\x00\x00\xff\x01\x01\x02\x03\x04")
print(obj) # Model(num=255, flag=True, data=b"\x01\x02\x03")
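For comparison, here is the same decoding done by hand with the standard library's struct module. It is only a sketch of the byte layout the model above describes, not bellhop's implementation; the format string ">I?3s" is our own translation of the three fields.

```python
import struct

# Same nine input bytes as the model example above.
data = b"\x00\x00\x00\xff\x01\x01\x02\x03\x04"

# ">I?3s": big-endian 4-byte unsigned int, one bool byte, then 3 raw bytes.
layout = struct.Struct(">I?3s")

# unpack_from reads only the first layout.size bytes, so the ninth byte is not read.
num, flag, data_field = layout.unpack_from(data)

print(num, flag, data_field)  # 255 True b'\x01\x02\x03'
```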

Supported types

Basic types

The supported basic types are bool, any int subclass (bool is itself an int subclass but is handled separately, as described below), bytes, and any bellhop.Model subclass.

Integers

Integer fields (other than bool) must specify their length in bytes, as in the first example. The allowed lengths are 1, 2, 4, and 8. You may also specify the endianness: Endian.native (the default), Endian.big, or Endian.little.
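A minimal sketch of the integer rule using int.from_bytes, where "native" corresponds to sys.byteorder. decode_int is a hypothetical helper, and treating values as unsigned is an assumption, since the text above does not say how bellhop handles signedness.

```python
import sys

ALLOWED_LENGTHS = (1, 2, 4, 8)


def decode_int(chunk: bytes, endian: str = sys.byteorder) -> int:
    """Decode an integer field; endian is "big", "little", or the host order.

    Hypothetical helper: it assumes unsigned values.
    """
    if len(chunk) not in ALLOWED_LENGTHS:
        raise ValueError(f"unsupported integer length: {len(chunk)}")
    return int.from_bytes(chunk, endian)


chunk = b"\x00\x01"
print(decode_int(chunk, "big"))     # 1
print(decode_int(chunk, "little"))  # 256
print(decode_int(chunk))            # 1 or 256, depending on sys.byteorder
```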

Boolean values

A bool field consumes one byte and is True if and only if the byte is non-zero.

Bytes

A bytes field consumes the data as is. Its length can be given via bellhop.Field, as in the first example; a negative length consumes all remaining data.
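The length rule for bytes fields can be sketched with plain slicing. take_bytes is a hypothetical helper; raising an error when fewer than length bytes remain is an assumption about how a short input would be handled.

```python
def take_bytes(data: bytes, offset: int, length: int) -> tuple[bytes, int]:
    """Return (chunk, next_offset) for a bytes field starting at offset.

    A negative length consumes all remaining data. Raising on short input
    is an assumption of this sketch.
    """
    if length < 0:
        chunk = data[offset:]
    else:
        chunk = data[offset:offset + length]
        if len(chunk) != length:
            raise ValueError(f"needed {length} bytes, got {len(chunk)}")
    return chunk, offset + len(chunk)


print(take_bytes(b"abcdef", 0, 3))   # (b'abc', 3)
print(take_bytes(b"abcdef", 2, -1))  # (b'cdef', 6)
```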

If the length is not specified via the bellhop.Field, then you must implement the resolve_length class method:

    @classmethod
    def resolve_length(cls, ctx: bellhop.ParsingContext) -> int:
        ...

A ParsingContext has the following interface:

import typing

class ParsingContext:
    @property
    def user_data(self) -> typing.Any:
        ...
    
    @property
    def offset(self) -> int:
        ...
    
    @property
    def field(self) -> str:
        ...
    
    @property
    def stash(self) -> dict[str, typing.Any]:
        ...
    
    @property
    def list_index(self) -> int:
        ...

ctx.offset is the byte offset, relative to the start of the model, of the field currently being parsed, and ctx.field is that field's name. ctx.stash holds the values of the fields parsed so far and can also be used to store your own information for later callbacks. Note that changing a previously parsed field's value through the stash does change that field's actual value.
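As an illustration, a resolve_length implementation can read an earlier field's value from the stash. The sketch below does not import bellhop: FakeContext is a hypothetical stand-in exposing the documented properties, and Message, size, and payload are made-up names. In a real model, Message would subclass bellhop.Model and the parser would supply the context.

```python
import dataclasses
import typing


@dataclasses.dataclass
class FakeContext:
    """Hypothetical stand-in exposing the documented ParsingContext properties."""

    offset: int
    field: str
    stash: dict[str, typing.Any]
    user_data: typing.Any = None
    list_index: int = 0


class Message:
    # A real model would subclass bellhop.Model and declare, for example,
    #   size: int = bellhop.Field(length=1)
    #   payload: bytes
    @classmethod
    def resolve_length(cls, ctx: FakeContext) -> int:
        # payload's length is the value of the already parsed size field.
        if ctx.field == "payload":
            return ctx.stash["size"]
        raise KeyError(f"no length rule for {ctx.field}")


data = b"\x03abcXYZ"
# Simulate the parser reaching payload right after a 1-byte size field.
ctx = FakeContext(offset=1, field="payload", stash={"size": data[0]})
length = Message.resolve_length(ctx)
payload = data[ctx.offset:ctx.offset + length]
print(length, payload)  # 3 b'abc'
```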

The rest of the context's properties will be discussed later.

Compound types

Lists

You can have a list of any basic type:

class Model(bellhop.Model):
    array: list[int] = bellhop.Field(length=1, list_length=4)

obj = Model(b"\x00\x01\x02\x03")
print(obj) # Model(array=[0, 1, 2, 3])

If the list length is not specified via the bellhop.Field, then you must implement the resolve_list_length class method:

    @classmethod
    def resolve_list_length(cls, ctx: bellhop.ParsingContext) -> int:
        ...

The individual item length (if applicable) can be specified via either a bellhop.Field or resolve_length. Every element of a list will have the same length.
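Because every element has the same length, item i starts at i * item_length. parse_int_list below is a hypothetical stdlib-only sketch of that layout for integer items; it takes the endianness explicitly rather than modelling bellhop's defaults.

```python
def parse_int_list(data: bytes, item_length: int, list_length: int, endian: str) -> list[int]:
    """Split data into list_length integers of item_length bytes each."""
    needed = item_length * list_length
    if len(data) < needed:
        raise ValueError(f"needed {needed} bytes, got {len(data)}")
    # Item i occupies bytes [i * item_length, (i + 1) * item_length).
    return [
        int.from_bytes(data[i * item_length:(i + 1) * item_length], endian)
        for i in range(list_length)
    ]


print(parse_int_list(b"\x00\x01\x02\x03", 1, 4, "big"))  # [0, 1, 2, 3]
print(parse_int_list(b"\x00\x01\x00\x02", 2, 2, "big"))  # [1, 2]
```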

You can set the list length to a negative value. This will cause elements to be continually added until a bellhop.TerminateList exception is raised (i.e., from a callback method).
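One way to picture an open-ended list is the loop below. TerminateList here is a hypothetical stand-in for bellhop.TerminateList, and two behaviours are assumptions of this sketch: the item whose callback raises is consumed but not kept, and the loop also stops when the data runs out.

```python
from collections.abc import Callable
from typing import Any


class TerminateList(Exception):
    """Hypothetical stand-in for bellhop.TerminateList."""


def parse_open_ended_list(
    data: bytes, item_length: int, callback: Callable[[int, bytes], Any]
) -> tuple[list[Any], int]:
    """Parse fixed-size items until the callback raises TerminateList."""
    items: list[Any] = []
    offset = 0
    while offset + item_length <= len(data):
        raw = data[offset:offset + item_length]
        offset += item_length
        try:
            items.append(callback(len(items), raw))
        except TerminateList:
            break
    return items, offset


def stop_at_zero(index: int, raw: bytes) -> int:
    # Treat a zero byte as the end-of-list marker.
    if raw == b"\x00":
        raise TerminateList
    return raw[0]


print(parse_open_ended_list(b"\x05\x07\x00\x09", 1, stop_at_zero))  # ([5, 7], 3)
```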

If you add list_post=True to the bellhop.Field, the list_post_processing class method will be called for every item in the list:

    @classmethod
    def list_post_processing(cls, ctx: bellhop.ParsingContext, item: typing.Any) -> typing.Any:
        ...

ctx.list_index is the index of the current item within the list, and item is the parsed item. You must return either item itself or a replacement, which must still match the expected type.
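The contract can be sketched as a loop over enumerate. post_process_items and negate_odd_positions are hypothetical; the isinstance check models the rule that a replacement must still match the expected type.

```python
from collections.abc import Callable
from typing import Any


def post_process_items(
    items: list[Any], post: Callable[[int, Any], Any], expected_type: type
) -> list[Any]:
    """Apply post(index, item) to every item, as list_post_processing would be."""
    result = []
    for index, item in enumerate(items):
        new_item = post(index, item)
        # The replacement must still have the declared item type.
        if not isinstance(new_item, expected_type):
            raise TypeError(f"item {index}: expected {expected_type.__name__}")
        result.append(new_item)
    return result


def negate_odd_positions(index: int, item: int) -> int:
    # Replace items at odd indices; keep the others unchanged.
    return -item if index % 2 else item


print(post_process_items([1, 2, 3, 4], negate_odd_positions, int))  # [1, -2, 3, -4]
```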

Unions

You can have a union of any basic type, any list type, and None:

class Model(bellhop.Model):
    field: int | list[bytes] | None

You must implement the resolve_union class method:

    @classmethod
    def resolve_union(cls, ctx: bellhop.ParsingContext) -> typing.Any:
        ...

This method must return the type to use. To select None (which consumes zero bytes), return either None or types.NoneType. When returning a list type, you must be specific: for the example above, you would return list[bytes], not list.

To sidestep ambiguities, the length of a union field must be specified via resolve_length and not bellhop.Field.

Fallback

Sometimes a field will probably, but not certainly, match a particular bellhop.Model subclass. In that case you can declare the field as

class Model(bellhop.Model):
    field: Submodel | bytes = bellhop.Field(fallback=True)

In this case you don't have to implement resolve_union (unless the union contains another subclass). Instead, the parser first tries to parse the field as a Submodel and, if that fails, backtracks and parses it as bytes.

Custom fields

You can specify custom parsing for a particular field, even one not of a basic type, by setting custom=True in bellhop.Field. Its length will be determined by either bellhop.Field or resolve_length. The appropriate number of bytes will be read and then passed to resolve_custom:

    @classmethod
    def resolve_custom(cls, ctx: bellhop.ParsingContext, chunk: bytes) -> typing.Any:
        ...
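For instance, a hypothetical 4-byte field holding an IPv4 address could be declared with custom=True and length=4, and its resolve_custom would only need to return decode_ipv4(chunk). decode_ipv4 is our own name; the body relies only on the standard ipaddress module.

```python
import ipaddress


def decode_ipv4(chunk: bytes) -> ipaddress.IPv4Address:
    # A packed IPv4 address is exactly 4 bytes; IPv4Address accepts it directly
    # and raises AddressValueError for any other length.
    return ipaddress.IPv4Address(chunk)


print(decode_ipv4(b"\xc0\xa8\x00\x01"))  # 192.168.0.1
```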

Raw fields

Sometimes you can't know how long a field is until you start reading its bytes. For example, suppose a str field is stored as a one-byte length followed by that many bytes (e.g., b"\x05hello"). resolve_custom won't work here, since the length would first have to be parsed as a separate field. Instead, you can do

import typing
from collections.abc import Callable

import bellhop

class Model(bellhop.Model):
    word: str = bellhop.Field(raw=True)

    @classmethod
    def resolve_raw(cls, ctx: bellhop.ParsingContext, reader: Callable[[int], bytes]) -> typing.Any:
        length = reader(1)[0]
        return reader(length).decode("utf-8")

obj = Model(b"\x05helloabc")
assert obj.word == "hello"

A field cannot be both custom and raw.
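The reader passed to resolve_raw behaves like a cursor: each call returns the next n bytes and advances. Reader below is a hypothetical stdlib model of such a callable, used to run the same decoding logic as resolve_raw above outside of bellhop; that the field ends where the callback stops reading is our reading of the text, not a documented guarantee.

```python
from collections.abc import Callable


class Reader:
    """Hypothetical reader: each call returns the next n bytes and advances."""

    def __init__(self, data: bytes, offset: int = 0) -> None:
        self.data = data
        self.offset = offset

    def __call__(self, n: int) -> bytes:
        chunk = self.data[self.offset:self.offset + n]
        if len(chunk) != n:
            raise ValueError(f"needed {n} bytes, got {len(chunk)}")
        self.offset += n
        return chunk


def read_length_prefixed_str(reader: Callable[[int], bytes]) -> str:
    # Same logic as resolve_raw above: a one-byte length, then that many bytes.
    length = reader(1)[0]
    return reader(length).decode("utf-8")


reader = Reader(b"\x05helloabc")
word = read_length_prefixed_str(reader)
print(word, reader.offset)  # hello 6
```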

Configuration

You can provide a configuration object to your model that can set the default endianness and even add new basic types.

Default endianness

As noted above, an integer field whose endianness is not given defaults to native endianness. You can change this default in your configuration:

class Model(bellhop.Model):
    __config__ = bellhop.Configuration(endian=bellhop.Endian.big)

    ...

If one model inherits from another, then the child model will inherit its parent's endianness unless the child specifies it in its own configuration.

New basic types

You can expand the list of basic types which your model accepts via an implementation:

import dataclasses
import struct

import bellhop

@dataclasses.dataclass
class Foo:
    x: int
    y: int

def foo_builder(chunk: bytes) -> Foo:
    x, y = struct.unpack(">HH", chunk)
    return Foo(x=x, y=y)

implementation = bellhop.Implementation(Foo, builder=foo_builder, length=4)

class Model(bellhop.Model):
    __config__ = bellhop.Configuration(implementations=implementation)

    foo: Foo

obj = Model(b"\x00\x01\x00\x02")
assert obj.foo.x == 1
assert obj.foo.y == 2

The implementations argument to Configuration can either be a single implementation or an iterable thereof.

If the implementation's length is not provided, then the length will be determined by either bellhop.Field or resolve_length.

If the builder is not provided, then the class's constructor is used, so it must accept a single bytes argument.
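For example, a type like the hypothetical Point below takes the raw chunk as its only constructor argument, so it could presumably be registered without a builder, as bellhop.Implementation(Point, length=4). The struct format mirrors foo_builder above.

```python
import struct


class Point:
    """Hypothetical basic type built directly from its raw bytes."""

    def __init__(self, chunk: bytes) -> None:
        # Two big-endian unsigned 16-bit values, as in foo_builder.
        self.x, self.y = struct.unpack(">HH", chunk)


p = Point(b"\x00\x01\x00\x02")
print(p.x, p.y)  # 1 2
```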

As with endianness, child models inherit implementations from their ancestors.

Padding

You can declare that padding bytes follow a field with the padding argument:

class Model(bellhop.Model):
    flag: bool = bellhop.Field(padding=1)
    num: int = bellhop.Field(length=1)

obj = Model(b"\x00\xff\x01")
assert obj.num == 1

If you set padding=None, then you must implement the resolve_padding class method:

    @classmethod
    def resolve_padding(cls, ctx: bellhop.ParsingContext) -> int:
        ...
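The fixed-padding example maps directly onto struct's x format character, which skips one pad byte. padding_to_align is a hypothetical example of logic a resolve_padding method might compute; because the text does not say whether ctx.offset refers to the field's start or its end, the helper takes the offset as a plain argument.

```python
import struct

# Layout of the padding example: bool flag, one pad byte, 1-byte int.
# In struct, "x" is a pad byte that is skipped and produces no value.
flag, num = struct.unpack("<?xB", b"\x00\xff\x01")
print(flag, num)  # False 1


def padding_to_align(offset: int, alignment: int) -> int:
    """Number of pad bytes needed to reach the next multiple of alignment."""
    return -offset % alignment


print(padding_to_align(5, 4), padding_to_align(8, 4))  # 3 0
```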

Post init

You can define a __post_init__ method which will be called after all of the fields have been parsed:

class Model(bellhop.Model):
    num: int = bellhop.Field(length=1)

    def __post_init__(self) -> None:
        self.num += 1

obj = Model(b"\x00")
assert obj.num == 1

Errors

There are several error types that can be raised by the parsing logic. All of them inherit from bellhop.Error and have a chain attribute describing where the parser was when the error occurred, with one (model, field, offset) entry per nesting level. For example,

import typing

import bellhop

class Submodel(bellhop.Model):
    flag: bool
    num: int = bellhop.Field(length=1, post=True)

    @classmethod
    def post_processing(cls, ctx: bellhop.ParsingContext, value: typing.Any) -> typing.Any:
        return 1/0

class Model(bellhop.Model):
    chunk: bytes = bellhop.Field(length=4)
    sub: Submodel

try:
    Model(bytes(6))
except bellhop.Error as e:
    assert isinstance(e, bellhop.UserCallbackError)
    assert e.chain == [(Model, "sub", 4), (Submodel, "num", 1)]
    assert isinstance(e.__cause__, ZeroDivisionError)
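A chain like this can be built by having each level of parsing catch the error on its way out and prepend its own (model, field, offset) frame. ParseError and parse_field are hypothetical, and model names are plain strings instead of classes; this is one possible design, not bellhop's implementation.

```python
from collections.abc import Callable
from typing import Any


class ParseError(Exception):
    """Hypothetical error type carrying a chain of (model, field, offset) frames."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.chain: list[tuple[str, str, int]] = []


def parse_field(model: str, field: str, offset: int, parse: Callable[[], Any]) -> Any:
    try:
        return parse()
    except ParseError as err:
        # An error from a nested model: prepend the enclosing frame.
        err.chain.insert(0, (model, field, offset))
        raise
    except Exception as exc:
        # A failure in user code: wrap it and keep the original as __cause__.
        err = ParseError(f"callback failed in {model}.{field}")
        err.chain.append((model, field, offset))
        raise err from exc


caught = None
try:
    parse_field("Model", "sub", 4,
                lambda: parse_field("Submodel", "num", 1, lambda: 1 / 0))
except ParseError as e:
    caught = e

print(caught.chain)                     # [('Model', 'sub', 4), ('Submodel', 'num', 1)]
print(type(caught.__cause__).__name__)  # ZeroDivisionError
```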

Download files

Download the file for your platform.

Source Distribution

bellhop_parse-0.1.0.tar.gz (9.3 kB)

Uploaded Source

Built Distribution

bellhop_parse-0.1.0-py3-none-any.whl (10.2 kB)

Uploaded Python 3

File details

Details for the file bellhop_parse-0.1.0.tar.gz.

File metadata

  • Download URL: bellhop_parse-0.1.0.tar.gz
  • Upload date:
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.6

File hashes

Hashes for bellhop_parse-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0563648fd4328d52d221fcc7f583eb1b7cd7c5cb06c12a476f0c9ea622e1c5b7
MD5 6cc00a97a625f5aa87734c8784ac6f22
BLAKE2b-256 bcc5b17032f93cc3a18b016bb412ca0fa21b16031992d70de5b68f53e4e53415

File details

Details for the file bellhop_parse-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: bellhop_parse-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.6

File hashes

Hashes for bellhop_parse-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b5b65495995b289ad2e8d8c4885d61e57e583602028225634b33d79eaa97d5c1
MD5 d63bbc1b88235f87548629de7757f475
BLAKE2b-256 b8db2a99b00a88ea71e8bab7633bc273d56523064160e58eed395135400be47d
