Binary deserialization
"Let me get that for you."
Bellhop provides deserialization of binary data according to a model.
For example,
import bellhop

class Model(bellhop.Model):
    num: int = bellhop.Field(length=4, endian=bellhop.Endian.big)
    flag: bool
    data: bytes = bellhop.Field(length=3)

obj = Model(b"\x00\x00\x00\xff\x01\x01\x02\x03\x04")
print(obj)  # Model(num=255, flag=True, data=b"\x01\x02\x03")
Supported types
Basic types
The basic types supported are bool, any int subclass (though bool is treated differently), bytes, and any bellhop.Model subclass.
Integers
Integer fields (not counting bool) must have their length specified as seen in the first example. The allowed values are 1, 2, 4, and 8. Furthermore, you may specify the endianness. The possible values are Endian.native (the default), Endian.big, and Endian.little.
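To make the length and endianness options concrete, here is a minimal sketch that decodes the same four bytes both ways using only the standard library. It does not call bellhop. The helper name decode_int is hypothetical, unsigned interpretation is an assumption (signedness is not stated above), and sys.byteorder stands in for Endian.native.

```python
import sys

ALLOWED_LENGTHS = (1, 2, 4, 8)  # the integer lengths listed above


def decode_int(chunk: bytes, endian: str = sys.byteorder) -> int:
    """Decode an unsigned integer from a chunk of 1, 2, 4, or 8 bytes.

    Assumption: values are unsigned; this is not bellhop's own code.
    """
    if len(chunk) not in ALLOWED_LENGTHS:
        raise ValueError(f"unsupported integer length: {len(chunk)}")
    return int.from_bytes(chunk, endian)


raw = b"\x00\x00\x00\xff"
print(decode_int(raw, "big"))     # 255
print(decode_int(raw, "little"))  # 4278190080
```

Because the default here follows the machine's byte order, the same bytes can decode differently on different hosts, which is why stating the endianness explicitly is usually the safer choice for file and wire formats.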
Boolean values
A bool field consumes one byte and is True if and only if the byte is non-zero.
Bytes
A bytes field consumes the data as is. Its length can be specified as in the original example. Setting the length to a negative value will consume all remaining data.
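The fixed-length and "consume the rest" behaviours can be pictured with a small standalone helper. This is only a sketch of the rule described above, not bellhop's implementation: take_bytes is a hypothetical name, and raising ValueError on short input is an assumption.

```python
def take_bytes(data: bytes, offset: int, length: int) -> tuple[bytes, int]:
    """Return (chunk, new_offset); a negative length takes everything left."""
    end = len(data) if length < 0 else offset + length
    if end > len(data):
        # Assumption: running out of data is treated as an error.
        raise ValueError("not enough data")
    return data[offset:end], end


payload = b"\x01\x02\x03\x04\x05"
head, pos = take_bytes(payload, 0, 2)     # fixed length of 2
tail, pos = take_bytes(payload, pos, -1)  # negative length: the remainder
print(head, tail)
```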
If the length is not specified via the bellhop.Field, then you must implement the resolve_length class method:
@classmethod
def resolve_length(cls, ctx: bellhop.ParsingContext) -> int:
    ...
A ParsingContext has the signature
class ParsingContext:
    @property
    def user_data(self) -> typing.Any:
        ...

    @property
    def offset(self) -> int:
        ...

    @property
    def field(self) -> str:
        ...

    @property
    def stash(self) -> dict[str, typing.Any]:
        ...

    @property
    def list_index(self) -> int:
        ...
ctx.offset is the offset (relative to the start of the model) of the field currently being parsed. ctx.field is the name of the field. ctx.stash not only holds the values of the previously parsed fields but can also be used to stash information for later use. Note that changing a previously parsed field's value via the stash does change the field's actual value.
The rest of the context's properties will be discussed later.
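As an illustration of how a length callback might use the context, the sketch below looks up a previously parsed value in the stash. It runs without bellhop: FakeContext is a hypothetical stand-in that exposes only some of the property names shown above, the field names size and payload are invented, and the assumption that the stash is keyed by field name is inferred from its dict[str, typing.Any] type rather than stated in the text.

```python
import dataclasses
import typing


@dataclasses.dataclass
class FakeContext:
    """Stand-in with the same attribute names as ParsingContext."""
    field: str
    offset: int
    stash: dict[str, typing.Any]


def resolve_length(ctx: FakeContext) -> int:
    # Hypothetical layout: a one-byte "size" field precedes "payload".
    if ctx.field == "payload":
        # Assumption: earlier field values are stored under their names.
        return ctx.stash["size"]
    raise ValueError(f"no length rule for field {ctx.field!r}")


ctx = FakeContext(field="payload", offset=1, stash={"size": 3})
print(resolve_length(ctx))  # 3
```

In a real model the function would be a class method taking cls as its first argument, as in the signature shown earlier.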
Compound types
Lists
You can have a list of any basic type:
class Model(bellhop.Model):
    array: list[int] = bellhop.Field(length=1, list_length=4)

obj = Model(b"\x00\x01\x02\x03")
print(obj)  # Model(array=[0, 1, 2, 3])
If the list length is not specified via the bellhop.Field, then you must implement the resolve_list_length class method:
@classmethod
def resolve_list_length(cls, ctx: bellhop.ParsingContext) -> int:
    ...
The individual item length (if applicable) can be specified via either a bellhop.Field or resolve_length. Every element of a list will have the same length.
You can set the list length to a negative value. This will cause elements to be continually added until a bellhop.TerminateList exception is raised (i.e., from a callback method).
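The open-ended list can be thought of as a loop that keeps reading items until a callback raises the terminating exception. The sketch below models that control flow with the standard library only. The TerminateList class here is a local stand-in for bellhop.TerminateList, the helper names are hypothetical, and whether the item that triggers termination is kept or discarded is an assumption (here it is discarded).

```python
import typing


class TerminateList(Exception):
    """Local stand-in for bellhop.TerminateList."""


def read_until_terminated(
    data: bytes,
    item_length: int,
    on_item: typing.Callable[[int, bytes], None],
) -> list[bytes]:
    """Collect fixed-size items until the callback raises TerminateList."""
    items: list[bytes] = []
    offset = 0
    while True:
        chunk = data[offset:offset + item_length]
        try:
            on_item(len(items), chunk)  # the callback may end the list
        except TerminateList:
            break
        items.append(chunk)
        offset += item_length
    return items


def stop_at_zero(index: int, chunk: bytes) -> None:
    # Hypothetical rule: a zero byte, or running out of data, ends the list.
    if not chunk or chunk == b"\x00":
        raise TerminateList


print(read_until_terminated(b"\x01\x02\x00\x09", 1, stop_at_zero))
```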
If you add list_post=True to the bellhop.Field, the list_post_processing class method will be called for every item in the list:
@classmethod
def list_post_processing(cls, ctx: bellhop.ParsingContext, item: typing.Any) -> typing.Any:
    ...
ctx.list_index will equal the index within the list of the current item. item will be the parsed item and you must return either the item or a replacement item (which must still match the expected type).
Unions
You can have a union of any basic type, any list type, and None:
class Model(bellhop.Model):
    field: int | list[bytes] | None
You must implement the resolve_union class method:
@classmethod
def resolve_union(cls, ctx: bellhop.ParsingContext) -> typing.Any:
    ...
This method must return the type to use. If you want to use None (which consumes zero bytes), you can return either None or types.NoneType. When returning a list type, you must be specific. For example, using the example above, you would have to return list[bytes] and not list.
To sidestep ambiguities, the length of a union field must be specified via resolve_length and not bellhop.Field.
Fallback
It may be the case that you have a field which you think will match a particular bellhop.Model subclass but you're not sure. You can specify the field as
class Model(bellhop.Model):
    field: Submodel | bytes = bellhop.Field(fallback=True)
In such a case, you wouldn't have to implement resolve_union (unless there were another subclass in the union). Instead, the parser would first try to parse the field as a Submodel and then, if that failed, it would backtrack and treat it as a bytes.
Custom fields
You can specify custom parsing for a particular field, even one not of a basic type, by setting custom=True in bellhop.Field. Its length will be determined by either bellhop.Field or resolve_length. The appropriate number of bytes will be read and then passed to resolve_custom:
@classmethod
def resolve_custom(cls, ctx: bellhop.ParsingContext, chunk: bytes) -> typing.Any:
    ...
Raw fields
Sometimes you can't know how long a field will be until you start reading its bytes. For example, suppose the bytes of a str field are preceded by a one-byte length (e.g., b"\x05hello"). resolve_custom won't work for you here since you'd have to first create a field for the length. Instead you can do
class Model(bellhop.Model):
    word: str = bellhop.Field(raw=True)

    @classmethod
    def resolve_raw(cls, ctx: bellhop.ParsingContext, reader: typing.Callable[[int], bytes]) -> typing.Any:
        # The first byte holds the length of the string that follows.
        length = reader(1)[0]
        return reader(length).decode("utf-8")

obj = Model(b"\x05helloabc")
assert obj.word == "hello"
A field cannot be both custom and raw.
Configuration
You can provide a configuration object to your model that can set the default endianness and even add new basic types.
Default endianness
As stated above, if an integer's endianness is not stated, it defaults to native endianness. You can change this in your configuration:
class Model(bellhop.Model):
    __config__ = bellhop.Configuration(endian=bellhop.Endian.big)
    ...
If one model inherits from another, then the child model will inherit its parent's endianness unless the child specifies it in its own configuration.
New basic types
You can expand the list of basic types which your model accepts via an implementation:
import dataclasses
import struct

import bellhop

@dataclasses.dataclass
class Foo:
    x: int
    y: int

def foo_builder(chunk: bytes) -> Foo:
    # Two big-endian unsigned 16-bit integers.
    x, y = struct.unpack(">HH", chunk)
    return Foo(x=x, y=y)

implementation = bellhop.Implementation(Foo, builder=foo_builder, length=4)

class Model(bellhop.Model):
    __config__ = bellhop.Configuration(implementations=implementation)
    foo: Foo

obj = Model(b"\x00\x01\x00\x02")
assert obj.foo.x == 1
assert obj.foo.y == 2
The implementations argument to Configuration can either be a single implementation or an iterable thereof.
If the implementation's length is not provided, then the length will be determined by either bellhop.Field or resolve_length.
If the builder is not provided, then the class's constructor will be used (meaning it must accept a single bytes argument).
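For the no-builder case, the type only needs a constructor that accepts the raw bytes. The following standalone sketch shows such a class; Point and its two-integer layout are hypothetical and the snippet does not import bellhop.

```python
import struct


class Point:
    """Hypothetical type built directly from its raw bytes."""

    def __init__(self, chunk: bytes) -> None:
        # Two big-endian unsigned 16-bit integers, as in the Foo example.
        self.x, self.y = struct.unpack(">HH", chunk)


point = Point(b"\x00\x03\x00\x04")
print(point.x, point.y)  # 3 4
```

Going by the description above, a class shaped like this could presumably be registered with bellhop.Implementation(Point, length=4) and no builder argument.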
As with endianness, child models inherit implementations from their ancestors.
Padding
You can state that padding bytes should follow a field by setting padding in its bellhop.Field:
class Model(bellhop.Model):
    flag: bool = bellhop.Field(padding=1)
    num: int = bellhop.Field(length=1)

obj = Model(b"\x00\xff\x01")
assert obj.num == 1
If you set padding=None, then you must implement
@classmethod
def resolve_padding(cls, ctx: bellhop.ParsingContext) -> int:
    ...
Post init
You can define a __post_init__ method which will be called after all of the fields have been parsed:
class Model(bellhop.Model):
    num: int = bellhop.Field(length=1)

    def __post_init__(self) -> None:
        self.num += 1

obj = Model(b"\x00")
assert obj.num == 1
Errors
There are several error types that can be raised by the parsing logic. All of them inherit from bellhop.Error and have a chain attribute. The chain is a description of where the parsing logic was when the error occurred. For example,
class Submodel(bellhop.Model):
    flag: bool
    num: int = bellhop.Field(length=1, post=True)

    @classmethod
    def post_processing(cls, ctx: bellhop.ParsingContext, value: typing.Any) -> typing.Any:
        return 1/0

class Model(bellhop.Model):
    chunk: bytes = bellhop.Field(length=4)
    sub: Submodel

try:
    Model(bytes(6))
except bellhop.Error as e:
    assert isinstance(e, bellhop.UserCallbackError)
    assert e.chain == [(Model, "sub", 4), (Submodel, "num", 1)]
    assert isinstance(e.__cause__, ZeroDivisionError)
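A chain like the one in this example can be turned into a readable path for logging. The helper below is a sketch, not part of bellhop: format_chain is a hypothetical name, the Model and Submodel classes are empty stand-ins so the snippet runs on its own, and reading the third tuple element as the field's offset within its model is an inference from the example values.

```python
def format_chain(chain: list[tuple[type, str, int]]) -> str:
    """Render an error chain as a readable path, outermost model first."""
    return " -> ".join(
        f"{model.__name__}.{field} @ {offset}" for model, field, offset in chain
    )


# Empty stand-in classes so the sketch runs without bellhop installed.
class Model:
    pass


class Submodel:
    pass


chain = [(Model, "sub", 4), (Submodel, "num", 1)]
print(format_chain(chain))  # Model.sub @ 4 -> Submodel.num @ 1
```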