Skip to main content

Parse binary data using declarative field layout and native Python properties

Project description

Build Status

ByteField

A Python library for parsing/manipulating binary data with easily accessible Python properties inspired by Django. The library is still in development. ByteField supports:

  • Variable length fields
  • Nested structures
  • Parsing only accessed fields

Quick example

ByteField allows to define binary data layout declaratively which then maps to underlying bytes:

from bytefield import *

class Header(ByteStruct):
    magic = StringField(length=5)
    length = IntegerField()
    array = ArrayField(shape=None, elem_field_type=IntegerField)
    floating = FloatField()

header = Header(magic='bytes', floating=3.14)
header.length = 3
header.array = list(range(1, header.length + 1))
print(header.data)

Output:

bytearray(b'bytes\x03\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\xc3\xf5H@')`

Example: parse a JPEG header

You can embed other structure declarations inside structures:

from bytefield import *

class RGB(ByteStruct):
    r = IntegerField(signed=False, size=1)
    g = IntegerField(signed=False, size=1)
    b = IntegerField(signed=False, size=1)

class Marker(ByteStruct):
    marker = IntegerField(size=2, signed=False)
    length = IntegerField(size=2, signed=False)
    identifier = StringField(length=5, encoding='ascii')
    version = IntegerField(size=2, signed=False)
    density = IntegerField(size=1, signed=False)
    x_density = IntegerField(size=2, signed=False)
    y_density = IntegerField(size=2, signed=False)
    x_thumbnail = IntegerField(size=2, signed=False)
    y_thumbnail = IntegerField(size=2, signed=False)
    thumb_data = ArrayField(shape=None, elem_field_type=RGB)

class JPEGHeader(ByteStruct):
    soi = IntegerField(size=2, signed=False)
    marker = StructField(Marker)

with open('image.jpg', 'rb') as f:
    # Parse the JPEG header
    header = JPEGHeader(f.read())

    # Resize the thumbnail data
    header.marker.resize(
        Marker.thumb_data_field, header.marker.x_thumbnail * header.marker.y_thumbnail
    )

    # Display the thumbnail
    display_thumbnail(header.marker.thumb_data)

Writing custom struct logic

You can create high-level structures which define their own behavior depending on the data contained within the struct:

from bytefield import *

class DynamicFloatArray(ByteStruct):
    length = IntegerField(signed=False)
    array_data = ArrayField(None, FloatField)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # When instantiated, resize the array according to its length
        self.resize(DynamicFloatArray.array_data_field, self.length)

data = bytearray(b'\x03\x00\x00\x00\x00\x00\x80?\x00\x00\x00@\x00\x00@@')
print(DynamicFloatArray(data))

Output:

[DynamicFloatArray object at 0x1c88e709e50]
length (int): 3
array_data (ndarray): [1.0 2.0 3.0]

Variable fields

Bytefield supports fields with unknown type/size:

from bytefiel import *

TYPE_INTEGER = 0
TYPE_FLOAT = 1
TYPE_STRING = 2

class DynamicString(ByteStruct):
    length = IntegerField(signed=False)
    str = StringField(None)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.resize(DynamicString.str_field, self.length)

class Content(ByteStruct):
    content_type = IntegerField(signed=False, size=2)
    content_data = VariableField()  # a variable field that will be resized when parsing the struct

    def __init__(self, data: bytearray = None, *args, **kwargs):
        super().__init__(data, *args, **kwargs)
        resize_bytes = not bool(data)
        if self.content_type == TYPE_INTEGER:
            self.resize(Content.content_data_field, IntegerField(), resize_bytes=resize_bytes)
        elif self.content_type == TYPE_FLOAT:
            self.resize(Content.content_data_field, FloatField(), resize_bytes=resize_bytes)
        elif self.content_type == TYPE_STRING:
            self.resize(Content.content_data_field, StructField(DynamicString), resize_bytes=resize_bytes)

write = Content()
write.content_type = TYPE_STRING
write.resize(Content.content_data_field, StructField(DynamicString), resize_bytes=True)
write.content_data.str = 'content'
write.content_data.length = len(write.content_data.str)

read = Content(write.data)
print(f'{write.data} is parsed to:\n{read}')

Output

bytearray(b'\x02\x00\x07\x00\x00\x00content') is parsed to:
[Content object at 0x1c1846888b0]
content_type (int): 2
content_data (DynamicString):
        length (int): 7
        str (str): content

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bytefield-1.0.2.tar.gz (17.1 kB view details)

Uploaded Source

File details

Details for the file bytefield-1.0.2.tar.gz.

File metadata

  • Download URL: bytefield-1.0.2.tar.gz
  • Upload date:
  • Size: 17.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.7

File hashes

Hashes for bytefield-1.0.2.tar.gz
Algorithm Hash digest
SHA256 eb5f8c8d51de54d8764041531521e072b4e6208acd3504f1f2260c0d6377599e
MD5 ee9c00dd76834f374e5e68b48f52e536
BLAKE2b-256 11260df7b59f208d76f610589b634c2255555862d37649bb788af856360954c1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page