A library for creating and interpreting binary formats.

:warning: This project is pre-alpha and there are no guarantees of API stability. The documentation is sometimes more aspirational than accurate.

bitformat


bitformat is a Python module for creating, manipulating and interpreting binary data. It also supports parsing and creating more complex binary formats.

It is from the author of the widely used bitstring module.


Features :hammer_and_wrench:

  • The Bits class represents a sequence of binary data of arbitrary length. It provides methods for creating, modifying and interpreting the data.
  • The Format class provides a way to define a binary format using a simple and flexible syntax.
  • A wide array of data types is supported with no arbitrary restrictions on length.
  • Data is always stored efficiently as a contiguous array of bits.

[!NOTE] To see what has been added, improved or fixed, and what's coming in the next version, see the release notes.

Documentation :book:

Some Examples :bulb:

Creating some Bits

A variety of constructor methods are available to create Bits, including from binary, hexadecimal or octal strings, formatted strings, byte literals and iterables.

>>> from bitformat import *

>>> a = Bits('0b1010')  # Create from a binary string
>>> b = Bits('u12 = 54')  # Create from a formatted string.
>>> c = Bits.from_bytes(b'\x01\x02\x03')  # Create from a bytes or bytearray object.
>>> d = Bits.pack('f16', -0.75)  # Pack a value into a data type.
>>> e = Bits.join([a, b, c, d])  # The best way to join lots of bits together.

Interpreting those Bits

Although the examples above were created from a variety of data types, the Bits instance doesn't retain any knowledge of how it was created - it's just a sequence of bits. You can therefore interpret them however you'd like:

>>> a.i
-6
>>> b.hex
'036'
>>> c.unpack(['u4', 'f16', 'u4'])
[0, 0.0005035400390625, 3]
>>> d.bytes
b'\xba\x00'

The unpack method is the general-purpose way to unpack the bits into one or more data types. If you only want to unpack to a single data type you can use the properties of the Bits as a shortcut.
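
The same reinterpretations can be checked by hand without bitformat. The following is a minimal standard-library sketch, not the library's implementation, and the helper name `as_signed` is hypothetical. It shows that the four bits 1010 read as a two's-complement integer give -6, and that the bytes 0xBA00 read as a big-endian IEEE half-precision float give -0.75.

```python
import struct


def as_signed(bit_string: str) -> int:
    """Interpret a string of '0'/'1' characters as a two's-complement integer."""
    value = int(bit_string, 2)
    if bit_string[0] == "1":
        # A leading 1 means the value is negative: subtract 2**length.
        value -= 1 << len(bit_string)
    return value


four_bits = "1010"
signed_value = as_signed(four_bits)    # same bits, read as signed
unsigned_value = int(four_bits, 2)     # same bits, read as unsigned

# 'e' is the struct code for a 16-bit IEEE float; '>' selects big-endian.
half_bytes = struct.pack(">e", -0.75)
half_value = struct.unpack(">e", b"\xba\x00")[0]

print(signed_value, unsigned_value, half_bytes, half_value)
```

The bits themselves never change; only the rule used to read them does.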

Data types

A wide range of data types is supported. These are essentially descriptions of how binary data can be converted to a useful value. The Dtype class is used to define these, but usually the string representation alone is enough.

Some example data type strings are:

  • 'u3' - a 3 bit unsigned integer.
  • 'i_le32' - a 32 bit little-endian signed integer.
  • 'f64' - a 64 bit IEEE float. Lengths of 16, 32 and 64 are supported.
  • 'bool' - a single bit boolean value.
  • 'bytes10' - a 10 byte sequence.
  • 'hex' - a hexadecimal string.
  • 'bin' - a binary string.
  • '[u8; 40]' - an array of 40 unsigned 8 bit integers.

Byte endianness for floating point and integer data types is specified with _le, _be and _ne suffixes to the base type.
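
As a rough illustration of what byte endianness means, here is a standard-library sketch that uses `struct` rather than bitformat itself. It packs the same 32-bit signed integer in little-endian, big-endian and native byte order. Treating the `_le`, `_be` and `_ne` suffixes as equivalent to struct's `<`, `>` and `=` prefixes is an assumption made for illustration only.

```python
import struct
import sys

value = 1027  # 0x00000403

little = struct.pack("<i", value)  # least significant byte first
big = struct.pack(">i", value)     # most significant byte first
native = struct.pack("=i", value)  # byte order of the machine running the code

print(little.hex(), big.hex())

# Native order matches one of the two explicit orders, depending on the platform.
native_matches = native == (little if sys.byteorder == "little" else big)
print(native_matches)
```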

Bit operations

An extensive set of operations is available to query Bits or to create new ones. For example:

>>> a + b  # Concatenation
Bits('0xa036')
>>> c.find('0b11')  # Returns found bit position
22
>>> b.replace('0b1', '0xfe')
Bits('0x03fbf9fdfc')
>>> b[0:10] | d[2:12]  # Slicing and logical operators
Bits('0b1110101101')
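
The results above can be reproduced with plain Python strings of '0' and '1' characters. This is only a sketch for checking the arithmetic by hand, assuming the MSB0 bit numbering described elsewhere in this document. The helper `to_bit_string` is hypothetical, and a character-per-bit string is far less efficient than a packed representation.

```python
def to_bit_string(value: int, length: int) -> str:
    """Return value as a zero-padded binary string of the given bit length."""
    return format(value, f"0{length}b")


a = to_bit_string(0b1010, 4)
b = to_bit_string(54, 12)          # same bits as 'u12 = 54'
c = to_bit_string(0x010203, 24)    # same bits as b'\x01\x02\x03'
d = to_bit_string(0xBA00, 16)      # same bits as the f16 value -0.75

concatenated = a + b
found_at = c.find("11")            # index of the first '11', counting from the left
replaced = b.replace("1", to_bit_string(0xFE, 8))
ored = "".join(
    "1" if x == "1" or y == "1" else "0" for x, y in zip(b[0:10], d[2:12])
)

print(hex(int(concatenated, 2)))
print(found_at)
print(format(int(replaced, 2), "010x"))
print(ored)
```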

Arrays

An Array class is provided which stores a contiguous sequence of Bits of the same data type. This is similar to the array type in the standard module of the same name, but it's not restricted to just a dozen or so types.

>>> r = Array('i5', [4, -3, 0, 1, -5, 15])  # An array of 5 bit signed ints
>>> r -= 2  # Operates on each element
>>> r.unpack()
[2, -5, -2, -1, -7, 13]
>>> r.dtype = 'u6'  # You can freely change the data type
>>> r
Array('u6', [5, 47, 55, 60, 45])
>>> r.to_bits()
Bits('0b000101101111110111111100101101')
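
To see why changing the data type from 'i5' to 'u6' turns six elements into five, it helps to redo the conversion by hand. The sketch below uses plain Python rather than bitformat, and the helper names `pack_signed` and `unpack_unsigned` are hypothetical. It packs the six 5-bit values into 30 bits and then re-reads those bits in 6-bit groups.

```python
def pack_signed(values, width):
    """Concatenate two's-complement integers of the given bit width into one bit string."""
    mask = (1 << width) - 1
    return "".join(format(v & mask, f"0{width}b") for v in values)


def unpack_unsigned(bits, width):
    """Split a bit string into unsigned integers, dropping any incomplete trailing element."""
    whole = len(bits) - len(bits) % width
    return [int(bits[i:i + width], 2) for i in range(0, whole, width)]


values = [v - 2 for v in [4, -3, 0, 1, -5, 15]]  # subtract 2 from each element
bits = pack_signed(values, 5)                     # 6 elements * 5 bits = 30 bits
as_u6 = unpack_unsigned(bits, 6)                  # 30 bits / 6 bits = 5 elements

print(values)
print(bits)
print(as_u6)
```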

A Format example

The Format class can be used to give structure to bits, as well as storing the data in a human-readable form.

>>> f = Format('[width: u12, height: u12, flags: [bool; 4]]')
>>> f.pack([320, 240, [True, False, True, False]])
Bits('0x1400f0a')
>>> print(f)
[
    width: u12 = 320,
    height: u12 = 240,
    flags: [bool; 4] = (True, False, True, False)
]
>>> f['height'].value /= 2
>>> f.to_bits()
Bits('0x140078a')
>>> f.to_bits() == 'u12=320, u12=120, 0b1010'
True

The Format and its fields can optionally have names (the Format above is unnamed, but its fields are named). In this example the pack method was used with appropriate values, which then returned a Bits object. The Format now contains all the interpreted values, which can be easily accessed and modified.

The final line in the example above demonstrates how new Bits objects can be created when needed by promoting other types. In this case the formatted string is promoted to a Bits object before the comparison is made.
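
For comparison, this is roughly what the same packing looks like when written by hand with shifts and masks. It is a sketch under the assumption that the fields are laid out most significant bit first in the order listed. The function `pack_header` is hypothetical and is not part of bitformat.

```python
def pack_header(width: int, height: int, flags: list[bool]) -> int:
    """Pack two 12-bit unsigned fields and four 1-bit flags into a 28-bit integer."""
    if not (0 <= width < 1 << 12 and 0 <= height < 1 << 12):
        raise ValueError("width and height must fit in 12 bits")
    if len(flags) != 4:
        raise ValueError("expected exactly four flags")
    flag_bits = 0
    for flag in flags:
        # Earlier flags end up in more significant bit positions.
        flag_bits = (flag_bits << 1) | int(flag)
    return (width << 16) | (height << 4) | flag_bits


packed = pack_header(320, 240, [True, False, True, False])
halved = pack_header(320, 120, [True, False, True, False])

print(f"{packed:07x}")
print(f"{halved:07x}")
```

The hand-written version hard-codes every offset, which is the bookkeeping a declarative format description is meant to remove.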

The Format can be used symmetrically to both create and parse binary data:

>>> f.parse(b'x\x048\x10')
28
>>> f
Format([
    'width: u12 = 1920',
    'height: u12 = 1080',
    'flags: [bool; 4] = (False, False, False, True)'
])

The parse method parses the input bytes lazily and simply returns the number of bits that were consumed. The values of the individual fields aren't calculated until they are needed, which allows large and complex file formats to be handled efficiently.
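
The parsed values can be verified with a few lines of plain Python. Unlike the lazy behaviour described above, this sketch decodes every field eagerly, and the function `parse_header` is hypothetical rather than anything bitformat provides. It assumes the 28 bits of interest are the leading bits of the input, with the final 4 bits of the fourth byte unused.

```python
def parse_header(data: bytes) -> dict:
    """Eagerly decode width (u12), height (u12) and four flags from the first 28 bits."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes to hold 28 bits")
    # Read 32 bits big-endian, then drop the 4 unused trailing bits.
    word = int.from_bytes(data[:4], "big") >> 4
    return {
        "width": (word >> 16) & 0xFFF,
        "height": (word >> 4) & 0xFFF,
        "flags": tuple(bool((word >> shift) & 1) for shift in (3, 2, 1, 0)),
    }


header = parse_header(b"x\x048\x10")
print(header)
```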

More to come :construction:

The bitformat library is still pre-alpha and is being actively developed. I'm hoping to make an alpha release or two in late 2024, with more features added in 2025.

There are a number of important features planned, some of which are from the bitstring library on which much of the core is based, and others are needed for a full binary format experience.

The (unordered) :todo: list includes:

  • Streaming methods. There is no concept of a bit position, or of reading through a Bits. This is available in bitstring, but I want to find a better way of doing it before adding it to bitformat.
  • Field expressions. Rather than hard coding everything in a field, some parts will be calculated during the parsing process. For example in the format '[w: u16, h: u16, [u8; {w * h}]]' the size of the 'u8' array would depend on the values parsed just before it.
  • New field types. Fields like Repeat, Find and If are planned which will allow more flexible formats to be written.
  • Exotic floating point types. In bitstring there are a number of extra floating point types such as bfloat and the MXFP 8, 6 and 4-bit variants. These will be ported over to bitformat.
  • Performance improvements. A primary focus of the design of bitformat is that it should be fast. Early versions won't be well optimized, but tests so far are quite promising, and the design philosophy should mean that it can be made even more performant later.
  • LSB0. Currently all bit positions are numbered with the most significant bit as bit zero (MSB0). I plan to add support for least significant bit zero (LSB0) bit numbering as well.
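
To make the planned field expressions more concrete, here is a hand-written sketch of the behaviour that a format such as '[w: u16, h: u16, [u8; {w * h}]]' would describe: the length of the final array depends on values parsed just before it. This is plain Python using `struct`, not bitformat code. The function `parse_image` is hypothetical, and big-endian byte order is assumed for the two u16 fields.

```python
import struct


def parse_image(data: bytes) -> tuple[int, int, list[int]]:
    """Read w and h as big-endian u16 values, then w * h unsigned bytes."""
    w, h = struct.unpack_from(">HH", data, 0)
    count = w * h  # the array length is only known after w and h are parsed
    pixels = data[4:4 + count]
    if len(pixels) != count:
        raise ValueError(f"expected {count} pixel bytes, found {len(pixels)}")
    return w, h, list(pixels)


sample = struct.pack(">HH", 3, 2) + bytes([10, 20, 30, 40, 50, 60])
w, h, pixels = parse_image(sample)
print(w, h, pixels)
```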

Copyright (c) 2024 Scott Griffiths
