
Easily read binary data in a Pythonic way


blob_reader

Main usage:

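from dataclasses import dataclass

from blob_reader import Block


@dataclass
class MyObj(Block):
    # Each field's default is a struct format string describing its bytes.
    _int: int = '2i'     # two native C ints
    _txt: bytes = '10s'  # one 10-byte string


# Read one MyObj from the start of a binary file (native byte order).
with open('some_file.bin', 'rb') as fp:
    obj = MyObj.read(fp)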
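The formats in MyObj are ordinary struct format strings (see Struct characters below). As a rough illustration of what '2i' and '10s' consume, the sketch below decodes the same layout with the standard struct module directly. It is not blob_reader's implementation, and the sample bytes are made up for the example.

```python
import struct
from io import BytesIO

# Made-up sample data: two native C ints followed by a 10-byte string.
fp = BytesIO(struct.pack('2i10s', 1, 2, b'hello'))

# Consume each field in order, reading exactly calcsize(fmt) bytes.
ints = struct.unpack('2i', fp.read(struct.calcsize('2i')))
txt, = struct.unpack('10s', fp.read(struct.calcsize('10s')))

print(ints)  # (1, 2): a count of 2 yields two values
print(txt)   # b'hello\x00\x00\x00\x00\x00': '10s' is padded with NUL bytes
```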

API:

Block.read      # Native byte order
Block.read_be   # Big endian
Block.read_le   # Little endian

Block.write     # Native byte order
Block.write_be  # Big endian
Block.write_le  # Little endian

# Aliases
Block.read_big_endian     = Block.read_be
Block.read_network        = Block.read_be
Block.read_little_endian  = Block.read_le
Block.write_big_endian    = Block.write_be
Block.write_network       = Block.write_be
Block.write_little_endian = Block.write_le
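The _be and _le variants only change the byte order. Assuming they correspond to the '>' and '<' prefixes from the Struct alignment table below (an assumption about blob_reader's internals, not confirmed here), the effect on the same four bytes looks like this with plain struct:

```python
import struct

raw = b'\x00\x00\x01\x00'

# Reading: the same bytes give different integers depending on byte order.
big, = struct.unpack('>i', raw)      # big endian (assumed to match read_be)
network, = struct.unpack('!i', raw)  # network order is big endian
little, = struct.unpack('<i', raw)   # little endian (assumed to match read_le)
print(big, network, little)          # 256 256 65536

# Writing: packing 256 with each byte order.
print(struct.pack('>i', 256))  # b'\x00\x00\x01\x00'
print(struct.pack('<i', 256))  # b'\x00\x01\x00\x00'
```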

Struct alignment

Character | Byte order             | Size     | Alignment
----------|------------------------|----------|----------
@         | native                 | native   | native
=         | native                 | standard | none
<         | little-endian          | standard | none
>         | big-endian             | standard | none
!         | network (= big-endian) | standard | none

If the first character is not one of these, '@' is assumed.
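The Alignment column matters once fields of different sizes are mixed. With '@', a native int is aligned to its natural boundary, so padding may follow a 1-byte field; the other prefixes never pad. The native result is platform dependent, so the sketch below treats it as such.

```python
import struct

# 'b' (1 byte) followed by 'i' (4 bytes).
native = struct.calcsize('@bi')    # may include padding before the int
standard = struct.calcsize('=bi')  # no alignment: 1 + 4 bytes
default = struct.calcsize('bi')    # no prefix means '@'

print(standard)           # 5
print(native)             # 8 on common x86-64/ARM64 builds (3 padding bytes)
print(default == native)  # True
```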

Struct characters

The table and notes below are taken from the official Python struct module documentation.

Format | C Type             | Python type       | Standard size | Notes
-------|--------------------|-------------------|---------------|---------
x      | pad byte           | no value          |               | (7)
c      | char               | bytes of length 1 | 1             |
b      | signed char        | integer           | 1             | (1), (2)
B      | unsigned char      | integer           | 1             | (2)
?      | _Bool              | bool              | 1             | (1)
h      | short              | integer           | 2             | (2)
H      | unsigned short     | integer           | 2             | (2)
i      | int                | integer           | 4             | (2)
I      | unsigned int       | integer           | 4             | (2)
l      | long               | integer           | 4             | (2)
L      | unsigned long      | integer           | 4             | (2)
q      | long long          | integer           | 8             | (2)
Q      | unsigned long long | integer           | 8             | (2)
n      | ssize_t            | integer           |               | (3)
N      | size_t             | integer           |               | (3)
e      | (6)                | float             | 2             | (4)
f      | float              | float             | 4             | (4)
d      | double             | float             | 8             | (4)
s      | char[]             | bytes             |               | (9)
p      | char[]             | bytes             |               | (8)
P      | void*              | integer           |               | (5)
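The Standard size column can be checked with struct.calcsize by forcing a standard-size prefix such as '<'. Note that the 'x' pad byte also occupies one byte, even though the table leaves its size blank.

```python
import struct

# Standard sizes (prefix '<') for every code that has a fixed size.
codes = 'xcbB?hHiIlLqQefd'
sizes = {code: struct.calcsize('<' + code) for code in codes}
print(sizes['q'], sizes['e'], sizes['x'])  # 8 2 1

# For 's' and 'p' the count is the size in bytes.
print(struct.calcsize('<10s'), struct.calcsize('<10p'))  # 10 10
```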

Notes:

  1. The '?' conversion code corresponds to the _Bool type defined by C99. If this type is not available, it is simulated using a char. In standard mode, it is always represented by one byte.

  2. When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has a __index__() method then that method is called to convert the argument to an integer before packing.

    Changed in version 3.2: Added use of the __index__() method for non-integers.

  3. The 'n' and 'N' conversion codes are only available for the native size (selected as the default or with the '@' byte order character). For the standard size, you can use whichever of the other integer formats fits your application.

  4. For the 'f', 'd' and 'e' conversion codes, the packed representation uses the IEEE 754 binary32, binary64 or binary16 format (for 'f', 'd' or 'e' respectively), regardless of the floating-point format used by the platform.

  5. The 'P' format character is only available for the native byte ordering (selected as the default or with the '@' byte order character). The byte order character '=' chooses to use little- or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the 'P' format is not available.

  6. The IEEE 754 binary16 "half precision" type was introduced in the 2008 revision of the IEEE 754 standard. It has a sign bit, a 5-bit exponent and 11-bit precision (with 10 bits explicitly stored), and can represent numbers between approximately 6.1e-05 and 6.5e+04 at full precision. This type is not widely supported by C compilers: on a typical machine, an unsigned short can be used for storage, but not for math operations. See the Wikipedia page on the half-precision floating-point format for more information.

  7. When packing, 'x' inserts one NUL byte.

  8. The 'p' format character encodes a "Pascal string", meaning a short variable-length string stored in a fixed number of bytes, given by the count. The first byte stored is the length of the string, or 255, whichever is smaller. The bytes of the string follow. If the string passed in to pack() is too long (longer than the count minus 1), only the leading count-1 bytes of the string are stored. If the string is shorter than count-1, it is padded with null bytes so that exactly count bytes in all are used. Note that for unpack(), the 'p' format character consumes count bytes, but that the string returned can never contain more than 255 bytes.

  9. For the 's' format character, the count is interpreted as the length of the bytes, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string mapping to or from a single Python byte string, while '10c' means 10 separate one-byte character elements (e.g., cccccccccc) mapping to or from ten different Python byte objects. (See the Examples section of the official struct documentation for a concrete demonstration of the difference.) If a count is not given, it defaults to 1. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting bytes object always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
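Several of these notes are easy to check interactively. The sketch below exercises notes 2, 3/5, 6 and 7 with the standard struct module; the Level class is a made-up example type, not part of blob_reader.

```python
import struct


class Level:
    """Made-up non-int type; __index__ lets integer codes pack it (note 2)."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


# Note 2: 513 == 0x0201, stored little-endian as b'\x01\x02'.
packed_index = struct.pack('<H', Level(513))

# Note 7: 'x' inserts one NUL byte between the two values.
padded = struct.pack('<BxB', 1, 2)  # b'\x01\x00\x02'

# Note 6: binary16 keeps only about 3 significant decimal digits.
half, = struct.unpack('<e', struct.pack('<e', 0.1))  # 0.0999755859375

# Notes 3 and 5: 'n', 'N' and 'P' are rejected outside native mode.
rejected = []
for code in 'nNP':
    try:
        struct.calcsize('<' + code)
    except struct.error:
        rejected.append(code)
print(rejected)  # ['n', 'N', 'P']
```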

Gotchas

  • 2s stores a single 2-byte sequence (e.g. b'ab').
  • 2c stores a list of two 1-byte values (e.g. [b'a', b'b']).
  • 2p stores a Pascal string of at most 1 byte, because the count 2 includes the length byte. b'ab' is therefore stored as b'\x01a', not b'ab'!
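The three gotchas can be reproduced with the plain struct module. Note that struct.unpack itself returns tuples.

```python
import struct

# '2s': one 2-byte bytes object.
print(struct.unpack('2s', b'ab'))     # (b'ab',)

# '2c': two separate 1-byte bytes objects.
print(struct.unpack('2c', b'ab'))     # (b'a', b'b')

# '2p': the count includes the length byte, so only 1 data byte fits.
print(struct.pack('2p', b'ab'))       # b'\x01a'
print(struct.pack('3p', b'ab'))       # b'\x02ab': one more byte fits both
print(struct.unpack('2p', b'\x01a'))  # (b'a',)
```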

Extensions

Dynamic field sizes

from dataclasses import dataclass
from io import BytesIO

from blob_reader import Block


@dataclass
class MyObj(Block):
    _int: int = 'B'          # 1-byte unsigned length prefix
    _txt: bytes = '{_int}s'  # size taken from the already-read _int field

stream = BytesIO(b'\x02abc')
obj = MyObj.read(stream)
# obj = MyObj(_int=2, _txt=b'ab')

A field used as a size must be declared before the field that refers to it. This is enforced for both reading and writing; when writing the order matters less, but it is enforced for consistency's sake.
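One way such dynamic formats could be resolved is sketched below: each field's format string is filled in with str.format from the values read so far. This is a hypothetical reader (read_fields and LengthPrefixed are made-up names), not blob_reader's implementation, but it shows why the size field has to come first: referring to a field that has not been read yet raises KeyError.

```python
import struct
from dataclasses import dataclass, fields
from io import BytesIO


def read_fields(cls, fp):
    """Hypothetical reader: unpack dataclass fields in declaration order."""
    values = {}
    for fld in fields(cls):
        # Fill placeholders such as '{_int}s' from fields already read.
        fmt = fld.default.format(**values)
        result = struct.unpack(fmt, fp.read(struct.calcsize(fmt)))
        # A single value is stored as-is, several as a list.
        values[fld.name] = result[0] if len(result) == 1 else list(result)
    return cls(**values)


@dataclass
class LengthPrefixed:
    _int: int = 'B'          # length byte
    _txt: bytes = '{_int}s'  # uses the length read just before


obj = read_fields(LengthPrefixed, BytesIO(b'\x02abc'))
print(obj)  # LengthPrefixed(_int=2, _txt=b'ab')
```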

Exceptions

EOFError

Raised when the stream ends before all the bytes a field needs have been read.

ValueError

Raised when an error occurs during packing or unpacking; the exception message gives more details.
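A reader built on struct could surface these two conditions as sketched below. unpack_exact and pack_checked are hypothetical helpers that only illustrate when each exception applies. Note that struct itself raises struct.error, which is not a ValueError subclass, so it has to be translated explicitly.

```python
import struct
from io import BytesIO


def unpack_exact(fmt, fp):
    """Hypothetical helper: read exactly calcsize(fmt) bytes, then unpack."""
    size = struct.calcsize(fmt)
    chunk = fp.read(size)
    if len(chunk) < size:
        # The stream ended early: report it as EOFError.
        raise EOFError(f'{fmt!r} needs {size} bytes, got {len(chunk)}')
    return struct.unpack(fmt, chunk)


def pack_checked(fmt, *values):
    """Hypothetical helper: turn struct.error into ValueError."""
    try:
        return struct.pack(fmt, *values)
    except struct.error as exc:
        raise ValueError(f'cannot pack {values!r} as {fmt!r}: {exc}') from exc


try:
    unpack_exact('<I', BytesIO(b'\x01\x02'))  # only 2 of 4 bytes available
except EOFError as exc:
    print('EOFError:', exc)

try:
    pack_checked('<B', 300)  # 300 does not fit in an unsigned byte
except ValueError as exc:
    print('ValueError:', exc)
```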



Download files


Source Distribution

blob_reader-1.0.0.tar.gz (5.6 kB)

Uploaded Source

Built Distribution

blob_reader-1.0.0-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file blob_reader-1.0.0.tar.gz.

File metadata

  • Download URL: blob_reader-1.0.0.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.1 CPython/3.10.10 Windows/10

File hashes

Hashes for blob_reader-1.0.0.tar.gz
Algorithm   | Hash digest
SHA256      | 73837e959968d671e72faffe4c6e892a3b56d0bec412e46804208b4afc6535d0
MD5         | aa2af699ef3a6d91ebecfb0e79983c62
BLAKE2b-256 | 95f50c4cf2c7a0487e0fcc63bdcf68deca33c468f43d74c77528e64957579087

See more details on using hashes here.

File details

Details for the file blob_reader-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: blob_reader-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.1 CPython/3.10.10 Windows/10

File hashes

Hashes for blob_reader-1.0.0-py3-none-any.whl
Algorithm   | Hash digest
SHA256      | f3abc4e0ab05c6c4ce84b4489b3cff34f6c856c3921f16d5f2e317ced3884c2f
MD5         | 74154df3541836cc5e63de90e6576af9
BLAKE2b-256 | 12295a32869f697296bdfe433f5a41e9cc2b7b772fac287ad6f2a223cbdf9769

See more details on using hashes here.
