
A single-file, pure-python package to deal with binary packed data


rawutil


Rawutil documentation

rawutil is a Python 3 module to read and write binary packed data.

There are two ways to use it:

  • Like struct, with string structures

  • With the TypeReader and TypeWriter objects

1. String structures

rawutil can be used like struct, with structures stored as strings. rawutil is almost fully compatible with struct: if you replace every instance of “struct” with “rawutil” in a program, it should work exactly the same (see below for the differences).

rawutil has the same 3 main functions as struct:

pack(stct, *data) -> bytes
        Packs the elements into a bytes object, as described by the structure given in the stct argument

pack(stct, *data, file) -> None
        Packs the elements into the given file-like object, as described by the structure given in the stct argument

unpack(stct, data, refdata=())
        Unpacks the binary data given as a bytes object, as described by the structure in the stct argument, and returns the elements as a list.
        data can also be a file-like object. In this case, unpacking starts at the beginning of the file (it performs a file.seek(0)).
        The refdata argument is a tuple containing the data used by external references, see below.
        Note that unlike its struct equivalent, it doesn't raise an exception if the data length doesn't match the structure length.

unpack_from(stct, data, offset=0, refdata=(), getptr=False)
        Unpacks the data as described by the stct argument, starting at the specified offset, and returns the elements as a list.
        data can also be a file-like object. In this case, unpacking starts at the specified location (it performs a file.seek(offset)).
        The refdata argument is used for external references, see below.
        If getptr is True, this function returns *unpacked, ptr* instead of only *unpacked*. The pointer is the offset where unpacking ended.
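For fixed-size formats, the getptr behaviour can be approximated with the standard struct module. The sketch below is only an illustration: unpack_from_with_ptr is a hypothetical helper, not part of rawutil, and it relies on struct.calcsize, which works here only because struct formats have a fixed length.

```python
import struct


def unpack_from_with_ptr(fmt, data, offset=0):
    """Hypothetical helper mimicking unpack_from(..., getptr=True).

    Returns the unpacked values as a list (like rawutil), together with
    the offset where unpacking ended.
    """
    values = list(struct.unpack_from(fmt, data, offset))
    end = offset + struct.calcsize(fmt)  # only valid for fixed-size formats
    return values, end


data = b"\x00\x01\x02\x03\x04\x05"
values, ptr = unpack_from_with_ptr(">HB", data, offset=1)
print(values, ptr)  # [258, 3] 4
```

Since rawutil structures can contain variable-length elements such as n, the end offset cannot be computed in advance in general, which is why rawutil reports it through getptr instead.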

rawutil structures can match variable-length data, so there is no calcsize function.

String structures reference

The structure is a str object.

It can begin with a character that specifies the byte order, exactly like in struct:

| Chr. | Effect |
|------|--------|
| =    | Uses the system byte order and alignment |
| @    | Uses the system byte order without alignment |
| !    | Network byte order (same as >) |
| >    | Big endian |
| <    | Little endian |

When there is no byte order mark, the byte order defaults to @.
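The effect of the byte order character is easiest to see with the struct module, whose byte order characters rawutil follows. This snippet packs the same 32-bit unsigned integer in big and little endian:

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)      # most significant byte first
little = struct.pack("<I", value)   # least significant byte first
network = struct.pack("!I", value)  # network order is big endian

print(big.hex(), little.hex())  # 12345678 78563412
assert network == big
```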

Then comes the format string itself. Note that rawutil structures can contain as many spaces as you want to improve readability.

Elements

First, all elements usable in struct can be used with rawutil, with the exceptions listed below the table:

| Chr. | Type   | Description |
|------|--------|-------------|
| c    | char   | Returns a 1-byte bytes object |
| b    | int8   | Signed 8-bit (1-byte) integer |
| B    | uint8  | Unsigned 8-bit integer |
| ?    | bool   | Returns a boolean from a byte (False if 0, else True) |
| h    | int16  | Signed 16-bit (2-byte) integer |
| H    | uint16 | Unsigned 16-bit integer |
| i    | int32  | Signed 32-bit (4-byte) integer |
| I    | uint32 | Unsigned 32-bit integer |
| l    | int32  | Signed 32-bit (4-byte) integer |
| L    | uint32 | Unsigned 32-bit integer |
| q    | int64  | Signed 64-bit (8-byte) integer |
| Q    | uint64 | Unsigned 64-bit integer |
| f    | float  | 32-bit float |
| d    | double | 64-bit double |
| s    | string | Returns a bytes object |
| x    | void   | Padding byte: doesn't return anything |

Note that s should be used with a length: “12s” returns a 12-byte bytes object, unlike “12c”, which returns twelve 1-byte bytes objects. Note also that P and N are not available, and that n does not represent an ssize_t like in struct (in rawutil, it is a null-terminated string, see below).
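With struct, whose s and c characters rawutil mirrors, the difference looks like this (rawutil returns lists where struct returns tuples):

```python
import struct

data = b"test"

as_string = struct.unpack("4s", data)  # one 4-byte bytes object
as_chars = struct.unpack("4c", data)   # four 1-byte bytes objects

print(as_string)  # (b'test',)
print(as_chars)   # (b't', b'e', b's', b't')
```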

rawutil also introduces new format characters:

| Chr. | Type   | Description |
|------|--------|-------------|
| u    | int24  | Signed 24-bit (3-byte) integer |
| U    | uint24 | Unsigned 24-bit integer |
| n    | string | Null-terminated string |
| a    |        | Alignment: aligns to a multiple of the specified number |
| X    | hex    | Works like s, but returns the bytes as a hexadecimal string |
| $    | bytes  | Goes to the end: returns all the remaining data |
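Python's struct module has no 24-bit type, so u and U have no direct struct equivalent. A minimal sketch of what they decode, using int.from_bytes; read_uint24 and read_int24 are hypothetical names, not rawutil functions:

```python
def read_uint24(data, ptr=0, byteorder="little"):
    """Decode an unsigned 24-bit integer (like U) and return (value, new_ptr)."""
    return int.from_bytes(data[ptr:ptr + 3], byteorder, signed=False), ptr + 3


def read_int24(data, ptr=0, byteorder="little"):
    """Decode a signed 24-bit integer (like u) and return (value, new_ptr)."""
    return int.from_bytes(data[ptr:ptr + 3], byteorder, signed=True), ptr + 3


raw = b"\xff\xff\xff\x01\x00\x00"
print(read_uint24(raw))    # (16777215, 3)
print(read_int24(raw))     # (-1, 3)
print(read_int24(raw, 3))  # (1, 6)
```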

The “n” element returns a bytes object. The string is read from the current pointer position, until a null byte (0x00) is found. The null byte is not included in the returned string. When packing, it packs a bytes object, and adds a null byte at the end.
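The behaviour of n can be sketched in pure Python. read_cstring and pack_cstring are hypothetical names used only for illustration, and this sketch raises ValueError when no null byte is found, which may differ from rawutil:

```python
def read_cstring(data, ptr=0):
    """Read bytes from ptr up to (not including) the next null byte.

    Returns (string, new_ptr), where new_ptr points just past the null byte.
    """
    end = data.index(b"\x00", ptr)  # raises ValueError if there is no null byte
    return data[ptr:end], end + 1


def pack_cstring(string):
    """Pack a bytes object followed by a terminating null byte."""
    return string + b"\x00"


packed = pack_cstring(b"foo") + pack_cstring(b"bar")
first, ptr = read_cstring(packed)
second, ptr = read_cstring(packed, ptr)
print(first, second, ptr)  # b'foo' b'bar' 8
```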

The “a” element performs an alignment. It is used with a number, like “20a”, which gives the alignment. When unpacking, it moves the pointer to the next multiple of the alignment and doesn't return anything. When packing, it adds null bytes until a multiple of the alignment is reached (skip it in the data arguments). You can set the alignment reference with a “|”: the alignment is then calculated relative to the position of the last “|” encountered.
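The alignment arithmetic fits in a small function. This is an illustrative sketch, not rawutil's code; the ref parameter is an assumption standing for the position of the last “|” (0 when there is none):

```python
def align_pointer(ptr, alignment, ref=0):
    """Return the next position >= ptr that is a multiple of alignment,
    measured relative to the reference position ref."""
    remainder = (ptr - ref) % alignment
    return ptr if remainder == 0 else ptr + (alignment - remainder)


def padding_for(ptr, alignment, ref=0):
    """Number of null bytes to add when packing to reach the alignment."""
    return align_pointer(ptr, alignment, ref) - ptr


print(align_pointer(18, 20))       # 20
print(align_pointer(40, 20))       # 40 (already aligned)
print(padding_for(18, 20))         # 2
print(align_pointer(7, 4, ref=2))  # 10
```

For instance, in the structure '4s I2H (2B 2H) 20a' used in the groups section, the elements before “20a” take 18 bytes, so the alignment moves the pointer to 20.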

The “$” element represents the end. When unpacking, it returns all the remaining unread data as a bytes object and ends the reading (it places the pointer at the end of the data). When packing, it appends the corresponding bytes object from the data arguments to the end of the packed bytes and ends the packing.

Then, rawutil adds groups and iterators.

These elements group other elements and can unpack them several times.

The () element represents a group. It is used like this:

'4s I2H (2B 2H) 20a'

All elements between the brackets will be unpacked as a substructure, in a list. Here, it can return for example:

[b'test', 10000, 326, 1919, [11, 19, 112, 1222]]

When packing, the data for the group must be given in a list, in the same form.

You can also give a number of times to unpack the group. All the instances of that group are then put in the same list:

'2(2I)' and '2(n)'

may give, respectively:

[1, 2, 3, 4] and [b'foo', b'bar']

Then, the [] element is an iterator. It is used like this:

'h 4[2B]'

It reads the substructure as many times as specified before the [, and returns a list of lists, like this:

[-1234, [[11, 12], [111, 112], [9, 99], [31, 112]]]
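struct has no groups or iterators, but the shape of their results can be reproduced with struct.iter_unpack, which repeatedly applies a fixed format. This sketch mimics the 'h 4[2B]' and '2(2I)' examples; it shows the output shapes only and is not how rawutil is implemented:

```python
import struct

# 'h 4[2B]': one int16, then 4 iterations of a 2-byte substructure.
data = struct.pack("<h8B", -1234, 11, 12, 111, 112, 9, 99, 31, 112)
(first,) = struct.unpack_from("<h", data, 0)
# [] keeps one list per iteration
iterations = [list(item) for item in struct.iter_unpack("<2B", data[2:])]
print([first, iterations])
# [-1234, [[11, 12], [111, 112], [9, 99], [31, 112]]]

# '2(2I)': the instances of a repeated group are flattened into one list.
data = struct.pack("<4I", 1, 2, 3, 4)
group = [value for item in struct.iter_unpack("<2I", data) for value in item]
print(group)  # [1, 2, 3, 4]
```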

The {} iterator iterates until the end of the data is reached (so don't give an iteration count). Like [], it returns a list of lists. For example, this structure:

'4s {Bn}'

With this data:

b'TEST\x01Yes\x00\x02No\x00'

Returns:

[b'TEST', [[1, b'Yes'], [2, b'No']]]
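Written out by hand, '4s {Bn}' repeats the B n substructure until the pointer reaches the end of the data. A pure-Python sketch; unpack_tagged_strings is a hypothetical name:

```python
def unpack_tagged_strings(data):
    """Hand-written equivalent of the structure '4s {Bn}'."""
    magic = data[:4]                    # 4s
    ptr = 4
    entries = []
    while ptr < len(data):              # {}: iterate until the end of the data
        number = data[ptr]              # B: indexing bytes gives an int
        ptr += 1
        end = data.index(b"\x00", ptr)  # n: null-terminated string
        entries.append([number, data[ptr:end]])
        ptr = end + 1
    return [magic, entries]


print(unpack_tagged_strings(b"TEST\x01Yes\x00\x02No\x00"))
# [b'TEST', [[1, b'Yes'], [2, b'No']]]
```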

Finally, rawutil supports references.

There are two types of references: external and internal references.

The external references are represented with “#”. They are replaced by the corresponding element of the refdata argument. For example, with this call:

import rawutil

data = b'<some bytes>!'
rawutil.unpack('#0c #1s #2c', data, refdata=(1, len(data) - 3, 2))

“#0” is replaced by 1, “#1” by len(data) - 3 (here 10), and “#2” by 2. The final structure is '1c 10s 2c', so it returns:

[b'<', b'some bytes', b'>', b'!']
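Conceptually, an external reference is a substitution into the structure string before unpacking. This sketch performs that substitution with re.sub, then unpacks the result with struct, which also ignores spaces between format codes. substitute_external_refs is a hypothetical helper, not rawutil's implementation:

```python
import re
import struct


def substitute_external_refs(stct, refdata):
    """Replace each '#N' in the structure with str(refdata[N])."""
    return re.sub(r"#(\d+)", lambda m: str(refdata[int(m.group(1))]), stct)


data = b"<some bytes>!"
stct = substitute_external_refs("#0c #1s #2c", (1, len(data) - 3, 2))
print(stct)                       # 1c 10s 2c
print(struct.unpack(stct, data))  # (b'<', b'some bytes', b'>', b'!')
```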

Then come the internal references. They are represented by a “/”, and are used like this:

'4s 2B /2[2s]'

The number after the “/” is the index of the referenced element. The reference is replaced by the unpacked element at that index, here the second “B”. So with this data:

b'TEST\xff\x06aaBBccDDeeFF'

It returns:

[b'TEST', 255, 6, [[b'aa'], [b'BB'], [b'cc'], [b'DD'], [b'ee'], [b'FF']]]

Here, element 2 of the unpacked elements is 6, so “/2” is replaced by “6” and the structure is interpreted as '4s 2B 6[2s]': [2s] is unpacked as many times as specified by element 2. You can do the same with a group to flatten all those instances into a single list. With the same data, the structure:

'4s 2B /2(2s)'

returns:

[b'TEST', 255, 6, [b'aa', b'BB', b'cc', b'DD', b'ee', b'FF']]
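Decoding the same data by hand shows how “/2” takes its value from the elements read so far. An illustrative sketch using struct; the function and its flatten parameter are hypothetical:

```python
import struct


def unpack_with_count_ref(data, flatten=False):
    """Hand-written equivalent of '4s 2B /2[2s]', or of '4s 2B /2(2s)' if flatten."""
    values = list(struct.unpack_from("4s2B", data, 0))
    ptr = struct.calcsize("4s2B")
    count = values[2]  # '/2': the element at index 2, i.e. the second B
    items = []
    for _ in range(count):
        (chunk,) = struct.unpack_from("2s", data, ptr)
        ptr += 2
        if flatten:
            items.append(chunk)    # (2s): instances flattened into one list
        else:
            items.append([chunk])  # [2s]: one list per iteration
    values.append(items)
    return values


data = b"TEST\xff\x06aaBBccDDeeFF"
print(unpack_with_count_ref(data))
# [b'TEST', 255, 6, [[b'aa'], [b'BB'], [b'cc'], [b'DD'], [b'ee'], [b'FF']]]
print(unpack_with_count_ref(data, flatten=True))
# [b'TEST', 255, 6, [b'aa', b'BB', b'cc', b'DD', b'ee', b'FF']]
```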

Note that references are local to the group they are in.

Internal references can also be relative, with “/p”. For example, you can use this structure:

'2B /p2[2s]'

With this data:

b'\x04\xffJJkkLLmm'

It returns:

[4, 255, [[b'JJ'], [b'kk'], [b'LL'], [b'mm']]]

Here “/p2” is replaced by the element located 2 elements before the reference, that is the first B, so 4.
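Resolving a reference only needs the list of elements already unpacked in the current group. A minimal sketch, assuming absolute references index from the start of the group and relative ones count back from the reference's own position; resolve_reference is a hypothetical name:

```python
import re


def resolve_reference(token, unpacked):
    """Resolve '/N' (absolute) or '/pN' (relative) against the values
    already unpacked in the current group."""
    match = re.fullmatch(r"/(p?)(\d+)", token)
    if match is None:
        raise ValueError(f"not a reference: {token!r}")
    relative, number = match.group(1) == "p", int(match.group(2))
    # The reference sits at index len(unpacked), so '/pN' looks N elements back.
    index = len(unpacked) - number if relative else number
    return unpacked[index]


print(resolve_reference("/2", [b"TEST", 255, 6]))  # 6
print(resolve_reference("/p2", [4, 255]))          # 4
```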

Objects

You can also use rawutil through the TypeReader and TypeWriter objects.

TypeReader(byteorder='@')
TypeWriter(byteorder='@')

The byteorder argument is the byte order mark to use, exactly like the one in format strings. You can also set it through the byteorder attribute of these objects.

You can easily subclass them to create a reader or writer class for the format you want.

These two objects have the pack, unpack and unpack_from methods, which are exactly the same as the module-level ones, except that if the byte order is not specified in the structure, it defaults to the byteorder attribute instead of “@”.

First, the TypeReader object can read elements from a bytes-like or file-like object. It has the following methods:

bit(n, bit, length=1)

Returns the specified bits of the integer n: (length) bits, starting at the given bit

nibbles(n)

Returns the high and low nibbles of a byte

signed_nibbles(n)

Returns the high and low signed nibbles of a byte
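These helpers are simple bit manipulations. The sketch below shows one plausible reading of them, not rawutil's code: it assumes bit positions count from the least significant bit and that signed nibbles use 4-bit two's complement, neither of which is stated above.

```python
def bit(n, bit, length=1):
    """Return `length` bits of n starting at position `bit` (bit 0 = LSB, assumed)."""
    return (n >> bit) & ((1 << length) - 1)


def nibbles(n):
    """Return the (high, low) nibbles of a byte."""
    return n >> 4, n & 0x0F


def signed_nibbles(n):
    """Return the (high, low) nibbles of a byte as 4-bit two's complement values."""
    high, low = nibbles(n)
    return tuple(v - 16 if v >= 8 else v for v in (high, low))


print(bit(0b10110000, 4, 3))  # 3
print(nibbles(0xA7))          # (10, 7)
print(signed_nibbles(0xA7))   # (-6, 7)
```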

All its other methods take 2 arguments:

TypeReader.uint8(data, ptr=None)

ptr is the offset where reading starts. If it is None, reading starts at the current file position (given by file.tell()), or at 0 if data is a bytes-like object. All these methods return (unpacked, ptr), where unpacked is the unpacked element and ptr is the offset where reading ended.

The TypeReader objects have the following methods:

uint8(data, ptr=None)
uint16(data, ptr=None)
uint24(data, ptr=None)
uint32(data, ptr=None)
uint64(data, ptr=None)
int8(data, ptr=None)
int16(data, ptr=None)
int24(data, ptr=None)
int32(data, ptr=None)
int64(data, ptr=None)
float32(data, ptr=None)  #32-bit float, also available as float(...)
double(data, ptr=None)  #64-bit double
string(data, ptr=None)  #null-terminated string, like the "n" format character
utf16string(data, ptr=None)  #null-terminated UTF-16 string
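Because each reader method returns (unpacked, ptr), calls can be chained by passing each returned pointer to the next call. This sketch imitates that convention with struct.unpack_from; SimpleReader is a hypothetical class, not rawutil's TypeReader, and it only handles bytes-like data:

```python
import struct


class SimpleReader:
    """Hypothetical reader following the (unpacked, ptr) convention."""

    def __init__(self, byteorder="<"):
        self.byteorder = byteorder

    def _read(self, fmt, data, ptr):
        ptr = 0 if ptr is None else ptr  # bytes-like data: start at 0 when ptr is None
        full_fmt = self.byteorder + fmt
        (value,) = struct.unpack_from(full_fmt, data, ptr)
        return value, ptr + struct.calcsize(full_fmt)

    def uint8(self, data, ptr=None):
        return self._read("B", data, ptr)

    def uint16(self, data, ptr=None):
        return self._read("H", data, ptr)

    def int32(self, data, ptr=None):
        return self._read("i", data, ptr)


reader = SimpleReader("<")
data = b"\x01\x02\x00\xff\xff\xff\xff"
a, ptr = reader.uint8(data)        # a = 1, ptr = 1
b, ptr = reader.uint16(data, ptr)  # b = 2, ptr = 3
c, ptr = reader.int32(data, ptr)   # c = -1, ptr = 7
```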

Then, the TypeWriter object can pack elements. It has the following methods (data is the element to pack, and out is an optional output file-like object):

nibbles(high, low)  #returns the byte formed by the two nibbles
signed_nibbles(high, low)  #same, with signed nibbles
int8(data, out=None)
int16(data, out=None)
int24(data, out=None)
int32(data, out=None)
int64(data, out=None)
uint8(data, out=None)
uint16(data, out=None)
uint24(data, out=None)
uint32(data, out=None)
uint64(data, out=None)
float32(data, out=None)  #32-bit float, also available as float(...)
double(data, out=None)  #64-bit double
string(data, align=0, out=None)  #packs a bytes object as a null-terminated string; align is the minimum size to pack
utf16string(data, align=0, out=None)  #same, as a null-terminated UTF-16 string
pad(num)  #returns the given number of null bytes
align(data, alignment)  #returns the null bytes needed to fill data up to a multiple of the alignment
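The padding helpers can be approximated as follows. This is a sketch under assumptions: these are standalone stand-ins for the TypeWriter methods, not rawutil's code, and string() here assumes that “align is the minimum size to pack” means the null-terminated string is padded with extra null bytes up to align bytes.

```python
def pad(num):
    """Return num null bytes."""
    return b"\x00" * num


def align(data, alignment):
    """Return the null bytes needed to extend data to a multiple of alignment."""
    return pad(-len(data) % alignment)


def string(data, align=0):
    """Pack data as a null-terminated string of at least `align` bytes (assumed)."""
    packed = data + b"\x00"
    return packed.ljust(align, b"\x00")


out = b"ABC"
out += align(out, 4)
out += string(b"hi", align=6)
print(out)  # b'ABC\x00hi\x00\x00\x00\x00'
```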

rawutil has no dependencies outside the Python standard library.


Source distribution: rawutil-2.5.0.tar.gz (11.7 kB)
