Parse binary files by describing their structure.
Project description
Parse Binary File
Install with
python -m pip install parse_binary_file
Parse binary files by describing their structure.
Description file
The parse_binary_file
package is based on the idea that binary file formats
should be parsed from a description of their format.
If the description file is a list, pbf
assumes that the elemnts are fields.
One can also pass a dictionary, though, to describe more details about the file
format.
There are three top-level items that can be specified:
info
: Describes information about the file as a whole (e.g. endianess).default_options
: Describes defaults options for each field type.fields
: Fields of the file format.
Info
Describes information about the file.
Properties
-
byte_order
: Byte order of the file. Valid values: ['little', 'big', 'native']native
byte ordering uses the byte ordering of the current machine. [Default: 'native'] -
encoding
: File encoding. Used to encode the description file's values to bytes when matching against the file being parsed. See Python's standard encodings for valid values. [Default: 'utf-8'] -
word_size
: Bytes per word of the file. [Default: 4 bytes] [Inactive]
Default Options
Describes parsing information for each type of field. All types can specify default values for a particular field property (discussed below). To specify a default field property first specify the type of the field, then the property.
e.g.
To specify strings are null terminated (\x00
) by default
default_options:
str:
terminator: "\x00"
Fields
Describes the fields of the file format.
Each field must one and only one termination condition.
Termination conditions are determined by the size
, value
, and terminator
fields.
for some types these can be interpreted. For instance, an int
is assumed to
have a size of 4, so a termination condition does not need to be explicitly stated.
All fields can have the properties:
type
: Data type of the field. If a type is enclosed in brackets ([]
) it marks the field as an array-like field. Array like fields are assumed to contain multiple instances of the specified type. For more information see the Array Fields section.size
: Sets the size of the field in bytes. This may be inferred depending on the field'stype
. A size of-1
indicated the field extends until the end of the file. Only the final field may have a length of-1
.value
: Sets the size of the fiel based on the expected value.format
: How to format the field. This is type dependent. Forstr
fields, this represents the encoding of the string and defaults to the file's encoding provided in theinfo
section. For other types this represents how Python's struct.unpack should interpret the field.terminator
: Termination string.is_null
: Used forbytes
data type to indicate all bytes should be null. Should be used in conjunction with thesize
property to indicate the termination condition.name
: Name of the field. Used for accessing the data by name.description
: Description of the field.fields
: Describes the subfields of the field. If a field is made up of subfields, its type must bebytes
or[bytes]
.exec
: Execution hooks for logical processing. [Inactive]
Type
If a field is not provided a type it defaults to bytes
.
Valid types are:
bytes
: Interpret the field as raw bytes. [Default]char
: Single character. (1 byte)str
: String data.bool
: Boolean. (1 byte)short
: Short integer. (2 bytes)u_short
: Unsigned short integer. (2 bytes)int
: Integer.(4 bytes)u_int
: Unisigned integer. (4 bytes)long
: Long integer. (1 word)u_long
: Unsigned long integer. (1 word)long_long
: Double long integer. (2 words)u_long_long
: Unsigned double long integer. (2 words)float
: Floating point number. (4 bytes)double
: Double precision floating point number. (2 words)- Any type may be wrapped in brackets
[]
to indicate the field is an array of that value type. Details of the elements are set using theelements
property.
For more information read about Python struct
's formatting strings.
Termination condition
The termination condition of a field tells the parser how far to read. There are three explicit and one implit termination condition.
Explicit
size
: Sets the size of the field in bytes. This may be inferred depending on the field'stype
.terminator
: Termination string as bytes.value
: Sets the size of the field based on the expected value.
Implicit
subfields
: [Inactive] The parent field can be terminated when the last child field is terminated.
Subfields
Each field can be made up of subfields. Each subfield follows the same pattern as top-level fields. This is a recursive concept, so fields can be nested as deeply as needed.
Fields that are made up of subfields must have their type set to bytes
or
[bytes]
.
If the type is [bytes]
it is assumed that the subfields repeat in a array-like
manner with all sufields being repeated an equal number of times.
i.e. If the data for the parent field ends before all subfields have been read
an exception is raised.
Array Fields
A field may consist of multiple elements. This is where array fields come into
play. For instance, imagine a field that consists of a list of floats. To
indicate a field is array-like, enclose its type in square brackets ('[]')
(e.g. [bytes]
, [float]
).
Execution Hooks
:warning: These fields allow arbitrary Python code to be executed.
It may be necessary to add some basic logic to your description.
For instance, if a field named data
may have a variable length offset buffer
specified by a field named data_offset
.
The exec
field allows you to specify code before (pre
) or after (post
)
data for the field has been loaded and has access to all previous fields.
Each of these fields should contain a lambda function with signature
(data, fields)
where data
is byte string of the current field and
fields
is a parse_binary_file.Data
object representing the file with
all previous fields loaded.
Components
FieldDescription
Describes a field.
Properties
-
type: The type of the field being described. When loading data into the field it will be parsed based on the type.
- [Default: 'bytes']
-
name: Name of the field. This is used to label the field.
-
value: The expected value of the field. This can be usedto indicate a specific value is expected for the field.
-
size: Number of bytes in the field. The last field can have a size of -1 which indicates the field continues to the end of the file.
-
terminator: Sequence of values indicating the end of the field.
-
is_null: Indicates the field only contains null bytes. Settings to true is equivalent to setting
value
equal to all null bytes. Must setsize
ifTrue
. -
description: Description of the field.
-
fields: Used to specifiy the structure of the elements of an array.
Methods
- FieldDescription(**properties): Initializes a new
FieldDescription
with the provided properties.
FileFormat
Container for FieldDescription
s to describe the structure of a file.
Properties
-
fields: List of
FieldDescription
s. -
info: Dictionary of information on the file.
-
Fields can be accessed by name or index using brackets (
[]
)
Methods
-
from_dicts(desc, info, defaults):
@staticmethod
Converts a list of dictionaries into aFileFormat
. -
keys(): Returns a list of the keys of named fields.
Parser
Used for parsing files in a given format.
Properties
- format: A
FileFormat
used to parse files.
Methods
- parse(stream): Returns a
Data
object representing the data fromstream
.
Field
Contains information about a field, including its description and loaded value.
Properties
-
desc:
FieldDescription
that was used to create theField
. -
value: Parsed value.
-
expected_value:
self.desc.value
. Provided for convenience. -
fields: Tuple of children
Field
s. -
size: If the
Field
's value is set, returns its size. If theField
's value has not yet been set return theField
's expected size if available. -
data: Original data as bytes.
-
value_is_valid: Whether the parsed value matches the expected value. If a specific
value
was not specified by theFieldDescription
this will always returnTrue
. -
Values of the
Field
'sFieldDescription
are accessible as properties, as well.
Methods
-
Field(desc: FieldDescription, fields: tuple[Field]): Initializes a new
Field
.desc
should describe the field, andfields
are the childrenField
s. -
from_data(data: bytes, desc: FieldDescription): Creates a new
Field
usingdesc
as its description to parsedata
. -
parse_data(val: bytes, **options): Sets the value of the
Field
based on its description and the provided value.
Data
Represents data from a file.
Properties
-
fields:
Field
s containing the parsed data. -
is_loaded: If data has been loaded yet.
-
value: Values of the parsed data.
-
named_field_values: Returns a dictionary of name-value pairs for named
Field
s. If multiple fields have the same name a tuple is returned with the values in order. -
Field
s are accessible by name and index using brackets ([]
). If multipleField
s have the same name, they are returned as a tuple in order.
Use
This library is intended to be used by describing the struture of a binary file
format in a configuration file. That file is then loaded and used to create a
Data
instance. The Data
instance can then load()
data from a file and
assort it into the correct fields.
Example
We would like to describe a new binary file format of .msg
.
Offset | Size | Name | Description |
---|---|---|---|
0x00 | 0x02 | head | 0xff 0x00 Indicates the file is a .msg file. |
0x02 | 0x04 | size | Total size of the file. |
0x06 | 0x04 | -- | Null bytes to separate header from body. |
0x0a | -- | greeting | Message greeting, null terminated. |
-- | -- | message | Message content. |
We can describe this structure in a YAML file.
msg.yaml
info:
byte_order: 'little'
fields:
- name: 'head'
value: "\xff\x00"
- type: 'int'
name: 'file_size'
description: 'Size of file in bytes'
- size: 4
is_null: true
- type: 'str'
terminator: "\x00"
name: 'greeting'
- type: 'str'
size: -1
name: 'message'
To read a .msg
file we can use the description of it.
import os
import yaml # use pyyaml to load `msg.yaml`
import parse_binary_file as pbf
# load description
with open('msg.yaml') as f:
msg_desc = yaml.safe_load(f)
# create file format
msg_format = pbf.FieldFormat.from_dicts(msg_desc['fields'], info = msg_desc['info'])
# create parser
parser = pbf.Parser(msg_format)
# create data
with open('my_message.msg', 'rb') as f:
data = parser.parse(f)
is_corrupt = (data['file_size'] != os.path.getsize('my_message.msg'))
greeting = data['greeting'].value
message = data[-1].value
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file parse_binary_file-0.0.2.tar.gz
.
File metadata
- Download URL: parse_binary_file-0.0.2.tar.gz
- Upload date:
- Size: 18.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4c658d294980d4182f834c2b8dd568d0b5ee66c78405e488a5bcca352fce115a |
|
MD5 | c91cc9ac51253cc17c6de45c430e4728 |
|
BLAKE2b-256 | e2d50bc933eefe5379b17e1362e688771aa22e22c878f0ef4e60c6a8dc6e3fa6 |
File details
Details for the file parse_binary_file-0.0.2-py3-none-any.whl
.
File metadata
- Download URL: parse_binary_file-0.0.2-py3-none-any.whl
- Upload date:
- Size: 19.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.8
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ff3f116b02a38feeecbb253aa9e2479216f1ffdcb021edb0016f2aa324a3241a |
|
MD5 | 9ea06c6b0111474df167a85712e4d9e7 |
|
BLAKE2b-256 | 67fa2a98abe7ceb83f1b4d6d0c0024b17b2531ab9a242ff84cbb7415af97d44c |