A framework for analyzing and manipulating binary data
Binary Abstraction Layer (BAL)
The Binary Abstraction Layer (BAL) package is a tiny framework for analyzing and manipulating binary data. Its guiding principle is that a tree is a natural representation for binary data. For example, a firmware image may look as follows:
- Zip Data
- ELF
- Header
- Code
- Data
- Images
- Config
- ELF
It defines 3 broad categories of operations on the tree: convert, analyze and modify.
- Converters handle serializing and deserializing binary data.
- Analyzers handle extracting information from the tree representation.
- Modifiers handle arbitrary modification of the binary.
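To make the three categories concrete, here is a minimal, self-contained sketch. It does not use the BAL API: `Node`, `convert`, `analyze_size` and `modify_leaf` are hypothetical names chosen for illustration only, and the byte values are arbitrary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy tree node: a leaf holds bytes, an inner node holds children."""
    name: str
    data: bytes = b""
    children: List["Node"] = field(default_factory=list)


def convert(node: Node) -> bytes:
    """Converter role: serialize the tree back into a flat byte string."""
    if not node.children:
        return node.data
    return b"".join(convert(child) for child in node.children)


def analyze_size(node: Node) -> int:
    """Analyzer role: extract information (here, the serialized size)."""
    return len(convert(node))


def modify_leaf(node: Node, name: str, new_data: bytes) -> bool:
    """Modifier role: replace the bytes of the first leaf called `name`."""
    if not node.children and node.name == name:
        node.data = new_data
        return True
    # any() stops at the first child subtree that reports a replacement.
    return any(modify_leaf(child, name, new_data) for child in node.children)


firmware = Node("ELF", children=[
    Node("Header", b"\x7fELF"),
    Node("Code", b"\x90\x90"),
    Node("Data", b"\x00"),
])
print(analyze_size(firmware))                   # size before modification
modify_leaf(firmware, "Code", b"\xcc\xcc\xcc")  # patch one leaf in place
print(convert(firmware))                        # re-serialized tree
```

The point of the sketch is only the division of labor: serialization, information extraction and modification are kept as separate operations over the same tree.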
Installation
The BAL package can be installed from PyPI with the following command:
pip install bal
To install the BAL module from the repository, clone the repo and run:
pip install .
To install the BAL package and generate a local copy of its documentation, run:
pip install .[docs]
make html-docs
To install the core BAL module as well as dependencies for the example, run:
pip install .[examples]
Concepts
Each node in the tree is represented as a DataObject. A DataObject can wrap either an unstructured string of raw binary data or a DataModel (or both).
A DataModel is an abstract class defining some sort of structured data. The DataModel is created when deserializing raw binary data. It fits the typical definition of data model.
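The "raw bytes, model, or both" idea can be illustrated with a toy class. This is a sketch under stated assumptions, not the real DataObject: `ToyDataObject`, its `unpacker` callback and the `unpack_words` model are all hypothetical.

```python
from typing import Callable, Optional


class ToyDataObject:
    """Toy stand-in for a tree node; NOT the real bal DataObject."""

    def __init__(self, packed: Optional[bytes], unpacker: Callable[[bytes], object]):
        self._packed = packed      # raw bytes, if available
        self._model = None         # structured model, created on demand
        self._unpacker = unpacker  # hypothetical deserializer callback

    def is_unpacked(self) -> bool:
        return self._model is not None

    def unpack(self) -> object:
        """Deserialize on first use and keep both representations."""
        if self._model is None:
            self._model = self._unpacker(self._packed)
        return self._model

    def __repr__(self) -> str:
        if self.is_unpacked():
            return "Unpacked({!r})".format(self._model)
        return "Packed({})".format(len(self._packed))


def unpack_words(data: bytes) -> list:
    """Hypothetical model: a list of 2-byte big-endian integers."""
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


obj = ToyDataObject(b"\x00\x01\x00\x02", unpack_words)
print(obj)    # Packed(4): only the raw bytes are wrapped so far
obj.unpack()
print(obj)    # Unpacked([1, 2]): the model now exists alongside the bytes
```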
In addition, the BAL framework defines a few interfaces:
bal.context_ioc.AbstractConverter
A converter takes care of unpacking bytes into a DataModel (i.e. deserializing) and packing its DataModel into bytes (i.e. serializing). Its method signatures are fixed so that they may be called directly by the DataObject.
bal.context_ioc.AbstractModifier
A modifier updates the content of any node within the tree. It may modify the packed or unpacked data. It contains a single modify() method whose signature is left to the concrete implementation. It may walk the entire tree, unpacking on the way.
bal.context_ioc.AbstractAnalyzer
An analyzer extracts data from a tree. The type of the returned data is defined by the concrete analyzer implementation. It contains a single analyze() method whose signature is left to the concrete implementation. It may walk the entire tree, unpacking on the way.
bal.context_ioc.BALIoCContext
The IoC context provides a simple implementation of the Inversion of Control pattern. It looks up the implementation of a given interface and returns a new instance. It is used to instantiate an AbstractConverter, AbstractModifier or AbstractAnalyzer.
- For AbstractModifiers and AbstractAnalyzers, an interface extending AbstractModifier or AbstractAnalyzer is supplied and an implementation of the interface is returned.
- For AbstractConverters, an interface extending DataModel is supplied and an AbstractConverter implementation is returned. This implementation's unpack() method will create an instance of the supplied DataModel interface, and its pack() method will serialize an instance of the supplied DataModel interface.
bal.context_ioc.BALIoCContextFactory
Creates a configured instance of the BALIoCContext. It provides methods for the user to register the implementation of interfaces.
bal.context.BALContext
A new context is created for each tree. It inherits from the BALIoCContext. The context contains a reference to the root DataObject. It may be used as a cache for analyzers that are either expensive or frequently called. It may also be used to store data that does not fit cleanly into a tree (for example, relationships between unrelated nodes).
bal.context.BALContextFactory
As its name implies, it is responsible for creating a BALContext. The factory is a good place to load external configuration that will be passed to the context. In most settings, the factory would be created when the application starts and destroyed when it dies.
bal.context.BALManager
The BAL manager offers a way to look up factories using a key. It is not strictly necessary, and should only be used in applications that need to dynamically retrieve multiple different context factories.
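The lookup-by-interface idea behind the IoC context and its factory can be sketched in a few lines of plain Python. Everything below is a hypothetical stand-in written for this illustration (`ToyIoCContext`, `ToyIoCContextFactory`, `ChecksumAnalyzerInterface`, `SumChecksumAnalyzer`); the real classes have their own signatures, which are described in the API documentation.

```python
class ToyIoCContext:
    """Toy registry; NOT the real BALIoCContext."""

    def __init__(self, implementations):
        # Maps an interface class to the class implementing it.
        self._implementations = dict(implementations)

    def create(self, interface, *args, **kwargs):
        """Look up the implementation of `interface` and return a new instance."""
        try:
            implementation = self._implementations[interface]
        except KeyError:
            raise LookupError(
                "No implementation registered for {}".format(interface.__name__)
            ) from None
        return implementation(*args, **kwargs)


class ToyIoCContextFactory:
    """Collects registrations, then builds configured contexts."""

    def __init__(self):
        self._implementations = {}

    def register(self, interface, implementation):
        self._implementations[interface] = implementation

    def create(self):
        return ToyIoCContext(self._implementations)


class ChecksumAnalyzerInterface:
    """Hypothetical analyzer interface."""

    def analyze(self, data):
        raise NotImplementedError


class SumChecksumAnalyzer(ChecksumAnalyzerInterface):
    """Hypothetical implementation: sum of all bytes modulo 256."""

    def analyze(self, data):
        return sum(data) % 256


factory = ToyIoCContextFactory()
factory.register(ChecksumAnalyzerInterface, SumChecksumAnalyzer)
context = factory.create()
analyzer = context.create(ChecksumAnalyzerInterface)
print(analyzer.analyze(b"\x01\x02\x03"))
```

Callers only ever name the interface, so an implementation can be swapped by changing a single registration.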
The full documentation for the API is available on github.io
Guide
All the code for this guide is contained in the ./example folder.
The first step is to declare a new DataModel class that defines the data structure for the root node and its children.
For example, a Xilinx bitstream has 3 children: the header, a sync marker and the config packets. The format of the header is not known, the sync marker does not have a format and the packets are an array of unknown data.
class XilinxPacketsInterface(DataModel):
    """
    An array of Xilinx register configuration packets.
    """


class XilinxBitstreamHeaderInterface(DataModel):
    """
    The Xilinx bitstream header contains unknown information.
    """


class XilinxBitstreamSyncMarkerInterface(DataModel):
    """
    The Xilinx bitstream sync marker
    """


class XilinxBitstream(ClassModel[DataObject]):
    """
    The root model for a Xilinx bitstream. It contains data objects for a header, sync marker, and packets.
    """
    def __init__(
        self,
        header,
        sync_marker,
        packets
    ):
        """
        :param DataObject[XilinxBitstreamHeaderInterface] header:
        :param DataObject[XilinxBitstreamSyncMarkerInterface] sync_marker:
        :param DataObject[XilinxPacketsInterface] packets:
        """
        # Each child is registered under a name together with its getter.
        super(XilinxBitstream, self).__init__((
            ("header", self.get_header),
            ("sync_marker", self.get_sync_marker),
            ("packets", self.get_packets),
        ))
        self.header = header
        self.sync_marker = sync_marker
        self.packets = packets

    def get_header(self):
        return self.header

    def get_sync_marker(self):
        return self.sync_marker

    def get_packets(self):
        return self.packets
Note that even though the structure of the children is unknown, an interface is still created for each of them. As we will see, this allows an external developer to define their format and their converters later.
Now that we have the models, we are ready to create the root converter:
from bal.context_ioc import AbstractConverter


class XilinxBitstreamConverter(AbstractConverter):
    """
    Converter for a Xilinx FPGA bitstream

    :param BALContext context: The BAL context.
    """
    def __init__(self, context):
        super(XilinxBitstreamConverter, self).__init__(context)
        self.context = context

    def unpack(self, data_bytes):
        # Locate the sync marker that separates the header from the packets.
        sync_marker = self.context.format.sync_word
        sync_marker_index = data_bytes.find(sync_marker)
        assert sync_marker_index >= 0, \
            "The sync marker is not present in the provided bitstream data"
        assert sync_marker_index + len(sync_marker) < len(data_bytes) - 2, \
            "The configuration data is expected to contain at least one word size worth of data"
        # Wrap each slice in a packed DataObject; no child is unpacked here.
        return XilinxBitstream(
            DataObject.create_packed(
                self.context,
                data_bytes[:sync_marker_index],
                XilinxBitstreamHeaderInterface
            ),
            DataObject.create_packed(
                self.context,
                data_bytes[sync_marker_index:sync_marker_index+len(sync_marker)],
                XilinxBitstreamSyncMarkerInterface,
            ),
            DataObject.create_packed(
                self.context,
                data_bytes[sync_marker_index + len(sync_marker):],
                XilinxPacketsInterface,
            )
        )

    def pack(self, data_model):
        """
        :param XilinxBitstream data_model:
        :rtype: bytes
        """
        assert isinstance(data_model, XilinxBitstream)
        # Reassemble the bitstream: header, sync word, then packets.
        return b"".join([
            data_model.get_header().pack(),
            self.context.format.sync_word,
            data_model.get_packets().pack()
        ])
This is already getting a bit more complicated.
The converter takes a BALContext as an argument, which implies that a converter instance must be dedicated to a specific bitstream.
The unpack() method does not instantiate the DataModel of any of its children; it only creates a DataObject that wraps the packed data for each model. It provides the DataObject with the interface of the wrapped data model. The DataObject uses the interface to extract basic information about the packed data (i.e. its type and description, taken from the interface name and its docstring). It also uses the interface when it is unpacked, looking up a converter implementation for that interface inside the BALContext (remember that it inherits from the BALIoCContext).
This is an important property as it allows the tree to be "lazily" unpacked. The user controls exactly when a given child is unpacked (if it gets unpacked at all), which can lead to significantly better performance in many use cases.
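The slicing done by unpack() and the concatenation done by pack() can be tried in isolation, without BAL installed. The sketch below is a standalone approximation: `split_bitstream` and `join_bitstream` are hypothetical helpers, the hard-coded `SYNC_WORD` value is an assumption (the guide reads it from the context instead), and the sample bytes are arbitrary filler.

```python
# Assumed sync word for this sketch only; the converter in the guide
# obtains it from context.format.sync_word rather than hard-coding it.
SYNC_WORD = b"\xaa\x99\x55\x66"


def split_bitstream(data: bytes, sync_word: bytes = SYNC_WORD):
    """Return (header, sync_marker, packets) as raw, still-packed slices."""
    index = data.find(sync_word)
    if index < 0:
        raise ValueError("The sync marker is not present in the provided data")
    end = index + len(sync_word)
    return data[:index], data[index:end], data[end:]


def join_bitstream(header: bytes, sync_marker: bytes, packets: bytes) -> bytes:
    """Inverse of split_bitstream: concatenate the three children."""
    return b"".join([header, sync_marker, packets])


# Arbitrary filler bytes around the sync word, not a real bitstream.
blob = b"\xff" * 16 + SYNC_WORD + b"\x00\x01\x02\x03"
header, sync_marker, packets = split_bitstream(blob)
print(len(header), len(sync_marker), len(packets))

# Splitting and joining must round-trip to the original bytes.
assert join_bitstream(header, sync_marker, packets) == blob
```

Nothing inside the three slices is parsed here, which mirrors the lazy behavior described above: each child stays packed until something asks for its contents.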
Last but not least, we need a BALContext and BALContextFactory implementation:
from bal.context import BALContext, BALContextFactory


class XilinxContext(BALContext):
    """
    :param Dict[Type[DataModel],Type[AbstractConverter]] converters_by_type:
    :param Dict[Type[AnalyzerInterface],Type[AbstractAnalyzer]] analyzers_by_type:
    :param Dict[Type[ModifierInterface],Type[AbstractModifier]] modifiers_by_type:
    :param bytes data: The bytes making up the bitstream.
    """
    def __init__(
        self,
        converters_by_type,
        analyzers_by_type,
        modifiers_by_type,
        data
    ):
        super(XilinxContext, self).__init__(
            converters_by_type,
            analyzers_by_type,
            modifiers_by_type
        )
        # The root node starts out packed; it is unpacked on demand.
        self._bitstream = DataObject.create_packed(self, data, XilinxBitstream)

    def get_data(self):
        """
        :rtype: DataObject[XilinxBitstream]
        """
        return self._bitstream


class XilinxContextFactory(BALContextFactory):
    def __init__(self):
        super(XilinxContextFactory, self).__init__()

    def create(self, data):
        """
        :param bytes data: The bytes for the Xilinx FPGA bitstream
        :rtype: XilinxContext
        """
        # Pass the implementations registered on the factory to the new context.
        return XilinxContext(
            self._converters_by_type,
            self._analyzers_by_type,
            self._modifiers_by_type,
            data
        )
Since our Xilinx implementation is pretty limited, both the context and its factory are trivial.
Let's see our implementation in action:
import wget

context_factory = XilinxContextFactory()
# Register the XilinxBitstreamConverter
context_factory.register_converter(XilinxBitstream, XilinxBitstreamConverter)

# Download the sample bitstream and read it into memory.
lx9_bin = wget.download('https://redballoonsecurity.com/files/JwfEU4veQSNFao8h/lx9.bin')
with open(lx9_bin, "rb") as f:
    data = f.read()

context = context_factory.create(data)
bitstream_object = context.get_data()
print("Bitstream object: {}".format(bitstream_object))
print("Bitstream model type: {}".format(bitstream_object.get_model_type()))
print("Bitstream model description: {}".format(bitstream_object.get_model_description()))

print("\nUNPACKING\n")
bitstream_object.unpack()
print("Bitstream object: {}".format(bitstream_object))

print("\nHEADER\n")
header_object = bitstream_object.get_model().get_header()
print("Bitstream header object: {}".format(header_object))
print("Bitstream header model type: {}".format(header_object.get_model_type()))
print("Bitstream header model description: {}".format(header_object.get_model_description()))
This script should print:
Bitstream object: PackedXilinxBitstream(340604)
Bitstream model type: XilinxBitstream
Bitstream model description: The root model for a Xilinx bitstream. It contains data objects for a header, sync marker, and packets.
UNPACKING
Bitstream object: XilinxBitstream({
header: PackedXilinxBitstreamHeaderInterface(16),
sync_marker: PackedXilinxBitstreamSyncMarkerInterface(4),
packets: PackedXilinxPacketsInterface(340584),
})
HEADER
Bitstream header object: PackedXilinxBitstreamHeaderInterface(16)
Bitstream header model type: XilinxBitstreamHeaderInterface
Bitstream header model description: The Xilinx bitstream header contains unknown information.
As you can see from the output, the BAL framework already has a bunch of information about the structure of the bitstream. It uses the docstring defined on the interfaces to pull a description of the data models, even if they cannot be unpacked yet.
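Deriving a type name and description from an interface that has no implementation yet needs nothing beyond standard Python introspection. The sketch below shows one way to do it; `HeaderInterface` and `describe` are hypothetical names, and this is an assumption about the general technique rather than a description of how BAL implements it internally.

```python
import inspect


class HeaderInterface:
    """
    The header contains unknown information.
    """


def describe(interface):
    """Return (type name, description) for a model interface."""
    # inspect.getdoc() strips the docstring's indentation and the blank
    # lines around it; fall back to an empty string if there is none.
    description = inspect.getdoc(interface) or ""
    return interface.__name__, description


print(describe(HeaderInterface))
```

Because only the class object is inspected, the description is available even when no converter for the interface has been registered.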
This is it for this guide. Your next steps might be to implement the XilinxPacketsInterface, XilinxBitstreamHeaderInterface, and XilinxBitstreamSyncMarkerInterface interfaces and implement their respective converters. If you want to learn more about writing a full chain of converters, analyzers and modifiers, head over to the bal-xilinx project.