Fast string extraction from binary buffers.
Project description
binary2strings - Python module to extract strings from binary blobs
Python module to extract ASCII, UTF-8, and wide strings from binary data. Supports Unicode characters. It is a fast wrapper around compiled C++ code, designed to extract strings from binary content such as compiled executables.
Supported string formats:
- UTF-8 (8-bit Unicode variable length characters)
- Wide-character strings (UCS-2 Unicode fixed 16-bit characters)
International language string extraction is supported for both UTF-8 and wide-character strings. For example, Simplified Chinese, Japanese, and Korean strings will be extracted.
Optionally uses a machine learning model to filter out erroneous junk strings.
Installation
Recommended installation method:
pip install binary2strings
Alternatively, download the repo and run:
python setup.py install
Documentation
API:
import binary2strings as b2s
[(string, encoding, span, is_interesting),] =
b2s.extract_all_strings(buffer, min_chars=4, only_interesting=False)
Parameters:
- buffer: A bytes array to extract strings from. All strings within this buffer will be extracted.
- min_chars: (default 4) Minimum number of characters in a valid extracted string. Recommended minimum 4 to reduce noise.
- only_interesting: (default False) Whether to return only interesting strings. Interesting strings are non-gibberish strings, identified by a lightweight machine learning model. This filters out the vast majority of junk strings, with a low risk of filtering out strings you care about.
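To give a feel for what min_chars controls, here is a rough pure-Python analogue that only handles printable ASCII runs. It is a sketch for illustration, not the library's algorithm: ascii_runs is a hypothetical helper, it knows nothing about UTF-8 multi-byte characters, wide strings, or the machine learning filter, and its inclusive end offset is a choice made for this sketch.

```python
import re


def ascii_runs(buffer: bytes, min_chars: int = 4):
    """Return (string, (start, end)) for each run of printable ASCII bytes.

    Hypothetical helper for illustration only; binary2strings itself is
    implemented in C++ and handles far more than printable ASCII.
    """
    # Printable ASCII is 0x20 (space) through 0x7e (~); require min_chars or more.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_chars)
    return [
        (m.group().decode("ascii"), (m.start(), m.end() - 1))
        for m in pattern.finditer(buffer)
    ]


data = b"hello world\x00\x00a\x00b\x00c\x00d\x00\x00"
print(ascii_runs(data, min_chars=4))  # [('hello world', (0, 10))]
```

The interleaved a, b, c, d bytes are each a run of length one, so a plain ASCII scan with min_chars=4 misses them; recovering them requires the wide-string support the module provides.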
Returns a list of tuples, ordered by their position in the binary:
- string: The extracted string, as a standard Python string. All strings are converted to UTF-8 here.
- encoding: "UTF8" | "WIDE_STRING". This is the encoding of the original string within the binary buffer.
- span: (start, end) tuple describing byte indices of where the string starts and ends within the buffer.
- is_interesting: Boolean describing whether the string is likely interesting. An interesting string is defined as non-gibberish. A machine learning model is used to compute this flag.
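Because each result is a plain tuple, ordinary Python tools are enough to post-process the output. The sketch below works on hand-written sample values in the documented (string, encoding, span, is_interesting) shape rather than on a real call, so it runs without the compiled module; the variable names are illustrative only.

```python
from collections import Counter

# Sample results in the documented (string, encoding, span, is_interesting) shape.
results = [
    ("hello world", "UTF8", (0, 10), True),
    ("abcd", "WIDE_STRING", (13, 19), False),
]

# Count how many strings were found per original encoding.
encoding_counts = Counter(encoding for _, encoding, _, _ in results)

# Keep only the strings flagged as interesting.
interesting = [string for string, _, _, flag in results if flag]

print(dict(encoding_counts))  # {'UTF8': 1, 'WIDE_STRING': 1}
print(interesting)            # ['hello world']
```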
Example usages
Example usage:
import binary2strings as b2s
data = b"hello world\x00\x00a\x00b\x00c\x00d\x00\x00"
result = b2s.extract_all_strings(data, min_chars=4)
print(result)
# [
# ('hello world', 'UTF8', (0, 10), True),
# ('abcd', 'WIDE_STRING', (13, 19), False)
# ]
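The second result is reported as WIDE_STRING because each of its characters occupies two bytes in the buffer. The standard-library sketch below decodes both regions by hand to make the layout visible. It assumes a little-endian 16-bit layout, which is what the sample bytes show, and the slice boundaries are chosen by hand rather than taken from the returned span.

```python
data = b"hello world\x00\x00a\x00b\x00c\x00d\x00\x00"

narrow = data[0:11]  # 11 characters, one byte each
wide = data[13:21]   # 4 characters, two bytes each (low byte first)

print(narrow.decode("utf-8"))    # hello world
print(wide.decode("utf-16-le"))  # abcd
```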
It also supports international languages, e.g.:
import binary2strings as b2s
# "hello world" in Chinese simplified
string = "\x00世界您好\x00"
data = bytes(string, 'utf-8')
result = b2s.extract_all_strings(data, min_chars=4)
print(result)
# [
# ('世界您好', 'UTF8', (1, 12), False)
# ]
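Note that min_chars counts characters, not bytes. The four characters above satisfy min_chars=4 even though they occupy twelve bytes in the buffer, which is why the reported span is much wider than four. The following standard-library check, using hand-chosen slice boundaries, illustrates the difference:

```python
string = "\x00世界您好\x00"
data = bytes(string, "utf-8")

text = "世界您好"
print(len(text))                   # 4  (characters)
print(len(text.encode("utf-8")))   # 12 (bytes, three per character)
print(data[1:13].decode("utf-8"))  # 世界您好
```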
Example extracting all strings from a binary file:
import binary2strings as b2s
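# Read the whole file into memory as bytes
with open("C:\\Windows\\System32\\cmd.exe", "rb") as i:
    data = i.read()
# Print every extracted string along with its encoding and interest flag
for (string, encoding, span, is_interesting) in b2s.extract_all_strings(data):
    print(f"{encoding}:{is_interesting}:{string}")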
Example extracting only interesting strings from a binary file:
import binary2strings as b2s
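# Read the whole file into memory as bytes
with open("C:\\Windows\\System32\\cmd.exe", "rb") as i:
    data = i.read()
# Print only the strings the model flags as interesting
for (string, encoding, span, is_interesting) in b2s.extract_all_strings(data, only_interesting=True):
    print(f"{encoding}:{is_interesting}:{string}")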
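If the same loop is needed in several places, it can be wrapped in a small helper. The sketch below is a convenience for illustration and not part of binary2strings: format_strings is a hypothetical name, and the extract parameter stands in for b2s.extract_all_strings so the helper can be exercised without the compiled module.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Documented result shape: (string, encoding, (start, end), is_interesting)
Result = Tuple[str, str, Tuple[int, int], bool]


def format_strings(
    data: bytes,
    extract: Callable[..., Iterable[Result]],
    only_interesting: bool = False,
) -> Iterator[str]:
    """Yield one 'ENCODING:start_offset:string' line per extracted string.

    Hypothetical helper; `extract` is expected to behave like
    b2s.extract_all_strings(buffer, min_chars=4, only_interesting=False).
    """
    for string, encoding, span, _ in extract(data, only_interesting=only_interesting):
        yield f"{encoding}:{span[0]}:{string}"
```

With the module installed, usage would look like `for line in format_strings(data, b2s.extract_all_strings, only_interesting=True): print(line)`.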