Python subprocess wrapper around the Rust `stringsext` crate.
stringsext
A Python wrapper for the stringsext command-line tool, providing a convenient interface for extracting strings from binary files.
Installation
- First, ensure you have the stringsext command-line tool installed on your system. Installation instructions are available from the stringsext project.
- Install the Python library using pip:
pip install stringsext
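Because the library wraps the command-line tool, it can help to confirm that the binary is reachable before running an extraction. The following is a minimal sketch using only the standard library; the helper name `find_stringsext_cli` is hypothetical and is not part of the stringsext package.

```python
import shutil
from typing import Optional


def find_stringsext_cli(name: str = "stringsext") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_stringsext_cli()
    if path is None:
        print("stringsext was not found on PATH; install the CLI first.")
    else:
        print(f"Using stringsext at {path}")
```

Running this check up front gives a clearer message than a failure later during extraction.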
Basic Usage
Here's a simple example of how to use the stringsext library:
from pathlib import Path
from stringsext.core import Stringsext
from stringsext.encoding import EncodingName
# Create a Stringsext instance
extractor = Stringsext()
# Configure the extraction
results = (
    extractor.encoding(EncodingName.UTF_8, chars_min=4)
    .add_file(Path("example.bin"))
    .run()
)
# Parse the results
findings = results.parse()
# Print the findings
for finding in findings:
    print(f"Found: {finding.content}, encoding: {finding.encoding_info.name}")
Parsing Results
The parse() method converts the raw output from the stringsext command-line tool into Python objects, making the results easier to work with in your code.
After running the extraction with the run() method, call parse() on the results to get a list of StringFinding objects:
findings = results.parse()
Each StringFinding object contains the following information:
- content: The extracted string
- input_file: The path to the file from which the string was extracted
- offset_info: Information about the string's location in the file
- encoding_info: Information about the encoding of the string
Here's an example of how to work with the parsed results:
for finding in findings:
    print(f"Content: {finding.content}")
    print(f"File: {finding.input_file}")
    print(f"Offset: {finding.offset_info.exact}")
    print(f"Encoding: {finding.encoding_info.name}")
    print("---")
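When several encodings are searched at once, it is often handy to bucket the findings by encoding. The sketch below relies only on the `content` and `encoding_info.name` attributes used in the example above. The helper `group_by_encoding` is hypothetical, and the sample objects are stand-ins built with `SimpleNamespace`; the encoding names in them are illustrative values, not necessarily what the library reports.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_encoding(findings):
    """Map each encoding name to the list of string contents found in it."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding.encoding_info.name].append(finding.content)
    return dict(grouped)


# Stand-in objects that mimic the attributes used above; real code would
# pass the list returned by results.parse() instead.
sample = [
    SimpleNamespace(content="hello", encoding_info=SimpleNamespace(name="utf-8")),
    SimpleNamespace(content="world", encoding_info=SimpleNamespace(name="utf-16le")),
    SimpleNamespace(content="again", encoding_info=SimpleNamespace(name="utf-8")),
]
print(group_by_encoding(sample))
```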
The parse_stringsext_output function handles the parsing of the raw output. It's used internally by the parse() method, but you can also use it directly if you have raw stringsext output:
from pathlib import Path
from stringsext.encoding import EncodingName
from stringsext.parse import parse_stringsext_output
raw_output = "... raw stringsext output ..."
files = [Path("example.bin")]
encodings = [EncodingName.UTF_8]
findings = parse_stringsext_output(raw_output, files, encodings)
Advanced Features
Multiple Encodings
You can search for strings in multiple encodings:
extractor = Stringsext()
results = (
    extractor.encoding(EncodingName.UTF_8, chars_min=4)
    .encoding(EncodingName.UTF_16LE, chars_min=4)
    .encoding(EncodingName.UTF_16BE, chars_min=4)
    .add_file(Path("example.bin"))
    .run()
)
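If the set of encodings is only known at runtime, the chained calls can be driven from a list. This is a sketch under the assumption, suggested by the chaining above, that each `encoding(...)` call returns an object that accepts further calls. `apply_encodings` and `FakeExtractor` are hypothetical names used for illustration; neither is part of stringsext.

```python
def apply_encodings(extractor, encodings, chars_min=4):
    """Call extractor.encoding(...) once per entry and return the result."""
    for encoding in encodings:
        # Reassign so this works whether encoding() returns the same
        # object or a new one.
        extractor = extractor.encoding(encoding, chars_min=chars_min)
    return extractor


class FakeExtractor:
    """Hypothetical stand-in that records calls; not part of stringsext."""

    def __init__(self):
        self.calls = []

    def encoding(self, name, chars_min=None):
        self.calls.append((name, chars_min))
        return self


fake = apply_encodings(FakeExtractor(), ["UTF_8", "UTF_16LE", "UTF_16BE"])
print(fake.calls)
```

With the real library, the same helper would be given a `Stringsext()` instance and `EncodingName` members instead of the stand-ins.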
Unicode Block Filtering
Filter strings based on Unicode blocks:
from stringsext.encoding import UnicodeBlockFilter
extractor = Stringsext()
results = (
    extractor.encoding(EncodingName.UTF_8)
    .unicode_block_filter(UnicodeBlockFilter.ARABIC)
    .add_file(Path("example.bin"))
    .run()
)
ASCII Filtering
Apply ASCII filters to refine your search:
from stringsext.encoding import AsciiFilter
extractor = Stringsext()
results = (
    extractor.encoding(EncodingName.ASCII)
    .ascii_filter(AsciiFilter.PRINTABLE)
    .add_file(Path("example.bin"))
    .run()
)
Multiple Files
Search through multiple files in one go:
extractor = Stringsext()
results = (
    extractor.encoding(EncodingName.UTF_8)
    .add_file(Path("file1.bin"))
    .add_file(Path("file2.bin"))
    .run()
)
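For a whole directory of binaries, the file list can be gathered with `pathlib` before it is handed to `add_file`. The sketch below is standard-library only; `collect_files`, the `*.bin` pattern, and the `samples` directory in the comment are assumptions made for illustration.

```python
from pathlib import Path


def collect_files(root: Path, pattern: str = "*.bin") -> list[Path]:
    """Return regular files under root matching pattern, in sorted order."""
    return sorted(p for p in root.rglob(pattern) if p.is_file())


# Hypothetical usage with the chained API shown above:
# extractor = Stringsext().encoding(EncodingName.UTF_8)
# for path in collect_files(Path("samples")):
#     extractor = extractor.add_file(path)
# findings = extractor.run().parse()
```

Sorting the paths keeps the order of files stable between runs, which makes results easier to compare.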
Example: Extracting UUIDs
Here's an example of how to use stringsext to extract UUIDs from a binary file:
import re
from pathlib import Path
from stringsext.core import Stringsext
from stringsext.encoding import EncodingName
UUID_PATTERN = r"[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"
def extract_uuids(content: str) -> list[str]:
    return list(set(re.findall(UUID_PATTERN, content)))
extractor = Stringsext()
findings = (
    extractor.encoding(EncodingName.UTF_8, chars_min=36)
    .encoding(EncodingName.UTF_16LE, chars_min=36)
    .encoding(EncodingName.UTF_16BE, chars_min=36)
    .add_file(Path("example.bin"))
    .run()
    .parse()
)
for finding in findings:
    uuids = extract_uuids(finding.content)
    for uuid in uuids:
        print(f"Found UUID: {uuid}, encoding: {finding.encoding_info.name}")
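The value chars_min=36 matches the length of a UUID in its canonical text form: 32 hexadecimal digits plus 4 hyphens. Because `extract_uuids` deduplicates with a `set`, it does not preserve order and treats upper-case and lower-case spellings as different values. The following standalone sketch shows one possible variant that normalizes case and keeps first-seen order; `extract_uuids_ordered` is a hypothetical name, and the sample string is made up for illustration.

```python
import re
import uuid

UUID_PATTERN = re.compile(
    r"[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"
)


def extract_uuids_ordered(content: str) -> list[str]:
    """Return unique UUID strings in order of first appearance, lowercased."""
    matches = (m.lower() for m in UUID_PATTERN.findall(content))
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(matches))


sample = (
    "id=123E4567-E89B-12D3-A456-426614174000; "
    "dup=123e4567-e89b-12d3-a456-426614174000; "
    "other=00000000-0000-0000-0000-000000000001"
)
print(extract_uuids_ordered(sample))

# The canonical string form produced by the uuid module is 36 characters long.
print(len(str(uuid.UUID("123e4567-e89b-12d3-a456-426614174000"))))
```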
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.