Python subprocess wrapper around the Rust `stringsext` crate.

stringsext

A Python wrapper for the stringsext command-line tool, providing a convenient interface for extracting strings from binary files.

Installation

  1. First, ensure the stringsext command-line tool is installed on your system and available on your PATH. Installation instructions are available in the stringsext project's documentation.

  2. Install the Python library using pip:

pip install stringsext
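Because the Python package only wraps the command-line tool, it can help to confirm that the binary is reachable before running an extraction. The following is a minimal sketch using only the standard library; the helper name `find_stringsext` is hypothetical and not part of this package, and it assumes the executable is named `stringsext`.

```python
import shutil
from typing import Optional


def find_stringsext(executable: str = "stringsext") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    # shutil.which performs the same PATH lookup a shell would.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_stringsext()
    if path is None:
        print("stringsext not found on PATH; install the command-line tool first")
    else:
        print(f"stringsext found at {path}")
```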

Basic Usage

Here's a simple example of how to use the stringsext library:

from pathlib import Path
from stringsext.core import Stringsext
from stringsext.encoding import EncodingName

# Create a Stringsext instance
extractor = Stringsext()

# Configure the extraction
results = (
    extractor.encoding(EncodingName.UTF_8, chars_min=4)
    .add_file(Path("example.bin"))
    .run()
)

# Parse the results
findings = results.parse()

# Print the findings
for finding in findings:
    print(f"Found: {finding.content}, encoding: {finding.encoding_info.name}")

Parsing Results

The parse() method converts the raw output of the stringsext command-line tool into Python objects, so you can work with the results directly in your code.

After running the extraction with the run() method, you can call parse() on the results to get a list of StringFinding objects:

findings = results.parse()

Each StringFinding object contains the following information:

  • content: The extracted string
  • input_file: The path to the file from which the string was extracted
  • offset_info: Information about the string's location in the file
  • encoding_info: Information about the encoding of the string

Here's an example of how to work with the parsed results:

for finding in findings:
    print(f"Content: {finding.content}")
    print(f"File: {finding.input_file}")
    print(f"Offset: {finding.offset_info.exact}")
    print(f"Encoding: {finding.encoding_info.name}")
    print("---")
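Once the findings are plain Python objects, ordinary collection code applies to them. The sketch below groups strings by encoding name. So that it runs without the command-line tool, it uses hypothetical stand-in dataclasses that mirror only the attributes listed above; the library's real classes may differ, and the sample encoding names, the integer type of `exact`, and the helper `group_by_encoding` are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


# Hypothetical stand-ins mirroring only the attributes used in this README;
# the library's real classes may carry more fields or different types.
@dataclass
class OffsetInfo:
    exact: int


@dataclass
class EncodingInfo:
    name: str


@dataclass
class Finding:
    content: str
    input_file: Path
    offset_info: OffsetInfo
    encoding_info: EncodingInfo


def group_by_encoding(findings):
    """Map each encoding name to the strings found in that encoding."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding.encoding_info.name].append(finding.content)
    return dict(grouped)


# Made-up sample data; real findings would come from results.parse().
sample = [
    Finding("hello", Path("example.bin"), OffsetInfo(16), EncodingInfo("utf-8")),
    Finding("world", Path("example.bin"), OffsetInfo(64), EncodingInfo("utf-16le")),
    Finding("again", Path("example.bin"), OffsetInfo(96), EncodingInfo("utf-8")),
]
grouped = group_by_encoding(sample)
print(grouped)
```

The same function should work on the list returned by parse(), provided the real objects expose `encoding_info.name` and `content` as described above.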

The parse_stringsext_output function handles the parsing of the raw output. It's used internally by the parse() method, but you can also use it directly if you have raw stringsext output:

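from pathlib import Path

from stringsext.encoding import EncodingName
from stringsext.parse import parse_stringsext_output

# Raw text previously captured from the stringsext command-line tool
raw_output = "... raw stringsext output ..."
files = [Path("example.bin")]
encodings = [EncodingName.UTF_8]

findings = parse_stringsext_output(raw_output, files, encodings)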

Advanced Features

Multiple Encodings

You can search for strings in multiple encodings:

extractor = Stringsext()
results = (
    extractor.encoding(EncodingName.UTF_8, chars_min=4)
    .encoding(EncodingName.UTF_16LE, chars_min=4)
    .encoding(EncodingName.UTF_16BE, chars_min=4)
    .add_file(Path("example.bin"))
    .run()
)
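To see why scanning several encodings matters, consider how the same text is laid out in bytes under each one. This standalone sketch uses Python's built-in codecs (the codec names here are Python's, not members of `EncodingName`) to show that a string stored as UTF-16 does not contain the byte sequence a UTF-8 scan would look for.

```python
text = "test"

# Encode the same text with three of Python's built-in codecs.
encoded = {
    "utf-8": text.encode("utf-8"),
    "utf-16-le": text.encode("utf-16-le"),
    "utf-16-be": text.encode("utf-16-be"),
}

for name, data in encoded.items():
    # bytes.hex(" ") prints the bytes as space-separated hex pairs.
    print(f"{name:10} {data.hex(' ')}")

# The UTF-8 byte sequence does not occur inside either UTF-16 form.
assert encoded["utf-8"] not in encoded["utf-16-le"]
assert encoded["utf-8"] not in encoded["utf-16-be"]
```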

Unicode Block Filtering

Filter strings based on Unicode blocks:

from stringsext.encoding import UnicodeBlockFilter

extractor = Stringsext()
results = (
    extractor.encoding(EncodingName.UTF_8)
    .unicode_block_filter(UnicodeBlockFilter.ARABIC)
    .add_file(Path("example.bin"))
    .run()
)

ASCII Filtering

Apply ASCII filters to refine your search:

from stringsext.encoding import AsciiFilter

extractor = Stringsext()
results = (
    extractor.encoding(EncodingName.ASCII)
    .ascii_filter(AsciiFilter.PRINTABLE)
    .add_file(Path("example.bin"))
    .run()
)

Multiple Files

Search through multiple files in one go:

extractor = Stringsext()
results = (
    extractor.encoding(EncodingName.UTF_8)
    .add_file(Path("file1.bin"))
    .add_file(Path("file2.bin"))
    .run()
)
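When the files to scan are not known in advance, the list can be built with pathlib first. The helper below is a hypothetical sketch, not part of this package: `collect_files` and the default `*.bin` pattern are assumptions chosen for illustration.

```python
from pathlib import Path


def collect_files(root: Path, pattern: str = "*.bin") -> list[Path]:
    """Return regular files under root that match pattern, in sorted order."""
    # rglob searches root and all of its subdirectories.
    return sorted(p for p in root.rglob(pattern) if p.is_file())


# Possible use with the extractor, assuming add_file() keeps returning a
# chainable object as in the examples above (left commented out because it
# needs the stringsext tool and real input files):
#
# extractor = Stringsext().encoding(EncodingName.UTF_8)
# for path in collect_files(Path("samples")):
#     extractor = extractor.add_file(path)
# results = extractor.run()
```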

Example: Extracting UUIDs

Here's an example of how to use stringsext to extract UUIDs from a binary file:

import re
from pathlib import Path
from stringsext.core import Stringsext
from stringsext.encoding import EncodingName

UUID_PATTERN = r"[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"

def extract_uuids(content: str) -> list[str]:
    return list(set(re.findall(UUID_PATTERN, content)))

extractor = Stringsext()
findings = (
    extractor.encoding(EncodingName.UTF_8, chars_min=36)
    .encoding(EncodingName.UTF_16LE, chars_min=36)
    .encoding(EncodingName.UTF_16BE, chars_min=36)
    .add_file(Path("example.bin"))
    .run()
    .parse()
)

for finding in findings:
    uuids = extract_uuids(finding.content)
    for uuid in uuids:
        print(f"Found UUID: {uuid}, encoding: {finding.encoding_info.name}")
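The `extract_uuids` function above deduplicates with `set`, which returns matches in arbitrary order and treats upper- and lower-case spellings of the same UUID as different values. If stable output matters, one possible variation is sketched below; `extract_uuids_ordered` is a hypothetical name, and lower-casing the matches is a design choice rather than something the library requires.

```python
import re

UUID_PATTERN = re.compile(
    r"[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}"
)


def extract_uuids_ordered(content: str) -> list[str]:
    """Return unique UUIDs in first-seen order, normalized to lower case."""
    matches = (m.lower() for m in UUID_PATTERN.findall(content))
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(matches))


sample = (
    "id=123e4567-e89b-12d3-a456-426614174000; "
    "dup=123E4567-E89B-12D3-A456-426614174000; "
    "other=00000000-0000-0000-0000-000000000000"
)
print(extract_uuids_ordered(sample))
```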

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

