High-performance text/logs parser (e.g., whois, nuclei, dns/dhcp logs) with Rust core

structly

Structly — Rust-powered parser made for massive telemetry and log workloads.


Source Code: https://github.com/bytevader/structly


Structly is a high-performance parsing toolkit that combines a Rust core with a Pythonic API.
True to its name, Structly turns massive amounts of unstructured input into clean, structured outputs without slowing your workflows.

It is built for teams who need to parse large volumes of operational telemetry (syslog, DNS, DHCP, IPAM, firewall, routing, WHOIS, nuclei, and similar sources) faster and with less memory overhead than pure-Python pipelines.

Structly’s design maximises throughput without sacrificing the clarity of Python’s API. If you need reliable, deterministic log/text parsing at scale, Structly is built to slot into your pipeline—and leave Python-only alternatives far behind.

Why Structly?

  • Native-speed extraction. Parsing logic is implemented in Rust and exposed through PyO3, giving Structly microsecond-level latency per record while staying drop-in compatible with Python workflows.
  • Inline log intelligence. Inline mode recognises key=value tokens anywhere on the line, not only at the start, so modern, densely packed logs are handled without regex backtracking.
  • Predictable memory profile. The parser works on raw byte ranges and chunked batching, preventing the transient allocations often seen in Python log frameworks.
  • Proven advantage over Python stacks. Benchmarks show Structly parsing synthetic DNS and firewall workloads ~4× faster than libraries such as pygrok, pyparsing, regex, or logparser3, while preserving full fidelity of the extracted fields.

When to Choose Structly Over Python Parsers

| Scenario | Why Structly Wins |
| --- | --- |
| Large batches (10k+ lines per file) | Native code + optional Rayon parallelism keeps throughput >600k lines/s. |
| Dense inline logs (key=value …) | Inline mode uses Aho–Corasick plus delimiter scans, with no regex backtracking. |
| Multi-field WHOIS records | Rust implementation extracts complex sections in ~0.016s vs ~0.07s for regex. |
| Repeated runs in pipelines | parse_iter and parse_chunks stream results with predictable memory usage. |
| CPU-bound environments | Rayon policies let you scale across cores or run single-threaded deterministically. |

Installation

If you are working from this Git repository:

# Clone the repo and enter it
git clone https://github.com/bytevader/structly.git
cd structly

# Install requirements
pip install -e '.[dev]'
# or
python3 -m pip install -r requirements-dev.txt

# Build the native extension (release mode recommended)
make install-rust

# or, if you manage environments manually:
python3 -m maturin develop --release

Structly targets Python 3.9+ with the abi3 wheel and does not require a specific virtual environment layout.

Core Concepts

Configuration

from structly import StructlyConfig, FieldSpec, FieldPattern, Mode, StructlyParser

cfg = StructlyConfig.from_mapping({
    "domain": {"patterns": ["sw:Domain:"]},
    "registrar": FieldSpec(
        patterns=[
            FieldPattern.starts_with("Registrar:"),
            FieldPattern.regex(r"^\s*Registrar:\s*(?P<val>.+)$"),
            FieldPattern.regex(r"^\s*(?P<val>.+\[Tag = .+\])$"),
        ],
    ),
    "nameservers": {
        "patterns": ["sw:Name Server:"],
        "mode": Mode.all.value,
        "unique": True,
        "return": "list",
    },
})
parser = StructlyParser(cfg)

Pattern strings accept the sw: (starts-with) and r: (regex) prefixes. Returning lists and deduplicating values are built in (see the "return" and "unique" keys above).
Patterns can be given as plain strings such as "sw:Domain:", in which case each string must start with sw: or r:.
Alternatively, use the FieldPattern model, which is more readable:

FieldPattern.starts_with("Registrar:"),
FieldPattern.regex(r"^\s*Registrar:\s*(?P<val>.+)$"),
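
To make the prefix convention concrete, here is a minimal pure-Python sketch of how a pattern string could be split into its kind and body. The helper name split_pattern and the kind labels it returns are hypothetical and exist only for illustration; this is not Structly's internal code.

```python
from typing import Tuple

# Hypothetical lookup table (not part of Structly): maps the documented
# string prefixes to a descriptive kind label.
_PREFIXES = {"sw:": "starts_with", "r:": "regex"}


def split_pattern(pattern: str) -> Tuple[str, str]:
    """Split a prefixed pattern string into (kind, body)."""
    for prefix, kind in _PREFIXES.items():
        if pattern.startswith(prefix):
            return kind, pattern[len(prefix):]
    # Mirrors the documented rule that a string pattern needs a prefix.
    raise ValueError(f"pattern must start with 'sw:' or 'r:': {pattern!r}")


print(split_pattern("sw:Domain:"))
# ('starts_with', 'Domain:')
```

A string without a recognised prefix is rejected outright in this sketch, which matches the rule stated above that string patterns must begin with sw: or r:.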

Layouts: line vs inline

  • Line layout (default). Extracts values that appear immediately after the prefix at the start of a line—ideal for classic syslog, WHOIS, or structured plaintext.
  • Inline layout. Use StructlyParser(..., field_layout="inline", inline_value_delimiters=" \t,;|") to scan for tokens anywhere on the line. Choose your own delimiter set for unusual formats.

Inline mode retains regex support and deduplication logic while significantly outperforming Python regex loops.
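
As a rough illustration of what inline extraction means, the following pure-Python sketch scans for key= tokens anywhere on a line and reads each value up to the next delimiter. The function inline_extract is hypothetical, and the token-boundary rule (a key only counts at the start of the line or right after a delimiter) is an assumption made for this sketch. Structly's native implementation is described above as using Aho–Corasick, which this simple str.find loop does not reproduce.

```python
from typing import Dict, Iterable, Optional


def inline_extract(
    line: str,
    keys: Iterable[str],
    delimiters: str = " \t,;|",
) -> Dict[str, Optional[str]]:
    """Return the first value found for each key, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for key in keys:
        token = key + "="
        value = None
        start = line.find(token)
        while start != -1:
            # Assumption: accept a match only at a token boundary, so
            # "ghost=" is not mistaken for "host=".
            if start == 0 or line[start - 1] in delimiters:
                begin = start + len(token)
                end = begin
                # Read the value up to the next delimiter character.
                while end < len(line) and line[end] not in delimiters:
                    end += 1
                value = line[begin:end]
                break
            start = line.find(token, start + 1)
        result[key] = value
    return result


line = "ts=2025-01-01T00:00:01Z host=api.demo status=ok latency=41ms"
print(inline_extract(line, ["ts", "host", "status"]))
# {'ts': '2025-01-01T00:00:01Z', 'host': 'api.demo', 'status': 'ok'}
```

Changing the delimiters argument plays the same role as inline_value_delimiters: a log that separates pairs with "|" or ";" only needs a different delimiter string.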

Rayon Policies

rayon_policy controls native parallelism:

  • "never" (default): deterministic single-thread execution.
  • "always": enables Rayon for parse_many and chunked paths—best on multi-core hosts.
  • "auto": lets the runtime pick (currently equivalent to "always").

This policy is also respected by helper functions (prepare_parser, parse_text, etc).

Execution Modes

| Method | When to Use | Notes |
| --- | --- | --- |
| parse(text) | Single document | Returns a dict of field → value. |
| parse_tuple(text) | Positional access | Saves dictionary overhead when order matters. |
| parse_many(list[str]) | Moderate batches (fits in RAM) | Processes eagerly and returns a list. |
| parse_iter(iterable, chunk_size) | Streaming pipelines | Yields one record at a time (or per chunk) without retaining previous results. |
| parse_chunks(iterable, chunk_size) | ETL batching | Chunked output for bulk writes (default 512). |

chunk_size must be a positive integer; invalid inputs raise immediately, keeping bugs discoverable early.
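
The chunking and fail-fast behaviour can be sketched in plain Python. The helper iter_chunks below is hypothetical and only illustrates the idea; it is not Structly's implementation. The point of the inner generator is that validation runs when the function is called, not on the first iteration, which is one way to get the "raise immediately" behaviour described above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(items: Iterable[T], chunk_size: int = 512) -> Iterator[List[T]]:
    """Yield lists of up to chunk_size items from any iterable."""
    # Validate eagerly, before the generator is created, so a bad
    # chunk_size fails at call time rather than on first iteration.
    if (
        isinstance(chunk_size, bool)
        or not isinstance(chunk_size, int)
        or chunk_size <= 0
    ):
        raise ValueError("chunk_size must be a positive integer")

    def _generate() -> Iterator[List[T]]:
        iterator = iter(items)
        while True:
            chunk = list(islice(iterator, chunk_size))
            if not chunk:
                return
            yield chunk

    return _generate()


print(list(iter_chunks(range(5), chunk_size=2)))
# [[0, 1], [2, 3], [4]]
```

Only one chunk is held in memory at a time, which is why chunked streaming suits bulk writes over inputs that do not fit in RAM.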

Usage

WHOIS example

from structly import StructlyConfig, Mode, StructlyParser

WHOIS_SAMPLE = """\
Domain Name: EXAMPLE-CONTACT.COM
Registry Domain ID: 123456789_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.example-registrar.com
Registrar URL: https://www.example-registrar.com
Updated Date: 2024-03-11T07:12:34Z
Creation Date: 2010-06-18T13:45:21Z
Registry Expiry Date: 2030-06-18T13:45:21Z
Registrar: Example Registrar, Inc.
Registrar IANA ID: 199
Registrant Name: Example Holdings Privacy
Registrant Organization: Example Holdings
Registrant Street: 123 Example Ave
Registrant City: San Francisco
Registrant State/Province: CA
Registrant Postal Code: 94105
Registrant Country: US
Registrant Phone: +1.5555550000
Registrant Email: noc@example-holdings.com
Tech Email: tech@example-holdings.com
Name Server: NS1.EXAMPLE.NET
Name Server: NS2.EXAMPLE.NET
Name Server: NS3.EXAMPLE.NET
DNSSEC: unsigned
Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
Status: clientRenewProhibited https://icann.org/epp#clientRenewProhibited
"""

cfg = StructlyConfig.from_mapping({
    "domain": {"patterns": ["sw:Domain Name:"]},
    "registrar": {"patterns": ["sw:Registrar:"]},
    "created": {"patterns": ["sw:Creation Date:"]},
    "expiry": {"patterns": ["sw:Registry Expiry Date:"]},
    "nameservers": {
        "patterns": ["sw:Name Server:"],
        "mode": Mode.all.value,
        "unique": True,
        "return": "list",
    },
    "statuses": {
        "patterns": ["sw:Status:"],
        "mode": Mode.all.value,
        "unique": True,
        "return": "list",
    },
})

parser = StructlyParser(cfg)
result = parser.parse(WHOIS_SAMPLE)

print(result["domain"])
# EXAMPLE-CONTACT.COM
print(result["nameservers"])
# ['NS1.EXAMPLE.NET', 'NS2.EXAMPLE.NET', 'NS3.EXAMPLE.NET']
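
For comparison, a pure-Python baseline for the same kind of starts-with extraction might look like the sketch below. Everything in it (SAMPLE, FIELDS, baseline_parse) is hypothetical, and it assumes that single-valued fields keep their first match; it is meant to show what collecting every match with order-preserving deduplication amounts to, not how Structly works internally.

```python
from typing import Dict, List, Optional, Union

SAMPLE = """\
Domain Name: EXAMPLE-CONTACT.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLE.NET
Name Server: NS2.EXAMPLE.NET
Name Server: NS1.EXAMPLE.NET
"""

# field name -> (line prefix, collect every match?)
FIELDS = {
    "domain": ("Domain Name:", False),
    "registrar": ("Registrar:", False),
    "nameservers": ("Name Server:", True),
}

Parsed = Dict[str, Union[Optional[str], List[str]]]


def baseline_parse(text: str) -> Parsed:
    result: Parsed = {
        name: ([] if multi else None) for name, (_, multi) in FIELDS.items()
    }
    for line in text.splitlines():
        stripped = line.lstrip()
        for name, (prefix, multi) in FIELDS.items():
            if not stripped.startswith(prefix):
                continue
            value = stripped[len(prefix):].strip()
            if multi:
                # Order-preserving deduplication of repeated values.
                if value not in result[name]:
                    result[name].append(value)
            elif result[name] is None:
                # Assumption: single-valued fields keep the first match.
                result[name] = value
    return result


parsed = baseline_parse(SAMPLE)
print(parsed["domain"])
# EXAMPLE-CONTACT.COM
print(parsed["nameservers"])
# ['NS1.EXAMPLE.NET', 'NS2.EXAMPLE.NET']
```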

Method examples

from structly import StructlyConfig, StructlyParser

cfg = StructlyConfig.from_mapping({
    "ts": {"patterns": ["sw:ts="]},
    "host": {"patterns": ["sw:host="]},
    "status": {"patterns": ["sw:status="]},
})
parser = StructlyParser(
    cfg,
    field_layout="inline",
    inline_value_delimiters=" ",
)

sample_lines = [
    "ts=2025-01-01T00:00:01Z host=api.demo status=ok latency=41ms",
    "ts=2025-01-01T00:00:02Z host=web.demo status=warn latency=88ms",
]

single = parser.parse(sample_lines[0])
# {'ts': '2025-01-01T00:00:01Z', 'host': 'api.demo', 'status': 'ok'}

ordered = parser.parse_tuple(sample_lines[0])
# ('2025-01-01T00:00:01Z', 'api.demo', 'ok')

batch = parser.parse_many(sample_lines)
# [{'ts': ...}, {'ts': ...}]

streamed = list(parser.parse_iter(sample_lines, chunk_size=1))
# parsed docs yielded one at a time

chunked = list(parser.parse_chunks(sample_lines, chunk_size=2))
# [[{'ts': ...}, {'ts': ...}]]

Benchmarks

Benchmarks live in benchmarks/ and can be run from the repository root:

# Synthetic log workloads (Structly inline vs Python libraries)
python3 benchmarks/benchmark_structly_vs_kv_parsers.py --dataset firewall

# WHOIS extraction vs pure Python regex pipelines
python3 benchmarks/benchmark_structly_vs_python.py

# Direct comparison to whois-parser
python3 benchmarks/benchmark_structly_vs_whoisparser.py

Each script prints a PrettyTable summary; fastest parsers are highlighted in green, slowest in red.
Check benchmarks/README.md for examples.
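
If you want a quick throughput number for your own data before running the full scripts, a small timing harness along these lines can help. This is a hedged sketch: lines_per_second and toy_parse are hypothetical names, the harness is not taken from the repository's benchmark scripts, and any callable that accepts one line (for example a parser's parse method) can be passed in.

```python
import time
from typing import Callable, Sequence


def lines_per_second(
    parse: Callable[[str], object],
    lines: Sequence[str],
    repeats: int = 3,
) -> float:
    """Return the best observed throughput over several timed passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for line in lines:
            parse(line)
        best = min(best, time.perf_counter() - start)
    return len(lines) / best if best > 0 else float("inf")


def toy_parse(line: str) -> dict:
    # Stand-in parser for the demo: splits key=value pairs on spaces.
    return dict(pair.split("=", 1) for pair in line.split(" ") if "=" in pair)


sample = ["ts=2025-01-01T00:00:01Z host=api.demo status=ok"] * 10_000
print(f"{lines_per_second(toy_parse, sample):,.0f} lines/s")
```

Taking the best of several passes reduces noise from warm-up and scheduling, but the result is still only indicative; batch methods such as parse_many should be timed per call rather than per line.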

Fixtures & Testing

Synthetic datasets (10k lines each) cover DNS, DHCP, IPAM, firewall/netflow, and router logs under tests/data/. Tests verify both accuracy and long-run stability:

  • tests/functional/test_inline_logs.py compares inline extractions to a Python baseline.
  • tests/functional/test_memory_soak.py guards against RSS leaks on large runs.
  • Unit tests cover API validation, streaming methods, and rayon policy handling.

Run the suite after installing dev requirements:

python3 -m pytest
