High-performance text/log parser (e.g., WHOIS, nuclei, DNS/DHCP logs) with a Rust core
Structly — Rust-powered parser made for massive telemetry and log workloads.
Source Code: https://github.com/bytevader/structly
Structly is a high-performance parsing toolkit that combines a Rust core with a Pythonic API.
True to its name, Structly turns massive amounts of unstructured input into clean, structured outputs without slowing your workflows.
It is built for teams that need to sift through and parse large volumes of operational telemetry (syslog, DNS, DHCP, IPAM, firewall, routing, WHOIS, nuclei, etc.) faster and with less memory overhead than pure-Python pipelines.
Structly’s design maximises throughput without sacrificing the clarity of Python’s API. If you need reliable, deterministic log/text parsing at scale, Structly is built to slot into your existing pipeline as a faster replacement for Python-only alternatives.
Why Structly?
- Native-speed extraction. Parsing logic is implemented in Rust and exposed through PyO3, giving Structly microsecond-level latency per record while staying drop-in compatible with Python workflows.
- Inline log intelligence. Inline mode recognises key=value tokens anywhere on the line, not only at the start, so modern, densely packed logs are handled without regex backtracking.
- Predictable memory profile. The parser operates on raw byte ranges and processes input in chunked batches, avoiding the transient allocations often seen in Python log frameworks.
- Proven advantage over Python stacks. Benchmarks show Structly parsing synthetic DNS and firewall workloads ~4× faster than libraries such as pygrok, pyparsing, regex, or logparser3, while preserving full fidelity of the extracted fields.
When to Choose Structly Over Python Parsers
| Scenario | Why Structly Wins |
|---|---|
| Large batches (10k+ lines per file) | Native code + optional Rayon parallelism keeps throughput >600k lines/s. |
| Dense inline logs (key=value …) | Inline mode uses Aho–Corasick plus delimiter scans, avoiding regex backtracking. |
| Multi-field WHOIS records | Rust implementation extracts complex sections in ~0.016s vs ~0.07s for regex. |
| Repeated runs in pipelines | parse_iter and parse_chunks stream results with predictable memory usage. |
| CPU-bound environments | Rayon policies let you scale across cores or run single-threaded deterministically. |
Installation
If you are working from this Git repository:
# Clone the repo and enter it
git clone https://github.com/bytevader/structly.git
cd structly
# Install requirements
pip install -e '.[dev]'
# or
python3 -m pip install -r requirements-dev.txt
# Build the native extension (release mode recommended)
make install-rust
# or, if you manage environments manually:
python3 -m maturin develop --release
Structly targets Python 3.9+ and ships abi3 wheels, so a single wheel works across multiple CPython versions. It does not require a specific virtual environment layout.
Core Concepts
Configuration
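from structly import StructlyConfig, FieldSpec, FieldPattern, Mode, StructlyParser
cfg = StructlyConfig.from_mapping({
    # Plain-string pattern: the "sw:" prefix means "starts with".
    "domain": {"patterns": ["sw:Domain:"]},
    # FieldSpec built from FieldPattern helpers (starts-with and regex).
    "registrar": FieldSpec(
        patterns=[
            FieldPattern.starts_with("Registrar:"),
            # These regexes capture the value in a named group "val".
            FieldPattern.regex(r"^\s*Registrar:\s*(?P<val>.+)$"),
            FieldPattern.regex(r"^\s*(?P<val>.+\[Tag = .+\])$"),
        ],
    ),
    # Collect every match (Mode.all), drop duplicates, and return a list.
    "nameservers": {
        "patterns": ["sw:Name Server:"],
        "mode": Mode.all.value,
        "unique": True,
        "return": "list",
    },
})
parser = StructlyParser(cfg)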
Pattern strings accept sw: (starts-with) and r: (regex) prefixes. Returning lists and deduplicating values are built in (see the "return" and "unique" options above).
You can write a pattern as a plain string, such as "sw:Domain:"; keep in mind that a pattern string must start with sw: or r:.
Alternatively, use the more readable FieldPattern helpers:
FieldPattern.starts_with("Registrar:")
FieldPattern.regex(r"^\s*Registrar:\s*(?P<val>.+)$")
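To make the pattern semantics concrete, here is a pure-Python sketch of how sw: and r: pattern strings, an "all" mode, and deduplication could be evaluated line by line. It is an illustration under stated assumptions, not Structly's implementation: compile_pattern and extract_field are hypothetical helpers, leading whitespace is assumed to be ignored for sw: patterns, extracted values are stripped, and regex patterns are assumed to expose the value as the named group val.

```python
import re
from typing import Callable, List, Optional, Union

Matcher = Callable[[str], Optional[str]]


def compile_pattern(spec: str) -> Matcher:
    """Turn a prefixed pattern string into a line matcher (hypothetical helper)."""
    if spec.startswith("sw:"):
        prefix = spec[3:]

        def match_prefix(line: str) -> Optional[str]:
            # Assumption: leading whitespace is ignored and the value is stripped.
            stripped = line.lstrip()
            if stripped.startswith(prefix):
                return stripped[len(prefix):].strip()
            return None

        return match_prefix
    if spec.startswith("r:"):
        regex = re.compile(spec[2:])

        def match_regex(line: str) -> Optional[str]:
            # Assumption: the value is captured in a named group called "val".
            m = regex.search(line)
            return m.group("val").strip() if m else None

        return match_regex
    raise ValueError(f"pattern must start with 'sw:' or 'r:': {spec!r}")


def extract_field(
    text: str, patterns: List[str], mode: str = "first", unique: bool = False
) -> Union[Optional[str], List[str]]:
    """Return the first match, or every match when mode == "all"."""
    matchers = [compile_pattern(p) for p in patterns]
    found: List[str] = []
    for line in text.splitlines():
        for matcher in matchers:
            value = matcher(line)
            if value is None:
                continue
            if mode == "first":
                return value
            if not (unique and value in found):
                found.append(value)
            break  # one value per line: stop at the first pattern that matches
    return found if mode == "all" else None
```

Note how "sw:Registrar:" does not match "Registrar WHOIS Server:" because the character after "Registrar" is a space, not a colon; prefix patterns that include the separator are naturally precise.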
Layouts: line vs inline
- Line layout (default). Extracts values that appear immediately after the prefix at the start of a line—ideal for classic syslog, WHOIS, or structured plaintext.
- Inline layout. Use StructlyParser(..., field_layout="inline", inline_value_delimiters=" \t,;|") to scan for tokens anywhere on the line. Choose your own delimiter set for unusual formats.
Inline mode retains regex support and deduplication logic while significantly outperforming Python regex loops.
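The inline layout can be approximated in pure Python to show what "tokens anywhere on the line" means in practice. This sketch is hypothetical: scan_inline is not part of Structly, a plain str.find loop stands in for the Aho–Corasick search described above, and it assumes a key only counts when it starts the line or follows a delimiter, with the value ending at the next delimiter character.

```python
from typing import Dict, List, Optional


def scan_inline(
    line: str, keys: List[str], delimiters: str = " \t,;|"
) -> Dict[str, Optional[str]]:
    """Find each key (e.g. "host=") anywhere in the line and cut its value at a delimiter."""
    result: Dict[str, Optional[str]] = {}
    for key in keys:
        value = None
        start = line.find(key)
        while start != -1:
            # Accept the key only at a token boundary, so "host=" does not match "vhost=".
            if start == 0 or line[start - 1] in delimiters:
                begin = start + len(key)
                end = begin
                while end < len(line) and line[end] not in delimiters:
                    end += 1
                value = line[begin:end]
                break
            start = line.find(key, start + 1)
        result[key] = value
    return result
```

For example, scan_inline("a=1,b=2;c=3", ["b=", "c="]) picks up both values even though neither key sits at the start of the line, which is exactly the case the line layout would miss.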
Rayon Policies
rayon_policy controls native parallelism:
- "never" (default): deterministic single-threaded execution.
- "always": enables Rayon for parse_many and chunked paths; best on multi-core hosts.
- "auto": lets the runtime pick (currently equivalent to "always").
This policy is also respected by helper functions (prepare_parser, parse_text, etc.).
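As a rough model of how a policy switch like rayon_policy can gate parallel work, the sketch below maps "never", "always", and "auto" onto sequential or thread-pool execution. It is an analogy only: run_batch is a hypothetical helper, and Python's concurrent.futures.ThreadPoolExecutor stands in for Rayon's native thread pool (unlike Rayon, it does not sidestep the GIL). Results keep input order under every policy because Executor.map yields results in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

VALID_POLICIES = {"never", "always", "auto"}


def run_batch(items: Iterable[T], fn: Callable[[T], R], policy: str = "never") -> List[R]:
    """Apply fn to every item, sequentially or on a thread pool depending on policy."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "never":
        # Deterministic single-threaded execution.
        return [fn(item) for item in items]
    # "always" and "auto" are treated the same here, mirroring the text above.
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order.
        return list(pool.map(fn, items))
```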
Execution Modes
| Method | When to Use | Notes |
|---|---|---|
| parse(text) | Single document | Returns a dict of field→value. |
| parse_tuple(text) | Positional access | Saves dictionary overhead when order matters. |
| parse_many(list[str]) | Moderate batches (fits in RAM) | Processes eagerly and returns a list. |
| parse_iter(iterable, chunk_size) | Streaming pipelines | Yields one record at a time (or per chunk) without retaining previous results. |
| parse_chunks(iterable, chunk_size) | ETL batching | Chunked output for bulk writes (default 512). |
chunk_size must be a positive integer; invalid inputs raise immediately, keeping bugs discoverable early.
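The chunking contract can be illustrated with a small generator. iter_chunks is hypothetical; it mirrors the behaviour described above (lists of at most chunk_size items, 512 by default, and an error raised before any input is consumed when the size is invalid), but the exception types Structly actually raises are not documented here, so ValueError and TypeError are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(items: Iterable[T], chunk_size: int = 512) -> Iterator[List[T]]:
    """Validate eagerly, then lazily yield lists of at most chunk_size items."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(chunk_size, bool) or not isinstance(chunk_size, int):
        raise TypeError("chunk_size must be an int")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Validation happens here, in a plain function, so errors surface immediately
    # rather than on the first next() call of a generator.
    return _chunks(iter(items), chunk_size)


def _chunks(it: Iterator[T], chunk_size: int) -> Iterator[List[T]]:
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

Splitting validation from the generator body is the design point: a generator function would not run its checks until iteration starts, which would delay the error.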
Usage
WHOIS example
from structly import StructlyConfig, FieldSpec, Mode, StructlyParser
WHOIS_SAMPLE = """\
Domain Name: EXAMPLE-CONTACT.COM
Registry Domain ID: 123456789_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.example-registrar.com
Registrar URL: https://www.example-registrar.com
Updated Date: 2024-03-11T07:12:34Z
Creation Date: 2010-06-18T13:45:21Z
Registry Expiry Date: 2030-06-18T13:45:21Z
Registrar: Example Registrar, Inc.
Registrar IANA ID: 199
Registrant Name: Example Holdings Privacy
Registrant Organization: Example Holdings
Registrant Street: 123 Example Ave
Registrant City: San Francisco
Registrant State/Province: CA
Registrant Postal Code: 94105
Registrant Country: US
Registrant Phone: +1.5555550000
Registrant Email: noc@example-holdings.com
Tech Email: tech@example-holdings.com
Name Server: NS1.EXAMPLE.NET
Name Server: NS2.EXAMPLE.NET
Name Server: NS3.EXAMPLE.NET
DNSSEC: unsigned
Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
Status: clientRenewProhibited https://icann.org/epp#clientRenewProhibited
"""
cfg = StructlyConfig.from_mapping({
    "domain": {"patterns": ["sw:Domain Name:"]},
    "registrar": {"patterns": ["sw:Registrar:"]},
    "created": {"patterns": ["sw:Creation Date:"]},
    "expiry": {"patterns": ["sw:Registry Expiry Date:"]},
    "nameservers": {
        "patterns": ["sw:Name Server:"],
        "mode": Mode.all.value,
        "unique": True,
        "return": "list",
    },
    "statuses": {
        "patterns": ["sw:Status:"],
        "mode": Mode.all.value,
        "unique": True,
        "return": "list",
    },
})
parser = StructlyParser(cfg)
result = parser.parse(WHOIS_SAMPLE)
print(result["domain"])
# EXAMPLE-CONTACT.COM
print(result["nameservers"])
# ['NS1.EXAMPLE.NET', 'NS2.EXAMPLE.NET', 'NS3.EXAMPLE.NET']
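For comparison, a pure-Python pipeline of the kind Structly is benchmarked against might extract the same fields with multiline regexes. This is a hypothetical baseline (whois_regex_baseline is not taken from the repository's benchmark scripts), but it is handy for cross-checking Structly's output on your own WHOIS samples.

```python
import re
from typing import Dict, List, Optional, Union

# One anchored, multiline regex per field; group 1 captures the value.
# [ \t]* (not \s*) keeps an empty value from spilling onto the next line.
SINGLE_FIELDS = {
    "domain": re.compile(r"^Domain Name:[ \t]*(.+)$", re.MULTILINE),
    "registrar": re.compile(r"^Registrar:[ \t]*(.+)$", re.MULTILINE),
    "created": re.compile(r"^Creation Date:[ \t]*(.+)$", re.MULTILINE),
    "expiry": re.compile(r"^Registry Expiry Date:[ \t]*(.+)$", re.MULTILINE),
}
LIST_FIELDS = {
    "nameservers": re.compile(r"^Name Server:[ \t]*(.+)$", re.MULTILINE),
    "statuses": re.compile(r"^Status:[ \t]*(.+)$", re.MULTILINE),
}


def whois_regex_baseline(text: str) -> Dict[str, Union[Optional[str], List[str]]]:
    result: Dict[str, Union[Optional[str], List[str]]] = {}
    for name, pattern in SINGLE_FIELDS.items():
        m = pattern.search(text)
        result[name] = m.group(1).strip() if m else None
    for name, pattern in LIST_FIELDS.items():
        # dict.fromkeys deduplicates while keeping first-seen order.
        values = (v.strip() for v in pattern.findall(text))
        result[name] = list(dict.fromkeys(values))
    return result
```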
Method examples
from structly import StructlyConfig, StructlyParser
cfg = StructlyConfig.from_mapping({
    "ts": {"patterns": ["sw:ts="]},
    "host": {"patterns": ["sw:host="]},
    "status": {"patterns": ["sw:status="]},
})
parser = StructlyParser(
    cfg,
    field_layout="inline",
    inline_value_delimiters=" ",
)
sample_lines = [
    "ts=2025-01-01T00:00:01Z host=api.demo status=ok latency=41ms",
    "ts=2025-01-01T00:00:02Z host=web.demo status=warn latency=88ms",
]
single = parser.parse(sample_lines[0])
# {'ts': '2025-01-01T00:00:01Z', 'host': 'api.demo', 'status': 'ok'}
ordered = parser.parse_tuple(sample_lines[0])
# ('2025-01-01T00:00:01Z', 'api.demo', 'ok')
batch = parser.parse_many(sample_lines)
# [{'ts': ...}, {'ts': ...}]
streamed = list(parser.parse_iter(sample_lines, chunk_size=1))
# parsed docs yielded one at a time
chunked = list(parser.parse_chunks(sample_lines, chunk_size=2))
# [[{'ts': ...}, {'ts': ...}]]
Benchmarks
Benchmarks live in benchmarks/ and can be run from the repository root:
# Synthetic log workloads (Structly inline vs Python libraries)
python3 benchmarks/benchmark_structly_vs_kv_parsers.py --dataset firewall
# WHOIS extraction vs pure Python regex pipelines
python3 benchmarks/benchmark_structly_vs_python.py
# Direct comparison to whois-parser
python3 benchmarks/benchmark_structly_vs_whoisparser.py
Each script prints a PrettyTable summary; fastest parsers are highlighted in green, slowest in red.
Check benchmarks/README.md for examples.
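If you want a quick number on your own data before running the full scripts, a minimal harness built on the standard timeit module is enough. best_lines_per_second is a hypothetical helper, and the parse function you pass in (a Structly parser method or a Python baseline) is up to you; it reports the best of several repeats, which is less noisy than a single run.

```python
import timeit
from typing import Callable, List


def best_lines_per_second(
    parse: Callable[[str], object], lines: List[str], repeat: int = 5
) -> float:
    """Time parse() over all lines `repeat` times and return the best throughput."""
    if not lines or repeat < 1:
        raise ValueError("need at least one line and one repeat")

    def run() -> None:
        for line in lines:
            parse(line)

    # number=1: each timing covers exactly one pass over the data.
    best = min(timeit.repeat(run, repeat=repeat, number=1))
    # Guard against a zero reading from a coarse clock.
    return len(lines) / max(best, 1e-9)
```

To time Structly itself, pass a bound method such as parser.parse from the method examples above, and run the same lines through your existing Python parser for a like-for-like figure.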
Fixtures & Testing
Synthetic datasets (10k lines each) cover DNS, DHCP, IPAM, firewall/netflow, and router logs under tests/data/. Tests verify both accuracy and long-run stability:
- tests/functional/test_inline_logs.py compares inline extractions to a Python baseline.
- tests/functional/test_memory_soak.py guards against RSS leaks on large runs.
- Unit tests cover API validation, streaming methods, and Rayon policy handling.
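A memory soak check in the spirit of test_memory_soak.py can be sketched with the Unix-only resource module: warm up, record peak RSS, run the workload many times, and compare. This is not the repository's test; rss_growth_after_soak is hypothetical. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, and that it is a peak value, so it only detects growth beyond the warm-up high-water mark.

```python
import resource
from typing import Callable


def peak_rss() -> int:
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth_after_soak(
    workload: Callable[[], object], rounds: int = 1000, warmup: int = 50
) -> int:
    """Return how much peak RSS grew while repeatedly running workload()."""
    for _ in range(warmup):
        workload()  # let caches and allocators reach a steady state first
    before = peak_rss()
    for _ in range(rounds):
        workload()
    return peak_rss() - before
```

In a test you would assert that the growth stays under a budget chosen for your hardware, for example rss_growth_after_soak(lambda: parser.parse_many(lines)) < budget.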
Run the suite after installing dev requirements:
python3 -m pytest
Download files
File details
Details for the file structly-1.0.1.tar.gz.
File metadata
- Download URL: structly-1.0.1.tar.gz
- Upload date:
- Size: 2.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6fc4e21e47359f9434a5dab2bec5bf7258ffa7e171292dbc7d4e82edfb517a69 |
| MD5 | 94cfe353e1879add73c5f4b12d306595 |
| BLAKE2b-256 | f91ccc3a3470dc60c4ae32c4fa36feb3518171e167a475fc3afdb0610e11a6ca |
File details
Details for the file structly-1.0.1-cp37-abi3-win_amd64.whl.
File metadata
- Download URL: structly-1.0.1-cp37-abi3-win_amd64.whl
- Upload date:
- Size: 839.6 kB
- Tags: CPython 3.7+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b330492cd200454c237c7222cd1171e1aec9ee01e462660047361b98da92706 |
| MD5 | a8f6c958979bd72f841cd8e848407218 |
| BLAKE2b-256 | 840876f1b2be07c7df5be180c4b7c3b6a9d3ccba740a4bc7e066f752c5870ee7 |
File details
Details for the file structly-1.0.1-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: structly-1.0.1-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 1.1 MB
- Tags: CPython 3.7+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bdb2b8fc418801906f8dcceb2b53d15d3470c529481ffc57b79e85588d4cfe97 |
| MD5 | 0fda07b3c382a841d3f797962c47e90a |
| BLAKE2b-256 | fa2ef1df43050edb23dab84c92012a77e1b313669e9f92fea8989e64b75bd2dd |
File details
Details for the file structly-1.0.1-cp37-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.
File metadata
- Download URL: structly-1.0.1-cp37-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
- Upload date:
- Size: 1.9 MB
- Tags: CPython 3.7+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 635f3caebec47a6dd0b21e9268a78b596bd3f09c56306a2ca971b4bb050d00c2 |
| MD5 | 818a0afed726a5617ef41e8152834c7c |
| BLAKE2b-256 | 594f4be053c06d4e17755a91eb0e95aaecd147c47e3e6046a57467b1f76c167d |