Fast WHOIS parsing core with a Pythonic surface area.

structly_whois

Structly-powered WHOIS parsing.

Fast WHOIS parser powered by structly and msgspec.

structly_whois wraps Structly's compiled parsers with a modern Python API so you can normalize noisy WHOIS payloads, auto-detect TLD-specific overrides, and emit JSON-ready records without hauling heavy regex DSLs or dateparser into your hot path.

This library parses raw WHOIS text; it does not perform WHOIS lookups. Be mindful of data-handling obligations (GDPR, ICANN policies, etc.).

Highlights

  • Structly speed – Per-TLD configurations are compiled by Structly, keeping parsing under a millisecond per record even on commodity hardware.
  • Typed surface – msgspec-based WhoisRecord structs, py.typed wheels, and a CLI entrypoint (structly-whois) for quick inspection.
  • Configurable – Inject your own Structly configs, register TLD overrides at runtime, or extend the base field definitions without forking.
  • Lean dependencies – Neither dateparser nor any other date library is required by default. Plug in a date_parser callable only when locale-aware coercion is truly needed.
  • Batched & streaming friendly – parse_many and parse_chunks let you process millions of payloads from queues, tarballs, or S3 archives without buffering everything in memory.

Installation

pip install structly-whois               # end users
pip install -e '.[dev]'                  # contributors (installs Ruff, pytest, etc.)

Python 3.9+ is supported. Wheels ship py.typed markers for static analyzers.

Quickstart

from structly_whois import WhoisParser

parser = WhoisParser()
payload = """
          Domain Name: example.com
          Registrar: Example Registrar LLC
          Creation Date: 2020-01-01T12:00:00Z
          Registry Expiry Date: 2030-01-01T12:00:00Z
          Name Server: NS1.EXAMPLE.COM
          Name Server: NS2.EXAMPLE.COM
          Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
          Registrant Name: Example DNS
          """

record = parser.parse_record(payload, domain="example.com")
print(record.domain)
print(record.statuses)
print(record.registered_at)
print(record.to_dict())

If you omit domain, structly_whois inspects the payload to infer the domain/TLD and automatically picks the right Structly configuration. Need just a mapping instead of a structured record? Use parse/parse_many directly:

parser = WhoisParser(preload_tlds=("com", "net"))
parsed = parser.parse(payload)  # returns {"domain_name": ..., "registrar": ..., ...}

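# payload_1 / payload_2 stand in for raw WHOIS texts you already have
batch = parser.parse_many(
    [payload_1, payload_2],
    domain=["example.com", "example.net"],  # one domain per payload
    # tld="com",  # optional hint when every payload shares one TLD; omit to auto-select per domain
)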
for result in batch:
    print(result["domain_name"])

CLI usage

structly-whois tests/samples/whois/google.com.txt \
  --domain google.com \
  --record --json \
  --date-parser tests.common.helpers:iso_to_datetime

The CLI mirrors the Python API: pass --record to emit a structured WhoisRecord, --lowercase to normalize strings, and --date-parser module:callable when you want custom date coercion.
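The --date-parser option takes a module:callable import path. A minimal sketch of how such a spec can be resolved with importlib, assuming the module and the attribute are separated by a single colon. load_callable is a hypothetical helper, not part of structly_whois, and the CLI's own resolution logic may differ.

```python
import importlib
from typing import Any, Callable


def load_callable(spec: str) -> Callable[..., Any]:
    """Resolve a "package.module:attribute" spec to a callable.

    Hypothetical helper for illustration; not part of structly_whois.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:callable', got {spec!r}")
    target: Any = importlib.import_module(module_name)
    # Walk dotted attributes, e.g. "datetime:datetime.fromisoformat".
    for name in attr_path.split("."):
        target = getattr(target, name)
    if not callable(target):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return target
```

For example, load_callable("dateutil.parser:parse") would return the same callable used by the dateutil benchmark backend, provided python-dateutil is installed.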

Advanced usage

Batched parsing

from structly_whois import WhoisParser

parser = WhoisParser()
payloads: list[str] = fetch_from_queue()  # placeholder: your own source of raw WHOIS text
records = parser.parse_many(payloads, to_records=True, lowercase=True)
for record in records:
    ingest(record)  # placeholder: bulk insert, emit to Kafka, etc.

Streaming note: to_records=True buffers input

parse_many(..., to_records=True) yields WhoisRecord instances. Building those structs requires both the parsed fields and the original raw payload, so the incoming iterable is materialized into a list. When processing very large streams, chunk the input so memory stays bounded:

from itertools import islice
from structly_whois import WhoisParser

def chunked(iterator, size: int):
    iterator = iter(iterator)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

parser = WhoisParser()

# iter_whois_payloads() and ingest() are placeholders for your own source and sink.
for payload_chunk in chunked(iter_whois_payloads(), 1024):
    records = parser.parse_many(payload_chunk, to_records=True)
    for record in records:
        ingest(record)

Optional date parser hook

structly_whois intentionally avoids bundling dateparser. If you need locale-specific conversions, pass a callable either when constructing the parser or per method:

from datetime import datetime

def date_hook(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

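# Either set the hook once for every call made through this parser...
parser = WhoisParser(date_parser=date_hook)
# ...or pass it to an individual call.
record = parser.parse_record(raw_whois, domain="example.dev", date_parser=date_hook)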

For multilingual registries, the simplest plug-in is dateparser.parse.

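NOTE: It can cut throughput by roughly an order of magnitude. In the benchmarks below, structly-whois drops from 7,788 to 804 records/s with dateparser.parse as the hook.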
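If dateparser is too slow but ISO-only parsing is too strict, a middle ground is a stdlib-only hook that tries a short list of explicit layouts. This is a minimal sketch: multi_format_date_hook and FALLBACK_FORMATS are hypothetical names, the format list is an assumption about what your registries emit, and treating naive timestamps as UTC is a design choice rather than library behavior.

```python
from datetime import datetime, timezone

# Hypothetical fallback layouts; extend with whatever your registries emit.
# %b matches English month abbreviations under the default C locale.
FALLBACK_FORMATS = (
    "%d-%b-%Y",           # 01-Jan-2020
    "%d-%b-%Y %H:%M:%S",  # 01-Jan-2020 12:00:00
    "%Y.%m.%d",           # 2020.01.01
    "%d.%m.%Y",           # 31.12.2030
    "%Y/%m/%d",           # 2020/01/01
)


def multi_format_date_hook(value: str) -> datetime:
    """Parse common WHOIS date layouts using only the standard library.

    Naive results are assumed to be UTC. Raises ValueError when nothing
    matches, like datetime.fromisoformat does.
    """
    text = value.strip()
    parsed = None
    try:
        # ISO 8601 first; Python < 3.11 rejects a trailing "Z".
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        for fmt in FALLBACK_FORMATS:
            try:
                parsed = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue
    if parsed is None:
        raise ValueError(f"unrecognized date: {value!r}")
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Pass it like any other hook, e.g. WhoisParser(date_parser=multi_format_date_hook). Fixed formats do no language detection, so localized month names are still a job for dateparser.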

Streaming from S3

import tarfile

import boto3
from structly_whois import WhoisParser

def iter_whois_payloads(bucket: str, key: str):
    """Stream WHOIS samples from an S3-hosted tar.gz without touching disk."""
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    # "r|gz" decompresses and walks the archive strictly forward, which suits
    # the non-seekable S3 response body.
    with tarfile.open(fileobj=obj["Body"], mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            raw = tar.extractfile(member).read().decode("utf-8", errors="ignore")
            yield raw

parser = WhoisParser()
payloads = iter_whois_payloads("whois-dumps", "2024-12.tar.gz")

for chunk in parser.parse_chunks(payloads, chunk_size=512):
    process(chunk)  # bulk insert, publish, etc.
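Archive reading is easier to test without S3 when the reader accepts any binary file-like object. A sketch using tarfile's streaming mode ("r|gz"), which reads forward-only and never seeks backwards, so it also works on non-seekable sources such as an S3 response body. iter_tar_gz_payloads and build_sample_archive are hypothetical helpers, and UTF-8 payloads are assumed.

```python
import io
import tarfile
from typing import IO, Iterator


def iter_tar_gz_payloads(fileobj: IO[bytes]) -> Iterator[str]:
    """Yield decoded text members from a gzip-compressed tar read as a stream."""
    # "r|gz" only needs .read(), so an S3 body, socket, or pipe all work.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            yield handle.read().decode("utf-8", errors="ignore")


def build_sample_archive(samples: dict[str, str]) -> bytes:
    """Create an in-memory tar.gz (for tests) mapping file names to WHOIS text."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in samples.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Members come back in archive order, so the resulting iterator can be handed straight to parser.parse_chunks.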

Kafka batch ingestion

Need to process live WHOIS feeds? benchmarks/scripts/consume_and_parse.py shows how to wire WhoisParser into a Kafka consumer, group messages by TLD, and issue parse_many calls per bucket. Grouping domains ensures each batch uses the right Structly override and minimizes parser cache churn, so .com.br payloads never run through .com rules while still keeping throughput high.
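The grouping step can be sketched without Kafka: bucket incoming (domain, raw WHOIS) pairs by suffix, then issue one parse_many call per bucket with the per-payload domain= list from the Quickstart. This is a minimal sketch, not the contents of consume_and_parse.py. MULTI_LABEL_SUFFIXES, suffix_of, group_by_tld and parse_grouped are hypothetical names, and the hard-coded suffix set stands in for a proper Public Suffix List lookup.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

# Hypothetical suffix set: a real consumer should consult the Public Suffix List
# rather than hard-coding multi-label suffixes.
MULTI_LABEL_SUFFIXES = {"com.br", "co.uk", "com.au"}


def suffix_of(domain: str) -> str:
    """Return the bucket key for a domain, e.g. "com.br" or "com"."""
    labels = domain.lower().rstrip(".").split(".")
    last_two = ".".join(labels[-2:])
    return last_two if last_two in MULTI_LABEL_SUFFIXES else labels[-1]


def group_by_tld(messages: Iterable[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Bucket (domain, raw_whois) pairs so every batch shares a single override."""
    buckets: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for domain, raw in messages:
        buckets[suffix_of(domain)].append((domain, raw))
    return dict(buckets)


def parse_grouped(
    parse_many: Callable[..., Iterable[Any]],
    messages: Iterable[tuple[str, str]],
) -> dict[str, list[Any]]:
    """Call parse_many once per suffix bucket, passing per-payload domain hints."""
    results: dict[str, list[Any]] = {}
    for suffix, items in group_by_tld(messages).items():
        domains = [domain for domain, _ in items]
        payloads = [raw for _, raw in items]
        results[suffix] = list(parse_many(payloads, domain=domains))
    return results
```

With the real parser this would be called as parse_grouped(WhoisParser().parse_many, messages), where messages yields (domain, raw_whois) pairs decoded from Kafka records.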

Performance tip: pass domain= or tld= when you know it

Inference keeps things convenient, but the fastest path is to tell the parser what you already know:

from structly_whois import WhoisParser

parser = WhoisParser()

# Fastest path: you know the exact domain
record = parser.parse_record(raw_text, domain="example.com")

# Fast bulk parsing: you know every payload shares the same TLD
parsed = parser.parse_many(payloads, tld="com")
records = parser.parse_many(payloads, tld="com", to_records=True)

If you omit both domain and tld, WhoisParser inspects the payload and picks the right override automatically. That path is still efficient, but providing hints avoids the inference work entirely.
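To illustrate the kind of work inference involves, here is a rough sketch of pulling a domain out of a raw payload. It is illustrative only: guess_domain and its regex are assumptions, not structly_whois's actual inference logic, which may cover many more registry layouts. A practical use is checking that a payload matches the domain you queried before passing that domain as a hint.

```python
import re
from typing import Optional

# Assumed layout: a "Domain Name:" or "Domain:" line, as in gTLD and many ccTLD replies.
_DOMAIN_LINE = re.compile(
    r"^\s*(?:domain name|domain)\s*:\s*(?P<domain>[a-z0-9.-]+\.[a-z0-9-]+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def guess_domain(raw: str) -> Optional[str]:
    """Return the first domain-looking value in the payload, lowercased, or None."""
    match = _DOMAIN_LINE.search(raw)
    return match.group("domain").lower() if match else None
```

For example, confirming guess_domain(raw_text) == "example.com" before calling parser.parse_record(raw_text, domain="example.com") catches payloads that were paired with the wrong query.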

Custom Structly Config overrides

structly_whois is built for extensibility: you can extend the bundled Structly configs or replace them entirely, so parser behavior stays configurable without forking.

from structly import FieldPattern
from structly_whois import StructlyConfigFactory, WhoisParser

factory = StructlyConfigFactory(
    base_field_definitions={
        "domain_name": {"patterns": [FieldPattern.regex(r"^dn:\s*(?P<val>[a-z0-9.-]+)$")]},
    },
    tld_overrides={},
)
parser = WhoisParser(preload_tlds=("dev",), config_factory=factory)
parser.register_tld(
    "app",
    {
        "domain_name": {
            "extend_patterns": [FieldPattern.starts_with("App Domain:")],
        }
    },
)

API overview

Component                             Description
structly_whois.WhoisParser            High-level parser with batching, record conversion, and optional CLI integration.
structly_whois.StructlyConfigFactory  Factory that builds Structly configs with base fields + TLD overrides.
structly_whois.records.WhoisRecord    Typed msgspec struct with to_dict() for JSON serialization.
structly_whois.normalize_raw_text     Fast trimming of noise, privacy banners, and multiline headers.
structly_whois.cli                    Argparse-powered CLI that mirrors the Python API.

Benchmarks

make bench runs benchmarks/run_benchmarks.py, comparing structly_whois against whois-parser and python-whois. Default settings parse all fixtures ×100 iterations on a MacBook Pro (M4, Python 3.14):

backend                     records  records/s  avg latency (ms)
structly-whois              18,400   7,788      0.128
structly-whois+dateutil     18,400   7,130      0.140
structly-whois+dateparser   18,400   804        1.244
whois-parser                18,400   19         52.724
python-whois                18,400   368        2.718

“dateutil” uses date_parser=dateutil.parser.parse; “dateparser” uses date_parser=dateparser.parse. Both illustrate how heavier date coercion affects throughput.
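The latency column is the reciprocal of throughput (1000 / records per second), which makes the table easy to sanity-check and to turn into relative slowdowns. A small sketch using only the numbers in the table; THROUGHPUT, latency_ms and slowdown are illustrative names.

```python
# Throughput figures (records/s) copied from the benchmark table above.
THROUGHPUT = {
    "structly-whois": 7788,
    "structly-whois+dateutil": 7130,
    "structly-whois+dateparser": 804,
    "whois-parser": 19,
    "python-whois": 368,
}


def latency_ms(records_per_s: float) -> float:
    """Average per-record latency in milliseconds: the reciprocal of throughput."""
    return 1000.0 / records_per_s


def slowdown(backend: str, baseline: str = "structly-whois") -> float:
    """How many times slower `backend` is than `baseline`, by throughput."""
    return THROUGHPUT[baseline] / THROUGHPUT[backend]


for name, rps in THROUGHPUT.items():
    print(f"{name:<27}{latency_ms(rps):8.3f} ms {slowdown(name):8.1f}x slower")
```

By this arithmetic, the dateparser hook makes parsing roughly 9.7× slower, python-whois is about 21× slower, and whois-parser about 410× slower than the default structly-whois path; the computed latencies match the table to within rounding of the records/s column.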

Example invocations:

# run every backend on all fixtures (default BENCHMARK_BACKENDS env)
python benchmarks/run_benchmarks.py

# run a custom backend list while keeping all fixtures
BENCHMARK_BACKENDS="structly-whois,structly-whois+dateutil" \
  python benchmarks/run_benchmarks.py --iterations 100 --domains all

# focus on a couple of tricky registries with fewer iterations
python benchmarks/run_benchmarks.py --iterations 10 --domains google.com google.com.br

Add --save-result to persist the summary to benchmarks/results.md (or a custom --output path); otherwise runs print results to stdout only.

Development

make lint     # Ruff (E/F/W/I/UP/B/SIM)
make fmt      # Ruff formatter across src/tests/benchmarks
make test     # pytest + coverage (Hypothesis fixtures)
make cov      # coverage xml/report (≥90%)
make bench    # compare structly_whois vs whois-parser/python-whois

See CONTRIBUTING.md for versioning, release, and pull-request guidelines. CI (GitHub Actions) runs lint/test/build on every push; pushes to dev publish wheels to TestPyPI and tags vX.Y.Z publish to PyPI.

License

MIT © Nikola Stankovic.

Download files

Source Distribution

structly_whois-1.0.1.tar.gz (26.3 kB)

Uploaded Source

Built Distribution

structly_whois-1.0.1-py3-none-any.whl (28.1 kB)

Uploaded Python 3

File details

Details for the file structly_whois-1.0.1.tar.gz.

File metadata

  • Download URL: structly_whois-1.0.1.tar.gz
  • Size: 26.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.11.5

File hashes

Hashes for structly_whois-1.0.1.tar.gz
Algorithm Hash digest
SHA256 af29266a18163e72dddef90057b6aecf48b7e274d65e2fde9d11e7c55dacdd81
MD5 86a8a65c09391686e2b8c46f3dd093fa
BLAKE2b-256 42ab0a07818e5f66997202c556391d44c7c95c4a39c649b7bc5b8f45bf3eae93

File details

Details for the file structly_whois-1.0.1-py3-none-any.whl.

File hashes

Hashes for structly_whois-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9631605723ec865706555e6a17cf65297d46b3c0d5a031a91dfcf64ab7a8b962
MD5 e186ad3d573750b8085899259f0f4267
BLAKE2b-256 0eb0e51e2c6096fc567de6727e65e76494f505e7b94db5e3b4c096fab351fa49