Skip to main content

Fast WHOIS parsing core with a Pythonic surface area.

Project description

structly_whois

Structly-powered WHOIS parsing.

Main CI Coverage PyPI

Fast WHOIS parser powered by structly and msgspec.

structly_whois wraps Structly's compiled parsers with a modern Python API so you can normalize noisy WHOIS payloads, auto-detect TLD-specific overrides, and emit JSON-ready records without hauling heavy regex DSLs or dateparser into your hot path.

Highlights

  • Structly speed – Per-TLD configurations are compiled by Structly, keeping parsing under a millisecond/record even on commodity hardware.
  • Typed surface – msgspec-based WhoisRecord structs, py.typed wheels, and a CLI entrypoint (structly-whois) for quick inspection.
  • Configurable – Inject your own Structly configs, register TLD overrides at runtime, or extend the base field definitions without forking.
  • Lean dependencies – No dateparser or required by default. Plug in a date_parser callable only when locale-aware coercion is truly needed.
  • Batched & streaming friendlyparse_many and parse_chunks let you process millions of payloads from queues, tarballs, or S3 archives without buffering everything in memory.

Installation

pip install structly-whois               # end users
pip install -e '.[dev]'                  # contributors (installs Ruff, pytest, etc.)

Python 3.9+ is supported. Wheels ship py.typed markers for static analyzers.

Quickstart

from structly_whois import WhoisParser

parser = WhoisParser()
payload = """\
Domain Name: example.com
Registrar: Example Registrar LLC
Creation Date: 2020-01-01T12:00:00Z
Registry Expiry Date: 2030-01-01T12:00:00Z
Name Server: NS1.EXAMPLE.COM
Name Server: NS2.EXAMPLE.COM
Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Registrant Name: Example DNS
"""

record = parser.parse_record(payload, domain="example.com")
print(record.domain)
print(record.statuses)
print(record.registered_at)
print(record.to_dict())

If you omit domain, structly_whois inspects the payload to infer the domain/TLD and automatically picks the right Structly configuration.

CLI usage

structly-whois tests/samples/whois/google.com.txt \
  --domain google.com \
  --record --json \
  --date-parser tests.common.helpers:iso_to_datetime

The CLI mirrors the Python API: pass --record to emit a structured WhoisRecord, --lowercase to normalize strings, and --date-parser module:callable when you want custom date coercion.

Advanced usage

Batched parsing

parser = WhoisParser()
payloads: list[str] = fetch_from_queue()
records = parser.parse_many(payloads, to_records=True, lowercase=True)
for record in records:
    ingest(record)  # bulk insert, emit to Kafka, etc.

Optional date parser hook

structly_whois intentionally avoids bundling dateparser. If you need locale-specific conversions, pass a callable either when constructing the parser or per method:

from datetime import datetime

def date_hook(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

parser = WhoisParser(date_parser=date_hook)
record = parser.parse_record(raw_whois, domain="example.dev", date_parser=date_hook)

For multilingual registries, the simplest plug-in is dateparser.parse.

NOTE: It can cut throughput by more than half.

Streaming from S3

import boto3
import gzip
import tarfile
from structly_whois import WhoisParser

def iter_whois_payloads(bucket: str, key: str):
    """Stream WHOIS samples from an S3-hosted tar.gz without touching disk."""
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    with gzip.GzipFile(fileobj=obj["Body"]) as gz:
        with tarfile.open(fileobj=gz, mode="r:") as tar:
            for member in tar:
                if not member.isfile():
                    continue
                raw = tar.extractfile(member).read().decode("utf-8", errors="ignore")
                yield raw

parser = WhoisParser()
payloads = iter_whois_payloads("whois-dumps", "2024-12.tar.gz")

for chunk in parser.parse_chunks(payloads, chunk_size=512):
    process(chunk)  # bulk insert, publish, etc.

Custom Structly Config overrides

structly_whois is built for easy extensibility—you can extend the bundled Structly configs or replace them entirely, so parser behavior stays configurable without forking.

from structly import FieldPattern
from structly_whois import StructlyConfigFactory, WhoisParser

factory = StructlyConfigFactory(
    base_field_definitions={
        "domain_name": {"patterns": [FieldPattern.regex(r"^dn:\s*(?P<val>[a-z0-9.-]+)$")]},
    },
    tld_overrides={},
)
parser = WhoisParser(preload_tlds=("dev",), config_factory=factory)
parser.register_tld(
    "app",
    {
        "domain_name": {
            "extend_patterns": [FieldPattern.starts_with("App Domain:")],
        }
    },
)

API overview

Component Description
structly_whois.WhoisParser High-level parser with batching, record conversion, and optional CLI integration.
structly_whois.StructlyConfigFactory Factory that builds Structly configs with base fields + TLD overrides.
structly_whois.records.WhoisRecord Typed msgspec struct with to_dict() for JSON serialization.
structly_whois.normalize_raw_text Fast trimming of noise, privacy banners, and multiline headers.
structly_whois.cli Argparse-powered CLI that mirrors the Python API.

Benchmarks

make bench runs benchmarks/run_benchmarks.py, comparing structly_whois against whois-parser and python-whois. Default settings parse all 105 fixtures ×100 iterations on a MacBook Pro (M4, Python 3.14):

backend records records/s avg latency (ms)
structly-whois 10,500 7,779 0.129
structly-whois + dateutil 10,500 3,236 0.309
structly-whois + dateparser 10,500 996 1.004
python-whois 10,500 196 5.096
whois-parser 10,500 17 58.229

“dateutil” uses date_parser=dateutil.parser.parse; “dateparser” uses date_parser=dateparser.parse. Both illustrate how heavier date coercion affects throughput.

Development

make lint     # Ruff (E/F/W/I/UP/B/SIM)
make fmt      # Ruff formatter across src/tests/benchmarks
make test     # pytest + coverage (Hypothesis fixtures)
make cov      # coverage xml/report (≥90%)
make bench    # compare structly_whois vs whois-parser/python-whois

See CONTRIBUTING.md for versioning, release, and pull-request guidelines. CI (GitHub Actions) runs lint/test/build on every push; pushes to dev publish wheels to TestPyPI and tags vX.Y.Z publish to PyPI.

License

MIT © Nikola Stankovic.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

structly_whois-1.0.0.tar.gz (23.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

structly_whois-1.0.0-py3-none-any.whl (23.5 kB view details)

Uploaded Python 3

File details

Details for the file structly_whois-1.0.0.tar.gz.

File metadata

  • Download URL: structly_whois-1.0.0.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for structly_whois-1.0.0.tar.gz
Algorithm Hash digest
SHA256 000d6cee01d4ce130c2ef93cdc54bf778eb4a8a756d09ae9f3bfee08251fd8d8
MD5 eb436b7beb3561325d085bef01a6fc94
BLAKE2b-256 9b213e1b3dc97db711ee7eaa436c830bc7aa2d8d6e76e695530226002b30cbaf

See more details on using hashes here.

File details

Details for the file structly_whois-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for structly_whois-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d471358f141da800a08a8e1e87b4c268dd5842bee7297bfe106684a7fc0b723e
MD5 ba3d69ba4942bd0ba971e81ebf71e4d7
BLAKE2b-256 8c6d84fbe0947b288a0f0d2675159074cd1882d9623f923311721573b601553f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page