RFC-aware email parsing, normalization, extraction, and DNS health checks with env-config and a phonenumbers-like API.

📧 emailtoolkit

License: MIT | Python 3.12+ | Type hints: PEP 561 | Status: Beta

RFC-aware email parsing, normalization, extraction, and DNS health checks with a clean, phonenumbers-style API. Env-configurable. Privacy-safe logging. Optional CLI.


✨ Why emailtoolkit

  • Practical and strict where it matters: syntax validated with email_validator, IDN via idna, DNS health via dnspython (optional).
  • Real-world canonicalization: provider-aware normalization and canonical comparison, including Gmail dot and plus rules, googlemail aliasing, and optional plus-stripping for common providers.
  • Production ergonomics: .env and config.json support, structured data classes, robust logging with PII redaction, and TTL-cached DNS.
  • Simple API: import functions or use the EmailTools class. Also ships a CLI for quick checks and pipelines.

🧩 Features

  • Parse, validate, normalize, canonicalize, compare
  • Extract addresses from free text with Unicode-aware regex
  • DNS health checks with MX and A/AAAA lookups, TTL caching
  • IDN handling with punycode conversion
  • Disposable domain filtering from file or URL source
  • Config precedence: environment > .env > config.json > defaults
  • Privacy by default: email redaction in logs and exceptions
  • CLI entry point for scripting and ops
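
As a rough illustration of the extraction feature, the sketch below pulls candidate addresses out of free text, deduplicates them, and applies an optional cap. The regex, the `extract_candidates` name, and the case-insensitive dedup key are assumptions made for this example; emailtoolkit's own extractor validates each hit and deduplicates by canonical form, which this sketch does not attempt.

```python
import re

# Hypothetical, simplified pattern. In Python 3 str patterns, \w is Unicode-aware,
# so internationalized domains such as 例え.テスト are matched as well.
_CANDIDATE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_candidates(text, unique=True, max_results=None):
    """Return candidate addresses in order of appearance."""
    found, seen = [], set()
    for match in _CANDIDATE.finditer(text):
        addr = match.group(0)
        key = addr.casefold()  # stand-in for a real canonical form
        if unique and key in seen:
            continue
        seen.add(key)
        found.append(addr)
        # None or 0 means no cap, mirroring EMAILTK_EXTRACT_MAX_RESULTS.
        if max_results and len(found) >= max_results:
            break
    return found
```

Malformed input such as `junk@@bad` simply never matches the pattern, so it is dropped without raising.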

🚀 Install

From source (editable)

# from repo root
python -m venv .venv
. .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -U pip

# optional extras:
#   dns -> dnspython
#   dotenv -> python-dotenv
pip install -e ".[dns,dotenv]"

From PyPI

pip install emailtoolkit
# with optional extras
pip install "emailtoolkit[dns,dotenv]"

⚙️ Configuration

emailtoolkit resolves each setting from the first source that defines it, in this order:

  1. Environment variables
  2. .env file in the working directory (if python-dotenv is installed)
  3. config.json, if you pass --config on the CLI or call build_tools("/path/to/config.json")
  4. Internal defaults
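
The lookup order above can be pictured as a chain of dictionaries searched from highest to lowest precedence. The following is a minimal sketch under stated assumptions: `parse_dotenv`, `resolve`, and the `EMAILTK_` + upper-cased key mapping are hypothetical names and simplifications for illustration, not emailtoolkit's loader (some real settings, such as `EMAILTK_GMAIL_CANON`, do not follow that naming pattern). Values are returned as stored, so environment layers yield strings while the JSON layer yields typed values.

```python
# Small illustrative subset of defaults, not the library's full list.
DEFAULTS = {"log_level": "INFO", "dns_timeout_seconds": 2.0}


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def resolve(key, environ, dotenv, file_cfg, defaults=DEFAULTS):
    """Return the value from the first layer that defines the setting."""
    env_name = "EMAILTK_" + key.upper()
    layers = (
        (environ, env_name),   # 1. environment variables
        (dotenv, env_name),    # 2. .env file
        (file_cfg, key),       # 3. config.json
        (defaults, key),       # 4. internal defaults
    )
    for layer, name in layers:
        if name in layer:
            return layer[name]
    raise KeyError(key)
```

In practice `environ` would be `os.environ` and `file_cfg` the result of `json.load` on the config file.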

Environment variables

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| EMAILTK_LOG_LEVEL | str | INFO | Logging level: DEBUG, INFO, WARNING, or ERROR |
| EMAILTK_REQUIRE_MX | bool | true | If true, deliverability requires MX. If false, MX or A/AAAA is enough |
| EMAILTK_REQUIRE_DELIVERABILITY | bool | false | If true, parse raises if deliverability fails |
| EMAILTK_ALLOW_SMTPUTF8 | bool | true | Allow UTF-8 local parts per RFC 6531 |
| EMAILTK_DNS_TIMEOUT_SECONDS | float | 2.0 | DNS timeout in seconds |
| EMAILTK_DNS_TTL_SECONDS | int | 900 | TTL for cached DNS answers, in seconds |
| EMAILTK_USE_DNSPYTHON | bool | true | Use dnspython when available |
| EMAILTK_EXTRACT_UNIQUE | bool | true | Deduplicate by canonical form during extraction |
| EMAILTK_EXTRACT_MAX_RESULTS | int or empty | empty | Hard cap on extractor results. Empty or 0 means no cap |
| EMAILTK_NORMALIZE_CASE | bool | true | Lowercase domain on normalize |
| EMAILTK_GMAIL_CANON | bool | true | Apply Gmail dot and plus canonicalization rules |
| EMAILTK_TREAT_DISPOSABLE_AS_INVALID | bool | false | If true, disposable domains cause parse to raise |
| EMAILTK_BLOCK_PRIVATE_TLDS | bool | false | Enforce known public suffixes if provided |
| EMAILTK_PUBLIC_SUFFIX_FILE | path | empty | File with known public suffixes, one per line |
| EMAILTK_DISPOSABLE_SOURCE | file://... or url://... or none | none | Source for disposable domains |
| EMAILTK_ENABLE_SMTP_PROBE | bool | false | Reserved for optional SMTP probing module |
| EMAILTK_SMTP_PROBE_TIMEOUT | float | 3.0 | Probe timeout |
| EMAILTK_SMTP_PROBE_CONCURRENCY | int | 5 | Probe concurrency |
| EMAILTK_SMTP_PROBE_HELO | str | example.com | HELO/EHLO identity |
| EMAILTK_PII_REDACT_LOGS | bool | true | Mask emails in logs and exceptions |
| EMAILTK_PII_REDACT_STYLE | mask or none | mask | Redaction style |
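
Environment variables always arrive as strings, so typed settings like the ones listed above need a conversion step. The sketch below shows one plausible way to do that; the helper names and the accepted boolean spellings are assumptions for illustration, and emailtoolkit may accept a different set.

```python
def parse_bool(raw, default):
    """Convert an environment string to bool, falling back when unset or empty."""
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:    # assumed truthy spellings
        return True
    if value in {"0", "false", "no", "off"}:   # assumed falsy spellings
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def parse_optional_cap(raw):
    """Convert a cap such as EMAILTK_EXTRACT_MAX_RESULTS; empty or 0 means no cap."""
    if raw is None or raw.strip() == "":
        return None
    value = int(raw)
    if value < 0:
        raise ValueError(f"cap must not be negative: {raw!r}")
    return value or None  # 0 collapses to None, i.e. unlimited
```

Raising on unrecognized values, rather than silently treating them as false, is a design choice that surfaces typos in deployment configuration early.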

Example .env

EMAILTK_LOG_LEVEL=INFO
EMAILTK_REQUIRE_MX=true
EMAILTK_REQUIRE_DELIVERABILITY=false
EMAILTK_DNS_TIMEOUT_SECONDS=1.5
EMAILTK_DNS_TTL_SECONDS=60
EMAILTK_USE_DNSPYTHON=true
EMAILTK_PII_REDACT_LOGS=true
EMAILTK_PII_REDACT_STYLE=mask
EMAILTK_DISPOSABLE_SOURCE=file://./disposable.txt

Example config.json

{
  "log_level": "INFO",
  "extract_unique": true,
  "extract_max_results": null,
  "require_mx": true,
  "require_deliverability": false,
  "allow_smtputf8": true,
  "dns_timeout_seconds": 2.0,
  "dns_ttl_seconds": 900,
  "use_dnspython": true,
  "normalize_case": true,
  "gmail_style_canonicalization": true,
  "treat_disposable_as_invalid": false,
  "block_private_tlds": false,
  "known_public_suffixes": null,
  "disposable_source": "none"
}

🧪 Quick start

import emailtoolkit as et

et.is_valid("Test.User+sales@Gmail.com")            # True
et.normalize("Test.User+sales@Gmail.com")           # "Test.User+sales@gmail.com"
et.canonical("t.e.s.t+sales@googlemail.com")        # "test@gmail.com"
et.compare("t.e.s.t+sales@googlemail.com", "test@gmail.com")  # True

e = et.parse("Alice@example.com")
print(e.normalized)          # "Alice@example.com"
print(e.domain_info.has_mx)  # may be True/False depending on resolver

found = et.extract("Contact a@example.com, A@EXAMPLE.com, junk@@bad")
print([x.normalized for x in found])  # ["a@example.com"]

Prefer a configured instance:

from emailtoolkit import EmailTools
from emailtoolkit.utils.config import load_config

tools = EmailTools(load_config("./config.json"))
tools.is_valid("user@例え.テスト")
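
For readers unfamiliar with IDN handling, an internationalized domain such as the one above is converted to an ASCII (punycode) form before DNS lookups. The sketch below demonstrates the idea with Python's built-in `idna` codec. Note the assumption: the standard-library codec implements the older IDNA 2003 rules, whereas the `idna` package that emailtoolkit depends on implements IDNA 2008, so results can differ for some inputs. The function names are hypothetical.

```python
def to_ascii_domain(domain):
    """Encode a Unicode domain to its ASCII form using the stdlib codec."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain):
    """Decode an ASCII (punycode) domain back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


ascii_form = to_ascii_domain("例え.テスト")
print(ascii_form)                     # each label is prefixed with "xn--"
print(to_unicode_domain(ascii_form))  # round-trips to the original domain
```

Plain ASCII domains pass through unchanged, so the conversion is safe to apply to every domain.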

🛠️ CLI

# from anywhere once installed
emailtoolkit parse "Test.User+foo@Gmail.com"
emailtoolkit validate "user@example.com"
emailtoolkit normalize "Test.User+foo@Gmail.com"
emailtoolkit canonical "t.e.s.t+bar@googlemail.com"
emailtoolkit domain example.com
echo "a@example.com, t.e.s.t+z@gmail.com" | emailtoolkit extract --limit 5

# use a config
emailtoolkit --config ./emailtoolkit/configs/config.example.json parse "user@domain.com"

📚 API reference

Data models

from emailtoolkit import Email, DomainInfo, EmailParseException

  • Email

    • original str
    • local str
    • domain str
    • ascii_email str
    • normalized str
    • canonical str
    • domain_info DomainInfo
    • valid_syntax bool
    • deliverable_dns bool
    • reason Optional[str]
  • DomainInfo

    • domain str
    • ascii_domain str
    • mx_hosts tuple[str, ...]
    • a_hosts tuple[str, ...]
    • has_mx bool
    • has_a bool
    • disposable bool
  • EmailParseException(ValueError): includes domain_info for context.
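
To make the shape of these models concrete, here is a minimal sketch of how a `DomainInfo`-like record could be expressed as a frozen dataclass, together with the deliverability rule described in the behavior notes. The class name `DomainInfoSketch`, the `deliverable` helper, and the choice to derive `has_mx` and `has_a` as properties are assumptions for illustration; the library's actual classes may store these as plain fields.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DomainInfoSketch:
    """Illustrative stand-in for DomainInfo, not the library's class."""
    domain: str
    ascii_domain: str
    mx_hosts: Tuple[str, ...] = ()
    a_hosts: Tuple[str, ...] = ()
    disposable: bool = False

    @property
    def has_mx(self):
        return bool(self.mx_hosts)

    @property
    def has_a(self):
        return bool(self.a_hosts)


def deliverable(info, require_mx=True):
    """MX is required when require_mx is true; otherwise MX or A/AAAA suffices."""
    if require_mx:
        return info.has_mx
    return info.has_mx or info.has_a
```

Freezing the dataclass keeps lookup results immutable, which makes them safe to share from a cache.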

Module functions

import emailtoolkit as et

et.parse(raw: str) -> Email
et.is_valid(raw: str) -> bool
et.normalize(raw: str) -> str
et.canonical(raw: str) -> str
et.extract(text: str) -> list[Email]
et.compare(a: str, b: str) -> bool
et.domain_health(domain: str) -> DomainInfo
et.build_tools(overrides_path: str | None = None) -> EmailTools
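
Since DNS answers are described as TTL-cached, repeated `domain_health` calls for the same domain should not hit the resolver every time. The sketch below shows the general technique with an injectable clock; `TTLCache` and `get_or_load` are hypothetical names, and this is not emailtoolkit's internal cache.

```python
import time


class TTLCache:
    """Tiny time-based cache; entries expire ttl_seconds after being loaded."""

    def __init__(self, ttl_seconds=900, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh, skip the lookup
        value = loader(key)      # e.g. a DNS query for MX records
        self._entries[key] = (now + self._ttl, value)
        return value
```

Using a monotonic clock rather than wall-clock time avoids spurious expiry or staleness when the system clock is adjusted.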

Class

from emailtoolkit import EmailTools
from emailtoolkit.utils.config import Config, load_config

tools = EmailTools(cfg=Config())              # defaults; or EmailTools(load_config("config.json")) to load a file
tools.parse(...)

Behavior notes:

  • parse uses email_validator for syntax and normalization only. Deliverability is decided by our DNS layer:

    • If require_mx is true, deliverable means MX exists.
    • If require_mx is false, deliverable means MX or A/AAAA exists.
  • Gmail canonicalization:

    • googlemail.com is treated as gmail.com for identity comparison.
    • Dots are stripped from the local part and plus-tags are removed for Gmail addresses when EMAILTK_GMAIL_CANON is enabled (the default).
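
The Gmail rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `canonical_sketch` is a hypothetical name, the input is assumed to be already syntax-validated, the googlemail aliasing is placed behind the same flag as the dot and plus rules, and lowercasing the Gmail local part is an assumption of this sketch.

```python
def canonical_sketch(address, gmail_canon=True):
    """Return a comparison key for an already-validated address."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if gmail_canon:
        if domain == "googlemail.com":
            domain = "gmail.com"               # alias of the same mailbox
        if domain == "gmail.com":
            local = local.split("+", 1)[0]     # drop the plus-tag
            local = local.replace(".", "")     # dots are ignored by Gmail
            local = local.lower()              # assumption: case-insensitive
    return f"{local}@{domain}"


print(canonical_sketch("t.e.s.t+sales@googlemail.com"))  # test@gmail.com
```

Two addresses can then be compared by checking whether their canonical keys are equal, which is the idea behind `compare`.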

🔒 Security and privacy

  • PII redaction in logs is enabled by default. Control with:

    • EMAILTK_PII_REDACT_LOGS=true|false
    • EMAILTK_PII_REDACT_STYLE=mask|none
  • Do not log raw emails in your app. The logger masks the local part by default.

  • If you enable SMTP probing in the future, keep it opt-in, rate limited, and legally vetted.
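
If your own application logs messages that may contain addresses, a similar masking step can be added with a standard `logging.Filter`. The sketch below is an assumption-laden example: the regex, the `a***@domain` mask format, and the `RedactEmails` name are chosen for illustration and do not necessarily match emailtoolkit's own mask style.

```python
import logging
import re

# Simplified address pattern, for illustration only.
_EMAIL = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")


def mask_emails(text):
    """Keep the first character of the local part and the full domain."""
    return _EMAIL.sub(lambda m: f"{m.group(1)[0]}***@{m.group(2)}", text)


class RedactEmails(logging.Filter):
    """Mask addresses in the fully formatted log message."""

    def filter(self, record):
        # Format msg % args first so addresses passed as arguments are caught too.
        record.msg = mask_emails(record.getMessage())
        record.args = ()
        return True  # never drop the record, only rewrite it
```

It can be attached with `logging.getLogger("myapp").addFilter(RedactEmails())`, where `"myapp"` is a placeholder logger name.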


🧰 Development

# lint and type checks (examples; use your preferred tools)
pip install -e ".[dns,dotenv]" pytest ruff mypy
ruff check src
mypy src/emailtoolkit
pytest -q

An optional sanity-test script is included in the repo root:

python test_emailtoolkit.py

🧭 Roadmap

  • Async resolver and extractor for high concurrency
  • Provider rules registry loaded from data files
  • Optional SMTP RCPT probe with strict rate limits
  • Public suffix enforcement with a bundled list
  • Disposable domain updater command

🙏 Acknowledgments

This project stands on the shoulders of these excellent libraries:

  • email_validator by Joshua Tauberer — Unlicense (public domain)
  • dnspython — ISC license (optional dependency)
  • idna — BSD 3-Clause

Full texts in THIRD_PARTY_NOTICES.md. Thank you to the maintainers and contributors of these projects.


📦 License

MIT. See LICENSE.

Third-party licenses are included in THIRD_PARTY_NOTICES.md.


💡 Contributing

  • Open an issue with clear reproduction steps or a focused proposal.
  • Small PRs preferred. Include tests and update docs where relevant.
  • Keep performance and privacy top of mind.

⭐ Support

If this toolkit helps you, star the repo and share it. Issues and PRs welcome.

Project details


Download files

Source Distribution

emailtoolkit-0.1.1.tar.gz (12.9 kB view details)

Uploaded Source

Built Distribution

emailtoolkit-0.1.1-py3-none-any.whl (17.2 kB view details)

Uploaded Python 3

File details

Details for the file emailtoolkit-0.1.1.tar.gz.

File metadata

  • Download URL: emailtoolkit-0.1.1.tar.gz
  • Upload date:
  • Size: 12.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for emailtoolkit-0.1.1.tar.gz
Algorithm Hash digest
SHA256 13b55ceb9c7e9118a695415040a5b1d162ea16a91e9858756f36f534a1b02d80
MD5 b03387c5678aa1fc1f3b165edfa88796
BLAKE2b-256 ddaaf91bb1b1ab9ace8059512a6cd63d84d925305b705707a827e59556f36621

Provenance

The following attestation bundles were made for emailtoolkit-0.1.1.tar.gz:

Publisher: release-pypi.yml on ImYourBoyRoy/emailtoolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file emailtoolkit-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: emailtoolkit-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 17.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for emailtoolkit-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 185d9c23aa348a274f0ded5dcc45deacbfae78935bf872f819cbf680e19cc0a4
MD5 3b375ffed5bec1f97e0268da0d3dcbd1
BLAKE2b-256 dfb0a4030aef9de955c561d7742c1e686f399193c25814c99ee1e639c16d851f

Provenance

The following attestation bundles were made for emailtoolkit-0.1.1-py3-none-any.whl:

Publisher: release-pypi.yml on ImYourBoyRoy/emailtoolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
