
RFC-aware email parsing, normalization, extraction, and DNS health checks with env-config and a phonenumbers-like API.


📧 emailtoolkit


RFC‑aware email parsing, normalization, extraction, and DNS health checks with a clean, phonenumbers‑style API.


✨ Design goals

  • Simple API: As easy to use as phonenumbers. Import module‑level functions for quick tasks, or instantiate EmailTools for tuned, high‑performance use.

  • Practical validation: Syntax validation (via email_validator) is kept separate from deliverability checks, so you can enforce your own DNS policy (require MX, or allow A/AAAA fallback).

  • Provider‑aware identity: Canonicalization rules correctly identify test.user@gmail.com and testuser+sales@googlemail.com as the same identity.

  • Operations‑ready: Native support for environment variables, .env, and config.json; PII‑safe logging; TTL‑cached DNS; robust CLI.
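
The provider‑aware identity goal can be illustrated with a small standalone sketch. This is not emailtoolkit's implementation: gmail_canonical is a hypothetical helper, and it assumes the commonly described Gmail behavior (dots in the local part are ignored, anything after + is dropped, and googlemail.com is an alias of gmail.com).

```python
# Assumption: both domains are treated as the same provider.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def gmail_canonical(address: str) -> str:
    """Hypothetical helper: collapse Gmail-style variants to one identity."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain not in GMAIL_DOMAINS:
        # Other providers: this sketch only lowercases the domain.
        return f"{local}@{domain}"
    local = local.split("+", 1)[0]  # drop the +tag
    local = local.replace(".", "")  # dots are not significant
    return f"{local.lower()}@gmail.com"


same = gmail_canonical("test.user@gmail.com") == gmail_canonical(
    "testuser+sales@googlemail.com"
)
print(same)  # True
```

For real use, prefer et.canonical and et.compare, which apply the library's own rules.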


🚀 Installation

pip install emailtoolkit
# extras for DNS and .env support
pip install "emailtoolkit[dns,dotenv]"

🧪 Quick start

import emailtoolkit as et

# Validate
et.is_valid("Test.User+sales@Gmail.com")  # True

# Canonical form (provider‑specific rules)
et.canonical("t.e.s.t+sales@googlemail.com")  # "test@gmail.com"

# Compare by canonical identity
et.compare("t.e.s.t+sales@googlemail.com", "test@gmail.com")  # True

# Extract from free text (returns Email objects)
found = et.extract("Contact a@example.com, A@EXAMPLE.com, and junk@@bad.")
print([e.normalized for e in found])  # ["a@example.com", "A@example.com"]

🛠️ Command‑line interface (CLI)

# Canonical form
emailtoolkit canonical "t.e.s.t+bar@googlemail.com"
# → test@gmail.com

# Domain DNS health (JSON)
emailtoolkit domain example.com
# {
#   "domain": "example.com",
#   "ascii_domain": "example.com",
#   "mx_hosts": [],
#   "a_hosts": ["93.184.216.34"],
#   "has_mx": false,
#   "has_a": true,
#   "disposable": false
# }

# Extract from stdin
echo "Contact me at a@example.com" | emailtoolkit extract
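
The domain report above has has_mx false and has_a true, so whether the domain counts as deliverable depends on the DNS policy (EMAILTK_REQUIRE_MX). A minimal sketch of that decision, assuming only the JSON fields shown above; is_deliverable is a hypothetical function, not part of the emailtoolkit API:

```python
import json


def is_deliverable(info: dict, require_mx: bool = True) -> bool:
    """Hypothetical policy check over a domain-health record."""
    if info["has_mx"]:
        return True
    # No MX record: accept an address record only if fallback is allowed.
    return (not require_mx) and info["has_a"]


report = json.loads(
    '{"domain": "example.com", "mx_hosts": [], '
    '"a_hosts": ["93.184.216.34"], "has_mx": false, "has_a": true}'
)
strict = is_deliverable(report, require_mx=True)
relaxed = is_deliverable(report, require_mx=False)
print(strict, relaxed)  # False True
```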

⚙️ Configuration

Load precedence (highest first):

  1. Environment variables (e.g., EMAILTK_LOG_LEVEL)
  2. .env in the working directory (requires dotenv extra)
  3. config.json (when passed to CLI --config or build_tools("/path/to/config.json"))
  4. Internal defaults
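
The precedence above can be pictured as a chain of lookups where the first source that defines a key wins. The following is a sketch only, not how emailtoolkit loads its settings: read_dotenv and layered_config are hypothetical helpers, and it assumes that config.json uses the same key names as the environment variables.

```python
import json
from collections import ChainMap

# Illustrative subset of defaults taken from the table of variables.
DEFAULTS = {"EMAILTK_LOG_LEVEL": "INFO", "EMAILTK_DNS_TIMEOUT_SECONDS": "2.0"}


def read_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs


def layered_config(environ, dotenv_text="", config_json="{}"):
    """Earlier maps win: environment, .env, config.json, then defaults."""
    env = {k: v for k, v in environ.items() if k.startswith("EMAILTK_")}
    return ChainMap(env, read_dotenv(dotenv_text), json.loads(config_json), DEFAULTS)


cfg = layered_config(
    environ={"EMAILTK_LOG_LEVEL": "DEBUG"},
    dotenv_text="EMAILTK_LOG_LEVEL=WARNING\nEMAILTK_DNS_TTL_SECONDS=60",
    config_json='{"EMAILTK_DNS_TTL_SECONDS": "300", "EMAILTK_REQUIRE_MX": "false"}',
)
print(cfg["EMAILTK_LOG_LEVEL"])            # DEBUG (environment beats .env)
print(cfg["EMAILTK_DNS_TTL_SECONDS"])      # 60 (.env beats config.json)
print(cfg["EMAILTK_REQUIRE_MX"])           # false (only in config.json)
print(cfg["EMAILTK_DNS_TIMEOUT_SECONDS"])  # 2.0 (default)
```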

Environment variables (full)

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| EMAILTK_LOG_LEVEL | str | INFO | Logging level: DEBUG, INFO, WARNING, or ERROR |
| EMAILTK_REQUIRE_MX | bool | true | If true, deliverability requires MX. If false, MX or A/AAAA is enough |
| EMAILTK_REQUIRE_DELIVERABILITY | bool | false | If true, parse raises if deliverability fails |
| EMAILTK_ALLOW_SMTPUTF8 | bool | true | Allow UTF‑8 local parts per RFC 6531 |
| EMAILTK_DNS_TIMEOUT_SECONDS | float | 2.0 | DNS timeout in seconds |
| EMAILTK_DNS_TTL_SECONDS | int | 900 | TTL in seconds for cached DNS answers |
| EMAILTK_USE_DNSPYTHON | bool | true | Use dnspython when available |
| EMAILTK_EXTRACT_UNIQUE | bool | true | Deduplicate by canonical form during extraction |
| EMAILTK_EXTRACT_MAX_RESULTS | int or empty | empty | Hard cap on extractor results. Empty or 0 means no cap |
| EMAILTK_NORMALIZE_CASE | bool | true | Lowercase domain on normalize |
| EMAILTK_GMAIL_CANON | bool | true | Apply Gmail dot and plus canonicalization rules |
| EMAILTK_TREAT_DISPOSABLE_AS_INVALID | bool | false | If true, disposable domains cause parse to raise |
| EMAILTK_BLOCK_PRIVATE_TLDS | bool | false | Enforce known public suffixes if provided |
| EMAILTK_PUBLIC_SUFFIX_FILE | path | empty | File with known public suffixes, one per line |
| EMAILTK_DISPOSABLE_SOURCE | file://... or url://... or none | none | Source for disposable domains |
| EMAILTK_ENABLE_SMTP_PROBE | bool | false | Reserved for optional SMTP probing module |
| EMAILTK_SMTP_PROBE_TIMEOUT | float | 3.0 | Probe timeout |
| EMAILTK_SMTP_PROBE_CONCURRENCY | int | 5 | Probe concurrency |
| EMAILTK_SMTP_PROBE_HELO | str | example.com | HELO/EHLO identity |
| EMAILTK_PII_REDACT_LOGS | bool | true | Mask emails in logs and exceptions |
| EMAILTK_PII_REDACT_STYLE | mask or none | mask | Redaction style |

See .env.example for a ready‑to‑copy template.
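
EMAILTK_DNS_TTL_SECONDS means a cached DNS answer is reused until it is older than the TTL. The idea can be sketched as below; TTLCache and fake_resolver are hypothetical names for illustration and do not reflect emailtoolkit's internal cache.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Hypothetical minimal TTL cache keyed by domain name."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def get_or_resolve(self, domain: str, resolve: Callable[[str], List[str]]) -> List[str]:
        now = self.clock()
        hit = self._store.get(domain)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the lookup
        answer = resolve(domain)  # missing or expired: resolve again
        self._store[domain] = (now, answer)
        return answer


calls = []


def fake_resolver(domain: str) -> List[str]:
    """Stand-in for a real DNS query; records each call."""
    calls.append(domain)
    return ["192.0.2.1"]


fake_now = [0.0]  # controllable clock so the example needs no sleeping
cache = TTLCache(ttl_seconds=900, clock=lambda: fake_now[0])
cache.get_or_resolve("example.com", fake_resolver)
cache.get_or_resolve("example.com", fake_resolver)  # served from cache
fake_now[0] = 901.0
cache.get_or_resolve("example.com", fake_resolver)  # TTL elapsed, resolves again
print(len(calls))  # 2
```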


🧱 Disposable domain filtering

Create a text file and point to it:

# disposable.txt
# Lines beginning with # are comments
# Domains are matched case‑insensitively on ASCII form
mailinator.com
10minutemail.com
sharklasers.com

Enable via .env:

EMAILTK_DISPOSABLE_SOURCE=file://./disposable.txt

Optionally set:

EMAILTK_TREAT_DISPOSABLE_AS_INVALID=true

This will raise EmailParseException when parsing addresses on those domains.
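
The file format described above (one domain per line, # comments, case‑insensitive match on the ASCII form) could be read roughly like this. It is a sketch under those stated rules; load_disposable and is_disposable are hypothetical helpers rather than emailtoolkit functions, and the stdlib idna codec used here implements IDNA 2003.

```python
def load_disposable(text: str) -> frozenset:
    """Hypothetical loader: one domain per line, # starts a comment line."""
    domains = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        domains.add(line.lower())
    return frozenset(domains)


def is_disposable(address: str, blocked: frozenset) -> bool:
    """Check the address's domain against the blocked set."""
    domain = address.rpartition("@")[2].lower()
    try:
        # Compare on the ASCII (IDNA) form of the domain.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # malformed domain: not a match in this sketch
    return ascii_domain in blocked


blocked = load_disposable(
    "# disposable.txt\nmailinator.com\n10minutemail.com\n\nSharkLasers.com\n"
)
print(is_disposable("someone@Mailinator.COM", blocked))  # True
print(is_disposable("someone@example.com", blocked))     # False
```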


🤖 Agents, MCP servers, and tool‑calling

from pydantic import BaseModel, Field
import emailtoolkit as et

class EmailInput(BaseModel):
    email: str = Field(..., description="Email address to parse")

class DomainInput(BaseModel):
    domain: str = Field(..., description="Domain to inspect")

def tool_parse(args: EmailInput):
    e = et.parse(args.email)
    return {
        "normalized": e.normalized,
        "canonical": e.canonical,
        "deliverable": e.deliverable_dns,
        "domain": e.domain_info.ascii_domain,
    }

def tool_domain(args: DomainInput):
    d = et.domain_health(args.domain)
    return {
        "domain": d.ascii_domain,
        "has_mx": d.has_mx,
        "has_a": d.has_a,
        "disposable": d.disposable,
    }

📚 API surface

import emailtoolkit as et
from emailtoolkit import EmailTools, Email, DomainInfo, EmailParseException

# module functions
et.parse(raw: str) -> Email
et.is_valid(raw: str) -> bool
et.normalize(raw: str) -> str
et.canonical(raw: str) -> str
et.extract(text: str) -> list[Email]
et.compare(a: str, b: str) -> bool
et.domain_health(domain: str) -> DomainInfo
et.build_tools(overrides_path: str | None = None) -> EmailTools

# dataclasses
Email(
  original, local, domain, ascii_email, normalized, canonical,
  domain_info: DomainInfo, valid_syntax: bool, deliverable_dns: bool, reason: str|None
)
DomainInfo(domain, ascii_domain, mx_hosts, a_hosts, has_mx, has_a, disposable)

🔒 Security & privacy

  • PII redaction in logs is on by default (EMAILTK_PII_REDACT_LOGS).
  • Avoid logging raw addresses in your application.
  • If SMTP probing is enabled in the future, keep it opt‑in, rate‑limited, and legally reviewed.
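
To follow the second point in your own application code, addresses can be masked before they reach a log line. The sketch below is an assumption about what a mask style might look like, not the format emailtoolkit produces; mask_emails and EMAIL_RE are hypothetical, and the pattern is deliberately simple rather than RFC‑complete.

```python
import re

# Keeps the first character of the local part and the whole domain.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+)")


def mask_emails(message: str) -> str:
    """Hypothetical redactor for log messages."""
    return EMAIL_RE.sub(r"\1***@\2", message)


masked = mask_emails("parse failed for test.user@gmail.com")
print(masked)  # parse failed for t***@gmail.com
```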

🧰 Development

pip install -e ".[dns,dotenv]" pytest ruff mypy
ruff check src
mypy src/emailtoolkit
pytest -q

🙏 Acknowledgments

Built on:

  • email_validator by Joshua Tauberer (Unlicense)
  • dnspython (ISC) [optional]
  • idna (BSD‑3‑Clause)

See THIRD_PARTY_NOTICES.md for license texts.


📦 License

MIT. See LICENSE. Third‑party licenses in THIRD_PARTY_NOTICES.md.


⭐ Support

If this toolkit helps you, star the repo and share it. Issues and PRs welcome.
