RFC-aware email parsing, normalization, extraction, and DNS health checks with env-config and a phonenumbers-like API.
📧 emailtoolkit
RFC‑aware email parsing, normalization, extraction, and DNS health checks with a clean, phonenumbers‑style API.
✨ Design goals
- Simple API: Be as easy as phonenumbers. Import module‑level functions for quick tasks, or instantiate EmailTools for tuned, high‑performance use.
- Practical validation: Separate syntax validation (via email_validator) from deliverability checks. Enforce your own DNS policy (require MX, or allow A/AAAA fallback).
- Provider‑aware identity: Correctly determine that test.user@gmail.com and testuser+sales@googlemail.com are the same identity using canonicalization rules.
- Operations‑ready: Native env, .env, and config.json support; PII‑safe logging; TTL‑cached DNS; robust CLI.
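To make the provider-aware identity goal concrete, here is a minimal standalone sketch of Gmail-style canonicalization. It is not the library's implementation; the function name `canonical_sketch` and the `GMAIL_DOMAINS` set are hypothetical, and the rules shown are only the well-known Gmail ones (dots ignored, `+tag` dropped, `googlemail.com` treated as `gmail.com`).

```python
# Hypothetical illustration; emailtoolkit's real rules may cover more providers.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def canonical_sketch(address: str) -> str:
    """Collapse Gmail aliases to a single identity; lowercase other domains."""
    local, _, domain = address.strip().rpartition("@")
    domain = domain.lower()
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0]  # drop any "+tag" suffix
        local = local.replace(".", "")  # Gmail ignores dots in the local part
        local = local.lower()           # Gmail local parts are case-insensitive
        domain = "gmail.com"            # googlemail.com is an alias of gmail.com
    return f"{local}@{domain}"


same = canonical_sketch("test.user@gmail.com") == canonical_sketch(
    "testuser+sales@googlemail.com"
)
```

Addresses on other domains keep their local part untouched, since dot and plus handling is provider-specific.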
⭐ Features
- Automatic Cloudflare Decoding: Transparently finds and decodes Cloudflare-protected email addresses from HTML.
- Robust Extraction: Discovers emails from free text, mailto: links, and other common formats.
- Canonical Identity: Intelligently compares emails, understanding that test.user@gmail.com is the same as testuser+sales@googlemail.com.
- DNS Health Checks: Validates domain deliverability by checking for MX and A/AAAA records.
- Disposable Domain Filtering: Flags or blocks emails from known disposable providers.
- Configurable: Fine-tune behavior with environment variables, .env files, or a config.json.
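As background for the Cloudflare decoding feature, the sketch below shows how such addresses can be recovered, assuming the commonly documented scheme: a `data-cfemail` attribute holds a hex string whose first byte is an XOR key for the remaining bytes. The helper names (`cf_decode`, `cf_encode`, `find_cf_emails`) are hypothetical and are not part of emailtoolkit's API.

```python
import re

# Matches the hex payload Cloudflare places in obfuscated links (assumed format).
CFEMAIL_RE = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')


def cf_decode(hex_payload: str) -> str:
    """XOR every byte after the first with the first byte (the key)."""
    raw = bytes.fromhex(hex_payload)
    key, body = raw[0], raw[1:]
    return bytes(b ^ key for b in body).decode("utf-8")


def cf_encode(address: str, key: int = 0x5A) -> str:
    """Inverse of cf_decode; only used here to build a test fixture."""
    data = address.encode("utf-8")
    return bytes([key] + [b ^ key for b in data]).hex()


def find_cf_emails(html: str) -> list[str]:
    """Decode every data-cfemail payload found in an HTML string."""
    return [cf_decode(payload) for payload in CFEMAIL_RE.findall(html)]


payload = cf_encode("a@example.com")
html = f'<a class="__cf_email__" data-cfemail="{payload}">[email protected]</a>'
decoded = find_cf_emails(html)
```

Because the key travels with the payload, decoding needs no secret; the obfuscation only defeats naive scrapers.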
🚀 Installation
pip install emailtoolkit
# extras for DNS and .env support
pip install "emailtoolkit[dns,dotenv]"
🧪 Quick start
import emailtoolkit as et
# Validate
et.is_valid("Test.User+sales@Gmail.com") # True
# Canonical form (provider‑specific rules)
et.canonical("t.e.s.t+sales@googlemail.com") # "test@gmail.com"
# Compare by canonical identity
et.compare("t.e.s.t+sales@googlemail.com", "test@gmail.com") # True
# Extract from free text (returns Email objects)
found = et.extract("Contact a@example.com, A@EXAMPLE.com, and junk@@bad.")
print([e.normalized for e in found]) # ["a@example.com", "A@example.com"]
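For intuition about what extraction involves, here is a deliberately simplistic regex-based sketch. It is a stand-in, not the library's extractor: the pattern `EMAIL_RE` and the function `extract_candidates` are hypothetical, and a real RFC-aware parser accepts and rejects far more cases than this pattern does.

```python
import re

# Simplified pattern: local part, "@", then at least two dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_candidates(text: str) -> list[str]:
    """Return unique matches in order, lowercasing only the domain."""
    seen = set()
    results = []
    for match in EMAIL_RE.finditer(text):
        local, _, domain = match.group(0).rpartition("@")
        normalized = f"{local}@{domain.lower()}"
        if normalized not in seen:
            seen.add(normalized)
            results.append(normalized)
    return results


found = extract_candidates("Contact a@example.com, A@EXAMPLE.com, and junk@@bad.")
```

The local part is left as written because it can be case-sensitive, which is why `a@example.com` and `A@example.com` remain distinct in this sketch.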
🛠️ Command‑line interface (CLI)
# Canonical form
emailtoolkit canonical "t.e.s.t+bar@googlemail.com"
# → test@gmail.com
# Domain DNS health (JSON)
emailtoolkit domain example.com
# {
# "domain": "example.com",
# "ascii_domain": "example.com",
# "mx_hosts": [],
# "a_hosts": ["93.184.216.34"],
# "has_mx": false,
# "has_a": true,
# "disposable": false
# }
# Extract from stdin
echo "Contact me at a@example.com" | emailtoolkit extract
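The JSON printed by the domain command can be consumed by a script. The sketch below applies an MX policy to the sample output above, assuming deliverability is decided the way the EMAILTK_REQUIRE_MX setting under Configuration describes (MX required when true, MX or A/AAAA when false). The function `is_deliverable` is hypothetical and only mirrors that description.

```python
import json

# Sample output copied from the CLI example above.
SAMPLE = """{
  "domain": "example.com",
  "ascii_domain": "example.com",
  "mx_hosts": [],
  "a_hosts": ["93.184.216.34"],
  "has_mx": false,
  "has_a": true,
  "disposable": false
}"""


def is_deliverable(info: dict, require_mx: bool = True) -> bool:
    """MX always counts; A/AAAA counts only when MX is not required."""
    if info["has_mx"]:
        return True
    return (not require_mx) and info["has_a"]


info = json.loads(SAMPLE)
strict = is_deliverable(info, require_mx=True)
relaxed = is_deliverable(info, require_mx=False)
```

With this sample, the strict policy rejects the domain (no MX hosts) while the relaxed policy accepts it through the A record fallback.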
⚙️ Configuration
Load precedence:
- Environment variables (e.g., EMAILTK_LOG_LEVEL)
- .env in the working directory (requires dotenv extra)
- config.json (when passed to CLI --config or build_tools("/path/to/config.json"))
- Internal defaults
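One way to picture this layering is a lookup chain in which the first source that defines a key wins. The sketch below assumes the list reads highest priority first and that every source uses the same key names; neither is confirmed here, and `resolve` and `DEFAULTS` are hypothetical names.

```python
from collections import ChainMap

# Illustrative subset of defaults taken from the environment variable table.
DEFAULTS = {"EMAILTK_LOG_LEVEL": "INFO", "EMAILTK_REQUIRE_MX": "true"}


def resolve(env: dict, dotenv: dict, config_json: dict, defaults: dict = DEFAULTS):
    """Return a view where earlier mappings shadow later ones."""
    return ChainMap(env, dotenv, config_json, defaults)


settings = resolve(
    env={"EMAILTK_LOG_LEVEL": "DEBUG"},
    dotenv={"EMAILTK_LOG_LEVEL": "WARNING", "EMAILTK_DNS_TTL_SECONDS": "60"},
    config_json={"EMAILTK_DNS_TTL_SECONDS": "300", "EMAILTK_REQUIRE_MX": "false"},
)
log_level = settings["EMAILTK_LOG_LEVEL"]
ttl = settings["EMAILTK_DNS_TTL_SECONDS"]
require_mx = settings["EMAILTK_REQUIRE_MX"]
```

Here the environment value beats the .env value for the log level, the .env value beats config.json for the TTL, and config.json beats the default for the MX requirement.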
Environment variables (full)
| Variable | Type | Default | Description |
|---|---|---|---|
| EMAILTK_LOG_LEVEL | str | INFO | Logging level: DEBUG, INFO, WARNING, ERROR |
| EMAILTK_REQUIRE_MX | bool | true | If true, deliverability requires MX. If false, MX or A/AAAA is enough |
| EMAILTK_REQUIRE_DELIVERABILITY | bool | false | If true, parse raises if deliverability fails |
| EMAILTK_ALLOW_SMTPUTF8 | bool | true | Allow UTF‑8 local parts per RFC 6531 |
| EMAILTK_DNS_TIMEOUT_SECONDS | float | 2.0 | DNS timeout in seconds |
| EMAILTK_DNS_TTL_SECONDS | int | 900 | TTL for cached DNS answers |
| EMAILTK_USE_DNSPYTHON | bool | true | Use dnspython when available |
| EMAILTK_EXTRACT_UNIQUE | bool | true | Deduplicate by canonical form during extraction |
| EMAILTK_EXTRACT_MAX_RESULTS | int or empty | empty | Hard cap on extractor results. Empty or 0 means no cap |
| EMAILTK_NORMALIZE_CASE | bool | true | Lowercase domain on normalize |
| EMAILTK_GMAIL_CANON | bool | true | Apply Gmail dot and plus canonicalization rules |
| EMAILTK_TREAT_DISPOSABLE_AS_INVALID | bool | false | If true, disposable domains cause parse to raise |
| EMAILTK_BLOCK_PRIVATE_TLDS | bool | false | Enforce known public suffixes if provided |
| EMAILTK_PUBLIC_SUFFIX_FILE | path | empty | File with known public suffixes, one per line |
| EMAILTK_DISPOSABLE_SOURCE | file://... or url://... or none | none | Source for disposable domains |
| EMAILTK_ENABLE_SMTP_PROBE | bool | false | Reserved for optional SMTP probing module |
| EMAILTK_SMTP_PROBE_TIMEOUT | float | 3.0 | Probe timeout |
| EMAILTK_SMTP_PROBE_CONCURRENCY | int | 5 | Probe concurrency |
| EMAILTK_SMTP_PROBE_HELO | str | example.com | HELO/EHLO identity |
| EMAILTK_PII_REDACT_LOGS | bool | true | Mask emails in logs and exceptions |
| EMAILTK_PII_REDACT_STYLE | mask or none | mask | Redaction style |
See .env.example for a ready‑to‑copy template.
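To illustrate what a setting like EMAILTK_DNS_TTL_SECONDS controls, here is a minimal time-to-live cache sketch. It is a generic pattern and not the toolkit's internal cache; the `TTLCache` class and its injectable `clock` parameter are assumptions made so the behavior can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Store values that expire a fixed number of seconds after insertion."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def put(self, key, value) -> None:
        # Record the absolute expiry time alongside the value.
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # evict stale entries lazily on read
            return None
        return value


# A fake clock makes expiry observable immediately.
now = [0.0]
cache = TTLCache(ttl_seconds=900, clock=lambda: now[0])
cache.put("example.com", ["mx1.example.com"])
fresh = cache.get("example.com")
now[0] = 900.0
stale = cache.get("example.com")
```

A cache like this trades freshness for fewer lookups: a shorter TTL notices DNS changes sooner, a longer one reduces resolver traffic.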
🧱 Disposable domain filtering
Create a text file and point to it:
# disposable.txt
# Lines beginning with # are comments
# Domains are matched case‑insensitively on ASCII form
mailinator.com
10minutemail.com
sharklasers.com
Enable via .env:
EMAILTK_DISPOSABLE_SOURCE=file://./disposable.txt
Optionally set:
EMAILTK_TREAT_DISPOSABLE_AS_INVALID=true
This will raise EmailParseException when parsing addresses on those domains.
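The file format above is simple enough to sketch a loader for. The example below follows the stated rules (comment lines start with #, matching is case-insensitive on the ASCII form) but is not the library's loader: `load_disposable`, `to_ascii`, and `is_disposable` are hypothetical helpers, and the standard library's `idna` codec stands in for whatever IDNA handling the toolkit uses.

```python
def to_ascii(domain: str) -> str:
    """Lowercase a domain and convert it to its ASCII (IDNA) form."""
    return domain.strip().lower().encode("idna").decode("ascii")


def load_disposable(text: str) -> set[str]:
    """Parse a disposable-domain list, skipping blanks and # comments."""
    domains = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        domains.add(to_ascii(line))
    return domains


def is_disposable(address: str, domains: set[str]) -> bool:
    """Check the part after the last @ against the loaded set."""
    domain = address.rpartition("@")[2]
    return to_ascii(domain) in domains


LISTING = """# disposable.txt
# Lines beginning with # are comments
mailinator.com
10minutemail.com

SharkLasers.com
"""
blocked = load_disposable(LISTING)
flagged = is_disposable("bob@MAILINATOR.COM", blocked)
```

This sketch matches exact domains only; whether subdomains of a listed domain are also flagged is not specified here.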
🤖 Agents, MCP servers, and tool‑calling
from pydantic import BaseModel, Field
import emailtoolkit as et


class EmailInput(BaseModel):
    email: str = Field(..., description="Email address to parse")


class DomainInput(BaseModel):
    domain: str = Field(..., description="Domain to inspect")


def tool_parse(args: EmailInput) -> dict:
    # Parse once, then return only JSON-friendly fields to the caller.
    e = et.parse(args.email)
    return {
        "normalized": e.normalized,
        "canonical": e.canonical,
        "deliverable": e.deliverable_dns,
        "domain": e.domain_info.ascii_domain,
    }


def tool_domain(args: DomainInput) -> dict:
    # Report DNS health and the disposable flag for a bare domain.
    d = et.domain_health(args.domain)
    return {
        "domain": d.ascii_domain,
        "has_mx": d.has_mx,
        "has_a": d.has_a,
        "disposable": d.disposable,
    }
📚 API surface
import emailtoolkit as et
from emailtoolkit import EmailTools, Email, DomainInfo, EmailParseException
# module functions
et.parse(raw: str) -> Email
et.is_valid(raw: str) -> bool
et.normalize(raw: str) -> str
et.canonical(raw: str) -> str
et.extract(text: str) -> list[Email]
et.compare(a: str, b: str) -> bool
et.domain_health(domain: str) -> DomainInfo
et.build_tools(overrides_path: str | None = None) -> EmailTools
# dataclasses
Email(
    original, local, domain, ascii_email, normalized, canonical,
    domain_info: DomainInfo, valid_syntax: bool, deliverable_dns: bool, reason: str | None
)
DomainInfo(domain, ascii_domain, mx_hosts, a_hosts, has_mx, has_a, disposable)
🔒 Security & privacy
- PII redaction in logs is on by default (EMAILTK_PII_REDACT_LOGS).
- Avoid logging raw addresses in your application.
- If SMTP probing is enabled in the future, keep it opt‑in, rate‑limited, and legally reviewed.
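For applications that want a similar safeguard in their own logs, here is a small masking sketch. The exact format produced by the toolkit's mask style is not documented here, so the output shape (first character, three asterisks, domain) and the names `mask_email` and `redact` are assumptions.

```python
import re

# Loose pattern for spotting address-like tokens inside log messages.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+")


def mask_email(address: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return "***"  # not address-shaped, so hide it entirely
    return f"{local[0]}***@{domain}"


def redact(message: str) -> str:
    """Mask every address-like token in a log message."""
    return _EMAIL_RE.sub(lambda m: mask_email(m.group(0)), message)


line = redact("login failed for alice@example.com")
```

Keeping the domain visible preserves some debugging value (for example, spotting a misbehaving provider) while hiding the individual mailbox.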
🧰 Development
pip install -e ".[dns,dotenv]" pytest ruff mypy
ruff check src
mypy src/emailtoolkit
pytest -q
🙏 Acknowledgments
Built on:
- email_validator by Joshua Tauberer (Unlicense)
- dnspython (ISC) [optional]
- idna (BSD‑3‑Clause)
See THIRD_PARTY_NOTICES.md for license texts.
📦 License
MIT. See LICENSE. Third‑party licenses in THIRD_PARTY_NOTICES.md.
⭐ Support
If this toolkit helps you, star the repo and share it. Issues and PRs welcome.