A Python library to automatically detect and fix common typos in email addresses
Email Typo Fixer
A Python library that automatically detects and fixes common typos in email addresses, using Levenshtein distance and a dictionary of known domain misspellings.
Features
- Smart Typo Detection: Uses the Levenshtein distance to detect and correct email typos
- Domain Correction: Fixes common domain typos (e.g., gamil.com → gmail.com)
- Extension Validation: Validates and corrects top-level domains using the Public Suffix List
- Email Normalization: Trims whitespace, lowercases, and removes invalid characters
- Configurable: Customizable typo dictionary and distance threshold
- Logging Support: Built-in logging for debugging and monitoring
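Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The pure-Python function below is an illustrative sketch of that metric, not the library's own code (the library depends on the `Levenshtein` package for this). Note that a swapped pair of letters, as in `gamil` versus `gmail`, costs two edits under plain Levenshtein distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep only the previous row of the dynamic-programming table.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1        # insert cb
            delete_cost = previous[j] + 1           # delete ca
            replace_cost = previous[j - 1] + (ca != cb)  # substitute (free if equal)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


print(levenshtein("gamil", "gmail"))  # 2
print(levenshtein("yaho", "yahoo"))   # 1
```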
Installation
```shell
pip install email-typo-fixer
```
Quick Start
```python
from email_typo_fixer import normalize_email, EmailTypoFixer

# Simple function interface
corrected_email = normalize_email("user@gamil.com")
print(corrected_email)  # user@gmail.com

# Class interface for more control
fixer = EmailTypoFixer(max_distance=2)
corrected_email = fixer.normalize("user@yaho.com")
print(corrected_email)  # user@yahoo.com
```
Usage Examples
Basic Email Correction
```python
from email_typo_fixer import normalize_email

# Fix common domain typos
normalize_email("john.doe@gamil.com")  # → john.doe@gmail.com
normalize_email("jane@yaho.com")       # → jane@yahoo.com
normalize_email("user@outlok.com")     # → user@outlook.com
normalize_email("test@hotmal.com")     # → test@hotmail.com

# Fix extension typos
normalize_email("user@example.co")     # → user@example.com
normalize_email("user@site.rog")       # → user@site.org
```
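Extension correction can be pictured as a nearest-match search: if the suffix after the last dot is not a known suffix, pick the closest known one, but only when it lies within `max_distance` edits. The sketch below assumes a tiny hard-coded tuple (`KNOWN_SUFFIXES`, hypothetical) in place of the Public Suffix List, handles only single-label suffixes such as `com` (not `co.uk`), and breaks ties alphabetically. `fix_suffix` is a hypothetical name, and the library's actual matching and tie-breaking rules may differ.

```python
from typing import Tuple

# Hypothetical stand-in for the Public Suffix List used by the library.
KNOWN_SUFFIXES: Tuple[str, ...] = ("com", "org", "net", "edu", "gov", "io")


def levenshtein(a: str, b: str) -> int:
    """Edit distance via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,             # insertion
                               previous[j] + 1,                # deletion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def fix_suffix(domain: str, max_distance: int = 2) -> str:
    """Replace an unknown final suffix with the closest known one within max_distance."""
    name, dot, suffix = domain.rpartition(".")
    if not dot or suffix in KNOWN_SUFFIXES:
        return domain  # no dot to split on, or the suffix is already known
    # Rank candidates by (distance, suffix) so ties resolve alphabetically.
    distance, best = min((levenshtein(suffix, known), known) for known in KNOWN_SUFFIXES)
    if distance > max_distance:
        return domain  # nothing close enough: leave the domain unchanged
    return f"{name}.{best}"


print(fix_suffix("site.ogr"))      # site.org
print(fix_suffix("example.comm"))  # example.com
```

Raising `max_distance` lets more distant suffixes match, at the cost of more false corrections.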
Advanced Usage with Custom Configuration
```python
import logging

from email_typo_fixer import EmailTypoFixer

# Create a custom logger; basicConfig attaches a handler so INFO messages are shown
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("email_fixer")
logger.setLevel(logging.INFO)

# Custom typo dictionary
custom_typos = {
    'companytypo': 'company',
    'orgtypo': 'org',
}

# Initialize with custom settings
fixer = EmailTypoFixer(
    max_distance=3,             # Allow more distant corrections
    typo_domains=custom_typos,  # Use custom typo dictionary
    logger=logger,              # Use custom logger
)

# Fix emails with custom rules
corrected = fixer.normalize("user@companytypo.com")
print(corrected)  # user@company.com
```
Email Validation and Normalization
```python
from email_typo_fixer import EmailTypoFixer

fixer = EmailTypoFixer()

try:
    # Normalize and validate
    email = fixer.normalize(" USER@EXAMPLE.COM ")
    print(email)  # user@example.com

    # Remove invalid characters
    email = fixer.normalize("us*er@exam!ple.com")
    print(email)  # user@example.com
except ValueError as e:
    print(f"Invalid email: {e}")
```
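The normalization shown above can be approximated with a trim, a lowercase, and a character allow-list. In this sketch, `clean_email` and the `DISALLOWED` pattern are hypothetical; the library's real set of permitted characters is not documented here and may differ. Lowercasing the whole address is also a simplification: RFC 5321 treats the local part as case-sensitive, even though major providers ignore case in practice.

```python
import re

# Hypothetical allow-list: lowercase letters, digits, and . _ + @ -
# Every other character (including inner whitespace) is removed.
DISALLOWED = re.compile(r"[^a-z0-9._+@-]")


def clean_email(raw: str) -> str:
    """Trim surrounding whitespace, lowercase, then drop disallowed characters."""
    return DISALLOWED.sub("", raw.strip().lower())


print(clean_email(" USER@EXAMPLE.COM "))  # user@example.com
print(clean_email("us*er@exam!ple.com"))  # user@example.com
```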
API Reference
normalize_email(email: str) -> str
Simple function interface for email normalization.
Parameters:
- email (str): The email address to normalize

Returns:
- str: The corrected and normalized email address

Raises:
- ValueError: If the email cannot be fixed or is invalid
EmailTypoFixer
Main class for email typo correction with customizable options.
__init__(max_distance=2, typo_domains=None, logger=None)
Parameters:
- max_distance (int): Maximum Levenshtein distance for extension corrections (default: 2)
- typo_domains (dict): Custom dictionary mapping domain typos to their corrections (default: None)
- logger (logging.Logger): Custom logger instance (default: None)
normalize(email: str) -> str
Normalize and fix typos in an email address.
Parameters:
- email (str): The email address to normalize

Returns:
- str: The corrected and normalized email address

Raises:
- ValueError: If the email cannot be fixed or is invalid
Default Typo Corrections
The library includes built-in corrections for common email provider typos:
| Typo | Correction |
|---|---|
| gamil | gmail |
| gmial | gmail |
| gnail | gmail |
| gmaill | gmail |
| yaho | yahoo |
| yahho | yahoo |
| outlok | outlook |
| outllok | outlook |
| outlokk | outlook |
| hotmal | hotmail |
| hotmial | hotmail |
| homtail | hotmail |
| hotmaill | hotmail |
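Corrections like those in the table amount to a dictionary lookup on the domain label before its first dot. The sketch below uses a subset of the table as `DEFAULT_TYPOS`; the function name `fix_domain_name` is hypothetical, and merging custom entries over the defaults is an assumption, since the documentation does not say whether `typo_domains` extends or replaces the built-in dictionary.

```python
from typing import Dict, Optional

# A subset of the default corrections from the table above.
DEFAULT_TYPOS: Dict[str, str] = {
    "gamil": "gmail",
    "gmial": "gmail",
    "yaho": "yahoo",
    "outlok": "outlook",
    "hotmal": "hotmail",
}


def fix_domain_name(domain: str, custom: Optional[Dict[str, str]] = None) -> str:
    """Replace a misspelled provider name (the label before the first dot)."""
    # Assumption: custom entries are merged over, not instead of, the defaults.
    typos = {**DEFAULT_TYPOS, **(custom or {})}
    name, dot, rest = domain.partition(".")
    return typos.get(name, name) + dot + rest


print(fix_domain_name("gamil.com"))  # gmail.com
print(fix_domain_name("companytypo.com", {"companytypo": "company"}))  # company.com
```

Because only the first label is checked, a name such as `mail.gamil.com` passes through unchanged in this sketch.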
Error Handling
The library raises ValueError for emails that cannot be corrected:
```python
from email_typo_fixer import normalize_email

try:
    normalize_email("invalid.email")  # Missing @ symbol
except ValueError as e:
    print(f"Cannot fix email: {e}")

try:
    normalize_email("user@")  # Missing domain
except ValueError as e:
    print(f"Cannot fix email: {e}")
```
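Failures like the two above can be caught before any typo correction runs by splitting on `@` and checking both halves. The `split_email` helper below is a hypothetical sketch of such a pre-check, not the library's implementation, and its error messages are illustrative.

```python
from typing import Tuple


def split_email(email: str) -> Tuple[str, str]:
    """Split an address into (local part, domain) or raise ValueError."""
    local, sep, domain = email.partition("@")
    if not sep:
        raise ValueError(f"missing '@' in {email!r}")
    if not local:
        raise ValueError(f"missing local part in {email!r}")
    if not domain:
        raise ValueError(f"missing domain in {email!r}")
    if "@" in domain:
        raise ValueError(f"more than one '@' in {email!r}")
    return local, domain


for candidate in ("invalid.email", "user@", "user@example.com"):
    try:
        print(split_email(candidate))
    except ValueError as exc:
        print(f"Cannot fix email: {exc}")
```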
Requirements
- Python 3.8+
- Levenshtein >= 0.25.0
- publicsuffixlist >= 0.10.0
Development
Setting up for Development
```shell
# Clone the repository
git clone https://github.com/yourusername/email-typo-fixer.git
cd email-typo-fixer

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

# Activate the virtual environment (Poetry 2.x; on Poetry 1.x use `poetry shell`)
eval $(poetry env activate)
```
Running Tests
```shell
# Run tests with coverage
poetry run pytest

# Run tests with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_email_typo_fixer.py
```
Code Quality
```shell
# Format code with Black
poetry run black email_typo_fixer tests

# Lint with flake8
poetry run flake8 email_typo_fixer tests

# Type checking with mypy
poetry run mypy email_typo_fixer
```
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
[0.1.0] - 2025-08-07
- Initial release
- Basic email typo correction functionality
- Support for domain and extension corrections
- Configurable typo dictionary
- Comprehensive test suite
Acknowledgments
- Uses the python-Levenshtein library for string distance calculations
- Uses publicsuffixlist for domain validation
- Inspired by various email validation libraries in the Python ecosystem
Download files
File details
Details for the file email_typo_fixer-0.1.0.tar.gz.
File metadata
- Download URL: email_typo_fixer-0.1.0.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 067891bd4a296b0f22df69b1d63765a934d463bd1dcf64473cea0d9214f2fa57 |
| MD5 | 326bf2357a1549b958381ae63b162a32 |
| BLAKE2b-256 | 0a53b3b7da089364ab1cecef2c53f4dc7d1daa4aa8e17990782949decb514ae1 |
File details
Details for the file email_typo_fixer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: email_typo_fixer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd773724e7d06341ad7485839064689ca0242a2189f825309e9c064d4888e83a |
| MD5 | 093b08ac08b30b48e8f3849c5f169263 |
| BLAKE2b-256 | 4730e90935bc8470c3b2780c4f9d6e79dfa7d73855281fa881323003c9750c92 |