A Python tool for extracting email addresses from web pages or text files.
mailgrab
mailgrab is a Python tool that extracts email addresses from web pages or text files. It matches addresses with regular expressions and uses Playwright to load web pages. It is well suited to collecting email addresses from multiple sources.
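As a rough illustration of regex-based extraction, the sketch below uses only Python's standard re module. The pattern and the name find_emails() are assumptions made for this example, not mailgrab's implementation; mailgrab's own pattern may differ, and no simple regex covers every address syntax allowed by RFC 5322.

```python
import re

# Hypothetical, simplified pattern: a local part, "@", one or more domain labels,
# and a top-level domain of at least two letters. Not mailgrab's actual regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    """Return every substring of text that matches EMAIL_RE, in order of appearance."""
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    sample = "Write to contact@example.com or info@example.org."
    print(find_emails(sample))  # ['contact@example.com', 'info@example.org']
```

The trailing period after "org" is not captured because the final [A-Za-z]{2,} group only accepts letters.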
Features
- Extracts email addresses from a URL or text file
- Extracts emails from mailto: links in HTML
- Uses Playwright for headless web scraping
- Searches with regular expressions
- Simple command-line interface (CLI)
- Can be used as a Python module
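Collecting addresses from mailto: links can be sketched with the standard library's html.parser. The class MailtoCollector and the function mailto_emails() are hypothetical names for this illustration, not part of mailgrab's API, and the sketch assumes the page HTML has already been fetched (for example with Playwright).

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class MailtoCollector(HTMLParser):
    """Hypothetical helper: collect addresses from <a href="mailto:..."> links."""

    def __init__(self) -> None:
        super().__init__()
        self.addresses: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this method.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                # Drop the scheme and any "?subject=..." query, then decode %-escapes.
                target = value[len("mailto:"):].split("?", 1)[0]
                # A mailto: URI may list several comma-separated recipients.
                for addr in unquote(target).split(","):
                    addr = addr.strip()
                    if addr:
                        self.addresses.append(addr)


def mailto_emails(html: str) -> list[str]:
    """Return the mailto: recipients found in html, in document order."""
    collector = MailtoCollector()
    collector.feed(html)
    collector.close()
    return collector.addresses


if __name__ == "__main__":
    page = '<a href="mailto:contact@example.com?subject=Hi">Contact</a>'
    print(mailto_emails(page))  # ['contact@example.com']
```

Parsing the href attribute directly catches addresses that only appear in link targets, which a regex pass over the visible page text would miss.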
Installation
Install mailgrab from PyPI:
pip install mailgrab
⚠️ Make sure to install the Playwright browsers:
python -m playwright install
Usage
CLI (Command Line)
$ mailgrab --help
usage: mailgrab [-h] (--url WEBSITE_URL | --file PATH_TO_FILE) [-v]
Collection of emails in text file or website page.
options:
-h, --help show this help message and exit
--url WEBSITE_URL Website url to read and extract emails
--file PATH_TO_FILE Path to file to read and extract emails
-v, --version show program's version number and exit
Examples
mailgrab --url "https://example.com" # Extract emails from https://example.com
mailgrab --file "file.txt" # Extract emails from file.txt
mailgrab -v # Show program's version
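The help text above maps naturally onto Python's argparse. The following is a hypothetical reconstruction, assuming mailgrab builds its CLI with argparse (the parenthesised part of the usage line is how argparse displays a required mutually exclusive group); build_parser() and the version string are placeholders, not taken from mailgrab's source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the mailgrab CLI shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="mailgrab",
        description="Collection of emails in text file or website page.",
    )
    # Exactly one source is required; argparse shows this as (--url ... | --file ...).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", metavar="WEBSITE_URL",
                        help="Website url to read and extract emails")
    source.add_argument("--file", metavar="PATH_TO_FILE",
                        help="Path to file to read and extract emails")
    # Placeholder version string; the real one may differ.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 1.3.0")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url or args.file)
```

Passing neither option, or both, makes argparse print an error and exit with status 2.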
As a Python module
import mailgrab as mgb # or from mailgrab import *
# Validate the path to a file containing emails
path = mgb.validate_path("file.txt")
# Read file content
with open(path, "r") as f:
    content = f.read()
# Extract emails from content
emails = mgb.extract_emails(content)
# Display emails using the built-in printer
mgb.print_emails(emails)
CLI Example Output
[¤] Found 3 unique email address(es):
1) contact@example.com
2) info@example.org
3) support@sample.net
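The example output reports only unique addresses, as a numbered list. A minimal sketch of that behavior follows, assuming duplicates are removed while keeping first-seen order (mailgrab's actual ordering is not documented here); format_unique() is a hypothetical name and is not mailgrab's print_emails().

```python
def format_unique(emails: list[str]) -> str:
    """Deduplicate emails (first occurrence wins) and render them as a numbered list."""
    # dict.fromkeys preserves insertion order (Python 3.7+), so duplicates are
    # dropped without reordering the remaining addresses.
    unique = list(dict.fromkeys(emails))
    lines = [f"[¤] Found {len(unique)} unique email address(es):"]
    lines += [f"{i}) {email}" for i, email in enumerate(unique, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    found = ["contact@example.com", "info@example.org",
             "contact@example.com", "support@sample.net"]
    print(format_unique(found))
```

Run as a script, this prints the same four lines as the example output, because the repeated contact@example.com is dropped. Deduplication here is case-sensitive; whether mailgrab normalizes case is not stated.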
Path validation
When you use the --file option or the validate_path() function, mailgrab checks that:
- the path exists,
- it points to a regular file,
- the file can be opened for reading.
If any check fails, a MailgrabError with a descriptive message is raised.
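The three checks listed above can be sketched with pathlib. MailgrabError is redefined locally as a stand-in for the library's exception, and validate_path_sketch() is a hypothetical name; the real validate_path() may differ in return type and error messages.

```python
from pathlib import Path


class MailgrabError(Exception):
    """Stand-in for mailgrab's error type, defined here only for the sketch."""


def validate_path_sketch(raw: str) -> Path:
    """Apply the existence, file-type, and readability checks; return the path."""
    path = Path(raw)
    if not path.exists():
        raise MailgrabError(f"Path does not exist: {path}")
    if not path.is_file():
        raise MailgrabError(f"Not a regular file: {path}")
    try:
        # Opening (and immediately closing) the file confirms it is readable.
        with path.open("r"):
            pass
    except OSError as exc:
        raise MailgrabError(f"Cannot open {path} for reading: {exc}") from exc
    return path
```

Checking is_file() before opening means a directory is reported with a clear message instead of the platform-specific OSError that open() would raise.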
Contributing
Want to improve this project? Awesome! Please read the contributing guidelines before submitting a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Download files
File details
Details for the file mailgrab-1.3.0.tar.gz.
File metadata
- Download URL: mailgrab-1.3.0.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf5def022ece76c09801798cf6f2c92abb36fb9c69270028cb66ac4508f7dd43 |
| MD5 | b9d0bb925bb8fd0b38dc2ee840a52cca |
| BLAKE2b-256 | 7bb3ebb234bd0249f2b0015f5e8a389cbccd5de45af06627feb6ab702b9fa1fc |
File details
Details for the file mailgrab-1.3.0-py3-none-any.whl.
File metadata
- Download URL: mailgrab-1.3.0-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5b74d068f8af59632acd0d92c88b3086091bc2084b8032e2081cac510b3504c |
| MD5 | 48fb38a69cca3130c4a97774c5a57959 |
| BLAKE2b-256 | ab9fcbd9778e5e28c232e66bc6e46f73634abeaaea3df42e58d63a76c6850104 |