A Python tool for extracting email addresses from web pages or text files.

mailgrab

mailgrab is a Python tool that extracts email addresses from web pages or text files. It finds addresses with regular expressions and uses Playwright to load web pages. Each run reads from a single source: one URL or one file.

Features

  • Extracts email addresses from a URL or text file
  • Extracts emails from mailto: links in HTML
  • Uses Playwright for headless web scraping
  • Searches with regular expressions
  • Simple command-line interface (CLI)
  • Can be used as a Python module
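The regular-expression and mailto: extraction listed above can be sketched roughly as follows. This is an illustration only: the patterns, the `find_emails` name, and the choice to lowercase addresses before de-duplicating are assumptions, not mailgrab's actual implementation.

```python
import re
from urllib.parse import unquote

# Deliberately simple pattern (an assumption, not mailgrab's own regex).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Captures the target of a mailto: link up to a quote, '?', '>' or whitespace.
MAILTO_RE = re.compile(r"mailto:([^\"'?\s>]+)", re.IGNORECASE)


def find_emails(text: str) -> list[str]:
    """Return unique addresses in first-seen order, lowercased (hypothetical helper)."""
    candidates = EMAIL_RE.findall(text)
    # mailto: targets may be percent-encoded (e.g. %40 for '@'), so decode them first.
    for target in MAILTO_RE.findall(text):
        candidates.extend(EMAIL_RE.findall(unquote(target)))
    seen: dict[str, None] = {}
    for address in candidates:
        seen.setdefault(address.lower(), None)
    return list(seen)


sample = '<a href="mailto:Info%40example.org?subject=Hi">Write</a> or contact@example.com'
print(find_emails(sample))
```

Decoding the mailto: target before matching catches addresses whose `@` is percent-encoded in the link, which a plain scan of the HTML would miss.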

Installation

Install mailgrab from PyPI:

pip install mailgrab

⚠️ Make sure to install Playwright browsers:

python -m playwright install
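The browsers are needed because Playwright drives a real headless browser to render pages. A minimal sketch of that step is shown below, assuming Playwright's synchronous API and Chromium. The `fetch_rendered_html` name, the timeout default, and the `networkidle` wait are illustrative choices, not necessarily what mailgrab does internally.

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load `url` in headless Chromium and return the rendered HTML (hypothetical helper)."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait for network activity to settle so script-rendered content is present.
            page.goto(url, timeout=timeout_ms, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()
```

Calling `fetch_rendered_html("https://example.com")` requires both the `playwright` package and an installed Chromium build; the returned HTML can then be searched for addresses.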

Usage

CLI (Command Line)

$ mailgrab --help
usage: mailgrab [-h] (--url WEBSITE_URL | --file PATH_TO_FILE) [-v]

Collection of emails in text file or website page.

options:
  -h, --help           show this help message and exit
  --url WEBSITE_URL    Website url to read and extract emails
  --file PATH_TO_FILE  Path to file to read and extract emails
  -v, --version        show program's version number and exit

Examples

mailgrab --url "https://example.com"        # Extract emails from https://example.com
mailgrab --file "file.txt"                  # Extract emails from file.txt
mailgrab -v                                 # Show program's version

As a Python module

import mailgrab as mgb  # or from mailgrab import *

# Validate the path to a file containing emails
path = mgb.validate_path("file.txt")

# Read file content
with open(path, "r") as f:
    content = f.read()

# Extract emails from content
emails = mgb.extract_emails(content)

# Display emails using the built-in printer
mgb.print_emails(emails)

CLI Example Output

[¤] Found 3 unique email address(es):

 1) contact@example.com
 2) info@example.org
 3) support@sample.net

Path validation

When using the --file option or validate_path() function, mailgrab ensures:

  • the path exists,
  • it is a valid file,
  • it can be opened for reading.

If not, a MailgrabError with a clear message is raised.
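The three checks above can be sketched with the standard library alone. Everything named here is hypothetical: `PathCheckError` stands in for MailgrabError (whose import path is not shown in this document), and `check_readable_file` only approximates what validate_path() is described as doing.

```python
from pathlib import Path


class PathCheckError(Exception):
    """Hypothetical stand-in for mailgrab's MailgrabError."""


def check_readable_file(raw_path: str) -> Path:
    """Return the path if it exists, is a file, and can be opened for reading."""
    path = Path(raw_path)
    if not path.exists():
        raise PathCheckError(f"Path does not exist: {path}")
    if not path.is_file():
        raise PathCheckError(f"Not a regular file: {path}")
    try:
        # Opening and closing immediately is enough to test read permission.
        with path.open("r"):
            pass
    except OSError as exc:
        raise PathCheckError(f"Cannot open for reading: {path} ({exc})") from exc
    return path


# Usage: report the problem instead of letting the exception propagate.
try:
    check_readable_file("no-such-file.txt")
except PathCheckError as err:
    print(err)
```

Wrapping the `OSError` in a single exception type lets callers handle every path problem with one `except` clause, which matches the behaviour described above.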

Contributing

Want to improve this project? Awesome! Please read the contributing guidelines before submitting a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

mailgrab-1.3.0.tar.gz (5.2 kB)

Uploaded Source

Built Distribution

mailgrab-1.3.0-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file mailgrab-1.3.0.tar.gz.

File metadata

  • Download URL: mailgrab-1.3.0.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.2

File hashes

Hashes for mailgrab-1.3.0.tar.gz

Algorithm    Hash digest
SHA256       bf5def022ece76c09801798cf6f2c92abb36fb9c69270028cb66ac4508f7dd43
MD5          b9d0bb925bb8fd0b38dc2ee840a52cca
BLAKE2b-256  7bb3ebb234bd0249f2b0015f5e8a389cbccd5de45af06627feb6ab702b9fa1fc

File details

Details for the file mailgrab-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: mailgrab-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.2

File hashes

Hashes for mailgrab-1.3.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       f5b74d068f8af59632acd0d92c88b3086091bc2084b8032e2081cac510b3504c
MD5          48fb38a69cca3130c4a97774c5a57959
BLAKE2b-256  ab9fcbd9778e5e28c232e66bc6e46f73634abeaaea3df42e58d63a76c6850104
