
A Python tool for extracting email addresses from web pages or text files.

mailgrab

mailgrab is a Python tool that extracts email addresses from web pages or text files. It uses Playwright to load web pages and regular expressions to find the addresses in the resulting text.

Features

  • Extracts email addresses from a URL or text file
  • Uses Playwright for headless web scraping
  • Searches with regular expressions
  • Simple command-line interface (CLI)
  • Can be used as a Python module
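
To illustrate the regular-expression approach mentioned above, here is a minimal sketch of regex-based extraction. The pattern and the find_unique_emails helper are assumptions made for this example; they are not taken from mailgrab's source, and the pattern is deliberately simplified rather than a full RFC 5322 matcher.

```python
import re

# Simplified pattern (an assumption for illustration, not mailgrab's own):
# local part, "@", domain labels, and a top-level domain of 2+ letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_unique_emails(text: str) -> list[str]:
    """Return unique, lowercased addresses in order of first appearance."""
    seen: dict[str, None] = {}  # dict keys keep insertion order
    for match in EMAIL_PATTERN.finditer(text):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


sample = "Write to contact@example.com or INFO@example.org; contact@example.com again."
print(find_unique_emails(sample))
```

Lowercasing before deduplication is a design choice of this sketch; whether mailgrab treats addresses case-insensitively is not stated here.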

Installation

Install mailgrab from PyPI:

pip install mailgrab

⚠️ Make sure to install Playwright browsers:

python -m playwright install

Usage

CLI (Command Line)

$ mailgrab --help                            
usage: mailgrab [-h] (--url WEBSITE_URL | --file PATH_TO_FILE) [-v]

Collection of emails in text file or website page.

options:
  -h, --help           show this help message and exit
  --url WEBSITE_URL    Website url to read and extract emails
  --file PATH_TO_FILE  Path to file to read and extract emails
  -v, --version        show program's version number and exit

Examples

mailgrab --url "https://example.com"        # Extract emails from https://example.com
mailgrab --file "file.txt"                  # Extract emails from file.txt
mailgrab -v                                 # Show program's version
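
The help text above shows that --url and --file are mutually exclusive and that one of them is required. A parser with the same shape could be sketched with argparse as follows. This is an illustration only: build_parser is a hypothetical name, and mailgrab's real argument handling may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the usage line shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="mailgrab",
        description="Collection of emails in text file or website page.",
    )
    # Exactly one input source must be given: --url or --file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", metavar="WEBSITE_URL",
                        help="Website url to read and extract emails")
    source.add_argument("--file", metavar="PATH_TO_FILE",
                        help="Path to file to read and extract emails")
    # The version string is a placeholder for this sketch.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (sketch)")
    return parser


args = build_parser().parse_args(["--file", "file.txt"])
print(args.url, args.file)
```

Passing both options, or neither, makes argparse print a usage error and exit, which matches the parenthesized group in the usage line.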

As a Python module

import mailgrab as mgb  # or from mailgrab import *

# Validate the path to a file containing emails
path = mgb.validate_path("file.txt")

# Read file content
with open(path, "r") as f:
    content = f.read()

# Extract emails from content
emails = mgb.extract_emails(content)

# Display emails using the built-in printer
mgb.print_emails(emails)

CLI Example Output

[¤] Found 3 unique email address(es):

 1) contact@example.com
 2) info@example.org
 3) support@sample.net

Path validation

When using the --file option or validate_path() function, mailgrab ensures:

  • the path exists,
  • it is a valid file,
  • it can be opened for reading.

If not, a MailgrabError with a clear message is raised.
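
The three checks listed above can be sketched with pathlib. This is not mailgrab's implementation: check_readable_file and PathCheckError are hypothetical stand-ins for validate_path() and MailgrabError, used only to show the order of the checks.

```python
from pathlib import Path


class PathCheckError(Exception):
    """Hypothetical stand-in for the MailgrabError described above."""


def check_readable_file(raw_path: str) -> Path:
    """Return a Path if it exists, is a regular file, and opens for reading."""
    path = Path(raw_path)
    if not path.exists():
        raise PathCheckError(f"Path does not exist: {path}")
    if not path.is_file():
        raise PathCheckError(f"Not a regular file: {path}")
    try:
        # Opening and closing immediately confirms read permission.
        with path.open("r"):
            pass
    except OSError as exc:
        raise PathCheckError(f"Cannot open for reading: {path}") from exc
    return path


try:
    check_readable_file("does-not-exist.txt")
except PathCheckError as err:
    print(err)
```

Wrapping OSError in a single project-specific exception lets callers handle every path problem with one except clause.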

Contributing

Want to improve this project? Awesome! Please read the contributing guidelines before submitting a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
