A Python tool for scraping email addresses from websites
PyMailScraper
PyMailScraper is a Python tool for scraping email addresses from websites. It can crawl multiple pages per site up to a configurable depth and page limit, throttle its requests (optionally adjusting the delay automatically on "Too many requests" responses), and save the results to a CSV file.
Installation
You can install PyMailScraper using pip:
pip install pymailscraper
Usage
PyMailScraper can be used both as a command-line tool and as a Python library.
Command-line Usage
After installation, you can use PyMailScraper from the command line:
pymailscraper [OPTIONS]
Options:
- -u, --urls: One or more URLs to scrape, separated by spaces. Example: pymailscraper -u https://example.com https://another-example.com
- -f, --file: Path to a file containing URLs (one per line). Example: pymailscraper -f urls.txt
- -o, --output: Output CSV file path (default: "email_results.csv"). Example: pymailscraper -u https://example.com -o my_results.csv
- -d, --depth: Maximum depth to crawl (default: 3). Example: pymailscraper -u https://example.com -d 5
- -p, --pages: Maximum number of pages to crawl per website (default: 100). Example: pymailscraper -u https://example.com -p 50
- --common-pages-only: Crawl only common pages (default: False). Example: pymailscraper -u https://example.com --common-pages-only
- --use-common-pages: Use common pages when crawling (default: False). Example: pymailscraper -u https://example.com --use-common-pages
- --throttle: Delay between requests in seconds (default: 0). Example: pymailscraper -u https://example.com --throttle 1.5
- --auto-throttle: Automatically adjust the throttle on "Too many requests" responses. Example: pymailscraper -u https://example.com --auto-throttle
- --max-throttle: Maximum throttle delay in seconds (default: 5). Example: pymailscraper -u https://example.com --auto-throttle --max-throttle 10
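The -d/--depth and -p/--pages limits interact in a way that is easiest to see on a small link graph. The sketch below is a hypothetical breadth-first crawl over an in-memory dictionary, not PyMailScraper's own code. It assumes the start URL counts as depth 0 and that every newly discovered page counts against the per-site page limit; the function name crawl_order is invented for illustration.

```python
from collections import deque


def crawl_order(links, start, max_depth=3, max_pages=100):
    """Hypothetical breadth-first crawl over an in-memory link graph.

    links maps each page to the pages it links to (a stand-in for fetching
    and parsing HTML). The start page is depth 0 (an assumption), and each
    newly discovered page counts against max_pages.
    """
    visited = [start]            # pages in the order they would be fetched
    seen = {start}               # avoids queueing the same page twice
    queue = deque([(start, 0)])  # (page, depth) pairs still to expand
    while queue and len(visited) < max_pages:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue  # following links here would exceed the depth limit
        for target in links.get(page, []):
            if target in seen:
                continue
            seen.add(target)
            visited.append(target)
            if len(visited) >= max_pages:
                break  # page budget used up
            queue.append((target, depth + 1))
    return visited
```

With links = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/a", "/d"], "/c": ["/e"]}, a depth limit of 1 stops after ["/", "/a", "/b"], while a depth limit of 3 combined with a page limit of 4 stops at ["/", "/a", "/b", "/c"] even though deeper pages remain.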
Python Library Usage
You can also use PyMailScraper in your Python scripts:
```python
from pymailscraper import EmailScraper

scraper = EmailScraper(
    output_file="results.csv",
    max_depth=3,
    max_pages=100,
    throttle=1.0,
    auto_throttle=True,
)

urls = ["https://example.com", "https://another-example.com"]
scraper.run(urls)
```
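If your URLs live in a file, as with the -f/--file option, you can read them into the list that scraper.run() expects. The helpers below are hypothetical (parse_urls and load_urls are not part of PyMailScraper) and assume the same one-URL-per-line format that the command-line option describes.

```python
from pathlib import Path


def parse_urls(text):
    """Split file contents into URLs: one per line, whitespace stripped, blank lines skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_urls(path):
    """Read a URL list in the one-URL-per-line format used by -f/--file (hypothetical helper)."""
    return parse_urls(Path(path).read_text(encoding="utf-8"))
```

You would then pass the result on as before, for example scraper.run(load_urls("urls.txt")).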
Examples
- Scrape a single website:
pymailscraper -u https://example.com
- Scrape multiple websites:
pymailscraper -u https://example.com https://another-example.com
- Scrape websites from a file with custom output and depth:
pymailscraper -f urls.txt -o results.csv -d 5
- Use auto-throttling with a maximum of 50 pages per site:
pymailscraper -u https://example.com --auto-throttle -p 50
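The last example relies on --auto-throttle. This README does not say exactly how the delay is adjusted, so the following is only a sketch of one common approach, capped exponential backoff: double the delay whenever a server answers with HTTP 429 (Too Many Requests), and never exceed the --max-throttle ceiling. The function name next_delay and the 0.5-second starting delay are assumptions made for illustration.

```python
def next_delay(current, status_code, max_throttle=5.0, initial=0.5):
    """Hypothetical auto-throttle rule: double the delay on HTTP 429, capped.

    current      -- delay in seconds currently applied between requests
    status_code  -- HTTP status of the most recent response
    max_throttle -- ceiling for the delay (mirrors --max-throttle, default 5)
    initial      -- delay to start from when no throttle was set (assumption)
    """
    if status_code != 429:  # 429 = Too Many Requests; otherwise keep the delay
        return current
    return min(max(current * 2, initial), max_throttle)
```

Starting from the default throttle of 0, five consecutive 429 responses would give delays of 0.5, 1.0, 2.0, 4.0 and 5.0 seconds under this rule.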
Output
PyMailScraper saves the results in a CSV file with the following columns:
- URL: The page where the email was found
- Email: The email address
- Name: Any associated name found (if available)
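Because the same address often appears on many pages of a site, it can help to group the results by email. The sketch below uses Python's csv module and assumes the header row of the output file is literally URL,Email,Name; check your own file before relying on that. pages_by_email is a hypothetical helper, not part of PyMailScraper.

```python
import csv
from collections import defaultdict


def pages_by_email(lines):
    """Map each email address to the pages it was found on.

    lines is any iterable of CSV lines, such as an open file object.
    Assumes the header row is exactly: URL,Email,Name
    """
    found = defaultdict(list)
    for row in csv.DictReader(lines):
        email = row["Email"].strip()
        if row["URL"] not in found[email]:  # list each page once per address
            found[email].append(row["URL"])
    return dict(found)
```

To use it on a results file, open email_results.csv with newline="" and pass the file object to pages_by_email.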
Ethical Usage
Please use this tool responsibly. Always respect the website's terms of service, robots.txt files, and any legal restrictions on scraping. Be mindful of the load you're putting on websites and use throttling when appropriate.
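This README does not say whether PyMailScraper checks robots.txt itself, so one option is to filter your URL list beforehand with the standard library's urllib.robotparser. The helper allowed_urls below is hypothetical; it takes robots.txt rules you have already fetched, as a list of lines.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(urls, robots_lines, user_agent="*"):
    """Return only the URLs that the given robots.txt rules permit for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # rules already fetched, one string per line
    return [url for url in urls if parser.can_fetch(user_agent, url)]
```

To check live rules instead, create a RobotFileParser, call set_url() with the site's robots.txt address, then read(), and query can_fetch() as above.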
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.
Support
If you encounter any problems or have any questions, please open an issue on the GitHub repository.
Happy scraping!
Download files
File details
Details for the file pymailscraper-0.1.0.tar.gz.
File metadata
- Download URL: pymailscraper-0.1.0.tar.gz
- Upload date:
- Size: 7.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | a162d7d65f7a096546f813b3c6b372f27eca06aa2291fd2ec29ef5821e26981d
MD5 | 8a3d364528c0b09b6303db003320e323
BLAKE2b-256 | dc0d36a89b4635d1dc3aeca17a0db8b0b576eed793a74b9e964ae2108a765dfd
File details
Details for the file pymailscraper-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pymailscraper-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 72f0160ea820b718d46f0cf551a024d2c0be50e26c957718423e68082d97934f
MD5 | c75e81c5402e099860f748a4946efd99
BLAKE2b-256 | 1a93962a09e03a8ba0adda9380840c7740a0750e8a8e61157398f64c627b47e5