Extract email addresses from a given URL.

Project description

Extract emails from a given website

Requirements

  • Python >= 3.6

  • requests

  • selenium
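
The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name `check_requirements` is an assumption made for illustration and is not part of extract_emails.

```python
import importlib.util
import sys

# Requirements as listed above: Python >= 3.6 plus requests and selenium.
MIN_PYTHON = (3, 6)
REQUIRED_MODULES = ("requests", "selenium")


def check_requirements(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a dict mapping each requirement to True when it is satisfied.

    Hypothetical helper for illustration; not provided by extract_emails.
    """
    status = {"python": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print("{}: {}".format(requirement, "ok" if ok else "missing"))
```

Anything reported as missing can be installed with pip before running the usage examples.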

Installation

pip install extract_emails

Usage

With default browsers

from extract_emails import EmailExtractor
from extract_emails.browsers import ChromeBrowser


with ChromeBrowser() as browser:
    email_extractor = EmailExtractor("http://www.tomatinos.com/", browser, depth=2)
    emails = email_extractor.get_emails()


for email in emails:
    print(email)
    print(email.as_dict())

# Email(email="bakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
# {'email': 'bakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
# Email(email="freshlybakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
# {'email': 'freshlybakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}


# The same example using RequestsBrowser instead of ChromeBrowser
from extract_emails import EmailExtractor
from extract_emails.browsers import RequestsBrowser


with RequestsBrowser() as browser:
    email_extractor = EmailExtractor("http://www.tomatinos.com/", browser, depth=2)
    emails = email_extractor.get_emails()


for email in emails:
    print(email)
    print(email.as_dict())

# Email(email="bakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
# {'email': 'bakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
# Email(email="freshlybakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
# {'email': 'freshlybakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
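
Since each result exposes `as_dict()`, the extracted emails can be written to a CSV file with the standard library. This is a rough sketch, not something the package provides: `emails_to_csv` and `StandInEmail` are hypothetical names, and `StandInEmail` only mimics the two fields shown in the output above so that the example runs without a browser.

```python
import csv
import io


class StandInEmail:
    """Hypothetical stand-in for the Email objects printed above."""

    def __init__(self, email, source_page):
        self.email = email
        self.source_page = source_page

    def as_dict(self):
        return {"email": self.email, "source_page": self.source_page}


def emails_to_csv(emails, stream):
    """Write one CSV row per distinct address; return the number of rows."""
    writer = csv.DictWriter(stream, fieldnames=["email", "source_page"])
    writer.writeheader()
    seen = set()
    rows = 0
    for item in emails:
        record = item.as_dict()
        # Skip addresses that were already written (exact string match).
        if record["email"] in seen:
            continue
        seen.add(record["email"])
        writer.writerow(record)
        rows += 1
    return rows


sample = [
    StandInEmail("bakedincloverdale@gmail.com", "http://www.tomatinos.com/"),
    StandInEmail("freshlybakedincloverdale@gmail.com", "http://www.tomatinos.com/"),
    StandInEmail("bakedincloverdale@gmail.com", "http://www.tomatinos.com/"),
]
buffer = io.StringIO()
written = emails_to_csv(sample, buffer)
print(buffer.getvalue())
```

In real use, the list returned by `get_emails()` would be passed in place of `sample`, and an open file (created with `newline=""`) in place of the `StringIO` buffer.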

With custom browser

from extract_emails import EmailExtractor
from extract_emails.browsers import BrowserInterface

from selenium import webdriver
from selenium.webdriver.firefox.options import Options


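class FirefoxBrowser(BrowserInterface):
    """Custom browser backed by Selenium's Firefox driver."""

    def __init__(self):
        ff_options = Options()
        # The geckodriver path is machine-specific; point it at your own install.
        # Note: `executable_path` is deprecated in Selenium 4 and was later
        # removed in favor of passing the path through a Service object.
        self._driver = webdriver.Firefox(
            options=ff_options, executable_path="/home/di/geckodriver",
        )

    def close(self):
        # Shut down the browser and its driver process.
        self._driver.quit()

    def get_page_source(self, url: str) -> str:
        # Load the page and return its rendered HTML.
        self._driver.get(url)
        return self._driver.page_source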


with FirefoxBrowser() as browser:
    email_extractor = EmailExtractor("http://www.tomatinos.com/", browser, depth=2)
    emails = email_extractor.get_emails()

for email in emails:
    print(email)
    print(email.as_dict())

# Email(email="bakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
# {'email': 'bakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
# Email(email="freshlybakedincloverdale@gmail.com", source_page="http://www.tomatinos.com/")
# {'email': 'freshlybakedincloverdale@gmail.com', 'source_page': 'http://www.tomatinos.com/'}
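
The custom browser above implements two methods, `close` and `get_page_source`. As a lighter illustration of the same shape, here is a sketch built on the standard library's `urllib` instead of Selenium. It is deliberately standalone so it runs without extract_emails installed: `UrllibBrowser` and its `timeout` parameter are hypothetical, and in real use the class would subclass `BrowserInterface` as shown above. The `__enter__` and `__exit__` methods are included only so this standalone sketch works in a `with` block.

```python
import urllib.request


class UrllibBrowser:
    """Standalone sketch of a browser with close() and get_page_source().

    Hypothetical class for illustration; not part of extract_emails.
    """

    def __init__(self, timeout=10.0):
        self._timeout = timeout
        self._closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def close(self):
        # urllib keeps no persistent session, so only a flag is recorded.
        self._closed = True

    def get_page_source(self, url: str) -> str:
        if self._closed:
            raise RuntimeError("browser is closed")
        with urllib.request.urlopen(url, timeout=self._timeout) as response:
            # Fall back to UTF-8 when the response declares no charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # A data: URL keeps the demo offline; a real page URL works the same way.
    with UrllibBrowser() as demo_browser:
        print(demo_browser.get_page_source("data:text/plain,contact:info@example.com"))
```

Unlike a Selenium-based browser, this sketch does not execute JavaScript, so it only sees addresses present in the raw page source.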


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

extract_emails-4.0.3.tar.gz (14.5 kB)

Uploaded Source

Built Distribution

extract_emails-4.0.3-py2.py3-none-any.whl (17.6 kB)

Uploaded Python 2, Python 3

File details

Details for the file extract_emails-4.0.3.tar.gz.

File metadata

  • Download URL: extract_emails-4.0.3.tar.gz
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.2.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6

File hashes

Hashes for extract_emails-4.0.3.tar.gz
Algorithm    Hash digest
SHA256       ec1d7ac7e8e891ef78dc45f72681ad84b4b285467fad93461ec592b3260056aa
MD5          01113f602947ce8cf4ec0dd5600af29e
BLAKE2b-256  a3fa007654d140f58ebb5d1b136bac636484ac55103e0ed03ee96dff96ab6be1

File details

Details for the file extract_emails-4.0.3-py2.py3-none-any.whl.

File metadata

  • Download URL: extract_emails-4.0.3-py2.py3-none-any.whl
  • Size: 17.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.2.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6

File hashes

Hashes for extract_emails-4.0.3-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       c7c0f558134970f83ef1573338de468efff6b24c7846032d3e0ddffb9fcb3ace
MD5          6506a87adecf9bb7ef5a810c0bdbef6f
BLAKE2b-256  b2aeab0304fc0985c89c16e3aa2f420c9fc20d09a3a483744adc866000fa5aa8