
Customizable Web Scraper to get alerts when criteria are met on websites.

Project description

WebScraper

  • This program scrapes data from websites using different scrapers and sends an email when it finds a match or a change, depending on the scraper used
  • There are 2 types of scrapers:
    • Generic: can scrape any website, but may be less precise
    • Specific: can scrape only specific websites, but is more precise

Generic Scrapers

  • Text
  • Diff

Specific Scrapers

  • Cars.com

How to use

Text

  1. Set these scraper-specific env variables:

     SCRAPER=text # Scraper to use
     URL=<URL> # URL to scrape
     TEXT=<TEXT> # Text to look for

  2. Ensure all other required env variables are set
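A minimal Python sketch of what one pass of the text scraper might do, assuming it downloads the page at URL and checks whether TEXT occurs in it. The helper names (`fetch_page`, `text_matches`, `check_text_scraper`) and the case-sensitivity option are hypothetical; the package's own matching rules are not described in this README.

```python
import urllib.request


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download URL and decode the body as text (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": "web-scraper-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def text_matches(page: str, text: str, case_sensitive: bool = True) -> bool:
    """True when TEXT occurs anywhere in the page (plain substring search)."""
    if case_sensitive:
        return text in page
    return text.casefold() in page.casefold()


def check_text_scraper(url: str, text: str) -> bool:
    """One scrape: fetch URL and report whether TEXT was found (assumed behavior)."""
    return text_matches(fetch_page(url), text)
```

The search runs over raw HTML, so TEXT can also match inside tags or attributes; stripping markup before the check would be one possible refinement.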

Diff

  1. Set these scraper-specific env variables:

     SCRAPER=diff # Scraper to use
     URL=<URL> # URL to scrape
     PERCENTAGE=<PERCENTAGE_DIFF> # Percentage difference to look for

  2. Ensure all other required env variables are set
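One way to turn PERCENTAGE into a check, sketched in Python. It assumes the scraper compares the previous and current copies of the page and alerts when the difference reaches the threshold. The line-based `difflib.SequenceMatcher` ratio is a swapped-in metric, not necessarily what the package uses, and the function names are hypothetical.

```python
import difflib


def percent_changed(old: str, new: str) -> float:
    """Percentage difference between two snapshots based on matching lines.

    0.0 means identical, 100.0 means no lines in common.
    """
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines(), autojunk=False)
    return (1.0 - matcher.ratio()) * 100.0


def should_alert(old: str, new: str, percentage: float) -> bool:
    """Alert once the change reaches the PERCENTAGE threshold (assumed inclusive)."""
    return percent_changed(old, new) >= percentage
```

Comparing lines rather than characters keeps large HTML pages fast to diff, at the cost of counting a one-character edit as a whole changed line.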

Cars.com

  1. Set these scraper-specific env variables:

     SCRAPER=cars_com # Scraper to use
     URL=https://www.cars.com/shopping/results/ # URL to scrape; must be a results page for a specific search

  2. Ensure all other required env variables are set
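Because the cars.com scraper requires URL to be a results page for a specific search, a startup check along these lines could reject other URLs early. This is a hypothetical sketch: `is_cars_com_results_url` and `new_listings` are illustrative names, it assumes for illustration that an alert means new listings appeared since the last scrape, and it does not show how listing IDs are extracted from the page because the README does not describe that.

```python
from urllib.parse import urlparse


def is_cars_com_results_url(url: str) -> bool:
    """True when URL is an http(s) cars.com link on the /shopping/results/ page."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse already lowercases the hostname
    return (
        parsed.scheme in ("http", "https")
        and (host == "cars.com" or host.endswith(".cars.com"))
        # Appending "/" accepts both "/shopping/results" and "/shopping/results/..."
        and (parsed.path + "/").startswith("/shopping/results/")
    )


def new_listings(previous: set[str], current: set[str]) -> set[str]:
    """Hypothetical: listing IDs seen on this scrape that were absent last time."""
    return current - previous
```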

Required env variables

SLEEP_TIME_SEC= # Time to sleep between scrapes, in seconds
SENDER_EMAIL= # Email address to send from
FROM_EMAIL= # Display name and address for the From header, e.g. '"Web Scraper" <no-reply@jstockley.com>'
RECEIVER_EMAIL= # Email address to send alerts to
PASSWORD= # Password for the sender's email account
SMTP_SERVER= # SMTP server hostname
SMTP_PORT= # SMTP server port
TLS= # True/False: whether to use TLS
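A sketch of how these variables could be read and used to send an alert with Python's standard `smtplib` and `email` modules. The `Settings` class, the function names, and the choice of STARTTLS when `TLS=True` are assumptions; some providers expect implicit TLS (typically port 465, via `smtplib.SMTP_SSL`) instead, and the README does not say which the package uses.

```python
import os
import smtplib
import time
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Callable, Mapping, Optional

REQUIRED = (
    "SLEEP_TIME_SEC", "SENDER_EMAIL", "FROM_EMAIL", "RECEIVER_EMAIL",
    "PASSWORD", "SMTP_SERVER", "SMTP_PORT", "TLS",
)


@dataclass(frozen=True)
class Settings:  # hypothetical container for the required env variables
    sleep_time_sec: int
    sender_email: str
    from_email: str
    receiver_email: str
    password: str
    smtp_server: str
    smtp_port: int
    tls: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate and convert the required env variables once, at startup."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required env variables: " + ", ".join(missing))
    tls = env["TLS"].strip().lower()
    if tls not in ("true", "false"):
        raise ValueError("TLS must be True or False")
    return Settings(
        sleep_time_sec=int(env["SLEEP_TIME_SEC"]),
        sender_email=env["SENDER_EMAIL"],
        from_email=env["FROM_EMAIL"],
        receiver_email=env["RECEIVER_EMAIL"],
        password=env["PASSWORD"],
        smtp_server=env["SMTP_SERVER"],
        smtp_port=int(env["SMTP_PORT"]),
        tls=(tls == "true"),
    )


def build_alert(settings: Settings, subject: str, body: str) -> EmailMessage:
    """FROM_EMAIL is the visible sender; RECEIVER_EMAIL gets the alert."""
    msg = EmailMessage()
    msg["From"] = settings.from_email
    msg["To"] = settings.receiver_email
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(settings: Settings, msg: EmailMessage) -> None:
    """Log in as SENDER_EMAIL/PASSWORD; upgrade with STARTTLS when TLS is true (assumption)."""
    with smtplib.SMTP(settings.smtp_server, settings.smtp_port) as smtp:
        if settings.tls:
            smtp.starttls()
        smtp.login(settings.sender_email, settings.password)
        smtp.send_message(msg)


def run_forever(check: Callable[[], Optional[str]], settings: Settings) -> None:
    """Run one scrape, email any alert text it returns, then sleep SLEEP_TIME_SEC."""
    while True:
        alert = check()
        if alert:
            send_alert(settings, build_alert(settings, "Web Scraper alert", alert))
        time.sleep(settings.sleep_time_sec)
```

Reading everything once at startup means a typo such as `TLS=yes` fails immediately rather than at the first alert.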

Running multiple scrapers of the same type

To run two or more scrapers of the same type (e.g. two diff scrapers), give each one its own host folder mapping for /app/data/. For example:

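services:
  diff-scraper-1:
    image: jnstockley/web-scraper:latest
    volumes:
      - ./diff-scraper-1-data/:/app/data/ # unique host folder for this scraper
    environment:
      - TZ=America/Chicago
      - SCRAPER=diff
      - URL=https://google.com
      - PERCENTAGE=5
      - SLEEP_TIME_SEC=21600
      # plus the other required env variables (SENDER_EMAIL, FROM_EMAIL, RECEIVER_EMAIL, PASSWORD, SMTP_SERVER, SMTP_PORT, TLS)

  diff-scraper-2:
    image: jnstockley/web-scraper:latest
    volumes:
      - ./diff-scraper-2-data/:/app/data/ # unique host folder for this scraper
    environment:
      - TZ=America/Chicago
      - SCRAPER=diff
      - URL=https://yahoo.com
      - PERCENTAGE=5
      - SLEEP_TIME_SEC=21600
      # plus the other required env variables (SENDER_EMAIL, FROM_EMAIL, RECEIVER_EMAIL, PASSWORD, SMTP_SERVER, SMTP_PORT, TLS)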
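The separate host folders presumably matter because each container keeps its state under /app/data/. The sketch below assumes, hypothetically, that a scraper stores its last snapshot under a fixed file name (`last_page.html`) in that directory; under that assumption, two scrapers sharing one folder overwrite each other's snapshot, while separate folders keep them apart.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical: a fixed snapshot file name inside the mounted /app/data/ folder.
SNAPSHOT_NAME = "last_page.html"


def load_snapshot(data_dir: Path) -> Optional[str]:
    """Return the previous snapshot, or None on the first run."""
    path = data_dir / SNAPSHOT_NAME
    return path.read_text(encoding="utf-8") if path.exists() else None


def save_snapshot(data_dir: Path, page: str) -> None:
    """Overwrite the snapshot so the next scrape compares against this page."""
    data_dir.mkdir(parents=True, exist_ok=True)
    (data_dir / SNAPSHOT_NAME).write_text(page, encoding="utf-8")


def demo() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Compare one shared host folder with one folder per scraper."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp) / "shared-data"
        save_snapshot(shared, "google.com page")
        save_snapshot(shared, "yahoo.com page")  # second scraper clobbers the first
        first = Path(tmp) / "diff-scraper-1-data"
        second = Path(tmp) / "diff-scraper-2-data"
        save_snapshot(first, "google.com page")
        save_snapshot(second, "yahoo.com page")
        return load_snapshot(shared), load_snapshot(first), load_snapshot(second)
```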


This version

1.2.5

Download files


Source Distribution

js_web_scraper-1.2.5.tar.gz (19.1 kB)

Uploaded Source

Built Distribution


js_web_scraper-1.2.5-py3-none-any.whl (21.1 kB)

Uploaded Python 3

File details

Details for the file js_web_scraper-1.2.5.tar.gz.

File metadata

  • Download URL: js_web_scraper-1.2.5.tar.gz
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for js_web_scraper-1.2.5.tar.gz
Algorithm     Hash digest
SHA256        433f00cc6af72c28bfa0aa553b9ef352c624ebf8faeeba4c8a283123fc6e782c
MD5           c57436fb188946e66b603ed24da47f82
BLAKE2b-256   2dab1b5febeffa6d8c7c98be7b513528df40240278a4b7d567e06c9eba0241a2
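To check a downloaded archive against the published SHA256 digest, a short Python helper using the standard `hashlib` module is enough. The digest below is the one listed for js_web_scraper-1.2.5.tar.gz; the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for js_web_scraper-1.2.5.tar.gz
EXPECTED_SHA256 = "433f00cc6af72c28bfa0aa553b9ef352c624ebf8faeeba4c8a283123fc6e782c"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so it is never read into memory all at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify_download(Path("js_web_scraper-1.2.5.tar.gz"))` should return True for an intact download. The same check applies to the wheel with its own SHA256 digest. pip can also enforce digests at install time through `--hash=sha256:<digest>` entries in a requirements file, which switches pip into hash-checking mode.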


Provenance

The following attestation bundles were made for js_web_scraper-1.2.5.tar.gz:

Publisher: ci-cd.yml on jnstockley/web-scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file js_web_scraper-1.2.5-py3-none-any.whl.

File metadata

  • Download URL: js_web_scraper-1.2.5-py3-none-any.whl
  • Upload date:
  • Size: 21.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for js_web_scraper-1.2.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        ed0e7483ef53587eb825ad2e76b4fdb197aaa9c74844f6cb7619f0b425c6b8da
MD5           1ae39f7d5d72bf6bffc13981c3c8ef7d
BLAKE2b-256   2fb57dfe85ebad9b153f7ec28a7d7b04a594d3739b3b595dd7ae8d1538ebc5bd


Provenance

The following attestation bundles were made for js_web_scraper-1.2.5-py3-none-any.whl:

Publisher: ci-cd.yml on jnstockley/web-scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
