
😁 Welcome!!


🌐 Dynamic Web Scraper

💻 Windows and Linux compatible. 💻

This is a dynamic web scraper designed for websites that load elements after the initial page load (such as AngularJS apps). It runs at specified time intervals, so you can monitor when a new element is added to a website instead of having to refresh it manually.

Instead of parsing the raw page source, it uses Selenium with a Firefox driver to wait until all elements have loaded before retrieving the results.

Whenever a new element is discovered, the scraper notifies you and saves it to a file, so you are not notified again for that same element in the future.

🚀 Quick start

pipx install dynamic-scraper
dynamic-scraper -u "https://www.example.com" -s "search this text"

💡 Use case examples

This is useful, for example, to get notified when a certain keyword appears on a website, such as:

  • New job on a job board
  • New product on an online store
  • New article on a blog
  • ...

✨ Features

  • Automated Scraping: Runs at user-defined intervals, extracting data without manual input.
  • Notification System: Notifies users via Windows notifications when new data is found.
  • Robust Parsing: Utilizes customizable search strings and regular expressions for data extraction.

📦 Installation

(See the Manual Installation section below.)

From PyPI

pipx is optional but recommended; you can use pip instead.

pipx:

pipx install dynamic-scraper

pip:

pip install dynamic-scraper

You can also clone the repository and install:

git clone https://github.com/P-ict0/Dynamic-Web-Scraper.git
cd Dynamic-Web-Scraper
pipx install .

📲 Usage

For help:

dynamic-scraper --help

General usage:

dynamic-scraper -u "https://www.example.com" -s "search this text"

Also see the Common errors section below if you encounter any issues with the browser.

⚙ Options

| Option | Short form | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| `--url` | `-u` | Required | None | The URL of the webpage from which to fetch data. |
| `--search-string` | `-s` | Required | None | The string to search for within the webpage. |
| `--version` | `-V` | Optional | None | Print the current version. |
| `--regex` | `-r` | Optional | `search_string` | The regular expression pattern used to store the results nicely. |
| `--interval` | `-i` | Optional | 5 | The interval in minutes at which the script runs repeatedly. |
| `--json_path` | `-j` | Optional | Windows: `%APPDATA%/Roaming/Dynamic-Scraper/results.json`; Linux: `$HOME/.dynamic_scraper_results.json` | The file path where found results are saved as JSON. |
| `--use-previous` | `-p` | Optional | False | Use results from previous runs, if present. |
| `--no-headless` | (none) | Optional | False | Disable headless mode for the webdriver and run maximized. |
| `--verbose` | `-v`, `-vv` | Optional | WARNING | Increase verbosity: INFO for `-v`, DEBUG for `-vv`. |
| `--quiet` | `-q` | Optional | False | Suppress all notifications; only print output to the console. |
| `--locator-type` | `-t` | Optional | `xpath` | Type of locator used to wait for the element to load (various HTML attribute types supported). |
| `--locator-value` | `-l` | Optional | `//section[@class='list-item ng-scope']` | Value of the locator to search for. |
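The platform-dependent default for `--json_path` could be resolved like this (an assumed sketch based on the defaults listed above, not the project's actual code):

```python
import os
import sys
from pathlib import Path


def default_results_path() -> Path:
    if sys.platform.startswith("win"):
        # %APPDATA% already expands to ...\AppData\Roaming on Windows.
        return Path(os.environ.get("APPDATA", "")) / "Dynamic-Scraper" / "results.json"
    return Path.home() / ".dynamic_scraper_results.json"
```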

✏ Manual Installation

git clone https://github.com/P-ict0/AngularJS-Dynamic-Web-Scraper.git
cd AngularJS-Dynamic-Web-Scraper

It is recommended to use a virtual environment:

python3 -m venv venv

source venv/bin/activate  # Linux
venv\Scripts\activate  # Windows
pip install -r requirements.txt

You can now run:

python src/web_scraper/scraper.py [args]

❌ Common errors

You may also need to install the latest geckodriver (from the Mozilla geckodriver GitHub releases) and add it to your PATH.

👥 Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your suggested changes.
