Monitors a website for changes in a text element and publishes an alert (e.g., on Telegram)

WebWatchr

This Python package is a framework around Playwright for monitoring a website and receiving an alert when the monitored text changes. The setup is modular. To specify the website to monitor, you define a Callable[[Playwright], str] that extracts the text you are interested in. Currently, the only available alerting channel is a Telegram bot; more alerting channels are planned.

[!IMPORTANT] Before you start scraping any website, please make sure that you are allowed to. Besides legal obligations, please consider reaching out to the website owner and please respect robots.txt files.

Installation

The package is available via PyPI. You can install it via

pip install web_watchr

If you prefer the latest changes, you can also install it directly from the repository via:

pip install git+https://github.com/Emrys-Merlin/web_watchr

Usage

After the installation, the intended way to invoke the framework is by writing a small runner script (which you can find here):

from playwright.sync_api import Playwright
from web_watchr import Watchr
from web_watchr.compare import DummyComparer

watchr = Watchr(
    comparer=DummyComparer(),
)


@watchr.set_poller
def poll(playwright: Playwright) -> str:
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://www.example.com/")
    text = page.get_by_role("heading").inner_text()
    context.close()
    browser.close()

    return text


if __name__ == "__main__":
    watchr()

The runner consists of three parts:

  1. A new Watchr object is initialized. For illustration purposes, a DummyComparer instance is passed to it, which will indicate that the monitored text has changed no matter the input.
  2. We implement the poll function and decorate it with @watchr.set_poller. The poll function contains all the website-specific logic to extract the text of interest. Most of this function can be automatically generated using playwright codegen.
  3. We invoke watchr, which will poll the website once.
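The three parts above follow a poll, compare, alert flow. The following is a minimal sketch of that flow in plain Python, not web_watchr's actual implementation: the names `run_once` and `always_changed` are hypothetical and only illustrate how a poller, a comparer, and an alerter could fit together.

```python
from typing import Callable


def always_changed(text: str) -> bool:
    """Hypothetical stand-in for a dummy comparer: reports a change for any input."""
    return True


def run_once(
    poll: Callable[[], str],
    has_changed: Callable[[str], bool],
    alert: Callable[[str], None],
) -> bool:
    """Poll once, and alert only if the comparer reports a change."""
    text = poll()
    if has_changed(text):
        alert(text)
        return True
    return False


if __name__ == "__main__":
    # Printing to stdout mirrors the default behaviour described in this README.
    run_once(lambda: "Example Domain", always_changed, print)
```

Keeping the three roles as separate callables is what makes it possible to swap the comparer or the alerter without touching the website-specific polling code.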

By default, watchr simply prints the text to stdout. To receive alerts on your phone via Telegram, the script needs a small modification:

import os

from playwright.sync_api import Playwright
from web_watchr import Watchr
from web_watchr.alert import TelegramAlerter

watchr = Watchr(
    alerter=TelegramAlerter(
        token=os.getenv("TELEGRAM_TOKEN"),
        chat_id=os.getenv("TELEGRAM_CHAT_ID"),
    )
)


@watchr.set_poller
def poll(playwright: Playwright) -> str:
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://www.example.com/")
    text = page.get_by_role("heading").inner_text()
    context.close()
    browser.close()

    return text


if __name__ == "__main__":
    watchr()

There are two key changes compared to the initial script:

  1. We removed the DummyComparer. By default, Watchr uses an FSComparer, which stores the old state in a file. The default location is ~/.local/share/web_watchr/cache, which can be changed. This has the advantage that the runner does not need to run continuously, but can be invoked periodically (e.g., via cron).
  2. We instantiated a TelegramAlerter that reads a token and a chat_id from environment variables. These are the secrets your bot needs in order to send messages. If you are unsure how to create a bot, please have a look here. To find out your chat_id, you can use the approach mentioned here.
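To illustrate why a file-backed comparer allows periodic invocation, here is a minimal sketch of the idea. It is not the FSComparer shipped with web_watchr: the class name `FileBackedComparer`, its `has_changed` method, and the cache path passed to it are all assumptions made for this example.

```python
from pathlib import Path


class FileBackedComparer:
    """Hypothetical comparer that persists the last seen text in a file."""

    def __init__(self, cache_file: Path) -> None:
        self.cache_file = cache_file

    def has_changed(self, text: str) -> bool:
        # Read the previously stored text, if a cache file exists.
        old = (
            self.cache_file.read_text(encoding="utf-8")
            if self.cache_file.exists()
            else None
        )
        if old == text:
            return False
        # Store the new text so the next invocation compares against it.
        self.cache_file.parent.mkdir(parents=True, exist_ok=True)
        self.cache_file.write_text(text, encoding="utf-8")
        return True
```

Because the state lives on disk rather than in memory, a fresh process started later (for example by cron) still compares against the text seen in the previous run. In this sketch, the very first run counts as a change because no cached text exists yet.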

[!CAUTION] Keep your bot token secret. In particular, make sure to never add it to version control. Otherwise, malicious actors can use it for their own purposes.
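Since os.getenv returns None when a variable is unset, it can help to fail early with a clear message instead of passing None on as a credential. The helper below is a small sketch of that idea; `require_env` is a hypothetical name and not part of web_watchr.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

In the runner script, `require_env("TELEGRAM_TOKEN")` and `require_env("TELEGRAM_CHAT_ID")` could then replace the two os.getenv calls, so the secrets stay out of the source file and a missing variable is reported immediately.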

Running the script will now send updates to your phone via Telegram!

Documentation

So far, almost all of the documentation is restricted to this readme. However, you can have a look at the API Reference.

Contribution

If you like what you see and would like to extend it, you can do so by

  • filing an issue with a feature request (no promises on my part though) and
  • forking the repo and opening a pull request.

I'm always happy to chat, so you can also simply reach out and we can talk.
