Watch webpages for changes

WatchPage

Description: Watch webpages for changes

Copyright: 2022-2023 Fabio Castelli (Muflone) muflone@muflone.com

License: GPL-3+

Source code: https://github.com/muflone/watchpage

Documentation: http://www.muflone.com/watchpage/

Description

WatchPage is a simple tool to watch multiple web pages for changes.

It aims to make it easier for software maintainers to check project websites for changes and to get news matching specific patterns.

System Requirements

Usage

WatchPage is a command-line utility that requires some arguments:

watchpage --config <CONFIGURATION> --results <RESULTS> [--dump] [--agent <USER AGENT>]

The argument --config refers to a valid YAML configuration file (see below for some examples).

The argument --results must be the path to a directory where the result files will be saved.

The argument --dump will show the results but discard the changes, so they will not be saved in the directory specified by the --results argument.

The argument --agent sets the default User-Agent for HTTP/HTTPS requests. If it is not specified, the default WatchPage user agent is used. You can also pass "" to omit the default user agent.
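To make the usage line above concrete, here is a hypothetical argparse sketch that mirrors the documented options: --config and --results are required, --dump is a flag, and --agent takes a value that may be "". It is not WatchPage's own code, and the real option handling may differ; build_parser is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented WatchPage options."""
    parser = argparse.ArgumentParser(prog='watchpage')
    # Required: YAML configuration file and results directory
    parser.add_argument('--config', required=True)
    parser.add_argument('--results', required=True)
    # Optional flag: show the results without saving the changes
    parser.add_argument('--dump', action='store_true')
    # Optional: default User-Agent; None means "not specified", "" means omit
    parser.add_argument('--agent', default=None)
    return parser


args = build_parser().parse_args(
    ['--config', 'docs/muflone_apps.yaml', '--results', 'output', '--agent', ''])
print(args.dump, repr(args.agent))  # False ''
```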

An example of running WatchPage is the following:

watchpage --config docs/muflone_apps.yaml --results output

All the targets specified in the configuration file muflone_apps.yaml will be processed, the results will be saved in the output directory and the differences will be printed to stdout.

Running the same command again will not produce any results, as there have been no further changes since the previous run. The saved items are stored in the directory specified by the --results argument.

By adding --dump, you can inspect the returned values without saving the changes.
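The storage format of the results directory is not described here, so the following is only a hypothetical sketch of the behavior explained above: items saved by a previous run are not reported again, and dump mode reports new items without saving them. The one-text-file-per-NAME layout and the process_target name are assumptions, not WatchPage's actual implementation.

```python
import tempfile
from pathlib import Path


def process_target(results_dir: Path, name: str, items: list[str],
                   dump: bool = False) -> list[str]:
    """Return the items not seen in a previous run.

    Assumption: one file per target, named after NAME, one item per line.
    """
    results_file = results_dir / f'{name}.txt'
    previous = set()
    if results_file.exists():
        previous = set(results_file.read_text(encoding='utf-8').splitlines())
    # Keep only the new items, preserving their order in the page
    new_items = [item for item in items if item not in previous]
    if new_items and not dump:
        # Save the new items so the next run will not report them again
        with results_file.open('a', encoding='utf-8') as file:
            file.writelines(f'{item}\n' for item in new_items)
    return new_items


with tempfile.TemporaryDirectory() as tmp:
    output = Path(tmp)
    print(process_target(output, 'watchpage', ['a', 'b'], dump=True))  # ['a', 'b']
    print(process_target(output, 'watchpage', ['a', 'b']))  # ['a', 'b']
    print(process_target(output, 'watchpage', ['a', 'b', 'c']))  # ['c']
```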

Configuration file

A configuration file is a YAML specification file with the following values:

  • NAME: a unique string to identify the target to process

  • URL: the page URL to monitor for changes

    You can also specify github:owner/repository (for example github:muflone/watchpage) to point to a GitHub repository

  • PARSER: the parser to use to process the URL. This can be one of:

    • html.parser: this will use Python's built-in HTML parser
    • html5lib: this will use html5lib to process the page
    • lxml: this will use lxml HTML parser
    • xml: this will use lxml XML parser
  • TYPE: specify the type of items to process from the page. This value can be:

    • links: will get all the anchors from an HTML page
    • rss: will get all the link items from an RSS feed
    • text: will process the page as a simple text file
    • github-tags: will get all the tag anchors from a GitHub repository
    • github-tags-zip: will get all the tag anchors from a GitHub repository, keeping only those in .zip format
    • github-tags-tgz: will get all the tag anchors from a GitHub repository, keeping only those in .tar.gz format
  • ABSOLUTE_URLS: a boolean value (true/false) to convert the processed URLs into absolute URLs by prepending the website address taken from the page URL

  • FILTERS: a list of filters to apply to find the matched items. This can be any of the following:

    • STARTS: the item must begin with the specified string
    • NOT STARTS: the item must not begin with the specified string
    • ENDS: the item must end with the specified string
    • NOT ENDS: the item must not end with the specified string
    • CONTAINS: the item must contain the specified string
    • NOT CONTAINS: the item must not contain the specified string
    • REGEX: the item must match the specified regular expression string
    • NOT REGEX: the item must not match the specified regular expression string
    • TRIM: removes spaces or the specified characters from both left and right
    • LTRIM: removes spaces or the specified characters from the left
    • RTRIM: removes spaces or the specified characters from the right
    • PREPEND: prepend (insert at the start) the specified text
    • APPEND: append (insert at the end) the specified text
    • REMOVE: remove from the item the specified text
    • REPLACE: replace the specified text in the item with new text (specified using WITH:)
    • REVERSE: reverse the item text
    • UPPER: makes the text uppercase
    • LOWER: makes the text lowercase
    • LEFT: return the specified number of leftmost characters
    • RIGHT: return the specified number of rightmost characters
    • REGEX REPLACE: replace the matches of the specified regular expression in the item with a new pattern (specified using WITH:)
    • REGEX SEARCH: return the first regular expression match
    • JSON DICT: return the value from a JSON dict with the specified key
    • JSON LIST: return the value from a JSON list with the specified index
  • HEADERS: a dictionary with the headers to set for the request

  • STATUS: a boolean value (true/false) to enable or disable the target
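The FILTERS entries can be read as a pipeline in which each filter either keeps (and possibly transforms) the item or discards it. The following is a hypothetical Python sketch of a subset of them, not WatchPage's code. Assumptions: filters run in the order they are listed, REGEX and NOT REGEX use re.search, and filters without a value (UPPER, LOWER, REVERSE, or TRIM without characters) are written with an empty YAML value, which arrives as None. LEFT, RIGHT, REGEX SEARCH and the JSON filters are left out because their exact parameters are not specified above.

```python
import re
from typing import Callable, Optional

# Hypothetical sketch, not WatchPage's own code. Each function receives the
# current item, the filter value and the optional WITH value, and returns the
# transformed item, or None when the item must be discarded.
Filter = Callable[[str, Optional[str], Optional[str]], Optional[str]]

FILTERS: dict[str, Filter] = {
    'STARTS': lambda item, value, _: item if item.startswith(value) else None,
    'NOT STARTS': lambda item, value, _: None if item.startswith(value) else item,
    'ENDS': lambda item, value, _: item if item.endswith(value) else None,
    'NOT ENDS': lambda item, value, _: None if item.endswith(value) else item,
    'CONTAINS': lambda item, value, _: item if value in item else None,
    'NOT CONTAINS': lambda item, value, _: None if value in item else item,
    # Assumption: "match" means re.search (anywhere in the item)
    'REGEX': lambda item, value, _: item if re.search(value, item) else None,
    'NOT REGEX': lambda item, value, _: None if re.search(value, item) else item,
    # An empty or missing value strips whitespace
    'TRIM': lambda item, value, _: item.strip(value or None),
    'LTRIM': lambda item, value, _: item.lstrip(value or None),
    'RTRIM': lambda item, value, _: item.rstrip(value or None),
    'PREPEND': lambda item, value, _: value + item,
    'APPEND': lambda item, value, _: item + value,
    'REMOVE': lambda item, value, _: item.replace(value, ''),
    'REPLACE': lambda item, value, new: item.replace(value, new or ''),
    'REGEX REPLACE': lambda item, value, new: re.sub(value, new or '', item),
    'REVERSE': lambda item, value, _: item[::-1],
    'UPPER': lambda item, value, _: item.upper(),
    'LOWER': lambda item, value, _: item.lower(),
}


def apply_filters(item: str, filters: list[dict]) -> Optional[str]:
    """Apply the filters in order; stop as soon as one discards the item."""
    for spec in filters:
        # Each entry holds one filter key plus an optional WITH key
        name = next(key for key in spec if key != 'WITH')
        result = FILTERS[name](item, spec[name], spec.get('WITH'))
        if result is None:
            return None
        item = result
    return item


filters = [{'ENDS': '.tar.gz'},
           {'APPEND': '.something'},
           {'REPLACE': '.something', 'WITH': '.different'}]
print(apply_filters('v0.4.1.tar.gz', filters))  # v0.4.1.tar.gz.different
print(apply_filters('v0.4.1.zip', filters))  # None
```

The dictionary of small functions keeps each documented filter next to its one-line meaning, so adding a filter means adding one entry.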

Configuration example files

Some configuration example files can be found in the docs directory.

NAME: watchpage
URL: https://github.com/muflone/watchpage/tags
PARSER: html5lib
TYPE: links
ABSOLUTE_URLS: true
FILTERS:
  - STARTS: 'https://github.com/muflone/'
  - ENDS: '.tar.gz'
STATUS: true

This configuration file will use the html5lib parser to scan the page for all the links that begin with https://github.com/muflone/ and end with .tar.gz.
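As an illustration of what TYPE: links with ABSOLUTE_URLS: true and the two filters above amount to, here is a minimal sketch. For simplicity it uses the standard library html.parser module whatever the PARSER setting, and urljoin to resolve relative links against the page URL. The sample HTML is made up, and LinkCollector and extract_links are hypothetical names, not part of WatchPage.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (TYPE: links)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)


def extract_links(html: str, page_url: str, absolute_urls: bool) -> list[str]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    if absolute_urls:
        # Resolve relative links against the page URL (ABSOLUTE_URLS: true)
        return [urljoin(page_url, link) for link in collector.links]
    return collector.links


# Made-up page content
html = '''<a href="/muflone/watchpage/archive/refs/tags/0.4.1.tar.gz">tgz</a>
<a href="/muflone/watchpage/archive/refs/tags/0.4.1.zip">zip</a>
<a href="https://example.com/other">other</a>'''
links = extract_links(html, 'https://github.com/muflone/watchpage/tags', True)
# Same filters as the configuration above: STARTS, then ENDS
matches = [link for link in links
           if link.startswith('https://github.com/muflone/')
           and link.endswith('.tar.gz')]
print(matches)
```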


NAME: watchpage
URL: github:muflone/watchpage
PARSER: html5lib
TYPE: github-tags-tgz
ABSOLUTE_URLS: true
STATUS: true

This configuration file will use the html5lib parser to scan all the tag links of the GitHub repository, extracting only the tags ending with .tar.gz.
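A rough sketch of the github-tags-tgz and github-tags-zip types follows, under two loudly stated assumptions that are not confirmed by this page: that github:owner/repository stands for the repository tags page (the same page used in the first example), and that GitHub tag archive links contain the /archive/refs/tags/ path. The function names and the links are hypothetical.

```python
def expand_github_url(url: str) -> str:
    """Assumption: github:owner/repository means the repository tags page."""
    prefix = 'github:'
    if url.startswith(prefix):
        return f'https://github.com/{url[len(prefix):]}/tags'
    return url


def tag_archives(links: list[str], extension: str) -> list[str]:
    """Keep only tag archive links with the given extension.

    Assumption: GitHub tag archives use the /archive/refs/tags/ path.
    """
    return [link for link in links
            if '/archive/refs/tags/' in link and link.endswith(extension)]


page_url = expand_github_url('github:muflone/watchpage')
print(page_url)  # https://github.com/muflone/watchpage/tags

# Made-up absolute links, as collected with ABSOLUTE_URLS: true
links = [
    'https://github.com/muflone/watchpage/archive/refs/tags/0.4.1.zip',
    'https://github.com/muflone/watchpage/archive/refs/tags/0.4.1.tar.gz',
    'https://github.com/muflone/watchpage/releases/tag/0.4.1',
]
print(tag_archives(links, '.tar.gz'))  # github-tags-tgz
print(tag_archives(links, '.zip'))  # github-tags-zip
```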


NAME: watchpage
URL: github:muflone/watchpage
PARSER: html5lib
TYPE: github-tags
ABSOLUTE_URLS: true
FILTERS:
  - ENDS: '.tar.gz'
  - REMOVE RIGHT: '.tar.gz'
  - APPEND: '.something'
  - REPLACE: '.something'
    WITH: '.different'
STATUS: true

This configuration file will use the html5lib parser to scan all the tag links of the GitHub repository, extracting only the tags ending with .tar.gz, and then applies some text replacements to them.


NAME: watchpage
URL: https://github.com/muflone/watchpage/tags
PARSER: html5lib
TYPE: links
ABSOLUTE_URLS: true
FILTERS:
  - STARTS: 'https://github.com/muflone/'
  - ENDS: '.tar.gz'
HEADERS:
  User-Agent: 'WatchPage'
  Foo: 'Bar'
STATUS: true

Custom headers can be specified for each request.
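Below is a minimal sketch of how a target's HEADERS could be combined with the --agent default when building a request with the standard library urllib.request.Request. It only builds the request and does not send it. Assumptions, not confirmed by this page: the target HEADERS take precedence over the default agent, and build_request and 'MyAgent/1.0' are hypothetical.

```python
from typing import Optional
from urllib.request import Request


def build_request(url: str, headers: Optional[dict], agent: str) -> Request:
    """Hypothetical helper merging the default agent with a target's HEADERS.

    agent is the default User-Agent (the --agent value, or WatchPage's own
    default when --agent is not given); "" leaves it unset here, although
    urllib's opener may still add its own User-Agent when sending.
    """
    merged = {}
    if agent:
        merged['User-Agent'] = agent
    # Assumption: the target HEADERS take precedence over the default agent
    merged.update(headers or {})
    return Request(url, headers=merged)


request = build_request('https://github.com/muflone/watchpage/tags',
                        {'User-Agent': 'WatchPage', 'Foo': 'Bar'},
                        agent='MyAgent/1.0')
# urllib normalizes header names with str.capitalize()
print(request.get_header('User-agent'), request.get_header('Foo'))
```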


NAME: dbeaver_plugins
URL: https://dbeaver.io/update/ce/latest/plugins/
PARSER: html.parser
TYPE: text
FILTERS:
  - CONTAINS: '.jar'
STATUS: false

This configuration file will use the html.parser parser to scan all the lines of the page containing the text .jar. Since STATUS is false, this target is disabled.
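A minimal sketch of TYPE: text combined with a CONTAINS filter, assuming that "a simple text file" means every non-empty line of the downloaded page (markup included) becomes one item, stripped of surrounding whitespace. The page content is made up and text_items is a hypothetical name.

```python
def text_items(page: str) -> list[str]:
    """Split a page into candidate items, one per non-empty line.

    Assumption: TYPE: text treats the downloaded page as plain text.
    """
    return [line.strip() for line in page.splitlines() if line.strip()]


# Made-up content standing in for the downloaded page
page = """<a href="org.example.plugin_1.0.0.jar">org.example.plugin_1.0.0.jar</a>
<a href="index.html">index.html</a>
<a href="org.example.core_2.0.0.jar">org.example.core_2.0.0.jar</a>"""

# CONTAINS: '.jar' keeps only the lines mentioning a .jar file
jars = [line for line in text_items(page) if '.jar' in line]
print(len(jars))  # 2
```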


NAME: gmtp
URL: https://sourceforge.net/projects/gmtp/rss
PARSER: xml
TYPE: rss
FILTERS:
  - ENDS: '.tar.gz/download'
STATUS: true

This configuration file will use the xml parser to scan all the links in the RSS feed ending with .tar.gz/download
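The RSS example can be sketched with the standard library xml.etree.ElementTree module, assuming the feed follows the usual RSS 2.0 rss/channel/item/link layout. The feed below is made up, and rss_links is a hypothetical name; only item links are collected, so the channel's own link is ignored.

```python
import xml.etree.ElementTree as ET


def rss_links(xml_text: str) -> list[str]:
    """Return the <link> of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    links = []
    for item in root.iterfind('channel/item'):
        link = item.findtext('link')
        if link:
            links.append(link.strip())
    return links


# Made-up RSS 2.0 feed
feed = """<rss version="2.0"><channel>
<title>gmtp</title>
<link>https://sourceforge.net/projects/gmtp/</link>
<item><title>/gmtp-1.0.0.tar.gz</title>
<link>https://sourceforge.net/projects/gmtp/files/gmtp-1.0.0.tar.gz/download</link></item>
<item><title>/README</title>
<link>https://sourceforge.net/projects/gmtp/files/README/download</link></item>
</channel></rss>"""

# ENDS: '.tar.gz/download'
matches = [link for link in rss_links(feed) if link.endswith('.tar.gz/download')]
print(matches)
```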

Download files

Source Distribution

WatchPage-0.4.1.tar.gz (9.8 kB)

Uploaded Source

Built Distribution

WatchPage-0.4.1-py3-none-any.whl (14.7 kB)

Uploaded Python 3

File details

Details for the file WatchPage-0.4.1.tar.gz.

File metadata

  • Download URL: WatchPage-0.4.1.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for WatchPage-0.4.1.tar.gz
Algorithm     Hash digest
SHA256        131ef9066a27aba75ab01aa51691971e4a4c69b48f5504d5290f9b7a12ecb903
MD5           14373942e8f35d3a29d89f58cc3af830
BLAKE2b-256   0a7534c34d33d6c7e59e95c3660e32b689eea4862ccf3866a946e66b1bdfd1ca

File details

Details for the file WatchPage-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: WatchPage-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for WatchPage-0.4.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        e79adedaff19b0df85bf54edae778f472a519873bd68d8c9abfce147e24058e6
MD5           fa32c680a70c699b2c58d1112d446439
BLAKE2b-256   8e3880360120e4cfe4ef827dc6b6c6a2b118a2e7b56934e6ce7d8f34b469c4ad