WatchPage
Description: Watch webpages for changes
Copyright: 2022-2023 Fabio Castelli (Muflone) muflone@muflone.com
License: GPL-3+
Source code: https://github.com/muflone/watchpage
Documentation: http://www.muflone.com/watchpage/
Description
WatchPage is a simple tool to watch multiple web pages for changes.
It aims to make it easier for software maintainers to check project sites for changes and to pick up any news matching given patterns.
System Requirements
- Python 3.x
- PyYAML 6.0 (https://pypi.org/project/PyYAML/)
- BeautifulSoup4 4.x (https://pypi.org/project/beautifulsoup4/)
- lxml 4.9 (https://pypi.org/project/lxml/)
- html5lib 1.1 (https://pypi.org/project/html5lib/)
Usage
WatchPage is a command line utility and it requires some arguments to be passed:
watchpage --config <CONFIGURATION> --results <RESULTS> [--dump] [--agent <USER AGENT>]
The argument --config refers to a valid YAML configuration file (see below for some examples).
The argument --results must be the path to a directory where the results files are saved.
The argument --dump shows the results but discards the changes, so they are not saved in the directory specified by the --results argument.
The argument --agent sets the default User-Agent for the HTTP/HTTPS requests. If not specified, the default WatchPage user agent is used. You can also pass "" to omit the default user agent.
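The arguments above can be modelled with a short argparse sketch. This is not WatchPage's actual parser: the function name build_parser and the DEFAULT_AGENT value are assumptions made for illustration, and only the four documented options are covered.

```python
import argparse

# Hypothetical default: the real WatchPage user agent string is not stated here.
DEFAULT_AGENT = "WatchPage"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command line arguments."""
    parser = argparse.ArgumentParser(prog="watchpage")
    parser.add_argument("--config", required=True,
                        help="path to a YAML configuration file")
    parser.add_argument("--results", required=True,
                        help="directory where the results files are saved")
    parser.add_argument("--dump", action="store_true",
                        help="show the results without saving the changes")
    parser.add_argument("--agent", default=DEFAULT_AGENT,
                        help='default User-Agent, pass "" to omit it')
    return parser


# Parse an explicit argument list instead of sys.argv to keep the sketch testable
args = build_parser().parse_args(
    ["--config", "docs/muflone_apps.yaml", "--results", "output", "--dump"])
print(args.config, args.results, args.dump, repr(args.agent))
```

Treating an empty --agent value as "omit the header" is then a simple truthiness check on args.agent.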
An example of running WatchPage is the following:
watchpage --config docs/muflone_apps.yaml --results output
All the targets specified in the configuration file muflone_apps.yaml will be processed, the results will be saved in the output directory and the differences will be printed to stdout.
Running the previous command again will not produce any results, as there will be no further changes since the previous run.
The saved items are stored in the directory specified in the --results argument.
By adding --dump you can observe the returned values, but the changes will not be saved.
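The run, re-run and --dump behaviour described above can be sketched as a comparison between the items found now and the items saved earlier. The storage layout used here (one text file per target, one item per line) and the function name process_target are assumptions for illustration only; the real results format is not documented in this text.

```python
from pathlib import Path
import tempfile


def process_target(name, items, results_dir, dump=False):
    """Return the items not seen in a previous run; save them unless dump is set."""
    # Hypothetical layout: <results_dir>/<name>.txt with one item per line
    results_file = Path(results_dir) / f"{name}.txt"
    seen = set()
    if results_file.exists():
        seen = set(results_file.read_text(encoding="utf-8").splitlines())
    new_items = [item for item in items if item not in seen]
    if new_items and not dump:
        results_file.parent.mkdir(parents=True, exist_ok=True)
        with results_file.open("a", encoding="utf-8") as handle:
            for item in new_items:
                handle.write(item + "\n")
    return new_items


with tempfile.TemporaryDirectory() as output:
    items = ["0.4.0.tar.gz", "0.4.1.tar.gz"]
    print(process_target("watchpage", items, output, dump=True))  # shown, not saved
    print(process_target("watchpage", items, output))             # shown and saved
    print(process_target("watchpage", items, output))             # nothing new
```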
Configuration file
A configuration file is a YAML specification file with the following values:
- NAME: a unique string to identify the target to process
- URL: the page URL to monitor for changes. You can also specify github:name/repository to point to a GitHub repository
- PARSER: the parser to use to process the URL. The example files use html5lib, html.parser and xml
- TYPE: specify the type of items to process from the page. This value can be:
  - links: will get all the anchors from an HTML page
  - rss: will get all the link items from an RSS feed
  - text: will process the page as a simple text file
  - github-tags: will get all the tag anchors from a GitHub repository
  - github-tags-zip: will get all the tag anchors from a GitHub repository, keeping only those in .zip format
  - github-tags-tgz: will get all the tag anchors from a GitHub repository, keeping only those in .tar.gz format
- ABSOLUTE_URLS: a boolean value (true/false) to make the processed URLs absolute by prepending the website from the page URL
- FILTERS: a list of filters to apply to find the matched items. This can be any of the following:
  - STARTS: the item must begin with the specified string
  - NOT STARTS: the item must not begin with the specified string
  - ENDS: the item must end with the specified string
  - NOT ENDS: the item must not end with the specified string
  - CONTAINS: the item must contain the specified string
  - NOT CONTAINS: the item must not contain the specified string
  - REGEX: the item must match the specified regular expression string
  - NOT REGEX: the item must not match the specified regular expression string
  - TRIM: removes spaces or the specified characters from both left and right
  - LTRIM: removes spaces or the specified characters from the left
  - RTRIM: removes spaces or the specified characters from the right
  - PREPEND: prepend (insert at the start) the specified text
  - APPEND: append (insert at the end) the specified text
  - REMOVE: remove the specified text from the item
  - REPLACE: replace the specified text in the item with a new pattern (specified using WITH:)
  - REVERSE: reverse the item text
  - UPPER: makes the text uppercase
  - LOWER: makes the text lowercase
  - LEFT: return the leftmost characters
  - RIGHT: return the rightmost characters
  - REGEX REPLACE: replace a pattern in the item, using a regular expression, with a new pattern (specified using WITH:)
  - REGEX SEARCH: return the first regular expression match
  - JSON DICT: return the value from a JSON dict with the specified key
  - JSON LIST: return the value from a JSON list with the specified index
- HEADERS: a dictionary with the headers to set for the request
- STATUS: a boolean value (true/false) to enable or disable the target
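To make the FILTERS semantics more concrete, here is a minimal sketch of a filter chain applied in order to each item. It is not WatchPage's implementation: the names SELECTORS, TRANSFORMS and apply_filters are hypothetical, only a subset of the filters is covered, and using re.search for REGEX is an assumption. Each filter is represented as the dictionary a YAML loader would produce for one list entry.

```python
import re

# Selecting filters decide whether an item is kept
SELECTORS = {
    "STARTS": lambda item, value: item.startswith(value),
    "NOT STARTS": lambda item, value: not item.startswith(value),
    "ENDS": lambda item, value: item.endswith(value),
    "NOT ENDS": lambda item, value: not item.endswith(value),
    "CONTAINS": lambda item, value: value in item,
    "NOT CONTAINS": lambda item, value: value not in item,
    # Assumption: a match anywhere in the item counts as a match
    "REGEX": lambda item, value: re.search(value, item) is not None,
}

# Transforming filters rewrite the item text
TRANSFORMS = {
    "PREPEND": lambda item, value: value + item,
    "APPEND": lambda item, value: item + value,
    "REMOVE": lambda item, value: item.replace(value, ""),
}


def apply_filters(items, filters):
    """Apply every filter in order to each item, dropping rejected items."""
    results = []
    for item in items:
        keep = True
        for rule in filters:
            # The key that is not WITH names the filter
            name = next(key for key in rule if key != "WITH")
            value = rule[name]
            if name in SELECTORS:
                if not SELECTORS[name](item, value):
                    keep = False
                    break
            elif name == "REPLACE":
                item = item.replace(value, rule["WITH"])
            elif name in TRANSFORMS:
                item = TRANSFORMS[name](item, value)
            else:
                raise ValueError(f"unsupported filter in this sketch: {name}")
        if keep:
            results.append(item)
    return results


filters = [{"ENDS": ".tar.gz"},
           {"APPEND": ".something"},
           {"REPLACE": ".something", "WITH": ".different"}]
print(apply_filters(["0.4.1.tar.gz", "0.4.1.zip"], filters))
# prints ['0.4.1.tar.gz.different']
```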
Configuration example files
Some configuration example files can be found in the docs directory.
NAME: watchpage
URL: https://github.com/muflone/watchpage/tags
PARSER: html5lib
TYPE: links
ABSOLUTE_URLS: true
FILTERS:
- STARTS: 'https://github.com/muflone/'
- ENDS: '.tar.gz'
STATUS: true
This configuration file uses the html5lib parser to scan all the links in the page that begin with https://github.com/muflone/ and end with .tar.gz.
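What this configuration asks for can be approximated as follows. WatchPage lists BeautifulSoup4 and html5lib among its requirements; this sketch swaps in the standard library html.parser and urllib.parse.urljoin instead, so that it runs without dependencies. The names LinkCollector and watch_links and the sample page fragment are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every anchor in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def watch_links(page_url, html, starts, ends):
    """Return the absolute links matching the STARTS and ENDS filters."""
    collector = LinkCollector()
    collector.feed(html)
    # ABSOLUTE_URLS: true -> resolve relative links against the page URL
    absolute = [urljoin(page_url, link) for link in collector.links]
    return [link for link in absolute
            if link.startswith(starts) and link.endswith(ends)]


# Made-up page fragment, used only for illustration
SAMPLE = """
<a href="/muflone/watchpage/archive/refs/tags/0.4.1.tar.gz">tar.gz</a>
<a href="/muflone/watchpage/archive/refs/tags/0.4.1.zip">zip</a>
<a href="https://example.com/other.tar.gz">other</a>
"""
print(watch_links("https://github.com/muflone/watchpage/tags", SAMPLE,
                  "https://github.com/muflone/", ".tar.gz"))
```

Only the first link survives: the second fails the ENDS filter and the third fails the STARTS filter.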
NAME: watchpage
URL: github:muflone/watchpage
PARSER: html5lib
TYPE: github-tags-tgz
ABSOLUTE_URLS: true
STATUS: true
This configuration file uses the html5lib parser to scan all the tag links of the GitHub repository, extracting only the tags ending with .tar.gz.
NAME: watchpage
URL: github:muflone/watchpage
PARSER: html5lib
TYPE: github-tags
ABSOLUTE_URLS: true
FILTERS:
- ENDS: '.tar.gz'
- REMOVE RIGHT: '.tar.gz'
- APPEND: '.something'
- REPLACE: '.something'
  WITH: '.different'
STATUS: true
This configuration file uses the html5lib parser to scan all the tag links of the GitHub repository, extracting only the tags ending with .tar.gz and then applying some text replacements.
NAME: watchpage
URL: https://github.com/muflone/watchpage/tags
PARSER: html5lib
TYPE: links
ABSOLUTE_URLS: true
FILTERS:
- STARTS: 'https://github.com/muflone/'
- ENDS: '.tar.gz'
HEADERS:
  User-Agent: 'WatchPage'
  Foo: 'Bar'
STATUS: true
Custom headers can be specified for each request.
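As a rough illustration of how a default user agent and per-target HEADERS could be combined, the sketch below prepares a request with the standard library urllib.request.Request without sending it. The function name build_request, the default agent value and the choice to let HEADERS override the default agent are assumptions, not documented WatchPage behaviour.

```python
from urllib.request import Request


def build_request(url, headers=None, agent="WatchPage"):
    """Prepare a request with an optional default User-Agent and custom headers."""
    request = Request(url)
    if agent:
        # An empty agent means no default User-Agent is set on this object
        request.add_header("User-Agent", agent)
    for name, value in (headers or {}).items():
        # Assumption: per-target HEADERS are applied last and win over the default
        request.add_header(name, value)
    return request


request = build_request("https://github.com/muflone/watchpage/tags",
                        headers={"User-Agent": "WatchPage", "Foo": "Bar"})
# urllib stores header names capitalized, e.g. "User-agent"
print(request.get_header("User-agent"), request.get_header("Foo"))
```

Note that when such a request is actually opened with urllib's default opener, urllib adds its own User-agent if none was set, so fully omitting the header would need a custom opener.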
NAME: dbeaver_plugins
URL: https://dbeaver.io/update/ce/latest/plugins/
PARSER: html.parser
TYPE: text
FILTERS:
- CONTAINS: '.jar'
STATUS: false
This configuration file uses the html.parser parser to scan all the lines in the page containing the text .jar. Since STATUS is false, this target is disabled.
NAME: gmtp
URL: https://sourceforge.net/projects/gmtp/rss
PARSER: xml
TYPE: rss
FILTERS:
- ENDS: '.tar.gz/download'
STATUS: true
This configuration file will use the xml parser to scan all the links in the RSS feed ending with .tar.gz/download
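The RSS case can be sketched with the standard library xml.etree.ElementTree, used here in place of the BeautifulSoup xml parser named in the configuration. The function name rss_links and the sample feed (including its file names) are made up for illustration, and reading only the link element of each item is an assumption about what "link items" means.

```python
import xml.etree.ElementTree as ET


def rss_links(feed_xml, ends):
    """Return the item links of an RSS feed that end with the given suffix."""
    root = ET.fromstring(feed_xml)
    # Only the <link> of each <item> is read, not the channel's own <link>
    links = [link.text.strip()
             for link in root.findall("channel/item/link") if link.text]
    return [link for link in links if link.endswith(ends)]


# Made-up feed, used only for illustration
FEED = """<rss version="2.0"><channel>
<title>gmtp</title>
<link>https://sourceforge.net/projects/gmtp/</link>
<item><link>https://sourceforge.net/projects/gmtp/files/gmtp-1.0.tar.gz/download</link></item>
<item><link>https://sourceforge.net/projects/gmtp/files/README/download</link></item>
</channel></rss>"""
print(rss_links(FEED, ".tar.gz/download"))
```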
File details
Details for the file WatchPage-0.4.1.tar.gz.
File metadata
- Download URL: WatchPage-0.4.1.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 131ef9066a27aba75ab01aa51691971e4a4c69b48f5504d5290f9b7a12ecb903
MD5 | 14373942e8f35d3a29d89f58cc3af830
BLAKE2b-256 | 0a7534c34d33d6c7e59e95c3660e32b689eea4862ccf3866a946e66b1bdfd1ca
File details
Details for the file WatchPage-0.4.1-py3-none-any.whl.
File metadata
- Download URL: WatchPage-0.4.1-py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | e79adedaff19b0df85bf54edae778f472a519873bd68d8c9abfce147e24058e6
MD5 | fa32c680a70c699b2c58d1112d446439
BLAKE2b-256 | 8e3880360120e4cfe4ef827dc6b6c6a2b118a2e7b56934e6ce7d8f34b469c4ad