Real-time monitoring of global Wikipedia page edits
Project description
WikiChangeWatcher
Introduction
Wikipedia provides an SSE (Server-Sent Events) stream of every edit made to any page across Wikipedia, which lets you watch all Wikipedia page edits in real time.
WikiChangeWatcher is a thin wrapper around an SSE client pointed at the URL of the Wikipedia edit stream, with filtering features that let you watch for page edit events with specific attributes (e.g. “anonymous” edits from IP addresses in specific ranges, or edits made by a Wikipedia user whose username matches a specific regular expression).
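To make the idea of "a thin wrapper around an SSE client" concrete, here is a minimal sketch of how the events in such a stream can be decoded. It is not WikiChangeWatcher's actual code: the helper name `iter_sse_events` and the sample lines are made up for illustration, and `STREAM_URL` is the public Wikimedia EventStreams endpoint as I understand it. The sketch parses lines that have already been read, so it runs without a network connection.

```python
import json

# Assumption: the public Wikimedia EventStreams endpoint for recent changes.
# It is shown for reference only; this sketch never connects to it.
STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"


def iter_sse_events(lines):
    """Yield decoded JSON objects from an iterable of SSE text lines.

    Hypothetical helper: handles only 'data:' fields and the blank line
    that terminates each event.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # Strip the field name and one optional leading space
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line ends the current event
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # Other fields ('event:', 'id:', ':' comments) are ignored here


# Made-up sample lines in the general shape of an SSE event
sample = [
    "event: message\n",
    'id: [{"topic":"example","offset":1}]\n',
    'data: {"user": "203.0.113.7", "title": "Example", "type": "edit"}\n',
    "\n",
]
events = list(iter_sse_events(sample))
print(events[0]["user"], "edited", events[0]["title"])
```

Each decoded event is a plain dict, which is the kind of object the callbacks in the examples below receive.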
Examples
Some example scripts illustrating how to use WikiChangeWatcher are presented in the following sections.
Monitoring “anonymous” page edits made from specific IP address ranges
The following example code watches for “anonymous” edits made from three specific IPv4 address ranges.
# Example script showing how to use WikiChangeWatcher to watch for "anonymous" edits to any
# wikipedia page from specific IP address ranges
import time
from wikichangewatcher import WikiChangeWatcher, IpV4Watcher
# Callback function to run whenever an event matching our IPv4 address pattern is seen
def on_match(json_data):
    """
    json_data is a dict decoded from a JSON event in the WikiMedia "recent changes" event stream,
    as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
    """
    print("{user} edited {title_url}".format(**json_data))
# Watch for anonymous edits from some known IP addresses within the UK houses of parliament
# (taken from https://gist.github.com/Jonty/aabb42ab31d970dfb447, probably old/invalid by now)
wc = WikiChangeWatcher([IpV4Watcher(on_match, "194.60.38.225-230"),
                        IpV4Watcher(on_match, "194.60.38.200-205"),
                        IpV4Watcher(on_match, "194.60.38.215-219")])
# You can also use the wildcard '*' character within IP addresses; the following line
# sets up a watcher that triggers on any IP address (all anonymous edits)
# wc = WikiChangeWatcher([IpV4Watcher(on_match, "*.*.*.*")])
wc.run()
# Watch for page edits forever until KeyboardInterrupt
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    wc.stop()
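The pattern strings used above combine exact octets, ranges such as `200-205`, and the `*` wildcard. The sketch below shows one way such a pattern could be checked against an address. It is an illustration only, not how `IpV4Watcher` is implemented; `octet_matches` and `ipv4_matches` are hypothetical helper names.

```python
import ipaddress


def octet_matches(pattern, octet):
    """Check one octet (an int) against '*', 'N', or 'N-M'."""
    if pattern == "*":
        return True
    if "-" in pattern:
        low, high = pattern.split("-", 1)
        return int(low) <= octet <= int(high)
    return int(pattern) == octet


def ipv4_matches(pattern, address):
    """Return True if a dotted-quad address matches the pattern string."""
    pattern_parts = pattern.split(".")
    if len(pattern_parts) != 4:
        return False
    try:
        # Rejects anything that is not a valid IPv4 address (e.g. a username)
        octets = list(ipaddress.IPv4Address(address).packed)
    except ValueError:
        return False
    return all(octet_matches(p, o) for p, o in zip(pattern_parts, octets))


print(ipv4_matches("194.60.38.200-205", "194.60.38.203"))
print(ipv4_matches("194.60.38.200-205", "194.60.38.206"))
print(ipv4_matches("*.*.*.*", "ExampleUser"))
```

Validating the address first means that events whose `user` field holds a username rather than an IP address never match, which is what makes a pattern like `*.*.*.*` select only “anonymous” edits.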
Monitoring page edits made by usernames that match a regular expression
The following example code watches for edits made by signed-in users with usernames that contain one or more strings matching a regular expression.
# Example script showing how to use WikiChangeWatcher to watch for NON-"anonymous" edits to any
# wikipedia page, by usernames that contain a string matching a regular expression
import time
from wikichangewatcher import WikiChangeWatcher, UsernameRegexSearchWatcher
# Callback function to run whenever an edit by a user with a username containing our regex is seen
def on_match(json_data):
    """
    json_data is a dict decoded from a JSON event in the WikiMedia "recent changes" event stream,
    as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
    """
    print("{user} edited {title_url}".format(**json_data))
# Watch for edits made by users with "bot" in their username
wc = WikiChangeWatcher([UsernameRegexSearchWatcher(on_match, r"[Bb]ot|BOT")])
wc.run()
# Watch for page edits forever until KeyboardInterrupt
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    wc.stop()
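Because the watcher searches for the pattern anywhere in the username, it helps to see which names a pattern like `[Bb]ot|BOT` actually selects. The following sketch uses only the standard `re` and `ipaddress` modules; `username_matches` and `is_ip_address` are hypothetical helpers, and skipping IP-address "usernames" is an assumption based on the description above, not a statement about the library's internals.

```python
import ipaddress
import re

BOT_PATTERN = re.compile(r"[Bb]ot|BOT")


def is_ip_address(user):
    """Return True if the user field looks like an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def username_matches(pattern, user):
    """Search the pattern anywhere in a signed-in user's name."""
    if is_ip_address(user):
        return False  # assumed: "anonymous" edits are not treated as usernames
    return pattern.search(user) is not None


for name in ["CitationBot", "robotics_fan", "Alice", "203.0.113.7"]:
    print(name, username_matches(BOT_PATTERN, name))
```

Note that `robotics_fan` matches because it contains `bot`. If only names ending in "bot" are of interest, an anchored pattern such as `r"(?:[Bb]ot|BOT)$"` narrows the search.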
Monitoring page edit events based on a regular expression match on arbitrary JSON fields
The following example code watches for any page edit events where the specified JSON field contains one or more matches of a regular expression (available JSON fields and their descriptions can be found at https://www.mediawiki.org/wiki/Manual:RCFeed).
# Example script showing how to use WikiChangeWatcher to filter page edit events
# by a regular expression match in an arbitrary named field from the JSON event
# provided by the SSE stream of wikipedia page edits
import time
from wikichangewatcher import WikiChangeWatcher, FieldRegexSearchWatcher
# Callback function to run whenever an edit is made to a page that has a regex match in the page URL
def on_match(json_data):
    """
    json_data is a dict decoded from a JSON event in the WikiMedia "recent changes" event stream,
    as described here: https://www.mediawiki.org/wiki/Manual:RCFeed
    """
    print("{user} edited {title_url}".format(**json_data))
# Watch for edits made to any page that has the word "publish" in the page URL
# ("title_url" field in the JSON object)
wc = WikiChangeWatcher([FieldRegexSearchWatcher(on_match, "title_url", r"[Pp]ublish")])
wc.run()
# Watch for page edits forever until KeyboardInterrupt
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    wc.stop()
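The field-based filter boils down to looking up a named key in the event and running a regular expression search on its value. The sketch below illustrates that idea with plain dicts. `field_matches` and `describe` are hypothetical helpers, not part of WikiChangeWatcher, and the sample events are made up. It also uses `dict.get` with defaults, on the assumption that some events may lack a given field, in which case `"...".format(**json_data)` would raise `KeyError`.

```python
import re


def field_matches(event, field, pattern):
    """Return True if event[field] is a string containing a regex match."""
    value = event.get(field)
    if not isinstance(value, str):
        return False  # missing or non-string fields never match
    return re.search(pattern, value) is not None


def describe(event):
    """Format an event without failing when a field is absent."""
    user = event.get("user", "<unknown user>")
    page = event.get("title_url", event.get("title", "<unknown page>"))
    return "{} edited {}".format(user, page)


# Made-up events for illustration
events = [
    {"user": "Alice", "title_url": "https://en.wikipedia.org/wiki/Publishing"},
    {"user": "Bob", "title_url": "https://en.wikipedia.org/wiki/Printing"},
    {"user": "Carol"},
]
for event in events:
    if field_matches(event, "title_url", r"[Pp]ublish"):
        print(describe(event))
```

The same `field_matches` call works for any string field in the event, so swapping `"title_url"` for another field name changes what is filtered without changing the logic.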
Download files
Built Distribution
Hashes for wikichangewatcher-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 793d81e0e0b7421a0aade749259278c857ec159efbeb3609d76444353f634bba
MD5 | 84f295655801ef30975274924ed55841
BLAKE2b-256 | 4a4ba8843bbc84d76816ef18df4566b2f84287ad93c411d7f07e2f654c64e46d