mrkoll.se scraper & RSS generator
mrkoll-scraper
A tool that scrapes the incredibly creepy mrkoll.se Swedish toplist and generates an RSS feed from it. Each RSS entry represents the list as it looked at a given point in time, plus convenience links to search for each person's name on DuckDuckGo, Flashback, and Facebook.
Installation
pip install mrkoll-scraper
Usage
CLI
mrks --scrape
Scrapes the current list and saves it to the shelve database mrks.db in the current working directory. This operation saves the raw scraped HTML as well as a generated feedgen FeedEntry. The data is saved in a dictionary with the current date as key, so multiple scrapes during the same day don't create new entries; they only update the existing one.
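The date-keyed storage described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the package's actual code: the function name save_scrape, the ISO date key format, and the record layout are all assumptions, and entry stands in for whatever picklable feed entry object the tool stores.

```python
import shelve
from datetime import date


def save_scrape(db_path, raw_html, entry, today=None):
    """Store one scrape under its date (hypothetical helper, not the mrks API).

    Shelve keys must be strings, so the date is stored in ISO format
    (an assumed key format). Writing to an existing key overwrites it,
    which is why a second scrape on the same day only updates the entry.
    """
    key = (today or date.today()).isoformat()
    with shelve.open(db_path) as db:
        db[key] = {"html": raw_html, "entry": entry}
    return key
```

Calling save_scrape twice on the same day leaves a single record holding the most recent HTML, while a scrape on a later day adds a second key.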
mrks --regenerate
Iterates through the saved lists and regenerates the FeedEntry objects from the raw HTML, in case you've changed the HTML template, for example, and want the changes applied retroactively.
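Because the raw HTML is kept alongside each generated entry, regeneration amounts to a loop over the stored records. The sketch below assumes the same hypothetical record layout as a dictionary with "html" and "entry" keys; regenerate and build_entry are illustrative names, with build_entry standing in for whatever code renders an entry from the HTML.

```python
import shelve


def regenerate(db_path, build_entry):
    """Rebuild every stored entry from its raw HTML (hypothetical helper).

    build_entry is a caller-supplied function taking (key, raw_html) and
    returning a new entry. Returns the number of records processed.
    """
    count = 0
    with shelve.open(db_path) as db:
        # Take a snapshot of the keys so the loop is not affected by writes.
        for key in sorted(db.keys()):
            record = db[key]
            record["entry"] = build_entry(key, record["html"])
            # Reassign the record: without writeback=True, shelve does not
            # notice in-place changes to an object it has already returned.
            db[key] = record
            count += 1
    return count
```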
WSGI
mrks.wsgi contains a beautifully simple WSGI application that outputs an RSS feed based on the currently saved data.
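For readers unfamiliar with WSGI, an application of this kind is just a callable that receives the request environment and returns the response body. The following is a generic sketch, not the contents of mrks.wsgi: make_app and load_feed are hypothetical names, and load_feed is assumed to return the already serialized feed as bytes.

```python
def make_app(load_feed):
    """Return a WSGI callable that serves whatever load_feed() returns.

    load_feed is a hypothetical zero-argument function producing the RSS
    document as bytes, for example by reading the saved entries.
    """
    def application(environ, start_response):
        body = load_feed()
        headers = [
            ("Content-Type", "application/rss+xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        # WSGI expects an iterable of byte strings as the response body.
        return [body]

    return application
```

A WSGI server would typically look for a module-level name such as application, so a deployment along these lines might assign application = make_app(some_loader) at import time.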
Cron
Crontab entry to run --scrape every Monday at midnight:
0 0 * * 1 cd /home/robert/mrkoll-scraper && /home/robert/mrkoll-scraper/venv/bin/mrks -s