
mrkoll.se scraper & RSS generator


mrkoll-scraper

A tool that scrapes the incredibly creepy mrkoll.se Swedish toplist and generates an RSS feed from it. Each RSS entry shows the list as it looked at a given point in time, along with convenience links to search for each person's name on DuckDuckGo, Flashback, and The Facebook.
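The convenience links are URL-encoded search queries. Below is a minimal sketch using a hypothetical `search_links` helper. Rather than guessing each site's own search URL, it routes all three lookups through DuckDuckGo and uses `site:` filters for Flashback and Facebook. That routing is an assumption, so the links the tool actually builds may look different.

```python
from typing import Dict
from urllib.parse import urlencode

DDG = "https://duckduckgo.com/"


def search_links(name: str) -> Dict[str, str]:
    """Hypothetical helper: build search URLs for one person's name.

    Assumption: Flashback and Facebook are reached via DuckDuckGo
    `site:` filters rather than each site's own search page.
    """
    phrase = f'"{name}"'  # quote the full name for an exact-phrase match

    def ddg(query: str) -> str:
        # urlencode percent-encodes quotes, colons and non-ASCII (e.g. "Å")
        return DDG + "?" + urlencode({"q": query})

    return {
        "DuckDuckGo": ddg(phrase),
        "Flashback": ddg(f"{phrase} site:flashback.org"),
        "Facebook": ddg(f"{phrase} site:facebook.com"),
    }


print(search_links("Anna Andersson")["Flashback"])
# → https://duckduckgo.com/?q=%22Anna+Andersson%22+site%3Aflashback.org
```

Percent-encoding matters here because Swedish names often contain å, ä and ö.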

Installation

pip install mrkoll-scraper

Usage

CLI

mrks --scrape

Scrapes the current list and saves it to the shelve database mrks.db in the current working directory. Both the raw scraped HTML and a generated feedgen FeedEntry are saved. The data is stored in a dictionary keyed by the current date, so repeated scrapes on the same day don't create a new entry; they update the existing one.
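The once-per-day behaviour follows from using the date as the key. Here is a minimal stdlib sketch with a hypothetical `save_scrape` helper. It assumes one shelve key per ISO date, each holding a plain `{"html": ...}` record. The real tool also stores a feedgen FeedEntry and may lay the data out differently. The demo passes fixed dates so that the same-day overwrite is visible.

```python
import os
import shelve
import tempfile
from datetime import date
from typing import Optional


def save_scrape(db_path: str, raw_html: str, day: Optional[date] = None) -> str:
    """Hypothetical helper: store at most one snapshot per calendar day."""
    key = (day or date.today()).isoformat()  # e.g. "2023-05-01"
    with shelve.open(db_path) as db:
        # Assigning to an existing key replaces the old value, so a second
        # scrape on the same day overwrites that day's entry.
        db[key] = {"html": raw_html}
    return key


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mrks.db")
    save_scrape(path, "<ol><li>first</li></ol>", date(2023, 5, 1))
    save_scrape(path, "<ol><li>second</li></ol>", date(2023, 5, 1))
    save_scrape(path, "<ol><li>third</li></ol>", date(2023, 5, 8))
    with shelve.open(path) as db:
        print(sorted(db))                # ['2023-05-01', '2023-05-08']
        print(db["2023-05-01"]["html"])  # <ol><li>second</li></ol>
```

ISO-formatted dates also sort chronologically as plain strings, which keeps later iteration simple.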

mrks --regenerate

Iterates through the saved lists and regenerates the FeedEntry objects from the raw HTML. This is useful if you've changed the HTML template (or similar) and want the changes applied retroactively.
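Regeneration works because the raw HTML is kept: every derived entry can be rebuilt from it. Here is a minimal sketch with a hypothetical `regenerate` helper and a hypothetical `render` function standing in for the HTML template. It assumes one shelve key per date holding a `{"html": ...}` record, and it uses a plain dict where the real tool uses a feedgen FeedEntry.

```python
import os
import shelve
import tempfile
from typing import Callable, Dict

Entry = Dict[str, str]  # stand-in for a feedgen FeedEntry


def regenerate(db_path: str, render: Callable[[str, str], Entry]) -> int:
    """Hypothetical helper: rebuild every derived entry from stored raw HTML."""
    count = 0
    with shelve.open(db_path) as db:
        for day in list(db.keys()):  # snapshot the keys before writing to the shelf
            record = db[day]
            record["entry"] = render(day, record["html"])
            # Without writeback=True, shelve only persists values that are
            # assigned back, so mutate the copy and then reassign it.
            db[day] = record
            count += 1
    return count


def render(day: str, html: str) -> Entry:
    # Hypothetical "template": edit this, then rerun regenerate()
    return {"title": f"Toplist {day}", "content": html}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mrks.db")
    with shelve.open(path) as db:
        db["2023-05-01"] = {"html": "<ol><li>A</li></ol>"}
        db["2023-05-08"] = {"html": "<ol><li>B</li></ol>"}
    print(regenerate(path, render))  # 2
    with shelve.open(path) as db:
        print(db["2023-05-08"]["entry"]["title"])  # Toplist 2023-05-08
```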

WSGI

mrks.wsgi contains a beautifully simple WSGI application that outputs an RSS feed built from the currently saved data.
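A WSGI application is just a callable that takes `environ` and `start_response` (PEP 3333) and returns an iterable of bytes. Here is a minimal stdlib-only sketch of such a feed endpoint. It assumes a shelf with one key per ISO date, each holding a `{"html": ...}` record. The real mrks.wsgi builds its feed with feedgen. `build_rss`, `make_app` and the channel metadata are hypothetical.

```python
import shelve
import xml.etree.ElementTree as ET
from typing import Callable, Iterable


def build_rss(db_path: str) -> bytes:
    """Render every saved snapshot as an RSS 2.0 document (newest first)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "mrkoll.se toplist"
    ET.SubElement(channel, "link").text = "https://mrkoll.se/"
    ET.SubElement(channel, "description").text = "Snapshots of the mrkoll.se toplist"
    with shelve.open(db_path, flag="r") as db:  # read-only: the feed never writes
        for day in sorted(db, reverse=True):  # ISO dates sort chronologically
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = f"Toplist {day}"
            ET.SubElement(item, "guid", isPermaLink="false").text = day
            # ElementTree entity-escapes the HTML, which RSS readers accept
            ET.SubElement(item, "description").text = db[day]["html"]
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


def make_app(db_path: str) -> Callable:
    """Hypothetical factory returning a standard WSGI callable."""
    def application(environ, start_response) -> Iterable[bytes]:
        body = build_rss(db_path)  # rebuilt on every request, no caching
        start_response("200 OK", [
            ("Content-Type", "application/rss+xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return application


application = make_app("mrks.db")  # relative to the server's working directory
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves the feed at http://localhost:8000/.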

Cron

Crontab entry that runs --scrape every Monday at midnight:

0 0 * * 1 cd /home/robert/mrkoll-scraper && /home/robert/mrkoll-scraper/venv/bin/mrks -s

Download files


Source Distribution

mrkoll-scraper-0.3.2.tar.gz (5.3 kB)

Built Distribution

mrkoll_scraper-0.3.2-py3-none-any.whl (18.0 kB, Python 3)
