mrkoll-scraper

Tool to scrape and generate an RSS feed from the incredibly creepy mrkoll.se Swedish toplist. Each RSS entry represents the list as it looked at a given point in time, plus convenience links to search for each person's name on DuckDuckGo, Flashback, and The Facebook.

Installation

pip install mrkoll-scraper

Usage

CLI

mrks --scrape

Scrapes the current list and saves it to the shelve database mrks.db in the current working directory. This operation saves both the raw scraped HTML and a generated feedgen FeedEntry. The data is stored in a dictionary keyed by the current date, so multiple scrapes on the same day don't create new entries; they only update the existing one.
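
The date-keyed storage described above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual code: save_scrape, load_keys and the record layout ("html" and "entry" keys) are hypothetical names, and a plain dict stands in for the feedgen FeedEntry.

```python
import shelve
from datetime import date


def save_scrape(db_path, raw_html, entry, day=None):
    """Store one scrape under its date (hypothetical helper).

    A second scrape on the same day reuses the same key, so it
    overwrites the existing record instead of adding a new one.
    """
    key = (day or date.today()).isoformat()  # shelve keys must be strings
    with shelve.open(str(db_path)) as db:
        # Assumed record layout: raw HTML plus the generated entry.
        db[key] = {"html": raw_html, "entry": entry}
    return key


def load_keys(db_path):
    """Return the stored dates in chronological order."""
    with shelve.open(str(db_path)) as db:
        return sorted(db.keys())
```

Note that the file name shelve actually creates on disk depends on the dbm backend available on the system, so the path passed in may end up with a suffix.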

mrks --regenerate

Iterates through the saved lists and regenerates the FeedEntry objects from the raw HTML, in case you've changed the HTML template (or similar) and want the changes applied retroactively.
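
As a rough sketch of what such a regeneration pass could look like, assuming each saved record is a dictionary with an "html" and an "entry" field (an assumption, as is the render_entry stand-in for the real template step):

```python
def render_entry(raw_html):
    """Hypothetical renderer standing in for the tool's HTML template step."""
    return {"description": raw_html.strip()}


def regenerate(store):
    """Rebuild every saved entry from its raw HTML.

    `store` is any mapping of date string -> record, for example an
    open shelve database or a plain dict. The raw HTML is left untouched.
    """
    for key in list(store.keys()):
        record = store[key]
        record["entry"] = render_entry(record["html"])
        # Reassign so that a shelve opened without writeback persists the change.
        store[key] = record
    return store
```

Keeping the raw HTML next to the generated entry is what makes this possible: the entries can always be rebuilt without scraping again.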

WSGI

mrks.wsgi contains a beautifully simple WSGI application that outputs an RSS feed based on the currently saved data.
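
For readers unfamiliar with WSGI, here is a minimal standard-library sketch of an application that serves saved records as RSS. It is not the contents of mrks.wsgi: build_rss, make_app, the channel metadata and the record layout are all assumptions, and the real tool builds its feed with feedgen rather than xml.etree.

```python
from xml.etree import ElementTree as ET


def build_rss(records):
    """Serialise date -> record mappings as a small RSS 2.0 document (bytes)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Placeholder channel metadata, chosen for this sketch only.
    ET.SubElement(channel, "title").text = "mrkoll toplist"
    ET.SubElement(channel, "link").text = "https://mrkoll.se/"
    ET.SubElement(channel, "description").text = "Snapshots of the toplist"
    for day in sorted(records, reverse=True):  # newest snapshot first
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = day
        ET.SubElement(item, "description").text = records[day]["html"]
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


def make_app(records):
    """Return a WSGI callable that always answers with the current feed."""
    def app(environ, start_response):
        body = build_rss(records)
        start_response("200 OK", [
            ("Content-Type", "application/rss+xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app
```

For a quick local test, such a callable can be served with wsgiref.simple_server.make_server from the standard library.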

Cron

Crontab entry to run --scrape (here in its short form, -s) every Monday at midnight:

0 0 * * 1 cd /home/robert/mrkoll-scraper && /home/robert/mrkoll-scraper/venv/bin/mrks -s

Release history

0.3.4 (Aug 2, 2021)

0.3.3 (Aug 2, 2021)

0.3.2 (Aug 27, 2020), this version

0.3.1 (Jul 8, 2020), yanked. Reason: Breaking bug, use v0.3.2 instead

0.3.0 (Jun 30, 2020), yanked. Reason: Breaking bug, use v0.3.2 instead

Download files

Source Distribution

mrkoll-scraper-0.3.2.tar.gz (5.3 kB), uploaded Aug 27, 2020

Built Distribution

mrkoll_scraper-0.3.2-py3-none-any.whl (18.0 kB), uploaded Aug 27, 2020, Python 3

Hashes for mrkoll-scraper-0.3.2.tar.gz

Algorithm	Hash digest
SHA256	`881994bbf5a634f686b38d44366ee59bb06deecbcd54d69c8811ddd34efb6a1e`
MD5	`427f854bcdd8b8293da825d9bd438d1c`
BLAKE2b-256	`a5af002798f98b0289fcfbb5ea9f6d465e4a09c4c798b2e8c75334d44f5bd09a`

Hashes for mrkoll_scraper-0.3.2-py3-none-any.whl

Algorithm	Hash digest
SHA256	`3b475ff36097dc268d819fea1a3cef78f96343f11e1464d8c18c1710f4693a71`
MD5	`16bad519b2e09a6e6297cf932a45566a`
BLAKE2b-256	`fea191855e1b491767c9e2bd7a04a1714d86402d417f3044804fe221fe0de737`
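
The digests listed above can be checked against a downloaded file. A small sketch using hashlib follows; sha256_of is a hypothetical helper name, and the file path in the usage note is simply wherever you saved the download.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing sha256_of("mrkoll-scraper-0.3.2.tar.gz") with the SHA256 row for that file tells you whether the download matches the published release.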