mrkoll.se scraper & RSS generator

Project description

mrkoll-scraper

Tool to scrape the incredibly creepy Swedish mrkoll.se top list and generate an RSS feed from it. Each RSS entry represents the list as it looked at a given point in time, plus convenience links to search for each person's name on DuckDuckGo, Flashback, and The Facebook.
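Each convenience link is essentially a search URL with the person's name as the query. Below is a minimal sketch for the DuckDuckGo link, assuming DuckDuckGo's standard `q` query parameter. The helper name `duckduckgo_search_url` is hypothetical. The exact link formats the tool uses, including those for Flashback and Facebook, are not documented here.

```python
from urllib.parse import urlencode


def duckduckgo_search_url(name: str) -> str:
    """Hypothetical helper: build a DuckDuckGo search link for a person's name."""
    # urlencode turns spaces into "+" and percent-encodes non-ASCII
    # letters (å, ä, ö) as UTF-8 bytes.
    return "https://duckduckgo.com/?" + urlencode({"q": name})
```

For example, `duckduckgo_search_url("Anna Andersson")` returns `"https://duckduckgo.com/?q=Anna+Andersson"`.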

Installation

pip install mrkoll-scraper

Usage

CLI

mrks --scrape

Scrapes the current list and saves it to the shelve database mrks.db in the current working directory. Each scrape stores both the raw scraped HTML and a generated feedgen FeedEntry. The data is saved in a dictionary keyed by the current date, so multiple scrapes on the same day don't create new entries; they only update the existing one.
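The same-day overwrite falls out of using the date as the shelve key. Here is a minimal sketch of that storage pattern with Python's standard `shelve` module. Several details are assumptions: the function name `save_scrape`, the ISO date key format, and the `{"html": ..., "entry": ...}` record layout. A plain dict stands in for the feedgen FeedEntry.

```python
import datetime
import shelve
from typing import Optional


def save_scrape(db_path: str, raw_html: str, entry: dict,
                day: Optional[datetime.date] = None) -> None:
    """Hypothetical helper: store one scrape under the day's ISO date."""
    key = (day or datetime.date.today()).isoformat()  # e.g. "2021-06-07"
    with shelve.open(db_path) as db:
        # Assigning to an existing key replaces the old value, so a second
        # scrape on the same day updates the entry instead of adding one.
        db[key] = {"html": raw_html, "entry": entry}
```

Depending on which dbm backend Python picks, the file on disk may be named mrks.db or carry an extra suffix such as .db or .dat.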

mrks --regenerate

Iterates through the saved lists and regenerates the FeedEntry objects from the raw HTML. This is useful if you've changed the HTML template (or similar) and want the changes applied retroactively.
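Regenerating then amounts to a loop over the stored records that rebuilds each entry from its saved HTML. This sketch assumes a hypothetical `{"html": ..., "entry": ...}` record layout. The names `regenerate` and `build_entry` are also hypothetical; `build_entry` is a callback standing in for whatever renders the HTML template into a FeedEntry.

```python
import shelve
from typing import Callable


def regenerate(db_path: str, build_entry: Callable[[str, str], dict]) -> int:
    """Rebuild every stored entry from its raw HTML; return how many were rebuilt."""
    count = 0
    with shelve.open(db_path) as db:
        for key in list(db.keys()):
            record = db[key]
            # Re-render from the raw HTML so template changes apply to old scrapes.
            record["entry"] = build_entry(key, record["html"])
            # shelve returns an unpickled copy, so the modified record must be
            # written back explicitly for the change to persist.
            db[key] = record
            count += 1
    return count
```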

WSGI

mrks.wsgi contains a beautifully simple WSGI application that outputs an RSS feed based on the currently saved data.
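A WSGI application of this kind only has to read the shelve and serialize it. The following is a minimal stdlib-only sketch that assumes a hypothetical `{"html": ...}` record layout keyed by ISO date. It builds RSS 2.0 with `xml.etree.ElementTree` in place of feedgen. The names `make_app` and `build_rss`, and the channel title, link, and description text, are hypothetical and not what mrks.wsgi actually contains.

```python
import shelve
import xml.etree.ElementTree as ET


def build_rss(records: dict) -> bytes:
    """Serialize {date: {"html": ...}} records as an RSS 2.0 document, newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "mrkoll.se toplist"
    ET.SubElement(channel, "link").text = "https://mrkoll.se/"
    ET.SubElement(channel, "description").text = "Snapshots of the mrkoll.se toplist"
    # ISO date keys sort chronologically, so reverse order puts the newest first.
    for day in sorted(records, reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Toplist {day}"
        # ElementTree escapes the raw HTML so it survives as text content.
        ET.SubElement(item, "description").text = records[day]["html"]
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


def make_app(db_path: str):
    """Return a WSGI callable that renders the feed from the shelve at db_path."""
    def application(environ, start_response):
        # Open read-only on every request so newly scraped data is picked up.
        with shelve.open(db_path, flag="r") as db:
            records = {key: db[key] for key in db.keys()}
        body = build_rss(records)
        start_response("200 OK", [
            ("Content-Type", "application/rss+xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return application
```

To try it locally, run `wsgiref.simple_server.make_server("", 8000, make_app("mrks.db")).serve_forever()`, which serves the feed on port 8000. Because the shelve is read on every request, a new scrape shows up without restarting the server.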

Cron

Crontab to run --scrape every Monday at midnight:

0 0 * * 1 cd /home/robert/mrkoll-scraper && /home/robert/mrkoll-scraper/venv/bin/mrks -s

Download files

Source Distribution

mrkoll-scraper-0.3.4.tar.gz (5.5 kB)

Uploaded Source

Built Distribution

mrkoll_scraper-0.3.4-py3-none-any.whl (18.2 kB)

Uploaded Python 3

File details

Details for the file mrkoll-scraper-0.3.4.tar.gz.

File metadata

  • Download URL: mrkoll-scraper-0.3.4.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.10

File hashes

Hashes for mrkoll-scraper-0.3.4.tar.gz

Algorithm    Hash digest
SHA256       db0482b89f6a13c7e324b26a7b11b4e63e59b88a48ed74e32bc5e88373ba0897
MD5          1671116ef532b8085fba2f4e661a3de2
BLAKE2b-256  d0cd550ae570a78675bdac3c31467c47672f88847276c3b3b202de88528824ad

File details

Details for the file mrkoll_scraper-0.3.4-py3-none-any.whl.

File metadata

  • Download URL: mrkoll_scraper-0.3.4-py3-none-any.whl
  • Upload date:
  • Size: 18.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.10

File hashes

Hashes for mrkoll_scraper-0.3.4-py3-none-any.whl

Algorithm    Hash digest
SHA256       813714eec597ed1754273226509e9b5f538856d84f87dd2e995d77a2e2f33d3e
MD5          53885a7836f0c9a0d02f81dc77432825
BLAKE2b-256  88487f73323c11d503c997bca8e9fd5e8de05cb34c770a34acaccbe4d040be5f
