Scrape Linux Distro torrents and add to Transmission automatically

distroscraper

Scrapes Linux distribution web pages for new torrent files and adds them to Transmission. I want to give back to the Free Software community, and one way I can do that is by seeding their torrents.

For legitimate redistribution of permitted content only.

Included are scrapers for these distributions:

  • Arch Linux
  • Debian
  • Fedora
  • Linux Mint
  • Raspberry Pi

I would love to include the following, if they provided an official torrent:

  • OPNsense

If your favorite distribution is not listed, you can open an issue or contribute a pull request. To write a new scraper, use the existing modules in /distroscraper/scrape as a template.
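As a starting point for a new scraper, here is a minimal, hypothetical skeleton. It assumes, based only on the jq .href filter in scrape-and-add.sh, that a scraper prints one JSON object per line with an href key; the real modules in /distroscraper/scrape may emit more fields or use different libraries. DOWNLOAD_PAGE is a placeholder, and the standard-library HTMLParser is used only to keep the sketch dependency-free.

```python
"""Hypothetical skeleton for a new distroscraper scraper module.

Assumption: a scraper prints one JSON object per line with an "href" key,
which is what the jq .href filter in scrape-and-add.sh reads.
"""
import json
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Placeholder; point this at the distribution's real torrent download page.
DOWNLOAD_PAGE = "https://example.org/downloads/"


class TorrentLinkParser(HTMLParser):
    """Collect href values of <a> tags that end in .torrent."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.endswith(".torrent"):
                self.hrefs.append(value)


def extract_torrent_links(html, base_url):
    """Return absolute .torrent URLs in page order, without duplicates."""
    parser = TorrentLinkParser()
    parser.feed(html)
    parser.close()
    absolute = (urljoin(base_url, href) for href in parser.hrefs)
    return list(dict.fromkeys(absolute))


def main():
    with urlopen(DOWNLOAD_PAGE) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    for url in extract_torrent_links(html, DOWNLOAD_PAGE):
        # One JSON object per line so that `jq .href` can pick out each URL.
        print(json.dumps({"href": url}))


if __name__ == "__main__":
    main()
```

Relative links are resolved against the download page, and a torrent linked twice on the same page is printed only once.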

Installation

Create a Python virtual environment for distroscraper and install the package into it:

python -m venv /home/myuser/.venv_distroscraper
. /home/myuser/.venv_distroscraper/bin/activate
pip install distroscraper

Create a shell script (I call mine "scrape-and-add.sh") that runs each scraper and pipes its output to the Transmission torrent adder. The script below also requires jq, the command-line JSON processor.

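#!/bin/bash
# scrape-and-add.sh: run each distribution scraper and hand the
# torrent URLs it finds to Transmission.

VENV_PATH=/path/to/venv

# Scraper modules to run. CentOS is not in the list of included
# scrapers above; remove that entry if your installed version lacks it.
DISTROS=(
  distroscraper.scrape.archlinux
  distroscraper.scrape.centos
  distroscraper.scrape.debian
  distroscraper.scrape.fedora
  distroscraper.scrape.linuxmint
  distroscraper.scrape.raspberrypi
)

# Transmission RPC connection settings (placeholders). This file holds
# the password, so keep it readable only by its owner.
TM_ARGS=(
  --host localhost
  --port 9091
  --user myuser
  --password supersecret
  --download-dir /path/to/linuxdownloads
)

for d in "${DISTROS[@]}"; do
  # jq -r prints each .href as a bare URL; xargs -r skips the adder
  # entirely when a scraper finds nothing.
  "$VENV_PATH/bin/python" -m "$d" |
    jq -r '.href' |
    xargs -r "$VENV_PATH/bin/python" -m distroscraper.tm_add_torrent "${TM_ARGS[@]}"
done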

Make scrape-and-add.sh executable (chmod +x scrape-and-add.sh) and run it periodically with a systemd timer or a cron job. Repeated runs are harmless, because Transmission does not add a torrent it already has.
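Why re-adding is safe shows up at the RPC level. The sketch below talks to Transmission's RPC endpoint (/transmission/rpc) directly: it performs the session-id handshake (a request without a valid id is answered with HTTP 409 and an X-Transmission-Session-Id header) and then sends torrent-add, whose reply names the torrent under torrent-duplicate instead of torrent-added when it is already present. It illustrates the protocol only and is not the code of distroscraper.tm_add_torrent; the default host, port, and credentials are the placeholders from TM_ARGS.

```python
"""Sketch of a torrent-add call over Transmission's RPC interface.

Illustrates the protocol only; it is not distroscraper.tm_add_torrent.
Defaults are the placeholder values from TM_ARGS in scrape-and-add.sh.
"""
import base64
import json
import sys
import urllib.error
import urllib.request

SESSION_HEADER = "X-Transmission-Session-Id"


def build_add_request(url, download_dir):
    """JSON body asking Transmission to fetch the .torrent at url."""
    return {
        "method": "torrent-add",
        "arguments": {"filename": url, "download-dir": download_dir},
    }


def describe_result(reply):
    """Summarize a torrent-add reply; duplicates are reported, not re-added."""
    if reply.get("result") != "success":
        return "error: " + str(reply.get("result"))
    args = reply.get("arguments", {})
    if "torrent-duplicate" in args:
        return "duplicate: " + args["torrent-duplicate"]["name"]
    return "added: " + args["torrent-added"]["name"]


def add_torrent(url, download_dir, host="localhost", port=9091,
                user="myuser", password="supersecret"):
    endpoint = f"http://{host}:{port}/transmission/rpc"
    body = json.dumps(build_add_request(url, download_dir)).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": "Basic " + token,
               "Content-Type": "application/json"}
    for _ in range(2):  # at most one retry after the 409 handshake
        req = urllib.request.Request(endpoint, data=body, headers=headers)
        try:
            with urllib.request.urlopen(req) as resp:
                return describe_result(json.load(resp))
        except urllib.error.HTTPError as err:
            # A request without a valid session id gets 409 Conflict; the
            # id to use on the retry comes back in this response header.
            if err.code != 409:
                raise
            headers[SESSION_HEADER] = err.headers[SESSION_HEADER]
    raise RuntimeError("Transmission rejected the session id twice")


if __name__ == "__main__":
    for torrent_url in sys.argv[1:]:
        print(add_torrent(torrent_url, "/path/to/linuxdownloads"))
```

Running it twice with the same URL should print an "added" line the first time and a "duplicate" line the second time.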

Here is an example systemd service file for running the script from a timer. Save it as /etc/systemd/system/distroscraper.service.

[Unit]
Description=scrape linux distributions

[Service]
User=service
Group=service
Type=oneshot
Nice=19
ExecStart=/home/service/scrape-and-add.sh

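And here is the corresponding timer file. RandomizedDelaySec=86400 postpones each run by a random amount of up to 24 hours, so runs stay roughly daily but are spread out instead of everyone hitting the download pages at midnight. Save it as /etc/systemd/system/distroscraper.timer, then activate it with systemctl daemon-reload followed by systemctl enable --now distroscraper.timer.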

[Unit]
Description=scrape distributions daily

[Timer]
OnCalendar=00:00
RandomizedDelaySec=86400
Persistent=true
Unit=distroscraper.service

[Install]
WantedBy=timers.target
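To see what "roughly daily" means in practice, the following sketch simulates a year of runs under a simplified model of this timer: each day's run starts at that day's midnight plus an evenly distributed delay between 0 and RandomizedDelaySec. Under that assumption there is one run per calendar day, yet two consecutive runs can be anywhere from almost 0 to almost 48 hours apart. Persistent=true catch-up after downtime is not modeled.

```python
"""Simplified model of OnCalendar=00:00 with RandomizedDelaySec=86400.

Assumption: each day's run starts at midnight plus a fresh delay drawn
evenly from [0, RandomizedDelaySec]. Persistent=true catch-up after
downtime is not modeled. Times are seconds after the first midnight.
"""
import random

DAY = 86400
RANDOMIZED_DELAY_SEC = 86400


def run_times(days, rng):
    """One simulated start time per day: that day's midnight plus a delay."""
    return [d * DAY + rng.uniform(0, RANDOMIZED_DELAY_SEC) for d in range(days)]


def gaps_hours(times):
    """Hours between each pair of consecutive runs."""
    return [(later - earlier) / 3600 for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    gaps = gaps_hours(run_times(365, random.Random(0)))
    print(f"shortest gap: {min(gaps):.1f} h, longest gap: {max(gaps):.1f} h")
```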

Reporting

If you want to ingest Transmission statistics into a logging system such as ELK or Graylog, the following command prints them as JSON that you can adapt for those systems:

python -m distroscraper.tm_get_stats

For Graylog, you may be interested in my GELF JSON Flattener: https://jefftickle.com/projects/gelf-flattener
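The basic idea of flattening can be sketched in Python. This hypothetical helper assumes that tm_get_stats prints a single JSON object (its exact fields are not documented here) and that the target is GELF, where additional fields are flat, start with an underscore, and use only word characters, dots, and hyphens. Nested keys are joined with underscores and list items get their index. It is not the GELF JSON Flattener's implementation.

```python
"""Hypothetical helper: flatten nested JSON stats into GELF-style fields.

Assumptions: the input is one JSON object on stdin, and the output feeds
GELF, whose additional fields are flat, start with "_", and may contain
only word characters, dots and hyphens.
"""
import json
import re
import sys

INVALID_CHARS = re.compile(r"[^\w.\-]")


def flatten(value, prefix=""):
    """Return {field: leaf}; nested keys are joined with "_" after cleanup."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = INVALID_CHARS.sub("_", str(key))
        flat.update(flatten(child, f"{prefix}_{name}"))
    return flat


if __name__ == "__main__":
    stats = json.load(sys.stdin)
    # A complete GELF message also needs version, host and short_message;
    # add them here or in your log shipper.
    print(json.dumps(flatten(stats)))
```

Usage would look like python -m distroscraper.tm_get_stats | python flatten_stats.py, where flatten_stats.py is a hypothetical name for the file holding this sketch.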

Download files

Download the file for your platform.

Source Distribution

distroscraper-1.0.2.tar.gz (8.6 kB)

Uploaded Source

Built Distribution

distroscraper-1.0.2-py3-none-any.whl (26.3 kB)

Uploaded Python 3

File details

Details for the file distroscraper-1.0.2.tar.gz.

File metadata

  • Download URL: distroscraper-1.0.2.tar.gz
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for distroscraper-1.0.2.tar.gz
Algorithm Hash digest
SHA256 f7cd2d477da62c1b3f87ab5c8c36bd967f50d1549b21f7bd8fb4c826bddfbf23
MD5 647d48f293aacb8bd1596fbdfffddb6c
BLAKE2b-256 26b5b464f55277eb2a0d29a5ac1d42694f46a76c6ebc930e295e881bd95a8e98

File details

Details for the file distroscraper-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: distroscraper-1.0.2-py3-none-any.whl
  • Size: 26.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for distroscraper-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 06c919c05b84dcb9fa29ba007597cc16d24bccf9a43931cd2b31854488ec15e2
MD5 d656200360d04c821b91fcbd2e2604f0
BLAKE2b-256 849fabb793e2fd818fb3f8afcd2fb9bed2ee3cdf990e3c2b9d517aaa1dc7370d
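To check a download against the SHA256 digests listed above, a short script is enough. The file names and digests below are copied from this page; the script itself is a generic sketch. pip can perform an equivalent check on its own in hash-checking mode (--require-hashes, with --hash=sha256:<digest> entries in a requirements file).

```python
"""Check downloaded distroscraper 1.0.2 files against their SHA256 digests."""
import hashlib
import os
import sys

# Digests copied from the "Hashes for ..." tables on this page.
EXPECTED_SHA256 = {
    "distroscraper-1.0.2.tar.gz":
        "f7cd2d477da62c1b3f87ab5c8c36bd967f50d1549b21f7bd8fb4c826bddfbf23",
    "distroscraper-1.0.2-py3-none-any.whl":
        "06c919c05b84dcb9fa29ba007597cc16d24bccf9a43931cd2b31854488ec15e2",
}


def sha256_of(path, chunk_size=65536):
    """Hash the file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    for path in sys.argv[1:]:
        name = os.path.basename(path)
        expected = EXPECTED_SHA256.get(name)
        if expected is None:
            print(f"{name}: no published digest")
        else:
            print(f"{name}: {'OK' if verify(path, expected) else 'MISMATCH'}")
```

Passing one or both downloaded files on the command line prints OK or MISMATCH for each.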