Removes dead & duplicate links from markdown files and updates redirects.
link-reaper
Verifies AND automatically reaps links to keep your lists updated and clean of "zombies".
Unlike other link verifiers, this one will make direct changes to your markdown files instead of just preventing push/pull requests (but it can do that too).
Installation
This project is distributed as a Python package and requires Python to be installed if used directly on your computer.
For personal usage
Here are a couple of options for those who want to use the project.
Using Pip Install:
- Have Python installed and the latest version of Pip
- Run the command: pip install the-link-reaper
- See Usage for what you can do with this package.
Docker
The project includes a Dockerfile you can edit and build for your images. See here for an example. A downloadable premade image TBD.
GitHub Workflow
You can install link-reaper as a Python package to use in workflows. See here for an example.
For Developers
Instructions
- Fork this repo (if you want to contribute; if not, skip this step)
- Find/create your directory of choice
- Open a terminal in that directory and clone your fork:
  git clone https://github.com/<your name>/<your fork name here>.git
  If you are not using a fork, clone https://github.com/sharktrexer/link-reaper.git instead.
- Create a virtual environment:
  python -m venv .venv
- Install required dependencies:
  pip install -r requirements.txt
  If you intend to contribute, also run:
  pip install -r requirements_dev.txt
- Test or play around with the project, utilizing the many options here:
  python -m link_reaper.reaper reap yourfile.md -is -m
  The provided example will NOT overwrite your file data.
- If you're contributing, follow the steps below
Contributing
Feel free to create Issues or Pull Requests at your leisure. If you are unsure if the PR is a good idea, create an Issue first and I will respond as best as I can.
Before creating a pull request, be sure to run the following commands after implementing your changes (and make sure you installed the dependencies from requirements_dev.txt):
# Lint code
ruff check link-reaper
# Apply lint fixes (you may have to do some manually)
ruff check --fix
# Format changes
ruff format link-reaper
# Optional for bonus points
pylint link-reaper
If you don't use the ruff commands, the workflow of this project will fail and it will take longer to merge your potentially beautiful changes!
Usage
Here are the many ways you can use this Python package.
Terminal
Package Usage: python -m link_reaper.reaper [OPTIONS] COMMAND [ARGS]...
Usage: link-reaper [OPTIONS] COMMAND [ARGS]...
Groups CLI commands under 'link reaper' and prints optional flavor ascii art
Options:
-na, --no_art Disable printed ascii art.
--help Show this message and exit.
Commands:
reap Command that reaps links from markdown files based on your options
Options:
-s, --show_afterlife Create an afterlife-filename.md for each
checked file that only contains the reaped
links.
-m, --merciful Instead of overwriting files, create a reaped-
filename.md for each checked file that contains
applied changes.
-ig, --ignore_ghosts Prevents updating redirecting links.
-id, --ignore_doppelgangers Ignore duplicate links.
-is, --ignore_ssl Disable SSL errors. Not very secure so use with
caution.
-it, --ignore_timeouts Ignore links that time out.
-iu, --ignore_urls TEXT Ignores specific links or general domains you
want to whitelist. Comma separate each entry.
-rs, --reap_status TEXT Status codes you want to be reaped (By default
404, 500, 521 are reaped and 300s are updated).
Enter each code comma separated.
-p, --patience INTEGER Max # of seconds to wait for url to send data
until it times out.
-dl, --disable_logging Prevents creation of any log type files (does
not overwrite -show-afterlife)
-v, --verbose Provide more information on the reaping
process.
--help Show this message and exit.
Examples
Installing this package with pip lets you use it not only directly on your computer for any project, but also in containers or workflows.
General Use
In your Python project, you can use pip install the-link-reaper for access to CLI commands. For example, if you want to automatically clean a markdown list in your project,
like a README.md, while understanding what exactly was changed without overwriting data, try:
link-reaper reap example.md -is -m -s
This will keep the integrity of your document and create new files like
- reaped-example.md | Showcases the changes the program would make to the inputted file if overwritten
- log-example.md | Lists any links that the program couldn't determine were reapable or not
- afterlife-example.md | Lists all the reaped links by themselves
If you like the changes Link Reaper made, rename reaped-example.md to example.md to overwrite the original document with a cleaner link list. Feel free to delete the afterlife & log files.
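The rename-and-cleanup step described above can also be scripted. This is a minimal sketch, assuming the reaped-, log- and afterlife- files sit in the same directory as the original document; the function name accept_reaped_changes is hypothetical and not something Link Reaper provides.

```python
from pathlib import Path


def accept_reaped_changes(original: Path, cleanup: bool = True) -> None:
    """Replace `original` with its reaped- counterpart (hypothetical helper)."""
    reaped = original.with_name(f"reaped-{original.name}")
    if not reaped.exists():
        raise FileNotFoundError(f"No reaped file found at {reaped}")

    # Overwrite the original document with the cleaned version.
    reaped.replace(original)

    if cleanup:
        # The afterlife and log files are optional, so ignore missing ones.
        for prefix in ("afterlife-", "log-"):
            original.with_name(prefix + original.name).unlink(missing_ok=True)
```

For example, accept_reaped_changes(Path("example.md")) would promote reaped-example.md and delete the two side files if they exist.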
Whitelisting URLs
If there are certain urls or web domains you'd rather this program ignore, utilize the --ignore_urls option. For example, if you want to ignore a specific url, do:
link-reaper reap example.md -iu https://github.com/sharktrexer/link-reaper
But let's say you want to ignore ALL GitHub URLs; then simply do:
-iu github.com
Or, if you wanted to ignore all of a certain path from github, you could do:
-iu github.com/sharktrexer
And finally, you can mix and match:
-iu https://github.com/sharktrexer/link-reaper,google.com
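To make the matching behaviour described above concrete, here is one way such a whitelist could be interpreted. This is an illustrative sketch only, not Link Reaper's actual implementation: the function names are hypothetical, and it assumes an entry matches a URL when the URL is the same as the entry or sits beneath its domain/path (subdomains such as www. are not handled).

```python
from urllib.parse import urlparse


def parse_ignore_list(raw: str) -> list[str]:
    """Split a comma-separated -iu value into cleaned entries."""
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


def _strip_scheme(url: str) -> str:
    """Drop the scheme and any trailing slash so entries compare evenly."""
    parsed = urlparse(url if "://" in url else "//" + url)
    return (parsed.netloc + parsed.path).rstrip("/").lower()


def is_ignored(url: str, ignore_entries: list[str]) -> bool:
    """Return True when the URL equals an entry or lies under its domain/path."""
    target = _strip_scheme(url)
    for entry in ignore_entries:
        prefix = _strip_scheme(entry)
        if target == prefix or target.startswith(prefix + "/"):
            return True
    return False
```

With entries parsed from "github.com/sharktrexer", a link to https://github.com/sharktrexer/link-reaper would be ignored under this sketch, while https://github.com/other would still be checked.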
Blacklisting Status Codes
There may be status codes returned by some of your URLs that you would like reaped. In that case, use the --reap_status option. Similarly to above, to reap one or multiple specific codes, you can do:
link-reaper reap example.md -rs 401,402
However, you may want to reap a whole group of similar status codes. Link Reaper provides an easy shorthand for this using "*". If you want all 4xx codes (400-499) to be reaped, input 4* or 4**:
-rs 4*
This also works for a range of 10: if you input 30*, all codes from 300-309 are caught and reaped:
-rs 30*
Mixing and matching is totally fine as well:
-rs 403,30*
And don't worry about erroneous inputs, they'll be ignored.
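The wildcard rules above can be pictured as expanding each entry into a set of concrete status codes. The sketch below is an assumption about how such expansion might work, not the package's real parser; expand_status_patterns is a hypothetical name, and it restricts prefixes to the standard 1xx-5xx classes and silently skips anything it cannot interpret.

```python
import re

# One to three digits (first digit 1-5), optionally padded with up to two '*'.
_PATTERN = re.compile(r"([1-5][0-9]{0,2})(\*{0,2})")


def expand_status_patterns(raw: str) -> set[int]:
    """Expand a comma-separated -rs value such as '403,30*' into status codes."""
    codes: set[int] = set()
    for token in raw.split(","):
        match = _PATTERN.fullmatch(token.strip())
        if match is None:
            continue  # erroneous input is ignored
        digits, stars = match.groups()
        missing = 3 - len(digits)
        # A short prefix needs a wildcard, and the stars must fit in 3 characters.
        if (missing and not stars) or len(stars) > missing:
            continue
        low = int(digits) * 10**missing
        codes.update(range(low, low + 10**missing))
    return codes
```

Under these assumptions, "4*" and "4**" both expand to 400-499, "30*" expands to 300-309, and "403" stays a single code.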
GitHub Workflow
Link Reaper can be used to verify pushes and pull requests using workflows, without changing any aspect of a document. See below for an example that verifies links without any extra fluff or potential to overwrite changes.
name: Link-Reaper
on:
  push:
    branches: [ '*' ]
  pull_request:
    branches: [ '*' ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13.1'
      - name: Install & run link-reaper
        run: |
          pip install the-link-reaper
          link-reaper -na reap README.md -is -m -dl
Dockerfile
Provided in this project is an example Dockerfile that you can use to create a container that verifies a markdown list. For easy copy/paste:
# Dockerfile for link-reaper
FROM python:3.13.1
RUN pip install the-link-reaper
# Command to run link-reaper on your file without overwriting
# Customize as you desire
CMD ["link-reaper", "reap", "yourfile.md", "-is", "-m"]
# Now you can use the following commands in your terminal to run:
# docker build -t link-reaper .
# docker run link-reaper
In Progress Features
If you would like to see what is currently in production or what features are planned, visit my Trello page here!
File details
Details for the file the_link_reaper-0.8.3.tar.gz.
File metadata
- Download URL: the_link_reaper-0.8.3.tar.gz
- Upload date:
- Size: 17.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd70525aaa683aed8178142a0660e734638ab9364c785e55883fd0bbd287d653 |
| MD5 | 383ee583eb1514890fdacd71628a3133 |
| BLAKE2b-256 | 8347edbe12f8be7ad030716fc054b4bd737f0deec0ad136b2614edda1d66cafe |
File details
Details for the file the_link_reaper-0.8.3-py3-none-any.whl.
File metadata
- Download URL: the_link_reaper-0.8.3-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 63efefd0c4ef8ac3b53cdacf1e8274a661ae6a6e70b9bfa09fa6e061ea5145ad |
| MD5 | 941ecf98b99dc00952d8bf2b812e904c |
| BLAKE2b-256 | b3f0024b0849307fb947c865b59b7e408779cb0d9f1913e1364a84d8e32d9083 |