urlfix: Check and Fix Outdated URLs


urlfix aims to find all outdated URLs in a given file and fix them.

Supported file formats

urlfix can fix URLs in files of the following types:

  • Markdown (.md)

  • Plain Text files (.txt)

  • reStructuredText (.rst)

  • PDF (.pdf)

  • Word (.docx)

  • ODF (.odf)

Features List

  • Command-line and programmer-friendly modes.

  • Replace outdated URLs/links in a single file

  • Replace outdated URLs/links in a directory

  • Replace outdated URLs/links in place, i.e. in the same file or in the same files in a directory

  • Replace outdated URLs/links in files in nested directories

Installation

The simplest way to install the latest release is as follows:

pip install urlfix

To install the development version, open a terminal (Terminal, CMD, Git Bash, or another shell) and enter:

pip install git+https://github.com/Nelson-Gon/urlfix.git

# or for the less stable dev version
pip install git+https://github.com/Nelson-Gon/urlfix.git@dev

Otherwise:

# clone the repo
git clone git@github.com:Nelson-Gon/urlfix.git
cd urlfix
python3 setup.py install

Sample usage

Script Mode

To use urlfix at the command line, run:

python -m urlfix --mode "f" --verbose 1 --inplace 1 --input-file myfile.md

To write the result to a separate output file instead of replacing in place:

python -m urlfix --mode "f" --verbose 1 --inplace 0 --input-file myfile.md --output-file myoutputfile.md
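The two invocations above differ only in the --inplace value and the presence of --output-file. As a minimal sketch of how one might assemble the same command from Python, the helper below builds the argument list from the flags documented here. The function name build_urlfix_command is hypothetical and not part of urlfix; running the command is left commented out because it needs urlfix installed and network access.

```python
import sys


def build_urlfix_command(input_file, output_file=None, verbose=True):
    """Build the argv list for the documented `python -m urlfix` file mode.

    Hypothetical helper, not part of urlfix. It assumes in-place
    replacement whenever no output file is given.
    """
    inplace = output_file is None
    command = [
        sys.executable, "-m", "urlfix",
        "--mode", "f",
        "--verbose", "1" if verbose else "0",
        "--inplace", "1" if inplace else "0",
        "--input-file", input_file,
    ]
    if not inplace:
        # --output-file is only needed when not replacing in place.
        command += ["--output-file", output_file]
    return command


if __name__ == "__main__":
    command = build_urlfix_command("myfile.md", output_file="myoutputfile.md")
    print(command)
    # To actually run it (requires urlfix to be installed):
    # import subprocess
    # subprocess.run(command, check=True)
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.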

To get help:

python -m urlfix -h 

#usage: main.py [-h] -m MODE -in INPUT_FILE [-o OUTPUT_FILE] -v {False,false,0,True,true,1} -i {False,false,0,True,true,1}
#
#optional arguments:
#  -h, --help            show this help message and exit
#  -m MODE, --mode MODE  Mode to use. One of f for file or d for directory
#  -in INPUT_FILE, --input-file INPUT_FILE
#                        Input file for which link updates are required.
#  -o OUTPUT_FILE, --output-file OUTPUT_FILE
#                        Output file to write to. Optional, only necessary if not replacing inplace
#  -v {False,false,0,True,true,1}, --verbose {False,false,0,True,true,1}
#                        String to control verbosity. Defaults to True.
#  -i {False,false,0,True,true,1}, --inplace {False,false,0,True,true,1}
#                        Should links be replaced inplace? This should be safe but to be sure, test with an output file first.

Programmer-Friendly Mode

from urlfix.urlfix import URLFix
from urlfix.dirurlfix import DirURLFix

Create an object of class URLFix

urlfix_object = URLFix("testfiles/testurls.txt", output_file="replacement.txt")

Replacing URLs

After creating our object, we can replace outdated URLs as follows:

urlfix_object.replace_urls(verbose=1)

Apart from verbose, the call above uses the default arguments and will not replace a file in place. This is a safety mechanism to ensure you do not damage your files.

Since we set verbose to 1 (True), we get the following output:

urlfix_object.replace_urls(verbose=1)
Found https://www.r-pkg.org/badges/version/manymodelr in testurls.txt, now validating.. 
Found https://cran.r-project.org/package=manymodelr in testurls.txt, now validating.. 
https://cran.r-project.org/package=manymodelr replaced with https://cran.r-project.org/web/packages/manymodelr/index.html 
in replacement.txt
Found https://tidyverse.org/lifecycle/#maturing in testurls.txt, now validating.. 
https://tidyverse.org/lifecycle/#maturing replaced with https://lifecycle.r-lib.org/articles/stages.html in 
replacement.txt
2 URLs have changed of the 3 links found in testurls.txt
2

To replace silently, simply set verbose to False (which is the default).

urlfix_object.replace_urls()
2 URLs have changed of the 3 links found in testurls.txt
2

If there are URLs known to be valid, pass these to the correct_urls argument to save some time.

urlfix_object.replace_urls(correct_urls=[urls_here]) # use a sequence, e.g. a tuple or list
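To see why passing known-valid URLs saves time, here is a stand-alone sketch of the general find, validate, and replace idea. It is not urlfix's implementation: the names URL_PATTERN, replace_outdated_urls, and resolve_by_following_redirects are hypothetical, the regular expression is deliberately simple (it will, for example, keep a trailing full stop as part of a URL), and "outdated" is assumed here to mean "redirects to a different address". The resolver is passed in as a function so that the known-valid URLs can be skipped before any network request is made.

```python
import re
from urllib.error import URLError
from urllib.request import urlopen

# Simplified pattern for illustration; real-world URL matching is messier.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def resolve_by_following_redirects(url, timeout=10):
    """Return the final address after redirects, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.geturl()
    except (URLError, ValueError):
        return None


def replace_outdated_urls(text, resolve, correct_urls=()):
    """Replace each URL in text with the address returned by resolve(url).

    URLs listed in correct_urls are never passed to resolve.
    Returns the new text and the number of distinct URLs that changed.
    """
    known_good = set(correct_urls)
    replacements = {}
    # dict.fromkeys removes duplicates while keeping first-seen order.
    for url in dict.fromkeys(URL_PATTERN.findall(text)):
        if url in known_good:
            continue  # skip validation entirely for known-valid URLs
        final_url = resolve(url)
        if final_url and final_url != url:
            replacements[url] = final_url
    new_text = URL_PATTERN.sub(
        lambda match: replacements.get(match.group(0), match.group(0)), text
    )
    return new_text, len(replacements)


if __name__ == "__main__":
    # A fake resolver keeps the demo offline; swap in
    # resolve_by_following_redirects to check real URLs.
    moved = {"https://old.example/a": "https://new.example/a"}
    sample = "See https://old.example/a and https://ok.example/b here"
    print(replace_outdated_urls(
        sample,
        lambda url: moved.get(url, url),
        correct_urls=["https://ok.example/b"],
    ))
```

Each skipped URL avoids one round trip, which adds up in files with many links to the same few well-known sites.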

Replacing several files in a directory

To replace several files in a directory, we can use DirURLFix as follows.

  • Instantiate an object of class DirURLFix:

replace_in_dir = DirURLFix("path_to_dir")

  • Call replace_urls:

replace_in_dir.replace_urls()
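For a sense of what processing a directory involves, the sketch below collects files with the extensions listed under "Supported file formats", optionally descending into nested directories. It only illustrates the idea and is not how DirURLFix is implemented; SUPPORTED_SUFFIXES and find_supported_files are hypothetical names.

```python
from pathlib import Path

# Extensions taken from the "Supported file formats" list above.
SUPPORTED_SUFFIXES = {".md", ".txt", ".rst", ".pdf", ".docx", ".odf"}


def find_supported_files(directory, recursive=True):
    """Return a sorted list of files under directory with a supported suffix.

    Hypothetical helper for illustration; not part of urlfix.
    """
    root = Path(directory)
    # rglob walks nested directories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path for path in candidates
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for path in find_supported_files("."):
        print(path)
```

Each path found this way could then be handled one file at a time, for example by creating a URLFix object for it as shown earlier.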

To report issues or suggest improvements, please open an issue on the project's GitHub repository.

If you would like to cite this work, please use:

Nelson Gonzabato (2021) urlfix: Check and Fix Outdated URLs https://github.com/Nelson-Gon/urlfix

Thank you very much.

“Before software can be reusable it first has to be usable.” – Ralph Johnson
