
Extract URLs from websites or local HTML files

Project description

grepurl is a command line tool that extracts URLs from a website (or a local HTML file).
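To make the idea concrete, here is a minimal sketch of how URLs could be pulled out of `<a>` and `<img>` tags using only the Python standard library. It is not grepurl's actual implementation; the names `URLCollector` and `extract_urls` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class URLCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> and src values of <img>."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []   # URLs found in <a href="...">
        self.images = []  # URLs found in <img src="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative references against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_urls(html, base_url=""):
    """Return (links, images) found in an HTML string."""
    collector = URLCollector(base_url)
    collector.feed(html)
    return collector.links, collector.images


sample = (
    '<a href="/foo.py">code</a>'
    '<img src="logo.png">'
    '<a href="http://example.org/x.zip">zip</a>'
)
links, images = extract_urls(sample, "http://example.com/")
print(links)
print(images)
```

Keeping links and images in separate lists mirrors the distinction the `-a` and `-i` options draw between `<a>` and `<img>` tags.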

Usage

grepurl http://example.com/ # extract all URLs from links and images
grepurl -a http://example.com/foo.htm # only extract from <a> tags (i.e. links)
grepurl -i http://example.com/bar.htm # only extract from <img> tags (i.e. images)
grepurl -r "\.py$" http://example.com/ # only extract URLs that end in '.py'
grepurl -r "\.zip$" -d http://example.com/ # download all zip files
grepurl -r "\.zip$" -d -o download_dir http://example.com/ # download all zip files into download_dir
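The filtering and download examples above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not grepurl's own code: it assumes the pattern is applied with search semantics (which the end-anchored patterns such as `\.zip$` suggest) and that a downloaded file is named after the last path segment of its URL. The helpers `filter_urls` and `target_path` are hypothetical.

```python
import os
import re
from urllib.parse import urlparse


def filter_urls(urls, pattern):
    """Keep only the URLs in which the regular expression is found (re.search)."""
    regex = re.compile(pattern)
    return [url for url in urls if regex.search(url)]


def target_path(url, download_dir="."):
    """Hypothetical helper: local path for a URL, named after its last path segment."""
    # Assumption: fall back to "index.html" when the URL path ends in "/".
    filename = os.path.basename(urlparse(url).path) or "index.html"
    return os.path.join(download_dir, filename)


urls = [
    "http://example.com/setup.py",
    "http://example.com/data.zip",
    "http://example.com/python/",
]
zips = filter_urls(urls, r"\.zip$")
print(zips)
print([target_path(u, "download_dir") for u in zips])
```

Note that the dot is escaped and the pattern is anchored with `$`, so `http://example.com/python/` is not matched by `\.py$`.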

Installation using pip

pip install grepurl

Installation from repository

git clone https://github.com/arne-cl/grepurl
cd grepurl
pip install -e .

License

GPLv2 or later.

Authors

Gerome Fournier (original author). His implementation is only available via the Internet Archive.

Arne Neumann (added -l option for local files, minor changes).

GPT-4 (rewrote the script for Python 3 compatibility).

Download files


Source Distribution

grepurl-0.2.0.tar.gz (3.1 kB)

Built Distribution

grepurl-0.2.0-py3-none-any.whl (3.6 kB, Python 3)
