extract URLs from websites or local HTML files

grepurl is a command-line tool that extracts URLs from a web page or a local HTML file.

Usage

grepurl http://example.com/ # extract all URLs from links and images
grepurl -a http://example.com/foo.htm # only extract from <a> tags (i.e. links)
grepurl -i http://example.com/bar.htm # only extract from <img> tags (i.e. images)
grepurl -r "\.py$" http://example.com/ # only extract links that end in '.py'
grepurl -r "\.zip$" -d http://example.com/ # download all zip files
grepurl -r "\.zip$" -d -o download_dir http://example.com/ # download all zip files into download_dir
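
The filtering shown above (links only, images only, regex match) can be illustrated with a short Python sketch. This is not grepurl's actual source code; it is a minimal approximation built on the standard library, and the names `LinkCollector` and `extract_urls` are hypothetical. It assumes relative URLs should be resolved against the page URL, which grepurl may or may not do.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self, base_url, want_links=True, want_images=True):
        super().__init__()
        self.base_url = base_url
        self.want_links = want_links
        self.want_images = want_images
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self.want_links and attrs.get("href"):
            # Resolve relative links against the page URL (an assumption).
            self.urls.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and self.want_images and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))


def extract_urls(html, base_url, pattern=None, want_links=True, want_images=True):
    """Return URLs found in html, optionally keeping only regex matches."""
    collector = LinkCollector(base_url, want_links, want_images)
    collector.feed(html)
    if pattern is None:
        return collector.urls
    regex = re.compile(pattern)
    return [url for url in collector.urls if regex.search(url)]


if __name__ == "__main__":
    sample = '<a href="/tool.py">tool</a> <img src="logo.png"> <a href="data.zip">data</a>'
    # Roughly analogous to: grepurl -r "\.py$" http://example.com/
    print(extract_urls(sample, "http://example.com/", pattern=r"\.py$"))
```

Setting `want_images=False` loosely mirrors `-a`, and `want_links=False` loosely mirrors `-i`.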

Installation using pip

pip install grepurl

Installation from repository

git clone https://github.com/arne-cl/grepurl
cd grepurl
pip install -e .

License

GPLv2 or later.

Authors

Gerome Fournier (original author). His implementation is only available via the Internet Archive.

Arne Neumann (added -l option for local files, minor changes).

GPT-4 (rewrote the script for Python 3 compatibility).

Download files

Source Distribution

grepurl-0.2.0.tar.gz (3.1 kB)

Built Distribution

grepurl-0.2.0-py3-none-any.whl (3.6 kB, Python 3)

File hashes

Hashes for grepurl-0.2.0.tar.gz

Algorithm    Hash digest
SHA256       91249b7020229b5a975b07b7316a2f64e54ed1d3ba367efb532f1d6ba39b5244
MD5          b5cd6eac118efff1df21f12b63a711b5
BLAKE2b-256  e8cc98dc4a7db995a3c5dbfb82358c8c727435796e0c2033ab7e860ea060e806

File hashes

Hashes for grepurl-0.2.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       1c9e83a313f7b7b6639f8cf15654d840999ae6fc2baba728cb914c5687107327
MD5          be1003554df2a9a904ce052f669603cc
BLAKE2b-256  95993e2acca72e817ef4c501e83933c6fce31c332a8501714dbc99e26a2fe5d2
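
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical and not part of grepurl.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("grepurl-0.2.0.tar.gz", "91249b7020229b5a975b07b7316a2f64e54ed1d3ba367efb532f1d6ba39b5244")` should return `True` for an unmodified copy of the source distribution, assuming the file sits in the current directory.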
