Extract all internal and external links from a URL.
Links-Extractor
Extract all internal and external links from a URL in Python.
Description
Links-Extractor fetches one or more web pages and lists the internal and external hyperlinks found on each page. A link is treated as internal when its host matches the host of the page being scanned, and external otherwise. Empty anchors and javascript:, mailto:, and tel: links are ignored.
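The internal/external rule described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the project's actual code: the function name `classify_links` and the constant `IGNORED_SCHEMES` are hypothetical, relative links are assumed to be resolved against the page URL, and host comparison is assumed to be an exact, case-insensitive match (so `www.example.com` and `example.com` would count as different hosts).

```python
from urllib.parse import urljoin, urlparse

# Link prefixes that the description says are ignored.
IGNORED_SCHEMES = ("javascript:", "mailto:", "tel:")


def classify_links(page_url, hrefs):
    """Split raw href values into (internal, external) lists of absolute URLs.

    Hypothetical helper: a sketch of the rule, not the package's API.
    """
    page_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in hrefs:
        href = (href or "").strip()
        # Skip empty anchors and non-navigational schemes.
        if not href or href.lower().startswith(IGNORED_SCHEMES):
            continue
        # Resolve relative links against the page being scanned.
        absolute = urljoin(page_url, href)
        host = urlparse(absolute).netloc.lower()
        # Same host as the page: internal. Anything else: external.
        (internal if host == page_host else external).append(absolute)
    return internal, external
```

Called with `"https://example.com/a/"` and the hrefs `["/about", "https://www.python.org/", "mailto:me@example.com"]`, this sketch would put the first link in the internal list, the second in the external list, and drop the third.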
Install
pip install links-extractor
This installs the links-extractor command. You can also run the script directly from a clone (python3 extractor.py ...).
Requirements
- Python 3
- Dependencies: requests, beautifulsoup4, lxml
Install them with:
pip install -r requirements.txt
Usage
Pass one or more URLs as arguments:
links-extractor https://example.com
python3 extractor.py https://example.com
python3 extractor.py https://example.com https://www.python.org
Redirect the output to a file:
python3 extractor.py https://example.com > out.txt
For each URL the script prints the count and list of internal links followed by the count and list of external links.
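As a rough picture of that report, the sketch below builds one block of text per URL: a count and list for internal links, then a count and list for external links. The exact labels and layout the script prints are not documented here, so the function name `format_report` and the wording of each line are assumptions made for illustration.

```python
def format_report(url, internal, external):
    """Build a per-URL report: internal count and links, then external.

    Hypothetical helper; the real script's output wording may differ.
    """
    lines = [f"URL: {url}"]
    for label, links in (("Internal", internal), ("External", external)):
        lines.append(f"{label} links: {len(links)}")  # count first
        lines.extend(links)                           # then one link per line
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(
        "https://example.com",
        ["https://example.com/about"],
        ["https://www.python.org/"],
    ))
```

Because the report is plain text on standard output, redirecting it to a file as shown earlier needs no extra options.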
A full write-up is available at http://com.puter.tips/2016/12/extract-all-internal-and-external-links.html
You may also find the companion project useful: https://github.com/com-puter-tips/SEO-Analysis
Citation
If you use this software, please cite it using the metadata in CITATION.cff.
License
Distributed under the GNU General Public License v3.0. See LICENSE.
File details
Details for the file links_extractor_cli-1.4.0.tar.gz.
File metadata
- Download URL: links_extractor_cli-1.4.0.tar.gz
- Upload date:
- Size: 15.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4dc28ea9112795245437dfc8fd5e538d393b5428eee0b194107ea3541891a1a2 |
| MD5 | aaf30b3105b482819fa09bd910a37000 |
| BLAKE2b-256 | 50090df1498d546e72e1a5ba3706fe3204312cbb58de7d0d92caa812d0bd3602 |
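A downloaded file can be checked against its published SHA256 digest with Python's standard `hashlib` module. The sketch below is one possible way to do that; the function name `sha256_matches` and the chunk size are assumptions chosen for illustration, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the file at `path` has the given SHA256 hex digest.

    Hypothetical helper, not part of links-extractor.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For the source distribution, the call would pass the path to the saved `links_extractor_cli-1.4.0.tar.gz` and the SHA256 value listed for that file.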
File details
Details for the file links_extractor_cli-1.4.0-py3-none-any.whl.
File metadata
- Download URL: links_extractor_cli-1.4.0-py3-none-any.whl
- Upload date:
- Size: 15.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25149e6d91c0ece2cc46aee16d2e4e5cf97262136c8bbe62037b17bd955b40b4 |
| MD5 | 66ddc6f537569c8ee76d6ff04bff33d9 |
| BLAKE2b-256 | c63120698d2c8c61c97db36a480e59490ebef58b0466405c8d58139e19460549 |