Quickly extract links from HTML
Fast Link Extractor
Project under active development
A Python 3.7+ package to extract links from a webpage. Asynchronous functions allow the code to run fast when extracting from many sub-directories.
A use case for this tool is to extract download links for use with wget or fsspec.
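To illustrate why asynchronous requests help when many sub-directories are involved, here is a minimal sketch using only the standard library. It is not the package's implementation: `fetch_listing` and `crawl` are hypothetical names, the URLs are placeholders, and the network request is simulated with `asyncio.sleep`.

```python
import asyncio
import time
from typing import List


async def fetch_listing(url: str, delay: float = 0.1) -> List[str]:
    """Hypothetical stand-in for one HTTP request; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return [f"{url}file_{i}.nc" for i in range(2)]


async def crawl(sub_dirs: List[str]) -> List[str]:
    # Start every request at once and wait for all of them,
    # so the waiting periods overlap instead of adding up.
    listings = await asyncio.gather(*(fetch_listing(u) for u in sub_dirs))
    # Flatten the per-directory results into one list of links.
    return [link for listing in listings for link in listing]


# Placeholder sub-directory URLs (assumed, for illustration only)
sub_dirs = [f"https://example.com/data/{year}/" for year in range(2000, 2010)]

start = time.perf_counter()
links = asyncio.run(crawl(sub_dirs))
elapsed = time.perf_counter() - start
print(f"{len(links)} links in {elapsed:.2f}s")
```

Fetched one after another, ten simulated requests of 0.1 s each would take about a second; run concurrently, the total stays close to the duration of a single request.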
Main base-level functions
.link_extractor(): extract links from a given URL
Installation
PyPi
pip install fast-link-extractor
Example
Simply import the package and call link_extractor(). This returns a list of extracted links.
import fast_link_extractor as fle
# url to extract links from
base_url = "https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/"
# extract all links from sub directories ending with .nc
# this may take ~10 seconds, there are a lot of sub-directories
links = fle.link_extractor(base_url,
                           search_subs=True,
                           regex='.nc$')
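A note on the `regex` argument: in a regular expression an unescaped `.` matches any character, so a pattern meant to select a file extension is usually written with an escaped dot. The sketch below shows the difference using the standard `re` module. It assumes the pattern is searched against each link, which is not confirmed here, and the URLs are made up for illustration.

```python
import re

# Placeholder links (assumed, not real output from link_extractor)
candidates = [
    "https://example.com/data/sst.day.mean.1981.nc",
    "https://example.com/data/readme.txt",
    "https://example.com/data/sync",  # ends in "nc" but has no .nc extension
]

loose = re.compile(".nc$")     # "." matches any single character
strict = re.compile(r"\.nc$")  # "\." matches a literal dot only

loose_hits = [url for url in candidates if loose.search(url)]
strict_hits = [url for url in candidates if strict.search(url)]

print(loose_hits)   # includes the ".../sync" link
print(strict_hits)  # only the link ending in ".nc"
```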
When running inside Jupyter or IPython, set ipython=True:
import fast_link_extractor as fle
# url to extract links from
base_url = "https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/"
# extract all links from sub directories ending with .nc
# this may take ~10 seconds, there are a lot of sub-directories
links = fle.link_extractor(base_url,
                           search_subs=True,
                           ipython=True,
                           regex='.nc$')
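For the wget use case, one option is to save the extracted links to a text file with one URL per line. The sketch below uses only the standard library; `write_url_list` is a hypothetical helper, not part of the package, and `example_links` stands in for the list returned by link_extractor().

```python
from pathlib import Path
from typing import Iterable


def write_url_list(links: Iterable[str], path: str) -> int:
    """Write one URL per line, dropping duplicates while keeping the original order."""
    unique = list(dict.fromkeys(links))
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


# Placeholder values standing in for the output of link_extractor()
example_links = [
    "https://example.com/data/a.nc",
    "https://example.com/data/b.nc",
    "https://example.com/data/a.nc",  # duplicate, written only once
]

count = write_url_list(example_links, "urls.txt")
print(f"wrote {count} URLs to urls.txt")
```

The resulting file can then be passed to wget with `wget -i urls.txt`.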
ToDo
- more tests: need more tests
- documentation: need to set up documentation
Source Distribution
File details
Details for the file fast_link_extractor-1.0.0.tar.gz.
File metadata
- Download URL: fast_link_extractor-1.0.0.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cccd5a55696b1b24f4afe50cc4c8df79529e509811a177028289edda2b0a0839 |
| MD5 | 3c1c79e0e64cf7744eb37ac869ea925d |
| BLAKE2b-256 | cdbbb8eb8a56b063979f9b9af097fbcd65c95bd2a0e26c42949e69629d3360c7 |