An Easy Scraper for HTML
easy-scraper-py
An easy scraping tool for HTML
Goal
Re-implement tanakh/easy-scraper in Python.
Install from PyPI
pip install easy-scraper-py
Usage Example
<!-- Target -->
<body>
<b>NotMe</b>
<a class=here>Here</a>
<a class=nothere>NotHere</a>
</body>
<!-- Pattern -->
<a class=here>{{ text }}</a>
import easy_scraper
target = r"""<body>
<b>NotMe</b>
<a class=here>Here</a>
<a class=nothere>NotHere</a>
</body>
""" # newlines and spaces are all ignored.
pattern = "<a class=here>{{ text }}</a>"
easy_scraper.match(target, pattern) # [{'text': 'Here'}]
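To make the placeholder idea concrete, here is a minimal standard-library sketch of how such a pattern could be matched. It is not the library's implementation: `simple_match` and `_Collector` are hypothetical names, the pattern is assumed to be a single element whose only child is text, attribute values are compared exactly, and the markup is assumed to be well nested with no void elements such as `<br>`.

```python
import re
from html.parser import HTMLParser

# A text node that consists solely of a "{{ name }}" placeholder.
PLACEHOLDER = re.compile(r"^\{\{\s*(\w+)\s*\}\}$")


class _Collector(HTMLParser):
    """Record (tag, attrs, own_text) for every properly closed element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements: (tag, attrs, text_parts)
        self.elements = []  # closed elements, in the order they were closed

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs), []))

    def handle_endtag(self, tag):
        # Only close the innermost open element; mismatched tags are ignored.
        if self.stack and self.stack[-1][0] == tag:
            name, attrs, parts = self.stack.pop()
            self.elements.append((name, attrs, "".join(parts).strip()))

    def handle_data(self, data):
        # Text belongs to the innermost open element only.
        if self.stack:
            self.stack[-1][2].append(data)


def collect(html):
    parser = _Collector()
    parser.feed(html)
    parser.close()
    return parser.elements


def simple_match(target, pattern):
    """Return one dict per target element that matches the one-element pattern."""
    pattern_elements = collect(pattern)
    if len(pattern_elements) != 1:
        raise ValueError("this sketch supports a single-element pattern only")
    p_tag, p_attrs, p_text = pattern_elements[0]
    placeholder = PLACEHOLDER.match(p_text)

    results = []
    for tag, attrs, text in collect(target):
        if tag != p_tag:
            continue
        # Every attribute named in the pattern must be present with the same value.
        if any(attrs.get(key) != value for key, value in p_attrs.items()):
            continue
        if placeholder:
            results.append({placeholder.group(1): text})
        elif text == p_text:
            results.append({})
    return results


target = """<body>
<b>NotMe</b>
<a class=here>Here</a>
<a class=nothere>NotHere</a>
</body>
"""
print(simple_match(target, "<a class=here>{{ text }}</a>"))  # [{'text': 'Here'}]
```

The sketch returns the same list-of-dicts shape as the example above; nested patterns and multiple placeholders, as in the RSS example, are out of its scope.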
# XML (RSS) scraping
import easy_scraper
import urllib.request
body = urllib.request.urlopen("https://kuragebunch.com/rss/series/10834108156628842505").read().decode()
res = easy_scraper.match(body, "<item><title>{{ title }}</title><link>{{ link }}</link></item>")
for item in res[:5]:
    print(item)
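As a cross-check of what the RSS pattern is meant to extract, the same title/link pairs can be pulled out with the standard library's `xml.etree.ElementTree`. This is a different technique from easy-scraper's pattern matching, shown only for comparison; `SAMPLE_RSS` and `items_from_rss` are hypothetical names, and the inline feed stands in for the downloaded body so the snippet runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline feed, standing in for the downloaded RSS body.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Episode 1</title><link>https://example.com/1</link></item>
    <item><title>Episode 2</title><link>https://example.com/2</link></item>
  </channel>
</rss>
"""


def items_from_rss(xml_text):
    """Return one {'title': ..., 'link': ...} dict per <item> element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


for item in items_from_rss(SAMPLE_RSS)[:5]:
    print(item)
```

Note that the channel-level `<title>` is not reported, because only `<item>` elements are visited; the pattern in the example above expresses the same restriction by wrapping the placeholders in `<item>`.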
Download files
Source Distribution
easy-scraper-py-0.1.5.tar.gz (4.1 kB)
Built Distribution
Hashes for easy_scraper_py-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 52442a0c9b4e0f0aa170953d9ae258f98525f6f128c91d99333cf352f9f5e8b5
MD5 | a0c79f100ce1474790a926518d60f705
BLAKE2b-256 | 63313cb38263d04c2e915930f782fa3fbcbd4ab7c408c94ab7dd06f343de1870