Easier wrangling of web documents
Soupy is a wrapper around BeautifulSoup that makes it easier to build complex queries when wrangling web data.
Here’s an example of a Soupy query.
    from soupy import Soupy, Q

    html = """
    <div id="main">
      <div>The web is messy</div>
      and full of traps
      <div>but Soupy loves you</div>
    </div>"""

    print(Soupy(html).find(id='main').children
          .each(Q.text.strip())  # extract text from each node, trim whitespace
          .filter(len)           # remove empty strings
          .val())                # dump out of Soupy
    # ['The web is messy', 'and full of traps', 'but Soupy loves you']
The same query using BeautifulSoup:
    from bs4 import BeautifulSoup, NavigableString

    html = """
    <div id="main">
      <div>The web is messy</div>
      and full of traps
      <div>but Soupy loves you</div>
    </div>"""

    result = []
    for node in BeautifulSoup(html).find(id='main').children:
        if isinstance(node, NavigableString):
            text = node.strip()
        else:
            text = node.text.strip()
        if len(text):
            result.append(text)

    print(result)
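To get a feel for why the chained style reads more compactly than the explicit loop, here is a toy sketch that uses only the standard library. It is not Soupy's implementation: the `Chain` class and the `raw_texts` sample data are hypothetical, made up for illustration, and only the method names `each`, `filter`, and `val` are borrowed from the query above.

```python
class Chain:
    """Toy fluent wrapper around a list (hypothetical, not Soupy's API)."""

    def __init__(self, items):
        self._items = list(items)

    def each(self, func):
        # Apply func to every item and wrap the results so calls keep chaining.
        return Chain(func(item) for item in self._items)

    def filter(self, predicate):
        # Keep only the items for which predicate returns a truthy value.
        return Chain(item for item in self._items if predicate(item))

    def val(self):
        # Leave the wrapper and return a plain list.
        return list(self._items)


# Sample strings standing in for the text of each child node (made up).
raw_texts = ["  The web is messy ", "\n and full of traps\n", "   ", "but Soupy loves you"]

cleaned = Chain(raw_texts).each(str.strip).filter(len).val()
print(cleaned)
# ['The web is messy', 'and full of traps', 'but Soupy loves you']
```

Each step returns a new wrapper, so the transform, filter, and unwrap stages sit on one expression instead of being spread across a loop with a result accumulator.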
For more information, see the Soupy documentation.
Installation
pip install soupy
Dependencies
six and BeautifulSoup4
Soupy is supported on Python 2.6+ and 3.3+
Download files
Filename, size | File type | Python version |
---|---|---|
soupy-0.3.tar.gz (10.1 kB) | Source | None |