Easier wrangling of web documents
Project description
Soupy is a wrapper around BeautifulSoup that makes it easier to build complex queries when wrangling web data.
Here’s an example of a Soupy query.
from soupy import Soupy, Q
html = """
<div id="main">
<div>The web is messy</div>
and full of traps
<div>but Soupy loves you</div>
</div>"""
print(Soupy(html).find(id='main').children
      .each(Q.text.strip())  # extract text from each node, trim whitespace
      .filter(len)           # remove empty strings
      .val())                # dump out of Soupy
# ['The web is messy', 'and full of traps', 'but Soupy loves you']
The same query using BeautifulSoup:
from bs4 import BeautifulSoup, NavigableString
html = """
<div id="main">
<div>The web is messy</div>
and full of traps
<div>but Soupy loves you</div>
</div>"""
result = []
# Name the parser explicitly so bs4 does not have to guess one
for node in BeautifulSoup(html, 'html.parser').find(id='main').children:
    if isinstance(node, NavigableString):
        text = node.strip()
    else:
        text = node.text.strip()
    if len(text):
        result.append(text)
print(result)
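The `Q.text.strip()` expression in the Soupy query is written before any node exists and is only evaluated later, once per node. The sketch below illustrates that general idea of a deferred expression using only the standard library. The `Deferred` class and its `apply` method are hypothetical names invented for this illustration; this is not Soupy's implementation, and Soupy's real `Q` object may work quite differently.

```python
class Deferred:
    """Hypothetical helper: records attribute lookups and calls, replays them later."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only reached for names not found normally; refuse dunder lookups
        # so that copying and introspection keep behaving as usual.
        if name.startswith("__"):
            raise AttributeError(name)
        # Record the attribute lookup; nothing is evaluated yet.
        return Deferred(self._steps + (("attr", name),))

    def __call__(self, *args, **kwargs):
        # Record a call on whatever the previous step will produce.
        return Deferred(self._steps + (("call", args, kwargs),))

    def apply(self, value):
        # Replay the recorded steps, in order, against a concrete value.
        for step in self._steps:
            if step[0] == "attr":
                value = getattr(value, step[1])
            else:
                value = value(*step[1], **step[2])
        return value


# Build the expression once, then apply it to each item.
strip = Deferred().strip()
texts = ["  The web is messy ", "\n", " and full of traps\n"]
cleaned = [t for t in (strip.apply(s) for s in texts) if len(t)]
print(cleaned)  # ['The web is messy', 'and full of traps']
```

Because the steps are stored as plain data, the same expression can be reused across many values, which is what makes chained calls such as `.each(...)` followed by `.filter(...)` read naturally.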
For more information, see the Soupy Documentation.
Installation
pip install soupy
Dependencies
six and BeautifulSoup4
Soupy is supported on Python 2.6+ and 3.3+.
Download files
Source Distribution
soupy-0.3.tar.gz (10.1 kB)
File details
Details for the file soupy-0.3.tar.gz.
File metadata
- Download URL: soupy-0.3.tar.gz
- Upload date:
- Size: 10.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | a65ab9a2f83827df6e0cb2890c94bac7c847082752ba73c094b11c3213a345eb
MD5 | 3826ff46df881f75ee823b161f504513
BLAKE2b-256 | 76b0badfa91b5789a8af211e32e9836498cdab749b9ec2dd5346b2349f049d06
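A downloaded archive can be checked against the SHA256 digest listed in the file hashes. The following is a minimal sketch using only the standard library's `hashlib`; the helper names `sha256_of` and `matches_sha256` are made up for this example, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case and padding."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("soupy-0.3.tar.gz", expected)` should return `True` when `expected` holds the SHA256 value from the table and the download is intact.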