BeautifulSoup interface for lxml
Project description
FastSoup
BeautifulSoup interface for lxml
Key features
FAST search in tree
FAST serialize to str
BeautifulSoup4 interface to interact with object:
Search: find, find_all, find_next, find_next_sibling
Text: get_text, string
Tag: name, get, clear, __getitem__, __str__
Install
pip install fast-soup==1.0.0
How to use
from fast_soup import FastSoup
content = ... # read some html content
soup = FastSoup(content)
# interact like BS4 object
result = soup.find('a', id='my_link')
# interact like lxml object
el = result.unwrap()
FAQ
Q: BS4 already implement lxml parser. Why i should use FastSoup?
A: Yes, BS4 implement parser, and it’s just building the tree. All next interactions proceed with “Python speed”: searching, serialization. FastSoup internally use lxml and guarantee “C speed”.
Q: How FastSoup speedup works?
A: FastSoup just build xpath and execute them. For prevent rebuilding LRU cache used.
Q: Why you don’t support whole interface? This will be soon?
A: I wrote functions which speed up parsing in my projects. Just create a issue or pull request and i think we find the solution ;)
Miscellaneous
You can got power of BeautifulSoup when wrap your lxml objects, e.g:
from fast_soup import Tag
content = ... # some bytes ready to parse
context = lxml.etree.iterparse(
io.BytesIO(content), ...
)
for event, elem in context:
tag = Tag(elem)
tag_text = tag.get_text()
tag_attr = tag['attribute']
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for fast_soup-1.0.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5f570dbb84b4d1167e91f412138e13b21562903e20f57e8048898c3d0a9e00a2 |
|
MD5 | 4a44970f15f7eb9db052618ee2de334c |
|
BLAKE2b-256 | 72f8873495bf70bec1eeda607c42be86b579b70c8c59ed964c92d07bb95d8d04 |