
BeautifulSoup interface for lxml


Key features

  • FAST search in the tree
  • FAST serialization to str
  • BeautifulSoup4 interface to interact with object:
    • Search: find, find_all, find_next, find_next_sibling
    • Text: get_text, string
    • Tag: name, get, clear, __getitem__, __str__


pip install fast-soup==1.0.0

How to use

from fast_soup import FastSoup

content = ...  # read some html content
soup = FastSoup(content)

# interact like BS4 object
result = soup.find('a', id='my_link')

# interact like lxml object
el = result.unwrap()


Q: BS4 already supports the lxml parser. Why should I use FastSoup?

A: Yes, BS4 can use the lxml parser, but only to build the tree. All subsequent interactions, such as searching and serialization, run at “Python speed”. FastSoup uses lxml internally for these operations as well, so they run at “C speed”.

Q: How does FastSoup achieve its speedup?

A: FastSoup simply builds XPath expressions and executes them. An LRU cache is used to avoid rebuilding the same expressions.
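
The idea of building an XPath once and caching it can be illustrated with a minimal sketch. This is not FastSoup's actual internals: the names build_xpath and find_all are hypothetical, and the standard library's xml.etree.ElementTree stands in for lxml so the example runs without extra dependencies (the path strings built here use only the basic XPath subset that ElementTree understands).

```python
from functools import lru_cache
import xml.etree.ElementTree as ET


@lru_cache(maxsize=128)
def build_xpath(name, attrs=()):
    """Build an XPath matching descendants called `name` with the given attributes.

    Hypothetical helper: `attrs` is a tuple of (key, value) pairs so that the
    arguments are hashable and can be used as an LRU cache key.
    No quote escaping is done, so values must not contain a single quote.
    """
    predicates = "".join(f"[@{key}='{value}']" for key, value in attrs)
    return f".//{name}{predicates}"


def find_all(root, name, **attrs):
    """Hypothetical BS4-style search: build (or reuse) the XPath, then execute it."""
    xpath = build_xpath(name, tuple(sorted(attrs.items())))
    return root.findall(xpath)


root = ET.fromstring("<div><a id='my_link'>first</a><a id='other'>second</a></div>")

links = find_all(root, "a", id="my_link")  # first call builds the XPath
find_all(root, "a", id="my_link")          # second call reuses the cached XPath
```

Sorting the keyword arguments before caching means that the same filters passed in a different order map to the same cache entry.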

Q: Why don’t you support the whole interface? Will this be added soon?

A: I wrote the functions that speed up parsing in my own projects. Just create an issue or pull request and I think we will find a solution ;)


You can get the power of BeautifulSoup by wrapping your lxml objects, e.g.:

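import io

import lxml.etree
from fast_soup import Tag

content = ...  # some bytes ready to parse
context = lxml.etree.iterparse(
    io.BytesIO(content),
    # ... further iterparse options are elided in the original
)
for event, elem in context:
    # wrap the raw lxml element to get the BS4-style interface
    tag = Tag(elem)

    tag_text = tag.get_text()
    tag_attr = tag['attribute']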


