
BeautifulSoup interface for lxml

Project description

FastSoup



Key features

  • Fast search in the tree

  • Fast serialization to str

  • BeautifulSoup4-style interface for interacting with objects:

    • Search: find, find_all, find_next, find_next_sibling

    • Text: get_text, string

    • Tag: name, get, clear, __getitem__, __str__, __repr__, append, new_tag, extract, replace_with

Install

pip install fast-soup==1.1.0

How to use

from fast_soup import FastSoup

content = ...  # read some html content
soup = FastSoup(content)

# interact like BS4 object
result = soup.find('a', id='my_link')

# interact like lxml object
el = result.unwrap()

FAQ

Q: BS4 already supports the lxml parser. Why should I use FastSoup?

A: Yes, BS4 can use lxml as its parser, but the parser only builds the tree. Everything after that, such as searching and serialization, runs at “Python speed”. FastSoup uses lxml internally for those operations as well, so they run at “C speed”.

Q: How does the FastSoup speedup work?

A: FastSoup just builds XPath expressions and executes them. An LRU cache is used to avoid rebuilding them.
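The idea of building an XPath expression once and reusing it from an LRU cache can be sketched roughly as follows. This is not FastSoup's actual code: `build_xpath` and `find` are hypothetical helpers, and the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) stands in for lxml so the example runs without extra dependencies.

```python
from functools import lru_cache
import xml.etree.ElementTree as ET


@lru_cache(maxsize=128)
def build_xpath(name, attrs):
    """Build an XPath string from a tag name and a tuple of (attribute, value) pairs.

    Hypothetical helper. The arguments must be hashable so lru_cache can key on them.
    Values containing a single quote are not handled in this sketch.
    """
    predicates = "".join(f"[@{key}='{value}']" for key, value in attrs)
    return f".//{name}{predicates}"


def find(root, name, **attrs):
    """Return the first descendant matching the tag name and attributes, or None."""
    xpath = build_xpath(name, tuple(sorted(attrs.items())))
    return root.find(xpath)


root = ET.fromstring(
    "<html><body><a id='other'>x</a><a id='my_link'>y</a></body></html>"
)
first = find(root, "a", id="my_link")
second = find(root, "a", id="my_link")  # same arguments, so the XPath string comes from the cache
```

Calling `build_xpath.cache_info()` afterwards shows whether repeated searches were served from the cache instead of rebuilding the expression.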

Q: Why don’t you support the whole interface? Will it be supported soon?

A: I wrote the functions that speed up parsing in my own projects. Just create an issue or pull request and I think we will find a solution ;)

Miscellaneous

You can get the power of BeautifulSoup by wrapping your lxml objects, e.g.:

import io

import lxml.etree
from fast_soup import Tag

content = ...  # some bytes ready to parse
context = lxml.etree.iterparse(
    io.BytesIO(content),  ...
)
for event, elem in context:
    tag = Tag(elem)

    tag_text = tag.get_text()
    tag_attr = tag['attribute']
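To illustrate what wrapping an element buys you, here is a minimal sketch of the pattern. It does not use FastSoup's `Tag`: `MiniTag` is a hypothetical stand-in written for this example, and the standard library's `xml.etree.ElementTree.iterparse` replaces `lxml.etree.iterparse` so the block runs without extra dependencies.

```python
import io
import xml.etree.ElementTree as ET


class MiniTag:
    """Hypothetical BS4-style wrapper around a parsed element (not FastSoup's Tag)."""

    def __init__(self, elem):
        self._elem = elem

    @property
    def name(self):
        return self._elem.tag

    def get_text(self):
        # Concatenate the text of the element and all of its descendants.
        return "".join(self._elem.itertext())

    def __getitem__(self, key):
        # Attribute access in the BS4 style: tag['kind'].
        return self._elem.attrib[key]

    def unwrap(self):
        # Hand back the underlying element for direct use.
        return self._elem


content = b"<root><item kind='a'>one <b>two</b></item><item kind='b'>three</item></root>"
seen = []
# "end" events fire once an element and all of its children have been parsed.
for event, elem in ET.iterparse(io.BytesIO(content), events=("end",)):
    if elem.tag == "item":
        tag = MiniTag(elem)
        seen.append((tag["kind"], tag.get_text()))
```

The wrapper keeps a reference to the original element, so the streaming parse stays unchanged and only the access style differs.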

