Skip to main content

A wrapper for BeautifulSoup4 that restores the ability to work with HTML fragments

Project description

This is a thin wrapper for BeautifulSoup4 that restores the ability to work with HTML fragments. For example:

from bs4 import BeautifulSoup
from fragmentsoup import FragmentSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', features='html5lib')
soup
# <html><head></head><body><b class="boldest">Extremely bold</b></body></html>
# Note that the fragment is wrapped to make it a valid html document

soup = FragmentSoup('<b class="boldest">Extremely bold</b>', features='html5lib')
soup
# <b class="boldest">Extremely bold</b>
# FragmentSoup keeps it as a fragment

In almost all cases, a FragmentSoup instance should work exactly the same as a BeautifulSoup instance. The one notable exception is that calling ‘wrap’ on a Fragment itself will wrap the entire Fragment and return itself:

from fragmentsoup import FragmentSoup
soup = FragmentSoup('<b class="boldest">Extremely bold</b>', features='html5lib')
soup
# <b class="boldest">Extremely bold</b>

soup.wrap(soup.new_tag('div')
# <div><b class="boldest">Extremely bold</b></div>

If you wrap a subelement, it returns a BeautifulSoup “Tag” instance. If you want to use the returned wrapped subelement as a Fragment, you will need to wrap the returned Tag instance to use it as a fragment:

from fragmentsoup import FragmentSoup
soup = FragmentSoup('<div><b class="boldest">Extremely bold</b></div>', features='html5lib')
subdocument = soup.b.wrap(soup.new_tag('h1'))
subdocument
# <h1><b class="boldest">Extremely bold</b></h1>
type(subdocument)
# <class 'bs4.element.Tag'>

subdocument = FragmentSoup(subdocument)
type(subdocument)
# <class 'fragmentsoup.FragmentSoup'>

This also applies to Tags returned as a result of unwrapping a part of the document.

What if I pass in a well-formed document?

If you pass in a full document (which is defined as starting with a <!DOCTYPE> or <html> tag), then FragmentSoup assumes that the resulting tree is well-formed and it acts exactly as if it were a regular BeautifulSoup instance. It will not allow you to wrap the well-formed document with a tag - it will raise a ValueError (just as regular BeautifulSoup does).

How does it work?

FragmentSoup wraps the incoming snippet in a dummy <fragmentsoup> tag that it removes (along with all context outside the <fragmentsoup> tag before rendering. Otherwise, it defers any attribute accesses to an internal BeautifulSoup instance.

Bugs

Aside from the differences noted above, any difference in behavior from regular BeautifulSoup4 is a bug. Reports and patches welcome.

Change Log

Version History

0.6.0
  • Initial release to Github and PyPI

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

fragmentsoup-0.6.1-py3-none-any.whl (3.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page