
A wrapper for BeautifulSoup4 that restores the ability to work with HTML fragments

Project description

This is a thin wrapper for BeautifulSoup4 that restores the ability to work with HTML fragments. For example:

from bs4 import BeautifulSoup
from fragmentsoup import FragmentSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', features='html5lib')
soup
# <html><head></head><body><b class="boldest">Extremely bold</b></body></html>
# Note that html5lib wraps the fragment to make it a valid HTML document

soup = FragmentSoup('<b class="boldest">Extremely bold</b>', features='html5lib')
soup
# <b class="boldest">Extremely bold</b>
# FragmentSoup keeps it as a fragment

In almost all cases, a FragmentSoup instance behaves exactly like a BeautifulSoup instance. The one notable exception is that calling ‘wrap’ on the FragmentSoup object itself wraps the entire Fragment and returns that same FragmentSoup object:

from fragmentsoup import FragmentSoup
soup = FragmentSoup('<b class="boldest">Extremely bold</b>', features='html5lib')
soup
# <b class="boldest">Extremely bold</b>

soup.wrap(soup.new_tag('div'))
# <div><b class="boldest">Extremely bold</b></div>

If you wrap a subelement instead, the call returns a regular BeautifulSoup “Tag” instance. To use that returned Tag as a Fragment, pass it to FragmentSoup:

from fragmentsoup import FragmentSoup
soup = FragmentSoup('<div><b class="boldest">Extremely bold</b></div>', features='html5lib')
subdocument = soup.b.wrap(soup.new_tag('h1'))
subdocument
# <h1><b class="boldest">Extremely bold</b></h1>
type(subdocument)
# <class 'bs4.element.Tag'>

subdocument = FragmentSoup(subdocument)
type(subdocument)
# <class 'fragmentsoup.FragmentSoup'>

This also applies to Tags returned as a result of unwrapping a part of the document.

What if I pass in a well-formed document?

If you pass in a full document (defined as markup starting with a <!DOCTYPE> or <html> tag), FragmentSoup assumes the resulting tree is well-formed and behaves exactly like a regular BeautifulSoup instance. In that case it will not let you wrap the document in a tag; it raises a ValueError, just as regular BeautifulSoup does.
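The detection rule above can be sketched as a small check. This is a minimal sketch, not fragmentsoup's actual code: the name `looks_like_full_document` is hypothetical, and ignoring leading whitespace and letter case is an assumption made here.

```python
import re

# Hypothetical helper mirroring the stated rule: markup is a "full document"
# if it starts with <!DOCTYPE ...> or <html ...>. Ignoring leading whitespace
# and letter case is an assumption, not documented fragmentsoup behavior.
_FULL_DOC = re.compile(r"\s*<(!doctype|html)[\s>]", re.IGNORECASE)


def looks_like_full_document(markup: str) -> bool:
    """Return True if the markup should be treated as a complete document."""
    return _FULL_DOC.match(markup) is not None


print(looks_like_full_document("<!DOCTYPE html><html></html>"))            # True
print(looks_like_full_document('<b class="boldest">Extremely bold</b>'))   # False
```

Requiring whitespace or `>` after the tag name keeps a made-up tag such as `<htmlfoo>` from being mistaken for `<html>`.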

How does it work?

FragmentSoup wraps the incoming snippet in a dummy <fragmentsoup> tag, which it removes before rendering, along with all context outside the <fragmentsoup> tag. For everything else, it delegates attribute access to an internal BeautifulSoup instance.
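The two ideas above (a dummy wrapper tag, plus delegation to an inner object) can be sketched with plain strings and a delegating class. This is a simplified, hypothetical illustration, not fragmentsoup's source: the names `wrap_in_dummy`, `strip_dummy`, and `Delegator` are invented here, the real library presumably works on the parse tree rather than on rendered text, and the parser output below is simulated rather than produced by html5lib.

```python
# Hypothetical sketch of the dummy-tag technique; not fragmentsoup's actual code.
DUMMY = "fragmentsoup"


def wrap_in_dummy(markup: str) -> str:
    """Surround a fragment with the dummy tag before it is parsed."""
    return f"<{DUMMY}>{markup}</{DUMMY}>"


def strip_dummy(rendered: str) -> str:
    """Keep only what sits inside the dummy tag in the rendered output.

    Anything the parser added outside it (<html>, <head>, <body>) is dropped.
    """
    start = rendered.index(f"<{DUMMY}>") + len(DUMMY) + 2  # skip "<fragmentsoup>"
    end = rendered.rindex(f"</{DUMMY}>")
    return rendered[start:end]


class Delegator:
    """Forward any attribute not defined on the wrapper to an inner object."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup on the wrapper fails,
        # so attributes defined on the wrapper itself take precedence.
        return getattr(self._inner, name)


wrapped = wrap_in_dummy('<b class="boldest">Extremely bold</b>')
# Simulated parser output for the wrapped markup (assumed, not produced here):
rendered = f"<html><head></head><body>{wrapped}</body></html>"
print(strip_dummy(rendered))   # <b class="boldest">Extremely bold</b>
print(Delegator("abc").upper())  # ABC
```

Using `__getattr__` rather than `__getattribute__` means only lookups the wrapper cannot satisfy are forwarded, which is what lets a wrapper override a few methods (such as `wrap`) while passing everything else through.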

Bugs

Aside from the differences noted above, any difference in behavior from regular BeautifulSoup4 is a bug. Reports and patches welcome.

Change Log

Version History

0.6.0
  • Initial release to GitHub and PyPI

