typed-soup

A type-safe wrapper around BeautifulSoup and utilities for parsing HTML. Extracted from Open-Gov Crawlers.

Motivation

This is an example from production code.

Before

Here are the first five errors reported by the type checker. There are 16 in total.

  error: Type of "rows" is partially unknown
    Type of "rows" is "list[PageElement | Tag | NavigableString] | Unknown" (reportUnknownVariableType)
  error: Type of "find_all" is partially unknown
    Type of "find_all" is "Unknown | ((name: str | bytes | Pattern[str] | bool | ((Tag) -> bool) | Iterable[str | bytes | Pattern[str] | bool | ((Tag) -> bool)] | ElementFilter | None = None, attrs: Dict[str, str | bytes | Pattern[str] | bool | ((str) -> bool) | Iterable[str | bytes | Pattern[str] | bool | ((str) -> bool)]] = {}, recursive: bool = True, string: str | bytes | Pattern[str] | bool | ((str) -> bool) | Iterable[str | bytes | Pattern[str] | bool | ((str) -> bool)] | None = None, limit: int | None = None, _stacklevel: int = 2, **kwargs: str | bytes | Pattern[str] | bool | ((str) -> bool) | Iterable[str | bytes | Pattern[str] | bool | ((str) -> bool)]) -> ResultSet[PageElement | Tag | NavigableString])" (reportUnknownMemberType)
  error: Cannot access attribute "find_all" for class "PageElement"
    Attribute "find_all" is unknown (reportAttributeAccessIssue)
  error: Cannot access attribute "find_all" for class "NavigableString"
    Attribute "find_all" is unknown (reportAttributeAccessIssue)
  error: Type of "row" is partially unknown
    Type of "row" is "PageElement | Tag | NavigableString | Unknown" (reportUnknownVariableType)
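The errors above stem from BeautifulSoup results being typed as a union of PageElement, Tag, and NavigableString. As a rough sketch of what plain BeautifulSoup code tends to need before a strict checker accepts attribute access, the snippet below narrows each result with isinstance. The HTML string and variable names are illustrative assumptions, not the production code referred to above, and this narrowing is not guaranteed to clear every reported error.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Illustrative input; any table markup would do.
html = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

cell_texts: list[list[str]] = []

table = soup.find("table")
# find() may return None or a non-Tag element, so narrow before use.
if isinstance(table, Tag):
    for row in table.find_all("tr"):
        # Each result is typed as a union, so narrow again.
        if not isinstance(row, Tag):
            continue
        cells = [c for c in row.find_all("td") if isinstance(c, Tag)]
        cell_texts.append([c.get_text() for c in cells])
```

Every level of nesting repeats the same check, which is the boilerplate a typed wrapper is meant to absorb.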

After

Switching out BeautifulSoup for TypedSoup provides type knowledge to the checker and IDE.

Installation

pip install typed-soup

Quick Start

from typed_soup import TypedSoup
from bs4 import BeautifulSoup

# Create a type-safe soup object
soup = TypedSoup(BeautifulSoup("<div>Hello <span>World</span></div>", "html.parser"))

# Find elements with type safety
element = soup.find("span")
if element:
    print(element.get_text())  # Type-safe: IDE knows this returns str

Usage

Wrap a BeautifulSoup object in TypedSoup to add type safety:

from typed_soup import TypedSoup
from bs4 import BeautifulSoup

soup = TypedSoup(BeautifulSoup(html_content, "html.parser"))

Supported Functions

I'm adding functions as I need them. If you have a request, please open an issue. These are the ones that I needed for a dozen spiders:

  • find
  • find_all
  • __call__ (implicit find_all, e.g. soup("p") - standard BeautifulSoup API)
  • get_text
  • children
  • tag_name
  • parent
  • next_sibling
  • get_content_after_element
  • string

And then these help create a TypedSoup object:

  • TypedSoup

Type Safety Benefits

  • All methods return properly typed results
  • No more None surprises: optional values are typed as optional and described in the function signatures
  • IDE autocomplete support for all methods
  • Static type checking support with mypy/pyright
  • Runtime type validation for BeautifulSoup results
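To make the idea of runtime validation behind typed results concrete, here is a minimal sketch of the general wrapper pattern. It is not typed-soup's actual implementation: the class name TagView and its three methods are hypothetical and exist only for illustration. Each method checks the BeautifulSoup result with isinstance and exposes a narrow return type.

```python
from __future__ import annotations

from dataclasses import dataclass

from bs4 import BeautifulSoup
from bs4.element import Tag


@dataclass(frozen=True)
class TagView:
    """Hypothetical typed view over a bs4 Tag (not typed-soup's real class)."""

    _tag: Tag

    def find(self, name: str) -> TagView | None:
        found = self._tag.find(name)
        # Runtime validation: wrap only genuine Tag results.
        return TagView(found) if isinstance(found, Tag) else None

    def find_all(self, name: str) -> list[TagView]:
        # Drop anything that is not a Tag so callers get a uniform list.
        return [TagView(t) for t in self._tag.find_all(name) if isinstance(t, Tag)]

    def get_text(self) -> str:
        return self._tag.get_text()


# BeautifulSoup is itself a Tag subclass, so it can be wrapped directly.
root = TagView(BeautifulSoup("<div>Hello <span>World</span></div>", "html.parser"))
span = root.find("span")
missing = root.find("table")
```

With this shape, a checker sees find as returning either a wrapper or None, so a single `if span:` guard is enough before calling get_text.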

License

This project is licensed under the MIT License.
