A type-safe wrapper around BeautifulSoup and related HTML parsing utilities

Project description

typed-soup

A type-safe wrapper around BeautifulSoup and utilities for parsing HTML/XML with robust return types and error handling. Extracted from Open-Gov Crawlers.

Motivation

Before

Type-checking code that uses BeautifulSoup directly with Pyright reports 16 errors in total. Here are the first five:

  error: Type of "rows" is partially unknown
    Type of "rows" is "list[PageElement | Tag | NavigableString] | Unknown" (reportUnknownVariableType)
  error: Type of "find_all" is partially unknown
    Type of "find_all" is "Unknown | ((name: str | bytes | Pattern[str] | bool | ((Tag) -> bool) | Iterable[str | bytes | Pattern[str] | bool | ((Tag) -> bool)] | ElementFilter | None = None, attrs: Dict[str, str | bytes | Pattern[str] | bool | ((str) -> bool) | Iterable[str | bytes | Pattern[str] | bool | ((str) -> bool)]] = {}, recursive: bool = True, string: str | bytes | Pattern[str] | bool | ((str) -> bool) | Iterable[str | bytes | Pattern[str] | bool | ((str) -> bool)] | None = None, limit: int | None = None, _stacklevel: int = 2, **kwargs: str | bytes | Pattern[str] | bool | ((str) -> bool) | Iterable[str | bytes | Pattern[str] | bool | ((str) -> bool)]) -> ResultSet[PageElement | Tag | NavigableString])" (reportUnknownMemberType)
  error: Cannot access attribute "find_all" for class "PageElement"
    Attribute "find_all" is unknown (reportAttributeAccessIssue)
  error: Cannot access attribute "find_all" for class "NavigableString"
    Attribute "find_all" is unknown (reportAttributeAccessIssue)
  error: Type of "row" is partially unknown
    Type of "row" is "PageElement | Tag | NavigableString | Unknown" (reportUnknownVariableType)

After

Changing one line of code to use TypedSoup instead of BeautifulSoup resolves all of the errors.
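The mechanism behind that one-line fix can be sketched with a toy, standard-library-only model (hypothetical names; not typed-soup's actual code). A BeautifulSoup-style find() returns a broad union, so every call site must narrow the type itself; the wrapper performs the narrowing once, internally, and exposes a precise return type:

```python
from __future__ import annotations

# Toy model of the problem and the fix -- NOT typed-soup's real implementation.

class Tag:
    def __init__(self, name: str, children: list[Tag | str] | None = None) -> None:
        self.name = name
        self.children: list[Tag | str] = children or []

    def find(self, name: str) -> Tag | str | None:
        # BeautifulSoup-style: the return type is a union of tag, text
        # node, and None, so chained calls fail strict type checking.
        for child in self.children:
            if isinstance(child, Tag) and child.name == name:
                return child
        return None

class TypedTag:
    """Wrapper whose find() returns only TypedTag | None, so callers
    need a single None check instead of manual isinstance narrowing."""

    def __init__(self, tag: Tag) -> None:
        self._tag = tag

    @property
    def name(self) -> str:
        return self._tag.name

    def find(self, name: str) -> TypedTag | None:
        found = self._tag.find(name)
        # Narrow once, here, instead of at every call site.
        return TypedTag(found) if isinstance(found, Tag) else None

doc = TypedTag(Tag("table", [Tag("tr", ["cell text"])]))
row = doc.find("tr")
if row:
    print(row.name)  # -> tr
```

The real library applies the same idea to BeautifulSoup's actual element classes.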

Installation

pip install typed-soup

Usage

If you're using Scrapy, you can use the from_response function to create a TypedSoup object from a Scrapy response:

from typed_soup import from_response
from scrapy.http.response.html import HtmlResponse

# Assume 'response' is an HtmlResponse object
soup = from_response(response)

# Find an element
element = soup.find("div", class_="example")
if element:
    print(element.get_text())

# Find all elements
elements = soup.find_all("p")
for elem in elements:
    print(elem.get_text())

Or, without Scrapy, you can explicitly wrap a BeautifulSoup object in TypedSoup:

from typed_soup import TypedSoup
from bs4 import BeautifulSoup

soup = TypedSoup(BeautifulSoup(html_content, "html.parser"))

Supported Functions

I'm adding functions as I need them. If you have a request, please open an issue. These are the ones that I needed for a dozen spiders:

  • find
  • find_all
  • get_text
  • children
  • tag_name
  • parent
  • next_sibling
  • get_content_after_element
  • string

And then these help create a TypedSoup object:

  • from_response
  • TypedSoup
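To illustrate the kind of narrowing these wrappers perform, here is a hypothetical, standard-library-only sketch of a find_all wrapper (toy names; not typed-soup's actual implementation):

```python
from __future__ import annotations

# Toy sketch of find_all narrowing -- NOT typed-soup's real code.

class Tag:
    def __init__(self, name: str, children: list[Tag | str] | None = None) -> None:
        self.name = name
        self.children: list[Tag | str] = children or []

    def find_all(self, name: str) -> list[Tag | str]:
        # BeautifulSoup-style signature: the annotation is a broad union,
        # so type checkers flag every use of the returned elements.
        return [c for c in self.children if isinstance(c, Tag) and c.name == name]

class TypedTag:
    """Narrowing wrapper: find_all returns list[TypedTag], nothing else."""

    def __init__(self, tag: Tag) -> None:
        self._tag = tag

    def find_all(self, name: str) -> list[TypedTag]:
        # The isinstance filter proves to the checker that only Tags remain.
        return [TypedTag(c) for c in self._tag.find_all(name) if isinstance(c, Tag)]

table = TypedTag(Tag("table", [Tag("tr"), "whitespace text", Tag("tr")]))
rows = table.find_all("tr")
print(len(rows))  # -> 2
```

With this shape, the `for row in rows:` loop from the motivation section type-checks cleanly, because every element is statically known to be a wrapped tag.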

License

This project is licensed under the MIT License.

Project details


Download files

Download the file for your platform.

Source Distribution

typed_soup-0.1.4.tar.gz (3.2 kB)

Uploaded Source

Built Distribution


typed_soup-0.1.4-py3-none-any.whl (3.8 kB)

Uploaded Python 3

File details

Details for the file typed_soup-0.1.4.tar.gz.

File metadata

  • Download URL: typed_soup-0.1.4.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.3 Darwin/24.4.0

File hashes

Hashes for typed_soup-0.1.4.tar.gz
  • SHA256: 34dda006d6ed45fe3aca6ba4ed9fa2fbdb549b7bf4f86d2fa7cd8cf3639c09ad
  • MD5: b78365afcaba12d8eb96040e3ceefa22
  • BLAKE2b-256: cd13477576e8f78bfc80abb3c680c42b1fe20933cdd921deb8d101b1370656ba


File details

Details for the file typed_soup-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: typed_soup-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 3.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.3 Darwin/24.4.0

File hashes

Hashes for typed_soup-0.1.4-py3-none-any.whl
  • SHA256: 11c26ef036cc2acf2550b3166845cbc8384e9ac6422431bb50f1b1637d705b84
  • MD5: 18518aa2d8c3c363fcd11bd2a5bf1b69
  • BLAKE2b-256: ef61a921b22dfa85e2364e53ad461c570e61b75a6b2e695d1028e37dc9bf528e

