html-shrinker

A small, configurable Python helper library that can significantly reduce LLM input tokens by stripping unnecessary markup from HTML pages.

AI-assisted scraping usually means sending the full page source to an LLM together with instructions and an output format. The information needed is almost always somewhere in the body tag, so the whole head tag, which holds mostly styles, scripts, and metadata, can usually be removed safely. This alone reduces token count and cost significantly. Further optimizations are possible, such as removing specific HTML tags, attributes, or even the inner text.
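To get a rough feel for how much the head alone contributes, the sketch below drops the head element with a regular expression and compares character counts. It uses only the standard library and is not how html-shrinker works internally: `drop_head` and the sample page are made up for illustration, and character count is only a crude proxy for tokens.

```python
import re

# Hypothetical sample page; real pages usually carry far larger <head> sections.
SAMPLE_PAGE = (
    "<html><head><title>Shop</title>"
    "<style>body{margin:0;font-family:sans-serif}</style>"
    "<script>window.analytics = [];</script></head>"
    "<body><h1>Price: 42 EUR</h1></body></html>"
)

# Non-greedy match of the whole <head>...</head> element, case-insensitive.
# The \b keeps the pattern from matching <header>.
HEAD_RE = re.compile(r"<head\b.*?</head\s*>", re.IGNORECASE | re.DOTALL)


def drop_head(html: str) -> str:
    """Return html without its <head> element (regex sketch, not a parser)."""
    return HEAD_RE.sub("", html, count=1)


body_only = drop_head(SAMPLE_PAGE)
saved = 1 - len(body_only) / len(SAMPLE_PAGE)
print(body_only)
print(f"characters saved: {saved:.0%}")
```

For real pages, a proper parser (or html-shrinker itself) is the safer choice; a regular expression is enough here only because the sample is small and well-formed.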

What it does

- Removes noisy tags/attributes or keeps only a whitelist
- Strips inner text
- Removes comments
- Flattens repeated single-child div > div wrappers, even if they are nested many levels deep
- Collapses whitespace between tags
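Two of the steps listed above, comment removal and whitespace collapsing, are simple enough to sketch with the standard library. The helper names below are hypothetical and the regular expressions are an assumption about one way to do it, not the library's actual implementation.

```python
import re

# <!-- ... --> comments, possibly spanning several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Whitespace that sits directly between a closing ">" and the next "<".
BETWEEN_TAGS_RE = re.compile(r">\s+<")


def strip_comments(html: str) -> str:
    """Remove HTML comments."""
    return COMMENT_RE.sub("", html)


def collapse_between_tags(html: str) -> str:
    """Turn '>   <' into '><' while leaving text inside elements untouched."""
    return BETWEEN_TAGS_RE.sub("><", html)


snippet = "<ul>\n  <!-- nav -->\n  <li>One</li>\n  <li>Two words</li>\n</ul>"
compact = collapse_between_tags(strip_comments(snippet))
print(compact)  # <ul><li>One</li><li>Two words</li></ul>
```

Note that only whitespace between tags is touched, so the space inside "Two words" survives.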

Quick install

pip install html-shrinker

Quick start

from html_shrinker import HTMLShrinker
from html_shrinker.defaults import tags

raw_html = """
<html>
  <head><script>ignore me</script></head>
  <body>
    <div><div><p>Hello world</p></div></div>
    <script>alert("x")</script>
  </body>
</html>
"""

# Pass the preset tag list from html_shrinker.defaults; tag_mode defaults to "remove".
shrinker = HTMLShrinker(
    tags=list(tags),
)
result = shrinker.shrink(raw_html)
print(result)

API

from html_shrinker import HTMLShrinker

shrinker = HTMLShrinker(
    tag_mode="remove",
    tags=["script", "style", "head"],
    attribute_mode="remove",
    attributes=["class", "id", "style"],
    strip_innertext=False,
    remove_comments=True,
    flatten_single_child_divs=True,
    collapse_between_tags=True,
)

output = shrinker.shrink("<html>...</html>")

Default presets are available from:

from html_shrinker.defaults import tags, arguments

Invalid HTML input, such as the bare fragment below, raises InvalidHTMLInputError:

from html_shrinker import HTMLShrinker, InvalidHTMLInputError

try:
    HTMLShrinker().shrink("<div>fragment</div>")
except InvalidHTMLInputError as exc:
    print(exc)

Configuration

HTMLShrinker(...) constructor parameters:

tag_mode: "remove" or "keep" (default: "remove")
tags: list[str]
- If tag_mode="remove": these tags are removed.
- If tag_mode="keep": only these tags are kept.
attribute_mode: "remove" or "keep" (default: "remove")
attributes: list[str]
- If attribute_mode="remove": these attributes are removed.
- If attribute_mode="keep": only these attributes are kept.
strip_innertext: bool (default: False)
- If True: removes text nodes.
- Example: <p>secret</p> becomes <p></p>.
remove_comments: bool (default: True)
- If True: removes HTML comments such as <!-- comment -->.
flatten_single_child_divs: bool (default: True)
- If True: flattens nested div > div wrappers when a div contains only one child div.
- This is applied repeatedly, so a deep chain like <div><div><div><p>...</p></div></div></div> becomes <div><p>...</p></div>.
- These long div chains appear very often when shrinking aggressively.
collapse_between_tags: bool (default: True)
- If True: removes whitespace between tags, so > < becomes ><.
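The repeated flattening described for flatten_single_child_divs can be illustrated with a small standard-library sketch. It assumes well-formed, XML-like markup because `xml.etree.ElementTree` is an XML parser, and it keeps the innermost div of each chain; which div's attributes the library itself keeps is not stated here, so treat this as an illustration rather than the library's algorithm.

```python
import xml.etree.ElementTree as ET


def flatten_single_child_divs(element: ET.Element) -> None:
    """Collapse div > div chains in place wherever a div's only content is one div."""
    for index, child in enumerate(list(element)):
        # Keep replacing a wrapper div with its only child div until the chain ends.
        while (
            child.tag == "div"
            and len(child) == 1
            and child[0].tag == "div"
            and not (child.text or "").strip()      # no text before the inner div
            and not (child[0].tail or "").strip()   # no text after the inner div
        ):
            inner = child[0]
            inner.tail = child.tail  # preserve whatever followed the wrapper
            element[index] = inner
            child = inner
        flatten_single_child_divs(child)


root = ET.fromstring("<body><div><div><div><p>Hello</p></div></div></div></body>")
flatten_single_child_divs(root)
flattened = ET.tostring(root, encoding="unicode")
print(flattened)  # <body><div><p>Hello</p></div></body>
```

A div with two or more children, or with text of its own, is left alone, which matches the "only one child div" condition above.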

Notes:

tags defaults to an empty list ([]), so no tags are removed by default.
attributes defaults to an empty list ([]), so no attributes are removed by default.
If tag_mode="keep", tags must be non-empty.
If attribute_mode="keep", attributes must be non-empty.
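The remove/keep semantics and the notes above can be summarized in a few lines of plain Python. `select_names` is a hypothetical helper, not part of html-shrinker, and raising ValueError for an empty keep list is an assumption; the library may signal that case differently.

```python
from typing import Iterable, List


def select_names(
    present: Iterable[str],
    mode: str = "remove",
    listed: Iterable[str] = (),
) -> List[str]:
    """Return the tag or attribute names that survive filtering."""
    listed_set = set(listed)
    if mode not in ("remove", "keep"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "keep" and not listed_set:
        # Assumption: an empty whitelist would drop everything, so reject it.
        raise ValueError("keep mode needs a non-empty list")
    if mode == "remove":
        return [name for name in present if name not in listed_set]
    return [name for name in present if name in listed_set]


attrs = ["class", "id", "href"]
print(select_names(attrs))                           # ['class', 'id', 'href']
print(select_names(attrs, "remove", ["class", "id"]))  # ['href']
print(select_names(attrs, "keep", ["href"]))           # ['href']
```

With the default empty list in remove mode nothing is filtered, which mirrors the first two notes.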

Usage patterns

1) Remove mode (default)

from html_shrinker import HTMLShrinker

shrinker = HTMLShrinker(
    tag_mode="remove",
    tags=["script", "style", "head"],
    attribute_mode="remove",
    attributes=["class", "id", "style"],
)
clean = shrinker.shrink(raw_html)

2) Keep mode

from html_shrinker import HTMLShrinker

shrinker = HTMLShrinker(
    tag_mode="keep",
    tags=["main", "article", "h1", "h2", "p", "ul", "ol", "li", "a"],
    attribute_mode="keep",
    attributes=["href"],
)
clean = shrinker.shrink(raw_html)

3) Strip inner text

from html_shrinker import HTMLShrinker

shrinker = HTMLShrinker(strip_innertext=True)
clean = shrinker.shrink("<html><body><p>secret text</p></body></html>")
# <html><body><p></p></body></html>
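For readers curious what stripping text nodes involves, here is a minimal sketch built on the standard library's `html.parser`. `TextStripper` and `strip_text` are hypothetical names, and this is only one possible approach, not necessarily how html-shrinker does it.

```python
from html.parser import HTMLParser
from typing import List


class TextStripper(HTMLParser):
    """Re-emit tags as they appear in the input and drop all text nodes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the raw text of the tag just parsed.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, unchanged.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    # handle_data is intentionally not overridden, so text is discarded.


def strip_text(html: str) -> str:
    parser = TextStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


stripped = strip_text("<html><body><p>secret text</p></body></html>")
print(stripped)  # <html><body><p></p></body></html>
```

Attributes are preserved in this sketch, so a link keeps its href even though its label disappears.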

License

MIT

Project details

This version

0.1.0

Mar 10, 2026

Download files

Source Distribution

html_shrinker-0.1.0.tar.gz (5.7 kB view details)

Uploaded Mar 10, 2026 Source

Built Distribution

html_shrinker-0.1.0-py3-none-any.whl (6.7 kB view details)

Uploaded Mar 10, 2026 Python 3

File details

Details for the file html_shrinker-0.1.0.tar.gz.

File metadata

Download URL: html_shrinker-0.1.0.tar.gz
Upload date: Mar 10, 2026
Size: 5.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for html_shrinker-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`98d3e96700e7c9c5f68082c5b0804524e7a560eb988423b8668dbfbc4b71e939`
MD5	`328176c7cb90cceeda5b8ff00be7e729`
BLAKE2b-256	`ca96a60b54edf40774d0ef7fdcbc0e0638b83fc0abc9154fee0df89f4ef23876`

File details

Details for the file html_shrinker-0.1.0-py3-none-any.whl.

File metadata

Download URL: html_shrinker-0.1.0-py3-none-any.whl
Upload date: Mar 10, 2026
Size: 6.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for html_shrinker-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`28c9b2e3a1dffc75feeefc4d9bd5027baab6e4a71f2a61030351784af753b7e8`
MD5	`a073aebc8dc12c3bbc598101eb80cce2`
BLAKE2b-256	`60ee7ac93e9ed1395795e7dd3cf38274ec42062a9e1bbd222921600968be03c5`
