Parse and query SOM (Semantic Object Model) - the structured web format for AI agents

som-parser

Parse and query SOM (Semantic Object Model) output in Python. SOM is a structured JSON format that represents a web page as semantic regions and elements; it is designed for AI agents, browser automation, and web scraping. This library provides Pydantic v2 models for type-safe parsing and validation, plus query utilities for pulling out links, forms, inputs, headings, and text.

Install

pip install som-parser

Quick Start

Parse Plasmate output

import subprocess
from som_parser import parse_som, from_plasmate

# Parse a SOM JSON string or dict
# (the "..." is a placeholder for the rest of the document, not valid JSON)
som = parse_som('{"som_version": "0.1", ...}')

# Or parse raw Plasmate CLI output directly
result = subprocess.run(["plasmate", "https://example.com"], capture_output=True, text=True)
som = from_plasmate(result.stdout)

print(som.title)       # "Example Domain"
print(som.url)         # "https://example.com/"
print(som.som_version) # "0.1"
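
The subprocess call above does not check whether the CLI actually succeeded. A minimal sketch of a more defensive wrapper follows, using only the standard library. run_json_cli is a hypothetical helper name, not part of som-parser, and the plasmate invocation in the usage note is copied from the example above rather than verified against the CLI's own documentation.

```python
import json
import subprocess


def run_json_cli(cmd: list[str], timeout: float = 60.0) -> dict:
    """Run a command and decode its stdout as a JSON object.

    Hypothetical helper for illustration; not part of som-parser.
    """
    # check=True raises CalledProcessError on a non-zero exit code;
    # timeout raises TimeoutExpired if the command hangs.
    result = subprocess.run(
        cmd, capture_output=True, text=True, check=True, timeout=timeout
    )
    # json.loads raises json.JSONDecodeError if stdout is not valid JSON.
    data = json.loads(result.stdout)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data
```

Since parse_som accepts a dict, the result could then be passed straight to it, for example parse_som(run_json_cli(["plasmate", "https://example.com"])).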

Find links

from som_parser import parse_som, get_links, find_by_role

som = parse_som(data)  # data: a SOM JSON string or dict

# Get all links as simple dicts
for link in get_links(som):
    print(f"{link['text']} -> {link['href']}")

# Or find by role for full SomElement objects
for el in find_by_role(som, "link"):
    print(el.id, el.text, el.attrs.href)
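
To make the shape of the data more concrete, here is a rough standard-library sketch of the same link extraction done on a raw dict. It is not the library's implementation. The JSON key names are assumed to mirror the model attributes used above (regions, elements, id, role, text, attrs.href), and the sample document, including the "main" region role and the element ids, is made up for illustration.

```python
import json

# Hypothetical SOM fragment: key names mirror the model attributes shown in
# the examples above; this is an assumption about the raw JSON layout, not a
# copy of real Plasmate output.
RAW = """
{
  "som_version": "0.1",
  "url": "https://example.com/",
  "title": "Example Domain",
  "regions": [
    {
      "role": "main",
      "elements": [
        {"id": "e1", "role": "heading", "text": "Example Domain"},
        {"id": "e2", "role": "link", "text": "More information",
         "attrs": {"href": "https://www.iana.org/domains/example"}}
      ]
    }
  ]
}
"""


def links_from_raw(doc: dict) -> list[dict]:
    """Collect {text, href, id} dicts for every element whose role is 'link'."""
    links = []
    for region in doc.get("regions", []):
        for el in region.get("elements", []):
            if el.get("role") == "link":
                attrs = el.get("attrs") or {}  # tolerate a missing or null attrs
                links.append(
                    {"text": el.get("text"), "href": attrs.get("href"), "id": el.get("id")}
                )
    return links


doc = json.loads(RAW)
for link in links_from_raw(doc):
    print(f"{link['text']} -> {link['href']}")
# prints: More information -> https://www.iana.org/domains/example
```

In practice get_links and find_by_role work on validated models instead, so typos in field names surface as validation errors rather than silently missing keys.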

Get interactive elements

from som_parser import parse_som, get_interactive_elements

som = parse_som(data)  # data: a SOM JSON string or dict
for el in get_interactive_elements(som):
    print(f"{el.id}: {el.role.value} - actions: {[a.value for a in el.actions]}")
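
The idea behind this query, flattening all regions and keeping the elements that carry actions, can be sketched on plain dicts as follows. This is an illustration under stated assumptions, not the library's code: iter_elements and filter_raw are hypothetical names, and the role and action values in the sample document ("text_input", "click", "type") are invented rather than taken from the SOM specification.

```python
from typing import Callable, Iterator

# Hypothetical document: key names, roles, and the action values "click" and
# "type" are assumptions for illustration only.
DOC = {
    "regions": [
        {
            "role": "main",
            "elements": [
                {"id": "e1", "role": "paragraph", "text": "Welcome"},
                {"id": "e2", "role": "link", "text": "Docs", "actions": ["click"]},
            ],
        },
        {
            "role": "form",
            "elements": [
                {"id": "e3", "role": "text_input", "actions": ["click", "type"]},
            ],
        },
    ]
}


def iter_elements(doc: dict) -> Iterator[dict]:
    """Yield every element from every region, in document order."""
    for region in doc.get("regions", []):
        yield from region.get("elements", [])


def filter_raw(doc: dict, predicate: Callable[[dict], bool]) -> list[dict]:
    """Return the elements for which predicate(element) is true."""
    return [el for el in iter_elements(doc) if predicate(el)]


# An element counts as interactive here when its actions list is non-empty.
interactive = filter_raw(DOC, lambda el: bool(el.get("actions")))
for el in interactive:
    print(f"{el['id']}: {el['role']} - actions: {el['actions']}")
```

The same flatten-then-filter pattern is what a generic predicate filter such as filter_elements gives you on typed SomElement objects.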

Convert to markdown

from som_parser import parse_som, to_markdown

som = parse_som(data)  # data: a SOM JSON string or dict
print(to_markdown(som))

Use Pydantic models directly

from som_parser import Som, SomElement, ElementRole

# Validate and construct from a dict
som = Som.model_validate(my_dict)

# Access typed fields
for region in som.regions:
    for element in region.elements:
        if element.role == ElementRole.LINK:
            print(element.attrs.href)

# Serialize back to JSON
print(som.model_dump_json(indent=2))

API Reference

Parser

Function                                Description
--------------------------------------  -------------------------------------------------------
parse_som(input: str | dict) -> Som     Parse a JSON string or dict into a validated Som object
is_valid_som(input) -> bool             Check whether input conforms to the SOM schema
from_plasmate(json_output: str) -> Som  Parse raw Plasmate CLI JSON output

Query Utilities

Function                                                  Description
--------------------------------------------------------  ----------------------------------------------
get_all_elements(som) -> list[SomElement]                 Flatten all elements from all regions
find_by_role(som, role) -> list[SomElement]               Find elements by role (enum or string)
find_by_id(som, id) -> SomElement | None                  Find a single element by its SOM id
find_by_text(som, text, exact=False) -> list[SomElement]  Search elements by text content
get_interactive_elements(som) -> list[SomElement]         Get elements that have actions
get_links(som) -> list[dict]                              Extract all links as {text, href, id} dicts
get_forms(som) -> list[SomRegion]                         Get all form regions
get_inputs(som) -> list[SomElement]                       Get all input elements
get_headings(som) -> list[dict]                           Extract heading hierarchy as {level, text, id}
get_text(som) -> str                                      Extract all visible text content
get_text_by_region(som) -> list[dict]                     Extract text grouped by region
get_compression_ratio(som) -> float                       Return html_bytes / som_bytes
to_markdown(som) -> str                                   Convert SOM to readable markdown
filter_elements(som, predicate) -> list[SomElement]       Generic filter with a callable
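
The table gives get_compression_ratio as html_bytes / som_bytes. A worked instance of that arithmetic is shown below with made-up byte counts. Where the library reads those counts from is not stated here, so this sketch takes them as plain arguments; compression_ratio is a hypothetical stand-in, not the library function.

```python
def compression_ratio(html_bytes: int, som_bytes: int) -> float:
    """Return how many times larger the original HTML is than its SOM form."""
    if som_bytes <= 0:
        raise ValueError("som_bytes must be positive")
    return html_bytes / som_bytes


# Made-up sizes: a 48,000-byte HTML page reduced to a 6,000-byte SOM document.
ratio = compression_ratio(html_bytes=48_000, som_bytes=6_000)
print(f"{ratio:.1f}x")  # 8.0x
```

A ratio above 1 means the SOM document is smaller than the HTML it was derived from.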

Types

All Pydantic v2 models are exported from the top level:

  • Som, SomRegion, SomElement, SomElementAttrs, SomMeta
  • StructuredData, LinkElement, SelectOption, ListItem
  • RegionRole, ElementRole, ElementAction, SemanticHint (enums)

License

Apache-2.0
