SiFR Benchmark

How well do AI agents understand web UI?

Benchmark comparing SiFR vs HTML vs AXTree vs Screenshots across 10 complex websites.

Results

Tested on 10 high-complexity sites: Amazon, YouTube, Reddit, eBay, Walmart, Airbnb, Yelp, IMDB, ESPN, GitHub.

Format       Accuracy   Tokens (avg)   Latency (avg)
SiFR         64.6%      25,512         7.5 s
Screenshot   21.5%      37,765         8.0 s
Raw HTML      4.7%      32,879         8.3 s
AXTree        3.0%       5,289         1.9 s

SiFR is 3x more accurate than screenshots and 14x more accurate than raw HTML.

Per-Site Breakdown

Site      SiFR       Screenshot   HTML    AXTree
GitHub    🏆 100%    0%           0%      0%
YouTube   🏆 100%    53.3%        0%      0%
Walmart   🏆 85.7%   30%          11.4%   0%
Reddit    🏆 83.3%   0%           0%      0%
eBay      🏆 71.4%   13.3%        0%      14.3%
Amazon    🏆 66.7%   25.7%        0%      0%
Airbnb    🏆 57.1%   0%           34.3%   0%
Yelp      🤝 50%     50%          0%      12.5%
ESPN      🏆 42.9%   0%           0%      0%
IMDB      0%         🏆 45%       0%      0%

SiFR wins outright on 8 of 10 sites, ties with screenshots on Yelp, and loses only on IMDB.

What is SiFR?

Structured Interface Format for Representation: a compact format optimized for LLM understanding of web UI.

a015:
  tag: a
  text: "Add to Cart"
  box: [500, 300, 120, 40]
  attrs: {href: "/cart/add", class: "btn-primary"}
  salience: high

Key advantages:

  • Compact: 10-20x smaller than raw HTML
  • Actionable IDs: Every element has a unique ID (a015, btn003)
  • Salience scoring: High/medium/low importance ranking
  • LLM-native: Structured for AI comprehension
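
The record above is YAML-like, so a tiny stdlib-only parser is enough for experiments. A minimal sketch (it handles only the scalar fields shown in the example; `attrs` and the full SiFR grammar are omitted):

```python
import re

def parse_sifr(text):
    """Parse a SiFR capture (YAML-like, two-space indent) into a dict
    keyed by element ID. Sketch only: covers the scalar fields from the
    example above; `box` is converted to a list of ints."""
    elements, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):           # top level: element ID like a015
            current = line.strip().rstrip(":")
            elements[current] = {}
        else:                                   # indented: field of current element
            key, _, value = line.strip().partition(":")
            value = value.strip().strip('"')
            if key == "box":
                value = [int(n) for n in re.findall(r"-?\d+", value)]
            elements[current][key] = value
    return elements

sample = """a015:
  tag: a
  text: "Add to Cart"
  box: [500, 300, 120, 40]
  salience: high"""

parsed = parse_sifr(sample)
print(parsed["a015"]["text"], parsed["a015"]["box"])
```

The element IDs double as action targets, so a parsed capture maps directly onto the benchmark's click/input/locate tasks.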

Installation

pip install sifr-benchmark

Prerequisites

  1. Element-to-LLM Chrome Extension - captures pages in SiFR format

  2. API Keys

    export OPENAI_API_KEY=sk-...
    export ANTHROPIC_API_KEY=sk-ant-...  # optional
    
  3. Playwright (for automated capture)

    playwright install chromium
    

Quick Start

Full Benchmark (Recommended)

Capture → Generate Ground Truth → Test, all in one command:

sifr-bench full-benchmark-e2llm https://www.amazon.com https://www.youtube.com \
  -e /path/to/element-to-llm-extension \
  -s 400

Options:

  • -e, --extension - Path to E2LLM extension (required)
  • -s, --target-size - SiFR budget in KB (default: 100, max: 380)
  • -m, --models - Models to test (default: gpt-4o-mini)
  • -v, --verbose - Show detailed output

Other Commands

# List all benchmark runs
sifr-bench list-runs

# Compare multiple runs
sifr-bench compare benchmark_runs/run_1 benchmark_runs/run_2

# Validate SiFR files
sifr-bench validate examples/

# Show help
sifr-bench info

How It Works

1. Capture (E2LLM Extension)

The extension captures 4 formats simultaneously:

  • SiFR - Structured format with salience scoring
  • HTML - Raw rendered DOM (outerHTML)
  • AXTree - Playwright accessibility tree
  • Screenshot - Full-page PNG
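
For comparison, the AXTree capture is Playwright's accessibility snapshot, a nested role/name tree. An illustrative fragment (the page and node names are invented for this example):

```json
{
  "role": "WebArea",
  "name": "Example Shop",
  "children": [
    {"role": "link", "name": "Add to Cart"},
    {"role": "textbox", "name": "Search"}
  ]
}
```

The snapshot carries roles and accessible names but no stable element IDs, which is why matching it against ID-based ground truth is hard.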

2. Ground Truth Generation

GPT-4o Vision analyzes the screenshot + SiFR to generate tasks:

  • Click tasks: "Click the Sign In button" → a003
  • Input tasks: "Enter search query" → input001
  • Locate tasks: "Find the main heading" → h1001
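
The generated tasks are stored under ground-truth/*.json. The exact schema is internal to the benchmark, so the record shape below is an illustrative assumption:

```json
[
  {"type": "click",  "question": "Click the Sign In button", "expected": "a003"},
  {"type": "input",  "question": "Enter search query",       "expected": "input001"},
  {"type": "locate", "question": "Find the main heading",    "expected": "h1001"}
]
```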

3. Benchmark

Each format is tested against the same ground truth:

Question: "Click on the shopping cart icon"
Expected: a015
SiFR response: a015 ✓
HTML response: none ✗
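
Scoring then reduces to exact-match accuracy over element IDs. A minimal sketch (the function and data shapes are assumptions for illustration, not the benchmark's actual API):

```python
def accuracy(tasks, answers):
    """Fraction of tasks where the model's answer exactly matches the
    expected element ID. `tasks` is a list of {"expected": id} records;
    `answers` maps task index to the ID string the model returned."""
    if not tasks:
        return 0.0
    correct = sum(
        1 for i, task in enumerate(tasks)
        if answers.get(i, "").strip().lower() == task["expected"].lower()
    )
    return correct / len(tasks)

tasks = [{"expected": "a015"}, {"expected": "input001"}, {"expected": "h1001"}]
answers = {0: "a015", 1: "none", 2: "H1001"}   # one wrong answer
print(f"{accuracy(tasks, answers):.1%}")       # 66.7%
```

Case-insensitive matching is a judgment call here; strict matching would only lower every format's score uniformly.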

Output Format

        Benchmark Results: Combined (10 sites)
┏━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ Format     ┃ Accuracy ┃ Tokens ┃ Latency ┃ Status ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ sifr       │    64.6% │ 25,512 │ 7,511ms │   ✅   │
│ screenshot │    21.5% │ 37,765 │ 8,039ms │   ⚠️   │
│ html_raw   │     4.7% │ 32,879 │ 8,332ms │   ⚠️   │
│ axtree     │     3.0% │  5,289 │ 1,876ms │   ⚠️   │
└────────────┴──────────┴────────┴─────────┴────────┘

Status icons:

  • ✅ Success (accuracy ≥ 50%)
  • ⚠️ Warning (accuracy < 50%)
  • ❌ Failed (accuracy = 0%)

Run Directory Structure

Each benchmark creates an isolated run:

benchmark_runs/run_20251206_182941/
├── captures/
│   ├── sifr/*.sifr
│   ├── html/*.html
│   ├── axtree/*.json
│   └── screenshots/*.png
├── ground-truth/*.json
├── results/
│   ├── raw_results.json
│   └── summary.json
└── run_meta.json
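
Given that layout, per-run results can be read straight from results/summary.json. A sketch using a synthetic run directory (the summary's keys here are an assumption, not the benchmark's documented schema):

```python
import json
import tempfile
from pathlib import Path

def load_summary(run_dir):
    """Read results/summary.json from a benchmark run directory."""
    return json.loads((Path(run_dir) / "results" / "summary.json").read_text())

# Build a synthetic run directory so the example is self-contained.
run = Path(tempfile.mkdtemp()) / "run_demo"
(run / "results").mkdir(parents=True)
(run / "results" / "summary.json").write_text(
    json.dumps({"sifr": {"accuracy": 0.646, "tokens": 25512}})
)

summary = load_summary(run)
print(summary["sifr"]["accuracy"])  # 0.646
```

Because each run is isolated, `sifr-bench compare` can diff any two such directories without shared state.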

Key Findings

  1. SiFR dominates complex sites - 100% on GitHub and YouTube, over 80% on Walmart and Reddit
  2. Screenshots struggle with dense UI - vision models cannot reliably identify specific elements
  3. Raw HTML is nearly unusable - too large and too noisy, with no semantic structure for LLMs
  4. AXTree IDs don't match - its own ID scheme is incompatible with the ground-truth element IDs

Why Did IMDB Fail?

IMDB has the largest DOM of the ten sites (706 KB SiFR, 2,171 KB HTML). Truncating the capture to the 97 KB budget removes critical elements, which highlights the need for smarter budgeting in the E2LLM extension.

Tested Models

  • GPT-4o-mini (default)
  • GPT-4o
  • Claude 3.5 Sonnet
  • Claude 3 Haiku

Contributing

  • Add test sites: run the benchmark on more URLs
  • Improve ground truth: manually verify the generated tasks
  • New models: add support in models.py

Citation

@misc{sifr2025,
  title={SiFR: Structured Interface Format for AI Web Agents},
  author={SiFR Contributors},
  year={2025},
  url={https://github.com/Alechko375/sifr-benchmark}
}

License

MIT
