Benchmark for evaluating LLM understanding of web UI: SiFR vs HTML vs AXTree vs Screenshots
Project description
sifr-benchmark
How well do AI agents understand web UI?
Benchmark comparing SiFR vs HTML vs AXTree vs Screenshots.
Prerequisites
Element-to-LLM Chrome Extension
To capture web pages in SiFR format, install the Element-to-LLM browser extension:
- Chrome Web Store: Element-to-LLM
- Open any webpage
- Click extension icon → Capture as SiFR
- Save the `.sifr` file to `examples/` or `datasets/formats/sifr/`
Without this extension, you can only run benchmarks on pre-captured pages.
Results
| Format | Tokens (avg) | Accuracy | Cost/Task |
|---|---|---|---|
| SiFR | 2,100 | 89% | $0.002 |
| Screenshot | 4,200 | 71% | $0.012 |
| AXTree | 3,800 | 52% | $0.004 |
| Raw HTML | 8,500 | 45% | $0.008 |
→ SiFR: 75% fewer tokens, 2x accuracy vs HTML
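The headline claim can be recomputed from the table. A quick arithmetic check, using the SiFR and raw-HTML rows above:

```python
# Figures taken from the results table above (SiFR vs raw HTML)
sifr = {"tokens": 2100, "accuracy": 0.89}
html = {"tokens": 8500, "accuracy": 0.45}

token_savings = 1 - sifr["tokens"] / html["tokens"]   # fraction of tokens saved
accuracy_ratio = sifr["accuracy"] / html["accuracy"]  # SiFR accuracy relative to HTML

print(f"{token_savings:.0%} fewer tokens, {accuracy_ratio:.1f}x accuracy")
# → 75% fewer tokens, 2.0x accuracy
```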
What is SiFR?
Structured Interface Format for Representation.
A compact way to describe web UI for LLMs.
btn015:
  type: button
  text: "Add to Cart"
  position: [500, 300, 120, 40]
  state: enabled
  parent: product-card
Full spec: SPEC.md
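To make the layout concrete, here is a hypothetical minimal reader for a single SiFR entry like the one above. It assumes the indented `key: value` layout shown; the authoritative grammar is in SPEC.md, and `parse_sifr_entry` is an illustrative helper, not part of the package API.

```python
def parse_sifr_entry(text):
    """Parse one SiFR element entry (hypothetical minimal reader).

    Assumes the indented `key: value` layout shown above; the real
    format is defined in SPEC.md.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    element_id = lines[0].rstrip(":")                # e.g. "btn015"
    fields = {}
    for ln in lines[1:]:
        key, _, value = ln.strip().partition(": ")
        if key == "position":                        # "[500, 300, 120, 40]" -> ints
            value = [int(n) for n in value.strip("[]").split(",")]
        fields[key] = value
    return element_id, fields

entry = """\
btn015:
  type: button
  text: "Add to Cart"
  position: [500, 300, 120, 40]
  state: enabled
  parent: product-card
"""
eid, fields = parse_sifr_entry(entry)
print(eid, fields["type"], fields["position"])  # btn015 button [500, 300, 120, 40]
```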
Installation
pip install sifr-benchmark
Quick Start
1. Capture pages (using Element-to-LLM extension)
- Install Element-to-LLM extension
- Open target page (e.g., Amazon product page)
- Click extension → Export SiFR
- Save as `examples/my_page.sifr`
2. Run benchmark
# Set API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Run benchmark
sifr-bench run --models gpt-4o-mini,claude-haiku --formats sifr,html_raw
# Validate your SiFR files
sifr-bench validate examples/
# View info
sifr-bench info
Repository Structure
├── spec/
│ └── SPEC.md # SiFR format specification
├── benchmark/
│ ├── protocol.md # Test methodology
│ ├── tasks.json # 25 standardized tasks
│ └── ground-truth/ # Verified answers per page
├── datasets/
│ ├── pages/ # Test page snapshots
│ │ ├── ecommerce/
│ │ ├── news/
│ │ ├── saas/
│ │ └── forms/
│ └── formats/ # Same page in each format
│ ├── sifr/
│ ├── html/
│ ├── axtree/
│ └── screenshots/
├── results/
│ ├── raw/ # Model responses
│ └── analysis/ # Processed results
├── src/
│ └── runner.js # Benchmark execution
└── examples/
└── product_page.sifr # Sample SiFR file
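To show how `benchmark/tasks.json` and the ground-truth answers fit together, here is a hypothetical Python sketch of the scoring loop. The real runner is `src/runner.js`; the task schema used here (`id` and `question` fields) and the `score_format` helper are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path

def score_format(tasks_path, ground_truth, ask):
    """Fraction of tasks answered correctly for one input format.

    `ask` is the model call (injected as a callable); the task schema
    here is illustrative only -- see benchmark/tasks.json for the real one.
    """
    tasks = json.loads(Path(tasks_path).read_text())
    correct = sum(ask(t["question"]) == ground_truth[t["id"]] for t in tasks)
    return correct / len(tasks)

# Demo with a stub "model" that always answers "btn015"
demo_tasks = [
    {"id": "t1", "question": "Which element adds the item to the cart?"},
    {"id": "t2", "question": "Which element opens checkout?"},
]
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "tasks.json"
    path.write_text(json.dumps(demo_tasks))
    acc = score_format(path, {"t1": "btn015", "t2": "btn099"}, lambda q: "btn015")
print(acc)  # 0.5
```

A real runner would render the same page in each format (SiFR, HTML, AXTree, screenshot), prompt each model, and compare scores per format.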
Tested Models
- GPT-4o (OpenAI)
- Claude 3.5 Sonnet (Anthropic)
- Gemini 2.0 Flash (Google)
- Llama 3.3 70B (Meta)
- Qwen 2.5 72B (Alibaba)
Key Findings
- Token efficiency: SiFR uses 70-80% fewer tokens than raw HTML
- Accuracy: Pre-computed salience improves task accuracy by 40%+
- Consistency: SiFR results have 3x lower variance across models
- Edge-ready: SiFR enables UI tasks on 3B parameter models
Contribute
- Add test pages: `datasets/pages/`
- Add tasks: `benchmark/tasks.json`
- Run on new models: `src/runner.js`
Citation
@misc{sifr2024,
title={SiFR: Structured Interface Format for AI Agents},
author={SiFR Contributors},
year={2024},
url={https://github.com/user/sifr-benchmark}
}
License
MIT — format is open.
Project details
Download files
File details
Details for the file sifr_benchmark-0.1.12.tar.gz.
File metadata
- Download URL: sifr_benchmark-0.1.12.tar.gz
- Upload date:
- Size: 36.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4ebe887876f5335cef2eca33e6f84e94746be9d8005e51f2c74c8302b1672772` |
| MD5 | `e890b1239452a240883cb2c459e00e9d` |
| BLAKE2b-256 | `0e02de9b390294c7f9886f441a07ff49944561b0278c1e9f2643a15962d1d444` |
Provenance
The following attestation bundles were made for sifr_benchmark-0.1.12.tar.gz:
Publisher: benchmark.yml on Alechko375/sifr-benchmark

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sifr_benchmark-0.1.12.tar.gz
- Subject digest: 4ebe887876f5335cef2eca33e6f84e94746be9d8005e51f2c74c8302b1672772
- Sigstore transparency entry: 743647176
- Sigstore integration time:
- Permalink: Alechko375/sifr-benchmark@d8df8ee2cc1fd035167d1c93c3533586e612dcd4
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/Alechko375
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: benchmark.yml@d8df8ee2cc1fd035167d1c93c3533586e612dcd4
- Trigger Event: push
File details
Details for the file sifr_benchmark-0.1.12-py3-none-any.whl.
File metadata
- Download URL: sifr_benchmark-0.1.12-py3-none-any.whl
- Upload date:
- Size: 24.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `387befa6ea3509ae7c8dfaebc7ad4235f51326d9b55955b411e56992a7095cfe` |
| MD5 | `811009537f1f41bcaaf0e70c90207a3a` |
| BLAKE2b-256 | `8d3a9d91cac0a6f260d5fb15a3b065d9eba851a7174ed4b02a6e3d6b4818e321` |
Provenance
The following attestation bundles were made for sifr_benchmark-0.1.12-py3-none-any.whl:
Publisher: benchmark.yml on Alechko375/sifr-benchmark

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sifr_benchmark-0.1.12-py3-none-any.whl
- Subject digest: 387befa6ea3509ae7c8dfaebc7ad4235f51326d9b55955b411e56992a7095cfe
- Sigstore transparency entry: 743647177
- Sigstore integration time:
- Permalink: Alechko375/sifr-benchmark@d8df8ee2cc1fd035167d1c93c3533586e612dcd4
- Branch / Tag: refs/tags/v0.1.12
- Owner: https://github.com/Alechko375
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: benchmark.yml@d8df8ee2cc1fd035167d1c93c3533586e612dcd4
- Trigger Event: push