Benchmark for evaluating LLM understanding of web UI: SiFR vs HTML vs AXTree vs Screenshots

sifr-benchmark

How well do AI agents understand web UI?
This benchmark compares four page representations: SiFR, raw HTML, the accessibility tree (AXTree), and screenshots.

Prerequisites

Element-to-LLM Chrome Extension

To capture web pages in SiFR format, use the Element-to-LLM browser extension:

1. Install Element-to-LLM from the Chrome Web Store.
2. Open any webpage.
3. Click the extension icon → Capture as SiFR.
4. Save the .sifr file to examples/ or datasets/formats/sifr/.

Without this extension, you can only run benchmarks on pre-captured pages.

Results

Format	Tokens (avg)	Accuracy	Cost/Task
SiFR	2,100	89%	$0.002
Screenshot	4,200	71%	$0.012
AXTree	3,800	52%	$0.004
Raw HTML	8,500	45%	$0.008

→ Compared with raw HTML, SiFR uses about 75% fewer tokens (2,100 vs 8,500) and roughly doubles accuracy (89% vs 45%).
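The headline ratios follow directly from the table. A quick Python check, using only the averages reported above (no new measurements):

```python
# Averages copied from the Results table above.
results = {
    "SiFR": {"tokens": 2100, "accuracy": 0.89},
    "Screenshot": {"tokens": 4200, "accuracy": 0.71},
    "AXTree": {"tokens": 3800, "accuracy": 0.52},
    "Raw HTML": {"tokens": 8500, "accuracy": 0.45},
}


def token_reduction(fmt: str, baseline: str = "Raw HTML") -> float:
    """Fraction of the baseline's tokens saved by `fmt` (0.75 means 75% fewer)."""
    return 1 - results[fmt]["tokens"] / results[baseline]["tokens"]


def accuracy_ratio(fmt: str, baseline: str = "Raw HTML") -> float:
    """How many times the baseline's accuracy `fmt` reaches."""
    return results[fmt]["accuracy"] / results[baseline]["accuracy"]


print(f"SiFR token reduction vs HTML: {token_reduction('SiFR'):.0%}")  # → 75%
print(f"SiFR accuracy ratio vs HTML: {accuracy_ratio('SiFR'):.2f}x")   # → 1.98x
```

So "2x accuracy" is a rounding of about 1.98x, and the token saving sits inside the 70-80% range given under Key Findings.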

What is SiFR?

SiFR (Structured Interface Format for Representation) is a compact way to describe web UI for LLMs.

btn015:
  type: button
  text: "Add to Cart"
  position: [500, 300, 120, 40]
  state: enabled
  parent: product-card

Full spec: spec/SPEC.md
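To make the btn015 example concrete, here is a minimal Python sketch that reads that exact layout (an unindented element id, then indented key: value lines) into a dictionary. It is an assumption-based reader for the snippet's shape only, not a parser for the full spec, and `parse_sifr_subset` is a hypothetical name, not part of sifr-benchmark. The meaning of the four `position` numbers is defined in the spec and is not interpreted here.

```python
import json

SNIPPET = """\
btn015:
  type: button
  text: "Add to Cart"
  position: [500, 300, 120, 40]
  state: enabled
  parent: product-card
"""


def parse_sifr_subset(text: str) -> dict:
    """Read the two-level 'id:' / '  key: value' layout shown in the example.

    Hypothetical helper: handles only the snippet's shape, not the full SiFR spec.
    """
    elements: dict = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw.startswith((" ", "\t")):
            # An unindented line ending in ':' starts a new element block.
            current = elements.setdefault(raw.strip().rstrip(":"), {})
            continue
        if current is None:
            raise ValueError(f"property line before any element id: {raw!r}")
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        try:
            # JSON covers the quoted text and the position list in the example.
            current[key.strip()] = json.loads(value)
        except json.JSONDecodeError:
            # Bare words such as button, enabled, product-card stay strings.
            current[key.strip()] = value
    return elements


element = parse_sifr_subset(SNIPPET)["btn015"]
print(element["text"], element["position"])  # → Add to Cart [500, 300, 120, 40]
```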

Installation

pip install sifr-benchmark

Quick Start

1. Capture pages (using Element-to-LLM extension)

Install the Element-to-LLM extension.
Open the target page (e.g., an Amazon product page).
Click the extension icon → Export SiFR.
Save the file as examples/my_page.sifr.

2. Run benchmark

# Set API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Run benchmark
sifr-bench run --models gpt-4o-mini,claude-haiku --formats sifr,html_raw

# Validate your SiFR files
sifr-bench validate examples/

# View info
sifr-bench info
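If you drive the benchmark from a Python script, a small pre-flight check avoids starting a run with a missing key. This sketch only assumes what the Quick Start shows: the two environment variable names and the `sifr-bench run --models ... --formats ...` command line. It treats both keys as required; trim `REQUIRED_KEYS` if you only run one provider's models. The helper names are hypothetical.

```python
import os
import shutil
import subprocess

# Key names taken from the Quick Start; assumed both are needed.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None) -> list:
    """Return the required keys that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def build_run_command(models, formats) -> list:
    """Mirror: sifr-bench run --models a,b --formats x,y"""
    return ["sifr-bench", "run", "--models", ",".join(models), "--formats", ",".join(formats)]


def preflight_and_run(models, formats) -> None:
    """Check keys and the CLI, then invoke the documented command."""
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Set these environment variables first: {', '.join(missing)}")
    if shutil.which("sifr-bench") is None:
        raise SystemExit("sifr-bench not found on PATH; install it with: pip install sifr-benchmark")
    subprocess.run(build_run_command(models, formats), check=True)
```

Usage: `preflight_and_run(["gpt-4o-mini", "claude-haiku"], ["sifr", "html_raw"])` runs the same command as the shell example above.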

Repository Structure

├── spec/
│   └── SPEC.md              # SiFR format specification
├── benchmark/
│   ├── protocol.md          # Test methodology
│   ├── tasks.json           # 25 standardized tasks
│   └── ground-truth/        # Verified answers per page
├── datasets/
│   ├── pages/               # Test page snapshots
│   │   ├── ecommerce/
│   │   ├── news/
│   │   ├── saas/
│   │   └── forms/
│   └── formats/             # Same page in each format
│       ├── sifr/
│       ├── html/
│       ├── axtree/
│       └── screenshots/
├── results/
│   ├── raw/                 # Model responses
│   └── analysis/            # Processed results
├── src/
│   └── runner.js            # Benchmark execution
└── examples/
    └── product_page.sifr    # Sample SiFR file
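The tree pairs standardized tasks (benchmark/tasks.json) with verified answers (benchmark/ground-truth/). The actual scoring rules live in benchmark/protocol.md; the sketch below assumes a simple case- and whitespace-insensitive exact match, with answers keyed by task id. The task ids, answers, and function names are hypothetical and only illustrate how an accuracy figure like those in Results can be computed.

```python
from dataclasses import dataclass


@dataclass
class Score:
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def normalize(answer: str) -> str:
    # Assumption: lowercase and collapse whitespace before comparing.
    return " ".join(answer.lower().split())


def score_responses(ground_truth: dict, responses: dict) -> Score:
    """Score one model/format run on one page; unanswered tasks count as wrong."""
    correct = sum(
        1
        for task_id, expected in ground_truth.items()
        if normalize(responses.get(task_id, "")) == normalize(expected)
    )
    return Score(correct=correct, total=len(ground_truth))


# Hypothetical task ids and answers, for illustration only.
truth = {"t01": "btn015", "t02": "Add to Cart"}
run = {"t01": "btn015", "t02": "add to  cart"}
print(score_responses(truth, run).accuracy)  # → 1.0
```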

Tested Models

GPT-4o (OpenAI)
Claude 3.5 Sonnet (Anthropic)
Gemini 2.0 Flash (Google)
Llama 3.3 70B (Meta)
Qwen 2.5 72B (Alibaba)

Key Findings

Token efficiency: SiFR uses 70-80% fewer tokens than raw HTML
Accuracy: Pre-computed salience improves task accuracy by 40+ percentage points over raw HTML (89% vs 45%)
Consistency: SiFR results have 3x lower variance across models
Edge-ready: SiFR enables UI tasks on 3B-parameter models

Contribute

Add test pages: datasets/pages/
Add tasks: benchmark/tasks.json
Run on new models: src/runner.js

Citation

@misc{sifr2024,
  title={SiFR: Structured Interface Format for AI Agents},
  author={SiFR Contributors},
  year={2024},
  url={https://github.com/user/sifr-benchmark}
}

License

MIT. The format is open.


Release history

This page describes version 0.1.4, released Dec 3, 2025. Published releases:

0.1.39: Dec 14, 2025
0.1.35 to 0.1.38: Dec 8, 2025
0.1.22 to 0.1.34: Dec 7, 2025
0.1.17 to 0.1.21: Dec 6, 2025
0.1.8 to 0.1.15: Dec 5, 2025
0.1.0, 0.1.1, 0.1.3, 0.1.4: Dec 3, 2025

Download files (0.1.4)

Source distribution: sifr_benchmark-0.1.4.tar.gz (34.7 kB), uploaded Dec 3, 2025 via twine/6.1.0 CPython/3.13.7 (not via Trusted Publishing)

Algorithm	Hash digest
SHA256	`3a0330b3b9e58a8336864851a9dd1b4835e25588f9c0071b15ca304ee5cd3a7e`
MD5	`c66c64655841a7a1a4e146fae285abf1`
BLAKE2b-256	`9ffa0a5ba79982c6c5e767c17f3d3a8661af081f7c2d3278a074c8bdad008331`

Built distribution: sifr_benchmark-0.1.4-py3-none-any.whl (21.8 kB, Python 3), uploaded Dec 3, 2025 via twine/6.1.0 CPython/3.13.7 (not via Trusted Publishing)

Algorithm	Hash digest
SHA256	`5b688e3f2f9651b44f45049a36ab74607d8d3643c66c004b4a63e8499b1841b7`
MD5	`abcb13baac8cff0c32ce672dad981277`
BLAKE2b-256	`446be5a88ab0519315a5caa3d8dcdd039499c8fe4b4a879294630bf794a13f11`
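To check a downloaded file against the SHA256 digest listed above for the 0.1.4 wheel, a short standard-library sketch (the helper names are hypothetical; swap in the source-distribution digest to check the .tar.gz instead):

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for sifr_benchmark-0.1.4-py3-none-any.whl.
EXPECTED_SHA256 = "5b688e3f2f9651b44f45049a36ab74607d8d3643c66c004b4a63e8499b1841b7"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected.lower()
```

Usage: `verify(Path("sifr_benchmark-0.1.4-py3-none-any.whl"))` returns True only for an unmodified copy of the wheel.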
