
Benchmark for evaluating LLM understanding of web UI: SiFR vs HTML vs AXTree vs Screenshots


sifr-benchmark

How well do AI agents understand web UI?
Benchmark comparing SiFR vs HTML vs AXTree vs Screenshots.

Prerequisites

Element-to-LLM Chrome Extension

To capture web pages in SiFR format, install the Element-to-LLM browser extension:

  1. Install it from the Chrome Web Store: Element-to-LLM
  2. Open any webpage
  3. Click extension icon → Capture as SiFR
  4. Save the .sifr file to examples/ or datasets/formats/sifr/

Without this extension, you can only run benchmarks on pre-captured pages.
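Before running anything, it can help to confirm which pages have already been captured. The following is a minimal, hypothetical pre-flight helper (not part of sifr-benchmark). It assumes only the two save locations named in the steps above, and it looks one directory level deep.

```python
from pathlib import Path

# The two save locations named in the capture steps above.
CAPTURE_DIRS = ("examples", "datasets/formats/sifr")


def find_sifr_files(root: str = ".") -> list[Path]:
    """Return every *.sifr file saved directly in the capture directories, sorted."""
    base = Path(root)
    found: list[Path] = []
    for rel in CAPTURE_DIRS:
        directory = base / rel
        if directory.is_dir():
            # Non-recursive: only files sitting directly in each directory.
            found.extend(p for p in directory.glob("*.sifr") if p.is_file())
    return sorted(found)


if __name__ == "__main__":
    files = find_sifr_files()
    if not files:
        print("No .sifr files found; capture a page with Element-to-LLM first.")
    for path in files:
        print(path)
```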

Results

Format       Tokens (avg)   Accuracy   Cost/Task
SiFR                2,100        89%      $0.002
Screenshot          4,200        71%      $0.012
AXTree              3,800        52%      $0.004
Raw HTML            8,500        45%      $0.008

→ SiFR: 75% fewer tokens and roughly 2x the accuracy of raw HTML (89% vs 45%)
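The headline comparison follows directly from the table. This short check uses only the numbers in the table. Token savings are computed as 1 − (SiFR tokens / HTML tokens), and the accuracy gain as a plain ratio. The unrounded ratio is 1.98x, which the summary line rounds to 2x.

```python
# Figures copied from the results table: average tokens and accuracy per format.
RESULTS = {
    "SiFR": {"tokens": 2100, "accuracy": 0.89},
    "Screenshot": {"tokens": 4200, "accuracy": 0.71},
    "AXTree": {"tokens": 3800, "accuracy": 0.52},
    "Raw HTML": {"tokens": 8500, "accuracy": 0.45},
}


def token_reduction(fmt: str, baseline: str) -> float:
    """Fraction of the baseline's tokens saved by `fmt` (0.75 means 75% fewer)."""
    return 1 - RESULTS[fmt]["tokens"] / RESULTS[baseline]["tokens"]


def accuracy_ratio(fmt: str, baseline: str) -> float:
    """How many times more accurate `fmt` is than `baseline`."""
    return RESULTS[fmt]["accuracy"] / RESULTS[baseline]["accuracy"]


print(f"{token_reduction('SiFR', 'Raw HTML'):.1%} fewer tokens")  # 75.3% fewer tokens
print(f"{accuracy_ratio('SiFR', 'Raw HTML'):.2f}x accuracy")      # 1.98x accuracy
```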

What is SiFR?

Structured Interface Format for Representation.
A compact way to describe web UI for LLMs.

btn015:
  type: button
  text: "Add to Cart"
  position: [500, 300, 120, 40]
  state: enabled
  parent: product-card

Full spec: SPEC.md
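To make the shape of an entry concrete, here is a minimal sketch that reads the single element above into Python. It handles only the layout shown here: an `id:` header line followed by indented `key: value` lines. It is not the SiFR parser and does not implement SPEC.md. The names `SifrElement`, `parse_element` and `parse_position` are hypothetical. The sketch also does not assume what the four `position` numbers mean; the spec defines that.

```python
from dataclasses import dataclass


@dataclass
class SifrElement:
    element_id: str
    fields: dict[str, str]


def parse_element(block: str) -> SifrElement:
    """Parse one `id:` header followed by indented `key: value` lines.

    Minimal sketch for the example above only; it does not implement SPEC.md.
    """
    lines = [ln for ln in block.splitlines() if ln.strip()]
    header = lines[0].strip()
    if not header.endswith(":"):
        raise ValueError(f"expected an element id line ending in ':', got {header!r}")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        # Split on the first colon only; surrounding quotes are dropped.
        key, sep, value = line.strip().partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        fields[key.strip()] = value.strip().strip('"')
    return SifrElement(element_id=header[:-1], fields=fields)


def parse_position(raw: str) -> list[int]:
    """Turn '[500, 300, 120, 40]' into [500, 300, 120, 40]."""
    return [int(part) for part in raw.strip("[]").split(",")]


EXAMPLE = """\
btn015:
  type: button
  text: "Add to Cart"
  position: [500, 300, 120, 40]
  state: enabled
  parent: product-card
"""

element = parse_element(EXAMPLE)
print(element.element_id, element.fields["text"], parse_position(element.fields["position"]))
# btn015 Add to Cart [500, 300, 120, 40]
```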

Installation

pip install sifr-benchmark

Quick Start

1. Capture pages (using Element-to-LLM extension)

  1. Install the Element-to-LLM extension
  2. Open the target page (e.g., an Amazon product page)
  3. Click extension → Export SiFR
  4. Save as examples/my_page.sifr

2. Run benchmark

# Set API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Run benchmark
sifr-bench run --models gpt-4o-mini,claude-haiku --formats sifr,html_raw

# Validate your SiFR files
sifr-bench validate examples/

# View info
sifr-bench info
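If you drive the benchmark from a Python script instead of the shell, a thin wrapper can check for missing API keys before calling the CLI. This is a hypothetical sketch, not part of the package. It reuses only the `sifr-bench run --models … --formats …` invocation shown above. The mapping from model-name prefix to environment variable (`gpt-` to `OPENAI_API_KEY`, `claude-` to `ANTHROPIC_API_KEY`) is an assumption based on the two keys exported in the example.

```python
import os
import shutil
import subprocess
from collections.abc import Mapping

# Assumption: which provider key a model needs is inferred from its name prefix.
KEY_FOR_PREFIX = {"gpt-": "OPENAI_API_KEY", "claude-": "ANTHROPIC_API_KEY"}


def build_run_command(models: list[str], formats: list[str]) -> list[str]:
    """Build the `sifr-bench run` invocation from the Quick Start as an argv list."""
    return ["sifr-bench", "run", "--models", ",".join(models), "--formats", ",".join(formats)]


def missing_keys(models: list[str], env: Mapping[str, str] = os.environ) -> set[str]:
    """Return the API-key variables the chosen models need but `env` lacks or leaves empty."""
    needed = {
        var
        for model in models
        for prefix, var in KEY_FOR_PREFIX.items()
        if model.startswith(prefix)
    }
    return {var for var in needed if not env.get(var)}


def run_benchmark(models: list[str], formats: list[str]) -> int:
    """Fail fast on missing keys or a missing CLI, then run and return its exit code."""
    missing = missing_keys(models)
    if missing:
        raise SystemExit(f"Set these environment variables first: {', '.join(sorted(missing))}")
    if shutil.which("sifr-bench") is None:
        raise SystemExit("sifr-bench not found on PATH; run `pip install sifr-benchmark`.")
    return subprocess.run(build_run_command(models, formats), check=False).returncode


# Example (not executed here):
# run_benchmark(["gpt-4o-mini", "claude-haiku"], ["sifr", "html_raw"])
```

Passing an argv list to `subprocess.run` avoids shell quoting issues with the comma-separated flag values.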

Repository Structure

├── spec/
│   └── SPEC.md              # SiFR format specification
├── benchmark/
│   ├── protocol.md          # Test methodology
│   ├── tasks.json           # 25 standardized tasks
│   └── ground-truth/        # Verified answers per page
├── datasets/
│   ├── pages/               # Test page snapshots
│   │   ├── ecommerce/
│   │   ├── news/
│   │   ├── saas/
│   │   └── forms/
│   └── formats/             # Same page in each format
│       ├── sifr/
│       ├── html/
│       ├── axtree/
│       └── screenshots/
├── results/
│   ├── raw/                 # Model responses
│   └── analysis/            # Processed results
├── src/
│   └── runner.js            # Benchmark execution
└── examples/
    └── product_page.sifr    # Sample SiFR file
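Because `datasets/formats/` holds the same page in each format, a comparison is only fair if every page exists in all four directories. The sketch below is a hypothetical coverage check. It assumes that one page shares a file stem across formats (for example `sifr/product_page.sifr` and `html/product_page.html`). The README does not state that naming convention, so adjust the matching rule to match the real layout.

```python
from pathlib import Path

FORMATS = ("sifr", "html", "axtree", "screenshots")


def coverage_gaps(formats_root: str = "datasets/formats") -> dict[str, list[str]]:
    """Map each page stem to the format directories where it is missing.

    Assumes (hypothetically) that one page shares a file stem across formats.
    Missing directories are treated as empty.
    """
    root = Path(formats_root)
    stems_by_format = {
        fmt: {p.stem for p in (root / fmt).glob("*") if p.is_file()}
        for fmt in FORMATS
    }
    all_stems = set().union(*stems_by_format.values())
    gaps: dict[str, list[str]] = {}
    for stem in sorted(all_stems):
        missing = [fmt for fmt in FORMATS if stem not in stems_by_format[fmt]]
        if missing:
            gaps[stem] = missing
    return gaps
```

An empty result means every captured page is available in all four formats.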

Tested Models

  • GPT-4o (OpenAI)
  • Claude 3.5 Sonnet (Anthropic)
  • Gemini 2.0 Flash (Google)
  • Llama 3.3 70B (Meta)
  • Qwen 2.5 72B (Alibaba)

Key Findings

  1. Token efficiency: SiFR uses 70-80% fewer tokens than raw HTML
  2. Accuracy: Pre-computed salience improves task accuracy by 40+ percentage points over raw HTML (89% vs 45%)
  3. Consistency: SiFR results have 3x lower variance across models
  4. Edge-ready: SiFR enables UI tasks on 3B-parameter models
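The README does not say how the "3x lower variance" figure is computed. One plausible reading is the ratio of the population variance of per-model accuracy for two formats. The sketch below implements that reading and nothing more. It is an assumption, not the benchmark's documented method, and the input dictionaries are hypothetical.

```python
from statistics import pvariance


def variance_ratio(per_model_a: dict[str, float], per_model_b: dict[str, float]) -> float:
    """Return var(b) / var(a) over models scored on both formats.

    A result of 3.0 would mean format A's accuracy varies 3x less across
    models than format B's. Uses population variance of per-model accuracy.
    """
    shared = sorted(per_model_a.keys() & per_model_b.keys())
    if len(shared) < 2:
        raise ValueError("need accuracy for at least two shared models")
    var_a = pvariance([per_model_a[m] for m in shared])
    var_b = pvariance([per_model_b[m] for m in shared])
    if var_a == 0:
        raise ValueError("format A has zero variance; the ratio is undefined")
    return var_b / var_a


# Hypothetical usage, with per-model accuracy dictionaries you would build from results/:
# variance_ratio(sifr_accuracy_by_model, html_accuracy_by_model)
```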

Contribute

  • Add test pages: datasets/pages/
  • Add tasks: benchmark/tasks.json
  • Run on new models: src/runner.js

Citation

@misc{sifr2024,
  title={SiFR: Structured Interface Format for AI Agents},
  author={SiFR Contributors},
  year={2024},
  url={https://github.com/Alechko375/sifr-benchmark}
}

License

MIT — format is open.


SiFR Spec | Extension | Discord



Download files


Source Distribution

sifr_benchmark-0.1.8.tar.gz (35.0 kB)

Uploaded Source

Built Distribution


sifr_benchmark-0.1.8-py3-none-any.whl (22.1 kB)

Uploaded Python 3

File details

Details for the file sifr_benchmark-0.1.8.tar.gz.

File metadata

  • Download URL: sifr_benchmark-0.1.8.tar.gz
  • Size: 35.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sifr_benchmark-0.1.8.tar.gz
Algorithm     Hash digest
SHA256        43cc847cde0dd5d3fe417c730a60b1c7dbbe2142dc7c64ee3d68731c0a36a0be
MD5           aadc860250224392fa707b99202bb87b
BLAKE2b-256   5a2e57d5bdf4b6bffd7ec23031cf48c9611d213ceeb70aef520271f101b1ed8f
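To check a downloaded archive against the published SHA256 digest, a standard-library sketch is enough. It streams the file in chunks so the archive does not have to fit in memory, then compares hex digests. The helper names are illustrative, and the path in the usage line assumes the sdist was saved in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for the 0.1.8 source distribution.
EXPECTED_SDIST_SHA256 = "43cc847cde0dd5d3fe417c730a60b1c7dbbe2142dc7c64ee3d68731c0a36a0be"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True when the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# verify("sifr_benchmark-0.1.8.tar.gz")
```

pip can also enforce this at install time through its hash-checking mode, using a requirements line of the form `sifr-benchmark==0.1.8 --hash=sha256:<digest>` with one `--hash` per published file.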


Provenance

The following attestation bundles were made for sifr_benchmark-0.1.8.tar.gz:

Publisher: benchmark.yml on Alechko375/sifr-benchmark

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sifr_benchmark-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: sifr_benchmark-0.1.8-py3-none-any.whl
  • Size: 22.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sifr_benchmark-0.1.8-py3-none-any.whl
Algorithm     Hash digest
SHA256        f14eb72d240647fd1331571275fbf9e83eccb2c38a2953f9824c4d54abdf392f
MD5           0cd4fb6deb96650df019fc4db69e885e
BLAKE2b-256   09bfec5b97bc026635613b053a597a8aecf00d5814f731b4275881a22b44ab51


Provenance

The following attestation bundles were made for sifr_benchmark-0.1.8-py3-none-any.whl:

Publisher: benchmark.yml on Alechko375/sifr-benchmark

