widget2code-bench
Benchmark evaluation for widget code generation — 12 quality metrics across layout, legibility, perceptual, style, and geometry.
Installation
```bash
# 1. Install PyTorch with CUDA support first (skip if CPU-only)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126

# 2. Install widget2code-bench
pip install widget2code-bench
```
Note: PyPI only ships CPU-only PyTorch. To use `--cuda`, you must install PyTorch from the official index before installing this package.
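A quick way to confirm which PyTorch build ended up in the environment before passing `--cuda` is a check like the one below. This is a minimal sketch: `torch.cuda.is_available()` and `torch.version.cuda` are standard PyTorch attributes, while the helper name `describe_torch_build` is made up for illustration and is not part of widget2code-bench.

```python
import importlib.util


def describe_torch_build() -> str:
    """Return a short description of the installed PyTorch build."""
    # Avoid importing torch when it is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built against.
        return f"GPU visible (built against CUDA {torch.version.cuda})"
    if torch.version.cuda is None:
        return "CPU-only build"
    return "CUDA build, but no GPU visible"


print(describe_torch_build())
```

If this reports a CPU-only build, reinstalling PyTorch from the index shown above is the likely fix.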
Usage
Single image mode
Evaluates a single GT-prediction pair and prints the JSON results to stdout; no files are saved.
```bash
widget2code-bench \
  --gt_image /path/to/gt.png \
  --pred_image /path/to/pred.png \
  --cuda
```
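Because single image mode prints JSON to stdout, it can be driven from a script. The sketch below uses only the flags documented here; the helper names `build_single_cmd` and `run_single` are hypothetical, and it assumes stdout contains exactly one JSON document with no extra log lines, which this README does not guarantee.

```python
from __future__ import annotations

import json
import subprocess


def build_single_cmd(gt_image: str, pred_image: str, cuda: bool = False) -> list[str]:
    """Build the argument list for one GT-prediction evaluation."""
    cmd = ["widget2code-bench", "--gt_image", gt_image, "--pred_image", pred_image]
    if cuda:
        cmd.append("--cuda")
    return cmd


def run_single(gt_image: str, pred_image: str, cuda: bool = False):
    """Run the CLI and parse its stdout as JSON.

    Assumption: stdout holds a single JSON document and nothing else.
    """
    proc = subprocess.run(
        build_single_cmd(gt_image, pred_image, cuda),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

A call such as `run_single("/path/to/gt.png", "/path/to/pred.png", cuda=True)` would then return the parsed results, provided the CLI is on `PATH`.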
Batch mode
Evaluates all matched GT-prediction pairs found in the two directories.
```bash
widget2code-bench \
  --gt_dir /path/to/GT \
  --pred_dir /path/to/predictions \
  --pred_name output.png \
  --cuda
```
Directory Structure (batch mode)
- GT dir: flat image files with 4-digit IDs in filenames (e.g. `gt_0001.png`)
- Pred dir: subfolders with 4-digit IDs in names, each containing the `--pred_name` file
```
gt_dir/                pred_dir/
  gt_0001.png            image_0001/
  gt_0002.png              output.png
  ...                    image_0002/
                           output.png
```
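The layout above implies that GT files and prediction subfolders are paired by their 4-digit ID. The following is a rough sketch of such pairing, useful for checking a dataset before a long run. It is not the tool's actual matching code: taking the first run of four digits in a name as the ID is an assumption, and `extract_id` and `match_pairs` are hypothetical names.

```python
import re
from pathlib import Path

# Assumption: the ID is the first run of four digits in the name.
ID_PATTERN = re.compile(r"\d{4}")


def extract_id(name):
    """Return the 4-digit ID found in a file or folder name, or None."""
    match = ID_PATTERN.search(name)
    return match.group(0) if match else None


def match_pairs(gt_dir, pred_dir, pred_name="output.png"):
    """Pair GT images with prediction files; also list GT IDs with no prediction."""
    gt_dir, pred_dir = Path(gt_dir), Path(pred_dir)

    # Index prediction subfolders that actually contain the prediction file.
    preds = {}
    for sub in pred_dir.iterdir():
        sample_id = extract_id(sub.name)
        if sub.is_dir() and sample_id and (sub / pred_name).is_file():
            preds[sample_id] = sub / pred_name

    matched, missing = {}, []
    for gt in sorted(gt_dir.iterdir()):
        sample_id = extract_id(gt.name)
        if not gt.is_file() or sample_id is None:
            continue
        if sample_id in preds:
            matched[sample_id] = (gt, preds[sample_id])
        else:
            missing.append(sample_id)
    return matched, missing
```

The `missing` list corresponds to the GT images that fill mode, described later, would score against synthetic images.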
Options
| Flag | Default | Description |
|---|---|---|
| `--gt_image` | - | Single GT image path |
| `--pred_image` | - | Single prediction image path |
| `--gt_dir` | - | GT directory (flat image files) |
| `--pred_dir` | - | Prediction directory (subfolders) |
| `--pred_name` | `output.png` | Prediction filename inside each subfolder |
| `--output_dir` | `{pred_dir}/.analysis` | Statistics output directory |
| `--workers` | 4 | Parallel threads |
| `--cuda` | off | Enable GPU |
| `--skip_eval` | off | Skip evaluation, only generate statistics |
| `--no_fill` | off | Disable fill-image evaluation for missing predictions (fill is on by default) |
Output (batch mode)
- Evaluation: saves `evaluation.json` in each prediction subfolder and `evaluation.xlsx` in `pred_dir`
- Statistics: saves `metrics_stats.json` and `metrics.xlsx` to `{pred_dir}/.analysis/`
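The per-sample `evaluation.json` files can be gathered for custom analysis with a few lines of Python. This sketch makes no assumption about the JSON schema, which is not documented here, and simply loads each file as-is; `load_evaluations` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_evaluations(pred_dir):
    """Map each prediction subfolder name to its parsed evaluation.json."""
    results = {}
    # Look exactly one level below pred_dir, where batch mode writes the files.
    for path in sorted(Path(pred_dir).glob("*/evaluation.json")):
        with path.open(encoding="utf-8") as fh:
            results[path.parent.name] = json.load(fh)
    return results
```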
Handling missing predictions (fill mode, default)
When a GT image has no matching prediction, the evaluator also scores it against synthetic fill images, so the summary xlsx can show how different assumptions about missing samples affect the aggregate metrics. Each summary xlsx contains up to 4 rows:
| Row | Description |
|---|---|
| `<run>` | Average over matched pairs only |
| `<run> (+ black fill)` | Missing preds scored against an all-black image |
| `<run> (+ white fill)` | Missing preds scored against an all-white image |
| `<run> (+ zero fill)` | Missing preds contribute the worst-case value (LPIPS = 1.0, others = 0) |
Two extra columns are appended after Geometry:

- `SuccessRate / ratio`: matched pairs / total GT, as a percentage
- `SuccessRate / count`: e.g. `993/1000`
Pass `--no_fill` to disable this behavior (only row 1 is produced and missing preds are skipped).
All metrics are higher-is-better except lp (LPIPS), which is a distance (lower-is-better).
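To make the effect of the zero-fill row concrete, the sketch below recomputes a matched-only mean and a zero-fill mean from per-sample scores. It illustrates the arithmetic described above and is not the package's code: a plain arithmetic mean is assumed, `ssim` is a placeholder metric name (only `lp` appears in this README), and `summarize` and `success_rate` are hypothetical helpers.

```python
def summarize(matched_scores, n_missing, lower_is_better=("lp",)):
    """Compare matched-only means with zero-fill means.

    matched_scores maps a metric name to the scores of the matched pairs.
    Missing predictions get the worst case: 1.0 for distances such as lp,
    0.0 for every higher-is-better metric.
    """
    rows = {"matched only": {}, "+ zero fill": {}}
    for metric, values in matched_scores.items():
        if not values:
            raise ValueError(f"no matched scores for metric {metric!r}")
        worst = 1.0 if metric in lower_is_better else 0.0
        total = len(values) + n_missing
        rows["matched only"][metric] = sum(values) / len(values)
        rows["+ zero fill"][metric] = (sum(values) + worst * n_missing) / total
    return rows


def success_rate(n_matched, n_missing):
    """Return the count string and percentage, e.g. ('993/1000', 99.3)."""
    total = n_matched + n_missing
    return f"{n_matched}/{total}", 100.0 * n_matched / total


# Two matched pairs and two missing predictions.
rows = summarize({"lp": [0.2, 0.4], "ssim": [0.8, 0.6]}, n_missing=2)
print(rows)
print(success_rate(993, 7))
```

With two of four predictions missing, the placeholder `ssim` mean halves while the `lp` mean moves toward 1.0, which is why the matched-only row and the fill rows are best read side by side.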
License
Apache-2.0