LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval
LookBench is a live, holistic, and challenging benchmark for fashion image retrieval in real e-commerce settings. This repository provides the official evaluation code and model implementations.
News
- [2026-01] LookBench paper released on arXiv
- [2026-01] GR-Lite open-source model released
- [2026-01] Initial benchmark dataset released
Overview
LookBench addresses the limitations of existing fashion retrieval benchmarks by providing:
- Continuously Refreshing Samples: Mitigates data contamination with time-stamped, periodically updated test sets
- Diverse Retrieval Tasks: Covers single-item and multi-item retrieval across real studio, AI-generated studio, real street-look, and AI-generated street-look scenarios
- Attribute-Supervised Evaluation: Fine-grained evaluation based on 100+ fashion attributes across categories
- Challenging Benchmarks: Many strong baselines achieve below 60% Recall@1
Benchmark Subsets
| Dataset | Image Source | # Retrieval Items | Difficulty | # Queries / Corpus |
|---|---|---|---|---|
| RealStudioFlat | Real studio flat-lay product photos | Single | Easy | 1,011 / 62,226 |
| AIGen-Studio | AI-generated lifestyle studio images | Single | Medium | 192 / 59,254 |
| RealStreetLook | Real street outfit photos | Multi | Hard | 1,000 / 61,553 |
| AIGen-StreetLook | AI-generated street outfit compositions | Multi | Hard | 160 / 58,846 |
Quick Start
Installation
Option 1: Install from PyPI (Recommended)
pip install look-bench
Option 2: Install from Source
# Clone the repository
git clone https://github.com/SerendipityOneInc/look-bench.git
cd look-bench
# Install in development mode
pip install -e .
# Or install dependencies only
pip install -r requirements.txt
Optional: Install with Examples Support
For running example notebooks and scripts that require matplotlib:
# Quote the extra so shells such as zsh do not expand the brackets
pip install "look-bench[examples]"
Load Dataset from Hugging Face
The LookBench dataset is hosted on Hugging Face and can be loaded directly:
Option 1: Using look-bench utility (Recommended)
from look_bench.utils import load_lookbench_dataset
# Load a specific config
dataset = load_lookbench_dataset("real_studio_flat")
# Access query and gallery splits
query_data = dataset['query']
gallery_data = dataset['gallery']
print(f"Query samples: {len(query_data)}")
print(f"Gallery samples: {len(gallery_data)}")
Option 2: Using Hugging Face datasets directly
from datasets import load_dataset
# Load a specific config
dataset = load_dataset("srpone/look-bench", "real_studio_flat")
# Access query and gallery splits
query_data = dataset['query']
gallery_data = dataset['gallery']
print(f"Query samples: {len(query_data)}")
print(f"Gallery samples: {len(gallery_data)}")
Quick Evaluation
import torch
from manager import ConfigManager, ModelManager

# Load model
config_manager = ConfigManager('configs/config.yaml')
model_manager = ModelManager(config_manager)
model, _ = model_manager.load_model('clip')
transform = model_manager.get_transform('clip')

# Extract features from an image
# `dataset` is the "real_studio_flat" config loaded in the previous step
sample = dataset['query'][0]
image_tensor = transform(sample['image']).unsqueeze(0)
if torch.cuda.is_available():
    model = model.cuda()
    image_tensor = image_tensor.cuda()
model.eval()  # inference mode for dropout / batch-norm layers
with torch.no_grad():
    features = model(image_tensor)
print(f"Feature shape: {features.shape}")
Run Full Evaluation
# Run evaluation with default configuration
python main.py
# Run with specific model
python main.py --pipeline evaluation --model clip
# Use custom configuration
python main.py --config configs/config.yaml
Example Scripts & Notebooks
We provide both Python scripts and Google Colab notebooks for easy experimentation:
Colab Notebooks (Run in Browser)
- 01_quickstart.ipynb - Basic usage and dataset exploration
- 02_model_evaluation.ipynb - Complete evaluation pipeline
- 03_custom_model.ipynb - Integrate custom models
Python Scripts (Run Locally)
- examples/00_data_exploration.py - Dataset exploration and statistics
- examples/01_load_grlite_model.py - Load and test GR-Lite model
- examples/02_model_evaluation.py - Complete model evaluation pipeline
- examples/03_custom_model.py - Integrate your own custom models
# Run examples locally
python examples/00_data_exploration.py
python examples/01_load_grlite_model.py
python examples/02_model_evaluation.py
python examples/03_custom_model.py
Architecture
look-bench/
├── main.py                              # Main entry point (config-driven)
├── manager.py                           # Configuration, model, and data managers
├── runner/                              # Pipeline execution framework
│   ├── base_pipeline.py                 # Base pipeline class
│   ├── evaluator.py                     # Core evaluation logic
│   ├── pipeline.py                      # Pipeline registry
│   ├── evaluation_pipeline.py           # Standard evaluation pipeline
│   └── feature_extraction_pipeline.py   # Feature extraction pipeline
├── models/                              # Model implementations and registry
│   ├── base.py                          # Base model interface
│   ├── registry.py                      # Model registration system
│   ├── factory.py                       # Model factory
│   ├── clip_model.py                    # CLIP model
│   ├── siglip_model.py                  # SigLIP model
│   └── dinov2_model.py                  # DINOv2 model
├── datasets/                            # Dataset loading (BEIR-style)
│   ├── base.py                          # Base dataset implementation
│   └── registry.py                      # Dataset registry
├── metrics/                             # Evaluation metrics
│   ├── rank.py                          # Recall@K
│   ├── mrr.py                           # Mean Reciprocal Rank
│   ├── ndcg.py                          # Normalized Discounted Cumulative Gain
│   └── map.py                           # Mean Average Precision
├── configs/                             # Configuration files
│   └── config.yaml                      # Main configuration
└── utils/                               # Utilities and logging
Supported Models
| Model | Architecture | Input Size | Embedding Dim | Framework |
|---|---|---|---|---|
| CLIP | Vision Transformer | 224×224 | 512 | PyTorch |
| SigLIP | Vision Transformer | 224×224 | 768 | PyTorch |
| DINOv2 | Vision Transformer | 224×224 | 768 | PyTorch |
| GR-Lite | Vision Transformer | 336×336 | 1024 | PyTorch |
Configuration
Edit configs/config.yaml to configure models and evaluation settings:
# Pipeline configuration
pipeline:
  name: "evaluation"  # evaluation, feature_extraction
  model: "clip"
  dataset: "fashion200k"
  args: {}

# Model configuration
clip:
  enabled: true
  model_name: "openai/clip-vit-base-patch16"
  input_size: 224
  embedding_dim: 512
  device: "cuda"

# Evaluation settings
evaluation:
  metric: "recall"
  top_k: [1, 5, 10, 20]
  l2norm: true
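As a rough illustration of what the `l2norm` and `top_k` settings imply, the sketch below L2-normalizes embeddings and ranks a gallery by dot product, which equals cosine similarity once vectors have unit norm. The names `l2_normalize` and `rank_gallery` are hypothetical and not part of LookBench, and plain Python lists stand in for the tensors the evaluator presumably uses.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def rank_gallery(query: Sequence[float],
                 gallery: Sequence[Sequence[float]],
                 top_k: int) -> List[int]:
    """Return the indices of the top_k gallery vectors by cosine similarity."""
    q = l2_normalize(query)
    scored = []
    for idx, item in enumerate(gallery):
        g = l2_normalize(item)
        similarity = sum(a * b for a, b in zip(q, g))
        scored.append((similarity, idx))
    # Highest similarity first; ties are broken by the lower gallery index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


# Toy example: three 2-D gallery embeddings and one query embedding.
example_gallery = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
example_top2 = rank_gallery([2.0, 1.9], example_gallery, top_k=2)
```

Normalizing first means vector length no longer influences the ranking, so only direction matters. That is usually what is wanted when comparing embeddings from different images.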
Evaluation Metrics
LookBench supports multiple evaluation metrics:
- Recall@K: Top-K retrieval accuracy (K=1, 5, 10, 20)
- MRR: Mean Reciprocal Rank
- nDCG@K: Normalized Discounted Cumulative Gain
- MAP: Mean Average Precision
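For readers unfamiliar with these metrics, here is a minimal sketch of Recall@K, reciprocal rank, and average precision over a single ranked list with binary relevance. It is not the code in `metrics/`; the function names are hypothetical. It assumes Recall@K is a hit rate (1 if any relevant item appears in the top K), and that average precision is normalized by the number of relevant items found in the list.

```python
from typing import Iterable, Sequence


def recall_at_k(relevant: Sequence[bool], k: int) -> float:
    """1.0 if any of the top-k ranked results is relevant, else 0.0."""
    return 1.0 if any(relevant[:k]) else 0.0


def reciprocal_rank(relevant: Sequence[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, hit in enumerate(relevant, start=1):
        if hit:
            return 1.0 / rank
    return 0.0


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision@rank taken at each rank where a relevant item occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, hit in enumerate(relevant, start=1):
        if hit:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean(values: Iterable[float]) -> float:
    """Average per-query scores into a dataset-level metric."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# One relevance list per query, ordered from best-ranked to worst-ranked.
rankings = [
    [False, True, False],
    [True, False, True],
    [False, False, False],
]
recall_at_1 = mean(recall_at_k(r, 1) for r in rankings)
mrr = mean(reciprocal_rank(r) for r in rankings)
mean_ap = mean(average_precision(r) for r in rankings)
```

MRR and MAP are simply the per-query reciprocal rank and average precision averaged over all queries.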
Fine-Grained Evaluation
All metrics are computed with attribute-level matching:
- Fine Recall@1: Requires exact category and all attributes to match
- Coarse Recall@1: Only requires category to match
- nDCG@K: Graded relevance based on attribute overlap
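The difference between the fine, coarse, and graded criteria can be sketched as follows. This is an illustration only: the `Item` class, the use of set equality for "all attributes match", the overlap fraction used as a relevance grade, and computing the ideal ordering from the retrieved list itself are all assumptions, not LookBench's actual implementation.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Item:
    """Hypothetical annotation record: a category plus a set of attribute labels."""
    category: str
    attributes: FrozenSet[str] = frozenset()


def coarse_match(query: Item, candidate: Item) -> bool:
    """Coarse criterion: only the category has to agree."""
    return query.category == candidate.category


def fine_match(query: Item, candidate: Item) -> bool:
    """Fine criterion (assumed): same category and identical attribute sets."""
    return coarse_match(query, candidate) and query.attributes == candidate.attributes


def graded_relevance(query: Item, candidate: Item) -> float:
    """Assumed grade: fraction of the query's attributes the candidate shares."""
    if not coarse_match(query, candidate):
        return 0.0
    if not query.attributes:
        return 1.0
    return len(query.attributes & candidate.attributes) / len(query.attributes)


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """nDCG@k with the ideal ordering taken from the retrieved list itself."""
    def dcg(values: Sequence[float]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


query = Item("dress", frozenset({"red", "midi", "v-neck"}))
ranked: List[Item] = [
    Item("dress", frozenset({"red", "midi", "crew-neck"})),  # coarse match only
    Item("dress", frozenset({"red", "midi", "v-neck"})),     # fine match
    Item("shoe", frozenset({"red"})),                        # wrong category
]
gains = [graded_relevance(query, item) for item in ranked]
score = ndcg_at_k(gains, k=3)
```

In this toy ranking the top result counts as a hit for Coarse Recall@1 but not for Fine Recall@1. The nDCG score is below 1 because the exact match is ranked second rather than first.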
Advanced Usage
Custom Model Integration
LookBench makes it easy to integrate your own models using the registry pattern. Here's a quick example:
from models.base import BaseModel
from models.registry import register_model
import torch.nn as nn
from torchvision import models, transforms


@register_model("resnet50", metadata={
    "description": "ResNet-50 for fashion retrieval",
    "framework": "PyTorch",
    "input_size": 224,
    "embedding_dim": 2048
})
class ResNet50Model(BaseModel):
    @classmethod
    def load_model(cls, model_name: str, model_path: str = None):
        model = models.resnet50(pretrained=True)
        model = nn.Sequential(*list(model.children())[:-1])  # Remove FC layer

        # Wrapper to flatten output from (N, 2048, 1, 1) to (N, 2048)
        class Wrapper(nn.Module):
            def __init__(self, backbone):
                super().__init__()
                self.backbone = backbone

            def forward(self, x):
                return self.backbone(x).squeeze(-1).squeeze(-1)

        return Wrapper(model), cls()

    @classmethod
    def get_transform(cls, input_size: int = 224):
        return transforms.Compose([
            transforms.Resize((input_size, input_size)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])
Then add your model to configs/config.yaml:
resnet50:
  enabled: true
  model_name: "resnet50"
  model_path: null  # or path to your weights
  input_size: 224
  embedding_dim: 2048
  device: "cuda"
For complete examples, see examples/03_custom_model.py
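If the registry pattern is unfamiliar, the following self-contained sketch shows the general idea behind a decorator-based registry: the decorator stores the class under a string key, and a factory function later looks it up by the name given in the config. Every name here (`MODEL_REGISTRY`, `register`, `create`, `DummyModel`) is hypothetical. LookBench's own `register_model` and factory may behave differently.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

# Maps a model name to its class and metadata (illustrative stand-in only).
MODEL_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register(name: str, metadata: Optional[dict] = None) -> Callable[[type], type]:
    """Class decorator that records the decorated class under `name`."""
    def decorator(cls: type) -> type:
        if name in MODEL_REGISTRY:
            raise ValueError(f"'{name}' is already registered")
        MODEL_REGISTRY[name] = {"cls": cls, "metadata": metadata or {}}
        return cls  # the class itself is returned unchanged
    return decorator


def create(name: str, **kwargs: Any) -> Any:
    """Instantiate the class registered under `name`."""
    if name not in MODEL_REGISTRY:
        raise KeyError(f"Unknown model '{name}'. Available: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name]["cls"](**kwargs)


@register("dummy", metadata={"embedding_dim": 4})
class DummyModel:
    """Toy model that 'embeds' an input by scaling its first four values."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale

    def embed(self, pixels: Sequence[float]) -> List[float]:
        return [self.scale * p for p in pixels[:4]]


# A config entry such as `model: "dummy"` can now be resolved by name.
model = create("dummy", scale=2.0)
embedding = model.embed([1, 2, 3, 4, 5])
```

The benefit of this design is that adding a model requires no changes to the evaluation code. Importing the module that defines the class is enough to make it selectable by name.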
Custom Pipeline
Create custom evaluation pipelines:
from runner.base_pipeline import BasePipeline
from runner.pipeline import register_pipeline


@register_pipeline("custom_pipeline")
class CustomPipeline(BasePipeline):
    def get_pipeline_name(self) -> str:
        return "custom_pipeline"

    def run(self, **kwargs):
        # Your custom logic here
        model_name = kwargs.get('model_name', 'clip')
        dataset_type = kwargs.get('dataset_type', 'fashion200k')
        # Load model and data
        model, _ = self.model_manager.load_model(model_name)
        # ... your evaluation logic; collect the computed metrics in `results`
        results = {}
        return {"status": "success", "results": results}
Results
Fine Recall@1 Performance
GR-Pro achieves the highest overall Fine Recall@1 (49.80), and the open GR-Lite model (49.18) is competitive with the strongest baseline, Marqo-FashionSigLIP (49.44). Fine Recall@1 requires exact category and all attributes to match:
| Model | Resolution / Emb. | AIGen-StreetLook | AIGen-Studio | RealStreetLook | RealStudioFlat | Overall |
|---|---|---|---|---|---|---|
| GR-Pro (Ours) | 336 / 1024 | 63.67 | 54.88 | 44.75 | 51.55 | 49.80 |
| GR-Lite (Ours, Open) | 336 / 1024 | 62.47 | 52.08 | 43.84 | 51.70 | 49.18 |
| Marqo-FashionSigLIP | 224 / 768 | 66.27 | 58.53 | 42.43 | 51.86 | 49.44 |
| Marqo-FashionCLIP | 224 / 512 | 63.22 | 54.93 | 41.87 | 51.68 | 48.63 |
| SigLIP2-B/16 | 384 / 768 | 57.83 | 54.97 | 39.35 | 49.12 | 46.10 |
| SigLIP2-L/16 | 384 / 1024 | 51.89 | 48.57 | 35.91 | 44.78 | 41.86 |
| PP-ShiTuV2 | 224 / 512 | 30.06 | 33.69 | 32.77 | 43.22 | 37.17 |
| DINOv3-ViT-L | 224 / 1024 | 20.24 | 27.66 | 26.27 | 39.85 | 31.83 |
| DINOv2-ViT-L | 224 / 1024 | 24.29 | 25.05 | 22.99 | 37.66 | 29.57 |
| CLIP-L/14 | 336 / 768 | 25.28 | 25.95 | 21.09 | 40.35 | 30.08 |
| CLIP-B/16 | 224 / 512 | 17.86 | 13.75 | 16.80 | 34.75 | 24.36 |
Coarse Recall@1 Performance
Coarse Recall@1 only requires category match (more lenient):
| Model | Resolution / Emb. | AIGen-StreetLook | AIGen-Studio | RealStreetLook | RealStudioFlat | Overall |
|---|---|---|---|---|---|---|
| GR-Pro (Ours) | 336 / 1024 | 92.50 | 92.75 | 79.82 | 94.16 | 87.93 |
| GR-Lite (Ours, Open) | 336 / 1024 | 88.75 | 90.16 | 76.76 | 92.68 | 85.54 |
| Marqo-FashionSigLIP | 224 / 768 | 90.00 | 93.78 | 73.39 | 88.63 | 82.77 |
| Marqo-FashionCLIP | 224 / 512 | 84.38 | 87.05 | 75.33 | 88.72 | 82.68 |
| SigLIP2-B/16 | 384 / 768 | 86.25 | 90.67 | 72.17 | 88.33 | 81.62 |
| SigLIP2-L/16 | 384 / 1024 | 80.62 | 90.67 | 68.20 | 84.97 | 78.12 |
| CLIP-L/14 | 336 / 768 | 46.88 | 56.48 | 45.26 | 76.85 | 59.91 |
| CLIP-B/16 | 224 / 512 | 35.62 | 32.12 | 33.54 | 67.26 | 48.11 |
nDCG@5 Performance
nDCG@5 evaluates ranking quality with graded relevance based on attribute overlap:
| Model | Resolution / Emb. | AIGen-StreetLook | AIGen-Studio | RealStreetLook | RealStudioFlat | Overall |
|---|---|---|---|---|---|---|
| GR-Pro (Ours) | 336 / 1024 | 63.67 | 54.88 | 44.75 | 51.55 | 49.80 |
| GR-Lite (Ours, Open) | 336 / 1024 | 62.47 | 52.08 | 43.84 | 51.70 | 49.18 |
| Marqo-FashionSigLIP | 224 / 768 | 66.27 | 58.53 | 42.43 | 51.86 | 49.44 |
| Marqo-FashionCLIP | 224 / 512 | 63.22 | 54.93 | 41.87 | 51.68 | 48.63 |
| SigLIP2-B/16 | 384 / 768 | 57.83 | 54.97 | 39.35 | 49.12 | 46.10 |
See our paper for complete results including MRR and additional models.
Citation
If you use LookBench in your research, please cite:
@article{gao2026lookbench,
title={LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval},
author={Chao Gao and Siqiao Xue and Yimin Peng and Jiwen Fu and Tingyi Gu and Shanshan Li and Fan Zhou},
year={2026},
url={https://arxiv.org/abs/2601.14706},
journal={arXiv preprint arXiv:2601.14706},
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
The GR-Lite model weights are distributed under the DINOv3 License as they are derived from Meta's DINOv3 model.