A live, holistic, and challenging benchmark for fashion image retrieval in real e-commerce settings

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

iLampard

These details have not been verified by PyPI

Project links

Documentation

Project description

LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval

LookBench is a live, holistic, and challenging benchmark for fashion image retrieval in real e-commerce settings. This repository provides the official evaluation code and model implementations.

📰 News

[2026-01] LookBench paper released on arXiv
[2026-01] GR-Lite open-source model released
[2026-01] Initial benchmark dataset released

📖 Overview

LookBench addresses the limitations of existing fashion retrieval benchmarks by providing:

🔄 Continuously Refreshing Samples: Mitigates data contamination with time-stamped, periodically updated test sets
🎯 Diverse Retrieval Tasks: Covers single-item and multi-item retrieval across real studio, AI-generated studio, real street-look, and AI-generated street-look scenarios
📊 Attribute-Supervised Evaluation: Fine-grained evaluation based on 100+ fashion attributes across categories
🏆 Challenging Benchmarks: Many strong baselines achieve below 60% Recall@1

Benchmark Subsets

Dataset	Image Source	# Retrieval Items	Difficulty	# Queries / Corpus
RealStudioFlat	Real studio flat-lay product photos	Single	Easy	1,011 / 62,226
AIGen-Studio	AI-generated lifestyle studio images	Single	Medium	192 / 59,254
RealStreetLook	Real street outfit photos	Multi	Hard	1,000 / 61,553
AIGen-StreetLook	AI-generated street outfit compositions	Multi	Hard	160 / 58,846

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/SerendipityOneInc/look-bench.git
cd look-bench

# Install dependencies
pip install -r requirements.txt

Load Dataset from Hugging Face

The LookBench dataset is hosted on Hugging Face and can be loaded directly:

from datasets import load_dataset

# Load the entire LookBench dataset
dataset = load_dataset("srpone/look-bench")

# Access different subsets
real_studio = dataset['real_studio_flat']  # Easy: single-item retrieval
aigen_studio = dataset['aigen_studio']     # Medium: AI-generated studio images
real_street = dataset['real_streetlook']   # Hard: multi-item outfit retrieval
aigen_street = dataset['aigen_streetlook'] # Hard: AI-generated street looks

# Each subset has query and gallery splits
query_data = dataset['real_studio_flat']['query']
gallery_data = dataset['real_studio_flat']['gallery']

print(f"Query samples: {len(query_data)}")
print(f"Gallery samples: {len(gallery_data)}")

Quick Evaluation

import torch
from manager import ConfigManager, ModelManager

# Load model
config_manager = ConfigManager('configs/config.yaml')
model_manager = ModelManager(config_manager)

model, _ = model_manager.load_model('clip')
transform = model_manager.get_transform('clip')

# Extract features from an image
sample = dataset['real_studio_flat']['query'][0]
image_tensor = transform(sample['image']).unsqueeze(0)

if torch.cuda.is_available():
    model = model.cuda()
    image_tensor = image_tensor.cuda()

with torch.no_grad():
    features = model(image_tensor)

print(f"Feature shape: {features.shape}")

Run Full Evaluation

# Run evaluation with default configuration
python main.py

# Run with specific model
python main.py --pipeline evaluation --model clip

# Use custom configuration
python main.py --config configs/config.yaml

Example Scripts & Notebooks

We provide both Python scripts and Google Colab notebooks for easy experimentation:

📓 Colab Notebooks (Run in Browser)

01_quickstart.ipynb - Basic usage and dataset exploration
02_model_evaluation.ipynb - Complete evaluation pipeline
03_custom_model.ipynb - Integrate custom models

🐍 Python Scripts (Run Locally)

examples/01_quickstart.py - Basic usage and dataset exploration
examples/02_model_evaluation.py - Complete model evaluation pipeline
examples/03_custom_model.py - Integrate your own custom models

# Run examples locally
python examples/01_quickstart.py
python examples/02_model_evaluation.py
python examples/03_custom_model.py

🏗️ Architecture

look-bench/
├── main.py                 # Main entry point (config-driven)
├── manager.py              # Configuration, model, and data managers
├── runner/                 # Pipeline execution framework
│   ├── base_pipeline.py   # Base pipeline class
│   ├── evaluator.py       # Core evaluation logic
│   ├── pipeline.py        # Pipeline registry
│   ├── evaluation_pipeline.py      # Standard evaluation pipeline
│   └── feature_extraction_pipeline.py  # Feature extraction pipeline
├── models/                 # Model implementations and registry
│   ├── base.py            # Base model interface
│   ├── registry.py        # Model registration system
│   ├── factory.py         # Model factory
│   ├── clip_model.py      # CLIP model
│   ├── siglip_model.py    # SigLIP model
│   └── dinov2_model.py    # DINOv2 model
├── datasets/               # Dataset loading (BEIR-style)
│   ├── base.py            # Base dataset implementation
│   └── registry.py        # Dataset registry
├── metrics/                # Evaluation metrics
│   ├── rank.py            # Recall@K
│   ├── mrr.py             # Mean Reciprocal Rank
│   ├── ndcg.py            # Normalized Discounted Cumulative Gain
│   └── map.py             # Mean Average Precision
├── configs/                # Configuration files
│   └── config.yaml        # Main configuration
└── utils/                  # Utilities and logging

🎯 Supported Models

Model	Architecture	Input Size	Embedding Dim	Framework
CLIP	Vision Transformer	224×224	512	PyTorch
SigLIP	Vision Transformer	224×224	768	PyTorch
DINOv2	Vision Transformer	224×224	768	PyTorch
GR-Lite	Vision Transformer	336×336	1024	PyTorch

⚙️ Configuration

Edit configs/config.yaml to configure models and evaluation settings:

# Pipeline configuration
pipeline:
  name: "evaluation"  # evaluation, feature_extraction
  model: "clip"
  dataset: "fashion200k"
  args: {}

# Model configuration
clip:
  enabled: true
  model_name: "openai/clip-vit-base-patch16"
  input_size: 224
  embedding_dim: 512
  device: "cuda"

# Evaluation settings
evaluation:
  metric: "recall"
  top_k: [1, 5, 10, 20]
  l2norm: true

📊 Evaluation Metrics

LookBench supports multiple evaluation metrics:

Recall@K: Top-K retrieval accuracy (K=1, 5, 10, 20)
MRR: Mean Reciprocal Rank
NDCG@K: Normalized Discounted Cumulative Gain
MAP: Mean Average Precision

Fine-Grained Evaluation

All metrics are computed with attribute-level matching:

Fine Recall@1: Requires exact category and all attributes to match
Coarse Recall@1: Only requires category to match
nDCG@K: Graded relevance based on attribute overlap

🔧 Advanced Usage

Custom Model Integration

LookBench makes it easy to integrate your own models using the registry pattern. Here's a quick example:

from models.base import BaseModel
from models.registry import register_model
import torch.nn as nn
from torchvision import models, transforms

@register_model("resnet50", metadata={
    "description": "ResNet-50 for fashion retrieval",
    "framework": "PyTorch",
    "input_size": 224,
    "embedding_dim": 2048
})
class ResNet50Model(BaseModel):
    @classmethod
    def load_model(cls, model_name: str, model_path: str = None):
        model = models.resnet50(pretrained=True)
        model = nn.Sequential(*list(model.children())[:-1])  # Remove FC layer
        
        # Wrapper to flatten output
        class Wrapper(nn.Module):
            def __init__(self, backbone):
                super().__init__()
                self.backbone = backbone
            def forward(self, x):
                return self.backbone(x).squeeze(-1).squeeze(-1)
        
        return Wrapper(model), cls()
    
    @classmethod
    def get_transform(cls, input_size: int = 224):
        return transforms.Compose([
            transforms.Resize((input_size, input_size)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                               std=[0.229, 0.224, 0.225])
        ])

Then add your model to configs/config.yaml:

resnet50:
  enabled: true
  model_name: "resnet50"
  model_path: null  # or path to your weights
  input_size: 224
  embedding_dim: 2048
  device: "cuda"

For complete examples, see examples/03_custom_model.py

Custom Pipeline

Create custom evaluation pipelines:

from runner.base_pipeline import BasePipeline
from runner.pipeline import register_pipeline

@register_pipeline("custom_pipeline")
class CustomPipeline(BasePipeline):
    def get_pipeline_name(self) -> str:
        return "custom_pipeline"
    
    def run(self, **kwargs):
        # Your custom logic here
        model_name = kwargs.get('model_name', 'clip')
        dataset_type = kwargs.get('dataset_type', 'fashion200k')
        
        # Load model and data
        model, _ = self.model_manager.load_model(model_name)
        # ... your evaluation logic
        
        return {"status": "success", "results": results}

📈 Results

Fine Recall@1 Performance

Our GR-Lite model achieves state-of-the-art performance on LookBench. Fine Recall@1 requires exact category and all attributes to match:

Model	Resolution / Emb.	AIGen-StreetLook	AIGen-Studio	RealStreetLook	RealStudioFlat	Overall
GR-Pro (Ours)	336 / 1024	63.67	54.88	44.75	51.55	49.80
GR-Lite (Ours, Open)	336 / 1024	62.47	52.08	43.84	51.70	49.18
Marqo-FashionSigLIP	224 / 768	66.27	58.53	42.43	51.86	49.44
Marqo-FashionCLIP	224 / 512	63.22	54.93	41.87	51.68	48.63
SigLIP2-B/16	384 / 768	57.83	54.97	39.35	49.12	46.10
SigLIP2-L/16	384 / 1024	51.89	48.57	35.91	44.78	41.86
PP-ShiTuV2	224 / 512	30.06	33.69	32.77	43.22	37.17
DINOv3-ViT-L	224 / 1024	20.24	27.66	26.27	39.85	31.83
DINOv2-ViT-L	224 / 1024	24.29	25.05	22.99	37.66	29.57
CLIP-L/14	336 / 768	25.28	25.95	21.09	40.35	30.08
CLIP-B/16	224 / 512	17.86	13.75	16.80	34.75	24.36

Coarse Recall@1 Performance

Coarse Recall@1 only requires category match (more lenient):

Model	Resolution / Emb.	AIGen-StreetLook	AIGen-Studio	RealStreetLook	RealStudioFlat	Overall
GR-Pro (Ours)	336 / 1024	92.50	92.75	79.82	94.16	87.93
GR-Lite (Ours, Open)	336 / 1024	88.75	90.16	76.76	92.68	85.54
Marqo-FashionSigLIP	224 / 768	90.00	93.78	73.39	88.63	82.77
Marqo-FashionCLIP	224 / 512	84.38	87.05	75.33	88.72	82.68
SigLIP2-B/16	384 / 768	86.25	90.67	72.17	88.33	81.62
SigLIP2-L/16	384 / 1024	80.62	90.67	68.20	84.97	78.12
CLIP-L/14	336 / 768	46.88	56.48	45.26	76.85	59.91
CLIP-B/16	224 / 512	35.62	32.12	33.54	67.26	48.11

nDCG@5 Performance

nDCG@5 evaluates ranking quality with graded relevance based on attribute overlap:

Model	Resolution / Emb.	AIGen-StreetLook	AIGen-Studio	RealStreetLook	RealStudioFlat	Overall
GR-Pro (Ours)	336 / 1024	63.67	54.88	44.75	51.55	49.80
GR-Lite (Ours, Open)	336 / 1024	62.47	52.08	43.84	51.70	49.18
Marqo-FashionSigLIP	224 / 768	66.27	58.53	42.43	51.86	49.44
Marqo-FashionCLIP	224 / 512	63.22	54.93	41.87	51.68	48.63
SigLIP2-B/16	384 / 768	57.83	54.97	39.35	49.12	46.10

See our paper for complete results including MRR and additional models.

📄 Citation

If you use LookBench in your research, please cite:

@article{gao2026lookbench,
  title={LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval}, 
  author={Chao Gao and Siqiao Xue and Yimin Peng and Jiwen Fu and Tingyi Gu and Shanshan Li and Fan Zhou},
  year={2026},
  url={https://arxiv.org/abs/2601.14706}, 
  journal={arXiv preprint arXiv:2601.14706},
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

The GR-Lite model weights are distributed under the DINOv3 License as they are derived from Meta's DINOv3 model.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

iLampard

These details have not been verified by PyPI

Project links

Documentation

Release history Release notifications | RSS feed

0.3.0

Jan 25, 2026

This version

0.2.0

Jan 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

look_bench-0.2.0.tar.gz (39.8 kB view details)

Uploaded Jan 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

look_bench-0.2.0-py3-none-any.whl (48.8 kB view details)

Uploaded Jan 24, 2026 Python 3

File details

Details for the file look_bench-0.2.0.tar.gz.

File metadata

Download URL: look_bench-0.2.0.tar.gz
Upload date: Jan 24, 2026
Size: 39.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for look_bench-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`748716e76b5574c5bc9cd44dfe3683dce8708ffbdb43314d94eaea27dedb0528`
MD5	`2e7394858dfd88db0809cc8f52dc290c`
BLAKE2b-256	`18892513f7397aa1168468aaf238f10b2babd61ac9d13cc87fee40888f2bb1fd`

See more details on using hashes here.

Provenance

The following attestation bundles were made for look_bench-0.2.0.tar.gz:

Publisher: python-publish.yml on SerendipityOneInc/look-bench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: look_bench-0.2.0.tar.gz
- Subject digest: 748716e76b5574c5bc9cd44dfe3683dce8708ffbdb43314d94eaea27dedb0528
- Sigstore transparency entry: 849977093
- Sigstore integration time: Jan 24, 2026
Source repository:
- Permalink: SerendipityOneInc/look-bench@bb2af89468c33acdde1338fc4b0ef2473c768ac9
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/SerendipityOneInc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@bb2af89468c33acdde1338fc4b0ef2473c768ac9
- Trigger Event: release

File details

Details for the file look_bench-0.2.0-py3-none-any.whl.

File metadata

Download URL: look_bench-0.2.0-py3-none-any.whl
Upload date: Jan 24, 2026
Size: 48.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for look_bench-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a509b3f7a1efb447d962c70bc7560c99a8c15a37a7c0352ce5384fefa9b8021a`
MD5	`ed067f9d864170a915513ecc01c56d9a`
BLAKE2b-256	`f996282dd8ffc73d07eec9b3ff77ee2a348bb6813b6888b0af8e246e13a043bd`

See more details on using hashes here.

Provenance

The following attestation bundles were made for look_bench-0.2.0-py3-none-any.whl:

Publisher: python-publish.yml on SerendipityOneInc/look-bench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: look_bench-0.2.0-py3-none-any.whl
- Subject digest: a509b3f7a1efb447d962c70bc7560c99a8c15a37a7c0352ce5384fefa9b8021a
- Sigstore transparency entry: 849977094
- Sigstore integration time: Jan 24, 2026
Source repository:
- Permalink: SerendipityOneInc/look-bench@bb2af89468c33acdde1338fc4b0ef2473c768ac9
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/SerendipityOneInc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@bb2af89468c33acdde1338fc4b0ef2473c768ac9
- Trigger Event: release

look-bench 0.2.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

LookBench: A Live and Holistic Open Benchmark for Fashion Image Retrieval

📰 News

📖 Overview

Benchmark Subsets

🚀 Quick Start

Installation

Load Dataset from Hugging Face

Quick Evaluation

Run Full Evaluation

Example Scripts & Notebooks

📓 Colab Notebooks (Run in Browser)

🐍 Python Scripts (Run Locally)

🏗️ Architecture

🎯 Supported Models

⚙️ Configuration

📊 Evaluation Metrics

Fine-Grained Evaluation

🔧 Advanced Usage

Custom Model Integration

Custom Pipeline

📈 Results

Fine Recall@1 Performance

Coarse Recall@1 Performance

nDCG@5 Performance

📄 Citation

📄 License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance