Various scene understanding and perception evaluation metrics.
Scalable Distributed Evaluation for Computer Vision
Evaluators is a high-throughput evaluation framework designed for large-scale computer vision research. It specializes in handling video tasks by decoupling inference I/O from metric computation.
This architecture enables offline evaluation workflows: models stream predictions to efficient storage backends (Memory Map or LMDB) during inference, and metrics are computed in a decoupled stage using distributed map-reduce logic. This approach prevents CPU-bound metric calculation from throttling GPU inference.
The key features are:
- Zero-overhead inference. Writes predictions to disk using non-blocking I/O, allowing the training loop to run at full GPU utilization.
- Distributed by design. Automatically handles synchronization across multiple nodes and GPUs using torchmetrics and custom scheduling logic.
- Explicit memory schemas. Uses Pydantic-based schemas to define data formats and encodings (PNG, TIFF, Raw) up front, ensuring type safety and storage efficiency.
- Lazy loading. Supports referencing ground-truth data from disk rather than duplicating it in memory caches, enabling evaluation of terabyte-scale datasets.
- Multi-domain. Includes verified implementations for:
  - Segmentation: panoptic quality (PQ), semantic mIoU.
  - DVPS: depth-aware video panoptic quality (DVPQ).
  - Depth: Eigen et al. metrics (AbsRel, RMSE).
- CLI. A command-line interface to inspect, index, and query saved inference results.
Installation
pip install evaluators
Quick start
Python API
The core abstractions are MetricStream (for writing) and run_offline_evaluation (for computing).
Step 1: Inference (online)
import torch
from evaluators import MetricStream, MemorySchema, TensorField, DynamicTemporalWriter
from evaluators.metrics.domain.segmentation import SemanticMetric
# 1. Configure metrics and schema
# Define the source for ground truth (lazy loading)
dataset = CityscapesDataset(...)
metric = SemanticMetric(
num_classes=19,
target_source=dataset,
)
# Define the explicit memory schema
schema = MemorySchema(fields={
"sem_seg": TensorField(dtype="int64", shape=(1024, 2048)),
"sequence_id": TensorField(dtype="int64", shape=()),
"frame_index": TensorField(dtype="int64", shape=()),
})
# 2. Initialize stream and writer
# Create a writer (backend)
writer = DynamicTemporalWriter(output_dir="./inference_cache/semantic", schema=schema)
# Create a stream and bind the writer
stream = MetricStream(
metrics=[metric],
name="semantic",
schema=schema
)
stream.bind(writer)
# 3. Run inference loop
for batch in dataloader:
    # Model forward pass
    preds = model(batch["image"])
    # Push to stream (non-blocking)
    stream.update(
        batch={
            "sem_seg": preds,
            "sequence_id": batch["sequence_id"],
            "frame_index": batch["frame_index"],
        }
    )

# Finalize
writer.close()
> Note: `sequence_id` and `frame_index` must be `torch.int64`.
# 4. Finalize and compute
from evaluators import run_offline_evaluation
# Syncs workers, builds catalog, and runs metrics
results = run_offline_evaluation(
metrics=[metric],
artifact_dir="./inference_cache/semantic"
)
print(results["SemanticMetric"]["mIoU"])
Step 2: Re-evaluation (offline)
Because predictions are persisted, metrics can be re-calculated or added without re-running the model.
# Run evaluation on existing artifacts
results = run_offline_evaluation(
metrics=[new_metric],
artifact_dir="./inference_cache/semantic"
)
CLI tools
The library includes a CLI for managing the inference cache.
List stored sequences.
evaluators memory ls ./inference_cache
Inspect specific tensor shapes.
evaluators memory inspect ./inference_cache --sequence_id frankfurt_000001
Export to standard PyTorch file.
evaluators memory export ./inference_cache --sequence_id frankfurt_000001 --out my_video.pt
Supported metrics
Depth estimation
Implements standard error metrics (AbsRel, SqRel, RMSE, RMSElog) and threshold accuracies ($\delta < 1.25^n$).
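The threshold accuracies measure the fraction of pixels whose prediction/ground-truth ratio stays below $1.25^n$. A minimal NumPy sketch of these standard formulas (illustrative only; `depth_errors` is a hypothetical helper, not the library's API):

```python
import numpy as np

def depth_errors(pred, gt):
    """Standard Eigen et al. depth metrics over valid (gt > 0) pixels."""
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    diff = pred - gt
    # Symmetric ratio used by the delta thresholds
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "AbsRel": float(np.mean(np.abs(diff) / gt)),
        "SqRel": float(np.mean(diff**2 / gt)),
        "RMSE": float(np.sqrt(np.mean(diff**2))),
        "RMSElog": float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))),
        **{f"delta{n}": float(np.mean(ratio < 1.25**n)) for n in (1, 2, 3)},
    }
```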
Segmentation
- Semantic: Mean intersection over union (mIoU).
- Panoptic: Panoptic quality (PQ), segmentation quality (SQ), recognition quality (RQ). Supports "Thing" and "Stuff" splits.
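PQ factors into segmentation quality (mean IoU over matched pairs) and recognition quality (an F1-style detection score). A minimal sketch of the per-class formula, assuming matching has already been done (the conventional IoU > 0.5 criterion makes matches unique); `panoptic_quality` here is illustrative, not the library's API:

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = SQ * RQ for one class.

    matched_ious: IoU of each true-positive (prediction, ground-truth) pair.
    """
    tp = len(matched_ious)
    if tp + num_fp + num_fn == 0:
        return 0.0  # class absent everywhere; typically excluded from the mean
    sq = sum(matched_ious) / tp if tp else 0.0          # segmentation quality
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)        # recognition quality
    return sq * rq
```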
Depth-aware video panoptic segmentation (DVPS)
Implements DVPQ (Depth-aware video panoptic quality). This metric evaluates spatio-temporal consistency using sliding window tubes, gated by pixel-wise depth accuracy.
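The depth gating can be pictured as voiding panoptic predictions wherever the absolute relative depth error exceeds a threshold, before PQ is computed over each temporal window. A hypothetical sketch of that gating step (the function name, void label, and threshold are assumptions for illustration, not the library's API):

```python
import numpy as np

VOID = 255  # assumed void label for illustration

def depth_gate(panoptic_pred, depth_pred, depth_gt, lam=0.25):
    """Void panoptic pixels whose absolute relative depth error exceeds lam."""
    rel_err = np.abs(depth_pred - depth_gt) / np.maximum(depth_gt, 1e-6)
    gated = panoptic_pred.copy()
    gated[rel_err > lam] = VOID  # these pixels now count against the prediction
    return gated
```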
Architecture
The evaluation pipeline consists of three stages.
- Write. Each GPU writes predictions to locally sharded files (e.g. .memmap or .lmdb). No communication occurs.
- Schedule. A synchronization barrier is reached; the main process aggregates metadata manifests from all shards to build a Global Catalog, then partitions the workload (videos) among workers using a greedy strategy to balance total duration.
- Compute. Workers iterate through their assigned Virtual Sequences. Data is streamed from disk, processed by torchmetrics, and reduced globally.
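The greedy duration balancing in the Schedule stage can be sketched as a longest-processing-time heuristic: assign each video, longest first, to the currently least-loaded worker. A hypothetical illustration, not the library's scheduler:

```python
import heapq

def partition_videos(durations, num_workers):
    """Greedily assign videos (name -> duration) to balance worker load."""
    # Min-heap of (current load, worker id, assigned videos)
    heap = [(0.0, w, []) for w in range(num_workers)]
    heapq.heapify(heap)
    # Longest videos first: placing them early keeps the final loads even
    for vid, dur in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, w, vids = heapq.heappop(heap)  # least-loaded worker
        vids.append(vid)
        heapq.heappush(heap, (load + dur, w, vids))
    return {w: vids for _, w, vids in heap}
```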
See OFFLINE_EVALUATION.md for detailed usage and design principles.
Development
This project uses modern Python tooling for dependency management and quality assurance.
Setup
We use uv for fast dependency management.
# Install dependencies
uv sync --all-extras
Testing
Tests are managed by pytest.
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=evaluators
Linting & Formatting
We use ruff for all linting and formatting needs.
# Check code style
uv run ruff check .
# Format code
uv run ruff format .
Contributing
Contributions are welcome. Please ensure that:
- New features are covered by tests.
- Code passes all static analysis checks (ruff).
- Architecture changes are discussed in an issue first.
Acknowledgements
This work was developed at the Mobile Perception Systems (MPS) lab at Eindhoven University of Technology.
License
This project is licensed under the MIT License.