latch-eval-tools

Shared eval tools for single-cell bench, spatial bench, and future biology benchmarks.

Installation

pip install latch-eval-tools

Components

Types

from latch_eval_tools import Eval, EvalResult

eval_case = Eval(
    id="test_001",
    task="Count cells in the dataset",
    data_node="latch:///data/sample.h5ad",
    # grader config elided here; see "Eval JSON Schema" below for its fields
    grader={"type": "numeric_tolerance", "config": {...}}
)

Graders

Available graders: numeric_tolerance, label_set_jaccard, distribution_comparison, marker_gene_precision_recall, marker_gene_separation, spatial_adjacency, multiple_choice
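Two of these grader names, label_set_jaccard and marker_gene_precision_recall, suggest standard set-overlap metrics. The sketch below shows those metrics on plain Python sets. The helpers `jaccard` and `precision_recall` are hypothetical; how the library actually normalizes labels, thresholds scores, or decides pass/fail is an assumption not covered here.

```python
# Hypothetical helpers illustrating the set metrics the grader names suggest.
# These are NOT the latch_eval_tools implementations.


def jaccard(predicted: set[str], expected: set[str]) -> float:
    """len(A & B) / len(A | B); defined as 1.0 when both sets are empty."""
    union = predicted | expected
    if not union:
        return 1.0
    return len(predicted & expected) / len(union)


def precision_recall(predicted: set[str], expected: set[str]) -> tuple[float, float]:
    """Precision = hits / len(predicted), recall = hits / len(expected).

    Either value is 0.0 when its denominator set is empty.
    """
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    labels_pred = {"T cell", "B cell", "NK cell"}
    labels_true = {"T cell", "B cell", "Monocyte"}
    print(jaccard(labels_pred, labels_true))  # 2 shared / 4 total = 0.5

    markers_pred = {"CD3E", "CD8A", "MS4A1", "GNLY"}
    markers_true = {"CD3E", "MS4A1", "LYZ"}
    print(precision_recall(markers_pred, markers_true))  # precision 0.5, recall 2/3
```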

from latch_eval_tools.graders import get_grader, NumericToleranceGrader

grader = get_grader("numeric_tolerance")
result = grader.evaluate(
    agent_answer={"n_cells": 1523},
    config={
        "ground_truth": {"n_cells": 1500},
        "tolerances": {"n_cells": {"type": "relative", "value": 0.05}}
    }
)
print(result.passed)
print(result.reasoning)
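The tolerance config above pairs each ground-truth field with a `type` and a `value`. A minimal sketch of that per-field check, assuming "absolute" means the error is at most `value` and "relative" means it is at most `value * |ground truth|`; `within_tolerance` and `check_all` are hypothetical names, not the library's code.

```python
# Hypothetical per-field tolerance check. Assumed semantics:
#   absolute: |answer - truth| <= value
#   relative: |answer - truth| <= value * |truth|


def within_tolerance(answer: float, truth: float, tolerance: dict) -> bool:
    diff = abs(answer - truth)
    kind, value = tolerance["type"], tolerance["value"]
    if kind == "absolute":
        return diff <= value
    if kind == "relative":
        return diff <= value * abs(truth)
    raise ValueError(f"unknown tolerance type: {kind!r}")


def check_all(agent_answer: dict, ground_truth: dict, tolerances: dict) -> dict:
    """Return a per-field pass/fail map; a field missing from the answer fails."""
    results = {}
    for field, truth in ground_truth.items():
        if field not in agent_answer:
            results[field] = False
        else:
            results[field] = within_tolerance(agent_answer[field], truth, tolerances[field])
    return results


if __name__ == "__main__":
    # Same numbers as the grader example: 1523 is about 1.5% away from 1500.
    print(check_all(
        {"n_cells": 1523},
        {"n_cells": 1500},
        {"n_cells": {"type": "relative", "value": 0.05}},
    ))  # {'n_cells': True}
```

Under that reading the example above would pass, since 5% of 1500 allows a difference of up to 75. Whether the real grader treats boundary values or missing fields the same way is not stated here.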

Harness

Run evaluations with different agents:

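import json
from pathlib import Path

from latch_eval_tools.harness import EvalRunner, run_minisweagent_task

runner = EvalRunner("evals/count_cells.json", cache_name=".scbench")

# Use a built-in agent, wrapped so it receives (task, work_dir)
result = runner.run(
    agent_function=lambda task, work_dir: run_minisweagent_task(
        task, work_dir, model_name="anthropic/claude-sonnet-4"
    )
)

# Or a custom agent: takes (task_prompt, work_dir) and returns {"answer": ...}
def my_agent(task_prompt: str, work_dir: Path) -> dict:
    # Return the answer that was written to eval_answer.json in work_dir
    return {"answer": json.loads((work_dir / "eval_answer.json").read_text())}

runner.run(agent_function=my_agent)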

Built-in agents: run_minisweagent_task, run_claudecode_task, run_plotsagent_task
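From the examples above, `runner.run` appears to call the agent with a task prompt and a working directory, and to expect a dict with an `"answer"` key, as `my_agent` shows. A toy agent following that contract might look like this; `toy_agent` and its fixed answer are hypothetical placeholders for real analysis of the task's data.

```python
import json
import tempfile
from pathlib import Path


def toy_agent(task_prompt: str, work_dir: Path) -> dict:
    # Hypothetical agent: a real one would analyse the data named in the task.
    # Here it writes a fixed answer to eval_answer.json, then returns it under
    # the "answer" key, mirroring my_agent above.
    answer = {"n_cells": 1500}
    answer_path = work_dir / "eval_answer.json"
    answer_path.write_text(json.dumps(answer))
    return {"answer": json.loads(answer_path.read_text())}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(toy_agent("Count cells in the dataset", Path(tmp)))
        # {'answer': {'n_cells': 1500}}
```

It could then be passed as `runner.run(agent_function=toy_agent)`.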

Linter

Validate eval JSON files:

eval-lint evals/my_dataset/
eval-lint evals/ --format json

from latch_eval_tools.linter import lint_eval, lint_directory

result = lint_eval("evals/test.json")
print(result.passed, result.issues)

Eval JSON Schema

{
  "id": "unique_test_id",
  "task": "Task description for the agent",
  "data_node": "latch:///path/to/data.h5ad",
  "grader": {
    "type": "numeric_tolerance",
    "config": {
      "ground_truth": {"field": 42},
      "tolerances": {"field": {"type": "absolute", "value": 1}}
    }
  }
}
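A quick pre-check that an eval dict has the keys shown in the schema can be sketched as follows. `schema_problems` is a hypothetical helper that only looks for those keys; it is not a reimplementation of `eval-lint`, whose actual rules are not listed here.

```python
import json

# Hypothetical minimal key check against the schema above.
REQUIRED_TOP_LEVEL = ("id", "task", "data_node", "grader")


def schema_problems(eval_case: dict) -> list[str]:
    """Return human-readable problems; an empty list means the keys look right."""
    problems = [f"missing '{key}'" for key in REQUIRED_TOP_LEVEL if key not in eval_case]
    grader = eval_case.get("grader")
    if isinstance(grader, dict):
        for key in ("type", "config"):
            if key not in grader:
                problems.append(f"grader missing '{key}'")
    elif "grader" in eval_case:
        problems.append("'grader' must be an object")
    return problems


if __name__ == "__main__":
    example = json.loads("""
    {
      "id": "unique_test_id",
      "task": "Task description for the agent",
      "data_node": "latch:///path/to/data.h5ad",
      "grader": {
        "type": "numeric_tolerance",
        "config": {
          "ground_truth": {"field": 42},
          "tolerances": {"field": {"type": "absolute", "value": 1}}
        }
      }
    }
    """)
    print(schema_problems(example))  # []
    print(schema_problems({"id": "x", "grader": {"type": "numeric_tolerance"}}))
    # ["missing 'task'", "missing 'data_node'", "grader missing 'config'"]
```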

Project details

Release history

0.3.4 (Apr 12, 2026)
0.3.4a1, pre-release, yanked (Apr 12, 2026)
0.3.3 (Apr 10, 2026)
0.3.2 (Apr 9, 2026)
0.3.1, yanked (Apr 9, 2026)
0.3.0a2, pre-release (Apr 6, 2026)
0.3.0a1, pre-release (Apr 6, 2026)
0.2.0 (Mar 10, 2026)
0.1.22 (Feb 18, 2026)
0.1.21 (Feb 18, 2026)
0.1.20 (Feb 18, 2026)
0.1.19 (Feb 17, 2026)
0.1.18 (Feb 12, 2026)
0.1.17 (Feb 10, 2026)
0.1.16 (Feb 5, 2026)
0.1.16.dev1, pre-release (Feb 10, 2026)
0.1.15 (Feb 5, 2026)
0.1.14 (Feb 5, 2026)
0.1.13 (Feb 5, 2026)
0.1.12 (Feb 4, 2026)
0.1.11 (Feb 4, 2026)
0.1.11.dev1, pre-release (Feb 4, 2026)
0.1.11.dev0, pre-release, this version (Feb 4, 2026)
0.1.10 (Feb 4, 2026)
0.1.9 (Feb 4, 2026)
0.1.8 (Feb 4, 2026)
0.1.6 (Feb 4, 2026)
0.1.5 (Feb 4, 2026)
0.1.4 (Feb 4, 2026)
0.1.3 (Feb 4, 2026)
0.1.1 (Feb 4, 2026)
0.1.0 (Feb 3, 2026)

Download files

Source Distribution

latch_eval_tools-0.1.11.dev0.tar.gz (42.2 kB view details)

Uploaded Feb 4, 2026 Source

Built Distribution

latch_eval_tools-0.1.11.dev0-py3-none-any.whl (57.5 kB view details)

Uploaded Feb 4, 2026 Python 3

File details

Details for the file latch_eval_tools-0.1.11.dev0.tar.gz.

File metadata

Download URL: latch_eval_tools-0.1.11.dev0.tar.gz
Upload date: Feb 4, 2026
Size: 42.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.9

File hashes

Hashes for latch_eval_tools-0.1.11.dev0.tar.gz
Algorithm	Hash digest
SHA256	`2e3ada75872e42131e3d52db88ee15acccdcb373773cab31ba1570007953bd30`
MD5	`d1fd4866eb31d2bc0779e8e81d67c8f9`
BLAKE2b-256	`420f66b000a5d174c949ab130ac2da0cd1a6efc23781e33069866fcd30955aab`

File details

Details for the file latch_eval_tools-0.1.11.dev0-py3-none-any.whl.

File metadata

Download URL: latch_eval_tools-0.1.11.dev0-py3-none-any.whl
Upload date: Feb 4, 2026
Size: 57.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.9

File hashes

Hashes for latch_eval_tools-0.1.11.dev0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f011732902ad24a17b8a5c00b3a49a06af892c481b1e76357cba24d860a4e699`
MD5	`02b40c1466ac6ed3c727b17865bc2309`
BLAKE2b-256	`a09f533b923644bb93678ecdd8c054d1ab7144a7b2ee9a65fff9d82636c09675`
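To check a downloaded file against the SHA256 digests above, hash it locally and compare. `sha256_of` and the assumed wheel location in the current directory are illustrative only; pip's hash-checking mode (a `--hash=sha256:...` option on a requirements-file line) performs the same comparison at install time.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for the 0.1.11.dev0 wheel.
EXPECTED_WHEEL_SHA256 = "f011732902ad24a17b8a5c00b3a49a06af892c481b1e76357cba24d860a4e699"

if __name__ == "__main__":
    # Assumed location of the downloaded wheel; adjust as needed.
    wheel = Path("latch_eval_tools-0.1.11.dev0-py3-none-any.whl")
    if wheel.exists():
        print(sha256_of(wheel) == EXPECTED_WHEEL_SHA256)
```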
