agent-convergence-scorer
Measure how similar N agent outputs are. Score exact-match rate, Jaccard token overlap, divergence point, and a composite 0–1 convergence score over any list of agent runs.
If you run the same prompt through N agents and want a number for "are they producing N distinct outputs or have they collapsed to one idea?" — this is that number.
Pain
- You just ran a fan-out of N agents and eyeballing whether they converged is slow and subjective.
- Your eval harness reports accuracy but not reproducibility; same prompt, two runs, two answers, no metric.
- A multi-agent hackathon or swarm setup where half the agents picked the same target. You want evidence, not vibes.
- LLM temperature study where "temp=0.3 vs temp=0.7" needs a downstream consistency number.
- You caught agents rephrasing each other but there is no column in your CSV for it.
Install
pip install agent-convergence-scorer
Python 3.9+. Zero runtime dependencies (stdlib only).
Quick start
```shell
echo '{"runs": ["The capital is Paris.", "The capital is Paris.", "The capital is Lyon."]}' \
  | agent-convergence-scorer -
```
Output:
```json
{
  "num_runs": 3,
  "exact_match_rate": 0.667,
  "token_metrics": {
    "avg_overlap": 0.733,
    "jaccard": 1.0
  },
  "convergence_score": 0.703,
  "divergence_point": {
    "diverges_at_token": "paris.",
    "token_position": 3,
    "num_tokens_to_divergence": 3
  }
}
```
Interpret:
- `convergence_score = 0.703`: high but not perfect consistency.
- `exact_match_rate = 0.667`: 2 of 3 runs identical to run 0.
- Divergence at token position 3: the runs agreed on the prefix "The capital is", then split.
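The `avg_overlap` figure above can be re-derived with the stdlib alone. A minimal sketch, assuming the documented tokenization (lowercase plus whitespace split); `tokenize` and `jaccard` here are illustrative helpers, not the library's internals:

```python
# Re-deriving avg_overlap for the quick-start runs: mean token-set Jaccard
# over all C(N, 2) pairs, with lowercase whitespace tokenization.
from itertools import combinations

def tokenize(text):
    return text.lower().split()

def jaccard(a, b):
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

runs = ["The capital is Paris.", "The capital is Paris.", "The capital is Lyon."]
pairs = list(combinations(runs, 2))
avg_overlap = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
print(round(avg_overlap, 3))  # 0.733
```

The identical pair scores 1.0; each Paris-vs-Lyon pair shares 3 of 5 distinct tokens (0.6), so the mean is 0.733.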
Library usage
```python
from agent_convergence_scorer import score_runs

runs = [
    "The answer is A",
    "The answer is B",
    "The answer is C",
]
print(score_runs(runs))
# {'num_runs': 3, 'exact_match_rate': 0.333,
#  'token_metrics': {'avg_overlap': 0.6, 'jaccard': 0.6},
#  'convergence_score': 0.497,
#  'divergence_point': {'diverges_at_token': 'a', 'token_position': 3, 'num_tokens_to_divergence': 3}}
```
Individual metrics are importable too: `exact_match_rate`, `token_overlap`, `divergence_point`, `convergence_score`, `tokenize`.
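For intuition, the divergence point can be sketched in a few lines of stdlib Python. This is a re-derivation of the documented definition (first token position where any run disagrees with the rest, over lowercase whitespace tokens), not the library's code:

```python
# First token position at which the runs stop agreeing.
def divergence_point(runs):
    token_lists = [r.lower().split() for r in runs]
    min_len = min(len(tokens) for tokens in token_lists)
    for i in range(min_len):
        column = {tokens[i] for tokens in token_lists}
        if len(column) > 1:
            return i  # runs share tokens [0, i), then split at position i
    return min_len  # identical up to the shortest run

runs = ["The answer is A", "The answer is B", "The answer is C"]
print(divergence_point(runs))  # 3
```

Position 3 matches `num_tokens_to_divergence` in the library output above: the runs agree on "the answer is" and split on the fourth token.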
Metrics — what they mean
| Metric | Range | What it measures |
|---|---|---|
| `exact_match_rate` | [0, 1] | Fraction of runs byte-identical to `runs[0]`. Crude reproducibility floor. |
| `token_metrics.jaccard` | [0, 1] | Token-set Jaccard of the first two runs (quick eyeball). |
| `token_metrics.avg_overlap` | [0, 1] | Mean Jaccard over all C(N, 2) pairs. Robust to N. |
| `divergence_point.num_tokens_to_divergence` | [0, min_len] | First token position where runs disagree. Late divergence = strong shared prefix. |
| `convergence_score` | [0, 1] | Composite: `0.5 * exact_match + 0.3 * avg_overlap + 0.2 * div_distance_norm`. |
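Plugging the quick-start components into that composite formula reproduces the reported score (the component values are read off the quick-start output; `div_distance_norm` is taken as divergence position over the shortest run's token count, 3 of 4):

```python
# Recomputing convergence_score for the quick-start example with the
# 0.5 / 0.3 / 0.2 weights from the table above.
exact_match = 2 / 3                  # 2 of 3 runs identical to runs[0]
avg_overlap = (1.0 + 0.6 + 0.6) / 3  # mean Jaccard over the C(3, 2) pairs
div_norm = 3 / 4                     # diverged at token 3 of a 4-token run
score = 0.5 * exact_match + 0.3 * avg_overlap + 0.2 * div_norm
print(round(score, 3))  # 0.703
```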
When to use it
- Quick single-number consistency check for multi-agent fan-outs.
- CI gate: fail if N reruns of a prompt drop below a convergence threshold.
- Measuring the effect of a temperature, prompt, or framing change on output stability.
- Quantifying ideation collapse in multi-agent hackathons (N agents → how many distinct ideas?).
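A hedged sketch of the CI-gate idea, using a stdlib re-implementation of `exact_match_rate` so the snippet runs standalone; the 0.9 threshold and the `reruns` data are arbitrary examples, and a real gate would use the full `convergence_score` from the library:

```python
# Fail a CI job when N reruns of the same prompt are not reproducible enough.
# exact_match_rate is re-derived here per its documented definition.
def exact_match_rate(runs):
    return sum(r == runs[0] for r in runs) / len(runs)

reruns = ["answer A", "answer A", "answer A", "answer B"]  # e.g. 4 CI reruns
rate = exact_match_rate(reruns)
THRESHOLD = 0.9  # arbitrary example threshold
if rate < THRESHOLD:
    print(f"FAIL: exact_match_rate {rate:.2f} < {THRESHOLD}")  # gate trips
else:
    print("PASS")
```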
When not to use it
- Semantic similarity. Tokenization is whitespace-only; "Paris, France" and "paris, france," are different token sets. If you need meaning-level comparison, pair these metrics with a sentence-embedding similarity (or a reranker) externally.
- Subword tokenization studies. This is not a BPE/WordPiece tokenizer.
- Multilingual corpora where whitespace isn't the word boundary (Chinese, Japanese, Thai, etc.) — tokenize upstream, pass the tokenized-then-joined form.
- Ranking quality (nDCG, MRR, etc.): use `ir-measures` or `ranx` instead.
- Concurrency-safe incremental scoring over streams: this is a batch tool.
The composite weights (50/30/20) are heuristic; override by calling the individual functions and combining yourself.
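Re-weighting the composite might look like this; the 0.2/0.6/0.2 split is a hypothetical example that favors token overlap, and the component values are taken from the quick-start run rather than recomputed by the library:

```python
# Hypothetical custom weighting of the three components, per the note above
# that the default 50/30/20 split is heuristic.
def custom_score(exact, overlap, div_norm, weights=(0.2, 0.6, 0.2)):
    w_exact, w_overlap, w_div = weights
    return w_exact * exact + w_overlap * overlap + w_div * div_norm

# Component values from the quick-start example:
print(round(custom_score(0.667, 0.733, 0.75), 3))  # 0.723
```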
Example: measuring a hackathon collapse
```python
from agent_convergence_scorer import score_runs

# 4 agents, same prompt, different (or identical) outputs
runs = [agent.run(prompt) for agent in agents]

result = score_runs(runs)
if result["convergence_score"] > 0.8:
    print(f"⚠️ collapse: {result['convergence_score']:.2f} - agents are rephrasing each other")
else:
    print(f"✓ diverse: {result['convergence_score']:.2f}")
```
Origin
Built during the Hermes Labs Cascade Hackathon on 2026-04-22, as part of a controlled experiment measuring whether prompt framing affects ideation diversity across N concurrent agents. In the prior-day baseline, 12 agents sharing context collapsed to 2 dominant idea clusters; in the cascade experiment, agents under distinct-persona or distinct-constraint framing produced 4 distinct clusters per arm of 4. This scorer is the mechanism by which the collapse was measured.
Security and supply chain
- Tamper evidence: the repository carries a staged hermes-seal v1 manifest at `.hermes-seal.yaml`. The signature is granted out-of-band with a root-owned key and verified with `hermes-seal verify <path-to-repo>`.
- SBOM: `sbom.cdx.json` (CycloneDX 1.5) at the repo root.
- Security policy: see SECURITY.md.
Contributing
See CONTRIBUTING.md. Issues and PRs welcome. For agent-driven contributors, see AGENTS.md.
License
MIT — see LICENSE.
About
Built by Hermes Labs. Sealed with hermes-seal v1.
Related work: lintlang (static linter for AI agent code), cogito-ergo (agent memory with integer-pointer fidelity), claude-router (scaffold router).
If this saved you the five minutes of eyeballing a fan-out's outputs, ⭐ the repo — it helps others find it.