Evaluation library for span-level entity extraction

spaneval

spaneval is a pure evaluation library for span-level entity extraction. It takes ground-truth and predicted (entity_type, start, end) tuples and returns precision, recall, and F1. It works with any pipeline that produces character-span predictions: LLM extractors, fine-tuned models, rule-based systems. It provides configurable overlap strategies, per-type precision/recall targets, and a scalar optimization score for automated prompt engineering.

spaneval is inspired by nervaluate and generalizes it.

Installation

pip install spaneval

Relationship to seqeval and nervaluate

seqeval works on IOB/BIO token sequences. LLMs output character spans — converting between the two is lossy and inconvenient.
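As an illustration of why the conversion is lossy, here is a minimal sketch in plain Python. It is not part of spaneval or seqeval: the helper names are hypothetical, and it assumes whitespace tokenization and end-exclusive character offsets. A predicted span that ends mid-token snaps to the token boundary once it is written as BIO tags, so the boundary error can no longer be measured.

```python
import re

def tokenize(text):
    """Whitespace tokens as (start, end) character offsets, end-exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def spans_to_bio(text, spans):
    """Tag every token that overlaps a span; partial overlaps snap to the token."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        inside = False
        for i, (t_start, t_end) in enumerate(tokens):
            if t_start < end and start < t_end:  # token overlaps the span
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags

def bio_to_spans(text, tags):
    """Rebuild character spans from BIO tags using token offsets."""
    spans, current = [], None
    for (t_start, t_end), tag in zip(tokenize(text), tags):
        if tag.startswith("I-") and current is not None:
            current[2] = t_end               # extend the open span
            continue
        if current is not None:
            spans.append(tuple(current))     # close the open span
            current = None
        if tag != "O":
            current = [tag[2:], t_start, t_end]
    if current is not None:
        spans.append(tuple(current))
    return spans

text = "Ada Lovelace wrote programs"
true_span = ("PERSON", 0, 12)   # "Ada Lovelace"
pred_span = ("PERSON", 0, 11)   # "Ada Lovelac", one character short

true_tags = spans_to_bio(text, [true_span])
pred_tags = spans_to_bio(text, [pred_span])
round_trip = bio_to_spans(text, pred_tags)
```

Here `true_tags` and `pred_tags` are identical, and `round_trip` returns the full token span rather than the span that was predicted, so a token-level metric would score the prediction as exact.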

nervaluate is the closest prior work and covers the SemEval 2013 strategies well. This library extends the same foundation with configurable overlap thresholds, per-type strategy assignment, and score() — a scalar optimization target for automated prompt engineering and hyperparameter search.

Quickstart

from spaneval import evaluate, to_entities

true = to_entities([
    {"entity_type": "PERSON", "start":  0, "end": 10},
    {"entity_type": "ORG",    "start": 34, "end": 44},
])
pred = to_entities([
    {"entity_type": "PERSON", "start":  0, "end":  9},  # slightly off boundary
    {"entity_type": "ORG",    "start": 34, "end": 44},  # exact
])

results = evaluate(true, pred)
results.report()

Called with no arguments, report() shows a ± range across two strategies (Strict and AnyOverlap), which gives a quick picture of how much boundary accuracy affects your scores.
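To see where such a range comes from, the Quickstart data can be scored by hand. The sketch below is plain Python and does not use spaneval; the functions `prf`, `strict`, and `any_overlap` are hypothetical simplifications, spans are assumed end-exclusive, and matching is a simple "any counterpart matches" rule, which may differ from how the library aligns entities.

```python
def prf(true, pred, match):
    """Precision, recall and F1, where match(t, p) says whether p is correct for t."""
    correct = sum(any(match(t, p) for t in true) for p in pred)
    found = sum(any(match(t, p) for p in pred) for t in true)
    precision = correct / len(pred) if pred else 0.0
    recall = found / len(true) if true else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def strict(t, p):
    # Same type and exactly the same boundaries.
    return t == p

def any_overlap(t, p):
    # At least one shared character; the entity type is ignored.
    return t[1] < p[2] and p[1] < t[2]

true = [("PERSON", 0, 10), ("ORG", 34, 44)]
pred = [("PERSON", 0, 9), ("ORG", 34, 44)]

strict_scores = prf(true, pred, strict)        # PERSON fails on its boundary
lenient_scores = prf(true, pred, any_overlap)  # both predictions count
```

Under these assumptions the strict F1 is 0.5 and the any-overlap F1 is 1.0, so the gap between the two is entirely due to the one-character boundary difference on PERSON.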

For a guided walkthrough, see the examples:

  • examples/quickstart.py — five steps from zero-config to per-type strategy assignment
  • examples/goals.py — two steps from goal definition to automated prompt optimization

Strategies

Strategies fall into two branches, both covered in full in docs/:

Entity-count strategies score each entity as correct, incorrect, missed, or spurious:

Strategy              Span match                                      Type match
Strict                Exact boundaries                                Required
Exact                 Exact boundaries                                Ignored
EntType               Any overlap                                     Required
Partial               Any overlap; 0.5 credit if boundaries differ    Ignored
ProportionalCoverage  Fraction of true-entity characters covered      Ignored
AnyOverlap            Any overlap = full credit                       Ignored
Contains              Prediction must fully contain the true span     Ignored
MinimumOverlap        Configurable threshold and overlap metric       Configurable
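The span rules in the table can be made concrete for a single true/predicted pair. The following is a sketch in plain Python, not spaneval's implementation: all function names are hypothetical, spans are assumed end-exclusive, and intersection-over-union is used only as one plausible overlap metric for a threshold-based rule.

```python
def overlap_chars(t, p):
    """Number of characters shared by two (type, start, end) spans."""
    return max(0, min(t[2], p[2]) - max(t[1], p[1]))

def partial_credit(t, p):
    """1.0 for identical boundaries, 0.5 for any other overlap, else 0.0."""
    if overlap_chars(t, p) == 0:
        return 0.0
    return 1.0 if (t[1], t[2]) == (p[1], p[2]) else 0.5

def proportional_coverage(t, p):
    """Fraction of the true span's characters covered by the prediction."""
    length = t[2] - t[1]
    return overlap_chars(t, p) / length if length else 0.0

def contains(t, p):
    """True when the prediction fully contains the true span."""
    return p[1] <= t[1] and t[2] <= p[2]

def iou(t, p):
    """Intersection over union, one possible metric for a minimum-overlap rule."""
    inter = overlap_chars(t, p)
    union = (t[2] - t[1]) + (p[2] - p[1]) - inter
    return inter / union if union else 0.0

t = ("PERSON", 0, 10)
p = ("PERSON", 0, 9)    # one character short

credits = {
    "partial": partial_credit(t, p),
    "proportional": proportional_coverage(t, p),
    "contains": contains(t, p),
    "iou": iou(t, p),
}
```

For this pair the partial rule gives 0.5, proportional coverage gives 0.9, containment fails because the prediction is shorter than the true span, and the IoU of 0.9 would pass a threshold of, say, 0.8.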

TextCoverage counts characters rather than entities — useful when the goal is text redaction rather than entity classification.
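Character-level counting can be sketched as set arithmetic over character positions. This is plain Python under stated assumptions (end-exclusive spans, entity types ignored, hypothetical helper names), not the library's TextCoverage code.

```python
def covered(spans):
    """Set of character positions covered by a list of (type, start, end) spans."""
    chars = set()
    for _label, start, end in spans:
        chars.update(range(start, end))
    return chars

def char_precision_recall(true, pred):
    """Precision and recall computed over characters rather than entities."""
    t, p = covered(true), covered(pred)
    hit = len(t & p)
    precision = hit / len(p) if p else 0.0
    recall = hit / len(t) if t else 0.0
    return precision, recall

true = [("PERSON", 0, 10), ("ORG", 34, 44)]
pred = [("PERSON", 0, 9), ("ORG", 34, 44)]

precision, recall = char_precision_recall(true, pred)
```

With the Quickstart spans, every predicted character is sensitive text (precision 1.0) and 19 of the 20 true characters are covered (recall 0.95), which is the view that matters for redaction: one character would leak.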

Per-type strategies and goals

Different entity types can be evaluated under different strategies and held to different targets:

from spaneval import Goal
from spaneval.strategies import Strict, ProportionalCoverage

# `results` is the object returned by evaluate() in the Quickstart.

# Different strictness per type
results.report(strategy={"PERSON": Strict(), "DATE": ProportionalCoverage()})

# Precision/recall targets per type
goals = {
    "PERSON": Goal(strategy=Strict(),               recall=0.90, precision=0.80),
    "DATE":   Goal(strategy=ProportionalCoverage(), recall=0.80, precision=0.70),
}
results.report_goals(goals)
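Conceptually, holding each type to a target means comparing its measured precision and recall against the thresholds. The sketch below shows that comparison in plain Python. `Target` and `unmet` are hypothetical stand-ins, not spaneval's Goal or report_goals(), and the measured numbers are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Target:
    """Hypothetical stand-in for a per-type goal (not spaneval.Goal)."""
    recall: float
    precision: float

def unmet(measured, targets):
    """Map each entity type to the metrics that fall below its target."""
    misses = {}
    for entity_type, target in targets.items():
        precision, recall = measured[entity_type]
        failed = []
        if recall < target.recall:
            failed.append("recall")
        if precision < target.precision:
            failed.append("precision")
        if failed:
            misses[entity_type] = failed
    return misses

targets = {
    "PERSON": Target(recall=0.90, precision=0.80),
    "DATE": Target(recall=0.80, precision=0.70),
}

# Illustrative (precision, recall) pairs, not real results.
measured = {"PERSON": (0.85, 0.88), "DATE": (0.75, 0.82)}

misses = unmet(measured, targets)
```

In this made-up case only PERSON misses its goal, and only on recall (0.88 against a target of 0.90), which points the next iteration at missed person mentions rather than at spurious ones.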
