
Sherpa reconciliation processor


pyprocessors_reconciliation


Reconciles annotations coming from different annotators.

Installation

pip install pyprocessors-reconciliation

Overview

ReconciliationProcessor is a pymultirole processor plugin that reconciles overlapping annotations produced by multiple annotators (NER models, knowledge-base linkers, white/kill lists) into a single coherent set.

The processor is registered under the pyprocessors.plugins entry point as reconciliation.

Parameters

Parameter             Type                Default  Description
type                  ReconciliationType  linker   Reconciliation strategy (currently only linker)
kill_label            str | None          None     Label whose annotations suppress matching model annotations
white_label           str | None          None     Label treated as authoritative (terms stripped so it acts like a model annotation)
whitelisted_lexicons  list[str] | None    None     Lexicons whose annotations are duplicated as term-free model candidates
person_label          str | None          None     Label used to identify person annotations for last-name resolution
remove_suspicious     bool                True     Drop model annotations that contain no capitalised word (numbers, percentages, etc.)
resolve_lastnames     bool                False    Resolve isolated last names / first names using full names seen earlier in the document
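To see the defaults side by side, here is a small stand-in container that mirrors the table. The class name ReconciliationParams and the label values in the usage lines are hypothetical; this is not the plugin's own parameter class, whose exact name and base class are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReconciliationParams:
    """Hypothetical mirror of the parameter table (names and defaults only)."""
    type: str = "linker"  # the only strategy listed in the table
    kill_label: Optional[str] = None
    white_label: Optional[str] = None
    whitelisted_lexicons: Optional[list[str]] = None
    person_label: Optional[str] = None
    remove_suspicious: bool = True
    resolve_lastnames: bool = False


# Example values are assumptions chosen for illustration.
params = ReconciliationParams(
    kill_label="killed",
    person_label="person",
    resolve_lastnames=True,
)
print(params)
```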

How it works

  1. Sentence filtering: sentence-labelled annotations are removed before processing.
  2. Whitelist marking (mark_whitelisted) — annotations matching white_label have their terms cleared so they behave like model candidates; annotations from whitelisted_lexicons get a term-free duplicate added alongside the original.
  3. Grouping (group_annotations) — annotations are grouped by their first term's lexicon (empty string = model / no-lexicon). Same-span annotations in the same group have their term lists merged and deduplicated.
  4. Linker consolidation (consolidate_linker):
    • Suspicious model annotations (no capitalised word) are optionally dropped.
    • Kill-list annotations suppress matching model annotations.
    • KB annotations at the same span enrich the matching model annotation with their terms.
    • Overlapping or mismatched-label KB matches are logged as warnings and skipped.
  5. Last-name resolution — when resolve_lastnames=True, isolated person names (single token) are resolved to the full-name annotation seen earliest in the document.
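The steps above can be sketched in a few dozen lines. This is a simplified illustration, not the package's code: the Ann class, the function bodies, and the sample labels and identifiers are all assumptions. In particular it assumes that "matching" means "same span", that resolving a name means copying the full name's terms, and it leaves out whitelist marking, same-span term merging, and warning logs.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Ann:
    """Hypothetical, simplified annotation (not the pymultirole class)."""
    start: int
    end: int
    label: str
    text: str
    terms: list = field(default_factory=list)  # e.g. [{"lexicon": "kb", "identifier": "x"}]


def group_by_lexicon(anns):
    """Step 3: group by the first term's lexicon; "" means model / no lexicon."""
    groups = defaultdict(list)
    for a in anns:
        groups[a.terms[0]["lexicon"] if a.terms else ""].append(a)
    return groups


def is_suspicious(a):
    """True when no word of the span starts with a capital letter."""
    return not any(word[:1].isupper() for word in a.text.split())


def consolidate(anns, kill_label=None, remove_suspicious=True):
    """Step 4: filter model annotations and enrich them with same-span KB terms."""
    anns = [a for a in anns if a.label != "sentence"]  # step 1
    killed = {(a.start, a.end) for a in anns if kill_label and a.label == kill_label}
    groups = group_by_lexicon(anns)
    kb_by_span = defaultdict(list)
    for lexicon, members in groups.items():
        for a in members:
            if lexicon and a.label != kill_label:
                kb_by_span[(a.start, a.end)].append(a)
    kept = []
    for a in groups[""]:
        span = (a.start, a.end)
        if span in killed or (remove_suspicious and is_suspicious(a)):
            continue
        for kb in kb_by_span[span]:
            if kb.label == a.label:  # mismatched labels are skipped
                a.terms += [t for t in kb.terms if t not in a.terms]
        kept.append(a)
    return kept


def resolve_lastnames(anns, person_label):
    """Step 5: give single-token person names the terms of the first full name containing them."""
    full_names = {}  # token -> earliest multi-token person annotation
    for a in sorted(anns, key=lambda a: a.start):
        if a.label != person_label:
            continue
        tokens = a.text.split()
        if len(tokens) > 1:
            for tok in tokens:
                full_names.setdefault(tok, a)
        elif tokens and tokens[0] in full_names and not a.terms:
            a.terms = list(full_names[tokens[0]].terms)
    return anns


doc = [
    Ann(0, 11, "person", "Marie Curie"),                                             # model
    Ann(0, 11, "person", "Marie Curie", [{"lexicon": "kb", "identifier": "curie-m"}]),  # KB
    Ann(20, 24, "percent", "12 %"),                                                  # suspicious
    Ann(30, 34, "org", "Acme"),                                                      # model, killed
    Ann(30, 34, "kill", "Acme", [{"lexicon": "kill", "identifier": "acme"}]),        # kill list
    Ann(40, 45, "person", "Curie"),                                                  # isolated last name
]
result = resolve_lastnames(consolidate(doc, kill_label="kill"), "person")
for a in result:
    print(a.start, a.end, a.label, a.text, a.terms)
```

The sketch mutates the model annotations in place when enriching them, which keeps it short; a real implementation might prefer to copy annotations before modifying them.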

Developing

Prerequisites

uv is required as the package manager.

pip install uv

Clone the repository:

git clone https://github.com/oterrier/pyprocessors_reconciliation
cd pyprocessors_reconciliation

Install in development mode

uv sync --extra test

Running the test suite

uv run pytest

Linting and formatting

uv run ruff check .
uv run ruff format .

Building the documentation

uv run --extra docs sphinx-build docs docs/_build

The built documentation is available at docs/_build/index.html.

Building and publishing

uv build
uv publish

SBOM & vulnerability check

Install the SBOM dependencies:

uv sync --extra sbom

Generate a CycloneDX SBOM from the current environment:

uv run cyclonedx-py environment -o sbom.cdx.json --output-format json

Audit dependencies for known vulnerabilities:

uv run pip-audit --format json --output audit-report.json
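The JSON report can then be summarised with a few lines of Python. The sketch below assumes the report layout used by recent pip-audit releases, a top-level object with a "dependencies" list whose entries carry "name", "version" and "vulns"; older releases may differ, and the helper name vulnerable_packages is invented for this example.

```python
import json
from pathlib import Path


def vulnerable_packages(report):
    """Map "name==version" to the vulnerability IDs reported for that package."""
    found = {}
    for dep in report.get("dependencies", []):
        # Skipped dependencies have no "vulns" key, hence the default.
        ids = [v.get("id", "?") for v in dep.get("vulns", [])]
        if ids:
            found[f"{dep.get('name')}=={dep.get('version')}"] = ids
    return found


path = Path("audit-report.json")
if path.exists():
    report = json.loads(path.read_text(encoding="utf-8"))
    for package, ids in vulnerable_packages(report).items():
        print(package, ", ".join(ids))
else:
    print("audit-report.json not found; run pip-audit first")
```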

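pip-audit already exits with a non-zero status when it finds a known vulnerability. Add --strict to also fail when a dependency cannot be audited (useful in CI):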

uv run pip-audit --strict
