Sherpa reconciliation processor

Project description

pyprocessors_reconciliation

Reconciliation of annotations coming from different annotators.

Installation

pip install pyprocessors-reconciliation

Overview

ReconciliationProcessor is a pymultirole processor plugin that reconciles overlapping annotations produced by multiple annotators (NER models, knowledge-base linkers, white/kill lists) into a single coherent set.

The processor is registered under the pyprocessors.plugins entry point as reconciliation.

Parameters

Parameter             Type                Default  Description
type                  ReconciliationType  linker   Reconciliation strategy (currently only linker)
kill_label            str | None          None     Label whose annotations suppress matching model annotations
white_label           str | None          None     Label treated as authoritative (its terms are stripped so it acts like a model annotation)
whitelisted_lexicons  list[str] | None    None     Lexicons whose annotations are duplicated as term-free model candidates
person_label          str | None          None     Label used to identify person annotations for last-name resolution
remove_suspicious     bool                True     Drop model annotations that contain no capitalised word (numbers, percentages, etc.)
resolve_lastnames     bool                False    Resolve isolated last names or first names using full names seen earlier in the document
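The two boolean switches can be illustrated with a short sketch. The helpers is_suspicious and resolve_person_mentions are hypothetical, and the rules they encode are assumptions drawn from the descriptions above. A word counts as capitalised when its first character is uppercase. A single-token person mention is resolved to the earliest preceding multi-token name that contains that token. The package's actual matching rules may differ.

```python
def is_suspicious(text: str) -> bool:
    """True when no word starts with an uppercase letter (e.g. "42 %", "15 percent")."""
    return not any(word[:1].isupper() for word in text.split())


def resolve_person_mentions(mentions: list[str]) -> list[str]:
    """Map each person mention (in document order) to a full name when possible.

    Multi-token mentions are kept and remembered as full names; a single-token
    mention is replaced by the earliest remembered full name containing it.
    """
    full_names: list[str] = []
    resolved: list[str] = []
    for mention in mentions:
        if len(mention.split()) > 1:
            full_names.append(mention)
            resolved.append(mention)
        else:
            match = next((full for full in full_names if mention in full.split()), None)
            resolved.append(match if match is not None else mention)
    return resolved


# "Obama" and "Barack" both resolve to the earliest matching full name
print(resolve_person_mentions(["Barack Obama", "Michelle Obama", "Obama", "Barack", "Biden"]))
# → ['Barack Obama', 'Michelle Obama', 'Barack Obama', 'Barack Obama', 'Biden']
```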

How it works

  1. Sentence filtering: sentence-labelled annotations are removed before processing.
  2. Whitelist marking (mark_whitelisted): annotations matching white_label have their terms cleared so they behave like model candidates; annotations from whitelisted_lexicons get a term-free duplicate added alongside the original.
  3. Grouping (group_annotations): annotations are grouped by the lexicon of their first term (the empty string denotes model annotations, which have no lexicon). Same-span annotations in the same group have their term lists merged and deduplicated.
  4. Linker consolidation (consolidate_linker):
    • Suspicious model annotations (no capitalised word) are optionally dropped.
    • Kill-list annotations suppress matching model annotations.
    • KB annotations at the same span enrich the matching model annotation with their terms.
    • Overlapping or mismatched-label KB matches are logged as warnings and skipped.
  5. Last-name resolution: when resolve_lastnames=True, isolated person names (single-token annotations labelled person_label) are resolved to the full-name annotation seen earliest in the document.
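Steps 1 to 4 can be condensed into a self-contained sketch. Everything in it is hypothetical. The Term and Annotation dataclasses stand in for pymultirole's document model, and the sentence label is assumed to be the literal string "sentence". Kill-list and KB matches are assumed to require an exact (start, end) span, and KB matches with no model counterpart are simply dropped. It illustrates the data flow described above, not the package's code.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass, field, replace

logger = logging.getLogger("reconciliation_sketch")


@dataclass(frozen=True)
class Term:  # hypothetical stand-in for a knowledge-base term
    lexicon: str
    identifier: str


@dataclass
class Annotation:  # hypothetical stand-in for a pymultirole annotation
    start: int
    end: int
    label: str
    text: str
    terms: list = field(default_factory=list)


def lexicon_of(a: Annotation) -> str:
    # Grouping key: the first term's lexicon; "" marks a model (no-lexicon) annotation
    return a.terms[0].lexicon if a.terms else ""


def is_suspicious(text: str) -> bool:
    # Assumption: "capitalised" means the word's first character is uppercase
    return not any(w[:1].isupper() for w in text.split())


def reconcile(annotations, kill_label=None, white_label=None,
              whitelisted_lexicons=(), remove_suspicious=True):
    # 1. Sentence filtering (assumes the label is literally "sentence")
    anns = [a for a in annotations if a.label != "sentence"]

    # 2. Whitelist marking
    marked = []
    for a in anns:
        if white_label is not None and a.label == white_label:
            marked.append(replace(a, terms=[]))  # behaves like a model candidate
            continue
        marked.append(a)
        if a.terms and lexicon_of(a) in whitelisted_lexicons:
            marked.append(replace(a, terms=[]))  # term-free duplicate

    # Kill-list annotations are set aside; assumed to match by exact span
    killed = {(a.start, a.end) for a in marked if kill_label is not None and a.label == kill_label}
    marked = [a for a in marked if kill_label is None or a.label != kill_label]

    # 3. Grouping: same-span annotations in one group get merged, deduplicated terms
    groups = defaultdict(dict)
    for a in marked:
        spans = groups[lexicon_of(a)]
        key = (a.start, a.end)
        if key in spans:
            spans[key].terms = list(dict.fromkeys(spans[key].terms + a.terms))
        else:
            spans[key] = replace(a, terms=list(a.terms))

    # 4. Linker consolidation
    result = {}
    for key, m in groups.pop("", {}).items():
        if remove_suspicious and is_suspicious(m.text):
            continue  # e.g. bare numbers or percentages
        if key in killed:
            continue  # suppressed by the kill list
        result[key] = m
    for lexicon, spans in groups.items():
        for (start, end), kb in spans.items():
            m = result.get((start, end))
            if m is None:
                if any(start < e and s < end for s, e in result):
                    logger.warning("overlapping %s match %r skipped", lexicon, kb.text)
                continue  # KB-only matches are dropped in this sketch
            if m.label != kb.label:
                logger.warning("label mismatch on %r: %s vs %s", kb.text, m.label, kb.label)
                continue
            m.terms = list(dict.fromkeys(m.terms + kb.terms))  # enrich with KB terms
    return sorted(result.values(), key=lambda a: (a.start, a.end))
```

Keying each group by (start, end) turns same-span merging and KB enrichment into dictionary lookups; the whitelisted-lexicon duplicate then picks up its original terms during the enrichment pass.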

Developing

Prerequisites

uv is required as the package manager.

pip install uv

Clone the repository:

git clone https://github.com/oterrier/pyprocessors_reconciliation
cd pyprocessors_reconciliation

Install in development mode

uv sync --extra test

Running the test suite

uv run pytest

Linting and formatting

uv run ruff check .
uv run ruff format .

Building the documentation

uv run --extra docs sphinx-build docs docs/_build

The built documentation is available at docs/_build/index.html.

Building and publishing

uv build
uv publish

SBOM & vulnerability check

Install the SBOM dependencies:

uv sync --extra sbom

Generate a CycloneDX SBOM from the current environment:

uv run cyclonedx-py environment -o sbom.cdx.json --output-format json
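To review what ended up in the SBOM, a few lines of standard-library Python are enough. This sketch relies only on the CycloneDX JSON layout, where packages are listed in a top-level "components" array with "name" and "version" fields; the helper name sbom_components is hypothetical.

```python
import json
from pathlib import Path


def sbom_components(path: str = "sbom.cdx.json") -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs from a CycloneDX JSON SBOM."""
    bom = json.loads(Path(path).read_text(encoding="utf-8"))
    # CycloneDX lists packages in the top-level "components" array
    return sorted(
        (c.get("name", ""), c.get("version", "")) for c in bom.get("components", [])
    )
```

For example, `for name, version in sbom_components(): print(f"{name}=={version}")` prints one pinned requirement per component.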

Audit dependencies for known vulnerabilities:

uv run pip-audit --format json --output audit-report.json

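pip-audit already exits with a non-zero status when it finds a known vulnerability. To also fail when a dependency cannot be audited at all (useful in CI), add --strict: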

uv run pip-audit --strict
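To gate a pipeline on the JSON report written earlier, a small helper can extract the affected packages. This is a sketch that assumes pip-audit's current JSON layout: a top-level "dependencies" array whose entries carry "name", "version" and a "vulns" list with an "id" per finding. Entries without a non-empty "vulns" list are ignored. Both function names are hypothetical.

```python
import json
from pathlib import Path


def load_report(path: str = "audit-report.json") -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))


def vulnerable_packages(report: dict) -> dict[str, list[str]]:
    """Map package name -> vulnerability IDs, assuming pip-audit's JSON layout."""
    return {
        dep["name"]: [v["id"] for v in dep["vulns"]]
        for dep in report.get("dependencies", [])
        if dep.get("vulns")  # skip clean or skipped dependencies
    }
```

A CI step could then call `vulnerable_packages(load_report())` and exit with a non-zero status when the result is not empty.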

Download files

Source Distribution

pyprocessors_reconciliation-0.6.13.tar.gz (34.3 kB)

Uploaded Source

Built Distribution

pyprocessors_reconciliation-0.6.13-py3-none-any.whl (8.3 kB)

Uploaded Python 3

File details

Details for the file pyprocessors_reconciliation-0.6.13.tar.gz.

File metadata

  • Download URL: pyprocessors_reconciliation-0.6.13.tar.gz
  • Size: 34.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.12 (publish, Debian GNU/Linux 12 "bookworm", CI)

File hashes

Hashes for pyprocessors_reconciliation-0.6.13.tar.gz
Algorithm    Hash digest
SHA256       4976c31dc03be056a8607e086813a5e3c824f5394f13b6ed8a48e6af0507d496
MD5          339076af216cb72ee6cde848577ea374
BLAKE2b-256  5b2a1bd47e69b41db4fbbb552f28aeddab20ce53904078e24846e2d8a7fa9796

File details

Details for the file pyprocessors_reconciliation-0.6.13-py3-none-any.whl.

File metadata

  • Download URL: pyprocessors_reconciliation-0.6.13-py3-none-any.whl
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.12 (publish, Debian GNU/Linux 12 "bookworm", CI)

File hashes

Hashes for pyprocessors_reconciliation-0.6.13-py3-none-any.whl
Algorithm    Hash digest
SHA256       99ed9fd428c88417eba8e370379c0247f46c47115fb56f63e9e003e1e1fb8f99
MD5          6ce79edcb98eb9837ba6a9a154f738d7
BLAKE2b-256  9e24ffafbb14bcc5478a03711b242f8d0f0b5e774df3ee978775bb33cfda2f4b
