
Sherpa reconciliation processor

Project description

pyprocessors_reconciliation


Reconcile annotations coming from different annotators.

Installation

pip install pyprocessors-reconciliation

Overview

ReconciliationProcessor is a pymultirole processor plugin that reconciles overlapping annotations produced by multiple annotators (NER models, knowledge-base linkers, white/kill lists) into a single coherent set.

The processor is registered in the pyprocessors.plugins entry-point group under the name reconciliation.

Parameters

  • type (ReconciliationType, default linker): reconciliation strategy; currently only linker is available.
  • kill_label (str | None, default None): label whose annotations suppress matching model annotations.
  • white_label (str | None, default None): label treated as authoritative; its terms are stripped so it acts like a model annotation.
  • whitelisted_lexicons (list[str] | None, default None): lexicons whose annotations are duplicated as term-free model candidates.
  • person_label (str | None, default None): label used to identify person annotations for last-name resolution.
  • remove_suspicious (bool, default True): drop model annotations that contain no capitalised word (numbers, percentages, etc.).
  • resolve_lastnames (bool, default False): resolve isolated last names or first names using full names seen earlier in the document.
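To see the parameter set at a glance, the dataclass below mirrors the types and defaults listed above. It is an illustrative stand-in, not the package's own parameter model: ReconciliationParamsSketch, ReconciliationTypeSketch and the example label value "PERSON" are hypothetical.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import List, Optional


class ReconciliationTypeSketch(str, Enum):
    # Only one strategy is documented.
    linker = "linker"


@dataclass
class ReconciliationParamsSketch:
    # Hypothetical mirror of the documented parameters and their defaults.
    type: ReconciliationTypeSketch = ReconciliationTypeSketch.linker
    kill_label: Optional[str] = None
    white_label: Optional[str] = None
    whitelisted_lexicons: Optional[List[str]] = None
    person_label: Optional[str] = None
    remove_suspicious: bool = True
    resolve_lastnames: bool = False


# Example: enable last-name resolution for a hypothetical "PERSON" label.
params = ReconciliationParamsSketch(person_label="PERSON", resolve_lastnames=True)
config = asdict(params)
```

Everything not passed explicitly keeps its documented default, so only the options that differ from the defaults need to be set.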

How it works

  1. Sentence filtering: sentence-labelled annotations are removed before processing.
  2. Whitelist marking (mark_whitelisted): annotations matching white_label have their terms cleared so they behave like model candidates; annotations from whitelisted_lexicons get a term-free duplicate added alongside the original.
  3. Grouping (group_annotations): annotations are grouped by the lexicon of their first term (the empty string denotes model annotations, which have no lexicon). Same-span annotations in the same group have their term lists merged and deduplicated.
  4. Linker consolidation (consolidate_linker):
    • Suspicious model annotations (no capitalised word) are dropped when remove_suspicious is enabled (the default).
    • Kill-list annotations suppress matching model annotations.
    • KB annotations at the same span enrich the matching model annotation with their terms.
    • Overlapping or label-mismatched KB matches are logged as warnings and skipped.
  5. Last-name resolution: when resolve_lastnames=True, isolated person names (a single token) are resolved to the full-name annotation seen earliest in the document.
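Steps 1 to 3 can be sketched with plain dataclasses. This is not the package's code: Term, Annotation, lexicon_of and mark_and_filter are hypothetical, the sentence label is assumed to be the literal "sentence", and whitelisted lexicons are matched on the first term only, mirroring how grouping keys on the first term.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Term:
    identifier: str
    lexicon: str


@dataclass
class Annotation:
    start: int
    end: int
    label: str
    terms: List[Term] = field(default_factory=list)


def lexicon_of(a: Annotation) -> str:
    # Empty string marks a model annotation (no lexicon).
    return a.terms[0].lexicon if a.terms else ""


def mark_and_filter(
    annotations: List[Annotation],
    white_label: Optional[str] = None,
    whitelisted_lexicons: Optional[List[str]] = None,
    sentence_label: str = "sentence",  # assumed label value
) -> List[Annotation]:
    lexicons = set(whitelisted_lexicons or [])
    out: List[Annotation] = []
    for a in annotations:
        if a.label == sentence_label:
            continue  # step 1: drop sentence annotations
        if white_label is not None and a.label == white_label:
            out.append(replace(a, terms=[]))  # step 2: white-list entry acts like a model annotation
            continue
        out.append(a)
        if lexicon_of(a) in lexicons:
            out.append(replace(a, terms=[]))  # step 2: term-free duplicate alongside the original
    return out


def group_annotations(annotations: List[Annotation]) -> Dict[str, List[Annotation]]:
    groups: Dict[str, Dict[Tuple[int, int], Annotation]] = {}
    for a in annotations:
        spans = groups.setdefault(lexicon_of(a), {})
        key = (a.start, a.end)
        if key in spans:
            # step 3: same span in the same group, so merge term lists and drop duplicates
            merged = list(dict.fromkeys(spans[key].terms + a.terms))
            spans[key] = replace(spans[key], terms=merged)
        else:
            spans[key] = a
    return {lex: list(spans.values()) for lex, spans in groups.items()}
```

Usage: group_annotations(mark_and_filter(annotations, white_label=..., whitelisted_lexicons=[...])) yields one list per lexicon, with "" holding the model candidates.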
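The linker consolidation step can be sketched in the same spirit. It assumes that annotations are already grouped by lexicon ("" for model annotations), that kill-list entries arrive in a lexicon group and suppress model annotations on exactly the same span, and that terms are plain strings. Ann and consolidate_linker here are hypothetical stand-ins, and KB annotations that touch no model annotation are simply ignored, since the description above does not say what happens to them.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, List, Optional

logger = logging.getLogger("reconciliation_sketch")


@dataclass
class Ann:
    start: int
    end: int
    label: str
    terms: List[str] = field(default_factory=list)


def has_capitalised_word(snippet: str) -> bool:
    # Not suspicious if some whitespace-separated word starts with an uppercase letter.
    return any(word[:1].isupper() for word in snippet.split())


def consolidate_linker(
    text: str,
    groups: Dict[str, List[Ann]],
    kill_label: Optional[str] = None,
    remove_suspicious: bool = True,
) -> List[Ann]:
    model = list(groups.get("", []))
    if remove_suspicious:
        model = [a for a in model if has_capitalised_word(text[a.start:a.end])]
    kb = [a for lex, anns in groups.items() if lex for a in anns]
    killed_spans = {(a.start, a.end) for a in kb if kill_label is not None and a.label == kill_label}
    # Assumed: a kill-list entry suppresses model annotations on the identical span.
    model = [a for a in model if (a.start, a.end) not in killed_spans]
    by_span = {(a.start, a.end): a for a in model}
    for k in kb:
        if kill_label is not None and k.label == kill_label:
            continue
        target = by_span.get((k.start, k.end))
        if target is not None and target.label == k.label:
            for term in k.terms:  # enrich the model annotation in place
                if term not in target.terms:
                    target.terms.append(term)
        elif target is not None:
            logger.warning("Label mismatch at %d-%d: %s vs %s", k.start, k.end, target.label, k.label)
        elif any(m.start < k.end and k.start < m.end for m in model):
            logger.warning("Overlapping KB match at %d-%d skipped", k.start, k.end)
    return sorted(model, key=lambda a: (a.start, a.end))
```

With the text "Paris hosts 42% of Acme staff", a model annotation on "42%" is dropped as suspicious, one on "Acme" is removed if a kill-list entry covers the same span, and one on "Paris" picks up the terms of a same-label KB match.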
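Last-name resolution can be illustrated with a short sketch. Two points are assumptions, not documented behaviour: "resolved" is taken to mean that the single-token mention inherits the terms of the full-name mention, and a token matches if it equals any token of that full name (first or last name). Ann and resolve_lastnames are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Ann:
    start: int
    end: int
    label: str
    terms: List[str] = field(default_factory=list)


def resolve_lastnames(text: str, annotations: List[Ann], person_label: str) -> List[Ann]:
    persons = sorted((a for a in annotations if a.label == person_label), key=lambda a: a.start)
    full_names: List[Tuple[List[str], Ann]] = []  # multi-token names seen so far, in document order
    for a in persons:
        tokens = text[a.start:a.end].split()
        if len(tokens) > 1:
            full_names.append((tokens, a))
        elif len(tokens) == 1:
            # The earliest preceding full name containing this token wins.
            for full_tokens, full in full_names:
                if tokens[0] in full_tokens:
                    a.terms = list(full.terms)
                    break
    return annotations
```

Because full names are collected while scanning in document order, an isolated name that appears before any matching full name stays unresolved.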

Developing

Prerequisites

uv is used as the package manager. Install it with:

pip install uv

Clone the repository:

git clone https://github.com/oterrier/pyprocessors_reconciliation
cd pyprocessors_reconciliation

Install in development mode

uv sync --extra test

Running the test suite

uv run pytest

Linting and formatting

uv run ruff check .
uv run ruff format .

Building the documentation

uv run --extra docs sphinx-build docs docs/_build

The built documentation is available at docs/_build/index.html.

Building and publishing

uv build
uv publish

SBOM & vulnerability check

Install the SBOM dependencies:

uv sync --extra sbom

Generate a CycloneDX SBOM from the current environment:

uv run cyclonedx-py environment -o sbom.cdx.json --output-format json
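To inspect the generated SBOM without extra tooling, the JSON file can be read directly. A minimal sketch, relying on the CycloneDX JSON layout in which the inventory is a top-level "components" array; list_components is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List, Tuple


def list_components(sbom_path: str) -> List[Tuple[str, str]]:
    """Return sorted (name, version) pairs from a CycloneDX JSON SBOM."""
    bom = json.loads(Path(sbom_path).read_text(encoding="utf-8"))
    # The inventory sits in the top-level "components" array.
    return sorted((c.get("name", ""), c.get("version", "")) for c in bom.get("components", []))


# Usage: for name, version in list_components("sbom.cdx.json"): print(name, version)
```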

Audit dependencies for known vulnerabilities:

uv run pip-audit --format json --output audit-report.json

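pip-audit already exits with a non-zero status when it finds known vulnerabilities. In CI, --strict additionally makes the audit fail if any dependency cannot be collected, instead of skipping it: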

uv run pip-audit --strict
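The audit-report.json written by the JSON audit command can be summarised in a few lines, for example to post findings from CI. This sketch assumes the report layout {"dependencies": [{"name", "version", "vulns": [{"id", ...}]}], ...}; check it against the pip-audit version you run. vulnerable_packages is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, List


def vulnerable_packages(report_path: str) -> Dict[str, List[str]]:
    """Map "name==version" to the vulnerability IDs listed in a pip-audit JSON report."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    found: Dict[str, List[str]] = {}
    # Assumed layout; skipped dependencies carry no "vulns" list and are ignored.
    for dep in report.get("dependencies", []):
        ids = [v.get("id", "?") for v in dep.get("vulns", [])]
        if ids:
            found[f"{dep.get('name')}=={dep.get('version', '?')}"] = ids
    return found


# Usage: for pkg, ids in vulnerable_packages("audit-report.json").items(): print(pkg, ", ".join(ids))
```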



Download files

Download the file for your platform.

Source Distribution

pyprocessors_reconciliation-1.6.53.tar.gz (8.5 kB)

Uploaded Source

Built Distribution


pyprocessors_reconciliation-1.6.53-py3-none-any.whl (8.3 kB)

Uploaded Python 3

File details

Details for the file pyprocessors_reconciliation-1.6.53.tar.gz.

File metadata

  • Download URL: pyprocessors_reconciliation-1.6.53.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.9 (publish, CI, Debian GNU/Linux 12 bookworm)

File hashes

Hashes for pyprocessors_reconciliation-1.6.53.tar.gz
Algorithm Hash digest
SHA256 36c27d2e31052efefa41865b778498d793e64de16b295e14e088c06c0c0eb3c1
MD5 65368a3c26ecb40686a57fab2eee782d
BLAKE2b-256 7dfb6b67ce5ebafb53a658f3c8ab9d15320dc9ca25894de9048047d1bc850d4e
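A downloaded file can be checked against the SHA256 digest listed above using only hashlib. A minimal sketch; sha256_of and verify are hypothetical helpers, and the default expected value is the source-distribution digest from this page.

```python
import hashlib

EXPECTED_SDIST_SHA256 = "36c27d2e31052efefa41865b778498d793e64de16b295e14e088c06c0c0eb3c1"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    # hexdigest() is lowercase, so normalise the expected value before comparing.
    return sha256_of(path) == expected.lower()


# Usage: verify("pyprocessors_reconciliation-1.6.53.tar.gz")
```

pip can also enforce digests at install time: list the requirement with a --hash=sha256:... option in a requirements file and install with --require-hashes.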


File details

Details for the file pyprocessors_reconciliation-1.6.53-py3-none-any.whl.

File metadata

  • Download URL: pyprocessors_reconciliation-1.6.53-py3-none-any.whl
  • Upload date:
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.9 (publish, CI, Debian GNU/Linux 12 bookworm)

File hashes

Hashes for pyprocessors_reconciliation-1.6.53-py3-none-any.whl
Algorithm Hash digest
SHA256 3bf2d67e1f0491d16432f81dd10ea52b729157e4f0833f646d96222724b22bc4
MD5 9fcc0ea5224984b7807cf53154afc53d
BLAKE2b-256 fa81d3e5491a2a7508e489fb4b77df74a8da53490afb1375d8d6ea9b0d73a7db

