Skip to main content

Sherpa reconciliation processor

Project description

pyprocessors_reconciliation

license tests codecov docs version PyPI - Python Version

Reconciliation annotations coming from different annotators.

Installation

pip install pyprocessors-reconciliation

Overview

ReconciliationProcessor is a pymultirole processor plugin that reconciles overlapping annotations produced by multiple annotators (NER models, knowledge-base linkers, white/kill lists) into a single coherent set.

The processor is registered under the pyprocessors.plugins entry point as reconciliation.

Parameters

Parameter Type Default Description
type ReconciliationType linker Reconciliation strategy (currently only linker)
kill_label str | None None Label whose annotations suppress matching model annotations
white_label str | None None Label treated as authoritative (terms stripped so it acts like a model annotation)
whitelisted_lexicons list[str] | None None Lexicons whose annotations are duplicated as term-free model candidates
person_label str | None None Label used to identify person annotations for last-name resolution
remove_suspicious bool True Drop model annotations that contain no capitalised word (numbers, percentages, etc.)
resolve_lastnames bool False Resolve isolated last names / first names using full names seen earlier in the document

How it works

  1. Sentence filteringsentence-labelled annotations are removed before processing.
  2. Whitelist marking (mark_whitelisted) — annotations matching white_label have their terms cleared so they behave like model candidates; annotations from whitelisted_lexicons get a term-free duplicate added alongside the original.
  3. Grouping (group_annotations) — annotations are grouped by their first term's lexicon (empty string = model / no-lexicon). Same-span annotations in the same group have their term lists merged and deduplicated.
  4. Linker consolidation (consolidate_linker):
    • Suspicious model annotations (no capitalised word) are optionally dropped.
    • Kill-list annotations suppress matching model annotations.
    • KB annotations at the same span enrich the matching model annotation with their terms.
    • Overlapping or mismatched-label KB matches are logged as warnings and skipped.
  5. Last-name resolution — when resolve_lastnames=True, isolated person names (single token) are resolved to the full-name annotation seen earliest in the document.

Developing

Prerequisites

uv is required as the package manager.

pip install uv

Clone the repository:

git clone https://github.com/oterrier/pyprocessors_reconciliation
cd pyprocessors_reconciliation

Install in development mode

uv sync --extra test

Running the test suite

uv run pytest

Linting and formatting

uv run ruff check .
uv run ruff format .

Building the documentation

uv run --extra docs sphinx-build docs docs/_build

The built documentation is available at docs/_build/index.html.

Building and publishing

uv build
uv publish

SBOM & vulnerability check

Install the SBOM dependencies:

uv sync --extra sbom

Generate a CycloneDX SBOM from the current environment:

uv run cyclonedx-py environment -o sbom.cdx.json --output-format json

Audit dependencies for known vulnerabilities:

uv run pip-audit --format json --output audit-report.json

To fail on any known vulnerability (useful in CI):

uv run pip-audit --strict

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyprocessors_reconciliation-1.6.31.tar.gz (8.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyprocessors_reconciliation-1.6.31-py3-none-any.whl (8.3 kB view details)

Uploaded Python 3

File details

Details for the file pyprocessors_reconciliation-1.6.31.tar.gz.

File metadata

  • Download URL: pyprocessors_reconciliation-1.6.31.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"12","id":"bookworm","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for pyprocessors_reconciliation-1.6.31.tar.gz
Algorithm Hash digest
SHA256 bee732bc14147dedd6940eaf1cdb26771c56201b97e01a05af3df7cda18eb65a
MD5 1c9d8565585114a0844efb794536da25
BLAKE2b-256 c40b658b729a5e670524c6ce90c576dc38ccdf0f9564cfca8b85e7100fbfd5d6

See more details on using hashes here.

File details

Details for the file pyprocessors_reconciliation-1.6.31-py3-none-any.whl.

File metadata

  • Download URL: pyprocessors_reconciliation-1.6.31-py3-none-any.whl
  • Upload date:
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"12","id":"bookworm","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for pyprocessors_reconciliation-1.6.31-py3-none-any.whl
Algorithm Hash digest
SHA256 aab731213f429fd4943a870c19f27dd2ceea8b22afb31c4b3fc6afcabafad864
MD5 37923b4856e113cf9a8799b572c69a59
BLAKE2b-256 f5eb8f2230f6e452247ae340af2dddfb6964a412273849b3cf958982049703fa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page