Sherpa reconciliation processor

pyprocessors_reconciliation

Reconciles annotations coming from different annotators.

Installation

pip install pyprocessors-reconciliation

Overview

ReconciliationProcessor is a pymultirole processor plugin that reconciles overlapping annotations produced by multiple annotators (NER models, knowledge-base linkers, white/kill lists) into a single coherent set.

The processor is registered under the pyprocessors.plugins entry point as reconciliation.

Parameters

Parameter             Type                Default  Description
type                  ReconciliationType  linker   Reconciliation strategy (currently only linker)
kill_label            str | None          None     Label whose annotations suppress matching model annotations
white_label           str | None          None     Label treated as authoritative (terms stripped so it acts like a model annotation)
whitelisted_lexicons  list[str] | None    None     Lexicons whose annotations are duplicated as term-free model candidates
person_label          str | None          None     Label used to identify person annotations for last-name resolution
remove_suspicious     bool                True     Drop model annotations that contain no capitalised word (numbers, percentages, etc.)
resolve_lastnames     bool                False    Resolve isolated last names / first names using full names seen earlier in the document
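The two boolean options are easier to picture with a small example. The sketch below is an illustration only, not the package's code: the function names, the whitespace tokenisation, and the "first earlier full name wins" rule are all assumptions.

```python
def looks_suspicious(text: str) -> bool:
    """True when no whitespace-separated token starts with an uppercase letter.

    Assumed reading of remove_suspicious; the real check may differ.
    """
    return not any(token[:1].isupper() for token in text.split())


def resolve_isolated_names(person_mentions: list[str]) -> dict[str, str]:
    """Map each single-token mention to the first earlier full name containing it.

    person_mentions is expected in document order (assumed reading of
    resolve_lastnames; the real matching rules may differ).
    """
    full_names: list[str] = []
    resolved: dict[str, str] = {}
    for mention in person_mentions:
        if len(mention.split()) > 1:
            full_names.append(mention)  # remember full names as they appear
            continue
        for full in full_names:
            if mention in full.split():
                resolved[mention] = full
                break
    return resolved
```

With these assumptions, a mention such as "42 %" counts as suspicious, while "Curie" appearing after "Marie Curie" resolves to "Marie Curie".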

How it works

  1. Sentence filtering: sentence-labelled annotations are removed before processing.
  2. Whitelist marking (mark_whitelisted) — annotations matching white_label have their terms cleared so they behave like model candidates; annotations from whitelisted_lexicons get a term-free duplicate added alongside the original.
  3. Grouping (group_annotations) — annotations are grouped by their first term's lexicon (empty string = model / no-lexicon). Same-span annotations in the same group have their term lists merged and deduplicated.
  4. Linker consolidation (consolidate_linker):
    • Suspicious model annotations (no capitalised word) are optionally dropped.
    • Kill-list annotations suppress matching model annotations.
    • KB annotations at the same span enrich the matching model annotation with their terms.
    • Overlapping or mismatched-label KB matches are logged as warnings and skipped.
  5. Last-name resolution — when resolve_lastnames=True, isolated person names (single token) are resolved to the full-name annotation seen earliest in the document.
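Steps 3 and 4 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the package's implementation: Ann is a hypothetical stand-in for the real annotation type, terms are modelled as (lexicon, identifier) pairs, and sentence filtering, whitelist marking, the suspicious-annotation check and the overlap warnings are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Ann:  # hypothetical stand-in, not the pymultirole annotation class
    start: int
    end: int
    label: str
    terms: list = field(default_factory=list)  # (lexicon, identifier) pairs


def group_by_lexicon(anns):
    """Group by the first term's lexicon; "" means a model annotation (no terms).

    Same-span annotations inside a group are merged and their terms
    deduplicated; the first annotation's label is kept (an assumption).
    """
    groups = defaultdict(dict)
    for a in anns:
        lexicon = a.terms[0][0] if a.terms else ""
        span = (a.start, a.end)
        seen = groups[lexicon].get(span)
        if seen is None:
            groups[lexicon][span] = Ann(a.start, a.end, a.label, list(a.terms))
        else:
            seen.terms += [t for t in a.terms if t not in seen.terms]
    return groups


def consolidate(groups, kill_label="kill"):
    """Drop model annotations whose span carries kill_label, then enrich the
    survivors with terms from same-span, same-label KB annotations."""
    killed = {
        span
        for by_span in groups.values()
        for span, a in by_span.items()
        if a.label == kill_label
    }
    result = []
    for span, model in groups.get("", {}).items():
        if span in killed:
            continue
        for lexicon, by_span in groups.items():
            kb = by_span.get(span)
            # Skip the model group itself and any KB match with another label.
            if lexicon and kb is not None and kb.label == model.label:
                model.terms += [t for t in kb.terms if t not in model.terms]
        result.append(model)
    return result
```

Keying each group by span keeps the same-span lookup cheap; handling overlapping spans, which the processor logs as warnings, would need an interval comparison instead.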

Developing

Prerequisites

uv is required as the package manager.

pip install uv

Clone the repository:

git clone https://github.com/oterrier/pyprocessors_reconciliation
cd pyprocessors_reconciliation

Install in development mode

uv sync --extra test

Running the test suite

uv run pytest

Linting and formatting

uv run ruff check .
uv run ruff format .

Building the documentation

uv run --extra docs sphinx-build docs docs/_build

The built documentation is available at docs/_build/index.html.

Building and publishing

uv build
uv publish

SBOM & vulnerability check

Install the SBOM dependencies:

uv sync --extra sbom

Generate a CycloneDX SBOM from the current environment:

uv run cyclonedx-py environment -o sbom.cdx.json --output-format json

Audit dependencies for known vulnerabilities:

uv run pip-audit --format json --output audit-report.json
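The JSON report can then be summarised with a few lines of Python. This sketch assumes the report is an object with a "dependencies" list whose entries carry "name", "version" and a "vulns" list of objects with an "id" field; check this against the file your pip-audit version produces. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_report(path: str = "audit-report.json") -> dict:
    """Read the report written by the pip-audit command above."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def vulnerable_packages(report: dict) -> dict:
    """Map "name==version" to the advisory ids reported for that package.

    Assumes the report layout described in the lead-in.
    """
    found = {}
    for dep in report.get("dependencies", []):
        ids = [vuln.get("id", "?") for vuln in dep.get("vulns", [])]
        if ids:
            found[f"{dep.get('name')}=={dep.get('version')}"] = ids
    return found
```

Calling vulnerable_packages(load_report()) returns an empty dict when no advisories were reported.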

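To also fail when a dependency cannot be audited (useful in CI, where pip-audit already exits with a non-zero status if it finds a known vulnerability):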

uv run pip-audit --strict

