Sherpa reconciliation processor
pyprocessors_reconciliation
Reconciles annotations coming from different annotators.
Installation
pip install pyprocessors-reconciliation
Overview
ReconciliationProcessor is a pymultirole processor plugin that reconciles overlapping annotations produced by multiple annotators (NER models, knowledge-base linkers, white/kill lists) into a single coherent set.
The processor is registered under the pyprocessors.plugins entry point as reconciliation.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `type` | `ReconciliationType` | `linker` | Reconciliation strategy (currently only `linker`) |
| `kill_label` | `str \| None` | `None` | Label whose annotations suppress matching model annotations |
| `white_label` | `str \| None` | `None` | Label treated as authoritative (terms stripped so it acts like a model annotation) |
| `whitelisted_lexicons` | `list[str] \| None` | `None` | Lexicons whose annotations are duplicated as term-free model candidates |
| `person_label` | `str \| None` | `None` | Label used to identify person annotations for last-name resolution |
| `remove_suspicious` | `bool` | `True` | Drop model annotations that contain no capitalised word (numbers, percentages, etc.) |
| `resolve_lastnames` | `bool` | `False` | Resolve isolated last names / first names using full names seen earlier in the document |
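As a quick reference, the defaults from the table can be written as a plain mapping. This is only an illustration: `DEFAULTS` and `build_parameters` are hypothetical names, the label values `"kill"` and `"person"` are made up, and how the plugin actually receives its parameters is not shown here.

```python
# Defaults copied from the parameter table; the helper itself is hypothetical.
DEFAULTS = {
    "type": "linker",
    "kill_label": None,
    "white_label": None,
    "whitelisted_lexicons": None,
    "person_label": None,
    "remove_suspicious": True,
    "resolve_lastnames": False,
}


def build_parameters(**overrides):
    """Return the defaults with overrides applied, rejecting unknown names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Example: enable last-name resolution with illustrative label names.
params = build_parameters(
    kill_label="kill", person_label="person", resolve_lastnames=True
)
```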
How it works
- Sentence filtering: `sentence`-labelled annotations are removed before processing.
- Whitelist marking (`mark_whitelisted`): annotations matching `white_label` have their terms cleared so they behave like model candidates; annotations from `whitelisted_lexicons` get a term-free duplicate added alongside the original.
- Grouping (`group_annotations`): annotations are grouped by their first term's lexicon (empty string = model / no-lexicon). Same-span annotations in the same group have their term lists merged and deduplicated.
- Linker consolidation (`consolidate_linker`):
  - Suspicious model annotations (no capitalised word) are optionally dropped.
  - Kill-list annotations suppress matching model annotations.
  - KB annotations at the same span enrich the matching model annotation with their terms.
  - Overlapping or mismatched-label KB matches are logged as warnings and skipped.
- Last-name resolution: when `resolve_lastnames=True`, isolated person names (single token) are resolved to the full-name annotation seen earliest in the document.
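The consolidation and last-name steps above can be sketched on toy data. This is not the plugin's code: `Ann`, `is_suspicious`, `consolidate` and the `resolve_lastnames` function below are hypothetical stand-ins, "matching" is assumed to mean "same span", the lexicon names and identifiers are invented, and the warning path for overlapping KB matches is left out.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Ann:
    start: int
    end: int
    label: str
    text: str
    terms: list = field(default_factory=list)  # [(lexicon, identifier), ...]


def is_suspicious(ann):
    # True when no word starts with a capital letter, e.g. "2" or "12 %".
    return not any(word[:1].isupper() for word in ann.text.split())


def consolidate(anns, kill_label=None, remove_suspicious=True):
    models = [a for a in anns if not a.terms]   # term-free = model candidates
    lexicon = [a for a in anns if a.terms]      # KB and kill-list annotations
    killed = {(a.start, a.end) for a in lexicon if a.label == kill_label}
    kept = []
    for m in models:
        span = (m.start, m.end)
        if remove_suspicious and is_suspicious(m):
            continue
        if span in killed:
            continue
        terms = []
        for kb in lexicon:
            # Only same-span, same-label KB annotations enrich the model one.
            if (kb.start, kb.end) == span and kb.label == m.label:
                terms.extend(t for t in kb.terms if t not in terms)
        kept.append(replace(m, terms=terms))
    return kept


def resolve_lastnames(anns, person_label):
    persons = sorted((a for a in anns if a.label == person_label),
                     key=lambda a: a.start)
    full_names = [a for a in persons if len(a.text.split()) > 1]
    resolved = {}
    for a in persons:
        tokens = a.text.split()
        if len(tokens) != 1:
            continue
        for full in full_names:  # earliest full name first
            if full.start < a.start and tokens[0] in full.text.split():
                resolved[(a.start, a.end)] = full.text
                break
    return resolved


# Toy document: "Marie Curie met Pierre. Curie won 2 prizes in Paris."
annotations = [
    Ann(0, 11, "person", "Marie Curie"),
    Ann(0, 11, "person", "Marie Curie", [("kb", "kb:marie-curie")]),
    Ann(24, 29, "person", "Curie"),
    Ann(34, 35, "quantity", "2"),
    Ann(46, 51, "location", "Paris"),
    Ann(46, 51, "kill", "Paris", [("kill_list", "paris")]),
]
kept = consolidate(annotations, kill_label="kill")
resolved = resolve_lastnames(kept, "person")
```

In this toy run the quantity `2` is dropped as suspicious, `Paris` is suppressed by the kill list, the model `Marie Curie` annotation picks up the KB term, and the isolated `Curie` maps to `Marie Curie`.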
Developing
Prerequisites
uv is required as the package manager.
pip install uv
Clone the repository:
git clone https://github.com/oterrier/pyprocessors_reconciliation
cd pyprocessors_reconciliation
Install in development mode
uv sync --extra test
Running the test suite
uv run pytest
Linting and formatting
uv run ruff check .
uv run ruff format .
Building the documentation
uv run --extra docs sphinx-build docs docs/_build
The built documentation is available at docs/_build/index.html.
Building and publishing
uv build
uv publish
SBOM & vulnerability check
Install the SBOM dependencies:
uv sync --extra sbom
Generate a CycloneDX SBOM from the current environment:
uv run cyclonedx-py environment -o sbom.cdx.json --output-format json
Audit dependencies for known vulnerabilities:
uv run pip-audit --format json --output audit-report.json
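pip-audit already exits with a non-zero status when it finds a known vulnerability. To also fail when a dependency cannot be collected or audited (useful in CI), add --strict: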
uv run pip-audit --strict
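The JSON report written to audit-report.json can be summarised with a few lines of Python. This is a sketch under an assumption: the report is taken to have a top-level `dependencies` list whose entries carry `name`, `version` and `vulns` (each with an `id`), as recent pip-audit versions emit with `--format json`; older versions may differ. `vulnerable_packages` is a hypothetical helper.

```python
import json
from pathlib import Path


def vulnerable_packages(report: dict) -> list[tuple[str, str, list[str]]]:
    """Return (name, version, [vulnerability ids]) for each affected package."""
    rows = []
    for dep in report.get("dependencies", []):
        # Skipped dependencies may have no "vulns" key, hence .get().
        ids = [v["id"] for v in dep.get("vulns", [])]
        if ids:
            rows.append((dep["name"], dep.get("version", "?"), ids))
    return rows


if __name__ == "__main__":
    path = Path("audit-report.json")
    if path.exists():
        report = json.loads(path.read_text())
        for name, version, ids in vulnerable_packages(report):
            print(f"{name} {version}: {', '.join(ids)}")
```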
File details
Details for the file pyprocessors_reconciliation-0.6.19.tar.gz.
File metadata
- Download URL: pyprocessors_reconciliation-0.6.19.tar.gz
- Upload date:
- Size: 36.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aec835161b25500f1215fdb40ca068316eb586e108ebb97ded9312def1ae4e1e |
| MD5 | 8a87fe37a4caf49286f0ba5bbfefe5c9 |
| BLAKE2b-256 | 7fdf1c60bd51a1fbfb654efee495fff219de487048bc4e7df685a702a86e2211 |
File details
Details for the file pyprocessors_reconciliation-0.6.19-py3-none-any.whl.
File metadata
- Download URL: pyprocessors_reconciliation-0.6.19-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f30fe7f6d02c3088aebbebb0b452274806dc06e71aa9302353e1e07a32320fe |
| MD5 | bff9db2d27ce8d95b7acdf08e869390d |
| BLAKE2b-256 | ef4bd06ca503168496075ba67e60730a3f2f710cb0bfba11ec028fec33a3dc8a |