Py-HLA-Match: open-source research software for HLA matching.

Py-HLA-Match

About

Py-HLA-Match is a Python library for standardised, rule-based HLA (Human Leukocyte Antigen) matching in retrospective analyses, method development, benchmarking, and in-silico studies in immunogenetics and related fields.

Regulatory Notice

Py-HLA-Match is not certified or conformity-assessed as medical device software or in-vitro medical device software and is intended for research use only. It must therefore not be used for the diagnosis or therapy of patients.

For more details on intended use, scope, and limitations, see the Software Card.

Installation

Install from PyPI:

pip install py-hla-match
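
To confirm which release ended up in the active environment, the standard library's `importlib.metadata` can be queried. This is a minimal sketch: the helper name `installed_version` is illustrative and not part of Py-HLA-Match, and the distribution name is taken from the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "py-hla-match") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or a short notice when the package is missing.
print(installed_version() or "py-hla-match is not installed")
```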

Quickstart

This quickstart uses the artificial CSVs bundled under the demo folder and avoids any real or sensitive data.

Run pairwise HLA matching

Use the synthetic patient and donor CSVs and write results to a new file:

from py_hla_match.parser import HLADataSource
from py_hla_match.export import PairwiseMatch

data_path = "py_hla_match/demo/data/random_data/synthetic_patients.csv"
donor_path = "py_hla_match/demo/data/random_data/synthetic_donors.csv"
output_path = "py_hla_match/demo/data/random_data/match_results.csv"

src = HLADataSource(
    data_path,
    col_idx_start=1,
    col_idx_stop=13,
    row_idx_start=1,
)

tgt = HLADataSource(
    donor_path,
    col_idx_start=1,
    col_idx_stop=13,
    row_idx_start=1,
)

matcher = PairwiseMatch(
    source=src,
    target=tgt,
    storage_filename=output_path,
    include_ard_details=True,
    include_molecular_details=True,
    include_dpb1_tce=False,
    include_homozygosity=False,
    overwrite=True,
)

matcher.run()

Inspect raw allele-level results

Convert raw match levels to a DataFrame and print the first rows:

df = matcher.to_df()
print(df.head())
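
The results file written by the quickstart can also be sanity-checked without pandas. The sketch below tallies the values of one column using only the standard library. The helper `tally_column` is hypothetical, a comma delimiter is assumed, and the column name must be taken from the actual header of the results CSV, which is not documented here.

```python
import csv
from collections import Counter


def tally_column(csv_path: str, column: str) -> Counter:
    """Count how often each value occurs in one named column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)  # assumes a header row and comma delimiter
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
        # The Counter is built while the file is still open.
        return Counter(row[column] for row in reader)
```

Calling `tally_column(output_path, name)` with a column name read from the file's header gives a quick overview of how often each value occurs.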

Matching Logic

Py-HLA-Match classifies each donor–recipient allele pair in two stages that follow IPD-IMGT/HLA nomenclature semantics.

Stage 1: Mismatch Detection

Both alleles are reduced to their ARD (antigen recognition domain) equivalent via P-group affiliation. If ARD representations differ:

Condition	Classification
Field 1 (allele group) differs	`ANTIGEN_MISMATCH`
Field 1 identical, ARD field 2 differs	`ALLELE_MISMATCH`
DRB3/4/5: same broad locus, different sublocus	`SUBLOCUS_MISMATCH`
Insufficient resolution for comparison	`NOT_ASSESSABLE`
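
The field-based rows of this table can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `stage1` and `parse` are hypothetical names, the order of the checks is a guess, the DRB3/4/5 row is left out, and `TOY_ARD` is a one-entry stand-in for a real P-group table (its single pair is the one this README lists as sharing a P group).

```python
from typing import Dict, List, Tuple


def parse(name: str) -> Tuple[str, List[str]]:
    """Split 'A*01:01:01' into ('A', ['01', '01', '01'])."""
    locus, _, rest = name.partition("*")
    return locus, rest.split(":")


def stage1(a: str, b: str, ard_of: Dict[str, str]) -> str:
    """Classify two same-locus alleles after reducing each to an ARD representative.

    `ard_of` maps an allele name to a 2-field ARD representative; names that
    are not listed stand for themselves.
    """
    fields_a, fields_b = parse(a)[1], parse(b)[1]
    if len(fields_a) < 2 or len(fields_b) < 2:
        return "NOT_ASSESSABLE"  # fewer than two typed fields on either side
    ard_a = parse(ard_of.get(a, a))[1]
    ard_b = parse(ard_of.get(b, b))[1]
    if ard_a[0] != ard_b[0]:
        return "ANTIGEN_MISMATCH"  # field 1 (allele group) differs
    if ard_a[1] != ard_b[1]:
        return "ALLELE_MISMATCH"  # same allele group, ARD field 2 differs
    return "ARD_MATCH"  # handed on to stage 2


# Toy lookup only; a real table would come from IPD-IMGT/HLA P-group data.
TOY_ARD = {"A*01:510": "A*01:01"}

print(stage1("A*01:01", "A*02:01", TOY_ARD))   # ANTIGEN_MISMATCH
print(stage1("A*01:01", "A*01:510", TOY_ARD))  # ARD_MATCH
print(stage1("B*07", "B*07:05", TOY_ARD))      # NOT_ASSESSABLE
```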

Stage 2: Match Refinement (ARD-matched pairs only)

Pairs classified as ARD_MATCH are refined along two independent dimensions:

ARD match level: identity at the antigen recognition domain:

Level	Meaning
`P_GROUP_MATCH`	Identical ARD amino acid sequence
`G_GROUP_MATCH`	Identical ARD nucleotide sequence

Molecular match level: depth of sequence identity beyond ARD:

Level	Condition	Example
`NOT_ASSESSABLE`	Only 2 fields typed	`A*02:01` vs `A*02:01`
`ARD_MATCH_ONLY`	Field 2 differs but alleles share a P group	`A*01:01` vs `A*01:510`
`FULL_PROTEIN_MATCH`	Fields 1–2 identical, field 3 differs or untyped	`A*02:01:01` vs `A*02:01:02`
`CODING_SEQUENCE_MATCH`	Fields 1–3 identical, field 4 differs or untyped	`A*02:01:01:01` vs `A*02:01:01:02`
`EXACT_ALLELE_MATCH`	All 4 fields identical	`A*02:01:01:01` vs `A*02:01:01:01`

Certainty

Each level carries a certainty indicator:

CERTAIN: typing resolution is sufficient to confirm the level
UNCERTAIN: a higher level remains possible given untyped fields
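
How a molecular match level and its certainty could follow from a field-by-field comparison is sketched below. The rules are inferred from the table and definitions above and are not the library's implementation. The function names are hypothetical, only pairs typed to at least three fields with identical fields 1 and 2 are covered, and expression suffixes are compared as plain text.

```python
from typing import List, Tuple

LEVEL_BY_SHARED_FIELDS = {
    2: "FULL_PROTEIN_MATCH",
    3: "CODING_SEQUENCE_MATCH",
    4: "EXACT_ALLELE_MATCH",
}


def typed_fields(name: str) -> List[str]:
    """'A*02:01:01' -> ['02', '01', '01'] (locus dropped, at most 4 fields)."""
    return name.partition("*")[2].split(":")[:4]


def molecular_level(a: str, b: str) -> Tuple[str, str]:
    """Return (level, certainty) for two alleles that agree on fields 1-2."""
    fa, fb = typed_fields(a), typed_fields(b)
    typed = min(len(fa), len(fb))  # fields available on both sides
    if typed < 3 or fa[:2] != fb[:2]:
        raise ValueError("sketch needs >= 3 typed fields and identical fields 1-2")
    shared = 2
    while shared < typed and fa[shared] == fb[shared]:
        shared += 1
    level = LEVEL_BY_SHARED_FIELDS[shared]
    # Certain when all 4 fields agree, or when a typed field is seen to differ.
    # Otherwise the comparison ran out of typed fields and a higher level is possible.
    certain = shared == 4 or shared < typed
    return level, ("CERTAIN" if certain else "UNCERTAIN")


print(molecular_level("A*01:01:01:01", "A*01:01:01:03"))  # ('CODING_SEQUENCE_MATCH', 'CERTAIN')
print(molecular_level("A*01:01:01", "A*01:01:01:03"))     # ('CODING_SEQUENCE_MATCH', 'UNCERTAIN')
```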

Examples

The examples below illustrate key design choices that Py-HLA-Match makes explicit.

Each is drawn directly from the test suite and is independently reproducible.

Resolution-aware certainty

The same ARD match is classified differently depending on typing depth:

from py_hla_match.hla import HLA
from py_hla_match.matching import allele_pair_match
from py_hla_match.models import HLAPair

# 4-field identical -> EXACT_ALLELE_MATCH, CERTAIN
patient = HLAPair(HLA("A*01:01:01:01"), HLA("A*02:01:01:01"))
donor   = HLAPair(HLA("A*02:01:01:01"), HLA("A*01:01:01:01"))
r = allele_pair_match(patient, donor)
# r.molecular_match_levels      -> (EXACT_ALLELE_MATCH, EXACT_ALLELE_MATCH)
# r.molecular_match_certainties -> (CERTAIN, CERTAIN)

# 4-field, field 4 differs -> CODING_SEQUENCE_MATCH, CERTAIN
patient = HLAPair(HLA("A*01:01:01:01"), HLA("A*01:01:01:04"))
donor   = HLAPair(HLA("A*01:01:01:03"), HLA("A*01:01:01:05"))
r = allele_pair_match(patient, donor)
# r.molecular_match_levels      -> (CODING_SEQUENCE_MATCH, CODING_SEQUENCE_MATCH)
# r.molecular_match_certainties -> (CERTAIN, CERTAIN)

# 3-field vs 4-field -> CODING_SEQUENCE_MATCH, UNCERTAIN
patient = HLAPair(HLA("A*01:01:01"), HLA("A*01:02:01"))
donor   = HLAPair(HLA("A*01:02:01:01"), HLA("A*01:01:01:03"))
r = allele_pair_match(patient, donor)
# r.molecular_match_levels      -> (CODING_SEQUENCE_MATCH, CODING_SEQUENCE_MATCH)
# r.molecular_match_certainties -> (UNCERTAIN, UNCERTAIN)

# 2-field identical -> FULL_PROTEIN_MATCH, UNCERTAIN
patient = HLAPair(HLA("A*01:01"), HLA("A*01:02"))
donor   = HLAPair(HLA("A*01:02"), HLA("A*01:01"))
r = allele_pair_match(patient, donor)
# r.molecular_match_levels      -> (FULL_PROTEIN_MATCH, FULL_PROTEIN_MATCH)
# r.molecular_match_certainties -> (UNCERTAIN, UNCERTAIN)

ARD equivalence is not sequence identity

Alleles with different names can share the same ARD reduction. Py-HLA-Match explicitly distinguishes immunological equivalence from sequence identity:

# A*02:01 and A*02:09 share the same G-group but differ at field 2
patient = HLAPair(HLA("A*02:01:01"), HLA("A*02:01:01"))
donor   = HLAPair(HLA("A*02:09:01"), HLA("A*02:09:01"))
r = allele_pair_match(patient, donor)
# r.allele_match_levels    -> (ARD_MATCH, ARD_MATCH)
# r.ard_match_levels       -> (G_GROUP_MATCH, G_GROUP_MATCH)
# r.molecular_match_levels -> (ARD_MATCH_ONLY, ARD_MATCH_ONLY)

ARD_MATCH_ONLY indicates that the alleles are equivalent at the antigen recognition domain but differ in their full sequence.

Expression suffix policy

Expression suffixes (N, L, S, C, A, Q) are evaluated under a configurable policy. The default treats risk-associated suffixes as functional mismatches and questionable expression (Q) as not assessable:

from py_hla_match.hla import HLA
from py_hla_match.matching import allele_match

# Null allele vs expressed -> ALLELE_MISMATCH (default)
allele_match(HLA("C*03:693"), HLA("C*03:20N"))   # -> ALLELE_MISMATCH

# Questionable expression -> NOT_ASSESSABLE (default)
allele_match(HLA("A*01:436Q"), HLA("A*01:01:70"))  # -> NOT_ASSESSABLE

Override the default to apply center-specific conventions:

from py_hla_match.policy import ExpressionSuffixPolicy, ExpressionSuffixMatchLevel
from py_hla_match.config import HLAMatchConfig, set_config

set_config(HLAMatchConfig(expression_suffix_policy=ExpressionSuffixPolicy(
    q_present=ExpressionSuffixMatchLevel.ALLELE_MISMATCH,
)))
allele_match(HLA("A*01:436Q"), HLA("A*01:01:70"))  # -> ALLELE_MISMATCH
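
The idea of a suffix policy can be illustrated without the library: extract the trailing expression suffix from an allele name and look up an outcome in a mapping that callers may override. Everything in this sketch is hypothetical (`expression_suffix`, `suffix_outcome`, `DEFAULT_POLICY`). It only models the two defaults shown in the examples above (N and Q) and only checks whether a listed suffix is present on either allele.

```python
import re
from typing import Mapping, Optional

# A suffix letter counts only when it directly follows a digit at the end of the name.
_SUFFIX = re.compile(r"\d([NLSCAQ])$")

# Only the two defaults illustrated in this README are modelled here.
DEFAULT_POLICY = {"N": "ALLELE_MISMATCH", "Q": "NOT_ASSESSABLE"}


def expression_suffix(name: str) -> Optional[str]:
    """Return the expression suffix of an allele name, or None if there is none."""
    found = _SUFFIX.search(name)
    return found.group(1) if found else None


def suffix_outcome(a: str, b: str,
                   policy: Mapping[str, str] = DEFAULT_POLICY) -> Optional[str]:
    """Return the policy outcome if either allele carries a listed suffix, else None."""
    for name in (a, b):
        suffix = expression_suffix(name)
        if suffix in policy:
            return policy[suffix]
    return None


print(suffix_outcome("C*03:693", "C*03:20N"))     # ALLELE_MISMATCH
print(suffix_outcome("A*01:436Q", "A*01:01:70"))  # NOT_ASSESSABLE

# A center-specific override, passed explicitly rather than set globally.
strict = {**DEFAULT_POLICY, "Q": "ALLELE_MISMATCH"}
print(suffix_outcome("A*01:436Q", "A*01:01:70", strict))  # ALLELE_MISMATCH
```

Passing the policy as an argument keeps the sketch free of global state. The library itself uses `set_config`, as shown above.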

DRB3/4/5 sublocus mismatch

The DRB3/4/5 region is normalized to a shared DRB345 locus and is given an additional mismatch class:

# Different subloci within DRB345
allele_match(HLA("DRB3*02:02:01"), HLA("DRB4*01:03:01"))  # -> DRB345_SUBLOCUS_MISMATCH

# Present sublocus vs non-expressed marker
allele_match(HLA("DRB3*01"), HLA("DRBX*NE"))              # -> DRB345_SUBLOCUS_MISMATCH
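
Locus normalization of this kind can be sketched as a simple mapping from sublocus to shared locus. The names below (`DRB345_SUBLOCI`, `normalize_locus`, `is_sublocus_mismatch`) are hypothetical and do not describe the library's internals. Treating `DRBX` as a member of the shared locus is an assumption based on the second example above.

```python
from typing import Tuple

# DRBX is included on the assumption that the non-expressed marker belongs here.
DRB345_SUBLOCI = {"DRB3", "DRB4", "DRB5", "DRBX"}


def normalize_locus(name: str) -> Tuple[str, str]:
    """Return (normalized locus, original locus) for an allele name."""
    locus = name.partition("*")[0]
    normalized = "DRB345" if locus in DRB345_SUBLOCI else locus
    return normalized, locus


def is_sublocus_mismatch(a: str, b: str) -> bool:
    """True when both alleles fall under DRB345 but name different subloci."""
    (norm_a, locus_a), (norm_b, locus_b) = normalize_locus(a), normalize_locus(b)
    return norm_a == norm_b == "DRB345" and locus_a != locus_b


print(is_sublocus_mismatch("DRB3*02:02:01", "DRB4*01:03:01"))  # True
print(is_sublocus_mismatch("DRB3*01:01", "DRB3*02:02"))        # False
```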

Insufficient resolution

When typing resolution is too low for ARD comparison, the result is explicitly flagged rather than silently excluded or assumed to match:

# 1-field cannot confirm ARD equivalence even within the same allele group
allele_match(HLA("B*07"), HLA("B*07:05"))  # -> NOT_ASSESSABLE

# Missing data
allele_match(HLA("A*NE"), HLA("A*01:01"))  # -> NOT_ASSESSABLE

Terminology

Py-HLA-Match uses domain terms such as patient, donor, and score to mirror the structure of typical transplant research datasets (e.g. HSCT retrospective cohorts). These terms refer exclusively to roles and fields in research data and do not imply that Py-HLA-Match implements, recommends, or automates any clinical donor-selection or patient-management workflow.

All match levels and related outputs produced by the library are research metrics derived from HLA nomenclature semantics. They are not clinical risk scores or decision criteria.

Development

Prerequisites

Install Poetry:

curl -sSL https://install.python-poetry.org | python3 -

On Windows (PowerShell):

(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -

Setup

git clone https://github.com/fraunhofer-izi/py-hla-match.git
cd py-hla-match
poetry install

Running tests

poetry run pytest

License

Licensed under the Apache License, Version 2.0. You may obtain a copy of the License in the LICENSE file or at http://www.apache.org/licenses/LICENSE-2.0.

Release history

0.1.0 (this version): May 18, 2026

0.0.1: Apr 15, 2026

Download files

Source Distribution

py_hla_match-0.1.0.tar.gz (35.8 kB)

Uploaded May 18, 2026 Source

Built Distribution

py_hla_match-0.1.0-py3-none-any.whl (36.3 kB)

Uploaded May 18, 2026 Python 3

File details

Details for the file py_hla_match-0.1.0.tar.gz.

File metadata

Download URL: py_hla_match-0.1.0.tar.gz
Upload date: May 18, 2026
Size: 35.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.12.13 Linux/6.17.0-1013-azure

File hashes

Hashes for py_hla_match-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`929bc2d04da3cadd6e73c4386f93a37e2886d96308d21ba3c0d20e8159945d4f`
MD5	`a5376bf1d4c3c739962d7655f73e3a5b`
BLAKE2b-256	`1c7f22a6a679e04594f0e9d25cdfcc8f883626b6aacf8a8aa3f5cd1c6c818920`

File details

Details for the file py_hla_match-0.1.0-py3-none-any.whl.

File metadata

Download URL: py_hla_match-0.1.0-py3-none-any.whl
Upload date: May 18, 2026
Size: 36.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.12.13 Linux/6.17.0-1013-azure

File hashes

Hashes for py_hla_match-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9d5393c55476db221ecad0d1b1ca33b6732402e6d478077526d692b261962f82`
MD5	`06d151933028bd6506c7d060d0bbb60e`
BLAKE2b-256	`ada0c231bd9acfa6b57a3db81d7fd90fb5a926fa71aae690ee618f8cf0f0aad8`
