Py-HLA-Match: open-source research software for HLA matching.

Py-HLA-Match

About

Py-HLA-Match is a Python library for standardised, rule-based HLA (Human Leukocyte Antigen) matching in retrospective analyses, method development, benchmarking, and in-silico studies in immunogenetics and related fields.

Regulatory Notice

Py-HLA-Match is not certified or conformity assessed as a medical device software or in-vitro medical device software and is intended for research use only. It must therefore not be used for diagnosis or therapy of patients.

For more details on intended use, scope, and limitations, see the Software Card.

Installation

Install from PyPI:

pip install py-hla-match

Quickstart

This quickstart uses the artificial CSVs bundled under the demo folder and avoids any real or sensitive data.

Run HLA matching (pairwise)

Use the synthetic patient and donor CSVs and write results to a new file:

from py_hla_match.parser import HLADataSource
from py_hla_match.export import PairwiseMatch

data_path = "py_hla_match/demo/data/random_data/synthetic_patients.csv"
donor_path = "py_hla_match/demo/data/random_data/synthetic_donors.csv"
output_path = "py_hla_match/demo/data/random_data/match_results.csv"

src = HLADataSource(
    data_path,
    col_idx_start=1,
    col_idx_stop=13,
    row_idx_start=1,
)

tgt = HLADataSource(
    donor_path,
    col_idx_start=1,
    col_idx_stop=13,
    row_idx_start=1,
)

matcher = PairwiseMatch(
    source=src,
    target=tgt,
    storage_filename=output_path,
    include_ard_details=True,
    include_molecular_details=True,
    include_dpb1_tce=False,
    include_homozygosity=False,
    overwrite=True,
)

matcher.run()

Inspect raw allele-level results

Convert raw match levels to a DataFrame and preview the first rows:

df = matcher.to_df()
print(df.head())

Matching Logic

Py-HLA-Match classifies each donor–recipient allele pair in two stages that follow IPD-IMGT/HLA nomenclature semantics.

Stage 1: Mismatch Detection

Both alleles are reduced to their ARD (antigen recognition domain) equivalent via P-group affiliation. If ARD representations differ:

Condition | Classification
Field 1 (allele group) differs | ANTIGEN_MISMATCH
Field 1 identical, ARD field 2 differs | ALLELE_MISMATCH
DRB3/4/5: same broad locus, different sublocus | SUBLOCUS_MISMATCH
Insufficient resolution for comparison | NOT_ASSESSABLE
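
The Stage 1 decision order can be sketched in a few lines of standalone Python. This is a simplified illustration and not the library's implementation: `TOY_P_GROUPS`, `parse`, `ard`, and `stage1` are hypothetical names, the P-group lookup is a two-entry stand-in for the real IPD-IMGT/HLA table, and expression suffixes and missing-data markers such as `NE` are not handled.

```python
# Illustrative sketch only: NOT the Py-HLA-Match implementation.
# TOY_P_GROUPS is a tiny hypothetical stand-in for the IPD-IMGT/HLA P-group table.
TOY_P_GROUPS = {
    "A*02:01": "A*02:01P",
    "A*02:09": "A*02:01P",
}
DRB345 = {"DRB3", "DRB4", "DRB5"}


def parse(name):
    """Split 'A*02:01:01' into ('A', ['02', '01', '01'])."""
    locus, _, rest = name.partition("*")
    return locus, rest.split(":")


def ard(locus, fields):
    """Reduce a typing with at least two fields to its toy ARD representation."""
    two_field = f"{locus}*{fields[0]}:{fields[1]}"
    # Alleles missing from the toy table fall back to their own two-field name.
    return TOY_P_GROUPS.get(two_field, two_field)


def stage1(a, b):
    """Classify one allele pair following the order of the table above."""
    locus_a, fields_a = parse(a)
    locus_b, fields_b = parse(b)
    if locus_a != locus_b:
        if locus_a in DRB345 and locus_b in DRB345:
            return "SUBLOCUS_MISMATCH"
        raise ValueError("alleles from different loci cannot be compared")
    # Conservative assumption of this sketch: fewer than two typed fields
    # is never assessable, even if field 1 already differs.
    if len(fields_a) < 2 or len(fields_b) < 2:
        return "NOT_ASSESSABLE"
    if fields_a[0] != fields_b[0]:
        return "ANTIGEN_MISMATCH"
    if ard(locus_a, fields_a) == ard(locus_b, fields_b):
        return "ARD_MATCH"
    return "ALLELE_MISMATCH"


print(stage1("A*01:01", "A*02:01"))              # ANTIGEN_MISMATCH
print(stage1("A*02:01:01", "A*02:09:01"))        # ARD_MATCH
print(stage1("B*07", "B*07:05"))                 # NOT_ASSESSABLE
print(stage1("DRB3*02:02:01", "DRB4*01:03:01"))  # SUBLOCUS_MISMATCH
```

Only pairs that come out as ARD_MATCH would move on to Stage 2; every other outcome is final in this sketch.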

Stage 2: Match Refinement (ARD-matched pairs only)

Pairs classified as ARD_MATCH are refined along two independent dimensions:

ARD match level: identity at the antigen recognition domain:

Level | Meaning
P_GROUP_MATCH | Identical ARD amino acid sequence
G_GROUP_MATCH | Identical ARD nucleotide sequence

Molecular match level: depth of sequence identity beyond ARD:

Level | Condition | Example
NOT_ASSESSABLE | Only 2 fields typed | A*02:01 vs A*02:01
ARD_MATCH_ONLY | Field 2 differs but alleles share a P group | A*01:01 vs A*01:510
FULL_PROTEIN_MATCH | Fields 1–2 identical, field 3 differs or untyped | A*02:01:01 vs A*02:01:02
CODING_SEQUENCE_MATCH | Fields 1–3 identical, field 4 differs or untyped | A*02:01:01:01 vs A*02:01:01:02
EXACT_ALLELE_MATCH | All 4 fields identical | A*02:01:01:01 vs A*02:01:01:01

Certainty

Each level carries a certainty indicator:

  • CERTAIN: typing resolution is sufficient to confirm the level
  • UNCERTAIN: a higher level remains possible given untyped fields
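
How level and certainty interact can be illustrated with a small field-by-field comparison. The sketch below is an assumption-laden simplification, not the library's code: `fields_of` and `molecular_level` are hypothetical helpers, the pair is assumed to have already passed Stage 1 as ARD_MATCH, and expression suffixes are ignored. Two identical 2-field typings are reported here as FULL_PROTEIN_MATCH with UNCERTAIN, mirroring the resolution-aware examples later in this document.

```python
# Illustrative sketch only: NOT the Py-HLA-Match implementation.
# Assumes the pair already passed Stage 1 as ARD_MATCH and has no expression suffix.
LEVEL_BY_IDENTICAL_FIELDS = {
    2: "FULL_PROTEIN_MATCH",
    3: "CODING_SEQUENCE_MATCH",
    4: "EXACT_ALLELE_MATCH",
}


def fields_of(name):
    """Return the colon-separated fields after the locus, e.g. ['02', '01']."""
    return name.partition("*")[2].split(":")


def molecular_level(a, b):
    """Return (level, certainty) for two ARD-matched typings."""
    fa, fb = fields_of(a), fields_of(b)
    shared = min(len(fa), len(fb), 4)
    if shared < 2:
        raise ValueError("at least two typed fields are required on both sides")
    for i in range(shared):
        if fa[i] != fb[i]:
            if i == 0:
                raise ValueError("field 1 differs: not an ARD-matched pair")
            if i == 1:
                # Assumption: an observed field-2 difference rules out any
                # higher level, so the result is treated as CERTAIN.
                return "ARD_MATCH_ONLY", "CERTAIN"
            # i leading fields are identical and the next one is known to differ.
            return LEVEL_BY_IDENTICAL_FIELDS[i], "CERTAIN"
    # No difference seen within the shared depth: untyped fields could still
    # agree, so a higher level remains possible unless all 4 fields are typed.
    certainty = "CERTAIN" if shared == 4 else "UNCERTAIN"
    return LEVEL_BY_IDENTICAL_FIELDS[shared], certainty


print(molecular_level("A*01:01:01:01", "A*01:01:01:03"))  # ('CODING_SEQUENCE_MATCH', 'CERTAIN')
print(molecular_level("A*01:01:01", "A*01:01:01:03"))     # ('CODING_SEQUENCE_MATCH', 'UNCERTAIN')
print(molecular_level("A*01:01", "A*01:01"))              # ('FULL_PROTEIN_MATCH', 'UNCERTAIN')
```

The point of the sketch is that certainty follows from whether a difference was actually observed: an observed difference fixes the level, while agreement over a truncated typing leaves higher levels open.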

Examples

The examples below illustrate key design choices that Py-HLA-Match makes explicit.

Each is drawn directly from the test suite and is independently reproducible.

Resolution-aware certainty

The same ARD match is classified differently depending on typing depth:

from py_hla_match.hla import HLA
from py_hla_match.matching import allele_pair_match
from py_hla_match.models import HLAPair

# 4-field identical -> EXACT_ALLELE_MATCH, CERTAIN
patient = HLAPair(HLA("A*01:01:01:01"), HLA("A*02:01:01:01"))
donor   = HLAPair(HLA("A*02:01:01:01"), HLA("A*01:01:01:01"))
r = allele_pair_match(patient, donor)
# r.molecular_match_levels      -> (EXACT_ALLELE_MATCH, EXACT_ALLELE_MATCH)
# r.molecular_match_certainties -> (CERTAIN, CERTAIN)

# 4-field, field 4 differs -> CODING_SEQUENCE_MATCH, CERTAIN
patient = HLAPair(HLA("A*01:01:01:01"), HLA("A*01:01:01:04"))
donor   = HLAPair(HLA("A*01:01:01:03"), HLA("A*01:01:01:05"))
r = allele_pair_match(patient, donor)
# r.molecular_match_levels      -> (CODING_SEQUENCE_MATCH, CODING_SEQUENCE_MATCH)
# r.molecular_match_certainties -> (CERTAIN, CERTAIN)

# 3-field vs 4-field -> CODING_SEQUENCE_MATCH, UNCERTAIN
patient = HLAPair(HLA("A*01:01:01"), HLA("A*01:02:01"))
donor   = HLAPair(HLA("A*01:02:01:01"), HLA("A*01:01:01:03"))
r = allele_pair_match(patient, donor)
# r.molecular_match_levels      -> (CODING_SEQUENCE_MATCH, CODING_SEQUENCE_MATCH)
# r.molecular_match_certainties -> (UNCERTAIN, UNCERTAIN)

# 2-field identical -> FULL_PROTEIN_MATCH, UNCERTAIN
patient = HLAPair(HLA("A*01:01"), HLA("A*01:02"))
donor   = HLAPair(HLA("A*01:02"), HLA("A*01:01"))
r = allele_pair_match(patient, donor)
# r.molecular_match_levels      -> (FULL_PROTEIN_MATCH, FULL_PROTEIN_MATCH)
# r.molecular_match_certainties -> (UNCERTAIN, UNCERTAIN)

ARD equivalence is not sequence identity

Alleles with different names can share the same ARD reduction. Py-HLA-Match explicitly distinguishes immunological equivalence from sequence identity:

# A*02:01 and A*02:09 share the same G-group but differ at field 2
patient = HLAPair(HLA("A*02:01:01"), HLA("A*02:01:01"))
donor   = HLAPair(HLA("A*02:09:01"), HLA("A*02:09:01"))
r = allele_pair_match(patient, donor)
# r.allele_match_levels    -> (ARD_MATCH, ARD_MATCH)
# r.ard_match_levels       -> (G_GROUP_MATCH, G_GROUP_MATCH)
# r.molecular_match_levels -> (ARD_MATCH_ONLY, ARD_MATCH_ONLY)

ARD_MATCH_ONLY indicates that the alleles are equivalent at the antigen recognition domain but differ in their full sequence.

Expression suffix policy

Expression suffixes (N, L, S, C, A, Q) are evaluated under a configurable policy. The default treats risk-associated suffixes as functional mismatches and questionable expression (Q) as not assessable:

from py_hla_match.hla import HLA
from py_hla_match.matching import allele_match

# Null allele vs expressed -> ALLELE_MISMATCH (default)
allele_match(HLA("C*03:693"), HLA("C*03:20N"))   # -> ALLELE_MISMATCH

# Questionable expression -> NOT_ASSESSABLE (default)
allele_match(HLA("A*01:436Q"), HLA("A*01:01:70"))  # -> NOT_ASSESSABLE

Override the default to apply center-specific conventions:

from py_hla_match.policy import ExpressionSuffixPolicy, ExpressionSuffixMatchLevel
from py_hla_match.config import HLAMatchConfig, set_config

set_config(HLAMatchConfig(expression_suffix_policy=ExpressionSuffixPolicy(
    q_present=ExpressionSuffixMatchLevel.ALLELE_MISMATCH,
)))
allele_match(HLA("A*01:436Q"), HLA("A*01:01:70"))  # -> ALLELE_MISMATCH
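
The idea of a configurable suffix policy can be shown without the library. The following is a minimal standalone sketch under stated assumptions: `DEFAULT_POLICY`, `suffix_of`, and `suffix_outcome` are hypothetical names and do not correspond to `ExpressionSuffixPolicy` or any other class in Py-HLA-Match. Only the two defaults documented above (N and Q) are encoded; the remaining suffixes are left unconfigured because this document does not list their default outcomes.

```python
import re

# Illustrative sketch only: NOT the Py-HLA-Match policy classes.
# Only the two defaults stated in this document are encoded here.
DEFAULT_POLICY = {"N": "ALLELE_MISMATCH", "Q": "NOT_ASSESSABLE"}

# An expression suffix is a single letter directly after the last digit.
_SUFFIX = re.compile(r"\d([NLSCAQ])$")


def suffix_of(name):
    """Return the expression suffix of an allele name, or None."""
    match = _SUFFIX.search(name)
    return match.group(1) if match else None


def suffix_outcome(a, b, policy=None):
    """Return the configured outcome for a pair, or None if no rule applies."""
    policy = DEFAULT_POLICY if policy is None else policy
    suffix_a, suffix_b = suffix_of(a), suffix_of(b)
    if suffix_a == suffix_b:
        # Same expression status on both sides: no override in this sketch.
        return None
    for suffix in (suffix_a, suffix_b):
        if suffix in policy:
            return policy[suffix]
    return None


print(suffix_outcome("C*03:693", "C*03:20N"))     # ALLELE_MISMATCH
print(suffix_outcome("A*01:436Q", "A*01:01:70"))  # NOT_ASSESSABLE

# Centre-specific override: treat questionable expression as a mismatch.
strict = {**DEFAULT_POLICY, "Q": "ALLELE_MISMATCH"}
print(suffix_outcome("A*01:436Q", "A*01:01:70", strict))  # ALLELE_MISMATCH
```

Keeping the policy as plain data makes the override a one-line change and leaves the comparison logic untouched, which is the same separation the configuration example above relies on.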

DRB3/4/5 sublocus mismatch

The DRB3/4/5 region is normalized to a shared DRB345 locus and is given an additional mismatch class:

# Different subloci within DRB345
allele_match(HLA("DRB3*02:02:01"), HLA("DRB4*01:03:01"))  # -> DRB345_SUBLOCUS_MISMATCH

# Present sublocus vs non-expressed marker
allele_match(HLA("DRB3*01"), HLA("DRBX*NE"))              # -> DRB345_SUBLOCUS_MISMATCH

Insufficient resolution

When typing resolution is too low for ARD comparison, the result is explicitly flagged rather than silently excluded or assumed to match:

# 1-field cannot confirm ARD equivalence even within the same allele group
allele_match(HLA("B*07"), HLA("B*07:05"))  # -> NOT_ASSESSABLE

# Missing data
allele_match(HLA("A*NE"), HLA("A*01:01"))  # -> NOT_ASSESSABLE

Terminology

Py-HLA-Match uses domain terms such as patient, donor, and score to mirror the structure of typical transplant research datasets (e.g. HSCT retrospective cohorts). These terms refer exclusively to roles and fields in research data and do not imply that Py-HLA-Match implements, recommends, or automates any clinical donor-selection or patient-management workflow.

All match levels and related outputs produced by the library are research metrics derived from HLA nomenclature semantics. They are not clinical risk scores or decision criteria.

Development

Prerequisites

Install Poetry:

curl -sSL https://install.python-poetry.org | python3 -

On Windows (PowerShell):

(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -

Setup

git clone https://github.com/fraunhofer-izi/py-hla-match.git
cd py-hla-match
poetry install

Running tests

poetry run pytest

License

Copyright 2025 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.

Licensed under the Apache License, Version 2.0. You may obtain a copy of the License in the LICENSE file or at http://www.apache.org/licenses/LICENSE-2.0.
