Skip to main content

Python wrapper for NetMHCpan-4.2 binding predictions

Project description

Python NetMHCpan Wrapper

Seamless integration of NetMHCpan binding predictions into pandas-based bioinformatics workflows.

Description

This package provides a user-friendly Python interface to NetMHCpan-4.2 (standalone version).
It handles peptide-MHC class I binding affinity predictions while solving problems of inconsistent HLA nomenclature and missing HLA information.

The wrapper supports two main prediction regimes: single allele and pan/supergroup prediction, with built-in parallel processing for speed.

Goals

  • Make NetMHCpan easy to use inside Python/pandas scripts and notebooks
  • Automatically fix inconsistent HLA allele names (e.g. HLA-A*01:01, A0101, A01)
  • Enable pan-prediction when only broad allele families are known

Supported Regimes

1. Single Mode (single)

  • Requires an HLA column
  • Automatically standardizes allele names using a mapping file
  • Predicts binding for the correctly matched allele

Best for: Standard epitope-HLA tables from IEDB, VDJdb, or custom datasets.

2. Pan / Supergroup Mode (pan)

  • Works with or without a helper (supergroup) column
  • Supports predictions for Homo sapiens ('hs'), Mus musculus ('mmu) or pan-prediction without species specification
  • If a valid family is given (e.g. HLA-A01, HLA-B15, HLA-E01), searches only inside that family
  • If no family or unknown family → performs a smart two-round search:
    1. First round: finds the best supergroup using representative alleles
    2. Second round: finds the best allele inside the winning supergroup
  • Handles HLA-free epitope lists

Best for: Large epitope sets with partial or no HLA information.

Directory Structure

mhc_netmhcpan_wrapper/
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── run_netmhcpan.py          # Main high-level function
│   ├── single_predictor.py
|   ├── mhc_supergroups_hs.txt
|   ├── mhc_supergroups_mmu.txt
│   └── pan_predictor.py
├── datasets/
│   ├── allele_nomenclature_mapping.txt
│   └── mhc_supergroups.txt
├── tmp/                          # Auto-created for temporary files
├── testing.ipynb                 # Usage examples
├── Netmchpan_wrapper_development_notebook.ipynb
├── README.md
└── ...

Quick Start Tutorial

1. Configuration

Edit src/config.py and set the number of cores:

from pathlib import Path

TMPDIR = Path('../tmp')
PATH_TO_SUPERGROUPS = Path('../datasets/mhc_supergroups.txt')
PATH_TO_MAPPING = Path('../datasets/allele_nomenclature_mapping.txt')
N_CORES = 8

2. Basic Usage

import pandas as pd
from pathlib import Path
from src.run_netmhcpan import run_netmhcpan

PATH_TO_NETMHCPAN = Path("/path/to/netMHCpan-4.2bstatic.Linux/netMHCpan-4.2")

# === Single mode ===
df = pd.read_csv("your_data.tsv", sep="\t")

result = run_netmhcpan(
    prediction_mode='single',
    path_to_netmhcpan=str(PATH_TO_NETMHCPAN),
    df=df,
    epitope_colname='antigen.epitope',
    hla_column='hla'
)

# === Pan mode ===
pan_df = pd.DataFrame({
    'epitope': ['LIDGIFLRY', 'VMADRTRHL', 'ANADLEVKI'],
    'family': [None, 'HLA-E01', 'HLA-C06']
})

result_pan = run_netmhcpan(
    prediction_mode='pan',
    path_to_netmhcpan=str(PATH_TO_NETMHCPAN),
    df=pan_df,
    epitope_colname='epitope',
    supergroup_column='family'   # can be None for pure pan search
    species = 'None' #can be hs or mmu for species-restricted pan-prediction
)

Using Classes Directly (Advanced)

from src.single_predictor import SinglePredictor

single = SinglePredictor(
    path_to_netmhcpan=PATH_TO_NETMHCPAN,
    path_to_mapping=Path("datasets/allele_nomenclature_mapping.txt"),
    tmpdir=Path("tmp"),
    n_cores=8
)

df = single.match_hla(df, 'hla')
df = single.predict_affinity_dataframe(df, 'epitope', 'matched_hla')

Important Notes

  • You must have NetMHCpan-4.2 installed and working from the command line (netMHCpan -h)
  • The wrapper calls the binary via subprocess
  • Predictions are run in BA mode (-BA)
  • For very large datasets, adjust N_CORES according to your available CPU and memory

License

This wrapper is available under GNU General Public License.

NetMHCpan is distributed by DTU Health Tech (https://services.healthtech.dtu.dk/services/NetMHCpan-4.1/) — please respect their licensing terms.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

netmhcpanwrapper-0.1.2.tar.gz (308.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

netmhcpanwrapper-0.1.2-py3-none-any.whl (293.8 kB view details)

Uploaded Python 3

File details

Details for the file netmhcpanwrapper-0.1.2.tar.gz.

File metadata

  • Download URL: netmhcpanwrapper-0.1.2.tar.gz
  • Upload date:
  • Size: 308.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for netmhcpanwrapper-0.1.2.tar.gz
Algorithm Hash digest
SHA256 050705ccbc2375f5525dd7784d4a41f2306a946050ca6b9833525cdf9eb9a9d5
MD5 3e75ed7e9234f254c18be85707d81cdd
BLAKE2b-256 c8da03958ef25e6ecac7be689c67c3f0393cd31606883efdcfe030938eaba379

See more details on using hashes here.

File details

Details for the file netmhcpanwrapper-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for netmhcpanwrapper-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 feddee26d8781b44773fb7a6f55ca95682290396ab86bb5f36afb01e83188ff5
MD5 641dfd9307db003bbc1db25edbbc78fd
BLAKE2b-256 284128d0344ce1ab778f88deeded0e83bf67920e922cab21501c00935bc05a99

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page