
HLA typing based on T-cell beta chain repertoires and HLA mismatch score calculation.


THNet (TCR-based HLA similarity mapping network)

THNet is a Python 3.11 package with two functions. It infers HLA haplotypes from T-cell beta chain repertoire datasets, and it calculates mismatch scores (MS) from the HLA allele compositions of both donor and recipient to predict the transplantation outcome. The model can infer 208 HLA alleles from the T-cell beta chain repertoire, with higher accuracy for common alleles.

THNet is developed and maintained by the Li lab at the University of Pennsylvania. Please direct questions about THNet to Mingyao Pan: mingyaop@seas.upenn.edu.

THNet is written in Python 3.11, with the following dependencies:

  • [pandas]
  • [numpy]
  • [tqdm]
  • [scikit-learn]

Installation

THNet is available on PyPI and can be downloaded and installed through pip:

pip install THNet

THNet is also available on GitHub. The command line entry points can be installed by using the setup.py script:

$ python setup.py install

Directory architecture:

 THNet/
├── README.md
├── LICENSE
├── setup.py
├── MANIFEST.in
├── THNet/                 
│   ├── __init__.py
│   ├── load_model.py
│   ├── HLA_inference/   
│   │   ├── __init__.py
│   │   ├── HLA_inference.py
│   │   ├── model_prediction.py
│   │   ├── example/
│   │   │   └── input_example.csv
│   │   ├── models/
│   │   │   ├── models_1.pkl  
│   │   │   └── models_2.pkl 
│   │   └── parameter/
│   │       ├── fscore_dict.pkl
│   │       ├── hla_auc.pkl
│   │       ├── hla_list.pkl
│   │       ├── hla_threshold.pkl      
│   │       └── v_gene_list.pkl
│   └── Mismatch_score/    
│       ├── __init__.py
│       ├── calculate_MS.py
│       ├── example/
│       │   └── input_example.csv
│       └── parameter/
│           ├── class1_distance.pkl  
│           ├── class2_distance.pkl 
│           └── hla_list.pkl

Usage

Type THNet --help to display all the command line options.

THNet has two functions: HLA_inference and Mismatch_score.

HLA_inference

HLA_inference infers HLA alleles from an individual's T-cell beta chain repertoire.

Commands            Description
-i, --input_file    Load the input file that contains CDR3 beta sequences and V gene families from PATH/TO/FILE.
-o, --output_file   Write model output to PATH/TO/FOLDER.
-n, --Top_HLA_n     Output the top n most probable HLA alleles for each HLA type. Default: 3. Valid values of n are 1 to 5.

  • Input data format

The input file for HLA_inference is a comma-separated .csv file containing three columns: sample, cdr3, and v_gene. The V gene must be in IMGT format (TRBVXX-XX). The list of valid V genes can be checked in THNet/HLA_inference/parameter/v_gene_list.pkl.

cdr3,v_gene,sample
CAWSRGGVTGELFF,TRBV30,Sample1
CASKPMVNEQFF,TRBV19,Sample1
CASSLGAGLQETQYF,TRBV13,Sample1
CASSLSSGSSYNEQFF,TRBV27,Sample1
CASNAGLRDTQYF,TRBV2,Sample1
CASSAGTVVGNTIYF,TRBV5-1,Sample1

An example input file for HLA_inference is provided at THNet/HLA_inference/example/input_example.csv

Note: A single sample should have around 10,000 TCR sequences for better model performance. Samples with very few TCRs will yield minimal HLA inference results.
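The input requirements above can be checked before running the tool. The following is a minimal sketch, not part of THNet: the function name check_repertoire and the regular expression are assumptions made for illustration, and the authoritative V gene list remains v_gene_list.pkl, which this sketch does not read.

```python
import csv
import io
import re
from collections import Counter

REQUIRED_COLUMNS = {"sample", "cdr3", "v_gene"}
# Assumed pattern for names such as TRBV30 or TRBV5-1. The authoritative
# list lives in v_gene_list.pkl, which this sketch does not consult.
V_GENE_PATTERN = re.compile(r"^TRBV\d+(-\d+)?$")
MIN_TCRS = 10_000  # rough guideline taken from the note above


def check_repertoire(handle, min_tcrs=MIN_TCRS):
    """Return (per-sample TCR counts, oddly formatted V genes, small samples)."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    counts = Counter()
    bad_v_genes = set()
    for row in reader:
        counts[row["sample"]] += 1
        if not V_GENE_PATTERN.match(row["v_gene"]):
            bad_v_genes.add(row["v_gene"])
    # Samples below the guideline are reported, not rejected.
    small = sorted(s for s, n in counts.items() if n < min_tcrs)
    return counts, bad_v_genes, small


example = io.StringIO(
    "cdr3,v_gene,sample\n"
    "CAWSRGGVTGELFF,TRBV30,Sample1\n"
    "CASSAGTVVGNTIYF,TRBV5-1,Sample1\n"
)
counts, bad_v_genes, small = check_repertoire(example)
print(dict(counts), bad_v_genes, small)
```

For a real file, pass an open file handle instead of the in-memory example, e.g. open("input.csv", newline="").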

  • Output data format

The output includes two files: HLA_inference.csv, which contains the final HLA predictions, and Top_hlas.csv, listing the top n HLA alleles with the highest probabilities for each HLA type.

  • Demo usage

THNet HLA_inference -i input_file_path/input.csv -o output_folder_path -n 4

This command takes input.csv as input and writes the result files HLA_inference.csv and Top_hlas.csv to the output_folder_path folder.

Note: The input is a single file, whereas output_folder_path is a folder, because HLA_inference writes multiple output files.
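When the command is driven from a script, it can help to validate -n before launching. The sketch below only assembles the documented flags (-i, -o, -n); the helper names build_hla_inference_cmd and run_hla_inference are hypothetical, and creating the output folder in advance is an assumption, since the source does not say whether THNet creates it.

```python
import subprocess
from pathlib import Path


def build_hla_inference_cmd(input_csv, output_dir, top_n=3):
    """Build the argument list for `THNet HLA_inference` (documented flags only)."""
    if not 1 <= top_n <= 5:
        raise ValueError("top_n must be between 1 and 5")
    return [
        "THNet", "HLA_inference",
        "-i", str(input_csv),
        "-o", str(output_dir),
        "-n", str(top_n),
    ]


def run_hla_inference(input_csv, output_dir, top_n=3):
    """Run the CLI and return the two expected output paths."""
    cmd = build_hla_inference_cmd(input_csv, output_dir, top_n)
    # Assumption: creating the folder beforehand is harmless.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    out = Path(output_dir)
    return out / "HLA_inference.csv", out / "Top_hlas.csv"


print(build_hla_inference_cmd("input_file_path/input.csv", "output_folder_path", top_n=4))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.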

Mismatch_score

Mismatch_score calculates mismatch scores (MS) from the HLA allele compositions of both donor and recipient to predict the outcome of transplantation.

Commands            Description
-i, --input_file    Load the input file that contains the donor and recipient HLA alleles from PATH/TO/FILE.
-o, --output_file   Write model output to PATH/TO/FOLDER.
  • Input data format

The input file for Mismatch_score is a comma-separated .csv file containing 21 columns: TX_ID, followed by two allele columns per locus (A, B, C, DQB1, DRB1) for the recipient (Rec_) and for the donor (Don_):

TX_ID,Rec_A_1,Rec_A_2,Rec_B_1,Rec_B_2,Rec_C_1,Rec_C_2,Rec_DQB1_1,Rec_DQB1_2,Rec_DRB1_1,Rec_DRB1_2,Don_A_1,Don_A_2,Don_B_1,Don_B_2,Don_C_1,Don_C_2,Don_DQB1_1,Don_DQB1_2,Don_DRB1_1,Don_DRB1_2

An example input file for Mismatch_score is provided at THNet/Mismatch_score/example/input_example.csv

Note: Some HLA information may be missing, in which case a capital 'X' must be placed in the corresponding position of the table as a placeholder. All of the columns listed above are required. The mismatch score is calculated using the HLA alleles available for that transplantation pair. If all class I or all class II HLA alleles of the donor or recipient are missing, Class_I_MS or Class_II_MS, respectively, is set to zero.
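The placeholder rule can be illustrated with a short sketch that builds one input row and fills every missing allele with 'X'. The helper make_row and the allele strings are hypothetical; the spelling THNet expects for allele names should be taken from hla_list.pkl or the bundled example file.

```python
import csv
import io

LOCI = ["A", "B", "C", "DQB1", "DRB1"]
# Column order follows the header documented above.
COLUMNS = ["TX_ID"] + [
    f"{who}_{locus}_{copy}"
    for who in ("Rec", "Don")
    for locus in LOCI
    for copy in (1, 2)
]


def make_row(tx_id, recipient, donor):
    """recipient and donor map a locus name to a list of up to two alleles."""
    row = {"TX_ID": tx_id}
    for who, typing in (("Rec", recipient), ("Don", donor)):
        for locus in LOCI:
            alleles = list(typing.get(locus, []))[:2]
            alleles += ["X"] * (2 - len(alleles))  # pad missing alleles
            row[f"{who}_{locus}_1"], row[f"{who}_{locus}_2"] = alleles
    return row


# Hypothetical allele names, used only to show the layout.
row = make_row(
    "TX001",
    recipient={"A": ["A*02:01", "A*01:01"], "DRB1": ["DRB1*15:01"]},
    donor={"A": ["A*02:01"]},
)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```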

Note: So far, there are 208 valid HLA alleles (including both class I and II). The valid HLA alleles can be checked in the file THNet/Mismatch_score/parameter/hla_list.pkl. Any HLA allele not present in this list will trigger an error message.
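Alleles can be screened against the valid list before running Mismatch_score. This sketch assumes that hla_list.pkl unpickles to an iterable of allele strings, which the source does not confirm; the function names and the demo alleles are hypothetical.

```python
import pickle


def load_valid_alleles(path):
    """Load the allele list. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        # Assumption: the pickle holds an iterable of allele strings.
        return set(pickle.load(fh))


def find_unknown_alleles(row, valid):
    """Return {column: value} for alleles that are neither 'X' nor in valid."""
    return {
        col: val
        for col, val in row.items()
        if col != "TX_ID" and val != "X" and val not in valid
    }


# Hypothetical stand-in for the real list; with THNet installed one would
# call load_valid_alleles("THNet/Mismatch_score/parameter/hla_list.pkl").
valid_demo = {"A*02:01", "B*07:02"}
demo_row = {"TX_ID": "TX001", "Rec_A_1": "A*02:01", "Rec_A_2": "X", "Rec_B_1": "B*99:99"}
unknown = find_unknown_alleles(demo_row, valid_demo)
print(unknown)
```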

  • Output data format

The output includes one file: TX_Mismatch_score.csv, which contains three columns: TX_ID, Class_I_MS, and Class_II_MS. Class_I_MS and Class_II_MS indicate the mismatch score for HLA class I and HLA class II of the given TX_ID respectively.
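Reading the result back is straightforward with the standard library. The sketch below relies only on the three documented column names; it assumes the scores are written as plain numbers, and the values in the demo are invented for illustration.

```python
import csv
import io


def read_mismatch_scores(handle):
    """Map each TX_ID to a (Class_I_MS, Class_II_MS) tuple of floats."""
    scores = {}
    for row in csv.DictReader(handle):
        scores[row["TX_ID"]] = (
            float(row["Class_I_MS"]),
            float(row["Class_II_MS"]),
        )
    return scores


# Invented values; a real run would open TX_Mismatch_score.csv instead.
demo = io.StringIO("TX_ID,Class_I_MS,Class_II_MS\nTX001,1.5,0.0\n")
scores = read_mismatch_scores(demo)
print(scores)
```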

  • Demo usage

THNet Mismatch_score -i input_file_path/input.csv -o output_folder_path

This command takes input.csv as input and writes the result file TX_Mismatch_score.csv to the output_folder_path folder.

Note: The input is a single file, and output_folder_path specifies the folder where the output file will be stored.

