A Deep Learning Framework for TCR-Peptide Recognition Prediction

PepTCR-Net: Deep Learning for TCR-Peptide Recognition Prediction

[Figure: PepTCR-Net pipeline]

Python 3.8+ | TensorFlow 2.13+ | License: MIT

PepTCR-Net is a deep learning framework for predicting T-cell receptor (TCR) recognition of peptide antigens. It combines Bayesian neural networks with sequence, categorical, and network-based features to produce class predictions together with uncertainty estimates.

🌟 Key Features

  • Multi-modal Integration: Seamlessly combines sequence, categorical, and network-based features
  • Advanced Embeddings: Utilizes autoencoders, position encoding, and Atchley factors for sequence representation
  • Bayesian Neural Networks: Provides uncertainty quantification for predictions
  • Comprehensive Pipeline: End-to-end solution from data preprocessing to model deployment
  • Flexible Architecture: Modular design allows easy customization and extension
  • Class Imbalance Handling: Built-in support for imbalanced datasets
  • Rich Visualizations: Extensive plotting utilities for model interpretation
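
As an illustration of class imbalance handling, the sketch below computes inverse-frequency class weights, a common heuristic in which each class is weighted by n_samples / (n_classes * class_count). It is not taken from the PepTCR-Net source; the function name `balanced_class_weights` and the toy label counts are assumptions made for this example, and the package may handle imbalance differently.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy peptide labels (illustrative counts only): one frequent class, two rarer ones.
labels = ["GILGFVFTL"] * 6 + ["NLVPMVATV"] * 3 + ["ELAGIGILTV"] * 1
weights = balanced_class_weights(labels)

for label, weight in weights.items():
    print(label, round(weight, 3))
```

Rare classes receive larger weights, so each class contributes the same total weight to a weighted loss.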

🚀 Quick Start

Install from PyPI:

pip install peptcrnet

For notebook demos, also install the optional notebook dependencies: pip install peptcrnet[notebooks].

Run the Complete Demo (Easiest!)

# One-click demo launcher
./run_demo.sh

This launches the complete Scenario 17 demo using all features!

Installation

From PyPI (recommended)

pip install peptcrnet

For notebooks and demos: pip install peptcrnet[notebooks]

Requirements: Python 3.8–3.12 (Python 3.13 is not supported due to dependency constraints).

From source (development)

git clone https://github.com/mlizhangx/Pep-TCRNet.git
cd Pep-TCRNet
pip install -e ".[notebooks]"

# Run the demo
jupyter notebook DEMO_Complete_Pipeline.ipynb

Basic Usage

import peptcrnet
from peptcrnet import PepTCRNetPipeline

# Initialize pipeline
pipeline = PepTCRNetPipeline(data_path='your_data.csv')

# Load and prepare data
pipeline.load_data()
pipeline.split_data(test_size=0.2, val_size=0.1)

# Prepare features
pipeline.prepare_features(feature_types=['sequences', 'categorical'])

# Train model
history = pipeline.train(epochs=100, batch_size=128)

# Evaluate with uncertainty
results = pipeline.evaluate_with_uncertainty(n_samples=200)

# Make predictions (new_data: unseen samples, assumed to use the same columns as the training data)
predictions = pipeline.predict(new_data)

📊 Data Format

PepTCR-Net expects input data in CSV format with the following columns:

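| Column  | Description                     | Example       |
|---------|---------------------------------|---------------|
| CDR3    | TCR CDR3β sequence              | CASSRGQGNEQFF |
| Peptide | Peptide sequence or class label | GILGFVFTL     |
| V       | V gene segment                  | TRBV7-2       |
| J       | J gene segment                  | TRBJ2-1       |
| HLA-A   | HLA-A allele                    | A*02:01       |
| HLA-B   | HLA-B allele                    | B*07:02       |
| HLA-C   | HLA-C allele                    | C*07:01       |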
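
Before handing a file to the pipeline, it can help to check that the header contains these columns. The following is a minimal standard-library sketch, not part of the package; the helper name `missing_columns` is hypothetical, and treating all seven columns as required is an assumption.

```python
import csv
import io

# Column names as listed in the Data Format section.
REQUIRED_COLUMNS = ["CDR3", "Peptide", "V", "J", "HLA-A", "HLA-B", "HLA-C"]


def missing_columns(csv_file):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


# In-memory sample row built from the example values given for each column.
sample = io.StringIO(
    "CDR3,Peptide,V,J,HLA-A,HLA-B,HLA-C\n"
    "CASSRGQGNEQFF,GILGFVFTL,TRBV7-2,TRBJ2-1,A*02:01,B*07:02,C*07:01\n"
)
print(missing_columns(sample))  # []
```

For a file on disk, pass `open('your_data.csv', newline='')` instead of the `io.StringIO` object.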

🧪 Demo Notebook

Try our interactive demo notebook to see PepTCR-Net in action:

jupyter notebook demo_pipeline.ipynb

The demo includes:

  • Sample data generation
  • Step-by-step pipeline walkthrough
  • Model training and evaluation
  • Uncertainty quantification
  • Visualization examples

📚 Documentation

Pipeline Components

1. Data Loading and Preprocessing

from peptcrnet.data import DataLoader

loader = DataLoader('data.csv', atchley_path='atchley_factors.txt')
stats = loader.get_summary_stats()
splits = loader.split_data()
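
To make the split fractions concrete, here is a rough sketch of a shuffled train/validation/test split using the `test_size=0.2` and `val_size=0.1` values from Basic Usage. It assumes both fractions refer to the full dataset, which the README does not state, and `split_indices` is a hypothetical helper rather than the package's implementation.

```python
import random


def split_indices(n_rows, test_size=0.2, val_size=0.1, seed=0):
    """Shuffle row indices and cut them into train/val/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(n_rows * test_size))
    n_val = int(round(n_rows * val_size))
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 700 100 200
```

With imbalanced peptide classes, a stratified split that preserves class proportions in each subset is usually preferable to the plain shuffle shown here.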

2. Feature Engineering

from peptcrnet.embeddings import SequenceEmbedder, CategoricalEmbedder

# Sequence embeddings
seq_embedder = SequenceEmbedder(atchley_factors, max_length=30)
tcr_embeddings = seq_embedder.encode_sequences(tcr_sequences)

# Categorical embeddings
cat_embedder = CategoricalEmbedder()
cat_embeddings = cat_embedder.encode_features(categorical_data)
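
The general idea behind factor-based sequence encoding can be sketched as a per-residue table lookup followed by zero-padding to a fixed length. The code below is an assumption about the approach, not the `SequenceEmbedder` implementation; `encode_sequence` is a hypothetical helper, and the table holds made-up placeholder numbers rather than the published Atchley factors.

```python
def encode_sequence(sequence, factor_table, max_length=30):
    """Map each residue to its factor vector, then zero-pad to max_length rows."""
    n_factors = len(next(iter(factor_table.values())))
    # Truncate long sequences; an unknown residue raises KeyError.
    rows = [list(factor_table[residue]) for residue in sequence[:max_length]]
    padding = [[0.0] * n_factors for _ in range(max_length - len(rows))]
    return rows + padding


# Placeholder table: two invented factors per residue, NOT real Atchley values.
toy_table = {"C": (1.0, 0.0), "A": (0.0, 1.0), "S": (0.5, 0.5), "F": (-1.0, 0.0)}

encoded = encode_sequence("CASSF", toy_table, max_length=8)
print(len(encoded), len(encoded[0]))  # 8 2
```

Every sequence becomes a matrix of the same shape, which is what a fixed-size network input requires.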

3. Model Training

from peptcrnet.models import BayesianClassifier

model = BayesianClassifier(
    input_shapes={'sequences': (100,), 'categorical': (50,)},
    num_classes=5,
    hidden_dims=[512, 256, 64]
)

history = model.train(X_train, y_train, X_val, y_val)

4. Evaluation and Visualization

from peptcrnet.evaluation import ModelEvaluator
from peptcrnet.visualization import plot_confusion_matrix, plot_roc_curves

evaluator = ModelEvaluator()
metrics = evaluator.compute_metrics(y_true, y_pred, y_proba)

plot_confusion_matrix(y_true, y_pred)
plot_roc_curves(y_true, y_proba)
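
For readers who want to see what such metrics measure, the sketch below computes accuracy and macro-averaged recall in plain Python. Which metrics `compute_metrics` actually returns is not documented here, so the helper names and the toy labels are assumptions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Average per-class recall, so rare classes count as much as common ones."""
    pair_counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    support = Counter(y_true)                   # true label -> count
    recalls = [pair_counts[(label, label)] / n for label, n in support.items()]
    return sum(recalls) / len(recalls)


# Toy imbalanced labels: class 0 dominates and class 2 is always missed.
y_true = [0, 0, 0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0]

print(accuracy(y_true, y_pred))                # 0.875
print(round(macro_recall(y_true, y_pred), 3))  # 0.667
```

The gap between the two numbers shows why accuracy alone can be misleading on imbalanced peptide classes.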

⚙️ Configuration

PepTCR-Net uses a centralized configuration system:

from peptcrnet import config

# Access configuration
print(config.ModelParams.MAX_TCR_LENGTH)
print(config.TrainingParams.BATCH_SIZE)

# Save configuration
config.save_config('my_config.json')

# Load configuration
config.load_config('my_config.json')
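
A JSON round trip of this kind can be sketched with a dataclass and the standard library. `ToyConfig` and its default values are hypothetical stand-ins that borrow the parameter names shown above; they do not reflect how `peptcrnet.config` is implemented.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ToyConfig:
    # Hypothetical settings; the defaults are illustrative, not the package's.
    MAX_TCR_LENGTH: int = 30
    BATCH_SIZE: int = 128


def save_config(cfg, path):
    """Write the config fields to a JSON file."""
    with open(path, "w") as handle:
        json.dump(asdict(cfg), handle, indent=2)


def load_config(path):
    """Rebuild a ToyConfig from a JSON file written by save_config."""
    with open(path) as handle:
        return ToyConfig(**json.load(handle))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_config.json")
    save_config(ToyConfig(BATCH_SIZE=64), path)
    restored = load_config(path)

print(restored)  # ToyConfig(MAX_TCR_LENGTH=30, BATCH_SIZE=64)
```

Storing the configuration next to a trained model makes it easier to reproduce a run later.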

🔬 Advanced Features

Uncertainty Quantification

PepTCR-Net provides Bayesian uncertainty estimation:

# Multiple forward passes for uncertainty
predictions, uncertainty = pipeline.predict_with_uncertainty(
    test_data,
    n_samples=200
)

# Identify high-confidence predictions
threshold = 0.1  # example cutoff (an assumption); tune it on validation data
high_confidence_mask = uncertainty < threshold
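
One way to turn multiple stochastic forward passes into an uncertainty score is to average the class probabilities and take the entropy of the mean. The sketch below illustrates that idea with hand-written probabilities; whether PepTCR-Net uses entropy, variance, or another measure is not stated here, and `summarize_samples` is a hypothetical helper.

```python
import math


def summarize_samples(sample_probs):
    """Average class probabilities over passes; return (mean_probs, entropy)."""
    n_passes = len(sample_probs)
    n_classes = len(sample_probs[0])
    mean_probs = [sum(p[c] for p in sample_probs) / n_passes for c in range(n_classes)]
    # Predictive entropy in nats; higher means less certain.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0)
    return mean_probs, entropy


# Three hypothetical passes for one TCR over three peptide classes.
confident = [[0.90, 0.05, 0.05], [0.92, 0.04, 0.04], [0.88, 0.06, 0.06]]
uncertain = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]

_, h_confident = summarize_samples(confident)
_, h_uncertain = summarize_samples(uncertain)
print(h_confident < h_uncertain)  # True
```

Passes that agree yield low entropy, while passes that disagree push the mean toward uniform and raise it, which is what a threshold on uncertainty exploits.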

Custom Feature Combinations

Experiment with different feature combinations:

# Define feature cases
feature_cases = {
    1: ['TCR'],
    2: ['TCR', 'Peptide'],
    3: ['TCR', 'Peptide', 'HLA'],
    4: ['TCR', 'Peptide', 'HLA', 'VJ', 'Network']
}

# Train with specific features
pipeline.prepare_features(feature_types=feature_cases[3])
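
Conceptually, each feature case selects a set of per-sample feature blocks and concatenates them into one input vector. The sketch below shows that step on toy data; the block contents, vector sizes, and the `combine_features` helper are assumptions for illustration and do not describe what `prepare_features` does internally.

```python
def combine_features(blocks, selected):
    """Concatenate the per-sample vectors of the selected blocks, in order."""
    n_samples = len(next(iter(blocks.values())))
    combined = []
    for i in range(n_samples):
        row = []
        for name in selected:
            row.extend(blocks[name][i])
        combined.append(row)
    return combined


# Hypothetical pre-computed blocks for two samples (sizes are arbitrary).
blocks = {
    "TCR": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "Peptide": [[1.0, 0.0], [0.0, 1.0]],
    "HLA": [[1.0], [0.0]],
}

for case in (["TCR"], ["TCR", "Peptide"], ["TCR", "Peptide", "HLA"]):
    matrix = combine_features(blocks, case)
    print(case, "->", len(matrix[0]), "features per sample")
```

Comparing models trained on each case is a simple ablation that shows how much each feature group contributes.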

Model Persistence

Save and load trained models:

# Save complete pipeline
pipeline.save_pipeline('output_dir/')

# Load saved pipeline
new_pipeline = PepTCRNetPipeline()
new_pipeline.load_pipeline('output_dir/')

📈 Performance

Reported performance on TCR-peptide binding prediction:

  • Accuracy: Up to 95% on benchmark datasets
  • AUC-ROC: 0.90 for multi-class classification
  • Uncertainty Calibration: Well-calibrated confidence scores
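
A multi-class AUC-ROC is commonly computed one-vs-rest and then averaged across classes. The sketch below shows that calculation on toy scores; the averaging scheme behind the figure above is not specified, so macro averaging is an assumption, and both helper names are hypothetical.

```python
def binary_auc(scores, is_positive):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, y_proba):
    """Average one-vs-rest AUC; class labels must be column indices of y_proba."""
    classes = sorted(set(y_true))
    aucs = [
        binary_auc([row[c] for row in y_proba], [t == c for t in y_true])
        for c in classes
    ]
    return sum(aucs) / len(aucs)


# Toy predictions for six samples over three classes (illustrative values).
y_true = [0, 0, 1, 1, 2, 2]
y_proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
print(round(macro_ovr_auc(y_true, y_proba), 3))  # 0.958
```

The pairwise comparison is quadratic in the number of samples, which is fine for a sketch but slow for large test sets.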

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

# Fork the repository
# Create your feature branch
git checkout -b feature/amazing-feature

# Commit your changes
git commit -m 'Add amazing feature'

# Push to the branch
git push origin feature/amazing-feature

# Open a Pull Request

📝 Citation

If you use PepTCR-Net in your research, please cite:

@article{le2025peptcrnet,
  title={PepTCR-Net: prediction of multi-class antigen peptides by T-cell receptor sequences with deep learning},
  author={Le, Phi and Ung, Leah and Yang, Hai and Huang, Anwen and He, Tao and Bruno, Peter and Oh, David Y and Keenan, Bridget P and Zhang, Li},
  journal={Briefings in Bioinformatics},
  volume={26},
  number={4},
  pages={bbaf351},
  year={2025},
  doi={10.1093/bib/bbaf351},
  url={https://doi.org/10.1093/bib/bbaf351}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📮 Contact
