A Deep Learning Framework for TCR-Peptide Recognition Prediction

PepTCRNet: Deep Learning for TCR-Peptide Recognition Prediction

Python 3.8+ | TensorFlow 2.13+ | License: MIT

PepTCRNet is a deep learning framework for predicting T-cell receptor (TCR) recognition of peptide antigens. It combines sequence, categorical, and network-based features in a Bayesian neural network, so each prediction comes with an uncertainty estimate.

🌟 Key Features

  • Multi-modal Integration: Seamlessly combines sequence, categorical, and network-based features
  • Advanced Embeddings: Utilizes autoencoders, position encoding, and Atchley factors for sequence representation
  • Bayesian Neural Networks: Provides uncertainty quantification for predictions
  • Comprehensive Pipeline: End-to-end solution from data preprocessing to model deployment
  • Flexible Architecture: Modular design allows easy customization and extension
  • Class Imbalance Handling: Built-in support for imbalanced datasets
  • Rich Visualizations: Extensive plotting utilities for model interpretation
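The class imbalance handling listed above is not specified in detail here. As a rough illustration only, one common approach is inverse-frequency class weighting, where each class gets weight n_samples / (n_classes * class_count). The function name and the toy labels below are hypothetical and not part of the PepTCRNet API.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy peptide labels: one dominant class and two rarer ones (illustrative data).
labels = ["GILGFVFTL"] * 6 + ["NLVPMVATV"] * 3 + ["GLCTLVAML"] * 1
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label}: {weight:.3f}")
```

Rare classes receive larger weights, so their errors count more in a weighted loss. Whether PepTCRNet uses this scheme, resampling, or something else is an assumption to check against the package itself.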

🚀 Quick Start

Run the Complete Demo (Easiest!)

# One-click demo launcher
./run_demo.sh

This launches the complete Scenario 17 demo using all features!

Installation

From PyPI (recommended)

pip install peptcrnet

Requirements: Python 3.8–3.12 (Python 3.13 is not supported due to dependency constraints).

From source (development)

git clone https://github.com/mlizhangx/Pep-TCRNet.git
cd Pep-TCRNet
pip install -e .

# Run the demo
jupyter notebook DEMO_Complete_Pipeline.ipynb

Basic Usage

import peptcrnet
from peptcrnet import PepTCRNetPipeline

# Initialize pipeline
pipeline = PepTCRNetPipeline(data_path='your_data.csv')

# Load and prepare data
pipeline.load_data()
pipeline.split_data(test_size=0.2, val_size=0.1)

# Prepare features
pipeline.prepare_features(feature_types=['sequences', 'categorical'])

# Train model
history = pipeline.train(epochs=100, batch_size=128)

# Evaluate with uncertainty
results = pipeline.evaluate_with_uncertainty(n_samples=200)

# Make predictions (new_data is a placeholder for your own unseen samples)
predictions = pipeline.predict(new_data)

📊 Data Format

PepTCRNet expects input data in CSV format with the following columns:

Column   Description                       Example
CDR3     TCR CDR3β sequence                CASSRGQGNEQFF
Peptide  Peptide sequence or class label   GILGFVFTL
V        V gene segment                    TRBV7-2
J        J gene segment                    TRBJ2-1
HLA-A    HLA-A allele                      A*02:01
HLA-B    HLA-B allele                      B*07:02
HLA-C    HLA-C allele                      C*07:01
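Before handing a file to the pipeline, it can help to check that the header contains the columns listed above. The sketch below uses only the standard library. The REQUIRED_COLUMNS list and the missing_columns helper are hypothetical, and treating all seven columns as mandatory is an assumption (some may be optional depending on the feature types you select).

```python
import csv
import io

# Column names taken from the table above; treating all as required is an assumption.
REQUIRED_COLUMNS = ["CDR3", "Peptide", "V", "J", "HLA-A", "HLA-B", "HLA-C"]


def missing_columns(csv_text):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


# One-row sample built from the example values in the table.
sample = (
    "CDR3,Peptide,V,J,HLA-A,HLA-B,HLA-C\n"
    "CASSRGQGNEQFF,GILGFVFTL,TRBV7-2,TRBJ2-1,A*02:01,B*07:02,C*07:01\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))

print(missing_columns(sample))
print(rows[0]["CDR3"], rows[0]["Peptide"])
```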

🧪 Demo Notebook

Try our interactive demo notebook to see PepTCRNet in action:

jupyter notebook demo_pipeline.ipynb

The demo includes:

  • Sample data generation
  • Step-by-step pipeline walkthrough
  • Model training and evaluation
  • Uncertainty quantification
  • Visualization examples

📚 Documentation

Pipeline Components

1. Data Loading and Preprocessing

from peptcrnet.data import DataLoader

loader = DataLoader('data.csv', atchley_path='atchley_factors.txt')
stats = loader.get_summary_stats()
splits = loader.split_data()
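To make the splitting step concrete, here is a minimal sketch of a shuffled train/validation/test split in plain Python. It is not the implementation behind split_data. The function name is hypothetical, and it assumes that test_size and val_size are both fractions of the full dataset, which the package may interpret differently.

```python
import random


def three_way_split(items, test_size=0.2, val_size=0.1, seed=0):
    """Shuffle items and split them into (train, val, test) lists.

    Assumption: both sizes are fractions of the full dataset.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_size)
    n_val = round(len(shuffled) * val_size)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

For imbalanced peptide classes, a stratified split that preserves class proportions in each subset is usually preferable to the plain shuffle shown here.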

2. Feature Engineering

from peptcrnet.embeddings import SequenceEmbedder, CategoricalEmbedder

# Sequence embeddings
seq_embedder = SequenceEmbedder(atchley_factors, max_length=30)
tcr_embeddings = seq_embedder.encode_sequences(tcr_sequences)

# Categorical embeddings
cat_embedder = CategoricalEmbedder()
cat_embeddings = cat_embedder.encode_features(categorical_data)
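The sequence embedder pads each sequence to a fixed max_length and maps residues to numeric vectors. The sketch below shows the same idea with a one-hot encoding as a stand-in. It is not the package's encoder: an Atchley-factor encoding would replace each 20-dimensional one-hot row with the five Atchley factors for that residue, and right-padding with zeros is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_length=30):
    """Encode a sequence as a max_length x 20 matrix, zero-padded on the right."""
    if len(sequence) > max_length:
        raise ValueError(f"sequence longer than max_length={max_length}")
    matrix = [[0.0] * len(AMINO_ACIDS) for _ in range(max_length)]
    for pos, aa in enumerate(sequence):
        # Raises KeyError for residues outside the 20 standard amino acids.
        matrix[pos][AA_INDEX[aa]] = 1.0
    return matrix


encoded = one_hot_encode("CASSRGQGNEQFF")
print(len(encoded), len(encoded[0]))
```

Rows beyond the end of the sequence stay all-zero, which lets a downstream model distinguish padding from real residues.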

3. Model Training

from peptcrnet.models import BayesianClassifier

model = BayesianClassifier(
    input_shapes={'sequences': (100,), 'categorical': (50,)},
    num_classes=5,
    hidden_dims=[512, 256, 64]
)

history = model.train(X_train, y_train, X_val, y_val)

4. Evaluation and Visualization

from peptcrnet.evaluation import ModelEvaluator
from peptcrnet.visualization import plot_confusion_matrix, plot_roc_curves

evaluator = ModelEvaluator()
metrics = evaluator.compute_metrics(y_true, y_pred, y_proba)

plot_confusion_matrix(y_true, y_pred)
plot_roc_curves(y_true, y_proba)
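For readers who want to see what the evaluation step computes, the following dependency-free sketch builds a multi-class confusion matrix and derives accuracy and per-class recall from it. The function names and toy labels are hypothetical. ModelEvaluator may report a different or larger set of metrics.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] counts samples with true class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Recall per class: diagonal count divided by the row total."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy integer labels for three peptide classes (illustrative data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
recall = per_class_recall(cm)
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)

print(cm)
print(recall, accuracy)
```

With imbalanced classes, per-class recall is often more informative than overall accuracy, since a model can score well on accuracy by favoring the dominant class.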

⚙️ Configuration

PepTCRNet uses a centralized configuration system:

from peptcrnet import config

# Access configuration
print(config.ModelParams.MAX_TCR_LENGTH)
print(config.TrainingParams.BATCH_SIZE)

# Save configuration
config.save_config('my_config.json')

# Load configuration
config.load_config('my_config.json')

🔬 Advanced Features

Uncertainty Quantification

PepTCRNet provides Bayesian uncertainty estimation:

# Multiple forward passes for uncertainty
predictions, uncertainty = pipeline.predict_with_uncertainty(
    test_data,
    n_samples=200
)

# Identify high-confidence predictions
# (threshold is a user-chosen cutoff; 0.1 is only a placeholder)
threshold = 0.1
high_confidence_mask = uncertainty < threshold
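The idea behind sampling-based uncertainty can be sketched without the library: run several stochastic forward passes, average the class probabilities, and use the entropy of the averaged distribution as the uncertainty score. This is an illustration under assumptions. The helper name and the hard-coded probabilities are hypothetical, and predict_with_uncertainty may use a different uncertainty measure.

```python
import math


def summarize_passes(prob_samples):
    """Average class probabilities over passes; return (mean, predictive entropy)."""
    n_passes = len(prob_samples)
    n_classes = len(prob_samples[0])
    mean = [sum(p[c] for p in prob_samples) / n_passes for c in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    return mean, entropy


# Each inner list is one forward pass for a single input (illustrative values).
confident = [[0.9, 0.1], [0.95, 0.05], [0.85, 0.15]]  # passes agree
uncertain = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]      # passes disagree

mean_c, h_c = summarize_passes(confident)
mean_u, h_u = summarize_passes(uncertain)

print(mean_c, h_c)
print(mean_u, h_u)
```

Inputs whose passes disagree end up with a flatter averaged distribution and therefore higher entropy, which is what a threshold on the uncertainty score filters out.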

Custom Feature Combinations

Experiment with different feature combinations:

# Define feature cases
feature_cases = {
    1: ['TCR'],
    2: ['TCR', 'Peptide'],
    3: ['TCR', 'Peptide', 'HLA'],
    4: ['TCR', 'Peptide', 'HLA', 'VJ', 'Network']
}

# Train with specific features
pipeline.prepare_features(feature_types=feature_cases[3])

Model Persistence

Save and load trained models:

# Save complete pipeline
pipeline.save_pipeline('output_dir/')

# Load saved pipeline
new_pipeline = PepTCRNetPipeline()
new_pipeline.load_pipeline('output_dir/')

📈 Performance

Reported performance on TCR-peptide binding prediction (see the citation below for details):

  • Accuracy: Up to 95% on benchmark datasets
  • AUC-ROC: 0.90 for multi-class classification
  • Uncertainty Calibration: Well-calibrated confidence scores

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

# Fork the repository
# Create your feature branch
git checkout -b feature/amazing-feature

# Commit your changes
git commit -m 'Add amazing feature'

# Push to the branch
git push origin feature/amazing-feature

# Open a Pull Request

📝 Citation

If you use PepTCRNet in your research, please cite:

@article{le2025peptcrnet,
  title={PepTCR-Net: prediction of multi-class antigen peptides by T-cell receptor sequences with deep learning},
  author={Le, Phi and Ung, Leah and Yang, Hai and Huang, Anwen and He, Tao and Bruno, Peter and Oh, David Y and Keenan, Bridget P and Zhang, Li},
  journal={Briefings in Bioinformatics},
  volume={26},
  number={4},
  pages={bbaf351},
  year={2025},
  doi={10.1093/bib/bbaf351},
  url={https://doi.org/10.1093/bib/bbaf351}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🗺️ Roadmap

  • Support for TCRα chains
  • Integration with single-cell RNA-seq data
  • Web interface for predictions
  • Pre-trained models for common peptides
  • GPU optimization for large-scale predictions
  • Docker containerization

Made with ❤️ by the PepTCRNet Team
