A Deep Learning Framework for TCR-Peptide Recognition Prediction

PepTCRNet: Deep Learning for TCR-Peptide Recognition Prediction

Python 3.8+ | TensorFlow 2.13+ | License: MIT

PepTCRNet is a state-of-the-art deep learning framework for predicting T-cell receptor (TCR) recognition of peptide antigens. It combines advanced neural network architectures with comprehensive feature engineering to achieve high-accuracy predictions with uncertainty quantification.

🌟 Key Features

  • Multi-modal Integration: Seamlessly combines sequence, categorical, and network-based features
  • Advanced Embeddings: Utilizes autoencoders, position encoding, and Atchley factors for sequence representation
  • Bayesian Neural Networks: Provides uncertainty quantification for predictions
  • Comprehensive Pipeline: End-to-end solution from data preprocessing to model deployment
  • Flexible Architecture: Modular design allows easy customization and extension
  • Class Imbalance Handling: Built-in support for imbalanced datasets
  • Rich Visualizations: Extensive plotting utilities for model interpretation

🚀 Quick Start

Install from PyPI:

pip install peptcrnet

For notebook demos, also install the optional notebook dependencies: pip install "peptcrnet[notebooks]" (the quotes keep shells such as zsh from interpreting the square brackets).

Run the Complete Demo (Easiest!)

# One-click demo launcher
./run_demo.sh

This launches the complete Scenario 17 demo using all features!

Installation

From PyPI (recommended)

pip install peptcrnet

For notebooks and demos: pip install "peptcrnet[notebooks]"

Requirements: Python 3.8–3.12 (Python 3.13 is not supported due to dependency constraints).

From source (development)

git clone https://github.com/mlizhangx/Pep-TCRNet.git
cd Pep-TCRNet
pip install -e ".[notebooks]"

# Run the demo
jupyter notebook DEMO_Complete_Pipeline.ipynb

Basic Usage

import peptcrnet
from peptcrnet import PepTCRNetPipeline

# Initialize pipeline
pipeline = PepTCRNetPipeline(data_path='your_data.csv')

# Load and prepare data
pipeline.load_data()
pipeline.split_data(test_size=0.2, val_size=0.1)

# Prepare features
pipeline.prepare_features(feature_types=['sequences', 'categorical'])

# Train model
history = pipeline.train(epochs=100, batch_size=128)

# Evaluate with uncertainty
results = pipeline.evaluate_with_uncertainty(n_samples=200)

# Make predictions on new samples (new_data is a placeholder for your own input)
predictions = pipeline.predict(new_data)
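
The split step above can be pictured with a small standalone sketch. It is not PepTCRNet's implementation: `split_indices` is a hypothetical helper, and it assumes that `test_size` and `val_size` are both fractions of the full dataset, which the package may interpret differently.

```python
import random


def split_indices(n_rows, test_size=0.2, val_size=0.1, seed=42):
    """Shuffle row indices and cut them into train/val/test lists.

    Assumption: both fractions are relative to the full dataset.
    """
    if not 0 <= test_size + val_size < 1:
        raise ValueError("test_size + val_size must be in [0, 1)")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(n_rows * test_size))
    n_val = int(round(n_rows * val_size))
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 700 100 200
```

Under this reading, test_size=0.2 and val_size=0.1 leave 70% of the rows for training.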

📊 Data Format

PepTCRNet expects input data in CSV format with the following columns:

Column  | Description                     | Example
--------|---------------------------------|--------------
CDR3    | TCR CDR3β sequence              | CASSRGQGNEQFF
Peptide | Peptide sequence or class label | GILGFVFTL
V       | V gene segment                  | TRBV7-2
J       | J gene segment                  | TRBJ2-1
HLA-A   | HLA-A allele                    | A*02:01
HLA-B   | HLA-B allele                    | B*07:02
HLA-C   | HLA-C allele                    | C*07:01
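
Before handing a file to the pipeline, it can help to check the header against the columns listed above. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper, and treating all seven columns as required is an assumption (some may be optional depending on the feature types you select).

```python
import csv
import io

# Column names taken from the table above; whether all are mandatory is assumed.
REQUIRED_COLUMNS = ["CDR3", "Peptide", "V", "J", "HLA-A", "HLA-B", "HLA-C"]


def missing_columns(csv_text):
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


sample = (
    "CDR3,Peptide,V,J,HLA-A,HLA-B,HLA-C\n"
    "CASSRGQGNEQFF,GILGFVFTL,TRBV7-2,TRBJ2-1,A*02:01,B*07:02,C*07:01\n"
)
print(missing_columns(sample))  # an empty list means the header is complete
```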

🧪 Demo Notebook

Try our interactive demo notebook to see PepTCRNet in action:

jupyter notebook demo_pipeline.ipynb

The demo includes:

  • Sample data generation
  • Step-by-step pipeline walkthrough
  • Model training and evaluation
  • Uncertainty quantification
  • Visualization examples

📚 Documentation

Pipeline Components

1. Data Loading and Preprocessing

from peptcrnet.data import DataLoader

loader = DataLoader('data.csv', atchley_path='atchley_factors.txt')
stats = loader.get_summary_stats()
splits = loader.split_data()

2. Feature Engineering

from peptcrnet.embeddings import SequenceEmbedder, CategoricalEmbedder

# Sequence embeddings
seq_embedder = SequenceEmbedder(atchley_factors, max_length=30)
tcr_embeddings = seq_embedder.encode_sequences(tcr_sequences)

# Categorical embeddings
cat_embedder = CategoricalEmbedder()
cat_embeddings = cat_embedder.encode_features(categorical_data)
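
To illustrate what a fixed-length sequence encoding involves, here is a minimal standalone sketch that maps each residue to an integer and pads to `max_length`. It is not the `SequenceEmbedder` implementation: `encode_sequence`, the residue ordering, and the truncation rule are assumptions made for illustration. In an Atchley-based encoding, each index would then be replaced by that residue's five Atchley factor values.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Index 0 is reserved for padding, so residues start at 1.
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_length=30):
    """Integer-encode a CDR3 or peptide sequence, padded with zeros."""
    seq = seq.upper()[:max_length]  # assumption: over-long sequences are truncated
    try:
        codes = [AA_TO_INDEX[aa] for aa in seq]
    except KeyError as exc:
        raise ValueError(f"unknown residue {exc.args[0]!r} in {seq!r}") from exc
    return codes + [0] * (max_length - len(codes))


encoded = encode_sequence("CASSRGQGNEQFF")
print(len(encoded), encoded[:5])
```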

3. Model Training

from peptcrnet.models import BayesianClassifier

model = BayesianClassifier(
    input_shapes={'sequences': (100,), 'categorical': (50,)},
    num_classes=5,
    hidden_dims=[512, 256, 64]
)

history = model.train(X_train, y_train, X_val, y_val)

4. Evaluation and Visualization

from peptcrnet.evaluation import ModelEvaluator
from peptcrnet.visualization import plot_confusion_matrix, plot_roc_curves

evaluator = ModelEvaluator()
metrics = evaluator.compute_metrics(y_true, y_pred, y_proba)

plot_confusion_matrix(y_true, y_pred)
plot_roc_curves(y_true, y_proba)
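
For readers who want to see what such metrics reduce to, the following standalone sketch builds a multi-class confusion matrix and derives accuracy from it. The function names are hypothetical and independent of `ModelEvaluator`, whose returned metrics are not specified here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[true_label][pred_label] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm, accuracy(cm))
```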

⚙️ Configuration

PepTCRNet uses a centralized configuration system:

from peptcrnet import config

# Access configuration
print(config.ModelParams.MAX_TCR_LENGTH)
print(config.TrainingParams.BATCH_SIZE)

# Save configuration
config.save_config('my_config.json')

# Load configuration
config.load_config('my_config.json')
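
The save/load round trip can be pictured as plain JSON serialization. The sketch below is an assumption about the general approach, not the package's file format: `save_settings` and `load_settings` are hypothetical helpers, and the values 30 and 128 are illustrative, borrowed from the `max_length` and `batch_size` arguments used elsewhere in this README.

```python
import json
import os
import tempfile

# Illustrative settings; the real defaults live in peptcrnet.config.
settings = {
    "ModelParams": {"MAX_TCR_LENGTH": 30},
    "TrainingParams": {"BATCH_SIZE": 128},
}


def save_settings(values, path):
    """Write the settings dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(values, handle, indent=2)


def load_settings(path):
    """Read a settings dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "my_config.json")
    save_settings(settings, config_path)
    restored = load_settings(config_path)

print(restored == settings)
```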

🔬 Advanced Features

Uncertainty Quantification

PepTCRNet provides Bayesian uncertainty estimation:

# Multiple forward passes for uncertainty
predictions, uncertainty = pipeline.predict_with_uncertainty(
    test_data,
    n_samples=200
)

# Identify high-confidence predictions (threshold is a cutoff you choose, e.g. tuned on validation data)
high_confidence_mask = uncertainty < threshold
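
As a rough picture of how repeated stochastic forward passes become one prediction and one uncertainty score, the sketch below averages sampled class probabilities and computes the entropy of the mean. How PepTCRNet defines its uncertainty value is not stated here; predictive entropy is one common choice and is used purely as an assumption, and `summarize_samples` is a hypothetical helper.

```python
import math


def summarize_samples(sampled_probs):
    """Combine n_samples probability vectors for a single input.

    Returns the predicted class, the mean probabilities, and the
    predictive entropy (in nats) of the mean distribution.
    """
    n_samples = len(sampled_probs)
    n_classes = len(sampled_probs[0])
    mean = [
        sum(sample[c] for sample in sampled_probs) / n_samples
        for c in range(n_classes)
    ]
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    predicted = max(range(n_classes), key=mean.__getitem__)
    return predicted, mean, entropy


# Three stochastic passes over one input with two classes.
samples = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
predicted, mean_probs, entropy = summarize_samples(samples)
print(predicted, mean_probs, round(entropy, 3))
```

Lower entropy means the averaged distribution is more peaked, so a mask such as `uncertainty < threshold` keeps the inputs the model is most decided about.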

Custom Feature Combinations

Experiment with different feature combinations:

# Define feature cases
feature_cases = {
    1: ['TCR'],
    2: ['TCR', 'Peptide'],
    3: ['TCR', 'Peptide', 'HLA'],
    4: ['TCR', 'Peptide', 'HLA', 'VJ', 'Network']
}

# Train with specific features
pipeline.prepare_features(feature_types=feature_cases[3])

Model Persistence

Save and load trained models:

# Save complete pipeline
pipeline.save_pipeline('output_dir/')

# Load saved pipeline
new_pipeline = PepTCRNetPipeline()
new_pipeline.load_pipeline('output_dir/')

📈 Performance

PepTCRNet reports the following results on TCR-peptide binding prediction (see the paper in the Citation section for datasets and evaluation protocol):

  • Accuracy: Up to 95% on benchmark datasets
  • AUC-ROC: 0.90 for multi-class classification
  • Uncertainty Calibration: Well-calibrated confidence scores
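
Calibration can be checked on your own predictions with a metric such as expected calibration error (ECE). The sketch below is a generic standalone version with equal-width confidence bins; it is not taken from PepTCRNet, and the choice of ECE and of 10 bins is an assumption about how one might measure this.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy per bin.

    confidences: predicted probability of the chosen class, in [0, 1]
    correct:     booleans, True where the prediction was right
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        acc = sum(1 for _, hit in bucket if hit) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - acc)
    return ece


confidences = [0.95, 0.95, 0.55, 0.55]
correct = [True, True, True, False]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 3))
```

An ECE near zero indicates that, within each bin, the stated confidence closely matches the observed accuracy.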

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

# Fork the repository
# Create your feature branch
git checkout -b feature/amazing-feature

# Commit your changes
git commit -m 'Add amazing feature'

# Push to the branch
git push origin feature/amazing-feature

# Open a Pull Request

📝 Citation

If you use PepTCRNet in your research, please cite:

@article{le2025peptcrnet,
  title={PepTCR-Net: prediction of multi-class antigen peptides by T-cell receptor sequences with deep learning},
  author={Le, Phi and Ung, Leah and Yang, Hai and Huang, Anwen and He, Tao and Bruno, Peter and Oh, David Y and Keenan, Bridget P and Zhang, Li},
  journal={Briefings in Bioinformatics},
  volume={26},
  number={4},
  pages={bbaf351},
  year={2025},
  doi={10.1093/bib/bbaf351},
  url={https://doi.org/10.1093/bib/bbaf351}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🗺️ Roadmap

  • Support for TCRα chains
  • Integration with single-cell RNA-seq data
  • Web interface for predictions
  • Pre-trained models for common peptides
  • GPU optimization for large-scale predictions
  • Docker containerization

Made with ❤️ by the PepTCRNet Team
