A Deep Learning Framework for TCR-Peptide Recognition Prediction
PepTCRNet: Deep Learning for TCR-Peptide Recognition Prediction
PepTCRNet is a deep learning framework for predicting T-cell receptor (TCR) recognition of peptide antigens. It combines neural network architectures with feature engineering over sequence, categorical, and network-based inputs, and it reports an uncertainty estimate alongside each prediction.
🌟 Key Features
- Multi-modal Integration: Combines sequence, categorical, and network-based features
- Advanced Embeddings: Utilizes autoencoders, position encoding, and Atchley factors for sequence representation
- Bayesian Neural Networks: Provides uncertainty quantification for predictions
- Comprehensive Pipeline: End-to-end solution from data preprocessing to model deployment
- Flexible Architecture: Modular design allows easy customization and extension
- Class Imbalance Handling: Built-in support for imbalanced datasets
- Rich Visualizations: Extensive plotting utilities for model interpretation
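The feature list does not say how class imbalance is handled. A common technique is to weight each class inversely to its frequency, using the same formula as scikit-learn's `class_weight="balanced"` option: `n_samples / (n_classes * count_c)`. The sketch below computes such weights in plain Python; `balanced_class_weights` is a hypothetical helper, not part of the PepTCRNet API, and it is not confirmed that PepTCRNet uses this exact scheme.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Rare classes get weights above 1 and common classes weights below 1,
    so every class contributes the same total weight to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


if __name__ == "__main__":
    # Toy peptide labels in which GILGFVFTL is over-represented.
    labels = ["GILGFVFTL"] * 6 + ["NLVPMVATV"] * 3 + ["GLCTLVAML"] * 1
    for peptide, weight in balanced_class_weights(labels).items():
        print(f"{peptide}: {weight:.3f}")
```

Keras's `Model.fit` accepts a `class_weight` dictionary, but it is keyed by integer class index, so string peptide labels would need to be mapped to indices first.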
🚀 Quick Start
Run the Complete Demo (Easiest!)
# One-click demo launcher
./run_demo.sh
This launches the complete Scenario 17 demo using all features!
Installation
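From Source
# Clone the repository and change into it (adjust the path to your clone)
cd PepTCR-Net
# Activate the conda environment first (named tfBNN in the original setup)
conda activate tfBNN
# Install in development (editable) mode
pip install -e .
# Launch the demo notebook
jupyter notebook DEMO_Complete_Pipeline.ipynb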
From PyPI
pip install peptcrnet
Basic Usage
import pandas as pd
from peptcrnet import PepTCRNetPipeline
# Initialize pipeline
pipeline = PepTCRNetPipeline(data_path='your_data.csv')
# Load and prepare data
pipeline.load_data()
pipeline.split_data(test_size=0.2, val_size=0.1)
# Prepare features
pipeline.prepare_features(feature_types=['sequences', 'categorical'])
# Train model
history = pipeline.train(epochs=100, batch_size=128)
# Evaluate with uncertainty
results = pipeline.evaluate_with_uncertainty(n_samples=200)
# Make predictions on new samples in the same CSV format as the training data
# ('new_samples.csv' is a placeholder; assumes predict() accepts a pandas DataFrame)
new_data = pd.read_csv('new_samples.csv')
predictions = pipeline.predict(new_data)
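`split_data(test_size=0.2, val_size=0.1)` does not state whether `val_size` is a fraction of the whole dataset or of what remains after the test split. The sketch below assumes both are fractions of the whole dataset and splits each peptide class separately (stratification), so rare classes still appear in every split. `stratified_split` is a hypothetical helper for illustration, not PepTCRNet's implementation.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_split(
    labels: Sequence[Hashable],
    test_size: float = 0.2,
    val_size: float = 0.1,
    seed: int = 42,
) -> Dict[str, List[int]]:
    """Return train/val/test row indices, splitting each class separately.

    Assumes test_size and val_size are both fractions of the full dataset.
    """
    if not 0 <= test_size + val_size < 1:
        raise ValueError("test_size + val_size must be in [0, 1)")
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        n_val = round(len(indices) * val_size)
        splits["test"].extend(indices[:n_test])
        splits["val"].extend(indices[n_test:n_test + n_val])
        splits["train"].extend(indices[n_test + n_val:])
    return splits
```

With 100 rows this yields roughly 70/10/20 train/validation/test rows; per-class rounding can shift a row or two for very small classes.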
📊 Data Format
PepTCRNet expects input data in CSV format with the following columns:
| Column | Description | Example |
|---|---|---|
| CDR3 | TCR CDR3β sequence | CASSRGQGNEQFF |
| Peptide | Peptide sequence or class label | GILGFVFTL |
| V | V gene segment | TRBV7-2 |
| J | J gene segment | TRBJ2-1 |
| HLA-A | HLA-A allele | A*02:01 |
| HLA-B | HLA-B allele | B*07:02 |
| HLA-C | HLA-C allele | C*07:01 |
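Before running the pipeline it can help to check that a CSV matches the table above. This sketch (a hypothetical `validate_rows` helper, not part of PepTCRNet) checks that all seven columns are present and that each CDR3 uses only the 20 standard amino-acid letters. The `Peptide` column is deliberately not checked, since it may hold a class label rather than a sequence.

```python
import csv
import io
from typing import Iterable, List

REQUIRED_COLUMNS = ["CDR3", "Peptide", "V", "J", "HLA-A", "HLA-B", "HLA-C"]
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def validate_rows(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in a PepTCRNet-style CSV."""
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems: List[str] = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        cdr3 = (row["CDR3"] or "").strip()
        if not cdr3 or set(cdr3) - AMINO_ACIDS:
            problems.append(f"line {line_no}: invalid CDR3 {cdr3!r}")
    return problems


if __name__ == "__main__":
    sample = io.StringIO(
        "CDR3,Peptide,V,J,HLA-A,HLA-B,HLA-C\n"
        "CASSRGQGNEQFF,GILGFVFTL,TRBV7-2,TRBJ2-1,A*02:01,B*07:02,C*07:01\n"
        "CASS1QF,GILGFVFTL,TRBV7-2,TRBJ2-1,A*02:01,B*07:02,C*07:01\n"
    )
    print(validate_rows(sample))
```

For a file on disk, pass an open handle instead: `with open('your_data.csv', newline='') as fh: problems = validate_rows(fh)`.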
🧪 Demo Notebook
Try our interactive demo notebook to see PepTCRNet in action:
jupyter notebook demo_pipeline.ipynb
The demo includes:
- Sample data generation
- Step-by-step pipeline walkthrough
- Model training and evaluation
- Uncertainty quantification
- Visualization examples
📚 Documentation
Pipeline Components
1. Data Loading and Preprocessing
from peptcrnet.data import DataLoader
loader = DataLoader('data.csv', atchley_path='atchley_factors.txt')
stats = loader.get_summary_stats()
splits = loader.split_data()
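The layout of `atchley_factors.txt` is not documented here. Atchley factors describe each amino acid with five numeric values, so this sketch assumes one whitespace-separated row per residue: the one-letter code followed by its five factors, with `#` comment lines and blank lines ignored. `parse_atchley_table` is a hypothetical helper; adapt it to the file that actually ships with the package.

```python
from typing import Dict, Iterable, List

N_FACTORS = 5  # Atchley factors: five numeric descriptors per amino acid


def parse_atchley_table(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse rows like 'A f1 f2 f3 f4 f5' into {'A': [f1, ..., f5]}.

    The row layout is an assumption about the file format.
    """
    table: Dict[str, List[float]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        residue, *values = line.split()
        if len(residue) != 1 or len(values) != N_FACTORS:
            raise ValueError(f"unexpected row: {raw!r}")
        table[residue.upper()] = [float(v) for v in values]
    return table
```

Reading the file is then `with open('atchley_factors.txt') as fh: table = parse_atchley_table(fh)`. Failing loudly on a malformed row is safer than silently dropping a residue, which would only surface later as a lookup error during encoding.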
2. Feature Engineering
from peptcrnet.embeddings import SequenceEmbedder, CategoricalEmbedder
# Inputs: atchley_factors is the Atchley factor table, tcr_sequences a list of
# CDR3β strings, and categorical_data the V, J, and HLA columns
# Sequence embeddings
seq_embedder = SequenceEmbedder(atchley_factors, max_length=30)
tcr_embeddings = seq_embedder.encode_sequences(tcr_sequences)
# Categorical embeddings
cat_embedder = CategoricalEmbedder()
cat_embeddings = cat_embedder.encode_features(categorical_data)
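How `SequenceEmbedder` uses `max_length=30` is not spelled out. A common fixed-length encoding, sketched below under that assumption, replaces each residue with its Atchley factor vector, pads shorter sequences with zero vectors up to `max_length`, and flattens the result, giving `max_length * 5` numbers per sequence (150 for `max_length=30`). `encode_sequence` and `encode_batch` are hypothetical helpers; `factors` maps each one-letter code to its factor vector.

```python
from typing import Dict, List, Sequence


def encode_sequence(
    seq: str,
    factors: Dict[str, Sequence[float]],
    max_length: int = 30,
) -> List[float]:
    """Concatenate per-residue factor vectors, zero-padded on the right."""
    if len(seq) > max_length:
        raise ValueError(f"{seq!r} is longer than max_length={max_length}")
    n_factors = len(next(iter(factors.values())))
    encoded: List[float] = []
    for residue in seq:
        # A KeyError here flags a residue missing from the factor table.
        encoded.extend(float(x) for x in factors[residue])
    # One all-zero vector per padding position.
    encoded.extend([0.0] * (n_factors * (max_length - len(seq))))
    return encoded


def encode_batch(
    seqs: Sequence[str],
    factors: Dict[str, Sequence[float]],
    max_length: int = 30,
) -> List[List[float]]:
    """Encode several sequences; every row has length max_length * n_factors."""
    return [encode_sequence(s, factors, max_length) for s in seqs]
```

Rejecting over-long sequences, rather than silently truncating them, keeps unexpected inputs visible; whether PepTCRNet truncates or rejects them is not stated.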
3. Model Training
from peptcrnet.models import BayesianClassifier
model = BayesianClassifier(
input_shapes={'sequences': (100,), 'categorical': (50,)},
num_classes=5,
hidden_dims=[512, 256, 64]
)
history = model.train(X_train, y_train, X_val, y_val)
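The example does not describe the layer layout inside `BayesianClassifier`. If, as an assumption, the two inputs are concatenated (100 + 50 = 150 features) and passed through dense layers of 512, 256, and 64 units followed by a 5-way output, the parameter count works out as sketched below. `dense_param_count` is a hypothetical helper for this arithmetic only.

```python
from typing import Dict, Sequence, Tuple


def dense_param_count(
    input_shapes: Dict[str, Tuple[int, ...]],
    hidden_dims: Sequence[int],
    num_classes: int,
) -> int:
    """Weights + biases of a plain dense stack over the concatenated inputs."""
    width = sum(shape[0] for shape in input_shapes.values())
    total = 0
    for units in [*hidden_dims, num_classes]:
        total += width * units + units  # kernel + bias
        width = units
    return total


if __name__ == "__main__":
    n = dense_param_count(
        {"sequences": (100,), "categorical": (50,)}, [512, 256, 64], num_classes=5
    )
    print(n)      # deterministic dense stack
    print(2 * n)  # if every weight and bias gets a learned mean and scale
```

For this configuration the count is 150×512 + 512 + 512×256 + 256 + 256×64 + 64 + 64×5 + 5 = 225,413. A mean-field variational layer learns a mean and a scale per weight, so a fully Bayesian version of the same stack has roughly twice as many trainable values; implementations that keep biases deterministic have somewhat fewer.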
4. Evaluation and Visualization
from peptcrnet.evaluation import ModelEvaluator
from peptcrnet.visualization import plot_confusion_matrix, plot_roc_curves
evaluator = ModelEvaluator()
metrics = evaluator.compute_metrics(y_true, y_pred, y_proba)
plot_confusion_matrix(y_true, y_pred)
plot_roc_curves(y_true, y_proba)
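The metrics returned by `compute_metrics` are not listed here. For a multi-class AUC-ROC, one standard definition is the macro-averaged one-vs-rest AUC: each class is scored against all others, and its AUC equals the probability that a randomly chosen positive outranks a randomly chosen negative (ties count half). The pairwise sketch below costs O(positives × negatives) per class, so it suits small sanity checks; scikit-learn's `roc_auc_score(y_true, y_proba, multi_class="ovr")` computes the same macro average efficiently. The helper names are hypothetical.

```python
from typing import List, Sequence


def binary_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """P(random positive scores higher than random negative); ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def macro_ovr_auc(y_true: Sequence[int], y_proba: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of per-class one-vs-rest AUCs (y_true = class indices)."""
    n_classes = len(y_proba[0])
    aucs: List[float] = []
    for c in range(n_classes):
        pos = [row[c] for row, y in zip(y_proba, y_true) if y == c]
        neg = [row[c] for row, y in zip(y_proba, y_true) if y != c]
        if not pos or not neg:
            raise ValueError(f"class {c} needs both positive and negative samples")
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / n_classes
```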
⚙️ Configuration
PepTCRNet uses a centralized configuration system:
from peptcrnet import config
# Access configuration
print(config.ModelParams.MAX_TCR_LENGTH)
print(config.TrainingParams.BATCH_SIZE)
# Save configuration
config.save_config('my_config.json')
# Load configuration
config.load_config('my_config.json')
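Only two settings, `ModelParams.MAX_TCR_LENGTH` and `TrainingParams.BATCH_SIZE`, are shown, so the file format behind `save_config`/`load_config` is not documented here. A minimal sketch, assuming grouped settings stored as JSON, follows; loading merges the file over defaults, so an older config file that lacks newer keys still loads. The default values are placeholders echoing the examples in this README, and these helpers are hypothetical stand-ins, not PepTCRNet's own functions.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Placeholder defaults, grouped the way the README accesses them.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "ModelParams": {"MAX_TCR_LENGTH": 30},
    "TrainingParams": {"BATCH_SIZE": 128, "EPOCHS": 100},
}


def save_config(cfg: Dict[str, Dict[str, Any]], path) -> None:
    """Write grouped settings as pretty-printed JSON."""
    Path(path).write_text(json.dumps(cfg, indent=2, sort_keys=True))


def load_config(path, defaults=DEFAULTS) -> Dict[str, Dict[str, Any]]:
    """Load a JSON config, falling back to defaults for any missing keys."""
    loaded = json.loads(Path(path).read_text())
    merged = {group: dict(values) for group, values in defaults.items()}
    for group, values in loaded.items():
        merged.setdefault(group, {}).update(values)
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "my_config.json"
        save_config({"TrainingParams": {"BATCH_SIZE": 64}}, path)
        print(load_config(path))
```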
🔬 Advanced Features
Uncertainty Quantification
PepTCRNet provides Bayesian uncertainty estimation:
# Multiple forward passes for uncertainty
predictions, uncertainty = pipeline.predict_with_uncertainty(
test_data,
n_samples=200
)
# Identify high-confidence predictions
# (threshold is an illustrative cutoff; tune it on validation data)
threshold = 0.2
high_confidence_mask = uncertainty < threshold
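`predict_with_uncertainty` does not say which uncertainty measure it returns. A common choice for Bayesian networks is to run `n_samples` stochastic forward passes, average the class probabilities, and report the entropy of that mean distribution. The sketch below takes the sampled probability vectors as input (producing them requires the model) and illustrates the idea; `summarize_mc_samples` is hypothetical, not PepTCRNet's implementation.

```python
import math
from typing import List, Sequence, Tuple


def summarize_mc_samples(
    samples: Sequence[Sequence[float]],
) -> Tuple[List[float], float]:
    """Average sampled class-probability vectors; return (mean_probs, entropy).

    Entropy is in nats; for K classes it ranges from 0 to ln(K).
    """
    n = len(samples)
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / n for c in range(k)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0)  # 0 * log 0 := 0
    return mean, entropy


if __name__ == "__main__":
    agree = [[0.9, 0.1]] * 200                 # passes agree: low entropy
    disagree = [[1.0, 0.0], [0.0, 1.0]] * 100  # passes disagree: high entropy
    print(summarize_mc_samples(agree)[1], summarize_mc_samples(disagree)[1])
```

Because the entropy is at most ln(K), dividing by ln(K) puts it on a 0 to 1 scale, which makes a fixed `threshold` comparable across models with different numbers of classes.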
Custom Feature Combinations
Experiment with different feature combinations:
# Define feature cases
feature_cases = {
1: ['TCR'],
2: ['TCR', 'Peptide'],
3: ['TCR', 'Peptide', 'HLA'],
4: ['TCR', 'Peptide', 'HLA', 'VJ', 'Network']
}
# Train with specific features
pipeline.prepare_features(feature_types=feature_cases[3])
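The feature names in `feature_cases` are not explicitly mapped to input columns. Assuming they correspond to the columns in the Data Format table, a lookup like the hypothetical one below makes each case's raw inputs explicit. `Network` features are treated here as derived (computed rather than read from a CSV column), which is an assumption.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from feature-group names to Data Format columns.
FEATURE_COLUMNS: Dict[str, List[str]] = {
    "TCR": ["CDR3"],
    "Peptide": ["Peptide"],
    "HLA": ["HLA-A", "HLA-B", "HLA-C"],
    "VJ": ["V", "J"],
    "Network": [],  # assumed to be derived features, not a raw CSV column
}


def columns_for(feature_types: Sequence[str]) -> List[str]:
    """Return the raw CSV columns needed by a list of feature groups."""
    unknown = [f for f in feature_types if f not in FEATURE_COLUMNS]
    if unknown:
        raise KeyError(f"unknown feature groups: {unknown}")
    return [col for f in feature_types for col in FEATURE_COLUMNS[f]]


if __name__ == "__main__":
    feature_cases = {
        1: ["TCR"],
        2: ["TCR", "Peptide"],
        3: ["TCR", "Peptide", "HLA"],
        4: ["TCR", "Peptide", "HLA", "VJ", "Network"],
    }
    for case_id, features in feature_cases.items():
        print(case_id, columns_for(features))
```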
Model Persistence
Save and load trained models:
# Save complete pipeline
pipeline.save_pipeline('output_dir/')
# Load saved pipeline
new_pipeline = PepTCRNetPipeline()
new_pipeline.load_pipeline('output_dir/')
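The directory layout written by `save_pipeline` is not documented. Whatever the format, a reloaded pipeline must apply the same preprocessing as at training time, so state such as the padding length and category vocabularies has to be stored next to the model weights. The sketch below saves such state as JSON; the file name `preprocessing.json`, the field names, and both helpers are hypothetical, and the model weights themselves would be saved separately (for example with Keras's `Model.save`).

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

STATE_FILE = "preprocessing.json"  # hypothetical file name


def save_preprocessing_state(output_dir: str, state: Dict[str, Any]) -> Path:
    """Write preprocessing state to <output_dir>/preprocessing.json."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / STATE_FILE
    path.write_text(json.dumps(state, indent=2))
    return path


def load_preprocessing_state(output_dir: str) -> Dict[str, Any]:
    """Read back the state saved by save_preprocessing_state."""
    return json.loads((Path(output_dir) / STATE_FILE).read_text())


if __name__ == "__main__":
    state = {
        "max_length": 30,
        "vocabularies": {"V": ["<UNK>", "TRBV7-2"], "J": ["<UNK>", "TRBJ2-1"]},
        "class_labels": ["GILGFVFTL"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        save_preprocessing_state(tmp, state)
        print(load_preprocessing_state(tmp) == state)
```

Keeping an `<UNK>` entry in each vocabulary lets a reloaded pipeline encode V/J genes or alleles it never saw during training instead of failing.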
📈 Performance
PepTCRNet achieves state-of-the-art performance on TCR-peptide binding prediction:
- Accuracy: Up to 95% on benchmark datasets
- AUC-ROC: >0.90 for multi-class classification
- Uncertainty Calibration: Well-calibrated confidence scores
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
# Fork the repository
# Create your feature branch
git checkout -b feature/amazing-feature
# Commit your changes
git commit -m 'Add amazing feature'
# Push to the branch
git push origin feature/amazing-feature
# Open a Pull Request
📝 Citation
If you use PepTCRNet in your research, please cite:
@article{peptcrnet2024,
title={PepTCRNet: A Deep Learning Framework for TCR-Peptide Recognition Prediction},
author={Your Name et al.},
journal={Journal Name},
year={2024}
}
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Thanks to all contributors who have helped shape PepTCRNet
- Inspired by advances in deep learning for immunology
- Built with TensorFlow and the Python scientific computing ecosystem
📮 Contact
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: peptcrnet@example.com
🗺️ Roadmap
- Support for TCRα chains
- Integration with single-cell RNA-seq data
- Web interface for predictions
- Pre-trained models for common peptides
- GPU optimization for large-scale predictions
- Docker containerization
Made with ❤️ by the PepTCRNet Team
Download files
File details
Details for the file peptcrnet-1.0.0.tar.gz.
File metadata
- Download URL: peptcrnet-1.0.0.tar.gz
- Upload date:
- Size: 773.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 233563429dd4b2ecc1de86ac1001c4b380f7e1370ad4ec16918def83e4e5c595 |
| MD5 | 2ebfdba53e26ba4c8550b3ed4c841ecd |
| BLAKE2b-256 | c1914ecdc3e1383abbd8874606f7366a5f46fbb6b5d5985fa79f2153eca8eba6 |
File details
Details for the file peptcrnet-1.0.0-py3-none-any.whl.
File metadata
- Download URL: peptcrnet-1.0.0-py3-none-any.whl
- Upload date:
- Size: 35.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5452306a2f8789a9b22f94ed4e331824d11739b423780f0a54c257f07d8b664 |
| MD5 | 0f025dc4ad2e0f374f9c2c4243035de5 |
| BLAKE2b-256 | 49068f76a3efbbe3dd2c0bd85a911f4848b745b33b5f6e46dc64b2a53bf360c8 |
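To check that a downloaded file matches the SHA256 digests listed in the file details, hash it in chunks and compare. The dictionary below copies those two digests; `sha256_of` and `verify` are illustrative helpers that use only the standard library.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details for peptcrnet 1.0.0.
EXPECTED_SHA256 = {
    "peptcrnet-1.0.0.tar.gz": "233563429dd4b2ecc1de86ac1001c4b380f7e1370ad4ec16918def83e4e5c595",
    "peptcrnet-1.0.0-py3-none-any.whl": "b5452306a2f8789a9b22f94ed4e331824d11739b423780f0a54c257f07d8b664",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files use little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name!r}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, `verify('peptcrnet-1.0.0-py3-none-any.whl')` returns `True` only for an unmodified copy of the wheel. pip can also enforce this itself in hash-checking mode, via `--hash=sha256:...` entries in a requirements file.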