hf-tcr
HuggingFace-based inference and evaluation library for TCR-pMHC sequence translation models.
Installation
pip install hf-tcr
Or install from source:
git clone https://github.com/pirl-unc/hf-tcr.git
cd hf-tcr
pip install .
Quick Start
Loading Data
from hf_tcr import TCRpMHCdataset

# Create dataset for pMHC -> TCR translation
dataset = TCRpMHCdataset(
    source="pmhc",
    target="tcr",
    use_pseudo=True,
    use_cdr3=True
)

# Load from CSV file
dataset.load_data_from_file("path/to/data.csv")
Running Inference
from hf_tcr import HuggingFaceModelAdapter, TCRBartTokenizer
from transformers import BartForConditionalGeneration

# Load your trained model and tokenizer
tokenizer = TCRBartTokenizer()
model = BartForConditionalGeneration.from_pretrained("path/to/model")

# Create adapter
adapter = HuggingFaceModelAdapter(
    hf_tokenizer=tokenizer,
    hf_model=model,
    device="cuda"
)

# Get the source from the first example in your dataset
source = dataset[0][0]

# Generate translations
translations = adapter.sample_translations(
    source=source,
    n=10,
    max_len=25,
    mode="top_k",
    top_k=50,
    temperature=1.0
)
Evaluating Models
from hf_tcr import ModelEvaluator

# Create evaluator (extends HuggingFaceModelAdapter).
# Reuses `tokenizer`, `model`, and `dataset` from the previous steps.
evaluator = ModelEvaluator(
    hf_tokenizer=tokenizer,
    hf_model=model,
    device="cuda"
)

# Compute dataset-level metrics
metrics = evaluator.dataset_metrics_at_k(
    dataset=dataset,
    k=100,
    max_len=25,
    mode="top_k",
    top_k=50
)

print(f"BLEU: {metrics['char-bleu']:.4f}")
print(f"Precision@100: {metrics['precision']:.4f}")
print(f"Recall@100: {metrics['recall']:.4f}")
print(f"F1@100: {metrics['f1']:.4f}")
print(f"Mean Edit Distance: {metrics['d_edit']:.2f}")
print(f"Sequence Recovery: {metrics['seq_recovery']:.4f}")
print(f"Diversity: {metrics['diversity']:.4f}")
print(f"Perplexity: {metrics['perplexity']:.2f}")
Available Decoding Strategies
The adapter supports multiple decoding strategies:
- greedy: Deterministic greedy decoding
- ancestral: Multinomial sampling
- top_k: Top-k sampling with temperature
- top_p: Nucleus (top-p) sampling
- beam: Deterministic beam search
- stochastic_beam: Stochastic beam search
- diverse_beam: Diverse beam search
- contrastive: Contrastive decoding
- typical: Typical sampling
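To make the top_k mode concrete, here is a minimal, library-independent sketch of a single top-k sampling step with temperature. It is a generic illustration of the technique, not the code hf-tcr runs internally; the function name `top_k_sample` and its parameters are hypothetical.

```python
import math
import random


def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Sample one token index from the k highest-scoring logits.

    Hypothetical helper for illustration only; not part of hf-tcr.
    """
    if k <= 0 or temperature <= 0:
        raise ValueError("k and temperature must be positive")

    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Temperature rescales logits before the softmax:
    # values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [logits[i] / temperature for i in top]

    # Softmax weights, shifted by the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # Draw one index in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]


# Example: only the two best-scoring indices (0 and 1) can ever be drawn.
rng = random.Random(0)
draws = [top_k_sample([2.0, 1.0, 0.5, -3.0], k=2, rng=rng) for _ in range(5)]
```

With `k=1` this reduces to greedy decoding, and with `k` equal to the vocabulary size it reduces to ancestral (multinomial) sampling.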
Metrics
The ModelEvaluator provides the following metrics:
- Char-BLEU: Character-level BLEU score
- Precision@K: Fraction of generated sequences that match references
- Recall@K: Fraction of reference sequences recovered
- F1@K: Harmonic mean of precision and recall
- Mean Edit Distance: Average Levenshtein distance to closest reference
- Sequence Recovery: Position-wise match percentage
- Diversity: Ratio of unique to total generated sequences
- Perplexity: Model perplexity on the dataset
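The set-based metrics above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the ModelEvaluator implementation: precision is assumed to count duplicates among the K generated sequences, recall is assumed to be computed over unique references, and the helper names `levenshtein` and `metrics_at_k` are hypothetical. Char-BLEU, sequence recovery, and perplexity are left out.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def metrics_at_k(generated, references):
    """Compare K generated sequences against the reference set for one source.

    Hypothetical helper; assumes both inputs are non-empty lists of strings.
    """
    refs = set(references)
    unique_generated = set(generated)

    # Fraction of generated sequences that exactly match a reference.
    precision = sum(seq in refs for seq in generated) / len(generated)
    # Fraction of references that appear among the generated sequences.
    recall = len(refs & unique_generated) / len(refs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    # Ratio of unique to total generated sequences.
    diversity = len(unique_generated) / len(generated)
    # Average distance from each generated sequence to its closest reference.
    d_edit = sum(min(levenshtein(g, r) for r in refs) for g in generated) / len(generated)

    return {"precision": precision, "recall": recall, "f1": f1,
            "diversity": diversity, "d_edit": d_edit}


# Illustrative sequences, not real data.
result = metrics_at_k(["CASSL", "CASSL", "CASSF", "CAWSV"], ["CASSL", "CASSY"])
```

The dictionary keys mirror the ones printed in the evaluation example, so the two can be compared side by side on a small hand-checked case.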
Data Format
CSV files should contain the following columns:
Required:
- CDR3b: CDR3 beta sequence
- TRBV: TRBV gene (IMGT format)
- TRBJ: TRBJ gene (IMGT format)
- Epitope: Peptide sequence
- Allele: HLA allele
- Reference: Data source reference
Optional:
- CDR3a, TRAV, TRAJ, TRAD, TRBD
- TRA_stitched, TRB_stitched
- Pseudo, MHC
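Before loading a file, it can help to confirm that the required columns are present. The following is a small standard-library sketch of such a check; `missing_required_columns` is a hypothetical helper and not part of hf-tcr, and the header strings are made up for illustration.

```python
import csv
import io

# Required column names as listed in this section.
REQUIRED_COLUMNS = ["CDR3b", "TRBV", "TRBJ", "Epitope", "Allele", "Reference"]


def missing_required_columns(csv_file):
    """Return the required columns absent from the CSV header.

    `csv_file` is any open text file or file-like object.
    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Illustrative headers: the first is complete, the second lacks "Reference".
complete = io.StringIO("CDR3b,TRBV,TRBJ,Epitope,Allele,Reference,CDR3a\n")
incomplete = io.StringIO("CDR3b,TRBV,TRBJ,Epitope,Allele\n")

missing_in_complete = missing_required_columns(complete)
missing_in_incomplete = missing_required_columns(incomplete)
```

For a file on disk, pass the handle from `open("path/to/data.csv", newline="")` instead of the in-memory strings.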
Dependencies
- torch >= 2.0.0
- transformers >= 4.30.0
- numpy, pandas, tqdm
- python-Levenshtein
- nltk
- einops
- tidytcells >= 2.0.0
- mhcgnomes >= 1.8.0
- tcrpmhcdataset >= 0.2.0
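To check an environment against the version floors above, one option is a quick look-up with `importlib.metadata`. This is a rough sketch, not something shipped with hf-tcr: `parse_version` and `check_dependencies` are hypothetical helpers, and the version parsing is deliberately simplistic (it compares only the first three numeric components and ignores pre-release tags).

```python
import re
from importlib import metadata

# Minimum versions taken from the dependency list in this section.
MINIMUM_VERSIONS = {
    "torch": "2.0.0",
    "transformers": "4.30.0",
    "tidytcells": "2.0.0",
    "mhcgnomes": "1.8.0",
    "tcrpmhcdataset": "0.2.0",
}


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); simplistic by design."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_dependencies(minimums, lookup=metadata.version):
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


# Report for the current environment.
report = check_dependencies(MINIMUM_VERSIONS)
```

The `lookup` parameter exists only so the check can be exercised without the packages installed; for strict comparisons, a full version parser such as the one in the `packaging` library is the safer choice.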
License
Apache-2.0