Integrated TCR-Gene-Antigen Prediction: dataset tooling and models for TCR-antigen recognition.
ITGAP — Integrated TCR-Gene-Antigen Prediction
itgap is a Python package for building TCR-peptide datasets from the 10x
Genomics CD8+ T-cell multi-omics benchmark and training TCR-antigen
recognition models that integrate gene expression (GEX), V/J gene usage, and
CDR3 sequence information.
It bundles:
- NegativeSamplingTool: two-stage synthetic negative TCR-peptide sampling.
- Sequence encoding utilities (Atchley factors + positional encoding) with the Atchley table shipped as a package resource.
- Autoencoder + encoder–decoder integration models for combining sequence, V/J, and GEX modalities.
- Residual-MLP binary classifiers, sklearn baselines (logistic regression, random forest), and standard evaluation/plotting helpers.
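As an illustration of the sequence-encoding item above, here is a minimal sketch that combines per-residue factor vectors with a sinusoidal positional encoding. It is not itgap's implementation: the function names `positional_encoding` and `encode_cdr3` are hypothetical, the two-factor table holds placeholder numbers rather than real Atchley values, and adding (rather than concatenating) the positional term is an assumption.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    enc = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc

def encode_cdr3(seq, factors, max_len):
    """Map each residue to its factor vector plus a positional term,
    then zero-pad the result to max_len rows."""
    dim = len(next(iter(factors.values())))
    rows = []
    for pos, aa in enumerate(seq[:max_len]):
        pe = positional_encoding(pos, dim)
        rows.append([f + p for f, p in zip(factors[aa], pe)])
    rows.extend([[0.0] * dim for _ in range(max_len - len(rows))])
    return rows

# Placeholder two-factor table (NOT real Atchley values), for illustration only.
toy_factors = {"C": [0.1, -0.2], "A": [0.3, 0.4], "S": [-0.5, 0.6]}
encoded = encode_cdr3("CAS", toy_factors, max_len=5)
print(len(encoded), len(encoded[0]))
```

Padding to a fixed length keeps every CDR3 the same shape, so encoded sequences can be stacked into one array before being passed to a model.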
Install
Core install (small footprint; depends only on numpy, pandas, scikit-learn,
and matplotlib):
pip install itgap
Optional extras:
pip install 'itgap[gex]' # adds scanpy + anndata for h5ad loading
pip install 'itgap[tf]' # adds tensorflow (or tensorflow-macos on Apple Silicon)
pip install 'itgap[all]' # everything
itgap[tf] resolves to tensorflow-macos>=2.9 on macOS arm64 and to
tensorflow>=2.9 elsewhere.
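A platform split like this is normally declared with environment markers in the package metadata. The check those markers perform is roughly equivalent to the sketch below; `tf_requirement` is a hypothetical helper written for illustration and is not part of itgap.

```python
import platform

def tf_requirement(system=None, machine=None):
    """Return the TensorFlow requirement string for a platform.

    Defaults to the current interpreter's platform when no arguments are given.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    # Apple Silicon reports system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "tensorflow-macos>=2.9"
    return "tensorflow>=2.9"

print(tf_requirement())
```

Running it on your own machine shows which distribution the tf extra is expected to pull in.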
Data
The package ships only the small atchley.txt reference table. The large
benchmark file merge_gex_all_donors_all_peptides_meta.h5ad (~200 MB) is
not included; download it from the
10x Genomics CD8+ T-cell multi-omics dataset
and pass its path to NegativeSamplingTool(data_dir=...) or to
load_dataset(h5ad_path=...). The pre-computed CSV embeddings used in the
example notebooks live in the project repository under examples/data/.
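Because the benchmark file has to be downloaded separately, it can help to verify that it is in place before constructing the tool. The following is a small sketch using only the standard library; `find_benchmark_file` is a hypothetical helper, not an itgap function.

```python
from pathlib import Path

H5AD_NAME = "merge_gex_all_donors_all_peptides_meta.h5ad"

def find_benchmark_file(data_dir):
    """Return the path to the benchmark h5ad inside data_dir.

    Raises FileNotFoundError with a download hint if the file is missing.
    """
    path = Path(data_dir) / H5AD_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{H5AD_NAME} not found in {Path(data_dir).resolve()}; "
            "download it from the 10x Genomics dataset page first"
        )
    return path
```

The returned path can then be passed on as the h5ad location, and the enclosing directory as data_dir.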
Quickstart
Generate a negative-sampled training set:
from itgap import NegativeSamplingTool

tool = NegativeSamplingTool(
    data_dir="path/to/10x",  # contains merge_gex_all_donors_all_peptides_meta.h5ad
    negative_ratio=3.0,
    random_seed=42,
)
result = tool.create_combined_dataset(negative_ratio=3.0)
print(result["dataset"].shape, result["statistics"])
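For intuition about what a negative ratio means, here is a generic single-stage sketch of mismatch-based negative sampling: it pairs observed TCRs with peptides they were not observed to bind. This is not the package's two-stage procedure, which this README does not detail. `sample_negatives` is a hypothetical function and the CDR3 strings are toy values.

```python
import random

def sample_negatives(positives, negative_ratio=3.0, seed=42):
    """Draw (TCR, peptide) pairs that are absent from the observed positives."""
    rng = random.Random(seed)
    observed = set(positives)
    tcrs = sorted({t for t, _ in positives})
    peptides = sorted({p for _, p in positives})
    # Every unobserved (TCR, peptide) combination is a candidate negative.
    candidates = [(t, p) for t in tcrs for p in peptides if (t, p) not in observed]
    # Cap the request at the number of available candidates.
    target = min(len(candidates), int(round(negative_ratio * len(positives))))
    return rng.sample(candidates, target)

# Toy positives: (CDR3, peptide) pairs, for illustration only.
positives = [
    ("CASSLGQF", "GILGFVFTL"),
    ("CASSPGTF", "NLVPMVATV"),
    ("CASRDGNF", "ELAGIGILTV"),
]
negatives = sample_negatives(positives, negative_ratio=1.0)
print(len(negatives))
```

With a ratio of 3.0 the sampler aims for three negatives per positive, but a small dataset may not contain that many unobserved combinations, which is why the count is capped.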
Train a residual-MLP classifier on assembled features (requires itgap[tf]):
from itgap import (
    load_atchley, build_residual_mlp, compile_and_train, evaluate_classifier,
)

# X_train, y_train, X_val, y_val, X_test, and y_test are assumed to be feature
# matrices and binary labels assembled beforehand (see the example notebooks).
word_vectors, aa_idx = load_atchley()  # uses the packaged atchley.txt
model = build_residual_mlp(input_dim=X_train.shape[1])
history = compile_and_train(model, X_train, y_train, X_val, y_val, epochs=50)
metrics = evaluate_classifier(model, X_test, y_test)
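The README does not describe the internals of build_residual_mlp. As general background on what "residual" means here, the sketch below shows the forward pass of one generic residual block in NumPy: two dense layers whose output is added back onto the input. The function name, layer widths, and weights are all assumptions chosen for illustration.

```python
import numpy as np

def residual_block(x, w1, b1, w2, b2):
    """Two dense layers with ReLU, added back onto the input (skip connection)."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return np.maximum(0.0, x + hidden @ w2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                # batch of 4 feature vectors, width 8
w1, b1 = rng.normal(size=(8, 16)) * 0.1, np.zeros(16)
w2, b2 = np.zeros((16, 8)), np.zeros(8)    # zero second layer: block reduces to ReLU(x)
out = residual_block(x, w1, b1, w2, b2)
print(out.shape)
```

The skip connection requires the block's output width to match its input width, which is why w2 maps the hidden layer back to 8 features.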
Command-line
Installing the package exposes a console script:
itgap-negative-sampling # runs NegativeSamplingTool with default settings
Examples
End-to-end Jupyter notebooks live in the examples/ directory of the repository:
- data_preparation_notebook.ipynb: build the labeled dataset.
- tcr_beta_prediction_notebook.ipynb: beta-chain only model.
- tcr_alpha_beta_prediction_notebook.ipynb: alpha + beta + GEX + VJ.
Development
pip install -e '.[dev,all]'
pytest
python -m build
License
MIT. See LICENSE.
Citation
If you use ITGAP in a publication, please cite the project repository: https://github.com/mlizhangx/ITGAP.
File details
Details for the file itgap-0.1.0.tar.gz.
File metadata
- Download URL: itgap-0.1.0.tar.gz
- Upload date:
- Size: 626.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f50a87349811d483c45ccc3fc3bddb29f9d7ef182636ead6ac3cf9d85a1d4d88 |
| MD5 | c93e1cac2faf013f59542aa705321799 |
| BLAKE2b-256 | d9ba34d25b86e4339681b97f4bc2f84bf4350c5cc80f174787e86a10a66ea190 |
File details
Details for the file itgap-0.1.0-py3-none-any.whl.
File metadata
- Download URL: itgap-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5ba680a58cc948849ef371255f84e1d6470f2c9537532967d118c8450c07ffc |
| MD5 | 0e52ef466853e1b56372981bb332cd5d |
| BLAKE2b-256 | 76881220238363a18c2d8c451d3e0b40f1ffd9c4e140383233b729fea257bf1a |