Gene lists for cancer immunotherapy expression analysis
pirlygenes
Gene lists related to cancer immunotherapy
TCR-T
Clinical trials
Last updated: September 17th, 2024
Sources:
CAR-T
Approved therapies
Last updated: September 17th, 2024
Sources:
Multi-specific antibodies and T-cell engagers
Clinical trials
Last updated: September 11th, 2024
Sources:
Antibody-drug conjugates (ADCs)
Approved
Last updated: September 19th, 2024
Sources:
Clinical trials
Last updated: September 11th, 2024
Sources:
- Pan-cancer analysis of antibody-drug conjugate targets and putative predictors of treatment response
Radioligand therapies (RLTs)
Current target list
Last updated: February 11th, 2026
Sources:
- Radioligand therapy in precision oncology: opportunities and challenges
- FDA approves Pluvicto for metastatic castration-resistant prostate cancer
- FDA approves Lutetium Lu 177 dotatate for gastroenteropancreatic neuroendocrine tumors
- Emerging molecular targets and agents in radioligand therapy for solid tumors
- Early evidence for anti-CD20 targeted alpha-radiation approaches in B-cell malignancies
Methodology:
`pirlygenes/data/radioligand-targets.csv` is a curated target-level list (gene targets, Ensembl IDs, and target status buckets) intended to power gene-set visualization while trial-level `v1.4.0` curation is in progress.
CLI plotting notes:
- Treatment plots now include a `Radio` category label (capitalized consistently with other treatment labels).
- Use `--label-genes` to force annotation of genes that should always be text-labeled, for example: `--label-genes FAP,CD276`.
- PNG output defaults are larger and higher resolution (`--plot-height 12.0`, `--plot-aspect 1.4`, `--output-dpi 300`) and can be overridden from the CLI.
Cancer-testis antigens (CTAs)
Last updated: March 23rd, 2026
Quick start
from pirlygenes.gene_sets_cancer import (
CTA_gene_names, # recommended: filtered, reproductive-restricted CTAs
CTA_gene_ids, # same, as Ensembl gene IDs
CTA_unfiltered_gene_names, # full superset from all source databases
CTA_unfiltered_gene_ids, # same, as Ensembl gene IDs
CTA_evidence, # full DataFrame with all evidence columns
)
# Default: expressed, reproductive-restricted CTAs (~257 genes)
cta_genes = CTA_gene_names()
# Full unfiltered superset from all sources (~358 genes)
all_ctas = CTA_unfiltered_gene_names()
# Partition ALL protein-coding genes into CTA / never-expressed / non-CTA
from pirlygenes.gene_sets_cancer import CTA_partition_gene_ids
p = CTA_partition_gene_ids() # p.cta, p.cta_never_expressed, p.non_cta
# Evidence table with per-gene HPA tissue restriction data
df = CTA_evidence()
Pipeline overview
The CTA gene set is built as an unbiased union of genes from multiple CT antigen databases and literature sources, then systematically filtered using Human Protein Atlas tissue expression data.
Step 1: Collect — union of protein-coding CT genes from multiple source databases (358 genes):
| Source | Genes | Reference |
|---|---|---|
| CTpedia | 167 | Almeida et al. 2009, NAR |
| CTexploreR/CTdata | 62 new | Loriot et al. 2025, PLOS Genetics |
| Protein-level CT genes (136 total, 46 overlap) | 89 new | da Silva et al. 2017, Oncotarget |
| EWSR1-FLI1 CT gene binding sites | 12 | Gallegos et al. 2019, Mol Cell Biol |
| Meiosis, piRNA, spermatogenesis genes | 28 | Multiple sources (see docs) |
Each gene is tracked with a source_databases column indicating which databases include it (CTpedia, CTexploreR_CT, CTexploreR_CTP, daSilva2017, daSilva2017_protein). Only protein-coding genes (Ensembl biotype) are included. Genes with outdated HGNC symbols are renamed to current symbols with old names kept as aliases.
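The union-with-provenance idea in Step 1 can be sketched in a few lines. This is a minimal illustration, not the package's curation code: `merge_sources` is a hypothetical helper, the gene symbols are placeholders, and the only detail taken from this README is that `source_databases` is a semicolon-separated list.

```python
from collections import defaultdict

def merge_sources(sources):
    """Union gene symbols across sources and record which sources list each gene.

    `sources` maps a source name to an iterable of gene symbols.
    Returns {gene: "sourceA;sourceB"} with sources in first-seen order.
    """
    provenance = defaultdict(list)
    for source_name, genes in sources.items():
        for gene in genes:
            if source_name not in provenance[gene]:
                provenance[gene].append(source_name)
    return {gene: ";".join(names) for gene, names in sorted(provenance.items())}

# Placeholder memberships (GENE_A etc. are not real curated entries)
toy_sources = {
    "CTpedia": ["GENE_A", "GENE_B"],
    "CTexploreR_CT": ["GENE_A", "GENE_C"],
}
merged = merge_sources(toy_sources)
for gene, source_databases in merged.items():
    print(gene, source_databases)
```

A gene listed by several databases appears once in the union, which is why the per-source counts in the table above are reported as "new" genes rather than totals.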
Step 2: Annotate — each gene is scored against Human Protein Atlas v23 tissue expression:
- RNA: HPA RNA tissue consensus (`rna_tissue_consensus.tsv`), normalized transcripts per million (nTPM) across 50 normal tissues
- Protein: HPA normal tissue IHC (`normal_tissue.tsv`), immunohistochemistry detection levels (Not detected / Low / Medium / High) across 63 tissues with antibody reliability scores (Enhanced / Supported / Approved / Uncertain)
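As a rough sketch of the RNA half of this annotation step, the snippet below reads a long-format TSV (one row per gene and tissue) into a per-gene dictionary of nTPM values. The column names `Gene`, `Tissue`, and `nTPM`, the helper `load_ntpm`, and all values are assumptions made for illustration; check them against the actual HPA download before relying on them.

```python
import csv
import io
from collections import defaultdict

# Toy rows in the assumed long format. Identifiers and values are made up.
TOY_ROWS = [
    ("Gene", "Gene name", "Tissue", "nTPM"),
    ("ENSG_TOY_1", "GENE_A", "testis", "42.0"),
    ("ENSG_TOY_1", "GENE_A", "liver", "0.3"),
    ("ENSG_TOY_2", "GENE_B", "testis", "5.5"),
]
TOY_TSV = "\n".join("\t".join(row) for row in TOY_ROWS) + "\n"

def load_ntpm(handle):
    """Return {gene_id: {tissue: nTPM}} from a tab-separated file handle."""
    per_gene = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        per_gene[row["Gene"]][row["Tissue"]] = float(row["nTPM"])
    return dict(per_gene)

ntpm = load_ntpm(io.StringIO(TOY_TSV))
# Highest nTPM per gene across the tissues present in the toy input
max_ntpm = {gene: max(tissues.values()) for gene, tissues in ntpm.items()}
print(max_ntpm)
```

With a real file, `io.StringIO(TOY_TSV)` would be replaced by `open(path, newline="")`.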
Step 3: Filter — protein-coding + tiered thresholds based on protein antibody confidence (278 of 358 pass):
| Protein evidence | Deflated RNA threshold |
|---|---|
| Enhanced (orthogonal validation) | ≥ 80% |
| Supported (consistent characterization) | ≥ 90% |
| Approved (basic validation) | ≥ 95% |
| Uncertain or no protein data | ≥ 99% |
Genes with protein detected in non-reproductive tissues always fail. Thymus is excluded from all restriction checks (AIRE-driven mTEC expression is expected for CTAs).
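The tiered rule can be written as a small lookup plus two hard checks. This is a sketch of the logic as described in the table, not the package's implementation: `passes_filter` and its argument names are hypothetical, and `protein_reproductive` is simplified to `True`, `False`, or `None` (no protein data).

```python
# Minimum deflated reproductive RNA fraction per HPA antibody reliability tier
THRESHOLDS = {
    "Enhanced": 0.80,
    "Supported": 0.90,
    "Approved": 0.95,
    "Uncertain": 0.99,
    "no data": 0.99,
}

def passes_filter(biotype, protein_reliability, protein_reproductive, deflated_frac):
    """Apply the tiered CTA filter to one gene (illustrative only).

    protein_reproductive: True if IHC detection is confined to reproductive
    tissues, False if protein is detected elsewhere, None if there is no data.
    """
    if biotype != "protein_coding":
        return False
    # Protein detected in a non-reproductive tissue always fails
    if protein_reproductive is False:
        return False
    return deflated_frac >= THRESHOLDS[protein_reliability]

print(passes_filter("protein_coding", "Enhanced", True, 0.85))   # passes at the 80% tier
print(passes_filter("protein_coding", "Approved", True, 0.85))   # fails at the 95% tier
print(passes_filter("protein_coding", "Enhanced", False, 1.00))  # somatic protein detection
```

The stronger the antibody evidence for reproductive restriction, the more RNA signal outside reproductive tissues the rule tolerates.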
Gene set counts
| Function | Description | Count |
|---|---|---|
| `CTA_gene_names()` | Recommended default. Expressed, reproductive-restricted CTAs | ~257 |
| `CTA_never_expressed_gene_names()` | CTAs from databases but no HPA expression (max nTPM < 2, no protein) | ~21 |
| `CTA_filtered_gene_names()` | All filter-passing CTAs (= expressed + never_expressed) | ~278 |
| `CTA_excluded_gene_names()` | CTAs that fail filter (somatic expression) | ~80 |
| `CTA_unfiltered_gene_names()` | Full superset from all source databases | 358 |
| `CTA_evidence()` | Full DataFrame with all evidence columns | 358 rows |
| `CTA_partition_gene_ids()` | Partition all protein-coding genes (dataclass with `.cta`, `.cta_never_expressed`, `.non_cta` sets) | ~20k |
| `CTA_partition_gene_names()` | Same, as gene symbols | ~20k |
| `CTA_partition_dataframes()` | Same, as DataFrames with evidence columns | ~20k |
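How the three CTA buckets relate can be sketched as a single classification step. The function `classify` and its inputs are hypothetical; the rule for "never expressed" (max nTPM < 2 and no protein detection, among filter-passing genes) is taken from the table above, and the toy rows are invented.

```python
from collections import Counter

def classify(passes_filter, max_ntpm, protein_detected):
    """Assign a curated CT gene to one of the three buckets (illustrative)."""
    if not passes_filter:
        return "excluded"
    if max_ntpm < 2 and not protein_detected:
        return "never_expressed"
    return "expressed"

# (passes_filter, max nTPM, protein detected): made-up example genes
toy_genes = [
    (True, 12.0, True),
    (True, 0.4, False),
    (False, 30.0, True),
    (True, 1.5, True),
]
buckets = Counter(classify(*gene) for gene in toy_genes)
print(buckets)
```

Under this reading, expressed plus never-expressed gives the filter-passing set (~257 + ~21 = ~278), and adding the excluded genes (~80) recovers the unfiltered superset of 358.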
Evidence columns
Each gene in cancer-testis-antigens.csv carries identity and HPA-derived evidence:
| Column | Description |
|---|---|
| `Ensembl_Gene_ID` | Ensembl gene ID (validated against release 112) |
| `source_databases` | Semicolon-separated list of source databases (CTpedia, CTexploreR_CT, CTexploreR_CTP, daSilva2017) |
| `biotype` | Ensembl gene biotype (must be protein_coding to pass filter) |
| `Canonical_Transcript_ID` | Longest protein-coding transcript (Ensembl 112) |
| `protein_reproductive` | IHC detected only in {testis, ovary, placenta} (excl. thymus), or "no data" |
| `protein_thymus` | IHC detected in thymus |
| `protein_reliability` | Best HPA antibody reliability: Enhanced / Supported / Approved / Uncertain / "no data" |
| `protein_strict_expression` | Semicolon-separated tissues with IHC detection (excl. thymus) |
| `rna_reproductive` | All tissues with ≥1 nTPM (excl. thymus) are in {testis, ovary, placenta} |
| `rna_thymus` | Thymus nTPM ≥ 1 |
| `rna_reproductive_frac` | Fraction of total nTPM (excl. thymus) in core reproductive tissues |
| `rna_deflated_reproductive_frac` | (1 + Σ_repro max(0, nTPM−1)) / (1 + Σ_all max(0, nTPM−1)) |
| `rna_deflated_reproductive_and_thymus_frac` | Same but thymus added to reproductive numerator |
| `rna_80/90/95/99_pct_filter` | Whether deflated reproductive fraction ≥ threshold |
| `filtered` | Final inclusion flag (see tiered thresholds above) |
For full details on the curation process, evidence columns, and filter logic, see docs/cta-curation.md.
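To make the RNA evidence columns concrete, here is a sketch that derives `rna_reproductive`, `rna_thymus`, and `rna_reproductive_frac` from one gene's per-tissue nTPM values. The helper `rna_flags` is hypothetical, "core reproductive tissues" is assumed to mean {testis, ovary, placenta}, and returning `None` when total nTPM is zero is a choice made for this example; the curation document is the authority on the real logic.

```python
REPRODUCTIVE = {"testis", "ovary", "placenta"}  # assumed core reproductive set

def rna_flags(ntpm_by_tissue):
    """Derive three RNA evidence values for one gene (illustrative only)."""
    # Thymus is set aside before any restriction check
    non_thymus = {t: v for t, v in ntpm_by_tissue.items() if t != "thymus"}
    expressed = {t for t, v in non_thymus.items() if v >= 1.0}
    total = sum(non_thymus.values())
    repro = sum(v for t, v in non_thymus.items() if t in REPRODUCTIVE)
    return {
        # True when every tissue at >= 1 nTPM is reproductive
        # (also True when no tissue reaches 1 nTPM)
        "rna_reproductive": expressed <= REPRODUCTIVE,
        "rna_thymus": ntpm_by_tissue.get("thymus", 0.0) >= 1.0,
        "rna_reproductive_frac": repro / total if total > 0 else None,
    }

# Made-up nTPM values for a single gene
flags = rna_flags({"testis": 30.0, "ovary": 0.0, "liver": 0.5, "thymus": 2.0})
print(flags)
```

Here the thymus signal sets `rna_thymus` but does not count against reproductive restriction, while the 0.5 nTPM in liver lowers the raw fraction slightly without flipping `rna_reproductive`.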
Deflated RNA metric
The deflated metric max(0, nTPM − 1) per tissue suppresses low-level basal transcription noise before computing the reproductive fraction. A +1 pseudocount on numerator and denominator prevents 0/0 for very-low-expression genes. Example: CTCFL/BORIS has raw reproductive fraction 54% (diluted by sub-1 nTPM noise across ~40 tissues) but deflated fraction 100% (only testis has ≥1 nTPM).
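A minimal sketch of the deflated fraction follows, using invented nTPM values rather than real HPA data. The function name is hypothetical, and thymus is assumed to be dropped from both sums in line with the note that thymus is excluded from restriction checks.

```python
REPRODUCTIVE = {"testis", "ovary", "placenta"}  # assumed reproductive set

def deflated_reproductive_frac(ntpm_by_tissue):
    """(1 + sum_repro max(0, nTPM - 1)) / (1 + sum_all max(0, nTPM - 1))."""
    numerator = 1.0    # +1 pseudocount
    denominator = 1.0  # +1 pseudocount
    for tissue, ntpm in ntpm_by_tissue.items():
        if tissue == "thymus":
            continue  # assumption: thymus is left out of both sums
        excess = max(0.0, ntpm - 1.0)  # subtract 1 nTPM of basal noise
        denominator += excess
        if tissue in REPRODUCTIVE:
            numerator += excess
    return numerator / denominator

# Toy gene: strong testis signal plus 0.2 nTPM of noise in 40 other tissues
toy = {"testis": 9.0}
toy.update({f"tissue_{i}": 0.2 for i in range(40)})

raw = toy["testis"] / sum(toy.values())
deflated = deflated_reproductive_frac(toy)
print(round(raw, 3), deflated)
```

In this toy case the raw fraction is about 0.53 because the sub-1 nTPM noise adds up across many tissues, while the deflated fraction is 1.0 because only testis exceeds 1 nTPM. A gene with no tissue above 1 nTPM evaluates to 1/1 rather than 0/0.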
Class I MHC antigen presentation
Last updated: July 21st, 2018
Sources:
- Frequent HLA class I alterations in human prostate cancer: molecular mechanisms and clinical relevance:
- LMP2/7
- peptide transporters TAP1/2
- chaperones calreticulin, calnexin, ERP57, and tapasin
- IRF-1 and NLRC5
- Expression of Antigen Processing and Presenting Molecules in Brain Metastasis of Breast Cancer
- "β2-microglobulin, transporter associated with antigen processing (TAP) 1, TAP2 and calnexin are down-regulated in brain lesions compared with unpaired breast lesions"
- NLRC5/MHC class I transactivator is a target for immune evasion in cancer
- TAPBPR: a new player in the MHC class I presentation pathway
Interferon-gamma response
Last updated: July 21st, 2018
Sources:
- Interferon Receptor Signaling Pathways Regulating PD-L1 and PD-L2 Expression
- "JAK1/JAK2-STAT1/STAT2/STAT3-IRF1 axis primarily regulates PD-L1 expression, with IRF1 binding to its promoter"
- "PD-L2 responded equally to interferon beta and gamma and is regulated through both IRF1 and STAT3, which bind to the PD-L2 promoter"
- "the suppressor of cytokine signaling protein family (SOCS; mostly SOCS1 and SOCS3) are involved in negative feedback regulation of cytokines that signal mainly through JAK2 binding, thereby modulating the activity of both STAT1 and STAT3"
- Mutations Associated with Acquired Resistance to PD-1 Blockade in Melanoma
- "resistance-associated loss-of-function mutations in the genes encoding interferon-receptor–associated Janus kinase 1 (JAK1) or Janus kinase 2 (JAK2), concurrent with deletion of the wild-type allele"
- SOCS, inflammation, and cancer
- "Abnormal expression of SOCS1 and SOCS3 in cancer cells has been reported in human carcinoma associated with dysregulation of signals from cytokine receptors"
Recurrently mutated cancer genes
Last updated: July 21st, 2018
Cancer genes and recurrent mutations extracted from Comprehensive Characterization of Cancer Driver Genes and Mutations.
Genes extracted from Table S1 into cancer-driver-genes.csv. Mutations extracted from Table S4 into cancer-driver-variants.csv.
Both datasets were annotated with Ensembl IDs using Ensembl release 92.