Predict cancer epitopes from cancer sequence data

These details have been verified by PyPI

Maintainers

hammerlab iskander openvax tavinathanson timodonnell

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

Topiary

Topiary predicts which peptides from protein sequences will be presented by MHC molecules, making them potential T-cell epitopes. It is used in cancer immunotherapy research to find mutant peptides (neoantigens) that the immune system could target.

Core idea: Given protein sequences + HLA alleles + one or more MHC prediction models, Topiary scans all possible peptides and returns those predicted to be presented by MHC, ranked by any combination of binding affinity, presentation score, processing score, and stability.

Installation

pip install topiary

For Ensembl-based features (variant annotation, gene lookups), download reference data:

pyensembl install --release 93 --species human

Tab completion: activate-global-python-argcomplete

Quick start

Scan a FASTA for MHC binders

topiary \
  --fasta proteins.fasta \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01,HLA-B*07:02 \
  --ic50-cutoff 500 \
  --output-csv results.csv

Score specific peptides

topiary \
  --peptide-csv peptides.csv \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --ranking "ba <= 500 & el.score >= 0.5" \
  --output-csv results.csv

Find neoantigen candidates from somatic variants

topiary \
  --vcf somatic.vcf \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01,HLA-B*07:02 \
  --ranking "ba <= 500 | el.rank <= 2" \
  --rank-by "0.6 * affinity.descending_cdf(500, 200) + 0.4 * presentation.score" \
  --only-novel-epitopes \
  --output-csv epitopes.csv

Multi-model scoring with composite ranking

topiary \
  --fasta antigens.fasta \
  --mhc-predictor netmhcpan mhcflurry \
  --mhc-alleles HLA-A*02:01 \
  --ranking "ba <= 500" \
  --rank-by "mean(affinity['netmhcpan'].logistic(350, 150), affinity['mhcflurry'].logistic(350, 150))" \
  --output-csv results.csv

Python API

from topiary import TopiaryPredictor, Affinity, Presentation, wt, mean
from mhctools import NetMHCpan, MHCflurry

predictor = TopiaryPredictor(
    models=[NetMHCpan, MHCflurry],
    alleles=["HLA-A*02:01", "HLA-B*07:02"],
    filter_by=(Affinity <= 500) | (Presentation.rank <= 2.0),
    rank_by=mean(
        Affinity["netmhcpan"].logistic(350, 150),
        Affinity["mhcflurry"].logistic(350, 150),
    ),
)

# Scan protein sequences (sliding window)
df = predictor.predict_from_named_sequences({
    "BRAF_V600E": "MAALSGGGGG...LATEKSRWSG",
    "TP53_R248W": "MEEPQSDPSV...ALPQHAHAQM",
})

# Score specific peptides (no sliding window)
df = predictor.predict_from_named_peptides({
    "peptide_1": "YLQLVFGIEV",
    "peptide_2": "LLFNILGGWV",
})

# From somatic variants (requires varcode)
from varcode import load_vcf
variants = load_vcf("somatic.vcf")
df = predictor.predict_from_variants(variants)

Inputs

Sequences and peptides

Flag	Format	Behavior
`--fasta FILE`	FASTA with full-length proteins	Sliding-window scan
`--peptide-fasta FILE`	FASTA where each entry is one peptide	Scored as-is
`--sequence-csv FILE`	CSV with `sequence` column (+ optional `name`)	Sliding-window scan
`--peptide-csv FILE`	CSV with `peptide` column (+ optional `name`)	Scored as-is

Genomic variants

Flag	Description
`--vcf FILE`	VCF file of somatic variants
`--maf FILE`	TCGA MAF file
`--variant CHR POS REF ALT`	Individual variant
`--protein-change GENE CHANGE`	Direct protein change, e.g. `--protein-change EGFR T790M`

Gene and transcript lookups

Pull protein sequences from Ensembl automatically:

Flag	Example
`--gene-names NAME [...]`	`--gene-names BRAF TP53 EGFR`
`--gene-ids ID [...]`	`--gene-ids ENSG00000157764`
`--transcript-ids ID [...]`	`--transcript-ids ENST00000288602`
`--ensembl-proteome`	Scan the entire Ensembl proteome
`--cta`	Cancer-testis antigen genes (requires `pirlygenes`)
`--ensembl-release N`	Use a specific Ensembl release

Multiple input flags can be combined in a single run.

Restrict to regions

Limit prediction to specific protein regions (only applies to sequence inputs):

--regions spike:319-541 nucleocapsid:0-50

RNA expression filtering

For variant workflows, filter by gene or transcript expression:

--rna-gene-fpkm-tracking-file genes.fpkm_tracking
--rna-min-gene-expression 4.0
--rna-transcript-fpkm-tracking-file isoforms.fpkm_tracking
--rna-min-transcript-expression 1.5

Also supports StringTie GTF: --rna-transcript-fpkm-gtf-file.

MHC prediction models

Specify one or more predictors and alleles:

--mhc-predictor netmhcpan mhcflurry \
--mhc-alleles HLA-A*02:01,HLA-B*07:02

All predictors come from mhctools. Multiple models can be used together — the expression DSL handles disambiguation.

CLI name	Class	Predicts
`netmhcpan`	NetMHCpan (auto-detects version)	affinity + presentation
`netmhcpan4` / `netmhcpan41`	NetMHCpan4 / 41	affinity + presentation
`netmhcpan4-ba` / `netmhcpan4-el`	NetMHCpan4_BA / _EL	single mode
`netmhcpan42` / `netmhcpan42-ba` / `netmhcpan42-el`	NetMHCpan42 variants	NetMHCpan 4.2
`mhcflurry`	MHCflurry	affinity + presentation + processing
`mixmhcpred`	MixMHCpred	presentation
`netmhciipan` / `netmhciipan4`	NetMHCIIpan variants	MHC-II affinity + presentation
`netmhciipan43` / `netmhciipan43-ba` / `netmhciipan43-el`	NetMHCIIpan43 variants	NetMHCIIpan 4.3
`bigmhc` / `bigmhc-el` / `bigmhc-im`	BigMHC variants	presentation / immunogenicity
`netmhcstabpan`	NetMHCstabpan	pMHC stability
`pepsickle` / `netchop`	Pepsickle / NetChop	proteasomal cleavage
`netmhcpan-iedb` / `netmhccons-iedb` / `smm-iedb`	IEDB web API	no local install needed
`random`	RandomBindingPredictor	random (for testing)

Peptide lengths: --mhc-epitope-lengths 8,9,10,11 (defaults come from the predictor).

Filtering and ranking

Topiary has an expression language for filtering and ranking predictions. It works identically in the Python API and as CLI strings.

Prediction kinds and fields

Accessor	Aliases	What it measures
`Affinity`	`ba`, `aff`, `ic50`	Binding affinity (IC50 nM)
`Presentation`	`el`	Presentation / eluted ligand score
`Stability`		pMHC complex stability
`Processing`		Antigen processing / cleavage

Each has three fields: .value (raw), .rank (percentile, lower = better), .score (normalized, higher = better). Default is .value, so Affinity <= 500 means Affinity.value <= 500.

Filters

# Python                                    # CLI string
Affinity <= 500                             # "affinity <= 500" or "ba <= 500"
Affinity.rank <= 2.0                        # "affinity.rank <= 2"
Presentation.score >= 0.5                   # "el.score >= 0.5"
(Affinity <= 500) | (Presentation.rank <= 2)  # "affinity <= 500 | el.rank <= 2"
(Affinity <= 500) & (Presentation.rank <= 2)  # "affinity <= 500 & el.rank <= 2"

CLI flags:

--ranking "affinity <= 500 | el.rank <= 2"
--ic50-cutoff 500          # shorthand for affinity <= 500
--percentile-cutoff 2.0    # shorthand for affinity.rank <= 2

Ranking and transforms

Sort surviving peptides with --rank-by. Supports arithmetic, transforms, and aggregations:

# Simple: sort by presentation score, fall back to affinity
--rank-by pMHC_presentation,pMHC_affinity

# Composite score with normalization
--rank-by "0.6 * affinity.descending_cdf(500, 200) + 0.4 * presentation.score"

# Average across models
--rank-by "mean(affinity['netmhcpan'].logistic(350, 150), affinity['mhcflurry'].logistic(350, 150))"

# Chain transforms
--rank-by "affinity.value.clip(1, 50000).log()"

Available transforms:

Transform	What it does
`.descending_cdf(mean, std)`	Lower input → higher output (for IC50, rank)
`.ascending_cdf(mean, std)`	Higher input → higher output (for scores)
`.logistic(midpoint, width)`	Sigmoid normalization
`.clip(lo, hi)`	Clamp to range
`.hinge()`	`max(0, x)`
`.log()` / `.log2()` / `.log10()` / `.log1p()`	Logarithms
`.sqrt()` / `.exp()`	Square root, exponential
`abs(...)`	Absolute value

Aggregations: mean(), geomean(), minimum(), maximum(), median()

Multi-model disambiguation

When multiple models produce the same kind, qualify with brackets (Python) or underscores (CLI):

Affinity["netmhcpan"] <= 500               # "netmhcpan_ba <= 500"
Affinity["mhcflurry"].score                # "mhcflurry_affinity.score"

Scope prefixes: wildtype and alternate peptide contexts

The wt. prefix reads predictions for the wildtype peptide at the same position. Use it for differential binding:

# Python                                    # CLI string
wt.Affinity.score                           # wt.affinity.score
Affinity.score - wt.Affinity.score          # affinity.score - wt.affinity.score

shuffled. and self. prefixes work the same way for shuffled decoy and self-proteome contexts. All return NaN when the corresponding columns don't exist.

Peptide-level expressions

len reads the peptide length. count('X') counts amino acid occurrences. Both work with scope prefixes:

--rank-by "len"                             # peptide length
--rank-by "count('C')"                      # cysteine count
--rank-by "count('C') - wt.count('C')"      # gained/lost cysteines vs wildtype

Column references

column(name) brings any DataFrame column into expressions — peptide properties, read counts, custom annotations:

--ranking "column(cysteine_count) <= 2"
--rank-by "0.5 * affinity.logistic(350, 150) - 0.2 * column(cysteine_count)"

Missing columns get a helpful error: Column 'hydrophobicty' not found. Did you mean: ['hydrophobicity']?

Quick reference

Python DSL	CLI string
`Affinity <= 500`	`affinity <= 500` / `ba <= 500` / `ic50 <= 500`
`Affinity.rank <= 2`	`affinity.rank <= 2`
`Affinity["netmhcpan"] <= 500`	`netmhcpan_ba <= 500`
`(A <= 500) \| (B.rank <= 2)`	`affinity <= 500 \| el.rank <= 2`
`0.5 * Affinity.score + 0.5 * Presentation.score`	`0.5 * affinity.score + 0.5 * presentation.score`
`Affinity.logistic(350, 150)`	`affinity.logistic(350, 150)`
`mean(Affinity.score, Presentation.score)`	`mean(affinity.score, presentation.score)`
`wt.Affinity.score`	`wt.affinity.score`
`Len()` / `Count("C")`	`len` / `count('C')`
`Column("cysteine_count")`	`column(cysteine_count)`

Exclusion filtering

Exclude peptides that appear in reference proteomes — useful for tumor-specific or pathogen-specific peptides:

--exclude-ensembl                    # Exclude peptides in the human Ensembl proteome
--exclude-non-cta                    # Exclude non-CTA proteins (requires pirlygenes)
--exclude-tissues heart_muscle lung  # Exclude genes expressed in these tissues
--exclude-fasta reference.fasta      # Exclude peptides in custom reference sequences
--exclude-mode substring             # "substring" (default) or "exact"

Output

--output-csv results.csv
--output-html results.html
--output-csv-sep "\t"
--subset-output-columns peptide allele affinity
--rename-output-column value ic50

All predictions: source_sequence_name, peptide, peptide_offset, peptide_length, allele, kind, score, value, affinity, percentile_rank, prediction_method_name

Variant predictions add: variant, gene, gene_id, transcript_id, transcript_name, effect, effect_type, contains_mutant_residues, mutation_start_in_peptide, mutation_end_in_peptide

Peptide properties

Compute amino acid properties and use them in ranking:

from topiary.properties import add_peptide_properties

df = predictor.predict_from_named_sequences(seqs)
df = add_peptide_properties(df, groups=["manufacturability"])

# Properties become ranking signals
score = Affinity.logistic(350, 150) - 0.1 * Column("cysteine_count")
# CLI: --rank-by "affinity.logistic(350, 150) - 0.1 * column(cysteine_count)"

Named groups:

"core" — charge, hydrophobicity, aromaticity, molecular_weight
"manufacturability" — core + cysteine_count, instability_index, max_7mer_hydrophobicity, difficult_nterm/cterm, asp_pro_bonds
"immunogenicity" — core + tcr_charge, tcr_aromaticity, tcr_hydrophobicity (TCR-facing positions for MHC-I)

Protein sources (Python API)

from topiary.sources import (
    ensembl_proteome,
    sequences_from_gene_names,
    cta_sequences,
    non_cta_sequences,
    tissue_expressed_sequences,
    available_tissues,
)

# All return dict[name -> amino_acid_sequence]
seqs = sequences_from_gene_names(["BRAF", "TP53", "EGFR"])
cta = cta_sequences()                          # cancer-testis antigens
heart = tissue_expressed_sequences(["heart_muscle"])
print(available_tissues())                      # list tissue names

Project details

These details have been verified by PyPI

Maintainers

hammerlab iskander openvax tavinathanson timodonnell

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Bio-Informatics

Release history Release notifications | RSS feed

5.10.0

Apr 24, 2026

5.9.1

Apr 18, 2026

5.9.0

Apr 16, 2026

5.8.0

Apr 16, 2026

5.7.0

Apr 16, 2026

5.6.0

Apr 15, 2026

5.5.0

Apr 14, 2026

5.4.0

Apr 14, 2026

5.2.0

Apr 13, 2026

5.1.0

Apr 13, 2026

5.0.1

Apr 12, 2026

5.0.0

Apr 12, 2026

4.12.0

Apr 12, 2026

4.11.2

Apr 12, 2026

4.11.1

Apr 11, 2026

4.11.0

Apr 11, 2026

4.10.0

Apr 11, 2026

4.9.0

Apr 11, 2026

This version

4.8.0

Apr 10, 2026

4.7.0

Apr 10, 2026

4.6.0

Apr 9, 2026

4.5.1

Apr 9, 2026

4.5.0

Apr 9, 2026

4.4.0

Apr 9, 2026

4.3.2

Apr 9, 2026

4.3.1

Apr 9, 2026

4.3.0

Apr 9, 2026

4.2.1

Apr 8, 2026

4.2.0

Apr 8, 2026

4.1.0

Apr 7, 2026

4.0.1

Apr 7, 2026

4.0.0

Apr 7, 2026

3.0.6

Feb 26, 2018

3.0.2

Feb 21, 2018

3.0.0

Jan 19, 2018

2.2.0

Jan 12, 2018

2.1.0

Dec 21, 2017

2.0.0

Nov 8, 2017

1.2.1

May 4, 2017

1.2.0

Mar 28, 2017

0.1.2

Mar 1, 2017

0.1.1

Mar 1, 2017

0.1.0

Aug 6, 2016

0.0.21

Jun 16, 2016

0.0.20

Jun 10, 2016

0.0.19

May 27, 2016

0.0.18

May 26, 2016

0.0.17

May 13, 2016

0.0.16

May 10, 2016

0.0.15

Mar 15, 2016

0.0.14

Oct 27, 2015

0.0.13

Aug 28, 2015

0.0.10

Aug 25, 2015

0.0.9

Aug 24, 2015

0.0.8

Aug 24, 2015

0.0.7

May 4, 2015

0.0.6

May 2, 2015

0.0.5

May 1, 2015

0.0.4

May 1, 2015

0.0.3

May 1, 2015

0.0.2

May 1, 2015

0.0.1

May 1, 2015

0.0.0

Apr 30, 2015

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

topiary-4.8.0.tar.gz (89.1 kB view details)

Uploaded Apr 10, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

topiary-4.8.0-py3-none-any.whl (103.9 kB view details)

Uploaded Apr 10, 2026 Python 3

File details

Details for the file topiary-4.8.0.tar.gz.

File metadata

Download URL: topiary-4.8.0.tar.gz
Upload date: Apr 10, 2026
Size: 89.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for topiary-4.8.0.tar.gz
Algorithm	Hash digest
SHA256	`af424c5be80af951ef49bae21727a1b40b98baf61e4aacbf4e744860e07da421`
MD5	`79b9c6354afccd71da006e1bc3604d70`
BLAKE2b-256	`1bec20ae491552f5ceace9ce328bb8e92e1536bd1a98c469896fea1121fec539`

See more details on using hashes here.

File details

Details for the file topiary-4.8.0-py3-none-any.whl.

File metadata

Download URL: topiary-4.8.0-py3-none-any.whl
Upload date: Apr 10, 2026
Size: 103.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for topiary-4.8.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f5ca2473943d40b2dfda51ccde098ecfedcc86f0411a67cbca3b70c84c5996aa`
MD5	`40a6ce1ca570011611c925d708cc0125`
BLAKE2b-256	`baa76577205faf64bb98644a85b1981faa93dbbda5ea605a4d91e3e7224bc85e`

See more details on using hashes here.

topiary 4.8.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Topiary

Installation

Quick start

Scan a FASTA for MHC binders

Score specific peptides

Find neoantigen candidates from somatic variants

Multi-model scoring with composite ranking

Python API

Inputs

Sequences and peptides

Genomic variants

Gene and transcript lookups

Restrict to regions

RNA expression filtering

MHC prediction models

Filtering and ranking

Prediction kinds and fields

Filters

Ranking and transforms

Multi-model disambiguation

Scope prefixes: wildtype and alternate peptide contexts

Peptide-level expressions

Column references

Quick reference

Exclusion filtering

Output

Peptide properties

Protein sources (Python API)

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes