Curated cancer gene sets and reference expression data. Analysis lives in `trufflepig`.
pirlygenes
Curated cancer gene-knowledge data.
Analysis, plotting, the analyze CLI, and all expression matrices
moved to trufflepig in
v5.0. This package now ships gene-knowledge data only:
- curated gene-set CSVs (therapy targets, CTAs, cancer-driver genes, housekeeping genes, surface proteins, immune/stromal marker panels, lineage and matched-normal panels, fusion/mutation expression-effect rules, narrative gene sets, …)
- the cancer-type registry and gene-symbol/Ensembl-ID resolvers
- cohort-baseline constants (e.g. TCGA_MEDIAN_PURITY)
Expression matrices (pan-cancer TCGA reference, subtype-deconvolved
non-TCGA cohorts, TCGA deconvolution, HPA cell-type expression,
tumor-up-vs-matched-normal panels, ESTIMATE signatures) ship with
trufflepig — use trufflepig.reference.<accessor>() to read them.
Install
pip install pirlygenes
Run analyses with trufflepig. It is distributed on PyPI as
pirl-trufflepig because the bare trufflepig name is owned by an
unrelated package; the command and the Python import are both still
trufflepig:
pip install pirl-trufflepig
trufflepig run --sample expr.tsv --workspace out --cancer-type PRAD
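The same invocation can be scripted from Python. The sketch below only assembles the argument list from the three flags shown above and hands it to `subprocess.run`. The helper names `build_run_command` and `run_trufflepig` are hypothetical (they are not part of either package), and the sketch assumes the `trufflepig` command is on PATH.

```python
import subprocess
from pathlib import Path


def build_run_command(sample, workspace, cancer_type):
    """Assemble the `trufflepig run` argument list using the flags shown above."""
    return [
        "trufflepig", "run",
        "--sample", str(Path(sample)),
        "--workspace", str(Path(workspace)),
        "--cancer-type", cancer_type,
    ]


def run_trufflepig(sample, workspace, cancer_type):
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    cmd = build_run_command(sample, workspace, cancer_type)
    return subprocess.run(cmd, check=True)


# Printing the command is side-effect free; call run_trufflepig(...) to execute it.
print(" ".join(build_run_command("expr.tsv", "out", "PRAD")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when sample paths contain spaces.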
Python API
from pirlygenes.gene_sets_cancer import (
    CTA_gene_names, # ~257 cancer-testis antigens
    surface_protein_gene_names, # 2,799 surfaceome genes
    cancer_surfaceome_gene_names, # 147 tumor-specific surface targets
    therapy_target_gene_names, # by modality: "ADC", "CAR-T", "TCR-T", "bispecific", ...
    cancer_type_registry, # cancer-type registry DataFrame
    lineage_genes_by_cancer_type, # lineage panels
    cancer_family_panels, # broad-family aggregate panels
    housekeeping_gene_ids,
    mitochondrial_gene_ids,
    tme_marker_gene_ids, # tumor microenvironment markers
    degradation_gene_pairs, # for RNA degradation index
    cancer_family_panel,
    TCGA_MEDIAN_PURITY, # per-cohort median tumor purity (Aran et al., 2015)
)
from pirlygenes.gene_sets_cancer import (
    fusion_expression_effect_rules_df,
    mutation_expression_effect_rules_df,
    rare_cancer_fusion_rules_df,
    rare_cancer_rna_surrogate_rules_df,
    degenerate_subtype_pairs_df,
    fusion_surrogate_expression_df,
    narrative_gene_sets_df,
    narrative_gene_set,
    disease_state_rules_df,
)
from pirlygenes.load_dataset import get_data, get_all_csv_paths
from pirlygenes.gene_ids import (
    find_canonical_gene_ids_and_names,
    gene_id_aliases,
)
from pirlygenes.gene_names import display_name, short_gene_name, aliases
from pirlygenes.gene_families import (
    # Generic
    gene_family_for_ensembl_id, # ENSG → family name (or None)
    gene_family_for_symbol, # Symbol → family name (or None)
    gene_family_names, # list of every shipped family
    gene_family_ids, # set of ENSGs in one named family
    gene_family_symbols, # set of Symbols in one named family
    gene_family_table, # long-form DataFrame across all families
    # Typed per family (ID and symbol variants for each)
    numt_pseudogene_ids,
    numt_pseudogene_symbols,
    nuclear_retained_lncrna_ids, # MALAT1, NEAT1 (ENE-stabilized)
    nuclear_retained_lncrna_symbols,
    rrna_and_pseudogene_ids,
    rrna_and_pseudogene_symbols,
    ribosomal_protein_ids,
    ribosomal_protein_pseudogene_ids,
    small_noncoding_rna_ids, # snoRNAs, snRNAs, miRNAs, Y RNAs, ...
    histone_gene_ids,
    hemoglobin_gene_ids,
    immune_receptor_segment_ids, # IG/TR V/D/J/C segments
)
The gene_families panels are ENSG-keyed gene-family sets derived
from every installed Ensembl release (numt-pseudogenes.csv,
nuclear-retained-lncrnas.csv, etc.); trufflepig.expression_qc
reads them as the source of truth for its classify_gene_qc lookup.
Mitochondrial-DNA membership is sourced from the existing curated
mitochondrial-genes.csv (with a semantic Role column).
Regenerate the derived CSVs with
python scripts/generate_gene_family_sets.py whenever the upstream
regex panel changes.
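To illustrate what an ENSG-keyed family lookup looks like in practice, here is a minimal sketch built on a hand-written two-family table. It is not trufflepig's classify_gene_qc and it does not reproduce the shipped CSVs: `TOY_FAMILIES`, `strip_version`, and `family_for` are hypothetical names, the family labels are illustrative, and the only data are the Ensembl gene IDs for MALAT1, NEAT1, and HBB. Stripping the version suffix is an assumption about typical input, not documented package behavior.

```python
# Illustrative stand-in for an ENSG-keyed family table (NOT the shipped data).
TOY_FAMILIES = {
    "nuclear_retained_lncrna": {
        "ENSG00000251562",  # MALAT1
        "ENSG00000245532",  # NEAT1
    },
    "hemoglobin": {
        "ENSG00000244734",  # HBB
    },
}

# Invert to a flat ID -> family mapping so each lookup is a single dict access.
ID_TO_FAMILY = {
    gene_id: family
    for family, gene_ids in TOY_FAMILIES.items()
    for gene_id in gene_ids
}


def strip_version(ensembl_id):
    """Drop a trailing version suffix, e.g. 'ENSG00000251562.3' -> 'ENSG00000251562'."""
    return ensembl_id.split(".", 1)[0]


def family_for(ensembl_id):
    """Return the family label for an Ensembl gene ID, or None if it is in no family."""
    return ID_TO_FAMILY.get(strip_version(ensembl_id))


print(family_for("ENSG00000251562.3"))
print(family_for("ENSG00000141510"))  # TP53, not in the toy table
```

With the real package, the equivalent lookups would go through the accessors imported from pirlygenes.gene_families above.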
Expression matrices and QC normalization moved to trufflepig in v5.0:
from trufflepig.reference import (
    pan_cancer_expression, # 3,100 genes x 83 columns (50 tissues + 33 cancers)
    cancer_expression, # one cancer type
    cancer_enriched_genes, # enriched genes for one cancer type
    subtype_deconvolved_expression, # non-TCGA cohorts (Treehouse, GEO sarcoma, ...)
    tcga_deconvolved_expression,
    tumor_up_vs_matched_normal,
    heme_tumor_up_vs_matched_normal,
    hpa_cell_type_expression,
    estimate_signatures,
)
from trufflepig.expression_qc import (
    normalize_expression,
    normalize_technical_rna_long_table,
)
What's bundled (pirlygenes/data/)
| Category | Files |
|---|---|
| Therapy targets | ADC-approved.csv, ADC-trials.csv, ADC-withdrawn.csv, CAR-T-approved.csv, TCR-T-trials.csv, TCR-T-approved.csv, bispecific-antibodies-approved.csv, multispecific-tcell-engager-trials.csv, radioligand-targets.csv |
| Surface proteins | cancer-surfaceome.csv, surface-proteins.csv |
| Cancer-testis antigens | cancer-testis-antigens.csv |
| Driver / key genes | cancer-driver-genes.csv, cancer-driver-variants.csv, cancer-key-genes.csv |
| Cancer-type registry | cancer-type-registry.csv, cancer-family-panels.csv, cancer-type-genes.csv |
| Lineage panel | lineage-genes.csv |
| Rule sets | mutation-expression-effects.csv, fusion-expression-effects.csv, rare-cancer-fusion-rules.csv, rare-cancer-rna-surrogates.csv, degenerate-subtype-pairs.csv, fusion-surrogate-expression.csv, disease-state-rules.csv, narrative-gene-sets.csv |
| QC panels | housekeeping-genes.csv, mitochondrial-genes.csv, culture-stress-genes.csv, tme-markers.csv, degradation-gene-pairs.csv, ffpe-sensitive-markers.csv, artifact-expectations.csv |
| Gene families (ENSG-keyed, derived) | numt-pseudogenes.csv, nuclear-retained-lncrnas.csv, rrna-and-pseudogenes.csv, ribosomal-protein-genes.csv, ribosomal-protein-pseudogenes.csv, small-noncoding-rnas.csv, histone-genes.csv, hemoglobin-genes.csv, immune-receptor-segments.csv |
| Gene-set catalog | gene-sets.csv |
| Therapy response axes | therapy-response-signatures.csv |
| Misc | ensembl-id-aliases.csv, extra-tx-mappings.csv |
Expression matrices (pan-cancer-expression.csv,
subtype-deconvolved-expression.csv.gz, tcga-deconvolved-expression.csv.gz,
hpa-cell-type-expression.csv, tumor-up-vs-matched-normal.csv,
heme-tumor-up-vs-matched-normal.csv, estimate-signatures.csv)
moved to trufflepig/data/ in v5.0 — access via
trufflepig.reference.<accessor>().
The full curated set is the surface area trufflepig calls into.
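Because the expression matrices now live in a separately installed package, code that reads them may want to fail with a clear message when pirl-trufflepig is missing. The sketch below resolves an accessor by name without calling it, since accessor arguments are not documented here. `resolve_accessor` is a hypothetical helper, not part of pirlygenes or trufflepig.

```python
import importlib


def resolve_accessor(name, module="trufflepig.reference"):
    """Return the callable `module.name`, with actionable errors if it is unavailable."""
    try:
        mod = importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"Could not import {module!r}. Expression matrices ship with "
            "trufflepig; install it with: pip install pirl-trufflepig"
        ) from exc
    try:
        return getattr(mod, name)
    except AttributeError as exc:
        raise AttributeError(f"{module!r} has no accessor named {name!r}") from exc
```

A caller would then write, for example, `accessor = resolve_accessor("pan_cancer_expression")` and invoke the result according to trufflepig's own documentation.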
Migrating from pirlygenes 4.x
Most data-side imports are unchanged. Anything that ran analysis,
plotting, or sample-context inference moved to trufflepig:
| Was in pirlygenes 4.x | Now in 5.0 |
|---|---|
| pirlygenes CLI (analyze, compare-analyze, plot-expression, plot-cancer-cohorts, data, cancers) | trufflepig run, trufflepig compare, trufflepig plot-cancer-cohorts, trufflepig data, trufflepig cancers |
| from pirlygenes import infer_sample_context, SampleContext, plot_sample_context, plot_degradation_index | from trufflepig.sample_context import infer_sample_context, SampleContext, plot_sample_context, plot_degradation_index |
| from pirlygenes import plot_gene_expression, plot_sample_vs_cancer, plot_geneset_vs_vital_tissues, plot_ctas_vs_cancer_type_detail | from trufflepig.plot import ... |
| from pirlygenes import pan_reference_embedding_genes, get_embedding_feature_metadata | from trufflepig.plot_embedding import ... |
| from pirlygenes.tumor_purity import TCGA_MEDIAN_PURITY | from pirlygenes.gene_sets_cancer import TCGA_MEDIAN_PURITY (moved into data-side; trufflepig re-exports) |
| from pirlygenes.cli import analyze, compare_analyze (Python API) | from trufflepig.main import analyze, compare_analyze |
Unchanged (still in pirlygenes):
- gene_sets_cancer.* accessors (CTAs, surfaceome, panels, registry, etc.) except the expression accessors listed below
- load_dataset.get_data, load_all_dataframes, load_all_dataframes_dict
- gene_ids.*, gene_names.*
Moved to trufflepig in v5.0:
- pirlygenes.gene_sets_cancer.pan_cancer_expression → trufflepig.reference.pan_cancer_expression
- pirlygenes.gene_sets_cancer.cancer_expression → trufflepig.reference.cancer_expression
- pirlygenes.gene_sets_cancer.cancer_enriched_genes → trufflepig.reference.cancer_enriched_genes
- pirlygenes.gene_sets_cancer.tcga_deconvolved_expression → trufflepig.reference.tcga_deconvolved_expression
- pirlygenes.gene_sets_cancer.subtype_deconvolved_expression → trufflepig.reference.subtype_deconvolved_expression
- pirlygenes.gene_sets_cancer.tumor_up_vs_matched_normal → trufflepig.reference.tumor_up_vs_matched_normal
- pirlygenes.gene_sets_cancer.heme_tumor_up_vs_matched_normal → trufflepig.reference.heme_tumor_up_vs_matched_normal
- pirlygenes.expression_qc.classify_gene_qc → trufflepig.expression_qc.classify_gene_qc (now ENSG-aware via pirlygenes.gene_families)
- pirlygenes.expression_qc.normalize_expression → trufflepig.expression_normalize.normalize_expression
- pirlygenes.expression_qc.normalize_technical_rna_long_table → trufflepig.expression_normalize.normalize_technical_rna_long_table
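When updating a larger codebase, the moved-name list can be encoded as a lookup table and applied mechanically. The following is a rough sketch, assuming imports are written one name per line in the `from X import Y` form; `MOVED` simply transcribes the list above, and `rewrite_import` is a hypothetical helper rather than a tool shipped by either package.

```python
import re

# Old dotted path -> new dotted path, transcribed from the list above.
MOVED = {
    "pirlygenes.gene_sets_cancer.pan_cancer_expression": "trufflepig.reference.pan_cancer_expression",
    "pirlygenes.gene_sets_cancer.cancer_expression": "trufflepig.reference.cancer_expression",
    "pirlygenes.gene_sets_cancer.cancer_enriched_genes": "trufflepig.reference.cancer_enriched_genes",
    "pirlygenes.gene_sets_cancer.tcga_deconvolved_expression": "trufflepig.reference.tcga_deconvolved_expression",
    "pirlygenes.gene_sets_cancer.subtype_deconvolved_expression": "trufflepig.reference.subtype_deconvolved_expression",
    "pirlygenes.gene_sets_cancer.tumor_up_vs_matched_normal": "trufflepig.reference.tumor_up_vs_matched_normal",
    "pirlygenes.gene_sets_cancer.heme_tumor_up_vs_matched_normal": "trufflepig.reference.heme_tumor_up_vs_matched_normal",
    "pirlygenes.expression_qc.classify_gene_qc": "trufflepig.expression_qc.classify_gene_qc",
    "pirlygenes.expression_qc.normalize_expression": "trufflepig.expression_normalize.normalize_expression",
    "pirlygenes.expression_qc.normalize_technical_rna_long_table": "trufflepig.expression_normalize.normalize_technical_rna_long_table",
}

# Matches only the simple single-name form: "from package.module import name".
_FROM_IMPORT = re.compile(r"^from\s+([\w.]+)\s+import\s+(\w+)\s*$")


def rewrite_import(line):
    """Rewrite a single-name 'from X import Y' line if Y moved; otherwise return it unchanged."""
    match = _FROM_IMPORT.match(line.strip())
    if match is None:
        return line
    module, name = match.groups()
    new_path = MOVED.get(f"{module}.{name}")
    if new_path is None:
        return line
    new_module, new_name = new_path.rsplit(".", 1)
    return f"from {new_module} import {new_name}"


print(rewrite_import("from pirlygenes.gene_sets_cancer import pan_cancer_expression"))
```

Multi-name or parenthesized imports are deliberately left untouched by this sketch; they would need a real parser such as the standard-library `ast` module.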
If the pirlygenes console-script is still on PATH from a prior install, it now prints a one-line "moved to trufflepig" notice and exits 2.
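A deployment script can check for such a leftover console script before relying on it. The sketch below is an assumption-laden heuristic: it treats "exit status 2 plus the word trufflepig somewhere in the output" as the moved notice, since the exact wording of the notice is not specified here. `looks_like_moved_notice` and `stale_pirlygenes_cli` are hypothetical helper names.

```python
import shutil
import subprocess


def looks_like_moved_notice(command):
    """Return True if `command` exits with status 2 and mentions trufflepig in its output."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout + result.stderr).lower()
    return result.returncode == 2 and "trufflepig" in output


def stale_pirlygenes_cli():
    """Return None if no pirlygenes script is on PATH, else whether it prints the notice."""
    path = shutil.which("pirlygenes")
    if path is None:
        return None
    return looks_like_moved_notice([path])
```

Checking the output as well as the exit status matters because many argument parsers also exit with status 2 on a usage error.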
Migration history
- v5.0.0: analyze, compare-analyze, plotting, and reporting moved to trufflepig. The pirlygenes CLI is removed; data and Python API are unchanged.
- v4.x: combined data + analysis package.
See pirl-unc/trufflepig#1
for the migration umbrella and
pirl-unc/pirlygenes#119
for the deprecation tracking.
License
Apache 2.0 — see LICENSE.