Curated cancer gene sets and reference expression data. Analysis lives in `trufflepig`.

pirlygenes

Curated cancer gene-knowledge data.

Analysis, plotting, the analyze CLI, and all expression matrices moved to trufflepig in v5.0. This package now ships gene-knowledge data only:

  • curated gene-set CSVs (therapy targets, CTAs, cancer-driver genes, housekeeping genes, surface proteins, immune/stromal marker panels, lineage and matched-normal panels, fusion/mutation expression-effect rules, narrative gene sets, …)
  • the cancer-type registry and gene-symbol/Ensembl-ID resolvers
  • cohort-baseline constants (e.g. TCGA_MEDIAN_PURITY)

Expression matrices (pan-cancer TCGA reference, subtype-deconvolved non-TCGA cohorts, TCGA deconvolution, HPA cell-type expression, tumor-up-vs-matched-normal panels, ESTIMATE signatures) ship with trufflepig — use trufflepig.reference.<accessor>() to read them.

Install

pip install pirlygenes

Run analyses with trufflepig. It is distributed on PyPI as pirl-trufflepig because the bare trufflepig name is owned by an unrelated package; the command and the Python import are both still trufflepig:

pip install pirl-trufflepig
trufflepig run --sample expr.tsv --workspace out --cancer-type PRAD
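
To launch the same run from a Python script, the command line can be assembled as an argument list and handed to subprocess. This is a minimal sketch that assumes only the flags shown above; build_run_command and run_if_installed are hypothetical helper names, not part of either package.

```python
import shutil
import subprocess


def build_run_command(sample, workspace, cancer_type):
    """Return the argv list for the `trufflepig run` invocation shown above."""
    return [
        "trufflepig", "run",
        "--sample", str(sample),
        "--workspace", str(workspace),
        "--cancer-type", cancer_type,
    ]


def run_if_installed(argv):
    """Run argv when its executable is on PATH; return the exit code, else None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=False).returncode


cmd = build_run_command("expr.tsv", "out", "PRAD")
print(" ".join(cmd))
```

Passing a list rather than a shell string avoids quoting problems when the sample path contains spaces.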

Python API

from pirlygenes.gene_sets_cancer import (
    CTA_gene_names,                   # ~257 cancer-testis antigens
    surface_protein_gene_names,       # 2,799 surfaceome genes
    cancer_surfaceome_gene_names,     # 147 tumor-specific surface targets
    therapy_target_gene_names,        # by modality: "ADC", "CAR-T", "TCR-T", "bispecific", ...
    cancer_type_registry,             # cancer-type registry DataFrame
    lineage_genes_by_cancer_type,     # lineage panels
    cancer_family_panels,             # broad-family aggregate panels
    housekeeping_gene_ids,
    mitochondrial_gene_ids,
    tme_marker_gene_ids,              # tumor microenvironment markers
    degradation_gene_pairs,           # for RNA degradation index
    cancer_family_panel,
    TCGA_MEDIAN_PURITY,               # per-cohort median tumor purity (Aran et al., 2015)
)
from pirlygenes.gene_sets_cancer import (
    fusion_expression_effect_rules_df,
    mutation_expression_effect_rules_df,
    rare_cancer_fusion_rules_df,
    rare_cancer_rna_surrogate_rules_df,
    degenerate_subtype_pairs_df,
    fusion_surrogate_expression_df,
    narrative_gene_sets_df,
    narrative_gene_set,
    disease_state_rules_df,
)
from pirlygenes.load_dataset import get_data, get_all_csv_paths
from pirlygenes.gene_ids import (
    find_canonical_gene_ids_and_names,
    gene_id_aliases,
)
from pirlygenes.gene_names import display_name, short_gene_name, aliases
from pirlygenes.gene_families import (
    # Generic
    gene_family_for_ensembl_id,    # ENSG → family name (or None)
    gene_family_for_symbol,        # Symbol → family name (or None)
    gene_family_names,             # list of every shipped family
    gene_family_ids,               # set of ENSGs in one named family
    gene_family_symbols,           # set of Symbols in one named family
    gene_family_table,             # long-form DataFrame across all families
    # Typed per family (ID and symbol variants for each)
    numt_pseudogene_ids,
    numt_pseudogene_symbols,
    nuclear_retained_lncrna_ids,   # MALAT1, NEAT1 (ENE-stabilized)
    nuclear_retained_lncrna_symbols,
    rrna_and_pseudogene_ids,
    rrna_and_pseudogene_symbols,
    ribosomal_protein_ids,
    ribosomal_protein_pseudogene_ids,
    small_noncoding_rna_ids,       # snoRNAs, snRNAs, miRNAs, Y RNAs, ...
    histone_gene_ids,
    hemoglobin_gene_ids,
    immune_receptor_segment_ids,   # IG/TR V/D/J/C segments
)

The gene_families panels are ENSG-keyed gene-family sets (numt-pseudogenes.csv, nuclear-retained-lncrnas.csv, etc.) derived from every installed Ensembl release; trufflepig.expression_qc reads them as the source of truth for its classify_gene_qc lookup. Mitochondrial-DNA membership is sourced from the existing curated mitochondrial-genes.csv (with a semantic Role column). Regenerate the derived CSVs with python scripts/generate_gene_family_sets.py whenever the upstream regex panels change.
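
The idea of an ENSG-keyed family lookup can be illustrated without the package installed. The sketch below uses a tiny stand-in table, not the shipped CSVs. The family names reuse the CSV file stems as an assumption, and stripping a version suffix from the ID is a choice made for this sketch, not documented behaviour of gene_family_for_ensembl_id.

```python
# Stand-in data only: three well-known Ensembl gene IDs grouped by family.
# Family names reuse the CSV stems from this README (an assumption).
FAMILY_TO_ENSG = {
    "nuclear-retained-lncrnas": {"ENSG00000251562", "ENSG00000245532"},  # MALAT1, NEAT1
    "hemoglobin-genes": {"ENSG00000244734"},  # HBB
}

# Invert once so that each lookup is a single dict access.
ENSG_TO_FAMILY = {
    ensg: family
    for family, ids in FAMILY_TO_ENSG.items()
    for ensg in ids
}


def family_for_ensembl_id(ensembl_id):
    """Return the family name for an ENSG, or None; a version suffix is ignored."""
    unversioned = ensembl_id.split(".", 1)[0]
    return ENSG_TO_FAMILY.get(unversioned)


print(family_for_ensembl_id("ENSG00000251562.5"))  # MALAT1 with a version suffix
print(family_for_ensembl_id("ENSG00000141510"))    # TP53, not in any stand-in family
```

Keying on the stable ENSG rather than the symbol means a renamed or aliased gene still resolves to the same family.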

Expression matrices and QC normalization moved to trufflepig in v5.0:

from trufflepig.reference import (
    pan_cancer_expression,            # 3,100 genes x 83 columns (50 tissues + 33 cancers)
    cancer_expression,                # one cancer type
    cancer_enriched_genes,            # enriched genes for one cancer type
    subtype_deconvolved_expression,   # non-TCGA cohorts (Treehouse, GEO sarcoma, ...)
    tcga_deconvolved_expression,
    tumor_up_vs_matched_normal,
    heme_tumor_up_vs_matched_normal,
    hpa_cell_type_expression,
    estimate_signatures,
)
from trufflepig.expression_normalize import (
    normalize_expression,
    normalize_technical_rna_long_table,
)
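
Because these imports require the separate pirl-trufflepig distribution, code that must also run in data-only environments may prefer to resolve an accessor by name and degrade gracefully. This is a minimal sketch using only the standard library; resolve_accessor is a hypothetical helper, and the accessor name is taken from the import list above.

```python
import importlib


def resolve_accessor(accessor_name, module_name="trufflepig.reference"):
    """Return the named attribute of module_name, or None if it is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both "trufflepig is not installed" and "no such submodule".
        return None
    return getattr(module, accessor_name, None)


accessor = resolve_accessor("pan_cancer_expression")
if accessor is None:
    print("trufflepig.reference not available; try: pip install pirl-trufflepig")
else:
    print("found accessor:", accessor)
```

The same helper works for any of the listed accessors, so a caller can loop over the names it needs and report all the missing ones at once.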

What's bundled (pirlygenes/data/)

| Category | Files |
| --- | --- |
| Therapy targets | ADC-approved.csv, ADC-trials.csv, ADC-withdrawn.csv, CAR-T-approved.csv, TCR-T-trials.csv, TCR-T-approved.csv, bispecific-antibodies-approved.csv, multispecific-tcell-engager-trials.csv, radioligand-targets.csv |
| Surface proteins | cancer-surfaceome.csv, surface-proteins.csv |
| Cancer-testis antigens | cancer-testis-antigens.csv |
| Driver / key genes | cancer-driver-genes.csv, cancer-driver-variants.csv, cancer-key-genes.csv |
| Cancer-type registry | cancer-type-registry.csv, cancer-family-panels.csv, cancer-type-genes.csv |
| Lineage panel | lineage-genes.csv |
| Rule sets | mutation-expression-effects.csv, fusion-expression-effects.csv, rare-cancer-fusion-rules.csv, rare-cancer-rna-surrogates.csv, degenerate-subtype-pairs.csv, fusion-surrogate-expression.csv, disease-state-rules.csv, narrative-gene-sets.csv |
| QC panels | housekeeping-genes.csv, mitochondrial-genes.csv, culture-stress-genes.csv, tme-markers.csv, degradation-gene-pairs.csv, ffpe-sensitive-markers.csv, artifact-expectations.csv |
| Gene families (ENSG-keyed, derived) | numt-pseudogenes.csv, nuclear-retained-lncrnas.csv, rrna-and-pseudogenes.csv, ribosomal-protein-genes.csv, ribosomal-protein-pseudogenes.csv, small-noncoding-rnas.csv, histone-genes.csv, hemoglobin-genes.csv, immune-receptor-segments.csv |
| Gene-set catalog | gene-sets.csv |
| Therapy response axes | therapy-response-signatures.csv |
| Misc | ensembl-id-aliases.csv, extra-tx-mappings.csv |
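
A file listing like this lends itself to a quick sanity check of a data directory. The sketch below is an illustration only: missing_files is a hypothetical helper, the two categories are a small excerpt of the listing, and the check runs against a throwaway directory, not an installed pirlygenes/data/.

```python
from pathlib import Path
import tempfile

# File names copied from the "Surface proteins" and "Cancer-testis antigens" rows.
EXPECTED = {
    "Surface proteins": ["cancer-surfaceome.csv", "surface-proteins.csv"],
    "Cancer-testis antigens": ["cancer-testis-antigens.csv"],
}


def missing_files(data_dir, expected=EXPECTED):
    """Map each category to the expected file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    missing = {}
    for category, names in expected.items():
        absent = [name for name in names if not (data_dir / name).is_file()]
        if absent:
            missing[category] = absent
    return missing


# Demonstration: a temporary directory that holds only one of the three files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "surface-proteins.csv").write_text("placeholder\n")
    report = missing_files(tmp)
print(report)
```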

Expression matrices (pan-cancer-expression.csv, subtype-deconvolved-expression.csv.gz, tcga-deconvolved-expression.csv.gz, hpa-cell-type-expression.csv, tumor-up-vs-matched-normal.csv, heme-tumor-up-vs-matched-normal.csv, estimate-signatures.csv) moved to trufflepig/data/ in v5.0 — access via trufflepig.reference.<accessor>().

The full curated set is the surface area trufflepig calls into.

Migrating from pirlygenes 4.x

Most data-side imports are unchanged. Anything that ran analysis, plotting, or sample-context inference moved to trufflepig:

| Was in pirlygenes 4.x | Now in 5.0 |
| --- | --- |
| pirlygenes CLI (analyze, compare-analyze, plot-expression, plot-cancer-cohorts, data, cancers) | trufflepig run, trufflepig compare, trufflepig plot-cancer-cohorts, trufflepig data, trufflepig cancers |
| from pirlygenes import infer_sample_context, SampleContext, plot_sample_context, plot_degradation_index | from trufflepig.sample_context import infer_sample_context, SampleContext, plot_sample_context, plot_degradation_index |
| from pirlygenes import plot_gene_expression, plot_sample_vs_cancer, plot_geneset_vs_vital_tissues, plot_ctas_vs_cancer_type_detail | from trufflepig.plot import ... |
| from pirlygenes import pan_reference_embedding_genes, get_embedding_feature_metadata | from trufflepig.plot_embedding import ... |
| from pirlygenes.tumor_purity import TCGA_MEDIAN_PURITY | from pirlygenes.gene_sets_cancer import TCGA_MEDIAN_PURITY (moved into data-side; trufflepig re-exports) |
| from pirlygenes.cli import analyze, compare_analyze (Python API) | from trufflepig.main import analyze, compare_analyze |

Unchanged (still in pirlygenes):

  • gene_sets_cancer.* accessors (CTAs, surfaceome, panels, registry, etc.) except the expression accessors listed below
  • load_dataset.get_data, load_all_dataframes, load_all_dataframes_dict
  • gene_ids.*, gene_names.*

Moved to trufflepig in v5.0:

  • pirlygenes.gene_sets_cancer.pan_cancer_expression → trufflepig.reference.pan_cancer_expression
  • pirlygenes.gene_sets_cancer.cancer_expression → trufflepig.reference.cancer_expression
  • pirlygenes.gene_sets_cancer.cancer_enriched_genes → trufflepig.reference.cancer_enriched_genes
  • pirlygenes.gene_sets_cancer.tcga_deconvolved_expression → trufflepig.reference.tcga_deconvolved_expression
  • pirlygenes.gene_sets_cancer.subtype_deconvolved_expression → trufflepig.reference.subtype_deconvolved_expression
  • pirlygenes.gene_sets_cancer.tumor_up_vs_matched_normal → trufflepig.reference.tumor_up_vs_matched_normal
  • pirlygenes.gene_sets_cancer.heme_tumor_up_vs_matched_normal → trufflepig.reference.heme_tumor_up_vs_matched_normal
  • pirlygenes.expression_qc.classify_gene_qc → trufflepig.expression_qc.classify_gene_qc (now ENSG-aware via pirlygenes.gene_families)
  • pirlygenes.expression_qc.normalize_expression → trufflepig.expression_normalize.normalize_expression
  • pirlygenes.expression_qc.normalize_technical_rna_long_table → trufflepig.expression_normalize.normalize_technical_rna_long_table
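
For a codebase with many 4.x imports, the list above can be turned into a lookup table that drives a search-and-replace. The mapping in this sketch is transcribed from that list; migrate and as_import are hypothetical helper names, and the sketch does not import either package.

```python
# Accessors that moved from pirlygenes.gene_sets_cancer to trufflepig.reference.
REFERENCE_ACCESSORS = [
    "pan_cancer_expression",
    "cancer_expression",
    "cancer_enriched_genes",
    "tcga_deconvolved_expression",
    "subtype_deconvolved_expression",
    "tumor_up_vs_matched_normal",
    "heme_tumor_up_vs_matched_normal",
]

# Old dotted path -> new dotted path.
MOVED = {
    f"pirlygenes.gene_sets_cancer.{name}": f"trufflepig.reference.{name}"
    for name in REFERENCE_ACCESSORS
}
MOVED.update({
    "pirlygenes.expression_qc.classify_gene_qc":
        "trufflepig.expression_qc.classify_gene_qc",
    "pirlygenes.expression_qc.normalize_expression":
        "trufflepig.expression_normalize.normalize_expression",
    "pirlygenes.expression_qc.normalize_technical_rna_long_table":
        "trufflepig.expression_normalize.normalize_technical_rna_long_table",
})


def migrate(dotted_path):
    """Return the 5.0 location of a 4.x dotted path; unmoved paths pass through."""
    return MOVED.get(dotted_path, dotted_path)


def as_import(dotted_path):
    """Render the migrated path as a `from module import name` statement."""
    module, _, name = migrate(dotted_path).rpartition(".")
    return f"from {module} import {name}"


print(as_import("pirlygenes.expression_qc.normalize_expression"))
print(as_import("pirlygenes.gene_ids.gene_id_aliases"))
```

Paths that are not in the table, such as anything under gene_ids, are returned unchanged, which matches the "Unchanged" list.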

If the pirlygenes console-script is still on PATH from a prior install, it now prints a one-line "moved to trufflepig" notice and exits 2.
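
Wrapper scripts and CI jobs that still call the old command can detect this case from the exit status. The sketch below is an assumption-laden illustration: cli_has_moved is a hypothetical helper, the exact notice wording is not specified here, and a stand-in subprocess emulates the stale console script so the example runs without pirlygenes installed.

```python
import subprocess
import sys


def cli_has_moved(argv):
    """Return True when argv exits with status 2 and its output mentions trufflepig."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    output = result.stdout + result.stderr
    return result.returncode == 2 and "trufflepig" in output


# Stand-in for the stale console script: prints a notice and exits with status 2.
fake_stub = [
    sys.executable, "-c",
    "import sys; print('moved to trufflepig'); sys.exit(2)",
]
print(cli_has_moved(fake_stub))
```

Checking the message as well as the status matters because exit status 2 is also what argparse-based tools return for ordinary usage errors.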

Migration history

  • v5.0.0: analyze, compare-analyze, plotting, and reporting moved to trufflepig. The pirlygenes CLI is removed; the curated data and most data-side imports are unchanged.
  • v4.x: combined data + analysis package.

See pirl-unc/trufflepig#1 for the migration umbrella and pirl-unc/pirlygenes#119 for the deprecation tracking.

License

Apache 2.0 — see LICENSE.
