
Splitpea: method for calculating network rewiring changes due to splicing


Splitpea

This is the Python package implementation of Splitpea: SPLicing InTeractions PErsonAlized.

Original repository: https://github.com/ylaboratory/splitpea
Repository for this package's code: https://github.com/ylaboratory/splitpea_package
This pip package is an extended version with easier installation and additional functions.

Splitpea quantifies rewiring in protein-protein interaction (PPI) networks driven by alternative splicing events. It integrates differential exon usage (PSI values) with domain-domain interactions (DDIs) and PPIs to generate condition-specific or sample-specific networks.

Installation

Install from PyPI:

pip install splitpea

Some functions require tabix to be installed. On Debian or Ubuntu, install it with:

sudo apt-get install tabix

Alternatively, if you are using conda, you can install tabix via:

conda install -c bioconda tabix

Functions that depend on tabix will raise an error if tabix is not found.
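
To fail early rather than partway through a run, it can help to confirm that tabix is on the PATH first. The following is a minimal sketch using only the standard library; the helper name tabix_available is hypothetical and not part of Splitpea:

```python
import shutil


def tabix_available(executable="tabix"):
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if tabix_available():
        print("tabix found")
    else:
        print("tabix not found; install it with apt-get or conda first")
```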

Requirements

  • Python >= 3.8
  • Packages:
    • numpy, networkx, intervaltree, ipykernel, matplotlib, plotnine, scikit-learn, adjustText, importlib_resources, requests

Features

  • Adjustable PSI and significance thresholds
  • Bundled reference files, with options for customization
  • Species-specific references: human (default) or mouse
  • Main Outputs:
    • .edges.dat: Edge list with weights and a flag indicating whether each edge is chaos
    • .edges.pickle: NetworkX graph object
  • Additional features and outputs:
    • Edge rewiring summaries
    • Gene level statistics
    • Auto-generated network plots
    • Gephi-compatible TSV files
    • Cytoscape-compatible GML files

Two modes

  1. Sample-specific mode (differential_format=sample_specific)

    • Input: a differential exon table for a single sample.
    • Tab-delimited header:
      ensembl.id  symbol  chr  strand  exon.start  exon.end  psi.background  psi.sample  delta.psi  pval
      
    • Can be generated via preprocess_pooled (see below) from two splicing matrices: one for background samples and another for samples to compare individually against it. Supports SUPPA2 (.psi) and rMATS formats (SE.MATS.JC.txt or SE.MATS.JCEC.txt).
  2. Condition-specific mode (differential_format=suppa2 or rmats)

    • SUPPA2: provide both .psivec and .dpsi (order agnostic)
    • rMATS: use SE.MATS.JC.txt or SE.MATS.JCEC.txt

Currently, only skipped exon (SE) events are supported.
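
As a quick sanity check before running sample-specific mode, the header of a differential exon table can be compared against the column names listed above. This is a rough sketch, not Splitpea's own parser; the function name read_sample_table and the example row are made up for illustration:

```python
import csv
import io

# Column names taken from the tab-delimited header documented above.
EXPECTED_COLUMNS = [
    "ensembl.id", "symbol", "chr", "strand", "exon.start", "exon.end",
    "psi.background", "psi.sample", "delta.psi", "pval",
]


def read_sample_table(handle):
    """Parse a sample-specific table and check that its header matches."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


# Made-up example row, for illustration only.
example = "\t".join(EXPECTED_COLUMNS) + "\n" + "\t".join(
    ["ENSG00000000001", "GENE1", "chr1", "+", "100", "200",
     "0.80", "0.30", "-0.50", "0.01"]
) + "\n"

rows = read_sample_table(io.StringIO(example))
print(rows[0]["symbol"], rows[0]["delta.psi"])
```

To check a real file, pass an open file handle instead of the in-memory string.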


Quick start

Splitpea can be run from the command line or within Python.

Main subcommands:

  • run — Build a rewired network
  • plot — Visualize a saved network
  • stats — Compute edge/gene statistics
  • preprocess_pooled — Generate the differential exon table for sample-specific mode
  • get_consensus_network — Generate summary consensus networks from a directory of rewired networks

Command line

Examples:

# Sample-specific mode
splitpea run sample1-psi.txt out/sample1

# Condition-specific mode (SUPPA2)
splitpea run diffSplice.psivec diffSplice.dpsi out/condA \
  --differential_format suppa2

# Condition-specific mode (rMATS)
splitpea run SE.MATS.JCEC.txt out/condB \
  --differential_format rmats

Note: The last positional argument (out/sample1, out/condA, etc.) is the output file prefix.

Python API

Examples:

import splitpea

# Sample-specific mode
G = splitpea.run("sample1-psi.txt", "out/sample1")

# Condition-specific mode (SUPPA2)
G = splitpea.run(["diffSplice.psivec", "diffSplice.dpsi"], "out/condA",
                 differential_format="suppa2")

# Condition-specific mode (rMATS)
G = splitpea.run("SE.MATS.JCEC.txt", "out/condB",
                 differential_format="rmats")

run

Required:

  • in_file
    • sample_specific: one file path
    • suppa2: two file paths (.psivec and .dpsi, typically output by diffSplice)
    • rmats: one JC or JCEC file path
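
Since the two SUPPA2 files can be given in either order, one way to picture order-agnostic handling is to sort the paths by file extension. The helper split_suppa2_inputs below is hypothetical, and Splitpea may tell the files apart differently (for example by content), so treat this as an illustration only:

```python
from pathlib import Path


def split_suppa2_inputs(paths):
    """Return (psivec_path, dpsi_path) regardless of the order given."""
    by_suffix = {Path(p).suffix: p for p in paths}
    try:
        return by_suffix[".psivec"], by_suffix[".dpsi"]
    except KeyError as missing:
        raise ValueError(f"missing SUPPA2 file with extension {missing}") from None


print(split_suppa2_inputs(["diffSplice.dpsi", "diffSplice.psivec"]))
# → ('diffSplice.psivec', 'diffSplice.dpsi')
```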

Options:

  • out_file_prefix — Prefix for all output files (directory + base name). Defaults to out_rewired_network if no prefix is given
  • --differential_format {sample_specific,suppa2,rmats} (default: sample_specific)
  • --skip (int) — Number of lines to skip in input file (default: 1)
  • --dpsi_cut (float) — Delta PSI cutoff (default: 0.05)
  • --sigscore_cut (float) — Significance score cutoff (default: 0.05)
  • --include_nas (bool) — Include NAs in significance testing (default: True)
  • --index {0,1} — Coordinate indexing (default: auto-set per format)
  • --map_path — Custom mapping file between IDs and symbols (tab-delimited with headers: symbol entrez ensembl uniprot)
  • --edge_stats_file — Append gain/loss/chaos counts to a stats file
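
To make the two cutoffs concrete, the sketch below filters exons by delta PSI and significance using the default values of --dpsi_cut and --sigscore_cut. It assumes the delta PSI cutoff is inclusive and the significance cutoff is strict; the source does not state which comparison Splitpea uses, and --include_nas is not modeled here. The name passes_cutoffs and the exon values are made up:

```python
def passes_cutoffs(delta_psi, pval, dpsi_cut=0.05, sigscore_cut=0.05):
    """Keep an exon when |delta PSI| and the p-value both clear their cutoffs.

    Assumption: inclusive comparison for delta PSI, strict for the p-value.
    """
    return abs(delta_psi) >= dpsi_cut and pval < sigscore_cut


# (exon name, delta PSI, p-value); made-up values for illustration.
exons = [
    ("exonA", -0.50, 0.01),   # large change, significant
    ("exonB", 0.02, 0.001),   # significant but change too small
    ("exonC", 0.30, 0.20),    # large change but not significant
]

kept = [name for name, dpsi, pval in exons if passes_cutoffs(dpsi, pval)]
print(kept)  # → ['exonA']
```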

Reference files

Splitpea ships bundled reference datasets for human and mouse. These load automatically based on --species unless you override paths.

  • --species {human,mouse} — Which bundled references to use (default: human)
  • ppif — protein–protein interactions
  • ddif — domain–domain interactions
  • entrezpfamf — Entrez–Pfam mapping
  • pfamcoordsf — Pfam genome coordinates
  • tbf — Tabix index (usually the .tbi for pfamcoordsf)
  • map_path — Gene ID mapping: symbol/entrez/ensembl/uniprot

To override default bundled references, pass custom paths to the corresponding parameters.


Outputs

For each run (out_file_prefix), Splitpea produces:

  • <prefix>.edges.dat — edge list with node1 node2 weight chaos
  • <prefix>.edges.pickle — NetworkX graph

Python API: splitpea.run(...) returns the rewired networkx.Graph.
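
The edge list can also be inspected without NetworkX. The sketch below tallies gain, loss, and chaos edges from text laid out as node1 node2 weight chaos. It assumes a header line, whitespace-delimited columns, and a chaos column written as True or False; check these against an actual .edges.dat file. The sign convention (weight > 0 is a gain, otherwise a loss) mirrors the get_consensus_network description, and the function name summarize_edges is hypothetical:

```python
import io
from collections import Counter


def summarize_edges(handle):
    """Count gain, loss, and chaos edges in an edge list."""
    counts = Counter()
    next(handle)  # skip the header line (assumed present)
    for line in handle:
        if not line.strip():
            continue
        node1, node2, weight, chaos = line.split()
        if chaos == "True":          # assumed encoding of the chaos flag
            counts["chaos"] += 1
        elif float(weight) > 0:
            counts["gain"] += 1
        else:
            counts["loss"] += 1
    return counts


# Made-up edge list for illustration.
example = io.StringIO(
    "node1\tnode2\tweight\tchaos\n"
    "1\t2\t0.8\tFalse\n"
    "1\t3\t-0.4\tFalse\n"
    "2\t3\t0.0\tTrue\n"
)
print(dict(summarize_edges(example)))
```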


Other Subcommands

plot

Load a saved network (.edges.pickle) and export it in various formats or as a plot.

Example (Python):

splitpea.plot(
    pickle_path="out/sample1.edges.pickle",
    pdf_path="plot/sample1_plot.pdf",         # optional
    gephi_path="plot/sample1_gephi.csv",      # optional
    cytoscape_path="plot/sample1_cyto.gml",   # optional
    with_labels=True,
    symbol=True,
    map_path=None,        # use bundled mapping if None
    species="human",
    self_edges=False,
    lcc=True
)

Parameters:

  • pickle_path (required) — Path to .edges.pickle file generated by splitpea run.
  • --with_labels — Draw node labels in plots. Default: False.
  • --pdf_path — Path to save a PDF of the plotted network via Matplotlib. Omit to skip PDF.
  • --gephi_path — Path to save a Gephi-compatible TSV file. Omit to skip.
  • --cytoscape_path — Path to save a Cytoscape-compatible .gml file. Omit to skip.
  • --symbol — If True (default), replace Entrez IDs with gene symbols.
  • --map_path — Path to custom gene ID mapping file. Defaults to bundled mapping if not provided.
  • --species — Species for mapping defaults (human or mouse).
  • --self_edges — If True, keep self-loop edges in output. Default: False.
  • --lcc — If True (default), plot only the largest connected component.
  • --max_nodes — Maximum nodes allowed in matplotlib plotting; larger graphs skip matplotlib. Default: 2000.
  • --max_edges — Maximum edges allowed in matplotlib plotting; larger graphs skip matplotlib. Default: 10000.

Outputs (depending on args):

  • PDF plot, Gephi TSV, and/or Cytoscape GML.
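
To illustrate what the lcc and self_edges options mean, the sketch below drops self-loops and finds the largest connected component of a small edge list using only the standard library. Splitpea itself operates on NetworkX graphs, so this shows the concept rather than the package's implementation; largest_component is a hypothetical name:

```python
from collections import defaultdict, deque


def largest_component(edges, keep_self_edges=False):
    """Return the node set of the largest connected component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v and not keep_self_edges:
            continue  # drop self-loops unless asked to keep them
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node] - component:
                component.add(neighbor)
                queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


edges = [("A", "B"), ("B", "C"), ("D", "E"), ("F", "F")]
print(sorted(largest_component(edges)))  # → ['A', 'B', 'C']
```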

stats

Summarize edge counts and write per-gene statistics.

Example (Python):

splitpea.stats(
    dat_file="out/sample1.edges.dat",
    rewire_net="out/sample1.edges.pickle",
    out_file_prefix="stats/sample1",
    species="human",
    map_path=None,      # use bundled if None
    ppif=None, ddif=None, entrezpfamf=None  # override to use custom references
)

Parameters:

  • dat_file (required) — Path to .edges.dat file generated by splitpea.run.
  • rewire_net (required) — Path to .edges.pickle file for the same network.
  • out_file_prefix (required) — Prefix for output files (e.g., stats/sample1).
  • ppif — Path to PPI reference file. Defaults to bundled species reference.
  • ddif — Path to DDI reference file. Defaults to bundled species reference.
  • entrezpfamf — Path to Entrez–Pfam mapping file. Defaults to bundled species reference.
  • map_path — Path to gene ID mapping file. Defaults to bundled mapping if not provided.
  • species — Species for selecting default references (human or mouse).

Outputs:

  • Console printout of:
    • Gain — number of edges gained in rewired network
    • Loss — number of edges lost in rewired network
    • Chaos — number of chaos edges
  • <out_file_prefix>_gene_stats.csv containing:
    • Entrez ID
    • Gene symbol
    • Node degree in rewired network
    • Normalized degree relative to background PPI network
    • Counts of gain/loss/chaos edges incident to the gene
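
The per-gene columns can be pictured with a small stand-alone sketch. It assumes that normalized degree is the rewired degree divided by the gene's degree in the background PPI network, which the description above implies but does not spell out. The function name gene_stats and all values are made up:

```python
from collections import Counter, defaultdict


def gene_stats(rewired_edges, background_degree):
    """Summarize per-gene degree and gain/loss/chaos counts.

    rewired_edges: iterable of (gene1, gene2, weight, is_chaos)
    background_degree: dict mapping gene -> degree in the background PPI
    """
    tallies = defaultdict(Counter)
    for g1, g2, weight, is_chaos in rewired_edges:
        kind = "chaos" if is_chaos else ("gain" if weight > 0 else "loss")
        for gene in (g1, g2):
            tallies[gene]["degree"] += 1
            tallies[gene][kind] += 1

    rows = {}
    for gene, c in tallies.items():
        bg = background_degree.get(gene, 0)
        rows[gene] = {
            "degree": c["degree"],
            # Assumed normalization: rewired degree / background degree.
            "norm_degree": c["degree"] / bg if bg else None,
            "gain": c["gain"],
            "loss": c["loss"],
            "chaos": c["chaos"],
        }
    return rows


edges = [("1", "2", 0.8, False), ("1", "3", -0.4, False), ("2", "3", 0.0, True)]
stats = gene_stats(edges, {"1": 4, "2": 2, "3": 5})
print(stats["1"])
```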

preprocess_pooled

Helper function that builds sample-specific Splitpea inputs by comparing each target sample to a pooled normal background.
The background is either a normal splicing matrix downloaded from IRIS (GTEx) for a chosen tissue or a normal matrix that you supply. For rMATS input, you can provide either a single .txt output file (SE.MATS.JC.txt or SE.MATS.JCEC.txt) for one sample or a folder containing one such file per sample; the function processes either form automatically. When using a folder, it may be helpful to rename each file to match its sample name.

What the function does:

  1. Load a target splicing matrix (compare_path) with exon rows and sample columns (PSI values).
  2. Obtain a normal/background splicing matrix:
    • Either download GTEx <Tissue> from IRIS (if background is given), or
    • Use your local file (background_path).
  3. Compute mean PSI per exon for the background.
  4. For each target sample, compute delta PSI vs. background and a p-value.
  5. Write per-sample Splitpea-ready files ({sample}-psi.txt) into out_psi_dir.
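
Steps 3 and 4 can be sketched in a few lines. The example below averages background PSI per exon and subtracts that mean from each target sample's PSI. It assumes delta PSI is sample minus background and skips missing values; the p-value is omitted because the source does not specify the test used. The name delta_psi_table and the numbers are illustrative only:

```python
import math
from statistics import mean


def delta_psi_table(background, target):
    """Compute per-sample delta PSI against the background mean.

    background, target: dict mapping exon -> {sample: psi}
    Returns: dict mapping sample -> {exon: (bg_mean, psi, delta)}
    """
    result = {}
    for exon, sample_psi in target.items():
        bg_values = [v for v in background.get(exon, {}).values()
                     if not math.isnan(v)]
        if not bg_values:
            continue  # no usable background for this exon
        bg_mean = mean(bg_values)
        for sample, psi in sample_psi.items():
            if math.isnan(psi):
                continue
            # Assumed sign convention: sample minus background.
            result.setdefault(sample, {})[exon] = (bg_mean, psi, psi - bg_mean)
    return result


background = {"exon1": {"normal1": 0.5, "normal2": 1.0}}
target = {"exon1": {"sample1": 0.25, "sample2": float("nan")}}
table = delta_psi_table(background, target)
print(table["sample1"]["exon1"])  # → (0.75, 0.25, -0.5)
```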

Example (Python):

# Option A: download and use IRIS/GTEx matrix
splitpea.preprocess_pooled(
    compare_path="compare.txt",
    background="Brain",
    background_download_root="/data/iris_cache",   # creates GTEx_<Tissue>/splicing_matrix/ here
    out_psi_dir="out_psi"
)

# Option B: use a local normal matrix
splitpea.preprocess_pooled(
    compare_path="compare.txt",
    background_path="/path/to/GTEx_Brain/splicing_matrix.txt",
    out_psi_dir="out_psi"
)

Inputs:

  • compare_path (required) — Target (case) splicing matrix to compare against the pooled normal.
    • Can be:
      • a .txt file from rMATS (SE.MATS.JC.txt or SE.MATS.JCEC.txt format)
      • a directory of rMATS files (SE.MATS.JC.txt or SE.MATS.JCEC.txt format), with each file renamed to match its sample, or
      • a .txt file in tab-delimited matrix format (rows = exons, columns = samples, values = PSI [0–1]).
        • Expected header example:
          GeneID  geneSymbol  chr  strand  exonStart  exonEnd  upstreamEE  downstreamES  sample1  sample2  ...
          
  • Normal/background matrix — Provide one of:
    • background — IRIS/GTEx tissue name to auto-download the background matrix (e.g., Brain, AdiposeTissue, …).
      Optional: background_download_root to control where the GTEx_<Tissue>/splicing_matrix/ folder is created.
    • background_path — Path to pre-downloaded normal splicing data. Same accepted formats/data files as compare_path.

You must supply either background or background_path.

Parameters:

  • compare_path (str, required) — Path to the target splicing data to be compared.
  • out_psi_dir (str) — Output directory for per-sample Splitpea files ({sample}-psi.txt). Created if missing. If not given, an out_psi folder is created in the current working directory.
  • background (str) — IRIS/GTEx tissue name to auto-download the normal data.
    Must be one of: AdiposeTissue, AdrenalGland, Bladder, Blood, BloodVessel, Brain, Breast, CervixUteri, Colon, Esophagus, FallopianTube, Heart, Kidney, Liver, Lung, Muscle, Nerve, Ovary, Pancreas, Pituitary, Prostate, SalivaryGland, Skin, SmallIntestine, Spleen, Stomach, Testis, Thyroid, Uterus, Vagina.
  • background_download_root (str) — Root directory for the IRIS download cache (creates GTEx_<Tissue>/splicing_matrix/ under this path). If not given, defaults to current working directory.
  • background_path (str) — Path to an existing normal/background splicing data file; bypasses download.
  • map_path (str) — Gene ID mapping: symbol/entrez/ensembl/uniprot. If none is given, uses bundled mapping.
  • species (str) — Species identifier (e.g. human, mouse); determines which default map path to use.
  • single_rMATS_compare (bool, optional) — If True, treat compare_path as a single rMATS file instead of a directory.
  • single_rMATS_background (bool, optional) — If True, treat background_path as a single rMATS file instead of a directory.
  • inclevel (int, optional) — Which inclusion-level field to use when parsing rMATS (1 or 2). Defaults to 1.

Outputs (written to out_psi_dir):

  • {sample}-psi.txt files (one per target sample), each a Splitpea (sample-specific) format table.

get_consensus_network

Build consensus loss and gain summary networks from a directory of sample networks (*.pickle from splitpea.run). Chaos edges are excluded; edge weights and counts are accumulated across samples.

Example (Python):

cons_neg, cons_pos = splitpea.get_consensus_network(
    net_dir="output"
)

Parameters:

  • net_dir (str, required) — Directory containing sample .pickle graphs.

Outputs:

  • (cons_neg, cons_pos) — two NetworkX graphs:
    • cons_neg: edges with weight <= 0, attributes weight (sum) and num_neg (count).
    • cons_pos: edges with weight > 0, attributes weight (sum) and num_pos (count).
    • Both graphs have graph['num_graphs'] = number of sample graphs combined.
  • Files written:
    • consensus_network_neg.pickle
    • consensus_network_pos.pickle
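
The accumulation described above can be illustrated with plain dictionaries. The sketch below skips chaos edges, sums weights, and counts occurrences separately for negative (weight <= 0) and positive (weight > 0) edges. It is a simplified stand-in for the NetworkX-based function; the name consensus and the sample data are made up:

```python
def consensus(sample_graphs):
    """Accumulate loss and gain edges across samples.

    sample_graphs: list of edge lists [(u, v, weight, is_chaos), ...]
    Returns (neg, pos, num_graphs); neg and pos map an undirected edge
    to its summed weight and the number of samples contributing to it.
    """
    neg, pos = {}, {}
    for edges in sample_graphs:
        for u, v, weight, is_chaos in edges:
            if is_chaos:
                continue  # chaos edges are excluded
            key = tuple(sorted((u, v)))  # treat edges as undirected
            if weight > 0:
                store, count_attr = pos, "num_pos"
            else:
                store, count_attr = neg, "num_neg"
            entry = store.setdefault(key, {"weight": 0.0, count_attr: 0})
            entry["weight"] += weight
            entry[count_attr] += 1
    return neg, pos, len(sample_graphs)


samples = [
    [("A", "B", -0.5, False), ("B", "C", 0.25, False), ("A", "C", 0.0, True)],
    [("B", "A", -0.25, False), ("B", "C", 0.5, False)],
]
neg, pos, num_graphs = consensus(samples)
print(neg)
print(pos)
```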

Citation

If you use Splitpea, please cite:

@inproceedings{dannenfelser2023splitpea,
  title={Splitpea: quantifying protein interaction network rewiring changes due to alternative splicing in cancer},
  author={Dannenfelser, Ruth and Yao, Vicky},
  booktitle={Pacific Symposium on Biocomputing 2024},
  pages={579--593},
  year={2023},
  organization={World Scientific}
}
