Splitpea: method for calculating network rewiring changes due to splicing

Splitpea

This is the Python package implementation of Splitpea: SPLicing InTeractions PErsonAlized.

Original repository: https://github.com/ylaboratory/splitpea
Repository for this package's code: https://github.com/ylaboratory/splitpea_package
This pip package is an extended version with easier installation and additional functions.

Splitpea quantifies rewiring in protein-protein interaction (PPI) networks driven by alternative splicing events. It integrates differential exon usage (PSI values) with domain-domain interactions (DDIs) and PPIs to generate condition-specific or sample-specific networks.

Installation

Install from PyPI:

pip install splitpea

Some functions require tabix. On Debian or Ubuntu, install it with:

sudo apt-get install tabix

Alternatively, if you are using conda, you can install tabix via:

conda install -c bioconda tabix

Functions that depend on tabix will raise an error if tabix is not found.
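
Because tabix is an external binary rather than a Python dependency, it can be worth checking for it before starting a long run. The sketch below uses only the standard library; `tabix_available` is a hypothetical helper name and is not part of Splitpea.

```python
import shutil


def tabix_available(executable: str = "tabix") -> bool:
    """Return True if the given executable is found on the current PATH."""
    return shutil.which(executable) is not None


if not tabix_available():
    print("tabix not found on PATH; install it before calling tabix-dependent functions")
```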

Requirements

Python >= 3.8
Packages:
- numpy, networkx, intervaltree, ipykernel, matplotlib, plotnine, scikit-learn, adjustText, importlib_resources, requests

Features

Adjustable PSI and significance thresholds
Bundled reference files, with options for customization
Species-specific references: human (default) or mouse
Main Outputs:
- .edges.dat: Edge list with weights and a flag marking whether each edge is a chaos edge
- .edges.pickle: NetworkX graph object
Additional features and outputs:
- Edge rewiring summaries
- Gene level statistics
- Auto-generated network plots
- Gephi-compatible TSV files
- Cytoscape-compatible GML files

Two modes

Sample-specific mode (differential_format=sample_specific)
- Input: a differential exon table for a single sample.
- Tab-delimited header:
```
ensembl.id  symbol  chr  strand  exon.start  exon.end  psi.background  psi.sample  delta.psi  pval
```
- Can be generated via preprocess_pooled (see below) from two splicing matrices: one for background samples and another for samples to compare individually against it. Supports SUPPA2 (.psi) and rMATS formats (SE.MATS.JC.txt or SE.MATS.JCEC.txt).
Condition-specific mode (differential_format=suppa2 or rmats)
- SUPPA2: provide both .psivec and .dpsi (order agnostic)
- rMATS: use SE.MATS.JC.txt or SE.MATS.JCEC.txt

Currently, only skipped exon (SE) events are supported.
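
To make the sample-specific table layout concrete, here is a minimal sketch that parses a file with the header shown above using only the standard library. The helper names (`read_exon_table`, `_num`), the example rows, and the treatment of `NA` as a missing value are assumptions for illustration; Splitpea's own parser may behave differently.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "ensembl.id", "symbol", "chr", "strand", "exon.start", "exon.end",
    "psi.background", "psi.sample", "delta.psi", "pval",
]


def _num(text):
    """Convert a field to float; 'NA' or empty is assumed to mean missing (NaN)."""
    return float("nan") if text in ("NA", "") else float(text)


def read_exon_table(handle):
    """Parse a tab-delimited sample-specific table into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # Coordinates are integers; PSI-related fields are floats.
        row["exon.start"] = int(row["exon.start"])
        row["exon.end"] = int(row["exon.end"])
        for key in ("psi.background", "psi.sample", "delta.psi", "pval"):
            row[key] = _num(row[key])
        rows.append(row)
    return rows


# Made-up example rows, for illustration only.
example = (
    "\t".join(EXPECTED_COLUMNS) + "\n"
    "ENSG00000000001\tGENEA\tchr1\t+\t100\t200\t0.80\t0.30\t-0.50\t0.001\n"
    "ENSG00000000002\tGENEB\tchr2\t-\t500\t650\t0.40\t0.45\t0.05\tNA\n"
)
rows = read_exon_table(io.StringIO(example))
print(len(rows), rows[0]["symbol"], rows[0]["delta.psi"])
```

For a real file, pass an open file handle (for example `open("sample1-psi.txt", newline="")`) instead of the `io.StringIO` object.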

Quick start

Splitpea can be run from the command line or within Python.

Main subcommands:

run — Build a rewired network
plot — Visualize a saved network
stats — Compute edge/gene statistics
preprocess_pooled — Generate the differential exon table for sample-specific mode
get_consensus_network — Generate summary consensus networks from a directory of rewired networks

Command line

Examples:

# Sample-specific mode
splitpea run sample1-psi.txt out/sample1

# Condition-specific mode (SUPPA2)
splitpea run diffSplice.psivec diffSplice.dpsi out/condA \
  --differential_format suppa2

# Condition-specific mode (rMATS)
splitpea run SE.MATS.JCEC.txt out/condB \
  --differential_format rmats

Note: The second argument (out/sample1, out/condA, etc.) is the output file prefix.

Python API

Examples:

import splitpea

# Sample-specific mode
G = splitpea.run("sample1-psi.txt", "out/sample1")

# Condition-specific mode (SUPPA2)
G = splitpea.run(["diffSplice.psivec", "diffSplice.dpsi"], "out/condA",
                 differential_format="suppa2")

# Condition-specific mode (rMATS)
G = splitpea.run("SE.MATS.JCEC.txt", "out/condB",
                 differential_format="rmats")
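
When many sample-specific tables are available (for example, files named `{sample}-psi.txt`), the output prefixes can be derived from the file names. The following is a sketch under stated assumptions: `plan_runs` and `run_all` are hypothetical helpers, and the only Splitpea call used is `splitpea.run(in_file, out_file_prefix)` as in the examples above. Creating the output directory beforehand is a precaution, not a documented requirement.

```python
import tempfile
from pathlib import Path


def plan_runs(psi_dir, out_dir, suffix="-psi.txt"):
    """Pair each '<sample>-psi.txt' file with an output prefix '<out_dir>/<sample>'."""
    jobs = []
    for path in sorted(Path(psi_dir).glob(f"*{suffix}")):
        sample = path.name[: -len(suffix)]
        jobs.append((str(path), str(Path(out_dir) / sample)))
    return jobs


def run_all(psi_dir, out_dir):
    """Run Splitpea in sample-specific mode on every planned job."""
    import splitpea  # imported lazily so plan_runs works without the package

    Path(out_dir).mkdir(parents=True, exist_ok=True)  # precaution only
    return {prefix: splitpea.run(in_file, prefix)
            for in_file, prefix in plan_runs(psi_dir, out_dir)}


# Demonstrate the planning step on empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample1-psi.txt", "sample2-psi.txt", "notes.txt"):
        (Path(tmp) / name).touch()
    jobs = plan_runs(tmp, "out")
print([prefix for _, prefix in jobs])
```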

`run`

Required:

in_file —
- sample_specific: one file path
- suppa2: two file paths (.psivec and .dpsi, typically output by diffSplice)
- rmats: one JC or JCEC file path

Options:

out_file_prefix — Prefix for all output files (directory + base name); if no prefix is given, the output prefix defaults to out_rewired_network
--differential_format {sample_specific,suppa2,rmats} (default: sample_specific)
--skip (int) — Number of lines to skip in input file (default: 1)
--dpsi_cut (float) — Delta PSI cutoff (default: 0.05)
--sigscore_cut (float) — Significance score cutoff (default: 0.05)
--include_nas (bool) — Include NAs in significance testing (default: True)
--index {0,1} — Coordinate indexing (default: auto-set per format)
--map_path — Custom mapping file between IDs and symbols (tab-delimited with headers: symbol entrez ensembl uniprot)
--edge_stats_file — Append gain/loss/chaos counts to a stats file
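
The sketch below shows one plausible reading of how `--dpsi_cut`, `--sigscore_cut`, and `--include_nas` could gate exons. The exact rule is an assumption: here an exon passes when its absolute delta PSI is at least `dpsi_cut` and its p-value is at most `sigscore_cut`, and a missing p-value is accepted only when `include_nas` is true. Splitpea's actual comparison operators and NA handling are not stated above and may differ; `passes_cutoffs` is a hypothetical name.

```python
import math


def passes_cutoffs(delta_psi, pval, dpsi_cut=0.05, sigscore_cut=0.05, include_nas=True):
    """Assumed rule: keep an exon when |delta PSI| and the p-value pass their cutoffs."""
    if math.isnan(delta_psi) or abs(delta_psi) < dpsi_cut:
        return False
    if math.isnan(pval):
        # Assumption: a missing p-value is accepted only when include_nas is True.
        return include_nas
    return pval <= sigscore_cut


# Made-up exons: (name, delta PSI, p-value).
exons = [
    ("exonA", -0.50, 0.001),
    ("exonB", 0.02, 0.001),         # change too small
    ("exonC", 0.30, 0.200),         # not significant
    ("exonD", 0.30, float("nan")),  # missing p-value
]
kept = [name for name, dpsi, p in exons if passes_cutoffs(dpsi, p)]
kept_strict = [name for name, dpsi, p in exons
               if passes_cutoffs(dpsi, p, include_nas=False)]
print(kept, kept_strict)
```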

Reference files

Splitpea ships bundled reference datasets for human and mouse. These load automatically based on --species unless you override the paths.

--species {human,mouse} — Which bundled references to use (default: human)
ppif — protein–protein interactions
ddif — domain–domain interactions
entrezpfamf — Entrez–Pfam mapping
pfamcoordsf — Pfam genome coordinates
tbf — Tabix index (usually the .tbi for pfamcoordsf)
map_path — Gene ID mapping: symbol/entrez/ensembl/uniprot

To override default bundled references, pass custom paths to the corresponding parameters.

Outputs

For each run (out_file_prefix), Splitpea produces:

<prefix>.edges.dat — edge list with node1 node2 weight chaos
<prefix>.edges.pickle — NetworkX graph

Python API: splitpea.run(...) returns the rewired networkx.Graph.
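
As a rough illustration of working with the edge list, the sketch below tallies gain, loss, and chaos edges from a file with the columns `node1 node2 weight chaos`. Several details are assumptions: that the file has one header line, that fields are whitespace-separated, that the chaos flag is written as `True`/`False` (or `1`/`0`), and that positive weights mean gain while non-positive weights mean loss. `summarize_edges` is a hypothetical helper, and the rows are made up.

```python
import io


def summarize_edges(handle):
    """Count gain, loss, and chaos edges in an edge list with a header row."""
    counts = {"gain": 0, "loss": 0, "chaos": 0}
    next(handle)  # skip the header line (assumed: node1 node2 weight chaos)
    for line in handle:
        if not line.strip():
            continue
        node1, node2, weight, chaos = line.split()
        if chaos.lower() in ("true", "1"):
            counts["chaos"] += 1
        elif float(weight) > 0:
            counts["gain"] += 1
        else:
            counts["loss"] += 1
    return counts


# Made-up rows, for illustration only.
example = (
    "node1\tnode2\tweight\tchaos\n"
    "1000\t2000\t-1.0\tFalse\n"
    "1000\t3000\t0.5\tFalse\n"
    "2000\t3000\t0.0\tTrue\n"
)
counts = summarize_edges(io.StringIO(example))
print(counts)
```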

Other Subcommands

`plot`

Load a saved network (.edges.pickle) and export it in various formats or as a plot.

Example (Python):

splitpea.plot(
    pickle_path="out/sample1.edges.pickle",
    pdf_path="plot/sample1_plot.pdf",         # optional
    gephi_path="plot/sample1_gephi.csv",      # optional
    cytoscape_path="plot/sample1_cyto.gml",   # optional
    with_labels=True,
    symbol=True,
    map_path=None,        # use bundled mapping if None
    species="human",
    self_edges=False,
    lcc=True
)

Parameters:

pickle_path (required) — Path to .edges.pickle file generated by splitpea run.
--with_labels — Draw node labels in plots. Default: False.
--pdf_path — Path to save a PDF of the plotted network via Matplotlib. Omit to skip PDF.
--gephi_path — Path to save a Gephi-compatible TSV file. Omit to skip.
--cytoscape_path — Path to save a Cytoscape-compatible .gml file. Omit to skip.
--symbol — If True (default), replace Entrez IDs with gene symbols.
--map_path — Path to custom gene ID mapping file. Defaults to bundled mapping if not provided.
--species — Species for mapping defaults (human or mouse).
--self_edges — If True, keep self-loop edges in output. Default: False.
--lcc — If True (default), plot only the largest connected component.
--max_nodes — Maximum nodes allowed in matplotlib plotting; larger graphs skip matplotlib. Default: 2000.
--max_edges — Maximum edges allowed in matplotlib plotting; larger graphs skip matplotlib. Default: 10000.

Outputs (depending on args):

PDF plot, Gephi TSV, and/or Cytoscape GML.
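
To illustrate what `self_edges`, `lcc`, `max_nodes`, and `max_edges` control, here is a pure-Python sketch of the same ideas on a plain edge list. It is not Splitpea's implementation; `largest_component` and `should_draw` are hypothetical names, and treating the limits as inclusive is an assumption.

```python
from collections import deque


def largest_component(edges):
    """Return the node set of the largest connected component of an undirected edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node] - component:
                component.add(neighbor)
                queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def should_draw(n_nodes, n_edges, max_nodes=2000, max_edges=10000):
    """Skip matplotlib drawing when the graph exceeds either limit."""
    return n_nodes <= max_nodes and n_edges <= max_edges


edges = [("A", "B"), ("B", "C"), ("D", "E"), ("A", "A")]
edges = [(a, b) for a, b in edges if a != b]  # drop self-loops (self_edges=False)
lcc_nodes = largest_component(edges)
lcc_edges = [(a, b) for a, b in edges if a in lcc_nodes and b in lcc_nodes]
print(sorted(lcc_nodes), should_draw(len(lcc_nodes), len(lcc_edges)))
```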

`stats`

Summarize edge counts and write per-gene statistics.

Example (Python):

splitpea.stats(
    dat_file="out/sample1.edges.dat",
    rewire_net="out/sample1.edges.pickle",
    out_file_prefix="stats/sample1",
    species="human",
    map_path=None,      # use bundled if None
    ppif=None, ddif=None, entrezpfamf=None  # override to use custom references
)

Parameters:

dat_file (required) — Path to .edges.dat file generated by splitpea.run.
rewire_net (required) — Path to .edges.pickle file for the same network.
out_file_prefix (required) — Prefix for output files (e.g., stats/sample1).
ppif — Path to PPI reference file. Defaults to bundled species reference.
ddif — Path to DDI reference file. Defaults to bundled species reference.
entrezpfamf — Path to Entrez–Pfam mapping file. Defaults to bundled species reference.
map_path — Path to gene ID mapping file. Defaults to bundled mapping if not provided.
species — Species for selecting default references (human or mouse).

Outputs:

Console printout of:
- Gain — number of edges gained in rewired network
- Loss — number of edges lost in rewired network
- Chaos — number of chaos edges
<out_file_prefix>_gene_stats.csv containing:
- Entrez ID
- Gene symbol
- Node degree in rewired network
- Normalized degree relative to background PPI network
- Counts of gain/loss/chaos edges incident to the gene
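
The per-gene degree columns can be pictured with a small sketch. The normalization formula is an assumption (rewired degree divided by the gene's degree in the background PPI network); the definition Splitpea uses is not given above. `degree` and `normalized_degree` are hypothetical helpers, and the edges are made up.

```python
from collections import Counter


def degree(edges):
    """Count how many edges touch each node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def normalized_degree(rewired_edges, background_edges):
    """Assumed definition: rewired degree divided by background PPI degree."""
    rewired, background = degree(rewired_edges), degree(background_edges)
    return {gene: rewired[gene] / background[gene]
            for gene in rewired if background[gene] > 0}


# Made-up Entrez-style IDs, for illustration only.
background_ppi = [("1", "2"), ("1", "3"), ("1", "4"), ("2", "3")]
rewired = [("1", "2"), ("1", "3")]
norm = normalized_degree(rewired, background_ppi)
print(norm)
```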

`preprocess_pooled`

Helper function that builds sample-specific Splitpea inputs by comparing each target sample to a pooled normal background.
The background is either a normal splicing matrix downloaded from IRIS (GTEx) for a chosen tissue or a normal matrix that you provide. Inputs can be a single rMATS output file (SE.MATS.JC.txt or SE.MATS.JCEC.txt) for one sample, or a folder of rMATS output files covering multiple samples; the function processes either form. For rMATS folders, it may help to rename each file to match its sample name.

What the function does:

Load a target splicing matrix (compare_path) with exon rows and sample columns (PSI values).
Obtain a normal/background splicing matrix:
- Either download GTEx <Tissue> from IRIS (if background is given), or
- Use your local file (background_path).
Compute mean PSI per exon for the background.
For each target sample, compute delta PSI vs. background and a p-value.
Write per-sample Splitpea-ready files ({sample}-psi.txt) into out_psi_dir.
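
The background-mean and delta PSI steps above can be sketched as follows. This assumes delta PSI is the sample PSI minus the background mean, and it leaves out the p-value because the statistical test is not specified here. `delta_psi_per_sample` is a hypothetical helper, and the PSI values are made up.

```python
from statistics import mean


def delta_psi_per_sample(background, target):
    """Compare each target sample to the pooled background.

    background, target: {exon_id: {sample_name: psi}}
    returns: {sample_name: {exon_id: (background_mean, psi, delta_psi)}}
    """
    result = {}
    for exon, sample_psi in target.items():
        bg_values = list(background.get(exon, {}).values())
        if not bg_values:
            continue  # exon absent from the background: no delta can be computed
        bg_mean = mean(bg_values)
        for sample, psi in sample_psi.items():
            # Assumption: delta PSI = sample PSI - mean background PSI.
            result.setdefault(sample, {})[exon] = (bg_mean, psi, psi - bg_mean)
    return result


# Made-up PSI values, for illustration only.
background = {"exon1": {"n1": 0.8, "n2": 0.6}, "exon2": {"n1": 0.5, "n2": 0.5}}
target = {"exon1": {"t1": 0.2}, "exon2": {"t1": 0.75}, "exon3": {"t1": 0.9}}
per_sample = delta_psi_per_sample(background, target)
print(per_sample["t1"])
```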

Example (Python):

# Option A: download and use IRIS/GTEx matrix
splitpea.preprocess_pooled(
    compare_path="compare.txt",
    background="Brain",
    background_download_root="/data/iris_cache",   # creates GTEx_<Tissue>/splicing_matrix/ here
    out_psi_dir="out_psi"
)

# Option B: use a local normal matrix
splitpea.preprocess_pooled(
    compare_path="compare.txt",
    background_path="/path/to/GTEx_Brain/splicing_matrix.txt",
    out_psi_dir="out_psi"
)

Inputs:

compare_path (required) — Target (case) splicing matrix to compare against the pooled normal.
- Can be:
  - a .txt file from rMATS (SE.MATS.JC.txt or SE.MATS.JCEC.txt format)
  - a directory of rMATS files (SE.MATS.JC.txt or SE.MATS.JCEC.txt format), with each file renamed to match its sample, or
  - a .txt file in tab-delimited matrix format (rows = exons, columns = samples, values = PSI [0–1]).
    - Expected header example:
```
GeneID  geneSymbol  chr  strand  exonStart  exonEnd  upstreamEE  downstreamES  sample1  sample2  ...
```
Normal/background matrix — Provide one of:
- background — IRIS/GTEx tissue name to auto-download the background matrix (e.g., Brain, AdiposeTissue, …).
  Optional: background_download_root to control where the GTEx_<Tissue>/splicing_matrix/ folder is created.
- background_path — Path to pre-downloaded normal splicing data. Same accepted formats/data files as compare_path.

You must supply either background or background_path.

Parameters:

compare_path (str, required) — Path to the target splicing data to be compared.
out_psi_dir (str) — Output directory for per-sample Splitpea files ({sample}-psi.txt). Created if missing. If omitted, an out_psi folder is created in the current working directory.
background (str) — IRIS/GTEx tissue name to auto-download the normal data.
Must be one of: AdiposeTissue, AdrenalGland, Bladder, Blood, BloodVessel, Brain, Breast, CervixUteri, Colon, Esophagus, FallopianTube, Heart, Kidney, Liver, Lung, Muscle, Nerve, Ovary, Pancreas, Pituitary, Prostate, SalivaryGland, Skin, SmallIntestine, Spleen, Stomach, Testis, Thyroid, Uterus, Vagina.
background_download_root (str) — Root directory for the IRIS download cache (creates GTEx_<Tissue>/splicing_matrix/ under this path). If not given, defaults to the current working directory.
background_path (str) — Path to an existing normal/background splicing data file; bypasses download.
map_path (str) — Gene ID mapping: symbol/entrez/ensembl/uniprot. If none is given, uses bundled mapping.
species (str) — Species identifier (e.g. human, mouse); determines which default map path to use.
single_rMATS_compare (bool, optional) — If True, treat compare_path as a single rMATS file instead of a directory.
single_rMATS_background (bool, optional) — If True, treat background_path as a single rMATS file instead of a directory.
inclevel (int, optional) — Which inclusion-level field to use when parsing rMATS (1 or 2). Defaults to 1.
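
Before calling `preprocess_pooled`, the background arguments can be checked up front. The sketch below is a hypothetical pre-flight check, not part of Splitpea: the tissue names are copied from the list above, and giving `background_path` priority when both arguments are set is an assumption based on the note that it bypasses the download.

```python
IRIS_TISSUES = {
    "AdiposeTissue", "AdrenalGland", "Bladder", "Blood", "BloodVessel", "Brain",
    "Breast", "CervixUteri", "Colon", "Esophagus", "FallopianTube", "Heart",
    "Kidney", "Liver", "Lung", "Muscle", "Nerve", "Ovary", "Pancreas",
    "Pituitary", "Prostate", "SalivaryGland", "Skin", "SmallIntestine",
    "Spleen", "Stomach", "Testis", "Thyroid", "Uterus", "Vagina",
}


def check_background_args(background=None, background_path=None):
    """Return 'local' or 'download' depending on which background source applies."""
    if background_path is not None:
        return "local"  # assumed priority: a local file bypasses the download
    if background is None:
        raise ValueError("supply either background or background_path")
    if background not in IRIS_TISSUES:
        raise ValueError(f"unknown IRIS/GTEx tissue: {background!r}")
    return "download"


print(check_background_args(background="Brain"))
print(check_background_args(background_path="/path/to/GTEx_Brain/splicing_matrix.txt"))
```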

Outputs (written to out_psi_dir):

{sample}-psi.txt files (one per target sample), each a Splitpea (sample-specific) format table.

`get_consensus_network`

Build consensus loss and gain summary networks from a directory of sample networks (*.pickle from splitpea.run). Chaos edges are excluded; edge weights and counts are accumulated across samples.

Example (Python):

cons_neg, cons_pos = splitpea.get_consensus_network(
    net_dir="output"
)

Parameters:

net_dir (str, required) — Directory containing sample .pickle graphs.

Outputs:

(cons_neg, cons_pos) — two NetworkX graphs:
- cons_neg: edges with weight <= 0, attributes weight (sum) and num_neg (count).
- cons_pos: edges with weight > 0, attributes weight (sum) and num_pos (count).
- Both graphs have graph['num_graphs'] = number of sample graphs combined.
Files written:
- consensus_network_neg.pickle
- consensus_network_pos.pickle
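
The accumulation described above can be sketched with plain dictionaries. This is an illustration rather than Splitpea's code: `consensus` is a hypothetical helper, each sample is represented as a list of `(node1, node2, weight, is_chaos)` tuples instead of a NetworkX graph, and edges are treated as undirected.

```python
def consensus(sample_edge_lists):
    """Accumulate per-edge weights and counts across samples, skipping chaos edges."""
    neg, pos = {}, {}
    n_graphs = 0
    for edges in sample_edge_lists:
        n_graphs += 1
        for a, b, weight, is_chaos in edges:
            if is_chaos:
                continue  # chaos edges are excluded from both consensus networks
            key = tuple(sorted((a, b)))  # undirected edge: order-independent key
            target, count_attr = (pos, "num_pos") if weight > 0 else (neg, "num_neg")
            entry = target.setdefault(key, {"weight": 0.0, count_attr: 0})
            entry["weight"] += weight
            entry[count_attr] += 1
    return neg, pos, n_graphs


# Made-up sample networks, for illustration only.
sample1 = [("A", "B", -1.0, False), ("B", "C", 0.5, False), ("C", "D", 0.2, True)]
sample2 = [("B", "A", -0.5, False), ("B", "C", 0.5, False)]
cons_neg, cons_pos, num_graphs = consensus([sample1, sample2])
print(cons_neg, cons_pos, num_graphs)
```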

Citation

If you use Splitpea, please cite:

@inproceedings{dannenfelser2023splitpea,
  title={Splitpea: quantifying protein interaction network rewiring changes due to alternative splicing in cancer},
  author={Dannenfelser, Ruth and Yao, Vicky},
  booktitle={Pacific Symposium on Biocomputing 2024},
  pages={579--593},
  year={2023},
  organization={World Scientific}
}
