Splitpea: method for calculating network rewiring changes due to splicing
Splitpea
This is the Python package implementation of Splitpea: SPLicing InTeractions PErsonAlized.
Original repository: https://github.com/ylaboratory/splitpea
Repository for this package's code: https://github.com/ylaboratory/splitpea_package
This pip package is an extended version of the original Splitpea code, with easier installation and additional functions.
Splitpea quantifies rewiring in protein-protein interaction (PPI) networks driven by alternative splicing events. It integrates differential exon usage (percent spliced in, or PSI, values) with domain-domain interactions (DDIs) and PPIs to generate condition- or sample-specific networks.
Installation
Install from PyPI:
pip install splitpea
Some functions require tabix. On Debian/Ubuntu, install it with:
sudo apt-get install tabix
Alternatively, if you are using conda, you can install tabix via:
conda install -c bioconda tabix
Functions that depend on tabix will raise an error if tabix is not found.
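For scripted pipelines, it can help to confirm tabix is available before starting a long run. The sketch below uses only the standard library (shutil.which); require_tabix is a hypothetical helper, not part of Splitpea, which performs its own check.

```python
import shutil


def require_tabix() -> str:
    """Return the path to the tabix executable, or raise a clear error.

    Hypothetical helper for failing early in a pipeline; Splitpea performs
    its own check when a tabix-dependent function runs.
    """
    path = shutil.which("tabix")
    if path is None:
        raise RuntimeError(
            "tabix not found on PATH; install it with "
            "'sudo apt-get install tabix' or 'conda install -c bioconda tabix'"
        )
    return path


if __name__ == "__main__":
    try:
        print("tabix found at", require_tabix())
    except RuntimeError as err:
        print(err)
```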
Requirements
- Python >= 3.8
- Packages: numpy, networkx, intervaltree, ipykernel, matplotlib, plotnine, scikit-learn, adjustText, importlib_resources, requests
Features
- Adjustable PSI and significance thresholds
- Bundled reference files, with options for customization
- Species-specific references: human (default) or mouse
- Main outputs:
  - .edges.dat: edge list with edge weights and a flag indicating whether each edge is a chaos edge
  - .edges.pickle: NetworkX graph object
- Additional features and outputs:
  - Edge rewiring summaries
  - Gene-level statistics
  - Auto-generated network plots
  - Gephi-compatible TSV files
  - Cytoscape-compatible GML files
Two modes
- Sample-specific mode (differential_format=sample_specific)
  - Input: a differential exon table for a single sample.
  - Tab-delimited header:
    ensembl.id symbol chr strand exon.start exon.end psi.background psi.sample delta.psi pval
  - Can be generated via preprocess_pooled (see below) from two splicing matrices: one for the background samples and one containing the samples to compare individually against it. Supports SUPPA2 (.psi) and rMATS formats (SE.MATS.JC.txt or SE.MATS.JCEC.txt).
- Condition-specific mode (differential_format=suppa2 or rmats)
  - SUPPA2: provide both .psivec and .dpsi (order agnostic)
  - rMATS: use SE.MATS.JC.txt or SE.MATS.JCEC.txt
Currently, only skipped exon (SE) events are supported.
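To sanity-check a sample-specific table before running Splitpea, the sketch below reads the documented tab-delimited columns and applies delta PSI and significance cutoffs using the same defaults as --dpsi_cut and --sigscore_cut (0.05). The helper name passing_exons, the exact comparison rule (>= vs. >), and the NA handling are assumptions for illustration, not Splitpea's internal logic; the toy rows are made up.

```python
import csv
import io

# Column order documented for sample-specific input files.
COLUMNS = ["ensembl.id", "symbol", "chr", "strand", "exon.start", "exon.end",
           "psi.background", "psi.sample", "delta.psi", "pval"]


def passing_exons(handle, dpsi_cut=0.05, sigscore_cut=0.05, include_nas=True):
    """Yield rows that pass the delta PSI and significance cutoffs.

    Assumed filter rule (not taken from Splitpea's source): keep an exon when
    |delta.psi| >= dpsi_cut and pval <= sigscore_cut; a pval of "NA" passes
    only when include_nas is True, and an NA delta.psi is always dropped.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        if row["delta.psi"] == "NA" or abs(float(row["delta.psi"])) < dpsi_cut:
            continue
        if row["pval"] == "NA":
            if include_nas:
                yield row
        elif float(row["pval"]) <= sigscore_cut:
            yield row


# Toy table with placeholder identifiers and made-up values.
toy_table = "\n".join([
    "\t".join(COLUMNS),
    "\t".join(["ENSG_TOY1", "GENE1", "chr1", "+", "100", "200", "0.80", "0.40", "-0.40", "0.001"]),
    "\t".join(["ENSG_TOY2", "GENE2", "chr1", "+", "300", "400", "0.50", "0.52", "0.02", "0.30"]),
    "\t".join(["ENSG_TOY3", "GENE3", "chr2", "-", "500", "650", "0.10", "0.40", "0.30", "NA"]),
]) + "\n"

kept = [row["symbol"] for row in passing_exons(io.StringIO(toy_table))]
print(kept)  # ['GENE1', 'GENE3']
```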
Quick start
Splitpea can be run from the command line or within Python.
Main subcommands:
- run: build a rewired network
- plot: visualize a saved network
- stats: compute edge/gene statistics
- preprocess_pooled: generate the differential exon table for sample-specific mode
- get_consensus_network: generate summary consensus networks from a directory of rewired networks
Command line
Examples:
# Sample-specific mode
splitpea run sample1-psi.txt out/sample1
# Condition-specific mode (SUPPA2)
splitpea run diffSplice.psivec diffSplice.dpsi out/condA \
--differential_format suppa2
# Condition-specific mode (rMATS)
splitpea run SE.MATS.JCEC.txt out/condB \
--differential_format rmats
Note: The last positional argument (out/sample1, out/condA, etc.) is the output file prefix.
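When scripting many runs, the command lines above can be assembled programmatically. This sketch guesses --differential_format from file names (a .psivec/.dpsi pair means suppa2, a single file named SE.MATS.JC*.txt means rmats, any other single file means sample_specific). infer_format and build_run_command are hypothetical helpers, and the name-based guess will not recognize renamed rMATS files.

```python
import shlex
from pathlib import Path


def infer_format(in_files):
    """Guess --differential_format from file names (hypothetical helper)."""
    names = [Path(f).name for f in in_files]
    if len(names) == 2 and {Path(n).suffix for n in names} == {".psivec", ".dpsi"}:
        return "suppa2"
    if len(names) == 1 and names[0].startswith("SE.MATS.JC"):
        return "rmats"
    if len(names) == 1:
        return "sample_specific"
    raise ValueError(f"cannot infer a format from {names}")


def build_run_command(in_files, out_prefix, **options):
    """Assemble a `splitpea run` argument list; each option becomes --flag value."""
    fmt = infer_format(in_files)
    cmd = ["splitpea", "run", *in_files, out_prefix]
    if fmt != "sample_specific":  # sample_specific is the documented default
        cmd += ["--differential_format", fmt]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


cmd = build_run_command(["diffSplice.psivec", "diffSplice.dpsi"], "out/condA")
print(shlex.join(cmd))
# splitpea run diffSplice.psivec diffSplice.dpsi out/condA --differential_format suppa2
```

The resulting list can be passed directly to subprocess.run(cmd, check=True).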
Python API
Examples:
import splitpea
# Sample-specific mode
G = splitpea.run("sample1-psi.txt", "out/sample1")
# Condition-specific mode (SUPPA2)
G = splitpea.run(["diffSplice.psivec", "diffSplice.dpsi"], "out/condA",
differential_format="suppa2")
# Condition-specific mode (rMATS)
G = splitpea.run("SE.MATS.JCEC.txt", "out/condB",
differential_format="rmats")
run
Required:
- in_file:
  - sample_specific: one file path
  - suppa2: two file paths (.psivec and .dpsi, typically output by diffSplice)
  - rmats: one JC or JCEC file path
Options:
- out_file_prefix: prefix for all output files (directory + base name); if no prefix is given, the output prefix defaults to out_rewired_network
- --differential_format {sample_specific,suppa2,rmats}: input format (default: sample_specific)
- --skip (int): number of lines to skip in the input file (default: 1)
- --dpsi_cut (float): delta PSI cutoff (default: 0.05)
- --sigscore_cut (float): significance score cutoff (default: 0.05)
- --include_nas (bool): include NAs in significance testing (default: True)
- --index {0,1}: coordinate indexing (default: auto-set per format)
- --map_path: custom mapping file between IDs and symbols (tab-delimited with headers: symbol entrez ensembl uniprot)
- --edge_stats_file: append gain/loss/chaos counts to a stats file
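The --index option matters because tools report exon coordinates differently. As a sketch, assuming index=0 denotes 0-based half-open coordinates (as in the rMATS exonStart_0base/exonEnd columns) and index=1 denotes 1-based closed coordinates, converting between them shifts only the start. to_one_based and exon_length are hypothetical helpers; Splitpea's internal convention is not documented here.

```python
def to_one_based(start: int, end: int, index: int) -> tuple:
    """Convert exon coordinates to 1-based, fully closed [start, end].

    Assumed conventions (for illustration): index=0 means 0-based half-open
    [start, end), as in rMATS exonStart_0base/exonEnd; index=1 means 1-based
    closed coordinates, which are returned unchanged.
    """
    if index == 0:
        return start + 1, end
    if index == 1:
        return start, end
    raise ValueError("index must be 0 or 1")


def exon_length(start: int, end: int, index: int) -> int:
    """Number of bases covered; it must not depend on the convention."""
    first, last = to_one_based(start, end, index)
    return last - first + 1


# The same toy exon written in both conventions.
print(to_one_based(99, 200, index=0))  # (100, 200)
print(exon_length(99, 200, 0), exon_length(100, 200, 1))  # 101 101
```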
Reference files
Splitpea ships bundled reference datasets for human and mouse. These load automatically based on --species unless you override paths.
- --species {human,mouse}: which bundled references to use (default: human)
- ppif: protein–protein interactions
- ddif: domain–domain interactions
- entrezpfamf: Entrez–Pfam mapping
- pfamcoordsf: Pfam genome coordinates
- tbf: tabix index (usually the .tbi for pfamcoordsf)
- map_path: gene ID mapping (symbol/entrez/ensembl/uniprot)
To override default bundled references, pass custom paths to the corresponding parameters.
Outputs
For each run (out_file_prefix), Splitpea produces:
- <prefix>.edges.dat: edge list with columns node1 node2 weight chaos
- <prefix>.edges.pickle: NetworkX graph
In the Python API, splitpea.run(...) returns the rewired networkx.Graph.
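Because .edges.dat is plain text, it can be inspected without loading the pickle. The sketch below assumes whitespace-delimited node1 node2 weight chaos columns with an optional header and chaos stored as True/False or 1/0, and it counts non-chaos edges as gains when weight > 0 and losses otherwise (the split get_consensus_network documents). The exact file layout and the counting rule used by splitpea stats are assumptions; node IDs and weights are placeholders.

```python
import io
from collections import Counter

TRUE_VALUES = {"true", "1", "yes"}


def read_edges_dat(handle):
    """Parse '<node1> <node2> <weight> <chaos>' lines into tuples.

    Assumptions: whitespace-delimited columns, an optional header line that
    starts with 'node1', and chaos stored as True/False or 1/0.
    """
    edges = []
    for line in handle:
        parts = line.split()
        if not parts or parts[0] == "node1":
            continue
        node1, node2, weight, chaos = parts[:4]
        edges.append((node1, node2, float(weight), chaos.lower() in TRUE_VALUES))
    return edges


def summarize(edges):
    """Count chaos edges, gains (weight > 0) and losses (weight <= 0)."""
    counts = Counter()
    for _, _, weight, chaos in edges:
        if chaos:
            counts["chaos"] += 1
        elif weight > 0:
            counts["gain"] += 1
        else:
            counts["loss"] += 1
    return counts


# Toy file contents with placeholder node IDs and made-up weights.
toy_dat = io.StringIO(
    "node1 node2 weight chaos\n"
    "1 2 -1.0 False\n"
    "1 3 0.5 False\n"
    "4 5 0.8 True\n"
)
print(dict(summarize(read_edges_dat(toy_dat))))  # {'loss': 1, 'gain': 1, 'chaos': 1}
```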
Other Subcommands
plot
Load a saved network (.edges.pickle) and export it in various formats or as a plot.
Example (Python):
splitpea.plot(
pickle_path="out/sample1.edges.pickle",
pdf_path="plot/sample1_plot.pdf", # optional
gephi_path="plot/sample1_gephi.csv", # optional
cytoscape_path="plot/sample1_cyto.gml", # optional
with_labels=True,
symbol=True,
map_path=None, # use bundled mapping if None
species="human",
self_edges=False,
lcc=True
)
Parameters:
- pickle_path (required): path to the .edges.pickle file generated by splitpea run.
- --with_labels: draw node labels in plots. Default: False.
- --pdf_path: path to save a PDF of the plotted network via Matplotlib. Omit to skip the PDF.
- --gephi_path: path to save a Gephi-compatible TSV file. Omit to skip.
- --cytoscape_path: path to save a Cytoscape-compatible .gml file. Omit to skip.
- --symbol: if True (default), replace Entrez IDs with gene symbols.
- --map_path: path to a custom gene ID mapping file. Defaults to the bundled mapping if not provided.
- --species: species for mapping defaults (human or mouse).
- --self_edges: if True, keep self-loop edges in the output. Default: False.
- --lcc: if True (default), plot only the largest connected component.
- --max_nodes: maximum number of nodes allowed for Matplotlib plotting; larger graphs skip Matplotlib. Default: 2000.
- --max_edges: maximum number of edges allowed for Matplotlib plotting; larger graphs skip Matplotlib. Default: 10000.
Outputs (depending on args):
- PDF plot, Gephi TSV, and/or Cytoscape GML.
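The --lcc, --self_edges, --max_nodes, and --max_edges options together decide what gets drawn. This standard-library sketch drops self-loops, keeps the largest connected component via breadth-first search, and compares the result with the documented default limits (2000 nodes, 10000 edges). plot_size_check is a hypothetical helper and the order of checks inside splitpea.plot is assumed; with NetworkX available, networkx.connected_components could replace the hand-written BFS.

```python
from collections import defaultdict, deque


def largest_component(edges):
    """Return the node set of the largest connected component (BFS)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, best = set(), set()
    for start in list(adjacency):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


def plot_size_check(edges, lcc=True, self_edges=False, max_nodes=2000, max_edges=10000):
    """Apply self_edges/lcc filtering, then compare sizes with the plot limits.

    Returns (draw_with_matplotlib, n_nodes, n_edges). Defaults mirror the
    documented ones; the exact order of checks inside splitpea.plot is assumed.
    """
    kept = [(u, v) for u, v in edges if self_edges or u != v]
    if lcc:
        nodes = largest_component(kept)
        kept = [(u, v) for u, v in kept if u in nodes]
    else:
        nodes = {n for edge in kept for n in edge}
    return len(nodes) <= max_nodes and len(kept) <= max_edges, len(nodes), len(kept)


toy_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("F", "F")]
print(plot_size_check(toy_edges))               # (True, 3, 2)
print(plot_size_check(toy_edges, lcc=False))    # (True, 5, 3)
print(plot_size_check(toy_edges, max_nodes=2))  # (False, 3, 2)
```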
stats
Summarize edge counts and write per-gene statistics.
Example (Python):
splitpea.stats(
dat_file="out/sample1.edges.dat",
rewire_net="out/sample1.edges.pickle",
out_file_prefix="stats/sample1",
species="human",
map_path=None, # use bundled if None
ppif=None, ddif=None, entrezpfamf=None # override to use custom references
)
Parameters:
- dat_file (required): path to the .edges.dat file generated by splitpea.run.
- rewire_net (required): path to the .edges.pickle file for the same network.
- out_file_prefix (required): prefix for output files (e.g., stats/sample1).
- ppif: path to the PPI reference file. Defaults to the bundled species reference.
- ddif: path to the DDI reference file. Defaults to the bundled species reference.
- entrezpfamf: path to the Entrez–Pfam mapping file. Defaults to the bundled species reference.
- map_path: path to the gene ID mapping file. Defaults to the bundled mapping if not provided.
- species: species for selecting default references (human or mouse).
Outputs:
- Console printout of:
  - Gain: number of edges gained in the rewired network
  - Loss: number of edges lost in the rewired network
  - Chaos: number of chaos edges
- <out_file_prefix>_gene_stats.csv containing:
  - Entrez ID
  - Gene symbol
  - Node degree in the rewired network
  - Normalized degree relative to the background PPI network
  - Counts of gain/loss/chaos edges incident to the gene
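The per-gene columns can be approximated from an edge list as a rough cross-check. This sketch assumes that normalized degree means the rewired degree divided by the gene's degree in the background PPI network, and it classifies non-chaos edges as gains when weight > 0 and losses otherwise. gene_stats is a hypothetical helper, not the function splitpea.stats uses, and the toy IDs are placeholders.

```python
from collections import Counter, defaultdict


def gene_stats(rewired_edges, background_edges):
    """Per-gene degree, normalized degree, and gain/loss/chaos counts.

    rewired_edges: (node1, node2, weight, chaos) tuples; background_edges:
    (node1, node2) pairs from the reference PPI network. Assumed definitions:
    normalized degree = rewired degree / background degree, and a non-chaos
    edge is a gain when weight > 0 and a loss otherwise.
    """
    background_degree = Counter()
    for u, v in background_edges:
        background_degree[u] += 1
        background_degree[v] += 1
    counts = defaultdict(Counter)
    for u, v, weight, chaos in rewired_edges:
        kind = "chaos" if chaos else ("gain" if weight > 0 else "loss")
        for node in (u, v):
            counts[node]["degree"] += 1
            counts[node][kind] += 1
    rows = {}
    for node, c in counts.items():
        bg = background_degree[node]  # Counter returns 0 for unseen genes
        rows[node] = {
            "degree": c["degree"],
            "normalized_degree": c["degree"] / bg if bg else float("nan"),
            "gain": c["gain"], "loss": c["loss"], "chaos": c["chaos"],
        }
    return rows


background = [("1", "2"), ("1", "3"), ("1", "4"), ("2", "3")]
rewired = [("1", "2", -1.0, False), ("1", "3", 0.5, False), ("2", "3", 0.2, True)]
for gene, row in sorted(gene_stats(rewired, background).items()):
    print(gene, row)
```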
preprocess_pooled
Helper function that builds sample-specific Splitpea inputs by comparing each target sample to a pooled normal background.
It either downloads a normal splicing matrix for a chosen tissue from IRIS (GTEx), or uses a normal matrix that you provide.
You can provide either a single rMATS output file (SE.MATS.JC.txt or SE.MATS.JCEC.txt) for one sample, or a folder containing rMATS output files (SE.MATS.JC.txt or SE.MATS.JCEC.txt) for multiple samples. The function processes either input automatically. For a folder of rMATS files, it may help to rename each file to match its sample name.
What the function does:
- Load a target splicing matrix (compare_path) with exon rows and sample columns (PSI values).
- Obtain a normal/background splicing matrix:
  - either download GTEx <Tissue> from IRIS (if background is given), or
  - use your local file (background_path).
- Compute mean PSI per exon for the background.
- For each target sample, compute delta PSI vs. background and a p-value.
- Write per-sample Splitpea-ready files ({sample}-psi.txt) into out_psi_dir.
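The background comparison can be sketched for a single exon. The background mean and delta PSI follow the steps listed above; the p-value here is an empirical stand-in (one plus the number of background samples deviating from the background mean at least as much as the target, divided by one plus the number of background samples), chosen only for illustration, since the test preprocess_pooled actually applies is not documented here. compare_to_background is a hypothetical name and the PSI values are made up.

```python
from statistics import mean


def compare_to_background(background_psi, sample_psi):
    """Return (psi_background, delta_psi, pval) for one exon in one sample.

    background_psi holds PSI values in [0, 1] for the pooled background
    samples, with None marking missing values. The p-value is an empirical
    stand-in for illustration: (1 + number of background samples whose
    deviation from the mean is >= |delta|) / (1 + number of samples).
    """
    values = [v for v in background_psi if v is not None]
    if not values or sample_psi is None:
        return None, None, None
    psi_background = mean(values)
    delta_psi = sample_psi - psi_background
    extreme = sum(1 for v in values if abs(v - psi_background) >= abs(delta_psi))
    pval = (extreme + 1) / (len(values) + 1)
    return psi_background, delta_psi, pval


# One toy exon: five background entries (one missing) and two target samples.
background_row = [0.80, 0.82, None, 0.78, 0.80]
targets = {"sample1": 0.40, "sample2": 0.80}
for sample, psi in targets.items():
    print(sample, compare_to_background(background_row, psi))
```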
Example (Python):
# Option A: download and use IRIS/GTEx matrix
splitpea.preprocess_pooled(
compare_path="compare.txt",
background="Brain",
background_download_root="/data/iris_cache", # creates GTEx_<Tissue>/splicing_matrix/ here
out_psi_dir="out_psi"
)
# Option B: use a local normal matrix
splitpea.preprocess_pooled(
compare_path="compare.txt",
background_path="/path/to/GTEx_Brain/splicing_matrix.txt",
out_psi_dir="out_psi"
)
Inputs:
- compare_path (required): target (case) splicing matrix to compare against the pooled normal. Can be:
  - a .txt file from rMATS (SE.MATS.JC.txt or SE.MATS.JCEC.txt format),
  - a directory of rMATS files (SE.MATS.JC.txt or SE.MATS.JCEC.txt format), with each file renamed to match its sample, or
  - a .txt file in tab-delimited matrix format (rows = exons, columns = samples, values = PSI [0–1]). Expected header example:
    GeneID geneSymbol chr strand exonStart exonEnd upstreamEE downstreamES sample1 sample2 ...
- Normal/background matrix: provide one of
  - background: IRIS/GTEx tissue name used to auto-download the background matrix (e.g., Brain, AdiposeTissue, …). Optionally, set background_download_root to control where the GTEx_<Tissue>/splicing_matrix/ folder is created.
  - background_path: path to pre-downloaded normal splicing data. Accepts the same formats/data files as compare_path.

You must supply either background or background_path.
Parameters:
- compare_path (str, required): path to the target splicing data to be compared.
- out_psi_dir (str): output directory for per-sample Splitpea files ({sample}-psi.txt); created if missing. Defaults to an out_psi folder in the current working directory.
- background (str): IRIS/GTEx tissue name used to auto-download the normal data. Must be one of: AdiposeTissue, AdrenalGland, Bladder, Blood, BloodVessel, Brain, Breast, CervixUteri, Colon, Esophagus, FallopianTube, Heart, Kidney, Liver, Lung, Muscle, Nerve, Ovary, Pancreas, Pituitary, Prostate, SalivaryGland, Skin, SmallIntestine, Spleen, Stomach, Testis, Thyroid, Uterus, Vagina.
- background_download_root (str): root directory for the IRIS download cache (GTEx_<Tissue>/splicing_matrix/ is created under this path). Defaults to the current working directory.
- background_path (str): path to an existing normal/background splicing data file; bypasses the download.
- map_path (str): gene ID mapping (symbol/entrez/ensembl/uniprot). Uses the bundled mapping if not given.
- species (str): species identifier (e.g., human, mouse); determines which default mapping to use.
- single_rMATS_compare (bool, optional): if True, treat compare_path as a single rMATS file instead of a directory.
- single_rMATS_background (bool, optional): if True, treat background_path as a single rMATS file instead of a directory.
- inclevel (int, optional): which inclusion-level field to use when parsing rMATS (1 or 2). Defaults to 1.
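The background arguments are easy to get wrong in scripts. This sketch checks a tissue name against the documented list and derives the GTEx_<Tissue>/splicing_matrix/ download location under background_download_root (or the current working directory). resolve_background is a hypothetical helper, and letting background_path take precedence when both are given is an assumption based on it bypassing the download.

```python
import os

# Tissue names documented for the background parameter.
GTEX_TISSUES = {
    "AdiposeTissue", "AdrenalGland", "Bladder", "Blood", "BloodVessel", "Brain",
    "Breast", "CervixUteri", "Colon", "Esophagus", "FallopianTube", "Heart",
    "Kidney", "Liver", "Lung", "Muscle", "Nerve", "Ovary", "Pancreas",
    "Pituitary", "Prostate", "SalivaryGland", "Skin", "SmallIntestine",
    "Spleen", "Stomach", "Testis", "Thyroid", "Uterus", "Vagina",
}


def resolve_background(background=None, background_path=None, background_download_root=None):
    """Return ("local", path) or ("download", target_dir) for the background matrix.

    Hypothetical pre-check; assumes background_path wins when both are given.
    """
    if background_path is not None:
        return "local", background_path
    if background is None:
        raise ValueError("supply either background or background_path")
    if background not in GTEX_TISSUES:
        raise ValueError(f"unknown tissue {background!r}; expected one of {sorted(GTEX_TISSUES)}")
    root = background_download_root or os.getcwd()
    return "download", os.path.join(root, f"GTEx_{background}", "splicing_matrix")


print(resolve_background("Brain", background_download_root="/data/iris_cache"))
```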
Outputs (written to out_psi_dir):
- {sample}-psi.txt files (one per target sample), each a table in Splitpea's sample-specific format.
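Because each output file is named {sample}-psi.txt, the per-sample runs can be planned by globbing out_psi_dir. This sketch pairs each file with an output prefix out/<sample> and builds the documented splitpea run command; plan_sample_runs and the out directory name are hypothetical.

```python
from pathlib import Path

SUFFIX = "-psi.txt"


def plan_sample_runs(out_psi_dir="out_psi", out_dir="out"):
    """Return (sample, argv) pairs, one per {sample}-psi.txt file, sorted by name."""
    plans = []
    for psi_file in sorted(Path(out_psi_dir).glob(f"*{SUFFIX}")):
        sample = psi_file.name[: -len(SUFFIX)]
        argv = ["splitpea", "run", str(psi_file), str(Path(out_dir) / sample)]
        plans.append((sample, argv))
    return plans


if __name__ == "__main__":
    for sample, argv in plan_sample_runs():
        print(sample, " ".join(argv))
```

Each command can be executed with subprocess.run(argv, check=True), or the same file and prefix can be passed to splitpea.run(in_file, out_prefix) from Python.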
get_consensus_network
Build consensus loss and gain summary networks from a directory of sample networks (*.pickle from splitpea.run). Chaos edges are excluded; edge weights and counts are accumulated across samples.
Example (Python):
cons_neg, cons_pos = splitpea.get_consensus_network(
net_dir="output"
)
Parameters:
- net_dir (str, required): directory containing sample .pickle graphs.
Outputs:
- (cons_neg, cons_pos): two NetworkX graphs:
  - cons_neg: edges with weight <= 0, with attributes weight (sum) and num_neg (count).
  - cons_pos: edges with weight > 0, with attributes weight (sum) and num_pos (count).
  - Both graphs have graph['num_graphs'] = number of sample graphs combined.
- Files written:
  - consensus_network_neg.pickle
  - consensus_network_pos.pickle
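The accumulation step can be sketched with plain dictionaries. Assumptions: each sample's non-chaos edge is assigned to the negative or positive consensus by its own weight (weight <= 0 or > 0), undirected edges are keyed by their sorted endpoints, and weights are summed while occurrences are counted. consensus_edges is a hypothetical helper, not the package's implementation, and the toy networks are made up.

```python
from collections import defaultdict


def consensus_edges(networks):
    """Accumulate non-chaos edges across samples.

    networks: one edge list per sample, each edge (u, v, weight, chaos).
    Returns (neg, pos, num_graphs); neg/pos map a sorted (u, v) pair to the
    summed weight and the number of samples contributing that edge.
    """
    neg = defaultdict(lambda: {"weight": 0.0, "num_neg": 0})
    pos = defaultdict(lambda: {"weight": 0.0, "num_pos": 0})
    for edges in networks:
        for u, v, weight, chaos in edges:
            if chaos:
                continue  # chaos edges are excluded from both consensus graphs
            key = tuple(sorted((u, v)))
            if weight <= 0:
                neg[key]["weight"] += weight
                neg[key]["num_neg"] += 1
            else:
                pos[key]["weight"] += weight
                pos[key]["num_pos"] += 1
    return dict(neg), dict(pos), len(networks)


sample1 = [("1", "2", -1.0, False), ("3", "4", 0.5, False), ("5", "6", 0.2, True)]
sample2 = [("2", "1", -0.5, False), ("3", "4", 0.25, False)]
neg, pos, num_graphs = consensus_edges([sample1, sample2])
print(neg)         # {('1', '2'): {'weight': -1.5, 'num_neg': 2}}
print(pos)         # {('3', '4'): {'weight': 0.75, 'num_pos': 2}}
print(num_graphs)  # 2
```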
Citation
If you use Splitpea, please cite:
@inproceedings{dannenfelser2023splitpea,
title={Splitpea: quantifying protein interaction network rewiring changes due to alternative splicing in cancer},
author={Dannenfelser, Ruth and Yao, Vicky},
booktitle={Pacific Symposium on Biocomputing 2024},
pages={579--593},
year={2023},
organization={World Scientific}
}
Download files
Source distribution: splitpea-0.1.0.tar.gz
Built distribution: splitpea-0.1.0-py3-none-any.whl
File details
Details for the file splitpea-0.1.0.tar.gz.
File metadata
- Download URL: splitpea-0.1.0.tar.gz
- Upload date:
- Size: 15.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 18cdcb703bf5a906a6bb7a84d3dea910400d7f2261c0d3d7b7aa3d431f17a603 |
| MD5 | 7897f9f60a93cd6beaeba4249e2d1790 |
| BLAKE2b-256 | 88b43d70552ba64f2957c94d65f7563bf021b025b764c5b20c28d6e7b1f64a1f |
File details
Details for the file splitpea-0.1.0-py3-none-any.whl.
File metadata
- Download URL: splitpea-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 10f1efc094ed49f2b01f314b8dbeadd33fc1403eda77b9686079e8bbc9524a0e |
| MD5 | eb30d70c343a26083a054bd46a8d5916 |
| BLAKE2b-256 | 4a3ef4f6dc54c49a9cef9833462406cef92f40d6cb64175a27c8f140619cddb6 |