Python-native implementation of selected PhosR-style phosphoproteomics workflows.
PhosPy
PhosPy is an unofficial Python implementation of selected PhosR-style workflows for phosphoproteomics.
It is designed for people who want a small, Python-native way to:
- preprocess phosphoproteomics tables
- analyse kinase activity from predMat
- run a native kinase workflow from scoring through prediction
PhosPy is deliberately narrow. It is not a full replacement for the R PhosR package.
Install
PhosPy supports Python 3.10 and newer.
Install the supported Python API and the phospy CLI:
pip install phospy
A small note before you start: the file-path examples below use examples/data/..., so they assume you are working from a
repository checkout. If you installed from PyPI, use the same code with paths to your own input files instead.
What You Can Do With PhosPy
Preprocess Phosphoproteomics Data
Start from total and phospho input tables and produce corrected phosphosite matrices for downstream use.
Analyse Kinase Activity From predMat
Generate weighted activity scores, KSEA-style summaries, and target counts from predicted kinase–substrate relationships.
Run a Native Kinase Workflow
Construct substrate profiles, score motifs, combine evidence, select candidates, and perform adaptive SVM-based kinase prediction.
Supported Public API
The stable root-level API is intentionally small:
- PhosphoDataset
- PhosRPipeline
- analyze_kinase_activity
- KinaseWorkflow
Returned result dataclasses:
- CoreProcessingResult
- SiteMatrixResult
- CoreOutputs
- KinaseActivityResult
- KinasePredictionResult
- KinaseWorkflowResult
The examples below use only those imports.
For a compact guide to the supported classes, methods, and result objects, see docs/api.md.
Input Tables at a Glance
PhosPy expects a small, fixed set of input shapes.
Total-proteome table
Required columns:
- genes
- group1 to group6
Phosphoproteome table
Required columns:
- uid
- gene_names
- gene_p_site
- localization_prob
- centralized_sequence
- p_group1 to p_group6
gene_p_site must look like GENE_SITE, for example PRKACA_S339.
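As a rough illustration of the GENE_SITE shape, the sketch below parses an identifier with a regular expression. The pattern (a gene symbol, an underscore, one of S, T or Y, then a position) and the helper name parse_gene_p_site are assumptions made for this example; PhosPy's own validation may accept a different set of strings.

```python
import re

# Assumed shape: gene symbol, underscore, residue letter (S/T/Y), position.
# Illustrative only; this is not PhosPy's own validator.
GENE_SITE = re.compile(
    r"^(?P<gene>[A-Za-z0-9.\-]+)_(?P<residue>[STY])(?P<position>\d+)$"
)


def parse_gene_p_site(value: str) -> tuple[str, str, int]:
    """Split a GENE_SITE string into (gene, residue, position)."""
    match = GENE_SITE.match(value.strip())
    if match is None:
        raise ValueError(f"not in GENE_SITE form: {value!r}")
    return match["gene"], match["residue"], int(match["position"])


print(parse_gene_p_site("PRKACA_S339"))  # ('PRKACA', 'S', 339)
```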
predMat
predMat must be a numeric matrix with:
- phosphosite IDs as the index, for example BTK;Y551;
- kinase names as columns
- scores in the range [0, 1]
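The three predMat requirements can be pre-checked before handing a matrix to PhosPy. The sketch below is a minimal, standard-library stand-in that represents the matrix as a dict of rows rather than a pandas object. The function name check_pred_mat is hypothetical, and treating NaN as invalid is an assumption rather than documented PhosPy behaviour.

```python
import math


def check_pred_mat(rows: dict[str, dict[str, float]]) -> None:
    """Check a site -> {kinase: score} mapping against the predMat rules."""
    kinases = None
    for site_id, scores in rows.items():
        # Every row should carry the same kinase columns.
        if kinases is None:
            kinases = set(scores)
        elif set(scores) != kinases:
            raise ValueError(f"{site_id!r} has different kinase columns")
        for kinase, score in scores.items():
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(score, bool) or not isinstance(score, (int, float)):
                raise TypeError(f"{site_id!r}/{kinase!r} is not numeric")
            # Assumption: NaN counts as out of range here.
            if math.isnan(score) or not 0.0 <= score <= 1.0:
                raise ValueError(f"{site_id!r}/{kinase!r} is outside [0, 1]")


check_pred_mat({"BTK;Y551;": {"BTK": 0.92, "PRKACA": 0.10}})
```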
When you load tables from files, PhosPy normalises input headers to lowercase snake case before validation. For example,
Gene Names and gene-names both become gene_names. That makes file input a little more forgiving, but it also
means loading fails if two raw headers collapse to the same cleaned name.
If you build PhosphoDataset from in-memory pandas data frames instead, those column names are validated as provided.
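To see why two raw headers can collide after cleaning, here is a small sketch of lowercase snake-case normalisation with duplicate detection. The helper names to_snake_case and clean_headers are hypothetical, and the rule used (collapse every run of non-alphanumeric characters to one underscore) is an assumption that reproduces the two examples above; PhosPy's actual cleaning rules may differ in detail.

```python
import re


def to_snake_case(header: str) -> str:
    """Lowercase a header and collapse non-alphanumeric runs to '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", header.strip())
    return cleaned.strip("_").lower()


def clean_headers(headers: list[str]) -> list[str]:
    """Clean every header and fail if two of them collapse to one name."""
    seen: dict[str, str] = {}
    for raw in headers:
        name = to_snake_case(raw)
        if name in seen:
            raise ValueError(
                f"{seen[name]!r} and {raw!r} both clean to {name!r}"
            )
        seen[name] = raw
    return list(seen)


print(clean_headers(["Gene Names", "Localization Prob"]))
# ['gene_names', 'localization_prob']
```

Passing both "Gene Names" and "gene-names" in the same list raises ValueError, which mirrors the loading failure described above.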
Quick Start
The quickest way to get started from a source checkout is to use the bundled example data in examples/data/.
Core Preprocessing
from phospy import PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.process_core(max_unmatched_fraction=0.1)

site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected
For the bundled example data, site_matrix.index.tolist() is ['BTK;Y551;'].
process_core() returns a CoreProcessingResult with:
- total_unique
- total_filtered
- phospho_filtered
- phospho_corrected
- site_matrix
If your analysis needs explicit pairwise comparisons, pass them when you build the dataset:
from phospy import PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
    comparisons=[("group1", "group4"), ("group2", "group5")],
)
core = dataset.process_core(max_unmatched_fraction=0.1)
If you do not pass comparisons, preprocessing still runs normally and no extra pairwise columns are added.
Downstream Kinase Analysis From predMat
import pandas as pd
from phospy import PhosphoDataset, analyze_kinase_activity

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.process_core(max_unmatched_fraction=0.1)
pred_mat = pd.read_csv("examples/data/predMat.csv", index_col=0)

kinase = analyze_kinase_activity(
    pred_mat=pred_mat,
    phospho_matrix=core.site_matrix.matrix,
    threshold=0.6,
    min_substrates=1,
    top_n_substrates=1,
)

target_counts = kinase.target_counts
ksea_scores = kinase.ksea_scores
The bundled example uses min_substrates=1 and top_n_substrates=1 because the example matrix is intentionally tiny.
For larger real datasets, the defaults (min_substrates=3, top_n_substrates=20) are usually the better starting
point.
For the bundled example data, target_counts.to_dict() is {'PRKACA': 3, 'BTK': 2}.
analyze_kinase_activity(...) returns a KinaseActivityResult with:
- weighted_activity
- ksea_scores
- ksea_counts
- target_counts
- target_table
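For readers unfamiliar with KSEA-style scoring, the sketch below computes the commonly used KSEA z-score, z = (mean of substrates - mean of all sites) * sqrt(m) / sd of all sites, where m is the number of substrates. It is not PhosPy's implementation, and this README does not state the exact formula PhosPy uses. The function name ksea_z_scores and its inputs are hypothetical, and the way threshold and min_substrates are applied here (keep sites scoring at or above the threshold, skip kinases with too few substrates) is an assumed reading of those parameter names.

```python
from math import sqrt
from statistics import mean, stdev


def ksea_z_scores(log_fc, pred_scores, threshold=0.6, min_substrates=1):
    """Textbook KSEA z-score per kinase (illustrative, not PhosPy's code).

    log_fc:      {site_id: log fold change}
    pred_scores: {site_id: {kinase: score in [0, 1]}}
    """
    background_mean = mean(log_fc.values())
    background_sd = stdev(log_fc.values())  # sample standard deviation
    kinases = {k for scores in pred_scores.values() for k in scores}
    result = {}
    for kinase in sorted(kinases):
        # Assumed rule: a site is a substrate if its score meets the threshold.
        substrates = [
            site for site, scores in pred_scores.items()
            if site in log_fc and scores.get(kinase, 0.0) >= threshold
        ]
        if len(substrates) < min_substrates:
            continue
        substrate_mean = mean(log_fc[site] for site in substrates)
        result[kinase] = (
            (substrate_mean - background_mean)
            * sqrt(len(substrates)) / background_sd
        )
    return result


log_fc = {"A;S1;": 2.0, "B;T2;": 0.0, "C;Y3;": -2.0}
pred_scores = {
    "A;S1;": {"K1": 0.9, "K2": 0.1},
    "B;T2;": {"K1": 0.2, "K2": 0.7},
    "C;Y3;": {"K1": 0.1, "K2": 0.8},
}
print(ksea_z_scores(log_fc, pred_scores))
```

In this toy input, K1 has a single up-regulated substrate and scores positive, while K2 has two substrates with a negative mean and scores negative.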
End-to-End Pipeline
from phospy import PhosRPipeline

pipeline = PhosRPipeline.from_files(
    total_path="examples/data/total.tsv",
    phospho_path="examples/data/phospho.tsv",
    pred_mat_path="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
)
outputs = pipeline.run(outdir="examples/output")
outputs is a CoreOutputs object with:
- outputs.core
- outputs.kinase_activity
This writes the core CSV outputs together with downstream kinase-analysis tables, including:
- df_total_unique.csv
- df_total_filtered.csv
- df_phospho_filtered.csv
- df_phospho_corrected.csv
- phosr_input.csv
- mat_phospho_corrected.csv
- site_sequences.csv
- kinase_activity_matrix.csv
- ksea_scores.csv
- ksea_counts.csv
- kinase_target_counts.csv
- kinase_target_table.csv
If you omit pred_mat_path, the pipeline still runs the core preprocessing path and simply skips the downstream
kinase-analysis outputs.
Native End-to-End Kinase Workflow
A complete runnable native-workflow example is included at
examples/native_workflow_demo.py.
If PhosPy is installed in the environment, for example with pip install phospy or pip install -e . from a local
checkout, you can run it directly:
python examples/native_workflow_demo.py
From a local checkout, there is also a Make target that runs the example with the repository src/ path configured for
that shell session:
make native-workflow-demo
That example uses only the supported API and prints a small prediction matrix for a synthetic two-kinase setup.
The native workflow expects:
- a phosphosite matrix
- a substrate_map
- site_sequences keyed by phosphosite ID when motif scoring is used
- motif_sequences for end-to-end motif-aware prediction
site_sequences can be passed as either a mapping keyed by phosphosite ID or a pandas Series with a phosphosite index.
If you want profile-only prediction, pass allow_profile_only_fallback=True and omit motif_sequences.
Command-Line Demo
After installation, you can run the CLI on your own files. The example below uses the bundled tables from a source checkout:
phospy \
  --total examples/data/total.tsv \
  --phospho examples/data/phospho.tsv \
  --pred-mat examples/data/predMat.csv \
  --phospho-encoding utf-16le \
  --max-unmatched-fraction 0.1 \
  --outdir examples/output
The example output directory in examples/output/ shows the generated CSV files.
The CLI currently supports these options:
- --total and --phospho are required input files
- --phospho-encoding optionally overrides the default utf-8 reader encoding
- --outdir is the required output directory
- --pred-mat is optional
- --localization-threshold defaults to 0.75
- --min-observed defaults to 4
- --total-sentinel defaults to 10.0
- --phospho-sentinel defaults to 12.0
- --max-unmatched-fraction defaults to 0.0
--max-unmatched-fraction=0.0 means protein correction fails if the inner join would silently drop any phosphosite
rows. Raise it only when you want to allow a small, bounded amount of row loss.
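The idea behind a bounded unmatched fraction can be sketched without PhosPy: count the phosphosite rows whose gene has no partner in the total-proteome table, and fail when that share exceeds the limit. The function name check_unmatched is hypothetical, and the strict greater-than comparison is an assumption about how the boundary case is treated.

```python
def check_unmatched(phospho_genes, total_genes, max_unmatched_fraction=0.0):
    """Return the unmatched fraction, or raise if it exceeds the limit."""
    known = set(total_genes)
    # Rows that an inner join on gene would silently drop.
    unmatched = [gene for gene in phospho_genes if gene not in known]
    fraction = len(unmatched) / len(phospho_genes) if phospho_genes else 0.0
    # Assumption: a fraction exactly equal to the limit is still allowed.
    if fraction > max_unmatched_fraction:
        raise ValueError(
            f"{len(unmatched)} of {len(phospho_genes)} phosphosite rows "
            f"have no matching total-proteome gene ({fraction:.1%})"
        )
    return fraction


phospho = ["BTK"] * 9 + ["UNKNOWN"]
print(check_unmatched(phospho, ["BTK"], max_unmatched_fraction=0.1))  # 0.1
```

With the default limit of 0.0, the same input raises ValueError because one row would be lost.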
The CLI is intentionally small. It does not currently expose pairwise comparison generation or the native
KinaseWorkflow path.
Validation Rules Worth Knowing
A few checks are especially useful to know up front:
- localization_prob must stay within [0, 1].
- predMat values must stay within [0, 1].
- File-loaded total and phospho headers are cleaned to lowercase snake case before validation, so duplicate cleaned names are rejected.
- predMat and the phosphosite matrix must overlap by at least one phosphosite row, and that overlap must cover at least 10% of the phosphosite matrix.
- Protein correction normalises gene identifiers before matching and, by default, refuses to drop unmatched phosphosite rows.
- Site-matrix construction drops rows with missing sequences or incomplete corrected values, then deduplicates repeated phosphosites by keeping the row with the highest mean corrected signal.
- In the native workflow, motif_sequences require matching site_sequences. If you omit motif data entirely, set allow_profile_only_fallback=True.
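Two of the rules above, the 10% overlap requirement and the highest-mean deduplication, are easy to illustrate in isolation. The sketch below uses plain Python containers instead of pandas. The function names check_overlap and deduplicate_sites are hypothetical, and details such as tie handling (the first row wins on equal means) are assumptions rather than documented PhosPy behaviour.

```python
from statistics import mean


def check_overlap(pred_sites, matrix_sites, min_fraction=0.10):
    """Return the share of matrix sites that also appear in predMat."""
    matrix = set(matrix_sites)
    shared = matrix & set(pred_sites)
    if not shared:
        raise ValueError("predMat and the site matrix share no phosphosites")
    coverage = len(shared) / len(matrix)
    if coverage < min_fraction:
        raise ValueError(f"overlap covers only {coverage:.1%} of the matrix")
    return coverage


def deduplicate_sites(rows):
    """Keep, per site ID, the row with the highest mean corrected signal.

    rows: iterable of (site_id, list_of_values) pairs.
    """
    best = {}
    for site_id, values in rows:
        # Assumption: on a tie, the earlier row is kept.
        if site_id not in best or mean(values) > mean(best[site_id]):
            best[site_id] = values
    return best


rows = [
    ("BTK;Y551;", [1.0, 2.0]),
    ("BTK;Y551;", [3.0, 4.0]),
    ("PRKACA;S339;", [0.5, 0.5]),
]
print(deduplicate_sites(rows))
print(check_overlap(["BTK;Y551;"], ["BTK;Y551;", "PRKACA;S339;"]))  # 0.5
```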
Where to Go Next
If you want more detail, these are the most useful follow-on docs:
- docs/api.md maps the supported public API
- docs/validation-and-parity.md explains how validation is approached in PhosPy
- docs/parity.md explains what parity means here, especially for the native kinase workflow
- docs/fixtures.md maps the committed fixture and trace directories
- docs/roadmap.md outlines the most likely next steps
- CHANGELOG.md contains the release notes
If you want to contribute or work from a local checkout, see CONTRIBUTING.md.
Known Limitations
A few boundaries are worth knowing up front:
- Selective scope only. PhosPy covers the workflows documented above and nothing broader.
- Parity is seam-level, not package-wide. Validation claims are limited to the committed fixture-backed seams described in docs/validation-and-parity.md and docs/parity.md.
- KinaseWorkflow is native first. It includes an svm_mode="r_parity" option for narrower learner-seam comparison, but the default mode is the preferred Python-native path and is not claimed to numerically match every PhosR result.
- The CLI is intentionally small. It covers the core preprocessing and predMat-driven downstream path. The native kinase workflow is currently exposed through the Python API and example script.
- R is only required for fixture regeneration. You do not need R to install PhosPy or run the committed Python test suite.
For Contributors
Most users can ignore this section.
To work from a local checkout:
pip install -e .
To run tests:
pip install -e ".[test]"
pytest -m "not parity"
pytest -m parity
If you want the parity suite to print its optional comparison metrics while you debug a seam, these environment variables are available:
- PHOSPY_SHOW_PARITY: master switch for parity metrics output
- PHOSPY_SHOW_PROFILE_CONSTRUCTION: also print the optional profile-construction metrics
- PHOSPY_SHOW_PREDICTION_MODE_COMPARISON: also print default-versus-r_parity prediction comparison metrics
- PHOSPY_SHOW_REPLAYED_PREDICTION_MODE_COMPARISON: also print replayed prediction comparison metrics
The three more specific flags only do anything when PHOSPY_SHOW_PARITY is enabled first. Truthy values are
case-insensitive and include 1, true, yes, and on.
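The gating described above (a master switch plus three dependent flags, all read case-insensitively) can be pictured with the following sketch. It is not the test suite's own code: the helper names env_flag and parity_flags are hypothetical, the truthy set is limited to the four values listed above, and stripping surrounding whitespace is an assumption.

```python
import os
from typing import Mapping, Optional

# Only the values named in this README; the real set may be larger.
TRUTHY = {"1", "true", "yes", "on"}
SPECIFIC_FLAGS = (
    "PHOSPY_SHOW_PROFILE_CONSTRUCTION",
    "PHOSPY_SHOW_PREDICTION_MODE_COMPARISON",
    "PHOSPY_SHOW_REPLAYED_PREDICTION_MODE_COMPARISON",
)


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read one flag case-insensitively from the given environment."""
    source = os.environ if environ is None else environ
    return source.get(name, "").strip().lower() in TRUTHY


def parity_flags(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve all four flags; specific ones need the master switch."""
    master = env_flag("PHOSPY_SHOW_PARITY", environ)
    flags = {"PHOSPY_SHOW_PARITY": master}
    for name in SPECIFIC_FLAGS:
        flags[name] = master and env_flag(name, environ)
    return flags


example = {"PHOSPY_SHOW_PARITY": "Yes", "PHOSPY_SHOW_PROFILE_CONSTRUCTION": "ON"}
print(parity_flags(example))
```

Dropping PHOSPY_SHOW_PARITY from the example mapping turns every flag off, even though the profile-construction flag is still set.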
To see the printed summaries in the terminal, run pytest with -s (or --capture=no). If you enable all four flags
and run the full parity suite, PhosPy prints every available metrics block reached by those tests.
Linux or macOS quick example:
PHOSPY_SHOW_PARITY=1 PHOSPY_SHOW_PROFILE_CONSTRUCTION=1 PHOSPY_SHOW_PREDICTION_MODE_COMPARISON=1 PHOSPY_SHOW_REPLAYED_PREDICTION_MODE_COMPARISON=1 pytest -m parity -s
For Linux/macOS, PowerShell, and Command Prompt examples together with a sample of the bundled parity output, see
docs/parity.md.
To run the usual contributor checks:
pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files
R Requirements for Fixture Regeneration
The committed parity fixtures are already included in the repository. You only need R if you want to regenerate or extend them.