Python-native implementation of selected PhosR-style phosphoproteomics workflows.
PhosPy
PhosPy 1.0.0 is an unofficial Python implementation of selected PhosR-style workflows for phosphoproteomics. PhosPy is
a deliberately narrow, Python-native subset with test-backed validation at defined seams. It is not presented as a
full replacement for the R PhosR package.
Preprocess Phosphoproteomics Data
Start from total and phospho input tables and produce corrected phosphosite matrices for downstream use.
Analyse Kinase Activity From predMat
Generate weighted activity scores, KSEA-style summaries, and target counts from predicted kinase–substrate relationships.
Run a Native Kinase Workflow
Construct substrate profiles, score motifs, combine evidence, select candidates, and perform adaptive SVM-based kinase prediction.
Install
Install From PyPI
Install the supported root-level API and the phospy CLI.
pip install phospy
Install From Source
Use an editable installation only when you want to work from a local checkout:
pip install -e .
Test Dependencies
To run the test suite from a local checkout:
pip install -e ".[test]"
pytest -m "not parity"
pytest -m parity
Development Checks
To run linting and other contributor checks from a local checkout:
pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files
R Requirements for Fixture Regeneration
The committed parity fixtures are already included in the repository. You only need R when you want to regenerate or extend those fixtures.
The current scripts use these R packages:
- PhosR
- SummarizedExperiment
- e1071
- readr
- dplyr
- tidyr
- tibble
- janitor
A practical greenfield setup in R is:
install.packages(c("BiocManager", "devtools", "e1071", "readr", "dplyr", "tidyr", "tibble", "janitor"))
BiocManager::install("SummarizedExperiment")
devtools::install_github("PYangLab/PhosR")
The bundled fixture scripts check for the packages they need and stop with a clear error if anything is missing.
To regenerate the committed R reference fixtures:
Rscript scripts/generate_r_fixtures.R
Rscript scripts/generate_r_l6_fixtures.R
Supported Public API for 1.0.0
The supported root-level public API is intentionally small:
- PhosphoDataset
- PhosRPipeline
- KinaseActivityAnalyzer
- KinaseWorkflow
- result dataclasses returned by those classes: CoreProcessingResult, SiteMatrixResult, CoreOutputs, KinaseActivityResult, KinasePredictionResult, KinaseWorkflowResult
Examples below use only those supported root imports. Lower-level submodule imports may still exist for internal use and testing, but they are not part of the stable public API unless documented here.
Quick Start
The quickest path is to use the bundled example data under examples/data/.
Core Preprocessing
from phospy import PhosphoDataset
dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
)
core = dataset.process_core(max_unmatched_fraction=0.1)
site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected
For the bundled example data, site_matrix.index.tolist() is ['BTK;Y551;'].
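Site labels such as 'BTK;Y551;' are semicolon-delimited. As a minimal sketch, the helper below splits such a label into gene, residue, and position. `SiteId` and `parse_site_id` are hypothetical names for illustration and are not part of the PhosPy API; the meaning of the trailing field, which is empty in the bundled example, is an assumption left uninterpreted here.

```python
from typing import NamedTuple


class SiteId(NamedTuple):
    """Hypothetical container for one parsed site label (not a PhosPy class)."""

    gene: str
    residue: str
    position: int
    extra: str  # trailing field(s); empty for 'BTK;Y551;'


def parse_site_id(site_id: str) -> SiteId:
    """Split a 'GENE;SITE;EXTRA' label such as 'BTK;Y551;' into its parts."""
    parts = site_id.split(";")
    if len(parts) < 2 or not parts[0] or len(parts[1]) < 2:
        raise ValueError(f"unexpected site label: {site_id!r}")
    gene, site = parts[0], parts[1]
    # Everything after the second field is kept verbatim, not interpreted.
    extra = ";".join(parts[2:])
    return SiteId(gene=gene, residue=site[0], position=int(site[1:]), extra=extra)


parsed = parse_site_id("BTK;Y551;")
print(parsed.gene, parsed.residue, parsed.position)
```

A helper like this can be mapped over site_matrix.index when you need per-gene or per-residue grouping.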
If your analysis needs pairwise comparisons, pass them explicitly:
from phospy import PhosphoDataset
dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    comparisons=[("group1", "group4"), ("group2", "group5")],
)
core = dataset.process_core(max_unmatched_fraction=0.1)
Downstream Kinase Analysis From predMat
from phospy import KinaseActivityAnalyzer, PhosphoDataset
dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
)
core = dataset.process_core(max_unmatched_fraction=0.1)
analyzer = KinaseActivityAnalyzer.from_csv("examples/data/predMat.csv")
kinase = analyzer.analyze(
    core.site_matrix.matrix,
    threshold=0.6,
    min_substrates=1,
    top_n_substrates=1,
)
target_counts = kinase.target_counts
ksea_scores = kinase.ksea_scores
For the bundled example data, target_counts.to_dict() is {'PRKACA': 3, 'BTK': 2}.
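To make the ideas of a score threshold and a KSEA-style summary concrete, here is a minimal standalone sketch. It is not PhosPy's implementation: `count_targets` and `ksea_z` are hypothetical helpers, the `>=` comparison is an assumption, the toy data is invented, and the z-score follows the commonly used KSEA form z = (mean of substrates - mean of all sites) * sqrt(m) / sd of all sites.

```python
import math
import statistics


def count_targets(pred_mat, threshold):
    """Count, per kinase, the sites whose predicted score is >= threshold.

    pred_mat maps site -> {kinase: score}. The >= rule is an assumption.
    """
    counts = {}
    for scores in pred_mat.values():
        for kinase, score in scores.items():
            if score >= threshold:
                counts[kinase] = counts.get(kinase, 0) + 1
    return counts


def ksea_z(site_values, substrates):
    """KSEA-style z-score for one kinase from per-site values (e.g. log fold changes)."""
    all_values = list(site_values.values())
    sub_values = [site_values[s] for s in substrates if s in site_values]
    if not sub_values or len(all_values) < 2:
        return math.nan
    sd_all = statistics.stdev(all_values)  # sample standard deviation of all sites
    if sd_all == 0:
        return math.nan
    shift = statistics.mean(sub_values) - statistics.mean(all_values)
    return shift * math.sqrt(len(sub_values)) / sd_all


# Invented toy data, for illustration only.
toy_pred_mat = {
    "S1": {"KIN_A": 0.9, "KIN_B": 0.2},
    "S2": {"KIN_A": 0.7, "KIN_B": 0.8},
    "S3": {"KIN_A": 0.1, "KIN_B": 0.6},
}
toy_values = {"S1": 1.0, "S2": -1.0, "S3": 0.0, "S4": 2.0}

print(count_targets(toy_pred_mat, threshold=0.6))
print(round(ksea_z(toy_values, ["S1", "S4"]), 4))
```

The actual values in kinase.target_counts and kinase.ksea_scores come from PhosPy itself and may use different conventions; treat this only as a mental model.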
End-to-End Pipeline
from phospy import PhosRPipeline
pipeline = PhosRPipeline.from_files(
    total_path="examples/data/total.tsv",
    phospho_path="examples/data/phospho.tsv",
    pred_mat_path="examples/data/predMat.csv",
    max_unmatched_fraction=0.1,
)
outputs = pipeline.run(outdir="examples/output")
This writes the core CSV outputs plus downstream kinase-analysis tables, including kinase_target_table.csv.
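After a run, a quick way to see which tables were written is to list the CSV files in the output directory. The sketch below uses only the standard library; `list_csv_outputs` is a hypothetical helper, not part of PhosPy, and it makes no assumption about file names beyond the .csv suffix.

```python
from pathlib import Path


def list_csv_outputs(outdir):
    """Return the sorted names of CSV files directly inside outdir."""
    return sorted(path.name for path in Path(outdir).glob("*.csv"))


if __name__ == "__main__":
    # Prints an empty list if the pipeline has not been run yet.
    print(list_csv_outputs("examples/output"))
```

Checking that kinase_target_table.csv appears in the returned list is a simple smoke test for the downstream step.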
Native End-to-End Kinase Workflow
A complete runnable native-workflow example lives at examples/native_workflow_demo.py:
python examples/native_workflow_demo.py
That example uses only the supported 1.0.0 root API and prints a small prediction matrix for a synthetic two-kinase setup.
CLI Demo
A small synthetic dataset is included.
After installation, you can run:
phospy \
  --total examples/data/total.tsv \
  --phospho examples/data/phospho.tsv \
  --pred-mat examples/data/predMat.csv \
  --max-unmatched-fraction 0.1 \
  --outdir examples/output
The example output directory under examples/output/ shows the generated CSV files.
--max-unmatched-fraction defaults to 0.0, which means protein correction fails if the inner join would silently drop
any phosphosite rows. Raise it only when you deliberately want to allow a bounded amount of row loss.
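The row-loss guard described above can be pictured as follows. This is a standalone sketch under stated assumptions, not PhosPy's code: `check_unmatched_fraction` is a hypothetical helper, the join key is taken to be a plain protein identifier, and the strict `>` comparison is an assumption chosen so that the default of 0.0 rejects any dropped row.

```python
def check_unmatched_fraction(phospho_keys, total_keys, max_unmatched_fraction=0.0):
    """Return the fraction of phospho rows an inner join would drop.

    Raises ValueError when that fraction exceeds max_unmatched_fraction.
    """
    phospho_keys = list(phospho_keys)
    total = set(total_keys)
    if not phospho_keys:
        return 0.0
    # Rows whose key has no partner in the total-protein table would be lost.
    unmatched = [key for key in phospho_keys if key not in total]
    fraction = len(unmatched) / len(phospho_keys)
    if fraction > max_unmatched_fraction:
        raise ValueError(
            f"{len(unmatched)} of {len(phospho_keys)} phospho rows have no "
            f"match ({fraction:.1%} > allowed {max_unmatched_fraction:.1%})"
        )
    return fraction


# Invented keys: one of four phospho rows has no total-protein match.
phospho = ["P1", "P2", "P3", "P4"]
total = ["P1", "P2", "P3"]
print(check_unmatched_fraction(phospho, total, max_unmatched_fraction=0.25))
```

With the default of 0.0 the same call raises ValueError, which mirrors the fail-fast behaviour the CLI documents.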
Testing, Validation, and Release Gate
The 1.0.0 release gate is intentionally simple:
pre-commit run --all-files
pytest -m "not parity"
pytest -m parity
That gate covers:
- linting and formatting via pre-commit
- the regular Python test suite, including the documented example smoke test
- the parity suite against the committed R-backed fixtures
Supporting documentation:
- docs/validation-and-parity.md explains the validation layers, release gate, and test commands
- docs/parity.md explains what parity means here, especially for the native kinase workflow
- docs/fixtures.md explains the fixture and trace directories, generation commands, and which outputs are committed reference data
- docs/roadmap.md explains the most likely next expansion areas after 1.0.0
- CONTRIBUTING.md covers local setup, linting, tests, and CI expectations
- CHANGELOG.md contains the 1.0.0 release notes
Roadmap
docs/roadmap.md sets out the most credible next steps after 1.0.0. The short version is that
PhosPy is most likely to grow by extending the native workflow surface already in the repository: CLI support for
KinaseWorkflow, broader seam-level validation around the native workflow, better trace tooling, and carefully chosen
PhosR-inspired ports that fit the current narrow scope.
Known Limitations
Before you adopt PhosPy, the important boundaries are:
- Selective scope only. PhosPy 1.0.0 covers the workflows documented above and nothing broader.
- Parity is seam-level, not package-wide. Validation claims are limited to the committed fixture-backed seams described in docs/validation-and-parity.md and docs/parity.md.
- KinaseWorkflow is native first. It includes an svm_mode="r_parity" option for narrower learner-seam comparisons, but the default mode is the preferred Python-native path and is not claimed to numerically match every PhosR result.
- The CLI is intentionally small. It covers the core preprocessing and predMat-driven downstream path. The native kinase workflow is currently exposed through the Python API and example script.
- R is only required for fixture regeneration. Running the committed Python test suite does not require R, but regenerating the R reference fixtures does.
Attribution
All scientific credit for the original methods, package design, and biological workflow belongs to the PhosR authors and maintainers.
Please cite and acknowledge the original PhosR work when using this repository:
- Kim, H. J., Kim, T., Hoffman, N. J., Xiao, D., James, D. E., Humphrey, S. J., & Yang, P. (2021). PhosR enables processing and functional analysis of phosphoproteomic data. Cell Reports, 34(8), 108771.
- Kim, H., Kim, T., Xiao, D., & Yang, P. (2021). Protocol for the processing and downstream analysis of phosphoproteomic data with PhosR. STAR Protocols, 2(2), 100585.
- Original R package: PYangLab/PhosR
PhosPy should be described as an unofficial implementation unless and until the original PhosR authors choose to endorse or participate in it.
License
This repository is distributed under the GNU General Public License v3.0 only (GPL-3.0-only). See LICENSE.
That choice is deliberate. PhosR is distributed under GPL-3, and the GNU GPL FAQ treats translation of a program into another programming language as a kind of modification or translation under copyright law. This project therefore uses GPL-3.0-only as the conservative licensing position for a Python implementation of selected PhosR-style workflows.