Python-native implementation of selected PhosR-style phosphoproteomics workflows.

PhosPy

PhosPy 1.0.0 is an unofficial Python implementation of selected PhosR-style workflows for phosphoproteomics. It is a deliberately narrow, Python-native subset whose behaviour is validated by tests at defined seams, including parity tests against committed R-generated fixtures. It is not presented as a full replacement for the R PhosR package.

Preprocess Phosphoproteomics Data

Start from total and phospho input tables and produce corrected phosphosite matrices for downstream use.

Analyse Kinase Activity From predMat

Generate weighted activity scores, KSEA-style summaries, and target counts from predicted kinase–substrate relationships.

Run a Native Kinase Workflow

Construct substrate profiles, score motifs, combine evidence, select candidates, and perform adaptive SVM-based kinase prediction.

Install

Install From PyPI

Install the supported root-level API and the phospy CLI.

pip install phospy

Install From Source

Use an editable installation only when you want to work from a local checkout:

pip install -e .

Test Dependencies

To run the test suite from a local checkout:

pip install -e ".[test]"
pytest -m "not parity"
pytest -m parity

Development Checks

To run linting and other contributor checks from a local checkout:

pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files

R Requirements for Fixture Regeneration

The committed parity fixtures are already included in the repository. You only need R when you want to regenerate or extend those fixtures.

The current scripts use these R packages:

  • PhosR
  • SummarizedExperiment
  • e1071
  • readr
  • dplyr
  • tidyr
  • tibble
  • janitor

A practical greenfield setup in R is:

install.packages(c("BiocManager", "devtools", "e1071", "readr", "dplyr", "tidyr", "tibble", "janitor"))
BiocManager::install("SummarizedExperiment")
devtools::install_github("PYangLab/PhosR")

The bundled fixture scripts check for the packages they need and stop with a clear error if anything is missing.

To regenerate the committed R reference fixtures:

Rscript scripts/generate_r_fixtures.R
Rscript scripts/generate_r_l6_fixtures.R

Supported Public API for 1.0.0

The supported root-level public API is intentionally small:

  • PhosphoDataset
  • PhosRPipeline
  • KinaseActivityAnalyzer
  • KinaseWorkflow
  • result dataclasses returned by those classes:
    • CoreProcessingResult
    • SiteMatrixResult
    • CoreOutputs
    • KinaseActivityResult
    • KinasePredictionResult
    • KinaseWorkflowResult

Examples below use only those supported root imports. Lower-level submodule imports may still exist for internal use and testing, but they are not part of the stable public API unless documented here.

Quick Start

The quickest path is to use the bundled example data under examples/data/.

Core Preprocessing

from phospy import PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
)
core = dataset.process_core(max_unmatched_fraction=0.1)

site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected

For the bundled example data, site_matrix.index.tolist() is ['BTK;Y551;'].
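The row label shown above packs the gene symbol and the modified site into one semicolon-delimited string. If you need those parts separately, a minimal sketch could look like the following. It assumes every label follows the GENE;SITE; layout seen in the bundled example; the helper split_site_id and the SiteId tuple are hypothetical names used for illustration and are not part of the PhosPy API.

```python
from typing import NamedTuple


class SiteId(NamedTuple):
    """Parts of a site label (hypothetical helper type, not a PhosPy class)."""

    gene: str
    residue: str
    position: int


def split_site_id(label: str) -> SiteId:
    """Split a 'GENE;SITE;' label such as 'BTK;Y551;' into its parts.

    Assumption: the first field is the gene symbol and the second field is
    one residue letter followed by an integer position.
    """
    fields = label.split(";")
    if len(fields) < 2 or not fields[0] or len(fields[1]) < 2:
        raise ValueError(f"unexpected site label: {label!r}")
    gene, site = fields[0], fields[1]
    return SiteId(gene=gene, residue=site[0], position=int(site[1:]))


print(split_site_id("BTK;Y551;"))
```

Labels that carry extra fields after the site are tolerated by this sketch, but only the first two fields are interpreted.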

If your analysis needs pairwise comparisons, pass them explicitly:

from phospy import PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    comparisons=[("group1", "group4"), ("group2", "group5")],
)
core = dataset.process_core(max_unmatched_fraction=0.1)

Downstream Kinase Analysis From predMat

from phospy import KinaseActivityAnalyzer, PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
)
core = dataset.process_core(max_unmatched_fraction=0.1)

analyzer = KinaseActivityAnalyzer.from_csv("examples/data/predMat.csv")
kinase = analyzer.analyze(
    core.site_matrix.matrix,
    threshold=0.6,
    min_substrates=1,
    top_n_substrates=1,
)

target_counts = kinase.target_counts
ksea_scores = kinase.ksea_scores

For the bundled example data, target_counts.to_dict() is {'PRKACA': 3, 'BTK': 2}.
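To make the two outputs above more concrete, here is a standalone sketch of the general ideas behind them. It is not PhosPy's implementation. It assumes that a target count is the number of sites whose predicted score for a kinase is strictly above the threshold, and it uses the commonly cited KSEA z-score, (mean of substrates - mean of all sites) * sqrt(number of substrates) / standard deviation of all sites. The function names, the strict comparison, the choice of sample standard deviation, and the toy numbers are all assumptions made for illustration.

```python
import math
import statistics


def count_targets(pred_mat, threshold):
    """Count, per kinase, the sites whose predicted score exceeds threshold.

    pred_mat maps site -> {kinase: score}. The strict '>' is an assumption.
    """
    counts = {}
    for scores in pred_mat.values():
        for kinase, score in scores.items():
            if score > threshold:
                counts[kinase] = counts.get(kinase, 0) + 1
    return counts


def ksea_z(site_changes, substrate_sites):
    """KSEA-style z-score for one kinase.

    site_changes maps site -> log fold change for every quantified site;
    substrate_sites lists the sites assigned to the kinase.
    """
    background = list(site_changes.values())
    substrates = [site_changes[s] for s in substrate_sites if s in site_changes]
    if not substrates:
        raise ValueError("no quantified substrates for this kinase")
    shift = statistics.fmean(substrates) - statistics.fmean(background)
    # Sample standard deviation of all sites (an assumption of this sketch).
    return shift * math.sqrt(len(substrates)) / statistics.stdev(background)


# Made-up values for illustration only; these are not the bundled example data.
toy_pred_mat = {
    "SITE_A": {"KIN1": 0.9, "KIN2": 0.2},
    "SITE_B": {"KIN1": 0.7, "KIN2": 0.8},
    "SITE_C": {"KIN1": 0.1, "KIN2": 0.65},
}
toy_changes = {"SITE_A": 1.0, "SITE_B": 2.0, "SITE_C": 3.0, "SITE_D": -2.0}

print(count_targets(toy_pred_mat, threshold=0.6))
print(round(ksea_z(toy_changes, ["SITE_A", "SITE_B", "SITE_C"]), 3))
```

Raising the threshold shrinks each kinase's substrate set, which in turn changes both the counts and the z-scores, so the two outputs should be read together.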

End-to-End Pipeline

from phospy import PhosRPipeline

pipeline = PhosRPipeline.from_files(
    total_path="examples/data/total.tsv",
    phospho_path="examples/data/phospho.tsv",
    pred_mat_path="examples/data/predMat.csv",
    max_unmatched_fraction=0.1,
)
outputs = pipeline.run(outdir="examples/output")

This writes the core CSV outputs plus downstream kinase-analysis tables, including kinase_target_table.csv.

Native End-to-End Kinase Workflow

A complete runnable native-workflow example lives at examples/native_workflow_demo.py:

python examples/native_workflow_demo.py

That example uses only the supported 1.0.0 root API and prints a small prediction matrix for a synthetic two-kinase setup.

CLI Demo

A small synthetic dataset is included under examples/data/. After installation, you can run:

phospy \
  --total examples/data/total.tsv \
  --phospho examples/data/phospho.tsv \
  --pred-mat examples/data/predMat.csv \
  --max-unmatched-fraction 0.1 \
  --outdir examples/output

The examples/output/ directory contains the CSV files generated by this command.

--max-unmatched-fraction defaults to 0.0, which means protein correction fails if the inner join would silently drop any phosphosite rows. Raise it only when you deliberately want to allow a bounded amount of row loss.
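The guard described above can be pictured with a small standalone sketch. It is not PhosPy's code: the function names are hypothetical, the inputs are plain lists of protein identifiers rather than the real input tables, and the strict "greater than" comparison against the limit is an assumption.

```python
def unmatched_fraction(phospho_proteins, total_proteins):
    """Fraction of phosphosite rows whose protein is absent from the total table."""
    if not phospho_proteins:
        return 0.0
    known = set(total_proteins)
    missing = sum(1 for protein in phospho_proteins if protein not in known)
    return missing / len(phospho_proteins)


def check_row_loss(phospho_proteins, total_proteins, max_unmatched_fraction=0.0):
    """Raise if an inner join on protein would drop more rows than allowed."""
    fraction = unmatched_fraction(phospho_proteins, total_proteins)
    if fraction > max_unmatched_fraction:
        raise ValueError(
            f"{fraction:.1%} of phosphosite rows have no matching protein; "
            f"allowed: {max_unmatched_fraction:.1%}"
        )
    return fraction


# Toy protein IDs (made up): one of ten phosphosite rows has no protein match.
phospho = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "PX"]
total = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9"]

# A limit of 0.1 tolerates the single unmatched row.
print(check_row_loss(phospho, total, max_unmatched_fraction=0.1))

# The default limit of 0.0 treats any row loss as an error.
try:
    check_row_loss(phospho, total)
except ValueError as exc:
    print("rejected:", exc)
```

Failing by default makes row loss visible at the point where it happens, instead of surfacing later as an unexpectedly small site matrix.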

Testing, Validation, and Release Gate

The 1.0.0 release gate is intentionally simple:

pre-commit run --all-files
pytest -m "not parity"
pytest -m parity

That gate covers:

  • linting and formatting via pre-commit
  • the regular Python test suite, including the documented example smoke test
  • the parity suite against the committed R-backed fixtures

Supporting documentation:

  • docs/validation-and-parity.md explains the validation layers, release gate, and test commands
  • docs/parity.md explains what parity means here, especially for the native kinase workflow
  • docs/fixtures.md explains the fixture and trace directories, generation commands, and which outputs are committed reference data
  • docs/roadmap.md explains the most likely next expansion areas after 1.0.0
  • CONTRIBUTING.md covers local setup, linting, tests, and CI expectations
  • CHANGELOG.md contains the 1.0.0 release notes

Roadmap

docs/roadmap.md sets out the most credible next steps after 1.0.0. The short version is that PhosPy is most likely to grow by extending the native workflow surface already in the repository: CLI support for KinaseWorkflow, broader seam-level validation around the native workflow, better trace tooling, and carefully chosen PhosR-inspired ports that fit the current narrow scope.

Known Limitations

Before you adopt PhosPy, the important boundaries are:

  • Selective scope only. PhosPy 1.0.0 covers the workflows documented above and nothing broader.
  • Parity is seam-level, not package-wide. Validation claims are limited to the committed fixture-backed seams described in docs/validation-and-parity.md and docs/parity.md.
  • KinaseWorkflow is native first. It includes an svm_mode="r_parity" option for narrower learner-seam comparisons, but the default mode is the preferred Python-native path and is not claimed to numerically match every PhosR result.
  • The CLI is intentionally small. It covers the core preprocessing and predMat-driven downstream path. The native kinase workflow is currently exposed through the Python API and example script.
  • R is only required for fixture regeneration. Running the committed Python test suite does not require R, but regenerating the R reference fixtures does.

Attribution

All scientific credit for the original methods, package design, and biological workflow belongs to the PhosR authors and maintainers.

Please cite and acknowledge the original PhosR work when using this repository:

  • Kim, H. J., Kim, T., Hoffman, N. J., Xiao, D., James, D. E., Humphrey, S. J., & Yang, P. (2021). PhosR enables processing and functional analysis of phosphoproteomic data. Cell Reports, 34(8), 108771.
  • Kim, H. J., Kim, T., Xiao, D., & Yang, P. (2021). Protocol for the processing and downstream analysis of phosphoproteomic data with PhosR. STAR Protocols, 2(2), 100585.
  • Original R package: PYangLab/PhosR

PhosPy should be described as an unofficial implementation unless and until the original PhosR authors choose to endorse or participate in it.

License

This repository is distributed under the GNU General Public License v3.0 only (GPL-3.0-only). See LICENSE.

That choice is deliberate. PhosR is distributed under GPL-3, and the GNU GPL FAQ states that, under copyright law, translating a program into another programming language is a kind of modification, so the GPL's terms for modified versions also apply to translations. This project therefore uses GPL-3.0-only as the conservative licensing position for a Python implementation of selected PhosR-style workflows.
