Python-native implementation of selected PhosR-style phosphoproteomics workflows.

PhosPy

PhosPy is an unofficial Python implementation of selected PhosR-style workflows for phosphoproteomics.

It is designed for people who want a small, Python-native way to:

preprocess phosphoproteomics tables
analyse kinase activity from predMat
run a native kinase workflow from scoring through prediction

PhosPy is deliberately narrow. It is not a full replacement for the R PhosR package.

Install

PhosPy supports Python 3.10 and newer.

Install the supported Python API and the phospy CLI:

pip install phospy

A small note before you start: the file-path examples below use examples/data/..., so they assume you are working from a repository checkout. If you installed from PyPI, use the same code with paths to your own input files instead.

What You Can Do With PhosPy

Preprocess Phosphoproteomics Data

Start from total and phospho input tables and produce corrected phosphosite matrices for downstream use.

Analyse Kinase Activity From `predMat`

Generate weighted activity scores, KSEA-style summaries, and target counts from predicted kinase–substrate relationships.

Run a Native Kinase Workflow

Construct substrate profiles, score motifs, combine evidence, select candidates, and perform adaptive SVM-based kinase prediction.

Supported Public API

The stable root-level API is intentionally small:

PhosphoDataset
PhosRPipeline
analyze_kinase_activity
KinaseWorkflow

Returned result dataclasses:

CoreProcessingResult
SiteMatrixResult
CoreOutputs
KinaseActivityResult
KinasePredictionResult
KinaseWorkflowResult

The examples below use only those imports.

For a compact guide to the supported classes, methods, and result objects, see docs/api.md.

Input Tables at a Glance

PhosPy expects a small, fixed set of input shapes.

Total-proteome table

Required columns:

genes
group1 to group6

Phosphoproteome table

Required columns:

uid
gene_names
gene_p_site
localization_prob
centralized_sequence
p_group1 to p_group6

gene_p_site must look like GENE_SITE, for example PRKACA_S339.
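
If you want to pre-check gene_p_site values before loading, a small regular expression is usually enough. The sketch below is not PhosPy's validator: the pattern (gene symbol, underscore, S/T/Y residue, numeric position) and the helper name split_gene_p_site are assumptions made for illustration.

```python
import re

# Assumed pattern: gene symbol, underscore, residue letter (S/T/Y), position.
# PhosPy's real validator may accept a wider or narrower set of strings.
GENE_P_SITE = re.compile(
    r"^(?P<gene>[A-Za-z0-9.\-]+)_(?P<residue>[STY])(?P<position>\d+)$"
)


def split_gene_p_site(value: str) -> tuple[str, str, int]:
    """Split 'GENE_SITE' into (gene, residue, position) or raise ValueError."""
    match = GENE_P_SITE.match(value)
    if match is None:
        raise ValueError(f"not in GENE_SITE form: {value!r}")
    return match["gene"], match["residue"], int(match["position"])


print(split_gene_p_site("PRKACA_S339"))  # ('PRKACA', 'S', 339)
```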

`predMat`

predMat must be a numeric matrix with:

phosphosite IDs as the index, for example BTK;Y551;
kinase names as columns
scores in the range [0, 1]
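
The three predMat requirements above can be checked by hand before you pass a matrix in. This is a minimal sketch in plain Python, so it runs without PhosPy or pandas. The nested-dict layout, the check_pred_mat name, the PRKACA;S339; row, and the choice to reject NaN are all assumptions for illustration, not PhosPy behaviour.

```python
import math


def check_pred_mat(pred_mat: dict[str, dict[str, float]]) -> None:
    """Raise ValueError if any score is non-numeric, NaN, or outside [0, 1]."""
    for site_id, row in pred_mat.items():
        for kinase, score in row.items():
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(score, bool) or not isinstance(score, (int, float)):
                raise ValueError(f"{site_id} / {kinase}: non-numeric score {score!r}")
            if math.isnan(score) or not 0.0 <= score <= 1.0:
                raise ValueError(f"{site_id} / {kinase}: score {score} outside [0, 1]")


# Hypothetical matrix: phosphosite IDs as row keys, kinase names as columns.
pred_mat = {
    "BTK;Y551;": {"BTK": 0.92, "PRKACA": 0.10},
    "PRKACA;S339;": {"BTK": 0.05, "PRKACA": 0.81},
}
check_pred_mat(pred_mat)  # passes silently
```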

When you load tables from files, PhosPy normalises input headers to lowercase snake case before validation. For example, Gene Names and gene-names both become gene_names. That makes file input a little more forgiving, but it also means loading fails if two raw headers collapse to the same cleaned name.

If you build PhosphoDataset from in-memory pandas data frames instead, those column names are validated as provided.
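
To see why two raw headers can collide during file loading, here is a rough sketch of lowercase snake-case cleaning with a collision check. The exact cleaning rules PhosPy applies are not spelled out here, so the regular expression and the clean_header / clean_headers names are assumptions.

```python
import re
from collections import defaultdict


def clean_header(raw: str) -> str:
    """Lowercase a header and collapse runs of non-alphanumerics to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", raw.strip().lower()).strip("_")


def clean_headers(raw_headers: list[str]) -> list[str]:
    """Clean every header and fail if two raw names collapse to the same one."""
    seen: dict[str, list[str]] = defaultdict(list)
    for raw in raw_headers:
        seen[clean_header(raw)].append(raw)
    clashes = {name: raws for name, raws in seen.items() if len(raws) > 1}
    if clashes:
        raise ValueError(f"headers collide after cleaning: {clashes}")
    return list(seen)  # dicts keep insertion order, so column order is kept


print(clean_headers(["UID", "Gene Names", "Localization Prob"]))
# ['uid', 'gene_names', 'localization_prob']
```

Passing ["Gene Names", "gene-names"] to clean_headers raises ValueError, because both clean to gene_names.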

Quick Start

The quickest way to get started from a source checkout is to use the bundled example data in examples/data/.

Core Preprocessing

from phospy import PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.process_core(max_unmatched_fraction=0.1)

site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected

For the bundled example data, site_matrix.index.tolist() is ['BTK;Y551;'].

process_core() returns a CoreProcessingResult with:

total_unique
total_filtered
phospho_filtered
phospho_corrected
site_matrix

If your analysis needs explicit pairwise comparisons, pass them when you build the dataset:

from phospy import PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
    comparisons=[("group1", "group4"), ("group2", "group5")],
)
core = dataset.process_core(max_unmatched_fraction=0.1)

If you do not pass comparisons, preprocessing still runs normally and no extra pairwise columns are added.

Downstream Kinase Analysis From `predMat`

from phospy import PhosphoDataset, analyze_kinase_activity
import pandas as pd

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.process_core(max_unmatched_fraction=0.1)
pred_mat = pd.read_csv("examples/data/predMat.csv", index_col=0)

kinase = analyze_kinase_activity(
    pred_mat=pred_mat,
    phospho_matrix=core.site_matrix.matrix,
    threshold=0.6,
    min_substrates=1,
    top_n_substrates=1,
)

target_counts = kinase.target_counts
ksea_scores = kinase.ksea_scores

The bundled example uses min_substrates=1 and top_n_substrates=1 because the example matrix is intentionally tiny. For larger real datasets, the defaults (min_substrates=3, top_n_substrates=20) are usually the better starting point.

For the bundled example data, target_counts.to_dict() is {'PRKACA': 3, 'BTK': 2}.

analyze_kinase_activity(...) returns a KinaseActivityResult with:

weighted_activity
ksea_scores
ksea_counts
target_counts
target_table
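
To make the roles of threshold and min_substrates more concrete, the sketch below computes a textbook KSEA z-score: the mean of a kinase's substrates minus the mean of all sites, scaled by the square root of the substrate count and divided by the standard deviation of all sites. It is a generic illustration, not PhosPy's implementation. The function name, the dict-based inputs, the >= comparison against threshold, and the use of the sample standard deviation are assumptions, and top_n_substrates is not modelled.

```python
import math
from statistics import fmean, stdev


def ksea_z_scores(pred_mat, site_values, *, threshold, min_substrates=3):
    """Textbook KSEA z-score per kinase; not PhosPy's actual implementation."""
    values = list(site_values.values())
    if len(values) < 2:
        return {}  # a standard deviation needs at least two sites
    background_mean = fmean(values)
    background_sd = stdev(values)
    if background_sd == 0:
        return {}
    kinases = sorted({kinase for row in pred_mat.values() for kinase in row})
    scores = {}
    for kinase in kinases:
        # Substrates: measured sites whose predicted score reaches the threshold.
        substrates = [
            site for site, row in pred_mat.items()
            if site in site_values and row.get(kinase, 0.0) >= threshold
        ]
        if len(substrates) < min_substrates:
            continue  # too few substrates for a meaningful summary
        shift = fmean(site_values[s] for s in substrates) - background_mean
        scores[kinase] = shift * math.sqrt(len(substrates)) / background_sd
    return scores
```

Here pred_mat maps each phosphosite ID to a dict of kinase scores, and site_values maps each phosphosite ID to one number, for example a log fold change for a single comparison. Kinases with fewer than min_substrates qualifying sites are left out of the result.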

End-to-End Pipeline

from phospy import PhosRPipeline

pipeline = PhosRPipeline.from_files(
    total_path="examples/data/total.tsv",
    phospho_path="examples/data/phospho.tsv",
    pred_mat_path="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
)
outputs = pipeline.run(outdir="examples/output")

outputs is a CoreOutputs object with:

outputs.core
outputs.kinase_activity

This writes the core CSV outputs together with downstream kinase-analysis tables, including:

df_total_unique.csv
df_total_filtered.csv
df_phospho_filtered.csv
df_phospho_corrected.csv
phosr_input.csv
mat_phospho_corrected.csv
site_sequences.csv
kinase_activity_matrix.csv
ksea_scores.csv
ksea_counts.csv
kinase_target_counts.csv
kinase_target_table.csv

If you omit pred_mat_path, the pipeline still runs the core preprocessing path and simply skips the downstream kinase-analysis outputs.

Native End-to-End Kinase Workflow

A complete runnable native-workflow example is included at examples/native_workflow_demo.py.

If PhosPy is installed in the environment, for example with pip install phospy or pip install -e . from a local checkout, you can run it directly:

python examples/native_workflow_demo.py

From a local checkout, there is also a Make target that runs the example with the repository src/ path configured for that shell session:

make native-workflow-demo

That example uses only the supported API and prints a small prediction matrix for a synthetic two-kinase setup.

The native workflow expects:

a phosphosite matrix
a substrate_map
site_sequences keyed by phosphosite ID when motif scoring is used
motif_sequences for end-to-end motif-aware prediction

site_sequences can be passed as either a mapping keyed by phosphosite ID or a pandas Series with a phosphosite index. If you want profile-only prediction, pass allow_profile_only_fallback=True and omit motif_sequences.

Command-Line Demo

After installation, you can run the CLI on your own files. The example below uses the bundled tables from a source checkout:

phospy \
  --total examples/data/total.tsv \
  --phospho examples/data/phospho.tsv \
  --pred-mat examples/data/predMat.csv \
  --phospho-encoding utf-16le \
  --max-unmatched-fraction 0.1 \
  --outdir examples/output

The example output directory in examples/output/ shows the generated CSV files.

The CLI currently supports these options:

--total and --phospho are required input files
--phospho-encoding optionally overrides the default utf-8 reader encoding
--outdir is the required output directory
--pred-mat is optional
--localization-threshold defaults to 0.75
--min-observed defaults to 4
--total-sentinel defaults to 10.0
--phospho-sentinel defaults to 12.0
--max-unmatched-fraction defaults to 0.0

--max-unmatched-fraction=0.0 means protein correction fails if the inner join would silently drop any phosphosite rows. Raise it only when you want to allow a small, bounded amount of row loss.
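
The guard can be pictured as a check that runs before the join. The following is a simplified sketch rather than PhosPy's code: find_unmatched is a hypothetical helper, it skips the gene-identifier normalisation that PhosPy performs, and treating the limit as inclusive (a fraction equal to the limit passes) is an assumption.

```python
def find_unmatched(phospho_genes, total_genes, max_unmatched_fraction=0.0):
    """Return phosphosite genes missing from the total table; raise if too many."""
    known = set(total_genes)
    unmatched = [gene for gene in phospho_genes if gene not in known]
    fraction = len(unmatched) / len(phospho_genes) if phospho_genes else 0.0
    if fraction > max_unmatched_fraction:
        raise ValueError(
            f"{len(unmatched)} of {len(phospho_genes)} phosphosite rows "
            f"({fraction:.1%}) have no total-proteome match"
        )
    return unmatched


# Nine matching rows and one unmatched row: 10% would be lost in an inner join.
phospho_genes = ["BTK"] * 9 + ["XYZ"]
print(find_unmatched(phospho_genes, ["BTK"], max_unmatched_fraction=0.1))  # ['XYZ']
```

With the default of 0.0 the same call raises ValueError instead of dropping the XYZ row.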

The CLI is intentionally small. It does not currently expose pairwise comparison generation or the native KinaseWorkflow path.

Validation Rules Worth Knowing

A few checks are especially useful to know up front:

localization_prob must stay within [0, 1].
predMat values must stay within [0, 1].
File-loaded total and phospho headers are cleaned to lowercase snake case before validation, and loading fails if two headers clean to the same name.
predMat and the phosphosite matrix must overlap by at least one phosphosite row, and that overlap must cover at least 10% of the phosphosite matrix.
Protein correction normalises gene identifiers before matching and, by default, refuses to drop unmatched phosphosite rows.
Site-matrix construction drops rows with missing sequences or incomplete corrected values, then deduplicates repeated phosphosites by keeping the row with the highest mean corrected signal.
In the native workflow, motif_sequences require matching site_sequences. If you omit motif data entirely, set allow_profile_only_fallback=True.
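
Two of the rules above, the predMat overlap check and the duplicate-phosphosite rule, are easy to picture in a few lines of plain Python. The helper names, the list-of-tuples row layout, and the handling of ties are assumptions for illustration. The 10% figure comes from the rule above.

```python
from statistics import fmean


def check_overlap(pred_sites, matrix_sites, min_fraction=0.10):
    """Return shared phosphosite IDs; raise if there are none or too few."""
    matrix_ids = set(matrix_sites)
    shared = set(pred_sites) & matrix_ids
    if not shared:
        raise ValueError("predMat and the phosphosite matrix share no phosphosite IDs")
    coverage = len(shared) / len(matrix_ids)
    if coverage < min_fraction:
        raise ValueError(f"overlap covers only {coverage:.1%} of the phosphosite matrix")
    return shared


def keep_strongest(rows):
    """Keep, per phosphosite ID, the row with the highest mean value.

    rows is a list of (site_id, values) pairs. On a tie the first row wins.
    """
    best = {}
    for site_id, values in rows:
        if site_id not in best or fmean(values) > fmean(best[site_id]):
            best[site_id] = values
    return best


rows = [("BTK;Y551;", [1.0, 2.0]), ("BTK;Y551;", [3.0, 4.0])]
print(keep_strongest(rows))  # {'BTK;Y551;': [3.0, 4.0]}
```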

Where to Go Next

If you want more detail, these are the most useful follow-on docs:

docs/api.md maps the supported public API
docs/validation-and-parity.md explains how validation is approached in PhosPy
docs/parity.md explains what parity means here, especially for the native kinase workflow
docs/fixtures.md maps the committed fixture and trace directories
docs/roadmap.md outlines the most likely next steps
CHANGELOG.md contains the release notes

If you want to contribute or work from a local checkout, see CONTRIBUTING.md.

Known Limitations

A few boundaries are worth knowing up front:

Selective scope only. PhosPy covers the workflows documented above and nothing broader.
Parity is seam-level, not package-wide. Validation claims are limited to the committed fixture-backed seams described in docs/validation-and-parity.md and docs/parity.md.
KinaseWorkflow is native first. It includes an svm_mode="r_parity" option for narrower learner-seam comparison, but the default mode is the preferred Python-native path and is not claimed to numerically match every PhosR result.
The CLI is intentionally small. It covers the core preprocessing and predMat-driven downstream path. The native kinase workflow is currently exposed through the Python API and example script.
R is only required for fixture regeneration. You do not need R to install PhosPy or run the committed Python test suite.

For Contributors

Most users can ignore this section.

To work from a local checkout:

pip install -e .

To run tests:

pip install -e ".[test]"
pytest -m "not parity"
pytest -m parity

If you want the parity suite to print its optional comparison metrics while you debug a seam, these environment variables are available:

PHOSPY_SHOW_PARITY: master switch for parity metrics output
PHOSPY_SHOW_PROFILE_CONSTRUCTION: also print the optional profile-construction metrics
PHOSPY_SHOW_PREDICTION_MODE_COMPARISON: also print default-versus-r_parity prediction comparison metrics
PHOSPY_SHOW_REPLAYED_PREDICTION_MODE_COMPARISON: also print replayed prediction comparison metrics

The three more specific flags only do anything when PHOSPY_SHOW_PARITY is enabled first. Truthy values are case-insensitive and include 1, true, yes, and on.
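
The gating described above can be pictured like this. It is a sketch of the documented behaviour, not the code the test suite actually uses: the function names are hypothetical, whitespace stripping is an assumption, and the truthy set holds only the four values listed above.

```python
import os

# Only the values named in the text; the real list may be longer.
TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(name, environ=None):
    """Return True if the variable is set to a truthy value (case-insensitive)."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in TRUTHY


def detail_enabled(name, environ=None):
    """A specific flag only counts when the PHOSPY_SHOW_PARITY master switch is on."""
    return flag_enabled("PHOSPY_SHOW_PARITY", environ) and flag_enabled(name, environ)


env = {"PHOSPY_SHOW_PARITY": "Yes", "PHOSPY_SHOW_PROFILE_CONSTRUCTION": "ON"}
print(detail_enabled("PHOSPY_SHOW_PROFILE_CONSTRUCTION", env))  # True
```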

To see the printed summaries in the terminal, run pytest with -s (or --capture=no). If you enable all four flags and run the full parity suite, PhosPy prints every available metrics block reached by those tests.

Linux or macOS quick example:

PHOSPY_SHOW_PARITY=1 PHOSPY_SHOW_PROFILE_CONSTRUCTION=1 PHOSPY_SHOW_PREDICTION_MODE_COMPARISON=1 PHOSPY_SHOW_REPLAYED_PREDICTION_MODE_COMPARISON=1 pytest -m parity -s

For Linux/macOS, PowerShell, and Command Prompt examples together with a sample of the bundled parity output, see docs/parity.md.

To run the usual contributor checks:

pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files

R Requirements for Fixture Regeneration

The committed parity fixtures are already included in the repository. You only need R if you want to regenerate or extend them.

Release history

1.5.2: May 22, 2026
1.5.1: Apr 29, 2026
1.5.0: Apr 23, 2026
1.4.0: Apr 15, 2026
1.3.0: Apr 13, 2026
1.2.3: Apr 9, 2026
1.2.2: Apr 7, 2026
1.2.1: Apr 2, 2026
1.2.0: Apr 2, 2026
1.1.1: Mar 31, 2026 (this version)
1.1.0: Mar 31, 2026
1.0.0: Mar 26, 2026

Download files

Source Distribution

phospy-1.1.1.tar.gz (88.3 kB), uploaded Mar 31, 2026, Source

Built Distribution

phospy-1.1.1-py3-none-any.whl (69.1 kB), uploaded Mar 31, 2026, Python 3

File details

Details for the file phospy-1.1.1.tar.gz.

File metadata

Download URL: phospy-1.1.1.tar.gz
Upload date: Mar 31, 2026
Size: 88.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for phospy-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`229c7a3015108cbfab12e389e055acd8275987c8fe80dd1e3aaa0489441ac32a`
MD5	`3dc53daacb9aef51c8977788223e18c2`
BLAKE2b-256	`6f3356d9c8b4e140a5d79cdad4f8e787c7074189409137e65d177e45019cdac4`
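
One way to check a downloaded file against the SHA256 digest listed above is to hash it locally with Python's standard hashlib module. The helper names below are hypothetical, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify_download("phospy-1.1.1.tar.gz", "229c7a3015108cbfab12e389e055acd8275987c8fe80dd1e3aaa0489441ac32a") should return True for an intact copy of the source distribution.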

Provenance

The following attestation bundles were made for phospy-1.1.1.tar.gz:

Publisher: publish.yml on falconsmilie/phospy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phospy-1.1.1.tar.gz
- Subject digest: 229c7a3015108cbfab12e389e055acd8275987c8fe80dd1e3aaa0489441ac32a
- Sigstore transparency entry: 1203686656
- Sigstore integration time: Mar 31, 2026
Source repository:
- Permalink: falconsmilie/phospy@e285f7510cf84bfd8be9db3f41af84d60988b5c6
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/falconsmilie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e285f7510cf84bfd8be9db3f41af84d60988b5c6
- Trigger Event: push

File details

Details for the file phospy-1.1.1-py3-none-any.whl.

File metadata

Download URL: phospy-1.1.1-py3-none-any.whl
Upload date: Mar 31, 2026
Size: 69.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for phospy-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6409a23298582d1f522d8bc20f104792c52c0e2002591123997a268bf6f60b79`
MD5	`4a9ccc1c75a92a7e053312e2b12ee227`
BLAKE2b-256	`ce17281153d66f77f7781d23f5cdc2cc22587dbd22d8dca4974136c3ef85bfaf`

Provenance

The following attestation bundles were made for phospy-1.1.1-py3-none-any.whl:

Publisher: publish.yml on falconsmilie/phospy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phospy-1.1.1-py3-none-any.whl
- Subject digest: 6409a23298582d1f522d8bc20f104792c52c0e2002591123997a268bf6f60b79
- Sigstore transparency entry: 1203686658
- Sigstore integration time: Mar 31, 2026
Source repository:
- Permalink: falconsmilie/phospy@e285f7510cf84bfd8be9db3f41af84d60988b5c6
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/falconsmilie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e285f7510cf84bfd8be9db3f41af84d60988b5c6
- Trigger Event: push
