Python-native implementation of selected PhosR-style phosphoproteomics workflows.

These details have not been verified by PyPI

Project description

PhosPy

PhosPy is an unofficial Python implementation of selected PhosR-style workflows for phosphoproteomics.

It is designed for people who want a small, Python-native way to:

preprocess phosphoproteomics tables
analyse kinase activity from predMat
run a native kinase workflow from scoring through prediction

PhosPy is deliberately narrow. It is not a full replacement for the R PhosR package.

Install

PhosPy supports Python 3.10 and newer.

Install the supported Python API and the phospy CLI:

pip install phospy

The file-path examples below use examples/data/..., so they assume you are working from a repository checkout. If you installed from PyPI, use the same code with paths to your own input files.

What You Can Do With PhosPy

Preprocess Phosphoproteomics Data

Start from total and phospho input tables and produce corrected phosphosite matrices for downstream use.

Analyse Kinase Activity From `predMat`

Generate weighted activity scores, KSEA-style summaries, and target counts from predicted kinase–substrate relationships.

Run a Native Kinase Workflow

Construct substrate profiles, score motifs, combine evidence, select candidates, and perform adaptive SVM-based kinase prediction.

Supported Public API

The stable API is intentionally small:

PhosphoDataset
PhosRPipeline
KinaseActivityAnalyzer
KinaseWorkflow

Returned result dataclasses:

CoreProcessingResult
SiteMatrixResult
CoreOutputs
KinaseActivityResult
KinasePredictionResult
KinaseWorkflowResult

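The main examples below use these classes. Core outputs are persisted with CoreOutputWriter, and the optional standalone filters live in phospy.preprocessing.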

For a compact guide to the supported classes, methods, and result objects, see docs/api.md.

Input Tables at a Glance

PhosPy expects a small, fixed set of input shapes.

Total Proteome Table

Required columns:

genes
group1 to group6

Phosphoproteome Table

Required columns:

uid
gene_names
gene_p_site
localization_prob
centralized_sequence
p_group1 to p_group6

gene_p_site must look like GENE_SITE, for example PRKACA_S339.
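If you want to pre-check your own table before loading it, the GENE_SITE rule can be sketched as below. This is a hypothetical helper, not part of PhosPy. It assumes the site part is a residue letter (S, T or Y) followed by a position. It splits on the last underscore so that gene symbols containing underscores still parse.

```python
import re

# Hypothetical site pattern (an assumption, not PhosPy's own rule):
# residue letter S/T/Y followed by the position number.
SITE_PATTERN = re.compile(r"^(?P<residue>[STY])(?P<position>\d+)$")


def parse_gene_p_site(value: str) -> tuple[str, str, int]:
    """Split a GENE_SITE string such as 'PRKACA_S339' into its parts."""
    # rpartition splits on the *last* underscore, so 'MY_GENE_S10' still works.
    gene, sep, site = value.rpartition("_")
    match = SITE_PATTERN.match(site)
    if not sep or not gene or match is None:
        raise ValueError(f"gene_p_site must look like GENE_SITE, got {value!r}")
    return gene, match["residue"], int(match["position"])


print(parse_gene_p_site("PRKACA_S339"))  # ('PRKACA', 'S', 339)
```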

`predMat`

predMat must be a numeric matrix with:

phosphosite IDs as the index, for example BTK;Y551;
kinase names as columns
scores in the range [0, 1]

On disk, PhosphoDataset.from_files(...), PhosRPipeline.from_files(...), and the CLI read the total and phospho inputs as tab-delimited text tables. predMat is read separately as CSV with the first column used as the phosphosite index.

When you load tables from files, PhosPy normalises input headers to lowercase snake case before validation. For example, Gene Names and gene-names both become gene_names. That makes file input a little more forgiving, but it also means loading fails if two raw headers collapse to the same cleaned name.

If you build PhosphoDataset from in-memory pandas data frames instead, those column names are validated as provided.
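The header cleaning applied to file input, and why collisions fail, can be approximated as follows. This is a sketch under an assumed rule: runs of non-alphanumeric characters become one underscore, and the result is lowercased. PhosPy's exact rule may differ, for example for camelCase headers, so treat `clean_header` and `clean_headers` as hypothetical.

```python
import re


def clean_header(name: str) -> str:
    """Approximate lowercase snake case: non-alphanumeric runs become '_'."""
    snake = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_")
    return snake.lower()


def clean_headers(names: list[str]) -> list[str]:
    """Clean every header and reject two raw headers that collapse together."""
    cleaned = [clean_header(n) for n in names]
    seen: dict[str, str] = {}
    for raw, new in zip(names, cleaned):
        if new in seen:
            raise ValueError(f"{seen[new]!r} and {raw!r} both clean to {new!r}")
        seen[new] = raw
    return cleaned


print(clean_headers(["Gene Names", "Localization Prob"]))
# ['gene_names', 'localization_prob']
```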

Quick Start

The quickest way to get started from a source checkout is to use the bundled example data in examples/data/.

Core Preprocessing

from phospy import CoreOutputWriter, PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

writer = CoreOutputWriter()
writer.write(core, outdir="examples/output", format="csv")
# Use format="tsv" or format="parquet" for alternative core output bundles.

site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected

For the bundled example data, site_matrix.index.tolist() returns ['BTK;Y551;'].

dataset.preprocessing is the preprocessing facade bound to the dataset and the preferred public entry point for core preprocessing; call dataset.preprocessing.run(...) for routine use. To persist core preprocessing outputs, use CoreOutputWriter.

dataset.preprocessing.run() returns a CoreProcessingResult with:

total_unique
total_filtered
phospho_filtered
phospho_corrected
site_matrix

If your analysis needs explicit pairwise comparisons, pass them when you build the dataset:

from phospy import PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
    comparisons=[("group1", "group4"), ("group2", "group5")],
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

If you do not pass comparisons, preprocessing still runs normally and no extra pairwise columns are added.

If you only want the phosphosite localisation filter as a standalone preprocessing step, use the public helper in phospy.preprocessing:

from phospy.preprocessing import filter_localized_sites

# phospho_df: a pandas DataFrame with the phosphoproteome columns listed above
filtered = filter_localized_sites(phospho_df, threshold=0.75)
summary_result = filter_localized_sites(
    phospho_df,
    threshold=0.75,
    return_summary=True,
)

summary_result.filtered contains the retained rows and summary_result.summary reports how many rows were kept or removed.
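As a rough mental model of what a localisation filter with a summary does, here is a standard-library sketch over plain row dicts. It is not filter_localized_sites itself. `filter_by_localization` and `LocalizationSummary` are hypothetical names, and the inclusive `>=` cut-off is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LocalizationSummary:
    kept: int
    removed: int


def filter_by_localization(rows: list[dict], threshold: float = 0.75):
    """Return (retained rows, summary); rows need a 'localization_prob' key."""
    retained = []
    for row in rows:
        prob = row["localization_prob"]
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"localization_prob {prob} outside [0, 1]")
        if prob >= threshold:  # inclusive cut-off is an assumption
            retained.append(row)
    summary = LocalizationSummary(kept=len(retained), removed=len(rows) - len(retained))
    return retained, summary


rows = [
    {"uid": 1, "localization_prob": 0.99},
    {"uid": 2, "localization_prob": 0.75},
    {"uid": 3, "localization_prob": 0.40},
]
retained, summary = filter_by_localization(rows)
print(summary)  # LocalizationSummary(kept=2, removed=1)
```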

If you want to filter by observed data coverage before the broader workflow, use the standalone coverage helper:

from phospy.preprocessing import filter_sites_by_coverage

# phospho_df: a pandas DataFrame containing the p_group1 to p_group6 sample columns
coverage_result = filter_sites_by_coverage(
    phospho_df,
    columns=["p_group1", "p_group2", "p_group3", "p_group4", "p_group5", "p_group6"],
    min_coverage=0.5,
    return_summary=True,
)

filter_localized_sites(...) removes sites with weak localisation evidence, while filter_sites_by_coverage(...) removes sites with too many missing sample values. These standalone helpers are intended for targeted, advanced use; the preferred end-to-end preprocessing path remains dataset.preprocessing.run(...). Coverage filtering currently operates across the sample columns you provide, rather than being derived from a separate group metadata model.
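Coverage here means the fraction of the listed sample columns that hold a value. The sketch below shows that calculation on plain row dicts, treating both absent keys and NaN as missing. `observed_fraction` and `filter_by_coverage` are hypothetical helpers, and the inclusive `>=` comparison against min_coverage is an assumption.

```python
import math

SAMPLE_COLUMNS = [f"p_group{i}" for i in range(1, 7)]


def observed_fraction(row: dict, columns: list[str]) -> float:
    """Fraction of the given sample columns holding a non-missing value."""
    observed = sum(
        1 for c in columns if row.get(c) is not None and not math.isnan(row[c])
    )
    return observed / len(columns)


def filter_by_coverage(rows, columns, min_coverage=0.5):
    """Keep rows observed in at least min_coverage of the columns (>= assumed)."""
    kept = [r for r in rows if observed_fraction(r, columns) >= min_coverage]
    return kept, {"kept": len(kept), "removed": len(rows) - len(kept)}


nan = float("nan")
rows = [
    # 4 of 6 values observed -> coverage 0.67, kept
    {"uid": 1, "p_group1": 1.2, "p_group2": 0.8, "p_group3": nan,
     "p_group4": 1.1, "p_group5": 0.9, "p_group6": None},
    # 2 of 6 values observed -> coverage 0.33, removed
    {"uid": 2, "p_group1": 1.0, "p_group2": nan, "p_group3": nan,
     "p_group4": 0.7},
]
kept, summary = filter_by_coverage(rows, SAMPLE_COLUMNS, min_coverage=0.5)
print(summary)  # {'kept': 1, 'removed': 1}
```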

Downstream Kinase Analysis From `predMat`

KinaseActivityAnalyzer is the public orchestration layer for standalone downstream kinase analysis. Use it when you already have a phosphosite matrix and a predMat and want the downstream kinase summary tables without going through PhosRPipeline.

from phospy import KinaseActivityAnalyzer, PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

analyzer = KinaseActivityAnalyzer()
kinase = analyzer.load_and_analyze(
    pred_mat_path="examples/data/predMat.csv",
    phospho_matrix=core.site_matrix.matrix,
    threshold=0.6,
    min_substrates=1,
    top_n_substrates=1,
)
analyzer.write_outputs(kinase, outdir="examples/output")

target_counts = kinase.target_counts
ksea_scores = kinase.ksea_scores

The bundled example uses min_substrates=1 and top_n_substrates=1 because the example matrix is intentionally tiny. For larger real datasets, the defaults (min_substrates=3, top_n_substrates=20) are usually the better starting point.

For the bundled example data, target_counts.to_dict() is {'PRKACA': 3, 'BTK': 2}.
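To build intuition for how threshold and min_substrates interact, here is one plausible way to count targets. Count the measured phosphosites whose predMat score reaches the threshold, then drop kinases with too few substrates. This is an assumption for illustration only, not PhosPy's documented algorithm. `count_targets` is hypothetical and the kinase names are synthetic.

```python
def count_targets(pred_mat, measured_sites, threshold=0.6, min_substrates=3):
    """Count, per kinase, measured sites whose predMat score reaches threshold."""
    counts: dict[str, int] = {}
    for site, scores in pred_mat.items():
        if site not in measured_sites:
            continue  # only sites present in the phosphosite matrix count
        for kinase, score in scores.items():
            if score >= threshold:
                counts[kinase] = counts.get(kinase, 0) + 1
    # Drop kinases with too few substrates, then sort by count (descending).
    kept = {k: n for k, n in counts.items() if n >= min_substrates}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


pred_mat = {
    "S1": {"KIN_A": 0.90, "KIN_B": 0.20},
    "S2": {"KIN_A": 0.70, "KIN_B": 0.65},
    "S3": {"KIN_A": 0.10, "KIN_B": 0.80},
    "S4": {"KIN_A": 0.95, "KIN_B": 0.99},  # not measured, so ignored
    "S5": {"KIN_A": 0.30, "KIN_B": 0.70},
}
measured = {"S1", "S2", "S3", "S5"}
print(count_targets(pred_mat, measured, threshold=0.6, min_substrates=1))
# {'KIN_B': 3, 'KIN_A': 2}
```

Raising min_substrates to 3 on this synthetic input keeps only KIN_B, which is why tiny examples need a lower setting.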

KinaseActivityAnalyzer.load_and_analyze(...) returns a KinaseActivityResult with:

weighted_activity
ksea_scores
ksea_counts
target_counts
target_table
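For reference, the widely used KSEA z-score for one kinase compares the mean value of its substrates with the mean over all sites. That difference is scaled by the square root of the substrate count and divided by the standard deviation over all sites. The sketch below implements that textbook formula. Whether ksea_scores uses exactly this variant (for example sample versus population standard deviation) is not stated here, so check docs/api.md before comparing numbers. `ksea_z_score` is a hypothetical name.

```python
import math
import statistics


def ksea_z_score(site_values: dict[str, float], substrates: set[str]) -> float:
    """Classic KSEA z-score: (mean_sub - mean_all) * sqrt(m) / sd_all."""
    all_values = list(site_values.values())
    sub_values = [site_values[s] for s in substrates if s in site_values]
    if not sub_values:
        raise ValueError("kinase has no measured substrates")
    sd_all = statistics.stdev(all_values)  # sample standard deviation
    m = len(sub_values)
    diff = statistics.fmean(sub_values) - statistics.fmean(all_values)
    return diff * math.sqrt(m) / sd_all


# Synthetic log fold changes for four sites; the kinase targets S1 and S2.
values = {"S1": 2.0, "S2": 1.0, "S3": -1.0, "S4": 0.0}
print(round(ksea_z_score(values, {"S1", "S2"}), 3))  # 1.095
```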

End-to-End Pipeline

from phospy import PhosRPipeline

pipeline = PhosRPipeline.from_files(
    total_path="examples/data/total.tsv",
    phospho_path="examples/data/phospho.tsv",
    pred_mat_path="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
)
outputs = pipeline.run(outdir="examples/output")

outputs is a CoreOutputs object with:

outputs.core
outputs.kinase_activity

This writes the default core CSV outputs together with downstream kinase-analysis tables, including:

df_total_unique.csv
df_total_filtered.csv
df_phospho_filtered.csv
df_phospho_corrected.csv
phosr_input.csv
mat_phospho_corrected.csv
site_sequences.csv
kinase_activity_matrix.csv
ksea_scores.csv
ksea_counts.csv
kinase_target_counts.csv
kinase_target_table.csv
run_manifest.json

run_manifest.json records a small summary of the run, including whether kinase activity outputs were produced, row counts for the core tables, the preprocessing configuration, and the installed package version.

If you omit pred_mat_path, the pipeline still runs the core preprocessing path and skips the downstream kinase-analysis outputs. To persist core outputs outside the pipeline, use CoreOutputWriter directly with format="csv", format="tsv", or format="parquet". Parquet output requires an installed pandas parquet engine such as pyarrow, which the package exposes through the optional phospy[parquet] extra (pip install "phospy[parquet]").
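If a script should degrade gracefully when no parquet engine is installed, you can check for one before choosing the format. This sketch only uses the standard library. pyarrow and fastparquet are the engines pandas supports for parquet. `pick_core_format` is a hypothetical helper, not part of PhosPy.

```python
import importlib.util


def pick_core_format(preferred: str = "parquet") -> str:
    """Return 'parquet' only when a pandas parquet engine is importable."""
    if preferred != "parquet":
        return preferred
    engines = ("pyarrow", "fastparquet")  # engines pandas can use for parquet
    if any(importlib.util.find_spec(name) is not None for name in engines):
        return "parquet"
    return "csv"  # fall back to the default core output format


# Hypothetical usage with the documented writer:
# CoreOutputWriter().write(core, outdir="examples/output", format=pick_core_format())
print(pick_core_format("tsv"))  # tsv
```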

Native End-to-End Kinase Workflow

A complete runnable native-workflow example is included at examples/native_workflow_demo.py.

If PhosPy is installed in the environment, for example with pip install phospy or pip install -e . from a local checkout, you can run it directly:

python examples/native_workflow_demo.py

From a local checkout, there is also a Make target that runs the example with the repository src/ path configured for that shell session:

make native-workflow-demo

That example uses only the supported API and prints a small prediction matrix for a synthetic two-kinase setup.

The native workflow expects:

a phosphosite matrix
a substrate_map
site_sequences keyed by phosphosite ID when motif scoring is used
motif_sequences for end-to-end motif-aware prediction

site_sequences can be passed as either a mapping keyed by phosphosite ID or a pandas Series with a phosphosite index. If you want profile-only prediction, pass allow_profile_only_fallback=True and omit motif_sequences.
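The input rules above can be pictured as a small pre-flight check. The sketch below treats motif_sequences as opaque and only checks presence. It normalises site_sequences with `.items()`, which both a dict and a pandas Series provide. `resolve_prediction_mode` is a hypothetical function, not part of KinaseWorkflow, and it does not check that sequence keys match the matrix.

```python
from typing import Any


def resolve_prediction_mode(
    site_sequences: Any = None,
    motif_sequences: Any = None,
    allow_profile_only_fallback: bool = False,
) -> tuple[str, dict[str, str]]:
    """Decide between motif-aware and profile-only prediction (sketch)."""
    # dict.items() and pandas Series.items() both yield (phosphosite ID, sequence).
    sequences = dict(site_sequences.items()) if site_sequences is not None else {}
    if motif_sequences is not None:
        if not sequences:
            raise ValueError("motif_sequences require matching site_sequences")
        return "motif-aware", sequences
    if not allow_profile_only_fallback:
        raise ValueError("omit motif data only with allow_profile_only_fallback=True")
    return "profile-only", sequences


mode, _ = resolve_prediction_mode({"BTK;Y551;": "SYNTHETICSEQ"}, {"BTK": ["SYNTHETICSEQ"]})
print(mode)  # motif-aware
```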

Command-Line Demo

After installation, you can run the CLI on your own files. The example below uses the bundled tables from a source checkout:

phospy \
  --total examples/data/total.tsv \
  --phospho examples/data/phospho.tsv \
  --pred-mat examples/data/predMat.csv \
  --phospho-encoding utf-16le \
  --max-unmatched-fraction 0.1 \
  --outdir examples/output

The checked-in example output directory in examples/output/ shows the generated CSV tables. A fresh CLI or pipeline run also writes run_manifest.json to the chosen output directory.

The CLI currently supports these options:

--total and --phospho are required tab-delimited input files
--phospho-encoding optionally overrides the default utf-8 reader encoding
--outdir is the required output directory
--pred-mat is optional
--localization-threshold defaults to 0.75
--min-observed defaults to 4
--total-sentinel defaults to 10.0
--phospho-sentinel defaults to 12.0
--max-unmatched-fraction defaults to 0.0

--max-unmatched-fraction=0.0 means protein correction fails if the inner join would silently drop any phosphosite rows. Raise it only when you want to allow a small, bounded amount of row loss.
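To see what the bound means in practice, the sketch below computes the share of phosphosite rows whose gene has no counterpart in the total proteome table. It fails when that share exceeds the allowed fraction. The trim-and-uppercase normalisation is an assumption standing in for PhosPy's own gene identifier normalisation, and `unmatched_fraction` is a hypothetical name.

```python
def unmatched_fraction(phospho_genes, total_genes, max_unmatched_fraction=0.0) -> float:
    """Fraction of phosphosite rows whose gene is absent from the total table."""
    def normalise(gene: str) -> str:
        # Assumed normalisation (trim + uppercase); PhosPy's own rule may differ.
        return gene.strip().upper()

    if not phospho_genes:
        return 0.0
    available = {normalise(g) for g in total_genes}
    missing = sum(1 for g in phospho_genes if normalise(g) not in available)
    fraction = missing / len(phospho_genes)
    if fraction > max_unmatched_fraction:
        raise ValueError(
            f"{fraction:.1%} of phosphosite rows have no total-proteome match "
            f"(allowed: {max_unmatched_fraction:.1%})"
        )
    return fraction


genes = ["BTK", "PRKACA", "MAPK1", "NOPE"]
print(unmatched_fraction(genes, ["BTK", "prkaca", "MAPK1"], max_unmatched_fraction=0.25))
# 0.25
```

With the default bound of 0.0, the same call raises instead of dropping the NOPE row.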

The CLI is intentionally small. It does not currently expose pairwise comparison generation or the native KinaseWorkflow path.

Validation Rules Worth Knowing

A few checks are especially useful to know up front:

localization_prob must stay within [0, 1].
predMat values must stay within [0, 1].
file-loaded total and phospho headers are cleaned to lowercase snake case before validation, so duplicate cleaned names are rejected.
predMat and the phosphosite matrix must overlap by at least one phosphosite row, and that overlap must cover at least 10% of the phosphosite matrix.
Protein correction normalises gene identifiers before matching and, by default, refuses to drop unmatched phosphosite rows.
Site-matrix construction drops rows with missing sequences or incomplete corrected values, then deduplicates repeated phosphosites by keeping the row with the highest mean corrected signal.
In the native workflow, motif_sequences require matching site_sequences. If you omit motif data entirely, set allow_profile_only_fallback=True.
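Two of these rules are easy to misread, so here is a sketch of each on plain Python data. The first is the predMat overlap requirement: at least one shared row, covering at least 10% of the phosphosite matrix. The second is site-matrix deduplication by highest mean corrected signal. `check_overlap` and `build_site_matrix` are hypothetical helpers, and keeping the first row on a tied mean is an assumption.

```python
import math
import statistics


def check_overlap(pred_mat_sites, matrix_sites, min_fraction=0.10) -> float:
    """Share of phosphosite-matrix rows that also appear in predMat."""
    matrix = set(matrix_sites)
    shared = set(pred_mat_sites) & matrix
    if not shared:
        raise ValueError("predMat and the phosphosite matrix share no rows")
    coverage = len(shared) / len(matrix)
    if coverage < min_fraction:
        raise ValueError(f"overlap covers only {coverage:.1%} of the matrix")
    return coverage


def build_site_matrix(rows: list[dict]) -> dict[str, list[float]]:
    """Drop incomplete rows, then keep the highest-mean row per phosphosite."""
    best: dict[str, tuple[float, list[float]]] = {}
    for row in rows:
        values = row["values"]
        incomplete = not values or any(v is None or math.isnan(v) for v in values)
        if not row.get("sequence") or incomplete:
            continue  # missing sequence or incomplete corrected values
        mean = statistics.fmean(values)
        current = best.get(row["site"])
        if current is None or mean > current[0]:  # ties keep the first row seen
            best[row["site"]] = (mean, values)
    return {site: values for site, (_, values) in best.items()}


rows = [
    {"site": "BTK;Y551;", "sequence": "SEQ1", "values": [1.0, 2.0]},
    {"site": "BTK;Y551;", "sequence": "SEQ1", "values": [3.0, 4.0]},
    {"site": "X;S1;", "sequence": "", "values": [1.0, 1.0]},
    {"site": "Y;T2;", "sequence": "SEQ2", "values": [1.0, float("nan")]},
]
print(build_site_matrix(rows))  # {'BTK;Y551;': [3.0, 4.0]}
print(check_overlap(["A", "B"], ["A", "C", "D", "E"]))  # 0.25
```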

Where to Go Next

If you want more detail, these are the most useful follow-on docs:

docs/api.md for the public API reference
docs/validation-and-parity.md for the short validation and PhosR parity guide
docs/parity.md for the detailed parity guide against the R PhosR package
docs/fixtures.md for the committed fixture and trace layout
docs/roadmap.md for likely next steps

If you want to contribute or work from a local checkout, see CONTRIBUTING.md.

Known Limitations

A few boundaries are worth knowing up front:

Selective scope only. PhosPy covers the workflows documented above and nothing broader.
Parity claims are narrow. When PhosPy says there is parity, it means seam-level parity to the R PhosR package backed by committed fixtures and tests. See docs/parity.md.
KinaseWorkflow is native first. svm_mode="r_parity" narrows one learner comparison seam. It does not make the whole workflow equivalent to PhosR.
The CLI is intentionally small. It covers the core preprocessing and predMat-driven downstream path. The native kinase workflow is exposed through the Python API and example script.
R is only required for fixture regeneration. You do not need R to install PhosPy or run the committed Python test suite.

For Contributors to PhosPy

To work from a local checkout:

pip install -e .

To run tests:

pip install -e ".[test]"
pytest -m "not parity"
pytest -m parity

For the short validation and parity guide, see docs/validation-and-parity.md. For the detailed guide to parity against the R PhosR package, see docs/parity.md.

To run the usual contributor checks:

pip install -e ".[dev]"
pre-commit install
pre-commit run --all-files

R Requirements for Fixture Regeneration

The committed parity fixtures are already included in the repository. You only need R if you want to regenerate or extend them.


Release history

1.5.2: May 22, 2026
1.5.1: Apr 29, 2026
1.5.0: Apr 23, 2026
1.4.0: Apr 15, 2026
1.3.0: Apr 13, 2026
1.2.3: Apr 9, 2026
1.2.2: Apr 7, 2026
1.2.1: Apr 2, 2026 (this version)
1.2.0: Apr 2, 2026
1.1.1: Mar 31, 2026
1.1.0: Mar 31, 2026
1.0.0: Mar 26, 2026

Download files

Source Distribution

phospy-1.2.1.tar.gz (116.4 kB)

Uploaded Apr 2, 2026 Source

Built Distribution

phospy-1.2.1-py3-none-any.whl (91.3 kB)

Uploaded Apr 2, 2026 Python 3

File details

Details for the file phospy-1.2.1.tar.gz.

File metadata

Download URL: phospy-1.2.1.tar.gz
Upload date: Apr 2, 2026
Size: 116.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for phospy-1.2.1.tar.gz
Algorithm	Hash digest
SHA256	`263d1015aa7a86663d181aa36bedd7040671c190a72e9f607f5402c039214fd7`
MD5	`fd96da0b6c18073bfce2b925077a0456`
BLAKE2b-256	`8bb8af3a6f4a8a3d6950958df8d3aab2e3df180fc6a33eb1498ead9c31d8e546`


Provenance

The following attestation bundles were made for phospy-1.2.1.tar.gz:

Publisher: publish.yml on falconsmilie/phospy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phospy-1.2.1.tar.gz
- Subject digest: 263d1015aa7a86663d181aa36bedd7040671c190a72e9f607f5402c039214fd7
- Sigstore transparency entry: 1217490700
- Sigstore integration time: Apr 2, 2026
Source repository:
- Permalink: falconsmilie/phospy@451343b6c6367b41b181da51603c00e3ce270b48
- Branch / Tag: refs/tags/v1.2.1
- Owner: https://github.com/falconsmilie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@451343b6c6367b41b181da51603c00e3ce270b48
- Trigger Event: push

File details

Details for the file phospy-1.2.1-py3-none-any.whl.

File metadata

Download URL: phospy-1.2.1-py3-none-any.whl
Upload date: Apr 2, 2026
Size: 91.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for phospy-1.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ac04a71f568eb45557b11af6073d89da45cecea10e8296cec6e24af38f08f35a`
MD5	`9ce4618e9b4d7dbc804d2d812da4cf3c`
BLAKE2b-256	`361cb610ea2a76516b1ef122248fad5500d90b5bbd2cdc445cd83a292db80a1f`


Provenance

The following attestation bundles were made for phospy-1.2.1-py3-none-any.whl:

Publisher: publish.yml on falconsmilie/phospy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phospy-1.2.1-py3-none-any.whl
- Subject digest: ac04a71f568eb45557b11af6073d89da45cecea10e8296cec6e24af38f08f35a
- Sigstore transparency entry: 1217490771
- Sigstore integration time: Apr 2, 2026
Source repository:
- Permalink: falconsmilie/phospy@451343b6c6367b41b181da51603c00e3ce270b48
- Branch / Tag: refs/tags/v1.2.1
- Owner: https://github.com/falconsmilie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@451343b6c6367b41b181da51603c00e3ce270b48
- Trigger Event: push
