Python-native implementation of selected PhosR-style workflows for phosphoproteomics.

PhosPy

PhosPy is a focused Python library for selected phosphoproteomics workflows inspired by PhosR.

It is built for a small set of jobs:

  • preprocess total and phospho tables
  • analyse kinase activity from an existing predMat
  • generate a predMat from phosphosite inputs
  • run the native Python kinase workflow
  • construct signalomes from scoring and prediction outputs

PhosPy is intentionally narrow. It is not a full PhosR replacement.

Install

PhosPy supports Python 3.10 and newer.

pip install phospy

For parquet output:

pip install "phospy[parquet]"

The examples below use repository paths such as examples/data/.... If you installed from PyPI, use your own local paths instead.

Pick the Right Entry Point

PhosphoDataset

Use PhosphoDataset when you want validated total and phospho inputs plus the standard preprocessing flow.

from phospy import PhosphoDataset
from phospy.writers import CoreOutputWriter

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

CoreOutputWriter().write(core, outdir="examples/output", format="csv")

site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected
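
The phospho_encoding="utf-16le" argument above suggests the example phospho table is stored as UTF-16 rather than UTF-8. As a sanity check that does not depend on PhosPy, the following minimal sketch writes a tiny tab-separated file in UTF-16LE and reads it back with the standard library. The column names and values are made up for illustration and are not PhosPy's table schema.

```python
import csv
import tempfile
from pathlib import Path


def read_tsv(path, encoding="utf-8"):
    """Read a tab-separated file into a list of row dicts."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical two-row table; the column names are illustrative only.
sample = "site\tintensity\nP12345_S10\t1.5\nP67890_T22\t2.25\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "phospho_demo.tsv"
    path.write_bytes(sample.encode("utf-16le"))
    rows = read_tsv(path, encoding="utf-16le")

print(rows[0]["site"], rows[1]["intensity"])
```

If a file starts with a byte-order mark, decoding it as "utf-16" consumes the mark, whereas "utf-16le" leaves it as a leading \ufeff character in the first header cell. A first column name that fails to match is often a sign of this.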

KinaseActivityAnalyzer

Use KinaseActivityAnalyzer when you already have a phosphosite matrix and a predMat.

from phospy import KinaseActivityAnalyzer, PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

analyzer = KinaseActivityAnalyzer()
result = analyzer.run(
    pred_mat=analyzer.load_pred_mat("examples/data/predMat.csv"),
    phospho_matrix=core.site_matrix.matrix,
    threshold=0.6,
    min_substrates=1,
    top_n_substrates=1,
)

ksea_scores = result.ksea_scores

The bundled example data is tiny, so the example above uses min_substrates=1 and top_n_substrates=1.
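
To see why tiny data needs such permissive settings, here is a toy sketch of one plausible reading of these parameters: a site counts as a substrate of a kinase when its predMat score reaches threshold, a kinase is dropped when it has fewer than min_substrates such sites, and at most top_n_substrates of the highest-scoring sites are kept. This interpretation, the helper name select_substrates, and the scores are assumptions for illustration. It is not PhosPy's implementation, so check docs/api.md for the actual semantics.

```python
def select_substrates(pred_mat, threshold, min_substrates, top_n_substrates):
    """Pick substrate sites per kinase from a {site: {kinase: score}} mapping."""
    by_kinase = {}
    for site, scores in pred_mat.items():
        for kinase, score in scores.items():
            if score >= threshold:
                by_kinase.setdefault(kinase, []).append((score, site))

    selected = {}
    for kinase, hits in by_kinase.items():
        if len(hits) < min_substrates:
            continue  # too few qualifying sites for this kinase
        # Highest score first; ties broken by site name for determinism.
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        selected[kinase] = [site for _, site in hits[:top_n_substrates]]
    return selected


# Made-up scores for illustration only.
toy_pred_mat = {
    "SITE_1": {"KINASE_A": 0.9, "KINASE_B": 0.2},
    "SITE_2": {"KINASE_A": 0.7, "KINASE_B": 0.65},
    "SITE_3": {"KINASE_A": 0.1, "KINASE_B": 0.3},
}

permissive = select_substrates(toy_pred_mat, 0.6, min_substrates=1, top_n_substrates=1)
stricter = select_substrates(toy_pred_mat, 0.6, min_substrates=2, top_n_substrates=5)
print(permissive)
print(stricter)
```

With only three sites, raising min_substrates to 2 already removes KINASE_B from the toy result, which is the kind of effect small inputs are prone to.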

PhosRPipeline

Use PhosRPipeline when you want file loading, preprocessing, optional kinase analysis, and output publishing in one call.

from phospy import PhosRPipeline

pipeline = PhosRPipeline.from_files(
    total_path="examples/data/total.tsv",
    phospho_path="examples/data/phospho.tsv",
    pred_mat_path="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
    kinase_activity_threshold=0.6,
    kinase_activity_min_substrates=1,
    kinase_activity_top_n_substrates=1,
)
outputs = pipeline.run(outdir="examples/output")

When outdir is set, the pipeline writes the core outputs, any kinase-analysis outputs, and run_manifest.json.

PredMatWorkflow

Use PredMatWorkflow when your goal is to generate a predMat from phosphosite inputs.

import json
from pathlib import Path

import pandas as pd

from phospy import PredMatWorkflow

phospho_matrix = pd.read_csv("examples/data/predmat_phospho_matrix.csv", index_col=0)
site_sequences = json.loads(Path("examples/data/predmat_site_sequences.json").read_text())
substrate_map = json.loads(Path("examples/data/predmat_substrate_map.json").read_text())
motif_sequences = json.loads(Path("examples/data/predmat_motif_sequences.json").read_text())

workflow = PredMatWorkflow(flank_size=2, svm_mode="default")
result = workflow.run(
    phospho_matrix=phospho_matrix,
    substrate_map=substrate_map,
    site_sequences=site_sequences,
    motif_sequences=motif_sequences,
    min_substrates=2,
    min_motif_size=2,
    ensemble_size=3,
    top=4,
    score_threshold=0.75,
    inclusion=3,
    n_iterations=2,
    random_state=17,
)

pred_mat = result.pred_mat_result.to_frame(copy=False)
result.pred_mat_result.to_csv("predMat.csv")

Use svm_mode="default" for the recommended stable native path. Use svm_mode="r_parity" for parity-sensitive comparisons: it selects the supported parity-oriented preset for the learner, sampling, and final scoring.

When thresholds are too strict and no kinase candidates qualify, PhosPy raises NoCandidateKinasesError instead of returning an empty, invalid predMat.
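
One way to handle that error is to retry with progressively looser thresholds. The sketch below shows the pattern only. It defines a local stand-in exception with the same name because this README does not state where NoCandidateKinasesError is imported from, and fake_run is a hypothetical placeholder for a call such as workflow.run(...). Neither is PhosPy code.

```python
class NoCandidateKinasesError(Exception):
    """Local stand-in; in real code, import the exception from PhosPy."""


def run_with_relaxed_thresholds(run, thresholds):
    """Try each score threshold in order until one yields candidates."""
    for threshold in thresholds:
        try:
            return threshold, run(score_threshold=threshold)
        except NoCandidateKinasesError:
            continue  # too strict, try the next (looser) value
    raise NoCandidateKinasesError(f"no candidates at any of {list(thresholds)}")


def fake_run(score_threshold):
    """Hypothetical stand-in for workflow.run(...); fails above 0.6."""
    if score_threshold > 0.6:
        raise NoCandidateKinasesError(score_threshold)
    return {"score_threshold": score_threshold}


used, outcome = run_with_relaxed_thresholds(fake_run, [0.9, 0.75, 0.5])
print(used, outcome)
```

Relaxing a threshold changes the analysis, so it is worth recording the value that was finally used alongside the results.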

A runnable repository example lives in examples/predmat_workflow_demo.py.

KinaseWorkflow

Use KinaseWorkflow when you want the fuller native Python scoring and prediction workflow, including intermediate profile and motif scoring outputs.

A runnable repository example lives in examples/native_workflow_demo.py.

From a repository checkout:

make native-workflow-demo

SignalomeWorkflow

Use SignalomeWorkflow when you already have scoring and prediction outputs and want downstream signalome, map-ready, and network-ready outputs.

from phospy import PredMatWorkflow, SignalomeWorkflow

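# Arguments are elided here; see the PredMatWorkflow example above for a full call.
pred_mat_result = PredMatWorkflow(flank_size=2, svm_mode="default").run(...)
signalome_result = SignalomeWorkflow().run(
    scoring_result=pred_mat_result.scoring_result,
    prediction_result=pred_mat_result.prediction_result,
    # phospho_matrix as loaded in the PredMatWorkflow example
    expression_matrix=phospho_matrix,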
    kinases_of_interest=["KINASE_A", "KINASE_B"],
    signalome_cutoff=0.5,
)

map_data = signalome_result.to_map_data()
network_data = signalome_result.to_network_data()

Use signalome_result.to_csv(...), map_data.to_csv(...), and network_data.to_csv(...) when you want exportable tables.

A runnable repository example lives in examples/signalome_workflow_demo.py.

File Inputs

PhosPy works with:

  • total input as TSV
  • phospho input as TSV
  • predMat as CSV, with the first column used as the phosphosite index

For the default table schema and method-level validation rules, see docs/api.md and docs/validation.md.
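
To make the predMat layout concrete, the following minimal sketch parses a CSV whose first column is the row index, using only the standard library. The site and kinase labels, the scores, and the helper name load_score_table are invented for illustration. With pandas, the equivalent is pd.read_csv(path, index_col=0), as used for the phospho matrix in the PredMatWorkflow example.

```python
import csv
import io


def load_score_table(handle):
    """Parse a CSV whose first column is the row index (site identifier)."""
    reader = csv.reader(handle)
    header = next(reader)
    kinases = header[1:]  # the first header cell labels the index column
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        site, values = row[0], row[1:]
        table[site] = {k: float(v) for k, v in zip(kinases, values)}
    return table


# Made-up content; real predMat files will have their own labels.
demo_csv = io.StringIO(
    ",KINASE_A,KINASE_B\n"
    "SITE_1,0.9,0.2\n"
    "SITE_2,0.7,0.65\n"
)
table = load_score_table(demo_csv)
print(table["SITE_2"]["KINASE_B"])
```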

CLI

PhosPy also ships with a small CLI for the file-based preprocessing path and optional predMat analysis.

phospy \
  --total examples/data/total.tsv \
  --phospho examples/data/phospho.tsv \
  --pred-mat examples/data/predMat.csv \
  --phospho-encoding utf-16le \
  --max-unmatched-fraction 0.1 \
  --outdir examples/output
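
When the CLI is driven from a script, building the argument list in Python avoids shell-quoting mistakes. The sketch below assembles the same command as above using only the flags shown in this README. The helper name build_phospy_command is hypothetical, and treating every flag except --total, --phospho, and --outdir as optional is an assumption to verify against the CLI's own help output.

```python
import shlex


def build_phospy_command(total, phospho, outdir, pred_mat=None,
                         phospho_encoding=None, max_unmatched_fraction=None):
    """Return the phospy CLI call as an argument list."""
    cmd = ["phospy", "--total", str(total), "--phospho", str(phospho)]
    # Optional flags are appended only when a value is supplied.
    if pred_mat is not None:
        cmd += ["--pred-mat", str(pred_mat)]
    if phospho_encoding is not None:
        cmd += ["--phospho-encoding", phospho_encoding]
    if max_unmatched_fraction is not None:
        cmd += ["--max-unmatched-fraction", str(max_unmatched_fraction)]
    cmd += ["--outdir", str(outdir)]
    return cmd


cmd = build_phospy_command(
    total="examples/data/total.tsv",
    phospho="examples/data/phospho.tsv",
    outdir="examples/output",
    pred_mat="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
)
print(shlex.join(cmd))

# To execute (requires phospy to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```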

Download files

Source Distribution

phospy-1.2.3.tar.gz (154.1 kB view details)

Uploaded Source

Built Distribution

phospy-1.2.3-py3-none-any.whl (125.3 kB view details)

Uploaded Python 3

File details

Details for the file phospy-1.2.3.tar.gz.

File metadata

  • Download URL: phospy-1.2.3.tar.gz
  • Upload date:
  • Size: 154.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for phospy-1.2.3.tar.gz
Algorithm Hash digest
SHA256 0a350260fe8be50d7f02b371b93c130580844f20e92acdca9eeee73f2697be28
MD5 899ea9656a5e32d02307b1c2f3870db4
BLAKE2b-256 ce8879af4989c2e31ef8d16583a08f78882aaf2039f02e86c5ca5649f6cc545f
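
A downloaded file can be checked against the SHA256 digest listed above with Python's hashlib. The sketch below is a minimal example. The helper names sha256_of and matches_digest are illustrative, and the self-check runs on a throwaway file so that it works without downloading anything.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 with an expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-check on a throwaway file so the sketch runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"phospy demo payload")
    expected = hashlib.sha256(b"phospy demo payload").hexdigest()
    ok = matches_digest(demo, expected)
    bad = matches_digest(demo, "0" * 64)

print(ok, bad)

# For the real sdist, pass the SHA256 value listed above:
#   matches_digest(
#       "phospy-1.2.3.tar.gz",
#       "0a350260fe8be50d7f02b371b93c130580844f20e92acdca9eeee73f2697be28",
#   )
```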

Provenance

The following attestation bundles were made for phospy-1.2.3.tar.gz:

Publisher: publish.yml on falconsmilie/phospy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file phospy-1.2.3-py3-none-any.whl.

File metadata

  • Download URL: phospy-1.2.3-py3-none-any.whl
  • Upload date:
  • Size: 125.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for phospy-1.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e468e94eb9db2004fa4fcd03eff3f5195be9e1f324c9423d790df2db6a0e682d
MD5 fd07856b8a81ebb62ec72ceb59faa632
BLAKE2b-256 3e9c2066c5bc6b04d4703b52eb169de7e11057b9ba03267144fec5bee928b317

Provenance

The following attestation bundles were made for phospy-1.2.3-py3-none-any.whl:

Publisher: publish.yml on falconsmilie/phospy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
