Kinase-substrate network prediction: PostgreSQL, Numba, pynetphorest, live data, false-negative recovery

PyNetworKIN

PyNetworKIN is a Bayesian kinase–substrate prediction pipeline for phosphoproteomics. It integrates sequence-motif scoring (via pynetphorest) with protein-interaction context (via the STRING network) to predict which kinases, phosphatases, or phospho-binding domains are responsible for observed phosphorylation events.

This repository is a modernised Python 3 port of the original NetworKIN 3.0 tool (Linding, Jensen, Horn & Kim, 2005–2013), extended to support STRING v12 protein interaction data.

Features

Predicts kinase/phosphatase/phospho-binding domain substrates from FASTA + phosphosite input.
Supports human (9606) and yeast (4932) proteomes.
Accepts multiple phosphosite input formats: NetworKIN TSV, ProteomeDiscoverer, MaxQuant, and custom formats.
Integrates sequence motif posterior probabilities with STRING network proximity scores using pre-calibrated Bayesian likelihood-ratio tables.
Outputs per-site predictions as a TSV file in the results/ directory.
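
The score integration listed above can be pictured as a table lookup followed by a naive-Bayes combination. The sketch below is illustrative only: it assumes each raw score is binned into a likelihood ratio and that the two ratios are multiplied, treating motif and network evidence as independent. The table values and the names MOTIF_TABLE, STRING_TABLE, lookup_lr, and combined_score are hypothetical; they are not taken from PyNetworKIN's conversion tables or code.

```python
from bisect import bisect_right

# Hypothetical calibration tables: (lower score bound, likelihood ratio),
# sorted by bound. Real tables would be loaded from the Parquet files.
MOTIF_TABLE = [(0.0, 0.1), (0.1, 0.5), (0.3, 2.0), (0.6, 8.0)]
STRING_TABLE = [(0.0, 0.2), (0.4, 1.0), (0.7, 4.0), (0.9, 10.0)]


def lookup_lr(table, score):
    """Return the likelihood ratio of the bin whose lower bound is <= score."""
    bounds = [bound for bound, _ in table]
    idx = max(bisect_right(bounds, score) - 1, 0)
    return table[idx][1]


def combined_score(motif_posterior, string_score):
    """Multiply the two likelihood ratios (assumes independent evidence)."""
    return lookup_lr(MOTIF_TABLE, motif_posterior) * lookup_lr(STRING_TABLE, string_score)


if __name__ == "__main__":
    # Strong motif match plus close network proximity gives a high score.
    print(combined_score(0.65, 0.95))
    # Weak motif match with moderate proximity gives a low score.
    print(combined_score(0.05, 0.5))
```

Multiplying ratios is the simplest way to combine two calibrated evidence sources; whether the pipeline uses exactly this form is an assumption of the sketch.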

Requirements

Dependency	Version	Notes
Python	≥ 3.10
NumPy	≥ 1.26
Pandas	≥ 2.2
pynetphorest	≥ 0.1.1	Motif scoring atlas
NCBI BLAST+	≥ 2.9	`blastp` must be on `PATH` or supplied via `--blast-dir`

Installation

From source

pip install -e .

Docker (GHCR)

docker pull ghcr.io/bibymaths/pynetworkin:latest
docker run --rm -v "$(pwd):/work" ghcr.io/bibymaths/pynetworkin:latest predict /work/input.fasta

Or use the provided Compose file:

docker compose up -d
docker compose exec networkin pynetworkin predict /work/input.fasta

Usage

CLI

pynetworkin predict <FASTA-file> [options]

Argument / Option	Default	Description
`FASTA-file`	(required)	Input FASTA or phosphosite file
`--output` / `-o`	`<input>.networkin.tsv`	Output file path
`--format` / `-f`	`tsv`	Output format: `tsv` or `sif`
`--species`	`9606`	NCBI taxonomy ID (`9606` = human, `4932` = yeast)
`--refresh` / `-r`	off	Force re-fetch of cached network data
`--verbose` / `-v`	off	Enable verbose logging
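
The sif value of --format refers to Cytoscape's Simple Interaction Format, in which each line holds a source node, an interaction type, and a target node. The sketch below shows how prediction rows could be reduced to such edges. The function name to_sif, the phosphorylates label, and the choice of the Kinase Name and Target Name columns are assumptions for illustration; the writer in output.py may use different labels or identifiers.

```python
def to_sif(rows, interaction="phosphorylates"):
    """Turn prediction rows (dicts) into tab-separated SIF edge lines.

    Several phosphosites can link the same enzyme and substrate, so
    duplicate edges are dropped while keeping first-seen order.
    """
    edges = (
        f"{row['Kinase Name']}\t{interaction}\t{row['Target Name']}"
        for row in rows
    )
    return list(dict.fromkeys(edges))


if __name__ == "__main__":
    # Made-up rows, not real predictions.
    demo = [
        {"Kinase Name": "KIN_X", "Target Name": "PROT_A"},
        {"Kinase Name": "KIN_X", "Target Name": "PROT_A"},
        {"Kinase Name": "KIN_Y", "Target Name": "PROT_B"},
    ]
    print("\n".join(to_sif(demo)))
```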

Example

pynetworkin predict data_MaxQuant_sample/test.fasta --output results/test.networkin.tsv

With the --output path given above, results are written to results/test.networkin.tsv; without --output, the default is <input>.networkin.tsv.

Other commands

pynetworkin info       # Show runtime/package information
pynetworkin cache      # Show cache contents
pynetworkin cache --clear  # Clear cached network data

Python API

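from pynetworkin import AppConfig, run_pipeline

# Describe one prediction run.
config = AppConfig(
    organism="9606",  # NCBI taxonomy ID: 9606 = human, 4932 = yeast
    fasta_path="data_MaxQuant_sample/test.fasta",
    sites_path=None,  # no separate phosphosite file
    datadir="data",
    blast_dir="",
)
# run_pipeline returns a dict that includes the prediction count and output path.
results = run_pipeline(config)
print(results["prediction_count"], "predictions written to", results["output_path"])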

Input formats

FASTA file

Standard FASTA format. Protein IDs are taken as everything between > and the first _ on the header line.
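
The header rule above can be sketched in a few lines. This is an illustration, not the parser used by the package: the function names are hypothetical, and the behaviour for headers without an underscore (the whole header after > is returned) is an assumption.

```python
def protein_id_from_header(header: str) -> str:
    """Return the text between '>' and the first '_' of a FASTA header."""
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA header: {header!r}")
    # Assumption: with no underscore present, the whole header is the ID.
    return header[1:].split("_", 1)[0].strip()


def parse_fasta(text: str) -> dict:
    """Map each protein ID to its sequence, joining wrapped sequence lines."""
    chunks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = protein_id_from_header(line)
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {pid: "".join(parts) for pid, parts in chunks.items()}


if __name__ == "__main__":
    # Made-up records for demonstration.
    demo = ">PROTA_HUMAN example protein\nMKTAYIAK\nQRQISFVK\n>PROTB_YEAST\nMSDNE\n"
    print(parse_fasta(demo))
```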

Sites file (auto-detected)

Format	Detection	Description
NetworKIN TSV	3-column TSV	`protein_id \t position \t residue`
ProteomeDiscoverer	2-column	`protein_id \t phosphopeptide` (phosphorylated residues in lowercase)
MaxQuant	Column header `Proteins` + `Leading`	Direct MaxQuant phosphosite output
Space-separated	column 2 = `phospho`	Space-separated with residue+position in col 2
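
The detection rules in the table can be approximated from the first line of a sites file. The sketch below is a simplified stand-in, not the package's detect_site_file_type: the function names and return labels are hypothetical, only the three tab-separated formats are covered, and the space-separated variant is left out. The second helper shows how lowercase letters in a ProteomeDiscoverer-style peptide mark phosphorylated residues; the positions it returns are relative to the peptide, not the protein.

```python
def guess_site_format(first_line: str) -> str:
    """Guess the sites-file format from its first line (tab-separated)."""
    fields = first_line.rstrip("\r\n").split("\t")
    # MaxQuant: header row with a 'Proteins' column and a 'Leading...' column.
    if "Proteins" in fields and any(f.startswith("Leading") for f in fields):
        return "maxquant"
    if len(fields) == 3:
        return "networkin_tsv"
    if len(fields) == 2:
        return "proteome_discoverer"
    return "unknown"


def lowercase_sites(peptide: str) -> list:
    """Return (1-based peptide position, residue) for each lowercase letter."""
    return [(i, aa.upper()) for i, aa in enumerate(peptide, start=1) if aa.islower()]


if __name__ == "__main__":
    print(guess_site_format("PROTA\t15\tS"))
    print(guess_site_format("PROTA\tAAsPEPtIDE"))
    print(lowercase_sites("AAsPEPtIDE"))
```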

Output format

Results TSV columns:

Column	Description
Name	Target protein ID
Position	Phosphosite position in the protein
Tree	NetPhorest tree (KIN, SH2, PTP, 1433, …)
Motif Group	NetPhorest classifier group
Kinase/Phosphatase/Phospho-binding domain	Predicted enzyme
NetworKIN score	Integrated Bayesian score (≥ 0.02 reported)
Motif probability	Raw NetPhorest posterior
STRING score	STRING best-path proximity score
Target STRING ID	Ensembl protein ID of the substrate
Kinase STRING ID	Ensembl protein ID of the enzyme
Target Name	Human-readable substrate name
Kinase Name	Human-readable enzyme name
Target description	STRING functional description of substrate
Kinase description	STRING functional description of enzyme
Peptide sequence window	±7 aa window around the phosphosite
Intermediate nodes	Best-path intermediate proteins in STRING
recovered	`True` if recovered by the false-negative recovery step
recovery_method	Method used for recovery (e.g. `context_proximity`)
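
A results file with these columns can be post-processed with the standard library alone. The sketch below assumes the header names match the table exactly and that the file is tab-separated; load_predictions and peptide_window are hypothetical helper names, and the sample rows are made up. The window helper illustrates the ±7 aa convention by truncating at the sequence ends; whether the pipeline pads instead is not stated here.

```python
import csv
import io


def load_predictions(handle, min_score=0.02):
    """Read a results TSV and return rows at or above min_score, best first."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [row for row in reader if float(row["NetworKIN score"]) >= min_score]
    rows.sort(key=lambda row: float(row["NetworKIN score"]), reverse=True)
    return rows


def peptide_window(sequence, position, flank=7):
    """Return up to `flank` residues either side of a 1-based position."""
    idx = position - 1
    start = max(idx - flank, 0)
    return sequence[start: idx + flank + 1]


# Made-up sample with a subset of the columns, for demonstration only.
SAMPLE = (
    "Name\tPosition\tKinase Name\tNetworKIN score\trecovered\n"
    "PROT_A\t15\tKIN_X\t0.5\tFalse\n"
    "PROT_A\t42\tKIN_Y\t0.03\tTrue\n"
    "PROT_B\t7\tKIN_X\t1.2\tFalse\n"
)

if __name__ == "__main__":
    for row in load_predictions(io.StringIO(SAMPLE), min_score=0.1):
        print(row["Name"], row["Position"], row["Kinase Name"], row["NetworKIN score"])
```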

Repository structure

src/
  pynetworkin/          # Core pipeline package
    __init__.py         # Public API (AppConfig, run_pipeline)
    networkin.py        # Main pipeline: AppConfig, run_pipeline, detect_site_file_type, …
    motif_scoring.py    # pynetphorest batch scorer wrapper
    graph_scoring.py    # STRING network context scoring & prediction ranking
    likelihood.py       # Bayesian likelihood conversion tables
    logger.py           # Loguru/Rich logging wrapper
    output.py           # TSV / Cytoscape SIF output writers
    recovery.py         # False-negative recovery via network proximity
    cli.py              # Typer CLI entry-point
    inputs/
      phosphosites.py   # OmniPath / PhosphoSitePlus / fallback fetcher
      string_network.py # STRING flat-file / REST API / fallback fetcher
scripts/
  backup.py                  # Legacy NetworKIN 3.0 reference script (Python 3 port)
  cleanup_HGNC_mapping.py    # HGNC symbol–Ensembl ID reconciliation utility
  generate_sample_data.py    # Generate offline fallback data files
  migrate_to_parquet.py      # Migrate legacy .txt conversion tables → Parquet
data/
  conversion_direct.parquet   # Pre-built likelihood tables (direct STRING paths)
  conversion_indirect.parquet # Pre-built likelihood tables (indirect STRING paths)
  fallback/                   # Bundled offline sample data
  string_data/                # STRING interaction flat files
tests/
  conftest.py             # pytest path setup (adds src/ to sys.path)
  test_motif_scoring.py
  test_output.py
  test_recovery.py
  test_networkin.py       # Tests for load_conversion_tables, detect_site_file_type, run_pipeline

See ARCHITECTURE.md for a detailed description of the execution flow.

Data sources

pynetphorest: kinase-group motif models (Python package).
STRING v12: human protein interactions and sequences. Downloaded from string-db.org.
OmniPath: phosphorylation site reference data (fetched live, cached locally).

This repository provides a modern reimplementation of the NetworKIN framework.

The original NetworKIN was described in Linding et al., Cell, 2007.
This implementation:
- Does NOT reuse original NetworKIN source code
- Replaces NetPhorest with pynetphorest
- Uses a rewritten likelihood model
- Implements a new modular pipeline

License: MIT

Release history

0.1.4 (Apr 6, 2026)
0.1.3 (Apr 6, 2026)
0.1.2 (Apr 6, 2026, this version)
0.1.1 (Apr 6, 2026)
0.1.0 (Apr 6, 2026)

Download files

No source distribution is available for this release. Built distribution:

pynetworkin_bio-0.1.2-py3-none-any.whl (448.0 kB)

Upload date: Apr 6, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Algorithm	Hash digest
SHA256	`c98ec5a7f8eec95fa367ebcbc794b604ddce08b4b70b3837acafa3b707d3149e`
MD5	`20c9a2d525d1367f133c28fe8b623edc`
BLAKE2b-256	`7f7e63849e10af52f63450c5687c5f9d37747d6a3b674c8772f31cc6cd497d6d`

pynetworkin-bio 0.1.2
