Skip to main content

MOSSN: sample-specific protein network inference from gene expression with a direct-coupled multi-omics extension

Project description

mossn

mossn packages the MOSSN algorithm for constructing sample-specific protein interaction networks from gene expression data, together with ablation variants and a direct-coupled multi-omics extension.

The main single-omics API is:

  • prepare_data(...)
  • run_single_sample(...)
  • run_samples(...)

Features

  • Sample-specific edge reweighting using gene-expression-derived correction scores.
  • Random walk with restart (RWR) to estimate node importance per sample.
  • User-tunable parameters in the main API:
    • gamma
    • rwr_alpha
    • seed_quantile
    • use_seed
    • use_rwr
    • use_correction
    • use_prior
  • Support for either:
    • a user-provided PPI links table
    • a user-provided networkx.Graph
  • Multi-omics extension:
    • direct-coupled cross-layer graph

Installation

pip install mossn

For local development:

pip install -e .

Quick Start

import pandas as pd
from mossn import prepare_data, run_single_sample

links = pd.DataFrame(
    {
        "protein1": ["A", "A", "B"],
        "protein2": ["B", "C", "C"],
        "score": [0.8, 0.6, 0.9],
    }
)

expression_data = pd.DataFrame(
    {
        "sample_1": [4.2, 7.1, 3.5],
        "sample_2": [5.3, 6.4, 2.8],
    },
    index=["A", "B", "C"],
)

graph, base_weights, expression_data = prepare_data(
    expression_data=expression_data,
    links=links,
    use_prior=True,
)
edge_table = run_single_sample(
    sample_id="sample_1",
    graph=graph,
    base_weights=base_weights,
    expression_data=expression_data,
    gamma=2.0,
    rwr_alpha=0.3,
    seed_quantile=0.9,
)

print(edge_table.head())

If you want to run the full expression matrix sample by sample:

from mossn import prepare_data, run_samples

graph, base_weights, expression_data = prepare_data(
    expression_data=expression_data,
    links=links,
)

edge_table = run_samples(
    graph=graph,
    base_weights=base_weights,
    expression_data=expression_data,
)

Example Data

If you want to try the packaged BLCA and STRING example files:

from mossn import prepare_data, run_single_sample
from mossn.example_data import load_example_expression, load_example_links

links = load_example_links()
expression_data = load_example_expression()

graph, base_weights, expression_data = prepare_data(
    expression_data=expression_data,
    links=links,
)

edge_table = run_single_sample(
    sample_id=expression_data.columns[0],
    graph=graph,
    base_weights=base_weights,
    expression_data=expression_data,
)

Main Parameters

The main single-omics workflow exposes three tunable parameters:

  • gamma: strength of expression-based edge reweighting
  • rwr_alpha: restart probability in random walk with restart
  • seed_quantile: expression quantile used to define seed genes

It also exposes four logical switches:

  • use_seed=True: use high-expression seed genes
  • use_rwr=True: use random walk with restart
  • use_correction=True: use expression-based edge correction
  • use_prior=True: use input edge weights instead of uniform weights

For example, if you do not want to use seed genes:

edge_table = run_single_sample(
    sample_id="sample_1",
    graph=graph,
    base_weights=base_weights,
    expression_data=expression_data,
    use_seed=False,
)

If you want to ignore prior edge weights and use a uniform network:

graph, base_weights, expression_data = prepare_data(
    expression_data=expression_data,
    links=links,
    use_prior=False,
    uniform_weight=1.0,
)

Use Your Own Network

You can also provide your own networkx.Graph instead of a links table. Edge weights are read from the weight attribute by default.

import networkx as nx
from mossn import prepare_data, run_single_sample
from mossn.example_data import load_example_expression

graph = nx.Graph()
graph.add_edge("A", "B", weight=0.8)
graph.add_edge("B", "C", weight=0.6)

expression_data = load_example_expression()
graph, base_weights, expression_data = prepare_data(
    expression_data=expression_data,
    graph=graph,
)

edge_table = run_single_sample(
    sample_id=expression_data.columns[0],
    graph=graph,
    base_weights=base_weights,
    expression_data=expression_data,
    gamma=1.5,
    rwr_alpha=0.2,
    seed_quantile=0.8,
)

Bundled Example Data

The package includes the following example datasets:

  • TCGA BLCA expression matrix
  • STRING-derived PPI links

You can access them with:

from mossn.example_data import (
    get_example_expression_path,
    get_example_links_path,
    load_example_expression,
    load_example_links,
)

Input format

PPI links

The links table must contain:

  • protein1
  • protein2
  • score

Expression matrix

The expression matrix must use:

  • rows as genes or proteins
  • columns as sample IDs

Main API

The recommended public API consists of six core functions.

1. prepare_data(...)

Prepare a single-omics background network from either:

  • a PPI links table
  • a user-provided networkx.Graph

Returns:

  • graph
  • base_weights
  • filtered expression_data

2. run_single_sample(...)

Run MOSSN for one sample.

Returns:

  • one sample-specific edge-weight table

3. run_samples(...)

Run MOSSN across multiple samples in an expression matrix.

Returns:

  • a combined edge-weight table for all requested samples

4. prepare_data_driven(...)

Construct a data-driven background network directly from the expression matrix when no external prior network is available.

Returns:

  • graph
  • base_weights
  • filtered expression_data

5. prepare_data_direct_coupled(...)

Prepare the direct-coupled multi-omics graph.

Returns:

  • graph
  • base_weights
  • filtered omic_data
  • exp_genes

6. run_direct_coupled_single_sample(...)

Run the direct-coupled multi-omics version for one sample.

Returns:

  • one sample-specific edge-weight table

The current package release supports only the direct-coupled multi-omics extension.

Direct-Coupled Multi-Omics Example

from mossn import prepare_data_direct_coupled, run_direct_coupled_single_sample

graph, base_weights, omic_data, exp_genes = prepare_data_direct_coupled(
    links=links,
    omic_data=omic_data,
    coupled_omics=["CNV"],
)

edge_table = run_direct_coupled_single_sample(
    sample_id=sample_id,
    graph=graph,
    base_weights=base_weights,
    omic_data=omic_data,
    exp_genes=exp_genes,
    coupled_omics=["CNV"],
)

License

mossn is distributed under a research-only license. Non-commercial research, teaching, and evaluation use are allowed. Commercial use requires prior written permission from the copyright holder.

Notes

  • The package expects matched identifiers between the network and omics tables.
  • The main single-sample API lets you set gamma, rwr_alpha, seed_quantile, use_seed, use_rwr, and use_correction directly.
  • Sample-specific normalization uses median and interquartile range (IQR).
  • Node importance is rank-normalized before computing final edge weights.
  • The data-driven mode first infers a background graph from expression correlations when an external reference network is unavailable.

Data-Driven Example

from mossn import prepare_data_driven, run_samples
from mossn.example_data import load_example_expression

expression_data = load_example_expression()
graph, base_weights, expression_data = prepare_data_driven(
    expression_data=expression_data,
    cor_threshold=0.9,
)

edge_table = run_samples(
    graph=graph,
    base_weights=base_weights,
    expression_data=expression_data,
    sample_ids=[expression_data.columns[0]],
)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mossn-0.1.1.tar.gz (3.0 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mossn-0.1.1-py3-none-any.whl (3.0 MB view details)

Uploaded Python 3

File details

Details for the file mossn-0.1.1.tar.gz.

File metadata

  • Download URL: mossn-0.1.1.tar.gz
  • Upload date:
  • Size: 3.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for mossn-0.1.1.tar.gz
Algorithm Hash digest
SHA256 6088dc2a0a660a154dbf598da49cc5470fdf6c6886ca02b4af52951ec2d6625a
MD5 25adf04bd8918cd66765bfe3931e3003
BLAKE2b-256 8de63066642aaf9a3bea34d5759079b2e9a508f68242ede72279b34fd8180a8f

See more details on using hashes here.

File details

Details for the file mossn-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: mossn-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for mossn-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c52545b4281a8fa99b5d68f59cc3baf37edfe404fa8324c5f564ec6a12afe4bc
MD5 f299eebad008d64e9e7ab8463593ac5a
BLAKE2b-256 f5502cbc5b3349a744c9d41ca159339f95782ab79599d76ebd7190a909098f81

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page