MOTCO: Multi-omics Trajectory Comparison — latent spaces (PLS, SNF) and group differences
MOTCO — Multi-omics Trajectory Comparison
MOTCO provides tooling to generate latent spaces from multi‑omics data and to test for group differences in multivariate trajectories within those spaces.
Two approaches are implemented to build latent spaces:
- Partial Least Squares Regression/Discriminant Analysis (PLSR/PLS‑DA)
- Similarity Network Fusion (SNF) with optional spectral embedding
Once a latent space is constructed, MOTCO includes statistics to estimate differences in multivariate trajectories between groups (magnitude, orientation/angle, and shape), with an option for permutation testing via RRPP (Residual Randomization in a Permutation Procedure).
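To make the RRPP idea concrete, here is a minimal NumPy sketch of a single residual-randomization step. It is not MOTCO's own implementation. It assumes the usual formulation: fit the reduced model, shuffle the rows of its residuals, add them back to the reduced fitted values, and refit the full model on the resulting pseudo-outcomes. The function name rrpp_pseudo_outcomes and the toy design matrices are hypothetical.

```python
import numpy as np

def rrpp_pseudo_outcomes(Y, X_reduced, rng):
    """Return one set of pseudo-outcomes by permuting reduced-model residuals."""
    # Least-squares fit of the reduced model
    coef, *_ = np.linalg.lstsq(X_reduced, Y, rcond=None)
    fitted = X_reduced @ coef
    residuals = Y - fitted
    # Shuffle whole rows so each sample's multivariate residual stays intact
    shuffled = residuals[rng.permutation(Y.shape[0])]
    return fitted + shuffled

rng = np.random.default_rng(0)
n = 8
group = np.repeat([0.0, 1.0], n // 2)
level = np.tile([0.0, 1.0], n // 2)
X_reduced = np.column_stack([np.ones(n), group, level])  # intercept + main effects
X_full = np.column_stack([X_reduced, group * level])     # adds the interaction term
Y = rng.normal(size=(n, 3))                              # toy outcome matrix

Y_star = rrpp_pseudo_outcomes(Y, X_reduced, rng)
# Refit the full model on the pseudo-outcomes; the statistics of interest
# would be recomputed from this fit once per permutation.
coef_full, *_ = np.linalg.lstsq(X_full, Y_star, rcond=None)
```

Repeating this step many times yields a reference distribution against which the observed statistics can be compared.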
This repository contains the core statistical routines in src/motco/stats and a command‑line interface for common tasks.
Install (with uv)
Prerequisites: Python 3.11+ and uv installed.
# Create and activate a virtual environment
uv venv
source .venv/bin/activate
# Install MOTCO in editable mode (and dependencies)
uv pip install -e .
# Verify CLI is available
motco --help
Alternatively, using pip:
python -m venv .venv
source .venv/bin/activate
pip install -e .
Quick Example
See examples/motco_example.ipynb for an end-to-end walkthrough using the bundled dataset.
The equivalent CLI commands:
# 1. Build latent space with PLS-DA
motco plsr --data tests/data/evo_649_sm_example1.csv --label-col taxa \
--cv1-splits 7 --cv2-splits 8 --n-repeats 5 --max-components 2 \
--out-table results/plsr_table.csv
# 2. Estimate group differences
motco de \
--Y results/latent_space.csv \
--model-matrix results/model_matrix.csv \
--ls-means results/ls_means.csv \
--contrast contrast.json \
--out-json results/de_result.json \
--out-observed results/ls_mean_vectors.csv
Command Line Interface
MOTCO exposes a single entry‑point motco with subcommands for PLSR/PLS‑DA, SNF, and Differential Effects (group differences).
1) PLS‑DA with double cross‑validation
motco plsr \
--data path/to/table.csv \
--label-col diagnosis \
--cv1-splits 7 --cv2-splits 8 --n-repeats 30 --max-components 50 \
--out-table results/plsr_models.csv
Options:
- Use --data for a single CSV containing predictors and a label column specified by --label-col.
- Or provide separate matrices via --x and --y CSV files (mutually exclusive with --data).
- If --y has a single column, it is treated as a label vector and will be one‑hot encoded internally; if it has multiple columns, it is treated as an already encoded class matrix.
- Outputs a table with the best model per outer CV repeat (LV and AUROC). The actual trained models are kept in memory; export of models is not included in the CLI at this time. If --out-table is omitted, the table is printed to stdout.
Input expectations:
- CSV files with samples in rows, features in columns. For --data, include a label column with binary or multi‑class outcome; it will be one‑hot encoded internally.
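As an illustration of the label handling described above, here is a small pandas sketch of one-hot encoding a label column. It only approximates what happens internally: the source does not say which encoder MOTCO uses, and the column names and values below are made up.

```python
import pandas as pd

# Hypothetical table: two feature columns plus a label column
table = pd.DataFrame({
    "feat1": [0.1, 0.4, 0.3, 0.9],
    "feat2": [1.2, 0.7, 0.5, 0.2],
    "diagnosis": ["control", "case", "control", "case"],
})

X = table.drop(columns="diagnosis")                # predictors only
Y = pd.get_dummies(table["diagnosis"], dtype=int)  # one indicator column per class

print(Y)
```

Each row of Y contains exactly one 1, marking the class of that sample.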
2) Similarity Network Fusion (SNF)
motco snf \
--input omics1.csv --input omics2.csv [--input omics3.csv ...] \
--K 20 --eps 0.5 --k 20 --t 20 \
--out-fused fused_affinity.csv \
--out-embedding spectral_embedding.csv
Notes:
- Each --input CSV must contain the same samples in the same order (rows = samples).
- --K and --eps are used when constructing per‑dataset affinity matrices; --k and --t control SNF neighborhood size and iterations.
- The fused similarity matrix is saved to --out-fused. If --out-embedding is provided, a 10‑dimensional spectral embedding is also computed and saved. If no output paths are provided, the fused matrix is printed to stdout.
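Because every --input file must list the same samples in the same order, it can help to verify this before calling motco snf. The sketch below does so with pandas on in-memory tables. It assumes the sample identifiers are stored in the row index, which is an assumption about your files rather than a documented MOTCO requirement; the helper name check_same_samples is hypothetical.

```python
import pandas as pd

def check_same_samples(tables):
    """Raise ValueError unless all tables share an identical, identically ordered index."""
    reference = tables[0].index
    for pos, table in enumerate(tables[1:], start=1):
        if not table.index.equals(reference):
            raise ValueError(f"table {pos} does not match the sample order of table 0")

# Toy omics tables with the same samples in a different order
omics1 = pd.DataFrame({"g1": [1.0, 2.0, 3.0]}, index=["s1", "s2", "s3"])
omics2 = pd.DataFrame({"m1": [0.5, 0.1, 0.9]}, index=["s3", "s1", "s2"])

# Reorder the second table to follow the first, then verify
omics2_aligned = omics2.loc[omics1.index]
check_same_samples([omics1, omics2_aligned])
```

The aligned tables can then be written back to CSV and passed as the --input files.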
3) Differential Effects (group differences)
Estimate differences in magnitude and direction between groups using least‑squares means, with optional permutation testing (RRPP).
motco de \
--Y latent_space.csv \
--model-matrix model_matrix.csv \
--ls-means ls_means.csv \
--contrast contrast.json \
--out-json de_result.json
# With permutations (RRPP)
motco de \
--Y latent_space.csv \
--model-full model_full.csv \
--model-reduced model_reduced.csv \
--ls-means ls_means.csv \
--contrast contrast.json \
--rrpp-permutations 999 \
--out-json rrpp_result.json
Where:
- latent_space.csv is the outcome matrix Y (e.g., coordinates in a latent space; rows = samples, columns = dimensions).
- model_matrix.csv is a design matrix (with intercept) aligned to Y rows. For RRPP, provide --model-full and --model-reduced (both with intercept).
- ls_means.csv contains the least‑squares means to compare (rows = groups/cohorts, columns = same dimensions as Y).
- contrast.json is a JSON array of index lists, where each inner list enumerates the cohort indices belonging to the same group. Example: [[0,1],[2,3]].
- Output JSON includes deltas, angles, and shapes as symmetric matrices (lists of lists). With --rrpp-permutations > 0, these are returned as lists of matrices per permutation: e.g., deltas[perm_idx] -> matrix.
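A contrast file in this format can be written with the standard json module. In the sketch below the cohort ordering (rows 0 and 1 of ls_means.csv forming one group, rows 2 and 3 the other) is purely illustrative, and the file is written to a temporary directory. The check that no index is repeated assumes each cohort belongs to exactly one group, which the source does not state explicitly.

```python
import json
import tempfile
from pathlib import Path

# Each inner list holds the LS-mean row indices that belong to one group
contrast = [[0, 1], [2, 3]]

out_dir = Path(tempfile.mkdtemp())
contrast_path = out_dir / "contrast.json"
contrast_path.write_text(json.dumps(contrast))

# Read the file back and run basic sanity checks
loaded = json.loads(contrast_path.read_text())
flat = [idx for group in loaded for idx in group]
if not all(isinstance(idx, int) for idx in flat):
    raise ValueError("contrast indices must be integers")
if len(flat) != len(set(flat)):
    # Assumption: a cohort index should appear in only one group
    raise ValueError("a cohort index appears in more than one group")
```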
Important: The sd.py utilities are now generalized and no longer assume dataset‑specific column names. See the Python API notes below for helpers to build model matrices and LS means directly from your group and level columns.
Python API (selected)
from motco.stats.pls import plsda_doubleCV
from motco.stats.snf import get_affinity_matrix, SNF, get_spectral
from motco.stats.sd import (
    estimate_difference,
    RRPP,
    center_matrix,
    get_model_matrix,
    build_ls_means,
    get_observed_vectors,
    pair_difference,
)
# Example: build model matrix and LS means from group/level columns
import pandas as pd
import numpy as np
# X_factors contains two categorical columns: 'group' and 'level'
X_factors = pd.DataFrame({
    'group': ['A','A','B','B'],
    'level': ['t0','t1','t0','t1'],
})
# Y is the outcome/feature matrix aligned by rows to X_factors (e.g., latent space)
Y = pd.DataFrame(np.random.randn(4, 3), columns=['z1','z2','z3'])
# Build design and estimate LS means for all group×level cells
X = get_model_matrix(X_factors, group_col='group', level_col='level', full=True)
ls = build_ls_means(
    group_levels=sorted(X_factors['group'].astype(str).unique()),
    level_levels=sorted(X_factors['level'].astype(str).unique()),
    full=True,
)
deltas, angles, shapes = estimate_difference(Y=Y.values, model_matrix=X, LS_means=ls, contrast=[[0,1],[2,3]])
# Two‑state comparison between two groups at two levels (angle & delta)
df = X_factors.copy()
df = pd.concat([df, Y], axis=1)
angle_deg, delta_mag = pair_difference(df, group_col='group', level_col='level', feature_cols=['z1','z2','z3'])
# Center features within groups (optional preprocessing)
df_centered = center_matrix(df, group_col='group', level_col='level', feature_cols=['z1','z2','z3'])
# RRPP with parallelism from Python API (optional)
# deltas_list, angles_list, shapes_list = RRPP(
#     Y=Y.values,
#     model_full=X,
#     model_reduced=X[:, :3],  # example reduced model
#     LS_means=ls,
#     contrast=[[0,1],[2,3]],
#     permutations=999,
#     n_jobs=-1,  # use all CPUs
# )
See inline docstrings in the modules under src/motco/stats/ for full details.
Inspecting LS-mean coordinates
Before running estimate_difference, use get_observed_vectors to see the predicted
mean position of each group × level cell in Y space:
from motco.stats.sd import get_observed_vectors
# X_factors: DataFrame with group_col and level_col
# Y: outcome matrix aligned to X_factors by row
obs = get_observed_vectors(X_factors, Y, group_col='group', level_col='level', full=True)
# Returns a DataFrame with MultiIndex (group, level) and columns matching Y
print(obs)
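For intuition: in a design that includes the group × level interaction, the least-squares prediction for each cell equals the plain mean of the samples in that cell. The pandas sketch below computes those cell means directly on toy data. Whether this matches get_observed_vectors exactly depends on how full=True builds the design, which is not spelled out here, so treat it as a cross-check rather than a replacement.

```python
import numpy as np
import pandas as pd

# Toy factors: two samples in each group × level cell
X_factors = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "level": ["t0", "t0", "t1", "t1", "t0", "t0", "t1", "t1"],
})
rng = np.random.default_rng(1)
Y = pd.DataFrame(rng.normal(size=(8, 2)), columns=["z1", "z2"])

# Mean position of every group × level cell, indexed by (group, level)
cell_means = pd.concat([X_factors, Y], axis=1).groupby(["group", "level"]).mean()
print(cell_means)
```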
Interpreting Results
estimate_difference returns three symmetric matrices; RRPP returns the same three quantities as lists containing one matrix per permutation:
| Output | Meaning |
|---|---|
| deltas | Absolute difference in trajectory magnitude (total path length) between group pairs. Larger = one group changed more than the other. |
| angles | Angle in degrees between trajectory orientations. 0° = same direction; 90° = orthogonal; 180° = exactly opposite. |
| shapes | Procrustes distance between trajectory shapes after removing size and orientation differences. 0 = identical shape. |
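To give a feel for the first two quantities, the sketch below computes a path-length difference and an angle for two toy trajectories with NumPy. It is a simplification: orientation is taken here as the net displacement from the first to the last point, which may differ from how MOTCO defines it, and the shape distance is omitted. The function names are hypothetical.

```python
import numpy as np

def path_length(traj):
    """Sum of Euclidean distances between consecutive trajectory points."""
    return float(np.linalg.norm(np.diff(traj, axis=0), axis=1).sum())

def angle_between(u, v):
    """Angle in degrees between two vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding slightly outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Rows are successive levels (e.g., t0, t1, t2); columns are latent dimensions
traj_a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
traj_b = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 3.0]])

delta = abs(path_length(traj_a) - path_length(traj_b))
angle = angle_between(traj_a[-1] - traj_a[0], traj_b[-1] - traj_b[0])
print(delta, angle)
```

Here the second trajectory is one unit longer than the first and the two point in perpendicular directions.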
P-values via RRPP: Use a right-tailed test with the add-one correction:
import numpy as np
def pvalue(samples, observed, i, j):
    # samples: per-permutation matrices; observed: scalar statistic for pair (i, j)
    vals = np.array([s[i, j] for s in samples])
    return (np.sum(vals >= observed) + 1) / (len(vals) + 1)
Significance threshold is typically α = 0.05.
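The same calculation can be applied to every pair at once. The sketch below assumes the permutation results are available as a list of equally sized matrices and that the observed matrix comes from a separate estimate_difference call. The random arrays merely stand in for real results, and pvalue_matrix is a hypothetical helper.

```python
import numpy as np

def pvalue_matrix(samples, observed):
    """Right-tailed p-values with add-one correction for every (i, j) entry."""
    stacked = np.stack(samples)                 # shape: (n_perm, g, g)
    exceed = (stacked >= observed).sum(axis=0)  # permutations at least as extreme
    return (exceed + 1) / (stacked.shape[0] + 1)

rng = np.random.default_rng(42)
# Stand-in data: 999 symmetric 2x2 "permutation" matrices and one observed matrix
samples = []
for _ in range(999):
    m = np.abs(rng.normal(size=(2, 2)))
    samples.append((m + m.T) / 2)
observed = np.array([[0.0, 10.0], [10.0, 0.0]])

pvals = pvalue_matrix(samples, observed)
significant = pvals < 0.05  # boolean mask at alpha = 0.05
```

With the add-one correction, the smallest attainable p-value for 999 permutations is 1/1000.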
Breaking changes
- The statistical helpers in motco.stats.sd were generalized from dataset‑specific assumptions (e.g., columns like PTGENDER and DX) to explicit group_col and level_col parameters. There are no defaults; you must provide your column names. Legacy parameter names (e.g., sex_col) are removed.
License
This project is licensed under the terms of the LICENSE file included in this repository.