
Python version of R package SLIDE


SLIDE_py

A bunch of Python wrappers for the SLIDE R code

Overview

SLIDE combines the LOVE (Latent model-based OVErlapping clustering) algorithm with knockoff-based statistical inference to identify significant standalone and interacting latent factors. Link to R package: https://github.com/jishnu-lab/SLIDE

Quick Start

Basic Usage

If you run into trouble, feel free to use or clone the environment at /ix3/djishnu/alw399/envs/rhino

From command line

Use the full path to slide.py if you are not calling it from the directory that contains it.

python slide.py \
    --x_path /path/to/your/features.csv \
    --y_path /path/to/your/labels.csv \
    --out_path /path/to/output/directory
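The three flags correspond to the x_path, y_path and out_path keys used in the notebook configuration. Below is a minimal sketch of how such flags can be turned into a parameter dictionary with argparse; `build_parser` and `parse_params` are hypothetical helpers, and this is not necessarily how slide.py parses its own arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the three required flags in the command above
    parser = argparse.ArgumentParser(description="Run SLIDE from the command line")
    parser.add_argument("--x_path", required=True, help="Feature matrix CSV")
    parser.add_argument("--y_path", required=True, help="Response labels CSV")
    parser.add_argument("--out_path", required=True, help="Output directory")
    return parser


def parse_params(argv=None) -> dict:
    # vars() converts the argparse Namespace into a plain dict keyed by flag name
    return vars(build_parser().parse_args(argv))
```

Returning a plain dict means the command-line path and the notebook path could share one `input_params` format.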
In a notebook
import sys
sys.path.append('src/SLIDE')

from slide import OptimizeSLIDE

# Configure input parameters
input_params = {
    'x_path': '/path/to/your/features.csv',
    'y_path': '/path/to/your/labels.csv',
    'fdr': 0.1,
    'thresh_fdr': 0.1,
    'spec': 0.2,
    'y_factor': True,
    'niter': 500,
    'SLIDE_top_feats': 20,
    'rep_CV': 50,
    'pure_homo': True,
    'delta': [0.01],
    'lambda': [0.5, 0.1],
    'out_path': '/path/to/output/directory'
}

# Initialize and run SLIDE
slider = OptimizeSLIDE(input_params)
slider.run_pipeline(verbose=True, n_workers=1)
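Because a full run can take a while, it may be worth checking that the two CSV files line up before calling run_pipeline(). The sketch below assumes (this is not stated above) that both files have samples as rows with sample IDs in the first column, and that the labels file has a single response column. `check_inputs` is a hypothetical helper, not part of SLIDE.

```python
import pandas as pd


def check_inputs(x_path: str, y_path: str, y_factor: bool = True):
    """Load x and y and fail early if they do not line up (hypothetical helper)."""
    # Assumption: rows are samples and the first column holds sample IDs
    x = pd.read_csv(x_path, index_col=0)
    y_df = pd.read_csv(y_path, index_col=0)
    if y_df.shape[1] != 1:
        raise ValueError(f"expected one response column, found {y_df.shape[1]}")
    y = y_df.iloc[:, 0]
    # Same sample IDs in the same order
    if not x.index.equals(y.index):
        raise ValueError("sample IDs in x and y differ or are in a different order")
    if x.isna().to_numpy().any():
        raise ValueError("feature matrix contains missing values")
    # The to-do list notes that only a binary y is currently supported
    if y_factor and y.nunique() != 2:
        raise ValueError(f"expected a binary response, found {y.nunique()} distinct values")
    return x, y
```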

Pipeline Overview

run_pipeline() runs in three main stages:

Stage 1: Latent Factor Discovery

  • LOVE Algorithm: Runs the overlapping clustering algorithm to identify latent factors
  • Output: Generates the latent factors (z_matrix) representing underlying data structure

Stage 2: Statistical Inference with SLIDE

  • 2a) Standalone Factor Analysis: Uses knockoffs to identify statistically significant standalone latent factors
  • 2b) Interaction Analysis: Applies knockoffs to discover significant interacting latent factor pairs
  • Feature Selection: Controls false discovery rate (FDR) while maintaining statistical power
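Knockoff-based selection typically ends with a data-dependent threshold on one statistic W_j per variable, where a large positive W_j is evidence that variable j matters. The sketch below implements the knockoff+ threshold of Barber and Candès, with the fdr parameter as the target level. It does not show how SLIDE computes W or which knockoff variant it uses, so treat it as illustrative only.

```python
import numpy as np


def knockoff_plus_select(W, fdr: float = 0.1) -> np.ndarray:
    """Return indices j with W_j >= T, where T is the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes of W, smallest first
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at t; the "+1" makes it knockoff+
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= fdr:
            return np.flatnonzero(W >= t)
    # No threshold reaches the target level: select nothing
    return np.array([], dtype=int)
```

Because of the +1 in the numerator, nothing can be selected at fdr = 0.1 unless at least 10 statistics clear the threshold.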

Stage 3: Visualization

  • Control Plots: Generates diagnostic plots to assess model performance and statistical validity
  • Latent Factor Genes: For each latent factor, plots the top features whose absolute loading exceeds 0.05
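Selecting the features to plot for a latent factor comes down to filtering by the 0.05 absolute-loading cutoff and ranking by magnitude. The sketch below assumes a loadings DataFrame with features as rows and latent factors as columns. `top_features` is a hypothetical helper, and its `top_n` argument stands in for what SLIDE_top_feats appears to control; that mapping is an assumption.

```python
import pandas as pd


def top_features(loadings: pd.DataFrame, lf: str,
                 cutoff: float = 0.05, top_n: int = 20) -> pd.Series:
    """Features of one latent factor with |loading| > cutoff, largest magnitude first."""
    col = loadings[lf]
    kept = col[col.abs() > cutoff]
    # Rank by absolute loading but keep the signed values for plotting
    order = kept.abs().sort_values(ascending=False).index
    return kept.loc[order].head(top_n)
```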

Parameter Configuration

| Parameter | Type | Description | Default/Example |
|---|---|---|---|
| x_path | str | Path to feature matrix CSV file | Required |
| y_path | str | Path to response labels CSV file | Required |
| fdr | float | False discovery rate threshold (knockoffs) | 0.1 |
| thresh_fdr | float | FDR threshold for feature selection (LOVE) | 0.1 |
| spec | float | Minimum fraction of runs (0.2 = 20%) in which a latent factor (LF) must be found significant to be included | 0.2 |
| y_factor | bool | Treat the response as a factor (categorical) variable | True |
| niter | int | Number of iterations | 500 |
| SLIDE_top_feats | int | Number of top features to display | 20 |
| rep_CV | int | Number of cross-validation repetitions | 50 |
| pure_homo | bool | Use homogeneous loadings for pure variables | True |
| delta | list | Regularization parameter(s) | [0.01] |
| lambda | list | Penalty parameter(s) | [0.5, 0.1] |
| out_path | str | Output directory path | Required |
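The spec parameter filters latent factors by how often they come out significant across repeated runs. The sketch below assumes that each run yields a set of selected LF indices. It counts how often each LF appears and keeps those selected in at least a spec fraction of runs. `stable_factors` is hypothetical, and whether SLIDE compares with >= or > at the boundary is also an assumption.

```python
from collections import Counter


def stable_factors(selections: list[set[int]], spec: float = 0.2) -> list[int]:
    """Latent factors selected in at least a `spec` fraction of runs."""
    n_runs = len(selections)
    if n_runs == 0:
        return []
    # Count, for each LF, the number of runs in which it was selected
    counts = Counter(lf for run in selections for lf in run)
    return sorted(lf for lf, c in counts.items() if c / n_runs >= spec)
```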

Advanced Configuration

  • pure_homo=True: Forces pure variable loadings to be 1 (recommended)
  • pure_homo=False: Relaxes the constraint that pure variable loadings equal 1 without losing any guarantees. However, it is harder to find the right delta parameter
  • n_workers: Intended to control parallelization (1 for sequential processing), but currently nothing is parallelized
  • verbose: Enables detailed progress reporting (via print statements)

Project Structure

SLIDE_py/
├── src/
│   ├── SLIDE/              # Core SLIDE implementation
│   │   ├── slide.py        # Main Python interface
│   │   └── ...             # Supporting R functions
│   ├── LOVE-master/        # Original LOVE algorithm
│   │   ├── ...             # Original LOVE code (do not use)
│   │   └── ...             # pure_homo LOVE code (use carefully)
│   └── LOVE-SLIDE/         # SLIDE implementation of LOVE

Implementation Details

LOVE Algorithm Integration

  • Primary Implementation: Located in src/SLIDE/get_Latent_Factors.R
  • Alternative Version: Available in LOVE-master when pure_homo=False
  • Note: The original LOVE code in LOVE-master may yield different results than the SLIDE implementation and is provided for reference

To-do list

These files

  • YAML conversion: Since people already have pipelines set up, it would be convenient to have a function that reads YAML config files into parameter dictionaries.
  • Other y_factor: Currently only a binary y is accommodated.
  • Parallelization: Knockoffs could be made much faster; see select_short_freq in src/SLIDE/knockoffs.py. I tried concurrent.futures and pqdm but could not resolve the errors.
  • Correlation networks: networkx could probably produce similar graph-like figures, but this has not been implemented yet.
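For the parallelization item, one common pattern is to give every iteration its own independent random stream. With that setup, sequential and parallel runs produce identical results. The sketch below uses concurrent.futures and NumPy's SeedSequence. `one_iteration` is a hypothetical stand-in for a single knockoff run, not select_short_freq itself. With ProcessPoolExecutor the worker must be a top-level (picklable) function, and on platforms that spawn processes the call should sit under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def one_iteration(seed: np.random.SeedSequence) -> float:
    # Hypothetical stand-in for one knockoff run; draws from its own RNG stream
    rng = np.random.default_rng(seed)
    return float(rng.standard_normal(100).mean())


def run_iterations(worker, niter: int, n_workers: int = 1, seed: int = 0,
                   executor_cls=ProcessPoolExecutor) -> list:
    # One independent child seed per iteration, fully determined by `seed`
    seeds = np.random.SeedSequence(seed).spawn(niter)
    if n_workers <= 1:
        return [worker(s) for s in seeds]
    with executor_cls(max_workers=n_workers) as pool:
        # map() yields results in input order, matching the sequential path
        return list(pool.map(worker, seeds))
```

Passing executor_cls=ThreadPoolExecutor avoids pickling, but it only speeds things up when the heavy work releases the GIL.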

Download files

Source Distribution

loveslide-0.0.5.tar.gz (19.1 kB)

Uploaded Source

Built Distribution


loveslide-0.0.5-py3-none-any.whl (18.4 kB)

Uploaded Python 3

File details

Details for the file loveslide-0.0.5.tar.gz.

File metadata

  • Download URL: loveslide-0.0.5.tar.gz
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for loveslide-0.0.5.tar.gz
Algorithm Hash digest
SHA256 c371c9847b21d10657a5a5c02d88e1ba08512bb368ddbbba316362649449e198
MD5 81faa8768f9a8763e788d5bc81270fe6
BLAKE2b-256 a103077e89174726f7334a3eaec9624353141ae9651c4fd666061a01158b7166


File details

Details for the file loveslide-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: loveslide-0.0.5-py3-none-any.whl
  • Size: 18.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for loveslide-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 e5daaafd22afd0a8aee802c0f7c42e57d7fa2f4f82625858566de5d42ed8694d
MD5 123151604e2e4102f920e4221f57dcd8
BLAKE2b-256 c4c7d0b2b7643bebf4e6a0e871dd693679be13019b67e20571398bafc73f1610

