Python version of R package SLIDE

SLIDE_py

A collection of Python wrappers around the R code

Overview

SLIDE combines the LOVE (Latent Model-Based Clustering for Biological Discovery) clustering algorithm with knockoff-based statistical inference to identify significant standalone and interacting latent factors. The original R package is available at https://github.com/jishnu-lab/SLIDE

Quick Start

Basic Usage

If you run into trouble, feel free to use or clone the environment at /ix3/djishnu/alw399/envs/rhino

From command line

Use the full path to slide.py if you are not calling it from the directory that contains it.

python slide.py \
    --x_path /path/to/your/features.csv \
    --y_path /path/to/your/labels.csv \
    --out_path /path/to/output/directory

In a notebook

import sys
sys.path.append('src/SLIDE')

from slide import OptimizeSLIDE

# Configure input parameters
input_params = {
    'x_path': '/path/to/your/features.csv',
    'y_path': '/path/to/your/labels.csv',
    'fdr': 0.1,
    'thresh_fdr': 0.1,
    'spec': 0.2,
    'y_factor': True,
    'niter': 500,
    'SLIDE_top_feats': 20,
    'rep_CV': 50,
    'pure_homo': True,
    'delta': [0.01],
    'lambda': [0.5, 0.1],
    'out_path': '/path/to/output/directory'
}

# Initialize and run SLIDE
slider = OptimizeSLIDE(input_params)
slider.run_pipeline(verbose=True, n_workers=1)
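
Because the pipeline can run for a long time, it may help to check the parameter dictionary before constructing OptimizeSLIDE. The sketch below is one possible pre-flight check. The helper name check_input_params is hypothetical and not part of SLIDE_py, and the accepted range (0, 1] for fdr, thresh_fdr and spec is an assumption based on how these parameters are described in this README.

```python
from pathlib import Path

# Keys marked "Required" in the parameter table of this README.
REQUIRED_KEYS = ("x_path", "y_path", "out_path")


def check_input_params(params):
    """Return a list of human-readable problems found in `params`.

    Hypothetical helper for illustration; it is not part of SLIDE_py.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if key not in params:
            problems.append(f"missing required key: {key}")
    # Input files must already exist; the output directory is not checked here.
    for key in ("x_path", "y_path"):
        if key in params and not Path(params[key]).is_file():
            problems.append(f"{key} does not point to a file: {params[key]}")
    # Assumption: rate-like parameters are fractions in the interval (0, 1].
    for key in ("fdr", "thresh_fdr", "spec"):
        if key in params and not 0 < params[key] <= 1:
            problems.append(f"{key} should be in (0, 1], got {params[key]}")
    return problems
```

An empty list means no problems were found; otherwise the messages can be printed before calling run_pipeline().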

Pipeline Overview

run_pipeline() runs three stages:

Stage 1: Latent Factor Discovery

  • LOVE Algorithm: Runs the overlapping clustering algorithm to identify latent factors
  • Output: Generates the latent factors (z_matrix) representing underlying data structure

Stage 2: Statistical Inference with SLIDE

  • 2a) Standalone Factor Analysis: Uses knockoffs to identify statistically significant standalone latent factors
  • 2b) Interaction Analysis: Applies knockoffs to discover significant interacting latent factor pairs
  • Feature Selection: Controls false discovery rate (FDR) while maintaining statistical power
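
To make the FDR control in Stage 2 concrete, here is a minimal sketch of the generic knockoff filter threshold (the "knockoff+" rule of Barber and Candès). It is not taken from the SLIDE code, and the function names and the form of the statistics w are assumptions for illustration: each w[j] is taken to be large and positive when the real feature looks more important than its knockoff copy.

```python
import numpy as np


def knockoff_threshold(w, fdr=0.1, offset=1):
    """Data-dependent threshold of the knockoff filter.

    w      : feature statistics W_j (large positive = evidence for the feature)
    fdr    : target false discovery rate q
    offset : 1 gives the conservative "knockoff+" rule, 0 the original rule
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.unique(np.abs(w[w != 0])))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (offset + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= fdr:
            return t
    return np.inf  # no threshold meets the target: select nothing


def knockoff_select(w, fdr=0.1, offset=1):
    """Indices of features whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    return np.flatnonzero(w >= knockoff_threshold(w, fdr, offset))
```

Negative statistics act as an estimate of how many false positives sit above the threshold, which is why a stricter fdr pushes the threshold up and can leave nothing selected.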

Stage 3: Visualization

  • Control Plots: Generates diagnostic plots to assess model performance and statistical validity
  • Latent Factor Genes: For each latent factor, plots the top features whose absolute loading exceeds 0.05
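
The feature selection behind the latent factor plots can be sketched as follows. This is an illustration only: the function name top_features, the (n_features, n_factors) layout of the loading matrix, and the use of a top_n cap alongside the 0.05 cutoff are assumptions, not a description of the plotting code in SLIDE_py.

```python
import numpy as np


def top_features(loadings, feature_names, threshold=0.05, top_n=20):
    """Map each latent factor index to its strongest features.

    loadings      : array of shape (n_features, n_factors)
    feature_names : sequence of n_features names
    """
    loadings = np.asarray(loadings, dtype=float)
    result = {}
    for k in range(loadings.shape[1]):
        column = loadings[:, k]
        # Keep features whose absolute loading exceeds the threshold.
        keep = np.flatnonzero(np.abs(column) > threshold)
        # Order the kept features from largest to smallest magnitude.
        order = keep[np.argsort(-np.abs(column[keep]))][:top_n]
        result[k] = [(feature_names[i], float(column[i])) for i in order]
    return result
```

The sign of each loading is kept in the output so that positively and negatively loaded features can be colored differently in a plot.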

Parameter Configuration

| Parameter | Type | Description | Default/Example |
|---|---|---|---|
| x_path | str | Path to feature matrix CSV file | Required |
| y_path | str | Path to response labels CSV file | Required |
| fdr | float | False discovery rate threshold (Knockoffs) | 0.1 |
| thresh_fdr | float | FDR threshold for feature selection (LOVE) | 0.1 |
| spec | float | Minimum fraction of times a latent factor must be found significant to be included | 0.2 |
| y_factor | bool | Treat response as factor variable | True |
| niter | int | Number of iterations | 500 |
| SLIDE_top_feats | int | Number of top features to display | 20 |
| pure_homo | bool | Use homogeneous loadings for pure variables | True |
| delta | list | Regularization parameter(s) | [0.5, 0.1] |
| lambda | list | Penalty parameter(s) | [0.1] |
| out_path | str | Output directory path | Required |
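
The spec parameter reads as a selection-frequency cutoff. The sketch below shows one way such a cutoff could work, assuming that each of the niter iterations yields a set of significant latent factors and that a factor is kept when its selection frequency is at least spec. The function name stable_factors and the use of "at least" rather than "more than" are assumptions, not confirmed behavior of SLIDE_py.

```python
from collections import Counter


def stable_factors(selections, spec=0.2):
    """Keep factors selected in at least a fraction `spec` of iterations.

    selections : list with one collection of selected factor indices per iteration
    """
    n_iter = len(selections)
    if n_iter == 0:
        return []
    # Count each factor at most once per iteration.
    counts = Counter(f for selected in selections for f in set(selected))
    return sorted(f for f, c in counts.items() if c / n_iter >= spec)
```

Under this reading, raising spec trades sensitivity for stability: fewer latent factors pass, but the ones that do were significant in a larger share of iterations.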

Advanced Configuration

  • pure_homo=True: Forces pure variable loadings to be 1 (recommended)
  • pure_homo=False: Relaxes the constraint that pure variable loadings equal 1, without losing any guarantees. However, it is difficult to find the right delta parameter
  • n_workers: Intended to control parallelization (1 for sequential processing), but currently nothing is parallelized
  • verbose: Enables detailed progress reporting (implemented as print statements)

Project Structure

SLIDE_py/
├── src/
│   ├── SLIDE/              # Core SLIDE implementation
│   │   ├── slide.py        # Main Python interface
│   │   └── ...            # Supporting R functions
│   ├── LOVE-master/        # Original LOVE algorithm
│   │   ├── ...            # Original LOVE code (do not use)
│   │   └── ...            # pure_homo LOVE code (use carefully)
│   └── LOVE-SLIDE/         # SLIDE implementation of LOVE

Implementation Details

LOVE Algorithm Integration

  • Primary Implementation: Located in src/SLIDE/get_Latent_Factors.R
  • Alternative Version: Available in LOVE-master when pure_homo=False
  • Note: The original LOVE code in LOVE-master may yield different results than the SLIDE implementation and is provided for reference

To-do list

These files

  • YAML conversion: Since people already have pipelines set up, it would be convenient to have a function that reads YAML files into dictionaries
  • Other y_factor: Currently only binary y is accommodated.
  • Parallelization: Knockoffs can be made much faster. Please see select_short_freq in src/SLIDE/knockoffs.py. I was trying to use concurrent.futures / pqdm but I couldn't figure out the errors and gave up.
  • Correlation networks: I think networkx can make similar graph-like figures, but I'm not familiar with making them
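
For the parallelization item, one possible starting point is sketched below using only the standard library. It does not touch select_short_freq. The names run_iterations and task are hypothetical, with task standing in for one independent knockoff iteration that takes a seed. Whether the underlying knockoff code is safe to call from several threads is not something this sketch establishes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_iterations(task, n_iter, n_workers=1):
    """Call task(seed) for seed = 0..n_iter-1 and return results in seed order.

    `task` is a hypothetical stand-in for one knockoff iteration.
    """
    seeds = range(n_iter)
    if n_workers <= 1:
        # Sequential path: easiest to debug, identical results.
        return [task(seed) for seed in seeds]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map preserves input order and re-raises worker exceptions.
        return list(pool.map(task, seeds))
```

Keeping a sequential path for n_workers=1 makes it easy to compare results against the parallel path. Threads mainly help when the work releases the GIL; for pure-Python CPU-bound work, ProcessPoolExecutor has the same map interface but requires task to be a picklable top-level function.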

Download files

Source Distribution: loveslide-0.0.3.tar.gz (30.8 kB)

Built Distribution: loveslide-0.0.3-py3-none-any.whl (32.7 kB)
