Skip to main content

Python version of R package SLIDE

Project description

SLIDE_py

A bunch of python wrappers for R code

Overview

SLIDE combines the LOVE (Latent Model-Based Clustering for Biological Discovery) clustering algorithm with knockoff-based statistical inference to identify significant standalone and interacting latent factors. Link to R package: [!https://github.com/jishnu-lab/SLIDE]

Quick Start

Basic Usage

If running into trouble, feel free to use or clone the environment here: /ix3/djishnu/alw399/envs/rhino

From command line

Use the full path if you are not calling slide.py from the same directory

python slide.py \
    --x_path /path/to/your/features.csv \
    --y_path /path/to/your/labels.csv \
    --out_path /path/to/output/directory
In a notebook
import sys
sys.path.append('src/SLIDE')

from slide import OptimizeSLIDE

# Configure input parameters
input_params = {
    'x_path': '/path/to/your/features.csv',
    'y_path': '/path/to/your/labels.csv',
    'fdr': 0.1,
    'thresh_fdr': 0.1,
    'spec': 0.2,
    'y_factor': True,
    'niter': 500,
    'SLIDE_top_feats': 20,
    'rep_CV': 50,
    'pure_homo': True,
    'delta': [0.01],
    'lambda': [0.5, 0.1],
    'out_path': '/path/to/output/directory'
}

# Initialize and run SLIDE
slider = OptimizeSLIDE(input_params)
slider.run_pipeline(verbose=True, n_workers=1)

Pipeline Overview

The run_pipeline() has three main parts:

Stage 1: Latent Factor Discovery

  • LOVE Algorithm: Runs the overlapping clustering algorithm to identify latent factors
  • Output: Generates the latent factors (z_matrix) representing underlying data structure

Stage 2: Statistical Inference with SLIDE

  • 2a) Standalone Factor Analysis: Uses knockoffs to identify statistically significant standalone latent factors
  • 2b) Interaction Analysis: Applies knockoffs to discover significant interacting latent factor pairs
  • Feature Selection: Controls false discovery rate (FDR) while maintaining statistical power

Stage 3: Visualization

  • Control Plots: Generates diagnostic plots to assess model performance and statistical validity
  • Latent Factor Genes: For each latent factor, plots the top features with loadings > abs(0.05)

Parameter Configuration

Parameter Type Description Default/Example
x_path str Path to feature matrix CSV file Required
y_path str Path to response labels CSV file Required
fdr float False discovery rate threshold (Knockoffs) 0.1
thresh_fdr float FDR threshold for feature selection (LOVE) 0.1
spec float minimum % times an LF found to be significant in order to be included 0.2
y_factor bool Treat response as factor variable True
niter int Number of iterations 500
SLIDE_top_feats int Number of top features to display 20
pure_homo bool Use homogeneous loadings for pure variables True
delta list Regularization parameter(s) [0.5, 0.1]
lambda list Penalty parameter(s) [0.1]
out_path str Output directory path Required

Advanced Configuration

  • pure_homo=True: Forces pure variable loadings to be 1 (recommended)
  • pure_homo=False: Relaxes the pure variable loading constraint being 1 without losing any guarantees. However, it is difficult to find the right delta parameter
  • n_workers: Controls parallelization (1 for sequential processing), but CURRENTLY NOTHING IS PARALLELIZED
  • verbose: Enables detailed progress reporting (just a bunch of print statements)

Project Structure

SLIDE_py/
├── src/
│   ├── SLIDE/              # Core SLIDE implementation
│   │   ├── slide.py        # Main Python interface
│   │   └── ...            # Supporting R functions
│   └── LOVE-master/        # Original LOVE algorithm
│       ├── ...            # Original LOVE code (do not use)
│       ├── ...            # pure_homo LOVE code (use carefully)
|   └── LOVE-SLIDE/        # SLIDE implementation of LOVE

Implementation Details

LOVE Algorithm Integration

  • Primary Implementation: Located in src/SLIDE/get_Latent_Factors.R
  • Alternative Version: Available in LOVE-master when pure_homo=False
  • Note: The original LOVE code in LOVE-master may yield different results than the SLIDE implementation and is provided for reference

To-do list

These files

  • Yaml conversion: Since people already have pipelines set up, it would be convenient to have a function to read yamls into dictionaries
  • Other y_factor: Currently only binary y is accomodated.
  • Parallelization: Knockoffs can be made much faster. Please see select_short_freq in src/SLIDE/knockoffs.py. I was trying to use concurrent futures/ pqdm but I couldn't figure out the errors and gave up.
  • Correlation networks: I think networkx can make similar graph-like figures, but I'm not familiar with making them

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

loveslide-0.0.6.tar.gz (50.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

loveslide-0.0.6-py3-none-any.whl (65.3 kB view details)

Uploaded Python 3

File details

Details for the file loveslide-0.0.6.tar.gz.

File metadata

  • Download URL: loveslide-0.0.6.tar.gz
  • Upload date:
  • Size: 50.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for loveslide-0.0.6.tar.gz
Algorithm Hash digest
SHA256 f848a6f5975337528fa56ccc927f65164c4c4c8a6d8c6dddcc6ec4c5e4804fac
MD5 46cdbef1f97dafd1557f5a8921353ed7
BLAKE2b-256 3d591b3f9dad5db615ae517986a1b1af40d9ae9965a0ef37e7104be04b74d6e4

See more details on using hashes here.

File details

Details for the file loveslide-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: loveslide-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 65.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for loveslide-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 3e4f7bab891f029f0d33cc1c126fa229e5cfe21eb9e9b2a1a7b370bb97fb928f
MD5 18acc189b5e465c02510e91a647f8c1d
BLAKE2b-256 ac48f55381fce5a138e5073a6ee3f0cd2f45e470c16ae8c9a85a195ff30c79d3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page