Python version of R package SLIDE
Project description
SLIDE_py
A bunch of python wrappers for R code
Overview
SLIDE combines the LOVE (Latent Model-Based Clustering for Biological Discovery) clustering algorithm with knockoff-based statistical inference to identify significant standalone and interacting latent factors. Link to R package: [!https://github.com/jishnu-lab/SLIDE]
Quick Start
Basic Usage
If running into trouble, feel free to use or clone the environment here: /ix3/djishnu/alw399/envs/rhino
From command line
Use the full path if you are not calling slide.py from the same directory
python slide.py \
--x_path /path/to/your/features.csv \
--y_path /path/to/your/labels.csv \
--out_path /path/to/output/directory
In a notebook
import sys
sys.path.append('src/SLIDE')
from slide import OptimizeSLIDE
# Configure input parameters
input_params = {
'x_path': '/path/to/your/features.csv',
'y_path': '/path/to/your/labels.csv',
'fdr': 0.1,
'thresh_fdr': 0.1,
'spec': 0.2,
'y_factor': True,
'niter': 500,
'SLIDE_top_feats': 20,
'rep_CV': 50,
'pure_homo': True,
'delta': [0.01],
'lambda': [0.5, 0.1],
'out_path': '/path/to/output/directory'
}
# Initialize and run SLIDE
slider = OptimizeSLIDE(input_params)
slider.run_pipeline(verbose=True, n_workers=1)
Pipeline Overview
The run_pipeline() has three main parts:
Stage 1: Latent Factor Discovery
- LOVE Algorithm: Runs the overlapping clustering algorithm to identify latent factors
- Output: Generates the latent factors (z_matrix) representing underlying data structure
Stage 2: Statistical Inference with SLIDE
- 2a) Standalone Factor Analysis: Uses knockoffs to identify statistically significant standalone latent factors
- 2b) Interaction Analysis: Applies knockoffs to discover significant interacting latent factor pairs
- Feature Selection: Controls false discovery rate (FDR) while maintaining statistical power
Stage 3: Visualization
- Control Plots: Generates diagnostic plots to assess model performance and statistical validity
- Latent Factor Genes: For each latent factor, plots the top features with loadings > abs(0.05)
Parameter Configuration
| Parameter | Type | Description | Default/Example |
|---|---|---|---|
x_path |
str | Path to feature matrix CSV file | Required |
y_path |
str | Path to response labels CSV file | Required |
fdr |
float | False discovery rate threshold (Knockoffs) | 0.1 |
thresh_fdr |
float | FDR threshold for feature selection (LOVE) | 0.1 |
spec |
float | minimum % times an LF found to be significant in order to be included | 0.2 |
y_factor |
bool | Treat response as factor variable | True |
niter |
int | Number of iterations | 500 |
SLIDE_top_feats |
int | Number of top features to display | 20 |
pure_homo |
bool | Use homogeneous loadings for pure variables | True |
delta |
list | Regularization parameter(s) | [0.5, 0.1] |
lambda |
list | Penalty parameter(s) | [0.1] |
out_path |
str | Output directory path | Required |
Advanced Configuration
pure_homo=True: Forces pure variable loadings to be 1 (recommended)pure_homo=False: Relaxes the pure variable loading constraint being 1 without losing any guarantees. However, it is difficult to find the right delta parametern_workers: Controls parallelization (1 for sequential processing), but CURRENTLY NOTHING IS PARALLELIZEDverbose: Enables detailed progress reporting (just a bunch of print statements)
Project Structure
SLIDE_py/
├── src/
│ ├── SLIDE/ # Core SLIDE implementation
│ │ ├── slide.py # Main Python interface
│ │ └── ... # Supporting R functions
│ └── LOVE-master/ # Original LOVE algorithm
│ ├── ... # Original LOVE code (do not use)
│ ├── ... # pure_homo LOVE code (use carefully)
| └── LOVE-SLIDE/ # SLIDE implementation of LOVE
Implementation Details
LOVE Algorithm Integration
- Primary Implementation: Located in
src/SLIDE/get_Latent_Factors.R - Alternative Version: Available in
LOVE-masterwhenpure_homo=False - Note: The original LOVE code in
LOVE-mastermay yield different results than the SLIDE implementation and is provided for reference
To-do list
These files
Yaml conversion: Since people already have pipelines set up, it would be convenient to have a function to read yamls into dictionaries- Other y_factor: Currently only binary y is accomodated.
- Parallelization: Knockoffs can be made much faster. Please see
select_short_freqinsrc/SLIDE/knockoffs.py. I was trying to use concurrent futures/ pqdm but I couldn't figure out the errors and gave up. - Correlation networks: I think networkx can make similar graph-like figures, but I'm not familiar with making them
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file loveslide-0.0.6.tar.gz.
File metadata
- Download URL: loveslide-0.0.6.tar.gz
- Upload date:
- Size: 50.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f848a6f5975337528fa56ccc927f65164c4c4c8a6d8c6dddcc6ec4c5e4804fac
|
|
| MD5 |
46cdbef1f97dafd1557f5a8921353ed7
|
|
| BLAKE2b-256 |
3d591b3f9dad5db615ae517986a1b1af40d9ae9965a0ef37e7104be04b74d6e4
|
File details
Details for the file loveslide-0.0.6-py3-none-any.whl.
File metadata
- Download URL: loveslide-0.0.6-py3-none-any.whl
- Upload date:
- Size: 65.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3e4f7bab891f029f0d33cc1c126fa229e5cfe21eb9e9b2a1a7b370bb97fb928f
|
|
| MD5 |
18acc189b5e465c02510e91a647f8c1d
|
|
| BLAKE2b-256 |
ac48f55381fce5a138e5073a6ee3f0cd2f45e470c16ae8c9a85a195ff30c79d3
|