Differential gene expression analysis via Monte-Carlo Machine Learning Inference and Network Analysis

GeneLens: Integrated DEG Analysis & Biomarker Prediction

Overview

GeneLens is a Python package for functional analysis of differentially expressed genes (DEGs) and biomarker prediction, integrating:

Machine learning-based biomarker identification
Graph-based prediction of gene function via analysis of protein-protein interaction networks

Key applications:

Identification of biomarkers
Analysis of gene-gene networks

Features

Core Modules

FSelector
- Machine learning pipeline for biomarker discovery
- Features:
  - Automatic Monte Carlo simulation of stable models
  - Automated model training/tuning
  - Feature importance analysis
  - Customizable thresholds
NetAnalyzer
- Implements a graph-based algorithm (Osmak et al. 2020, 2021)
- Predicts gene functions via topological analysis of molecular networks
- Features:
  - Automated network construction
  - Pathway enrichment
  - Integration with feature importance scores from FSelector
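
The Monte Carlo idea behind FSelector can be illustrated with a minimal sketch. It is not the GeneLens implementation: it assumes an L1-regularized logistic regression from scikit-learn as the base model, and `selection_frequency`, its parameters, and the synthetic data are hypothetical names introduced for illustration. The model is refit on many random subsamples, and each feature is scored by how often it receives a non-zero weight.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def selection_frequency(X, y, C=0.1, n_iter=50, sample_frac=0.7, seed=0):
    """Fraction of random subsamples in which each feature gets a non-zero weight.

    Hypothetical helper, not part of the GeneLens API.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_draw = int(sample_frac * n_samples)
    counts = np.zeros(n_features)
    for _ in range(n_iter):
        # Draw a random subsample of the rows without replacement.
        idx = rng.choice(n_samples, size=n_draw, replace=False)
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X[idx], y[idx])
        # The L1 penalty drives uninformative weights to exactly zero.
        counts += (model.coef_[0] != 0)
    return counts / n_iter


# Synthetic data (assumption): 80 samples, 20 features, only features 0 and 1 carry signal.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 20))
X[:, :2] += np.where(y[:, None] == 1, 1.0, -1.0)

freq = selection_frequency(X, y)
stable = np.flatnonzero(freq >= 0.9)  # features selected in at least 90% of the runs
print(stable)
```

Features that are selected in nearly every run are stable candidates, while features that appear only occasionally are likely artifacts of a particular subsample.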

Additional Capabilities

Standardized analysis pipelines
Interactive network visualizations
Support for multi-omics data integration

Installation

pip install genelens

Example of use

from genelens.fselector import FeatureSelector, get_feature_space, fsplot
from genelens import netanalyzer, enrichment
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import networkx as nx
from importlib.resources import files

# Load the example data bundled with the package
data = pd.read_csv(files("genelens").joinpath("data/exampl_data/train_test.csv"), index_col=0)

X = data.drop('index', axis=1)
y = list(map(int, data['index'] == 'HCM'))

print(X.shape)

(145, 14830)

# FeatureSelector initialization
FS_model = FeatureSelector(
    X, y,
    C=None,  # no regularization coefficient given, so the optimal C is searched for
    C_space=np.linspace(0.0001, 1, 20),  # larger search space -> more precision, more CPU time
    C_finder_iter=10,
    cut_off_frac_estimation=True,
    cut_off_frac_model=0,
    cut_off_estim_params={'max_feature': 50},  # early stopping; larger feature space -> more precision, more CPU time
)

The regularization coefficient was not specified, the search for the optimal C was started


processing: 100%|██████████| 10/10 [02:20<00:00, 14.07s/it]


Optimal regularization coefficient (С) =  0.053
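
How a regularization coefficient can be chosen from a grid is sketched below. This is an assumption about the general technique, not a description of what `FeatureSelector` does internally: the sketch scores each candidate C by mean cross-validated accuracy of an L1-regularized logistic regression. The helper `find_best_c` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def find_best_c(X, y, c_space, cv=5):
    """Return the C with the highest mean cross-validated accuracy (hypothetical helper)."""
    mean_scores = {}
    for c in c_space:
        model = LogisticRegression(penalty="l1", solver="liblinear", C=float(c))
        mean_scores[float(c)] = cross_val_score(model, X, y, cv=cv).mean()
    best_c = max(mean_scores, key=mean_scores.get)
    return best_c, mean_scores


# Synthetic data (assumption): two informative features out of 20.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 20))
X[:, :2] += np.where(y[:, None] == 1, 1.0, -1.0)

c_space = np.linspace(0.0001, 1, 20)  # same grid as C_space in the example
best_c, mean_scores = find_best_c(X, y, c_space)
print(f"best C = {best_c:.3f}, accuracy = {mean_scores[best_c]:.2f}")
```

A denser grid gives a finer estimate of C at the cost of one full cross-validation per extra grid point, which matches the precision versus CPU time trade-off noted in the comments above.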

Prefit model for cutoff weight level estimation


fit model: 100%|██████████| 3000/3000 [12:38<00:00,  3.95it/s]


Prefit done
Serching cutoff level for feature weights... 0

feature space analysis:  13%|█▎        | 27/213 [00:07<00:51,  3.59it/s]

1

feature space analysis:  13%|█▎        | 27/213 [00:07<00:51,  3.60it/s]

2

feature space analysis:  13%|█▎        | 27/213 [00:07<00:51,  3.59it/s]

3

feature space analysis:  13%|█▎        | 27/213 [00:07<00:51,  3.59it/s]

4

feature space analysis:  13%|█▎        | 27/213 [00:07<00:51,  3.61it/s]

5

feature space analysis:  13%|█▎        | 27/213 [00:07<00:52,  3.57it/s]

6

feature space analysis:  13%|█▎        | 27/213 [00:07<00:51,  3.60it/s]

7

feature space analysis:  13%|█▎        | 27/213 [00:07<00:52,  3.57it/s]

8

feature space analysis:  13%|█▎        | 27/213 [00:07<00:51,  3.61it/s]

9

feature space analysis:  13%|█▎        | 27/213 [00:07<00:52,  3.56it/s]

optimal cut of weight level =  0.72

FS_model.fit(max_iter=2700, log=True, feature_resample=0)  # larger max_iter -> more precision, more CPU time

fit model: 100%|██████████| 2700/2700 [11:20<00:00,  3.97it/s]

fsplots = fsplot(FS_model)
fsplots.plot_all(fontsize=25, labels=['a.', 'b.', 'c.', 'd.', 'e.', 'f.'], 
                left=0.1, right=0.9, top=0.9, bottom=0.1, hspace=0.5, wspace=0.5)
plt.show()

[Figure: plots produced by fsplots.plot_all, panels a. to f.]

print(get_feature_space([FS_model], cut_off_level=0.75))

{'MYH6', 'RASD1'}

FS_model.best_features

{'RASD1': np.float64(0.9510623822037754),
 'MYH6': np.float64(0.8420449794132905)}
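
The two selection styles used in this example, a score cutoff and a top-N slice, can be sketched on a plain dictionary. `select_features` is a hypothetical helper, not part of GeneLens, `GENE_X` is an invented entry added to show a feature being filtered out, and whether GeneLens compares with `>=` or `>` at the cutoff is an assumption here.

```python
def select_features(scores, cut_off_level=None, top_n=None):
    """Rank features by score, then keep those above a cutoff and/or the top N.

    Hypothetical helper for illustration only.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if cut_off_level is not None:
        # Assumption: a score equal to the cutoff is kept.
        ranked = [(gene, s) for gene, s in ranked if s >= cut_off_level]
    if top_n is not None:
        ranked = ranked[:top_n]
    return dict(ranked)


scores = {
    "MYH6": 0.8420449794132905,
    "RASD1": 0.9510623822037754,
    "GENE_X": 0.40,  # invented low-scoring feature
}

print(select_features(scores, cut_off_level=0.75))
print(select_features(scores, top_n=1))
```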

Network Enrichment Analysis

GenGenNetwork = netanalyzer.MainNet()  # load the STRING database and build the gene-gene interaction network
GenGenNetwork.get_LCC()  # extract the largest connected component (LCC) of the network

LCC was extracted
Total connected components=146, LCC cardinality=9844
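
Extracting a largest connected component can be reproduced on a toy graph with networkx alone. The sketch below is not the `get_LCC` implementation; `largest_connected_component` and the toy edges are hypothetical.

```python
import networkx as nx


def largest_connected_component(graph):
    """Return a copy of the largest connected component and the component count."""
    components = list(nx.connected_components(graph))
    largest = max(components, key=len)
    return graph.subgraph(largest).copy(), len(components)


# Toy graph (assumption): one 4-node chain, one 2-node pair, one isolated node.
toy = nx.Graph()
toy.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")])
toy.add_node("G")

lcc, n_components = largest_connected_component(toy)
print(f"Total connected components={n_components}, LCC cardinality={lcc.number_of_nodes()}")
```

Restricting the analysis to the LCC guarantees that a path exists between any two remaining genes, which the subgraph step below relies on.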

GenGenNetwork.minimum_connected_subgraph(FS_model.best_features)

RASD1 absent from LCC, excluded from further analysis
CDC42EP4 absent from LCC, excluded from further analysis

mst-graph was extracted
Initial core feature=1, mst-graph cardinality=0

Two of the three selected genes are missing from the version of the STRING database used here, so an mst-graph cannot be constructed. To continue the analysis, we select the top 10 genes sorted by their Score.

GenGenNetwork.minimum_connected_subgraph(dict(list(FS_model.all_features.items())[:10]))

RASD1 absent from LCC, excluded from further analysis
CDC42EP4 absent from LCC, excluded from further analysis
ZFP36 absent from LCC, excluded from further analysis

mst-graph was extracted
Initial core feature=7, mst-graph cardinality=17
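
One way to connect a set of core genes through a network is an approximate Steiner tree, sketched here with networkx. This is a stand-in technique chosen for illustration; the algorithm actually used by `minimum_connected_subgraph` is not documented in this README. `connect_core_genes` and the toy graph are hypothetical.

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree


def connect_core_genes(graph, core_genes):
    """Connect the core genes found in the graph with an approximate Steiner tree.

    Returns the tree and the list of core genes absent from the graph.
    Hypothetical helper for illustration only.
    """
    present = [gene for gene in core_genes if gene in graph]
    absent = [gene for gene in core_genes if gene not in graph]
    if len(present) < 2:
        # A single remaining gene cannot be connected to anything.
        return nx.Graph(), absent
    return steiner_tree(graph, present, weight="weight"), absent


# Toy connected graph (assumption) with unit edge weights.
toy = nx.Graph()
toy.add_edges_from(
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "F")],
    weight=1,
)

tree, absent = connect_core_genes(toy, ["A", "D", "NOT_IN_GRAPH"])
print(sorted(tree.nodes()), absent)
```

The intermediate nodes pulled in to link the core genes are what make the resulting subgraph larger than the initial core set, as in the log above where 7 core features yield a graph of 17 nodes.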

pos = nx.circular_layout(GenGenNetwork.mst_subgraph)

nx.draw(
    GenGenNetwork.mst_subgraph,
    pos,
    with_labels=True,       
    node_color='skyblue',   
    edge_color='gray',      
    node_size=2000,         
    font_size=15            
)

# Show the graph
plt.show()

[Figure: the mst-subgraph drawn with a circular layout]

enrich_res = enrichment.reactome_enrichment(list(GenGenNetwork.mst_subgraph.nodes()), species='Homo sapiens')
enrich_res = enrichment.reac_pars(enrich_res)
G_enrich = enrichment.get_net(enrich_res)  # graph of signaling pathways

reactome_df, raw_res = enrichment.dendro_reactome_to_pandas(enrich_res, G_enrich)

enrichment.get_dendro(reactome_df, FS_model.all_features)

<Figure size 2400x2400 with 0 Axes>

[Figure: output of enrichment.get_dendro]

The gray-to-red color gradient of the gene labels reflects each gene's weight according to its calculated Score: the redder the label, the higher the weight.
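
A gray-to-red gradient of this kind can be computed by linear interpolation between two RGB colors. The sketch below is only an illustration of the idea; the exact colors and scaling used by `get_dendro` are assumptions, and `score_to_rgb`, the rounded scores, and `GENE_X` are hypothetical.

```python
GRAY = (0.5, 0.5, 0.5)
RED = (1.0, 0.0, 0.0)


def score_to_rgb(score, low, high):
    """Map a score in [low, high] to an RGB tuple between gray (low) and red (high)."""
    if high == low:
        t = 1.0
    else:
        t = (score - low) / (high - low)
    t = min(1.0, max(0.0, t))  # clamp scores that fall outside the range
    return tuple(g + t * (r - g) for g, r in zip(GRAY, RED))


# Rounded example scores (assumption); GENE_X is an invented low-weight gene.
scores = {"RASD1": 0.95, "MYH6": 0.84, "GENE_X": 0.40}
low, high = min(scores.values()), max(scores.values())
colors = {gene: score_to_rgb(s, low, high) for gene, s in scores.items()}
print(colors)
```

The resulting tuples can be passed anywhere matplotlib accepts a color, for example as the `color` argument of a text label.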

More information can be found in our publications:

Pisklova, M., & Osmak, G. (2024). Unveiling MiRNA-124 as a biomarker in hypertrophic cardiomyopathy: An innovative approach using machine learning and intelligent data analysis. International Journal of Cardiology, 410, 132220.
Osmak, G., Baulina, N., Kiselev, I., & Favorova, O. (2021). MiRNA-regulated pathways for hypertrophic cardiomyopathy: network-based approach to insight into pathogenesis. Genes, 12(12), 2016.
Osmak, G., Kiselev, I., Baulina, N., & Favorova, O. (2020). From miRNA target gene network to miRNA function: miR-375 might regulate apoptosis and actin dynamics in the heart muscle via Rho-GTPases-dependent pathways. International Journal of Molecular Sciences, 21(24), 9670.
Osmak, G. J., & Pisklova, M. V. (2025). Transcriptomics and the “Curse of Dimensionality”: Monte Carlo Simulations of ML-Models as a Tool for Analyzing Multidimensional Data in Tasks of Searching Markers of Biological Processes. Molecular Biology, 59, 143-149.

Release history

0.1.11: Aug 17, 2025
0.1.10: Apr 8, 2025
0.1.9: Apr 8, 2025
0.1.8: Apr 8, 2025
0.1.7: Apr 8, 2025
0.1.6: Apr 7, 2025
0.1.5: Apr 7, 2025
0.1.4: Apr 6, 2025
0.1.3: Apr 6, 2025
0.1.2 (this version): Apr 6, 2025
0.1.1: Apr 5, 2025
