BIO-INSIGHT is a software package designed for intelligent consensus of multiple techniques for inferring gene regulation networks. To this end, a **parallel asynchronous many-objective evolutionary algorithm** is applied to optimize the weights assigned to the different techniques based on 6 biological objectives
Project description
BIO-INSIGHT (Biologically Informed Optimizator - INtegrating Software to Infer Grns by Holistic Thinking) is a software package designed for intelligent consensus of multiple techniques for inferring gene regulation networks. Given the expression levels of different genes subjected to various perturbations, BIO-INSIGHT allows for the inference of the underlying network by applying in parallel a wide variety of known inference techniques and subsequently merging their results. To this end, a parallel asynchronous many-objective evolutionary algorithm is applied to optimize the weights assigned to the different techniques based on observed confidence levels, topological characteristics of the network, network dynamics, detection of highly recurrent motifs in real biological networks, importance of interactions, contextualized analysis of graph metrics and, in case of a time series as input, maintenance of the loyalty to it.
BIO-INSIGHT offers the following functionalities: network inference using individual techniques, optimization of consensus across multiple solutions, construction of benchmark networks (from scratch or based on real networks), evaluation of accuracy concerning gold standards, network binarization algorithms, graphical representation of networks and optimization results, and modularized network segmentation.
To implement all the functionalities mentioned above, it has been necessary to program in multiple languages such as Java, Python, R, Matlab, Julia, etc. To integrate all the utilities into a single tool, it has been decided to dockerize components and use Python as the main means of orchestration. This, in addition to facilitating task parallelization, reduces the complexity of installation and requirements for our software package.
Prerequisites
- Python == 3.10.7
- Docker == 24.0.2
Instalation
pip install geneci==3.0.1
Output
To execute BIO-INSIGHT, the run command is provided with the file containing the expression levels of the genes that make up the network, the list of techniques to be agreed upon and the values of the different algorithm parameters in the event of not wishing to use those established by default. If more than one proposed objectives are used, the following folders are obtained after execution:
ea_consensus: Evolutionary algorithm results.FUN.csv: List with fitness values for each individual in the final population for each of the objective functions.VAR.csv: List of winning weight vectors, i.e., individuals from the last generation.fitness_evolution.txtandfitness_evolution.html: In each generation, the most optimal value found for each objective function is recorded. These values are stored in the text file and subsequently represented in graphs in HTML format.parallel_coordinates.html: File containing a graphical representation of parallel coordinates. Each column refers to a specific objective function, and each horizontal line represents an individual from the final population. This graph is very useful to observe conflicts between different fitness functions in a multi-objective evolutionary algorithm.pareto_front.html: If the number of objectives is less than or equal to 3 (n <= 3), the Pareto front is represented in an n-dimensional graph, where each axis corresponds to a different fitness function.
lists: Set of networks inferred for each of the techniques before their consensus.measurements/techniques_times.txt: File containing the execution time taken by each technique.
Integrated techniques
BIO-INSIGHT integrates the 26 inference techniques already contemplated in MO-GENECI: ARACNE, BC3NET, C3NET, CLR, GENIE3_RF, GRNBOOST2, GENIE3_ET, MRNET, MRNETB, PCIT, TIGRESS, KBOOST, MEOMI, JUMP3, NARROMI, CMI2NI, RSNET, PCACMI, LOCPCACMI, PLSNET, PIDC, PUC, GRNVBEM, LEAP, NONLINEARODES, INFERELATOR.
Example procedure
-
Obtain simulated expression data and their respective gold standards. As in MO-GENECI.
-
Inference and consensus of networks for the selected expression data. To perform this task, you can make use of the run command or proceed to an equivalent execution consisting of the infer-network and optimize-ensemble commands. This can be very useful when you need to incorporate external trust lists or run the evolutionary algorithm with different configurations on the same files, without the need to infer them several times.
- Form 1: Procedure prefixed by the command run.
geneci run --expression-data input_data/DREAM4/EXP/dream4_100_01_exp.csv \
--technique ARACNE --technique BC3NET --technique C3NET --technique CLR \
--technique GENIE3_RF --technique GRNBOOST2 --technique GENIE3_ET \
--technique MRNET --technique MRNETB --technique PCIT --technique TIGRESS \
--technique KBOOST --technique MEOMI --technique NARROMI --technique CMI2NI \
--technique RSNET --technique PCACMI --technique LOCPCACMI --technique PLSNET \
--technique PIDC --technique PUC --technique GRNVBEM --technique LEAP \
--technique NONLINEARODES --technique INFERELATOR \
--crossover-probability 0.9 --mutation-probability 0.05 --population-size 100 \
--num-parents 3 --mutation-strength 0.1 \
--num-evaluations 50000 --cut-off-criteria PercLinksWithBestConf --cut-off-value 0.4 \
--function Quality --function DegreeDistribution --function Motifs \
--function ReduceNonEssentialsInteractions --function Dynamicity \
--function EigenVectorDistribution --algorithm NSGAII --plot-results \
--output-dir inferred_networks
- Form 2: Division of the procedure into several commands
# 1. Inference using individual techniques
geneci infer-network --expression-data input_data/DREAM4/EXP/dream4_100_01_exp.csv \
--technique ARACNE --technique BC3NET --technique C3NET --technique CLR \
--technique GENIE3_RF --technique GRNBOOST2 --technique GENIE3_ET \
--technique MRNET --technique MRNETB --technique PCIT --technique TIGRESS \
--technique KBOOST --technique MEOMI --technique NARROMI --technique CMI2NI \
--technique RSNET --technique PCACMI --technique LOCPCACMI --technique PLSNET \
--technique PIDC --technique PUC --technique GRNVBEM --technique LEAP \
--technique NONLINEARODES --technique INFERELATOR \
--output-dir inferred_networks/geneci_consensus
# 2. Optimize the assembly of the trust lists resulting from the above command
geneci optimize-ensemble --confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_LOCPCACMI.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_BC3NET.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_PLSNET.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_GRNVBEM.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_CMI2NI.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_CLR.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_INFERELATOR.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_GRNBOOST2.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_PCACMI.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_MRNET.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_PCIT.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_KBOOST.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_MEOMI.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_NONLINEARODES.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_GENIE3_ET.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_NARROMI.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_GENIE3_RF.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_RSNET.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_PIDC.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_ARACNE.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_MRNETB.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_TIGRESS.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_LEAP.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_PUC.csv \
--confidence-list inferred_networks/dream4_100_01_exp/lists/GRN_C3NET.csv \
--crossover-probability 0.9 --mutation-probability 0.05 --population-size 100 \
--num-parents 3 --mutation-strength 0.1 \
--num-evaluations 50000 --cut-off-criteria PercLinksWithBestConf --cut-off-value 0.4 \
--function Quality --function DegreeDistribution --function Motifs \
--function ReduceNonEssentialsInteractions --function Dynamicity \
--function EigenVectorDistribution --algorithm NSGAII --plot-results \
--output-dir inferred_networks/geneci_consensus
-
Representation of inferred networks using the draw-network command. As in MO-GENECI.
-
Evaluation of the quality of the inferred gene network with respect to the gold standard. As in MO-GENECI.
-
Binarization of the inferred gene network. As in MO-GENECI.
geneci
Usage:
$ geneci [OPTIONS] COMMAND [ARGS]...
Options:
--install-completion: Install completion for the current shell.--show-completion: Show completion for the current shell, to copy it or customize the installation.--help: Show this message and exit.
Commands:
apply-cut: Converts a list of confidence values into a binary matrix that represents the final gene network.cluster-network: Divide an initial gene network into several communities following the Infomap (recommended) or Louvain grouping algorithm.draw-network: Draw gene regulatory networks from confidence lists.evaluate: Evaluate the accuracy of the inferred network with respect to its gold standard.extract-data: Extract public data generated by simulators such as SynTReN, Rogers and GeneNetWeaver, as well as data from known challenges like DREAM3, DREAM4, DREAM5 and IRMA.generate-data: Simulate time series with gene expression levels using the SysGenSIM simulator. They can be generated from scratch or based on the interactions of a real gene network.infer-network: Infer gene regulatory networks from expression data. Several techniques are available: ARACNE, BC3NET, C3NET, CLR, GENIE3_RF, GRNBOOST2, GENIE3_ET, MRNET, MRNETB, PCIT, TIGRESS, KBOOST, MEOMI, JUMP3, NARROMI, CMI2NI, RSNET, PCACMI, LOCPCACMI, PLSNET, PIDC, PUC, GRNVBEM, LEAP, NONLINEARODES and INFERELATOR.optimize-ensemble: Analyzes several trust lists and builds a consensus network by applying an evolutionary algorithm.run: Infer gene regulatory network from expression data by employing multiple unsupervised learning techniques and applying a genetic algorithm for consensus optimization.plot-optimization: Graph execution results. The evolution of Fitness functions, the Pareto front, the parallel coordinate graph and the chord diagram can be represented. In addition, all of them will be stored in files except the chord diagram, whose interactivity is too complex to be stored.weighted-confidence: Calculate the weighted sum of the confidence levels reported in several files based on a given distribution of weights.
Only the specifications of the main commands are shown below.
geneci optimize-ensemble
Analyzes several trust lists and builds a consensus network by applying an evolutionary algorithm
Usage:
$ geneci optimize-ensemble [OPTIONS]
Options:
--confidence-list TEXT: Paths of the CSV files with the confidence lists to be agreed upon. [required]--gene-names PATH: Path to the TXT file with the name of the contemplated genes separated by comma and without space. If not specified, only the genes specified in the lists of trusts will be considered.--time-series PATH: Path to the CSV file with the time series from which the individual gene networks have been inferred. This parameter is only necessary in case of specifying the fitness function Loyalty.--crossover-probability FLOAT: Crossover probability [default: 0.9]--num-parents INTEGER: Number of parents [default: 3]--mutation-probability FLOAT: Mutation probability. [default: 1/len(files)]--mutation-strength FLOAT: Mutation strength. [default: 0.1]--population-size INTEGER: Population size [default: 100]--num-evaluations INTEGER: Number of evaluations [default: 25000]--cut-off-criteria [MinConf|NumLinksWithBestConf|PercLinksWithBestConf]: Criteria for determining which links will be part of the final binary matrix. [default: PercLinksWithBestConf]--cut-off-value FLOAT: Numeric value associated with the selected criterion. Ex: MinConf = 0.5, NumLinksWithBestConf = 10, PercLinksWithBestConf = 0.4 [default: 0.4]--function TEXT: A mathematical expression that defines a particular fitness function based on the weighted sum of several independent terms. Available terms: Quality, DegreeDistribution and Motifs. [required]--algorithm [GA|NSGAII|SMPSO]: Evolutionary algorithm to be used during the optimization process. All are intended for a multi-objective approach with the exception of the genetic algorithm (GA). [required]--threads INTEGER: Number of threads to be used during parallelization. By default, the maximum number of threads available in the system is used. [default: 64]--plot-results / --no-plot-results: Indicate if you want to represent results graphically. [default: --plot-results]--output-dir PATH: Path to the output folder. [default: <<conf_list_path>>/../ea_consensus]--help: Show this message and exit.
geneci run
Infer gene regulatory network from expression data by employing multiple unsupervised learning techniques and applying a genetic algorithm for consensus optimization.
Usage:
$ geneci run [OPTIONS]
Options:
--expression-data PATH: Path to the CSV file with the expression data. Genes are distributed in rows and experimental conditions (time series) in columns. [required]--time-series PATH: Path to the CSV file with the time series from which the individual gene networks have been inferred. This parameter is only necessary in case of specifying the fitness function Loyalty.--technique [ARACNE|BC3NET|C3NET|CLR|GENIE3_RF|GRNBOOST2|GENIE3_ET|MRNET|MRNETB|PCIT|TIGRESS|KBOOST|MEOMI|JUMP3|NARROMI|CMI2NI|RSNET|PCACMI|LOCPCACMI|PLSNET|PIDC|PUC|GRNVBEM|LEAP|NONLINEARODES|INFERELATOR]: Inference techniques to be performed. [required]--crossover-probability FLOAT: Crossover probability [default: 0.9]--num-parents INTEGER: Number of parents [default: 3]--mutation-probability FLOAT: Mutation probability. [default: 1/len(files)]--mutation-strength FLOAT: Mutation strength. [default: 0.1]--population-size INTEGER: Population size [default: 100]--num-evaluations INTEGER: Number of evaluations [default: 25000]--cut-off-criteria [MinConf|NumLinksWithBestConf|PercLinksWithBestConf]: Criteria for determining which links will be part of the final binary matrix. [default: PercLinksWithBestConf]--cut-off-value FLOAT: Numeric value associated with the selected criterion. Ex: MinConf = 0.5, NumLinksWithBestConf = 10, PercLinksWithBestConf = 0.4 [default: 0.4]--function TEXT: A mathematical expression that defines a particular fitness function based on the weighted sum of several independent terms. Available terms: Quality, DegreeDistribution and Motifs. [required]--algorithm [GA|NSGAII|SMPSO]: Evolutionary algorithm to be used during the optimization process. All are intended for a multi-objective approach with the exception of the genetic algorithm (GA). [required]--threads INTEGER: Number of threads to be used during parallelization. By default, the maximum number of threads available in the system is used. [default: 64]--str-threads TEXT: Comma-separated list with the identifying numbers of the threads to be used. If specified, the threads variable will automatically be set to the length of the list.--plot-results / --no-plot-results: Indicate if you want to represent results graphically. [default: --plot-results]--output-dir PATH: Path to the output folder. [default: inferred_networks]--help: Show this message and exit.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file geneci-3.0.1.tar.gz.
File metadata
- Download URL: geneci-3.0.1.tar.gz
- Upload date:
- Size: 35.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
100e0e643a764f31fcc57809104f69b25b2dc674b50a6ce396f973b3c3d9eae9
|
|
| MD5 |
9e9e02469a4d9b5fab1005bd7165fb6c
|
|
| BLAKE2b-256 |
8ce8c0d21c1be307ffe0e9dc88f831b6480edcbf11d5727e314564ea1d0623c4
|
File details
Details for the file geneci-3.0.1-py3-none-any.whl.
File metadata
- Download URL: geneci-3.0.1-py3-none-any.whl
- Upload date:
- Size: 32.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
65baf95dd5e5caaee8f6e1f97f198accd8b10a67bd23a0f2fae5b2b85c01b742
|
|
| MD5 |
6f9f7899e0255fd873de24832aead98d
|
|
| BLAKE2b-256 |
fde139d24eb8c04e936c9a2e3a311e2fbcd8e05f919895a8fe7125f00e2b34a7
|