Skip to main content

BoolSi is a tool for distributed simulations and analysis of Boolean networks

Project description

BoolSi

BoolSi is an open-source command line tool for distributed simulations of deterministic Boolean networks with synchronous update. It uses MPI standard to allow execution on computational clusters, as well as parallel processing on a single computer.

BoolSi can simulate from a range of initial states of the network, identify and analyze network attractors, and find conditions that lead to specific states of the network. It also allows user to fix the state of any network node for the length of entire simulation (e.g. modeling gene knockout in regulatory networks), or perturb the state of the node at any time step (e.g. modeling sensory input in robotics).

Installation

Prerequisites

BoolSi requires:

  • Python 3.4 or higher

To use BoolSi with MPI you must also install:

For MPI Implementation installation instructions, please refer to vendor documentation:

We also expect BoolSi to work with other MPI Implementations, although they have not been tested.

To install mpi4py:

$ pip install mpi4py

All other Python package dependencies will be installed automatically when installing BoolSi.

PIP

$ pip install boolsi

Install from source

$ git clone https://github.com/boolsi/boolsi.git
$ cd boolsi
$ pip install .

MPI usage

To run BoolSi using MPI:

$ mpiexec -np [NUMBER OF PROCESSES] boolsi ...

Note that forceful termination in MPI (Abort) does not guarantee graceful shutdown of an application. As a consequence, aborting MPI run on Windows or in attract mode without the flag -i may leave orphaned files in the database directory (defaults to "<current directory>/tmp_db").

Quickstart

BoolSi provides 3 terminal commands to simulate boolean networks:

  • simulate to simulate for a number of time steps
  • attract to find and analyze attractors for correlations between the nodes
  • target to find conditions leading to specific states of the network

Each command allows to run simulations from multiple initial states, with a time cap, and overriding update rule (see Advanced usage for the latter). All flags and arguments for the commands are covered in Command line reference.

Commands produce different types of output. simulate produces simulations — lists of states of boolean network from its initial state. attract produces attractors — lists of states that form cycles. target produces simulations, but only those that reach specified target states.

Boolean network to simulate needs to be described in YAML file of a form:

nodes:
- A
- B
- C

update rules:
    A: not B
    B: A and C
    C: not A or B
    
initial state:
    A: 0
    B: 1
    C: any

For full input file syntax refer to Input file reference.

Let's run through some basic examples.

Example 1. Simulate network from example1.yaml for 5 steps

From the command line, run:

$ boolsi simulate examples/example1.yaml -t 5

Because input file example1.yaml specifies multiple initial states (by using keyword any for node C), BoolSi will run separate simulation for each of them (A: 0, B: 1, C: 0 and A: 0, B: 1, C: 1):

18-Aug-2019 20:17:16 Hi! All BoolSi output (including this log) will appear in "/Users/user/PycharmProjects/boolsi/examples/output2_example1/".
18-Aug-2019 20:17:16 Run parameters: "simulate examples/example1.yaml -t 5 -o examples/output2_example1".
18-Aug-2019 20:17:16 Initializing temporary storage in "/Users/user/PycharmProjects/boolsi/.tmp_storage/".
18-Aug-2019 20:17:16 Read Boolean network of 3 nodes.
18-Aug-2019 20:17:16 Single process will be used to perform 2 simulations...
...

The above also tells you where to look for the simulations after the run is finished (/output_20190403T003836/):

By default the results are printed in PDF, but other formats are available, including machine-readable CSV (see Output reference).

Example 2. Find and analyze attractors of a network

Input file example1.yaml specifies 2 initial states of the network to simulate from. If we want to find all attractors of the network, we need to simulate it from all possible initial states. For this, we create input file example2.yaml, where the initial state of every node is set to any:

nodes:
- A
- B
- C

update rules:
    A: not B
    B: A and C
    C: not A or B
    
initial state:
    A: any
    B: any
    C: any

Running the command:

$ boolsi attract examples/example2.yaml

will find all attractors and calculate correlations between the node states therein.

Example 3. Find simulations that reach specific states of a network

If we want to find conditions that lead to the states where e.g. A is 0 and B is 1, we first need to specify them as target states. For this, we create input file example3.yaml with added section target state. We also set the initial state of A to 1 in order to omit the trivial results, when the network starts in a target state.

nodes:
- A
- B
- C

update rules:
    A: not B
    B: A and C
    C: not A or B
    
initial state:
    A: 1
    B: any
    C: any
   
target state:
    A: 0
    B: 1
    C: any

Running the command:

$ boolsi target examples/example3.yaml

will find the simulations reaching either A: 0, B: 1, C: 0 or A: 0, B: 1, C: 1.

Advanced usage

BoolSi allows to override simulation process, which can be used to model external influence on the network. User can fix the state of any node for the entire simulation (e.g. modeling gene knockout in regulatory networks), or perturb it at any time step (e.g. modeling sensory input in robotics).

Fixed nodes

To permanently fix the state of a node, specify it under fixed nodes section of the input file. For example, these lines set the state of node A to 0 and B to 1 for the entire simulation:

fixed nodes:
    A: 0
    B: 1

Adding the above to example1.yaml will make the simulations look like:

When running simulate or target, it is also possible to use 0?, 1?, any, any? values.

    C: 0?

0? (1?) doubles the number of produced simulations by fixing the state of node C to 0(1) in addition to simulating with unfixed C.

    C: any

any doubles the number of simulations by fixing the state of C separately to 0 and to 1.

    C: any?

any? triples the number of simulations by fixing the state of C separately to 0 and 1, and simulating with unfixed C.

Perturbations

To force a node into particular state at particular time step, use perturbations section. For example, to set the state of node A to 0 at time step 4 and to 1 at time step 5, and the state of node B to 1 at time steps 1, 2, 3, 5:

perturbations:
    A: 
        0: 4
        1: 5
    B:
        1: 1-3, 5

Similarly to fixed nodes, it is possible to use 0?, 1?,any, any? values when running simulate or target. For example, the following lines increase the number of produced simulations by a factor of 24 = 16 because the state to force C into (0 or 1) is considered independently at each time step.

perturbations:
    C: 
        any: 2, 3, 5, 7

When running attract or target, BoolSi starts looking for attractors or target states only after all perturbations have occurred.

Command line reference

To run BoolSi, open the terminal and type:

boolsi COMMAND [ARGS] [OPTIONS]

You can stop BoolSi at any point by pressing Ctrl+C.

Commands:

simulate Perform simulations.

attract Find and analyze attractors.

target Find simulations that reach target states.

Common arguments:

input_file (Mandatory) path to YAML file that contains Boolean network description.

Common options

-h, --help Prints help for the current command.

-o, --output-directory Path to directory to print output into. Defaults to "<current directory>/output_<timestamp>".

-b, --batches-per-process Number of batches to split each each process will handle. Defaults to 100.

-d, tmp-database-directory Directory to store intermediate results in. Defaults to "<current directory>/.tmp_db".

--no-pdf Disable PDF output. PDF output is enabled by default.

--pdf-page-limit Maximum number of PDF pages to print. Defaults to 100. Only works when PDF output is enabled.

--no-csv Disable CSV output. CSV output is enabled by default.

--print-png Enable PNG output. PNG output is disabled by default.

--png-dpi PNG dpi. Defaults to 300. Only works when PNG output is enabled.

--print-tiff Enable TIFF output. TIFF output is disabled by default.

--tiff-dpi TIFF dpi. Defaults to 150. Only works when TIFF output is enabled.

--print-svg Enable SVG output. SVG output is disabled by default.

simulate options

-t, --simulation-time (Mandatory) number of time steps to simulate for.

attract options

-t, --max-simulation-time Maximum simulation time. If set, simulation stops after this time step even if an attractor was not reached.

-a, --max-attractor-length Maximum length of attractor to look for. If set, attractors longer than this value will be ignored.

-r, --reduce-memory-usage Turn off storing all simulation states. Slows down the search for attractors.

-i, --ignore-stale-db-items Turn off cleaning stale attractors from database on each write. Greatly increases disk space usage, speeds processing up, and makes ETA more accurate.

-c, --no-node-correlations Turn off computing Spearman's correlations between node states in attractors.

-p, --p-value p-value threshold for statistical significance of node correlations. Defaults to 0.05.

target options

-t, --max-simulation-time Maximum simulation time. If set, simulation stops after this time step even if a target state wasn't reached.

-n, --n-simulations-reaching-target Stop after this many simulations have reached target state.

Input file reference

Input file uses YAML syntax to describe a Boolean network and how its state changes over time.

Node names

Section node names of input file contains YAML list of node names. Node names must be unique valid YAML strings and contain no whitespaces, parentheses, or commas. In addition, a node name cannot be any of the reserved words 0, 1, and, or, not, majority.

Node name format is compatible with Python's MathText, allowing for basic TeX-style expressions.

node names:
    -composite_node_name
    -$X_5$
    -SOMEGENE4

Update rules

Section update rules contains rules that define next state of the network based on its current state. A rule for each individual node is a Boolean function on a subset of the current node states. It is written as an expression that can contain node names, logical operators and, or, and not, parentheses, constants 0 and 1, and function majority.

majority takes n Boolean inputs and returns 1 if and only if more than n/2 of its inputs are 1. In particular, when n is even and exactly half of the inputs are 0 (and the other half is 1), majority returns 0. To change this tie-breaking behavior to return 1, add 1 as an input to the function. Similarly, you could make majority return current state of some node X in case of a tie by adding X as an input instead.

update rules:
    A: (A or not B) and C
    B: 1
    C: majority(not A, B, D)                # Tie cannot happen on odd number of inputs.
    D: majority(A, B)                       # Returns 0 when A and B are tied.
    E: majority(A, B, 1)                    # Returns 1 when A and B are tied.
    F: majority(A, B, B)                    # Returns B when A and B are tied.
    G: majority(A, B, C or D)               # Returns C or D when A and B are tied.
    H: A and majority(C, not B, not D, G)
    

Note: update rule for a node is stored as a truth table, whose size increases exponentially with the number of nodes it depends on. Specifying more than 20 nodes in update rule can put your system at risk of running out of memory.

Initial state

Section initial state defines initial state or range of initial states to simulate from. State of each node must be specified as either 0, 1, or any. Specifying any will simulate from both node states (0 and 1), and thus the number of simulations will double.

In the following example, the network will be simulated from both A: 0, B: 1, C: 0 and A: 0, B: 1, C: 1:

initial state:
    A: 0
    B: 1
    C: any

Fixed nodes

Section fixed nodes allows to define nodes whose state doesn't change throughout a simulation. Setting node to a fixed state 0 or 1 simply overrides its update rule with a constant, while any sets it to both 0 and 1 in separate simulations. Their counterparts with ? (0?, 1?, or any?) will simulate with both regular and overridden update rule of a node. Thus, specifying any, 0?, or 1? for a node doubles the number of simulations, while any? — triples it.

fixed nodes:
    A: 0
    B: 1
    C: 0?
    D: 1?
    E: any
    F: any?

Perturbations

Section perturbations allows to override node states at specific time steps. Similarly to Fixed nodes, the states to override with are defined by 0, 1, any, 0?, 1?, or any? but you also need to specify time steps when the override takes place.

perturbations:
    A: 
        0: 5
        1: 6
    B:
        1: 10, 15-17, 20
    C:
        0?: 100
        1?: 200    
    D:
        any?: 123

Time steps may be given individually (e.g. 5, 6, 7, 8) or in ranges (e.g. 5-8). They must be 1 or greater (0 corresponds to initial state) and can't exceed maximum simulation time (if specified).

If a perturbation occurs with a node in fixed state, this state will be overridden at the time steps of the perturbation.

Note that the search for attractors and target states starts only after the last perturbation has occurred.

Target state

Section target state is only applicable when executing target command. It defines target state or range of target states to look for in simulations.

Syntax is identical to that of initial state — state of each node must be specified as either 0, 1, or any, where any means that both 0 and 1 are acceptable in a target state.

target state:
    A: 0
    B: 1
    C: any

In the above example, the BoolSi will be looking for simulations that reach either A: 0, B: 1, C: 0 or A: 0, B: 1, C: 1.

Output

Graphical

BoolSi supports a number of graphical output formats: PDF, SVG, PNG, and TIFF. Each simulation or attractor is printed as a separate page of the PDF document and/or an SVG/PNG/TIFF image file.

Below is an example output of a network (namely, Cambium regulation network) simulated from an individual initial state for 10 time steps. Note that initial state (at time step 0) is not counted towards the simulation length.

Here is an example output of an attractor of the same network. Unlike in simulations, time steps in attractors are relative. Note that first state of an attractor in the output is not guaranteed to be the same as the first state of attractor's first occurrence in the simulation.

attract also prints pairwise Spearman's correlations between averaged node states across all attractors.

CSV

By default, BoolSi additionally writes the output to CSV to allow for machine processing. The output consists of two CSV files: one containing simulation/attractor summaries, and another — simulations or attractors themselves.

A summary of a simulation contains its length, flags representing whether individual nodes were fixed, and the number of perturbations for each node:

simulation_id,length,CK0_is_fixed,CK_is_fixed,AHK_is_fixed,AHP_is_fixed,BARR_is_fixed,IPT_is_fixed,PIN_is_fixed,CKX_is_fixed,AARR_is_fixed,ERF_is_fixed,AHP6_is_fixed,TDIF_is_fixed,PXY_is_fixed,IAA0_is_fixed,IAA_is_fixed,BR_is_fixed,BZR1_is_fixed,STM_is_fixed,GA_is_fixed,GID1_is_fixed,ETHL_is_fixed,ARF_is_fixed,HB8_is_fixed,TMO5_is_fixed,WOX4_is_fixed,LOG3_is_fixed,DELL_is_fixed,WRKY_is_fixed,LHW_is_fixed,ENDO_is_fixed,n_CK0_perturbations,n_CK_perturbations,n_AHK_perturbations,n_AHP_perturbations,n_BARR_perturbations,n_IPT_perturbations,n_PIN_perturbations,n_CKX_perturbations,n_AARR_perturbations,n_ERF_perturbations,n_AHP6_perturbations,n_TDIF_perturbations,n_PXY_perturbations,n_IAA0_perturbations,n_IAA_perturbations,n_BR_perturbations,n_BZR1_perturbations,n_STM_perturbations,n_GA_perturbations,n_GID1_perturbations,n_ETHL_perturbations,n_ARF_perturbations,n_HB8_perturbations,n_TMO5_perturbations,n_WOX4_perturbations,n_LOG3_perturbations,n_DELL_perturbations,n_WRKY_perturbations,n_LHW_perturbations,n_ENDO_perturbations
simulation_1,10,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0

A summary of an attractor contains its length, mean and standard deviation of its trajectory length, and relative frequency of attractor occurrences across all performed simulations:

attractor_id,length,trajectory_length_mean,trajectory_length_SD,relative_frequency
attractor_1,1,11.018409729,2.573900999723755,0.0625
...
attractor_15,13,7.575435556,2.5215366582580407,0.031365394592285156
...

In a file storing simulations or attractors, _ after a node state means that the state of the node is fixed. * after a node state means that the node was perturbed into this state at the corresponding time step.

Simulation example:

simulation_id,time_step,CK0,CK,AHK,AHP,BARR,IPT,PIN,CKX,AARR,ERF,AHP6,TDIF,PXY,IAA0,IAA,BR,BZR1,STM,GA,GID1,ETHL,ARF,HB8,TMO5,WOX4,LOG3,DELL,WRKY,LHW,ENDO
simulation_1,0,1,1,1,1,1,1,1,1,1,1,0,0_,1,1,0,1,0,1,0,0,1,0,1,0,0,0,0,0,0,0
simulation_1,1,1,0,1,1,1,1,1,1,1,1,0,0_,0,1,0,1,1,0,0,0,1,0,0,0,1,0,1,0,0,1
simulation_1,2,1,0,0,1,1,1,0,0,1,1,0,0_,0,1,0,1,0,0,0,0,1*,0,0,0,1,0,1,1,1,1
simulation_1,3,1,1,0,0,1,1,0,0,1,1,0,0_,0,1,1,1,0,0,0,0,1*,0,0,0,1,0,1,0,1,1
simulation_1,4,1,1,1,0,0,1,0,0,1,1,0,0_,0,1,1,1,0,0,0,0,1*,1,0,0,1,0,0,0,1,0
simulation_1,5,1,1,1,1,0,0,0,0,0,0,1,0_,0,1,1,1,1,0,0,0,1*,1,0,1,1,0,0,0,1,0
simulation_1,6,1,0,1,1,1,0,0,0,0,0,1,0_,0,1,1,1,1,0,0,0,1,1,1,1,0,1,0,1,1,0
simulation_1,7,1,1,0,1,1,1,1,0,0,1,0,0_,0,1,1,1,1,0,0,0,1,1,1,1,0,1,0,1,1,0
simulation_1,8,1,1,1,0,1,1,1,0,0,1,0,0_,0,1,0,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0
simulation_1,9,1,1,1,1,0,1,1,0,0,1,0,0_,0,1,0,1,1,0,0,0,1,0,1,1,1,1,1,1,1,1
simulation_1,10,1,1,1,1,1,1,0,0,0,1,0,0_,0,1,0,1,0,0,0,0,1,0,0,0,1,1,1,1,1,1

Attractor example:

attractor_id,time_step,CK0,CK,AHK,AHP,BARR,IPT,PIN,CKX,AARR,ERF,AHP6,TDIF,PXY,IAA0,IAA,BR,BZR1,STM,GA,GID1,ETHL,ARF,HB8,TMO5,WOX4,LOG3,DELL,WRKY,LHW,ENDO
attractor_1,t,0,0,0,0,0,0,0,1,0,0,1,0_,0,1,1,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0
...
attractor_15,t,0,1,0,0,0,0,0,0,0,1,1,0_,0,1,1,1,1,0,0,0,1,1,0,1,1,0,0,0,1,0
attractor_15,t+1,0,0,1,0,0,0,0,0,0,0,1,0_,0,1,1,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0
attractor_15,t+2,0,0,0,1,0,0,0,0,0,1,1,0_,0,1,1,1,1,0,0,0,1,1,1,1,0,1,0,1,1,0
attractor_15,t+3,0,0,0,0,1,0,0,0,0,1,1,0_,0,1,1,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0
attractor_15,t+4,0,0,0,0,0,1,1,0,0,1,0,0_,0,1,1,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0
attractor_15,t+5,0,1,0,0,0,0,0,0,0,1,1,0_,0,1,0,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0
attractor_15,t+6,0,0,1,0,0,0,0,0,0,1,1,0_,0,1,1,1,1,0,0,0,1,0,1,1,1,1,1,1,1,1
attractor_15,t+7,0,0,0,1,0,1,0,0,0,1,0,0_,0,1,1,1,0,0,0,0,1,1,0,0,1,1,0,1,1,0
attractor_15,t+8,0,1,0,0,1,0,0,0,0,1,1,0_,0,1,1,1,1,0,0,0,1,1,0,1,1,0,0,0,1,0
attractor_15,t+9,0,0,1,0,0,1,1,1,0,1,0,0_,0,1,1,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0
attractor_15,t+10,0,0,0,1,0,0,0,0,0,1,1,0_,0,1,0,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0
attractor_15,t+11,0,0,0,0,1,0,0,0,0,1,1,0_,0,1,1,1,1,0,0,0,1,0,1,1,1,1,1,1,1,1
attractor_15,t+12,0,0,0,0,0,1,0,0,1,1,0,0_,0,1,1,1,0,0,0,0,1,1,0,0,1,1,0,1,1,0
...

In attract, Spearman's correlations between node states in attractors are written to an additional CSV file. The correlations are sorted by their magnitude in descending order, with statistically significant correlations printed first:

node 1,node 2,rho,p-value
CK,AHK,1.00000,0.00000
CK,AHP,1.00000,0.00000
CK,BARR,1.00000,0.00000
AHK,AHP,1.00000,0.00000
AHK,BARR,1.00000,0.00000
AHP,BARR,1.00000,0.00000
IPT,AHP6,-1.00000,0.00000
...

License

BoolSi is licensed with the MIT license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

boolsi-0.9.4.tar.gz (63.4 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page