Justin's, Andrea's, and Torbjörn's network inference package for Python

JATNIpy

Nordlinglab JATNIpy

Why JATNIpy?

We re-implemented the GeneSPIDER toolbox in Python. Python is a popular high-level programming language that is freely available and widely used in both academia and industry.

Results:

We combine several freely available Python packages, and build on the package scikit-grni, to form a new complete package, which we named JATNIpy.

Availability and Implementation: JATNIpy is a reimplementation of the GeneSPIDER toolbox in Python. The source code is freely available for download at https://bitbucket.org/temn/JATNIpy/.

Contact: Justin

How do I get set up?

  • Alternative 1: Use git to fetch the JATNIpy repository by running this command
git clone https://bitbucket.org/temn/JATNIpy/
  • Alternative 2: Download it from JATNIpy
  • Alternative 3: Use pip3 to install JATNIpy from PyPI
	pip3 install jatnipy -t ~/JATNIpy

Here ~/JATNIpy is the folder into which you want to download the package. Then change to the directory that you downloaded from the repository with

	cd ~/JATNIpy/jatnipy

Before using JATNIpy, make sure that pip3 is installed on the local computer. If pip is not installed, use the following commands to install it.

  • For Debian/Ubuntu users:
	apt-get install python3-pip
  • For CentOS 7 users:
	yum install python34-setuptools
	easy_install pip

Once pip3 is available on the local computer, install virtualenvwrapper and create a virtual environment with these commands

	pip3 install virtualenvwrapper
	export WORKON_HOME=~/Envs
	mkdir -p $WORKON_HOME
	source /usr/local/bin/virtualenvwrapper.sh
	mkvirtualenv env1
	workon env1

Create the new virtual environment env1 with mkvirtualenv env1, and choose the virtual environment you want to work on with workon env1. Once you are working in that environment, use pip3 to install the open-source Python 3 packages with the command

	pip3 install -e
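Before installing packages, it can be worth confirming from inside python3 that the interpreter really belongs to the virtual environment. The following is a minimal sketch using only the standard library; in_virtualenv is a hypothetical helper name and is not part of JATNIpy.

```python
import sys

def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys, "real_prefix"):
        return True
    # With venv and newer virtualenv, sys.prefix points at the environment
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("virtual environment active:", in_virtualenv())
```

If this prints False after workon env1, the shell is probably starting a different python3 than the one in the environment.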
  • Dependencies:

    • git: Version control system for tracking changes during development
    • Scipy: Python-based software for mathematics, science, and engineering
    • Numpy: Fundamental Python package for numerical and mathematical computation
    • pandas: Easy-to-use data structures and data analysis tools for Python
    • matplotlib: Python 2D plotting tool which provides a MATLAB-like interface
    • scikit-learn: Data mining and data analysis built on Numpy, Scipy and matplotlib
    • networkx: Python package for studying graphs and complex networks
    • glmnet_py: The popular glmnet library for Python
    • py-ubjson: Universal Binary JSON encoder/decoder for Python
    • CVXPY: Convex optimization for Python
    • Requests: HTTP for Humans: Python library for making HTTP requests
  • Datasets are available here.

  • Networks are available here.

  • To test the basic functionality after the setup detailed above, open python3 and run these commands:

new_one
from pprint import pprint as p #(nice to look at)
from datastruct.Network import Network as N
N.Load('.') #(This is for local folder)
N.Load('test') #(This is for initialize network)
N.Load('random/N10/','https://bitbucket.org/api/2.0/repositories/sonnhammergrni/gs-networks/src/master') #(search from the Net)
from datastruct.Dataset import Dataset as D
D.Load('.') #(This is for local folder)
D.Load('test') #(This is for initialize dataset)
D.Load('N10','https://bitbucket.org/api/2.0/repositories/sonnhammergrni/gs-datasets/src/master/') #(search from the Net)

become the object

from datastruct.Dataset import Dataset
Da = Dataset()
Data = Dataset(Dataset.Load('test'))
from datastruct.Network import Network
Ne = Network()
Net = Network(Network.Load('test'))

You should now have the default network and a dataset loaded in Python.
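If one of the imports above fails, a quick way to narrow the problem down is to check which of the dependencies listed in the setup section can be found in the active environment. The sketch below uses only the standard library. The module names are assumptions based on each package's usual import name (for example, scikit-learn is normally imported as sklearn and py-ubjson as ubjson), and check_dependencies is a hypothetical helper, not part of JATNIpy.

```python
import importlib.util

# Import names are assumptions; adjust them if a package uses a different module name.
DEPENDENCIES = ["scipy", "numpy", "pandas", "matplotlib", "sklearn",
                "networkx", "glmnet_py", "ubjson", "cvxpy", "requests"]

def check_dependencies(modules):
    """Map each module name to True if it can be located, False otherwise."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}

for name, found in check_dependencies(DEPENDENCIES).items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING can then be installed with pip3 inside the virtual environment.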

Generating example data as used in results section

As in the GeneSPIDER toolbox, the data used in the examples can be downloaded from the online repository at https://bitbucket.org/sonnhammergrni/gs-networks. The example network and dataset are Nordling-D20100302-random-N10-L25-ID1446937 and Nordling-ID1446937-D20150825-E15-SNR3291-IDY15968, respectively. The code below generates a new network and dataset, which will differ from the example ones because random number generators are used to create the network and noise matrices.

Network generation in JATNIpy

As in the GeneSPIDER toolbox, we generate a stable random network with 10 nodes and sparsity 0.25. With these specifications, the following code shows how to create a datastruct.Network object.

from datastruct.randomNet import randomNet
from datastruct.stabilize import stabilize
from datastruct.Network import Network	
Ne = Network()
import numpy as np
N = 10
S = 0.25
A = randomNet(N,S)-np.eye(N)
A = stabilize(A,'iaa','high')
Net = Ne.Network(A,'random')
creator_setting = {'creator':'Nordlinglab_Justin'}
Ne.setname(Net,creator_setting)
Net.description = 'This is a sparse network with 10 nodes, 10 negative self-loops and 15 randomly chosen links generated by Justin 2018-08-31. The coefficients are chosen such that they form one strong component and a stable dynamical system with time constants in the range 0.089 to 12 and an interampatteness level of 145 that is in-between the estimated level of an E. coli (Gardner et al. 2003 Science) and Yeast (Lorenz et al. 2009 PNAS) gene regulatory network. The coefficients of the network have not been tuned to explain any of the data sets in the mentioned articles.'

The random network created by datastruct.randomNet and the desired IAA degree are the input parameters of datastruct.stabilize, which stabilizes the network by making the real part of all eigenvalues negative while adjusting the IAA degree level. The method Ne.setname specifies the fields of the Network object. The network's name is generated automatically from its properties to make sure that each name is unique. You can then run the following command to save your Network in .json format, where savepath is the directory in which you want to save it.
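To make the stability criterion concrete, the sketch below checks that every eigenvalue of a network matrix has a negative real part and reports its condition number. It is an illustration only: is_stable and naive_stabilize are hypothetical helpers, and the diagonal shift used here is a much simpler technique than datastruct.stabilize, since it does not tune the IAA degree.

```python
import numpy as np

def is_stable(A):
    """Return True if every eigenvalue of A has a strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

def naive_stabilize(A, margin=0.01):
    """Shift the diagonal so the largest eigenvalue real part becomes -margin.

    Illustration only: this does not adjust the interampatteness.
    """
    A = np.asarray(A, dtype=float)
    max_real = np.linalg.eigvals(A).real.max()
    if max_real < 0:
        return A.copy()
    return A - (max_real + margin) * np.eye(A.shape[0])

rng = np.random.default_rng(0)
N, S = 10, 0.25
mask = rng.random((N, N)) < S                       # roughly 25% of links present
A = np.where(mask, rng.normal(size=(N, N)), 0.0) - np.eye(N)
A_stable = naive_stabilize(A)

print("stable:", is_stable(A_stable))
print("condition number:", np.linalg.cond(A_stable))
```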

Network.save(Network,Net,'savepath/','.json')

After saving the Network object, you can use these commands to load and display the new Network object that you have generated:

Net = Network(Network.Load('./savepath/Nordlinglab_Justin-D20190414-random-N10-L12ID33097.json'))
Net.__dict__

The output is shown below. It lists the properties of the Network object. name is the name of the Network object, which contains the name of the creator Nordlinglab_Justin, the date of creation D, the type of network random, the number of nodes N, and the number of edges L. A is the network matrix. G is the static gain matrix. N is the number of genes. created is a dictionary that records detailed information about the object. description is a description of the network. nodes contains the name assigned to each node; the names are generated automatically if they are not specified. params records the nonzero entries of the network matrix. shape is the shape of the network matrix. structure, which is the same as type, is random.

{'A': source        G01       G02       G03    ...           G08       G09       G10
target                                   ...                                  
G01     -9.999993  0.000000  0.000000    ...    -34.951935  0.000000  0.000000
G02      0.000000 -9.999993  0.000000    ...      0.000000  0.000000  0.000000
G03      0.000000  0.000000 -9.999993    ...      0.000000  0.000000  0.000000
G04      0.000000  0.000000  0.000000    ...      0.000000  0.000000  0.000000
G05      0.000000  0.000000  0.000000    ...      0.000000  0.000000  0.000000
G06    -46.352859  0.000000  0.000000    ...      0.000000  0.000000  0.000000
G07      0.000000  0.000000  0.000000    ...      0.000000  0.000000  0.000000
G08     34.951935  0.000000  0.000000    ...     -9.999993  0.000000  0.000000
G09      0.000000  0.000000  0.000000    ...      0.000000 -9.999993  0.000000
G10      0.000000  0.000000  0.000000    ...      0.000000  0.000000 -9.999993

[10 rows x 10 columns],
'G': source           G01           G02           G03 ...            G08  G09  G10
target                                           ...                         
G01     2.881656e-03  7.357080e-20 -0.000000e+00 ...  -1.007195e-02 -0.0 -0.0
G02    -5.187064e-18  1.000001e-01 -1.230312e-17 ...   2.462769e-17 -0.0 -0.0
G03     1.054544e-17  8.764060e-18  1.000001e-01 ...   1.465668e-17 -0.0 -0.0
G04    -1.924527e-18  7.464553e-18 -2.235816e-18 ...  -1.955644e-19 -0.0 -0.0
G05    -1.574097e-33  9.320219e-18 -2.277998e-18 ...   1.414714e-33 -0.0 -0.0
G06    -1.335731e-02 -1.022088e-17  1.552109e-17 ...   4.668640e-02 -0.0 -0.0
G07    -0.000000e+00 -0.000000e+00 -0.000000e+00 ...  -0.000000e+00 -0.0 -0.0
G08     1.007195e-02 -1.120548e-17 -1.970795e-17 ...   6.479663e-02 -0.0 -0.0
G09    -0.000000e+00 -0.000000e+00 -0.000000e+00 ...  -0.000000e+00  0.1 -0.0
G10    -0.000000e+00 -0.000000e+00 -0.000000e+00 ...  -0.000000e+00 -0.0  0.1

[10 rows x 10 columns],
'N': 10,
'created': {'creator': 'Nordlinglab_Justin',
'id': '58909',
'nodes': '10',
'sparsity': 14,
'time': '1555834954',
'type': 'random'},
'description': 'This is a sparse network with 10 nodes, 10 negative '
'self-loops and 15 randomly chosen links generated by Justin '
'2018-08-31. The coefficients are chosen such that they form '
'one strong component and a stable dynamical system with time '
'constants in the range 0.089 to 12 and an interampatteness '
'level of 145 that is in-between the estimated level of an E. '
'coli (Gardner et al. 2003 Science) and Yeast (Lorenz et al. '
'2009 PNAS) gene regulatory network. The coefficients of the '
'network have not been tuned to explain any of the data sets '
'in the mentioned articles.',
'name': 'Nordlinglab_Justin-D20190421-random-N10-L14ID58909',
'names': ['G01',
'G02',
'G03',
'G04',
'G05',
'G06',
'G07',
'G08',
'G09',
'G10'],
'nodes': 0    G01
1    G02
2    G03
3    G04
4    G05
5    G06
6    G07
7    G08
8    G09
9    G10
Name: node, dtype: object,
'params': source  target
G01     G01       -9.999993
        G06      -46.352859
        G08       34.951935
G02     G02       -9.999993
G03     G03       -9.999993
G04     G04       -9.999993
G05     G05       -9.999993
G06     G01       46.352859
        G06       -9.999993
G07     G07       -9.999993
G08     G01      -34.951935
        G08       -9.999993
G09     G09       -9.999993
G10     G10       -9.999993
Name: value, dtype: float64,
'shape': (10, 10),
'structure': 'random',
'tol': 2.220446049250313e-16}
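In the output above, isolated nodes have -9.999993 on the diagonal of A and 0.1000001 on the diagonal of G, which is consistent with the static gain matrix being the negative inverse of the network matrix. The sketch below illustrates that relation on a small hand-made matrix. It assumes G = -pinv(A) and the steady-state model A Y + P = 0; it is not taken from JATNIpy's source.

```python
import numpy as np

# Small stable network: negative self-loops plus one link between the first two nodes.
A = np.array([[-10.0,   0.0,   0.0],
              [  5.0, -10.0,   0.0],
              [  0.0,   0.0, -10.0]])

G = -np.linalg.pinv(A)   # static gain (assumed relation)
P = np.eye(3)            # perturb each gene once
Y = G @ P                # steady-state response to the perturbations

# At steady state the network dynamics and the perturbation cancel out.
print(np.allclose(A @ Y + P, 0.0))
print(G[2, 2])           # isolated node: 1/10
```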

Data generation in JATNIpy

We use the network generated by JATNIpy to simulate perturbation experiments. We first simulate N single-gene perturbation experiments, in which each gene is perturbed one after another, followed by N/2 experiments in which randomly chosen genes are perturbed.

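from datastruct.Dataset import Dataset
Da = Dataset()
import numpy as np
import scipy.stats  # import the stats submodule explicitly
from scipy import sparse as sparse
from numpy import linalg as LA
from numpy import random as rd
SNR = 7       # target signal-to-noise ratio
alpha = 0.01  # significance level used in the SNR definition
# N/2 additional experiments, each perturbing a random subset of genes
c=sparse.rand(N,int(N/2),density=0.2,format='coo',dtype=None,random_state=None)
d=np.logical_and(c.toarray(),1)
g = np.eye(N)  # N single-gene perturbations, one per gene
P=np.concatenate((g,d),axis = 1)
G = np.asarray(Net.G)
Y = np.dot(G,P)  # noise-free response
s=LA.svd(Y, full_matrices=True)[1]
data = Da.Dataset()
# noise standard deviation: smallest singular value of Y divided by
# SNR times the square root of the chi-square quantile
stdE = s[N-1]/(((scipy.stats.chi2.ppf(1-alpha,P.shape[0]*P.shape[1]))**0.5)*SNR)
E = rd.randn(P.shape[0],P.shape[1])*stdE  # white Gaussian noise
F = np.zeros((P.shape[0],P.shape[1]))  # no input noise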

We build a perturbation matrix P and use Net.G to compute the corresponding response matrix Y. The standard deviation stdE is then used to generate the noise matrix E so that the SNR is 7. We don't use the input noise F here, but it still has to be defined, so we set it to zero. We then populate a Dataset object with this information.
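The layout of the perturbation matrix can be checked in isolation. The sketch below builds a matrix with the same structure, an identity block for the N single-gene experiments followed by N/2 random experiments, using only numpy. make_perturbations is a hypothetical helper; unlike scipy.sparse.rand, it draws each entry independently, so the number of perturbed genes per random experiment varies.

```python
import numpy as np

def make_perturbations(n_genes, density=0.2, seed=None):
    """Return an n_genes x (n_genes + n_genes // 2) binary perturbation matrix."""
    rng = np.random.default_rng(seed)
    single = np.eye(n_genes)                                   # one gene per experiment
    extra = (rng.random((n_genes, n_genes // 2)) < density).astype(float)
    return np.concatenate((single, extra), axis=1)

P = make_perturbations(10, seed=0)
print(P.shape)   # (10, 15)
```

Each column is one experiment and each row is one gene, matching the orientation of P in the listing above.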

If you have saved Net

D = Da.Dataset()
D.network = Net.network
import pandas as pd
names = pd.Series(Net.names, name="node")
names.name = "node"
D.M = P.shape[1]
M = D.M
D.N = P.shape[0]
N = D.N
samples = pd.Series(["S" + str(i + 1) for i in range(M)], name="sample")
D.E = pd.DataFrame(E, index=names, columns=samples)
D.F = pd.DataFrame(F, index=names, columns=samples)
D.Y = pd.DataFrame(Y+D.E, index=names, columns=samples)
D.P = pd.DataFrame(P, index=names, columns=samples)
D.lamda = pd.Series(np.array([np.array(i) for i in [stdE**2,0]]), index=["E", "F"])
sdY1 = pd.DataFrame(np.eye(int(D.N)), index=D.Y.index, columns=D.Y.index)
sdY2 = pd.DataFrame(pd.DataFrame(stdE*np.ones((D.P.shape[0],D.P.shape[1])), index=D.Y.index, columns=D.Y.columns), index=D.Y.index, columns=D.Y.columns)
frames1_Y = [sdY1,sdY2]
sdY_c_1 = pd.concat(frames1_Y,axis = 1)
sdY3 = pd.DataFrame(np.asarray(pd.DataFrame(stdE*np.ones((D.P.shape[0],D.P.shape[1])), index=D.Y.index, columns=D.Y.columns)).transpose(), index=D.Y.columns, columns=D.Y.index)
sdY4 = pd.DataFrame(np.eye(int(D.M)), index=D.Y.columns, columns=D.Y.columns)
frames2_Y = [sdY3,sdY4]
sdY_c_2 = pd.concat(frames2_Y,axis = 1)
frames_Y = [sdY_c_1,sdY_c_2]
D.E_covariance_element =  pd.concat(frames_Y,axis = 0)
D.E_covariance_variable = pd.DataFrame(D.lamda[0]*np.eye(N), index=names, columns=names)
D.F_covariance_variable = pd.DataFrame(np.zeros((N,N)), index=names, columns=names)
sdP1 = pd.DataFrame(np.eye(int(D.N)), index=D.Y.index, columns=D.Y.index)
sdP2 = pd.DataFrame(pd.DataFrame(np.zeros((D.P.shape[0],D.P.shape[1])), index=D.P.index, columns=D.P.columns), index=D.Y.index, columns=D.Y.columns)
frames1_P = [sdP1,sdP2]
sdP_c_1 = pd.concat(frames1_P,axis = 1)	
sdP3 = pd.DataFrame(np.asarray(pd.DataFrame(np.zeros((D.P.shape[0],D.P.shape[1])), index=D.P.index, columns=D.P.columns)).transpose(), index=D.Y.columns, columns=D.Y.index)
sdP4 = pd.DataFrame(np.eye(int(D.M)), index=D.Y.columns, columns=D.Y.columns)
frames2_P = [sdP3,sdP4]
sdP_c_2 = pd.concat(frames2_P,axis = 1)
frames_P = [sdP_c_1,sdP_c_2]
D.F_covariance_element = pd.concat(frames_P,axis = 0)

If you have not saved Net

D = Da.Dataset()
D.network = Net.name
import pandas as pd
names = pd.Series(Net.names, name="node")
names.name = "node"
D.M = P.shape[1]
M = D.M
D.N = P.shape[0]
N = D.N
samples = pd.Series(["S" + str(i + 1) for i in range(M)], name="sample")
D.E = pd.DataFrame(E, index=names, columns=samples)
D.F = pd.DataFrame(F, index=names, columns=samples)
D.Y = pd.DataFrame(Y+D.E, index=names, columns=samples)
D.P = pd.DataFrame(P, index=names, columns=samples)
D.lamda = pd.Series(np.array([np.array(i) for i in [stdE**2,0]]), index=["E", "F"])
sdY1 = pd.DataFrame(np.eye(int(D.N)), index=D.Y.index, columns=D.Y.index)
sdY2 = pd.DataFrame(pd.DataFrame(stdE*np.ones((D.P.shape[0],D.P.shape[1])), index=D.Y.index, columns=D.Y.columns), index=D.Y.index, columns=D.Y.columns)
frames1_Y = [sdY1,sdY2]
sdY_c_1 = pd.concat(frames1_Y,axis = 1)
sdY3 = pd.DataFrame(np.asarray(pd.DataFrame(stdE*np.ones((D.P.shape[0],D.P.shape[1])), index=D.Y.index, columns=D.Y.columns)).transpose(), index=D.Y.columns, columns=D.Y.index)
sdY4 = pd.DataFrame(np.eye(int(D.M)), index=D.Y.columns, columns=D.Y.columns)
frames2_Y = [sdY3,sdY4]
sdY_c_2 = pd.concat(frames2_Y,axis = 1)
frames_Y = [sdY_c_1,sdY_c_2]
D.E_covariance_element =  pd.concat(frames_Y,axis = 0)
D.E_covariance_variable = pd.DataFrame(D.lamda[0]*np.eye(N), index=names, columns=names)
D.F_covariance_variable = pd.DataFrame(np.zeros((N,N)), index=names, columns=names)
sdP1 = pd.DataFrame(np.eye(int(D.N)), index=D.Y.index, columns=D.Y.index)
sdP2 = pd.DataFrame(pd.DataFrame(np.zeros((D.P.shape[0],D.P.shape[1])), index=D.P.index, columns=D.P.columns), index=D.Y.index, columns=D.Y.columns)
frames1_P = [sdP1,sdP2]
sdP_c_1 = pd.concat(frames1_P,axis = 1)	
sdP3 = pd.DataFrame(np.asarray(pd.DataFrame(np.zeros((D.P.shape[0],D.P.shape[1])), index=D.P.index, columns=D.P.columns)).transpose(), index=D.Y.columns, columns=D.Y.index)
sdP4 = pd.DataFrame(np.eye(int(D.M)), index=D.Y.columns, columns=D.Y.columns)
frames2_P = [sdP3,sdP4]
sdP_c_2 = pd.concat(frames2_P,axis = 1)
frames_P = [sdP_c_1,sdP_c_2]
D.F_covariance_element = pd.concat(frames_P,axis = 0)
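The two listings above assemble E_covariance_element and F_covariance_element from four blocks with repeated pd.concat calls. As a reading aid, the sketch below builds the same block layout with numpy.block. covariance_element is a hypothetical helper, and the value 0.001864 is only the rounded standard deviation that appears in the example output further down.

```python
import numpy as np

def covariance_element(n_genes, n_samples, std):
    """Return the (n_genes + n_samples) square block matrix
    [[I, std * ones], [std * ones^T, I]]."""
    off = std * np.ones((n_genes, n_samples))
    return np.block([[np.eye(n_genes), off],
                     [off.T, np.eye(n_samples)]])

cov_E = covariance_element(10, 15, 0.001864)   # noise on the response
cov_F = covariance_element(10, 15, 0.0)        # no input noise: identity matrix
print(cov_E.shape)                             # (25, 25)
```

Wrapping the result in a pandas DataFrame with the node and sample names as labels would give the same labelled layout as in the listings.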

To initialize the datastruct.Dataset object with data, we run the following code snippet:

Data = Da.Dataset(D,Net)
creator_setting = {'creator':'Nordlinglab_Justin'}
Da.setname(Data,creator_setting)
Data.description = 'This data set contains 15 simulated experiments with additive white Gaussian noise with variance 0.00028 added to the response in order to make the SNR 7 and the data partly informative for network inference. The singular values of the response matrix are in the range 0.77 to 1.2.'
Data.nodes = Net.names

After initializing the datastruct.Dataset object, we then save it with the following command

Dataset.save(Dataset,Data,'savepath/','.json')

Then, you can use this code snippet to display the new Dataset object that you have generated

Data = Dataset(Dataset.Load('./savepath/Nordlinglab_Justin-ID17795-D20190415-N10-E15-SNR277998-IDY17795.json'))
Data.__dict__

The displayed output is

{'E': sample        S1        S2        S3    ...          S13       S14       S15
node                                    ...                                 
G01     0.001609  0.000334  0.000934    ...     0.000664  0.000593  0.000298
G02     0.001145  0.000808  0.000116    ...     0.000497  0.001746  0.001446
G03     0.001796  0.000210  0.000173    ...     0.000684  0.001074  0.001337
G04     0.000962  0.000706  0.001298    ...     0.000916  0.000908  0.000054
G05     0.001347  0.001081  0.000509    ...     0.001760  0.001839  0.000857
G06     0.001201  0.000180  0.001394    ...     0.000518  0.000246  0.001442
G07     0.001585  0.000752  0.000197    ...     0.000918  0.000703  0.000525
G08     0.001627  0.001223  0.001366    ...     0.001103  0.000204  0.001348
G09     0.001298  0.000705  0.000014    ...     0.000690  0.001257  0.000649
G10     0.000353  0.000890  0.001845    ...     0.001720  0.000523  0.000738

[10 rows x 15 columns],
'E_covariance_element':           G01       G02       G03    ...          S13       S14       S15
G01  1.000000  0.000000  0.000000    ...     0.001864  0.001864  0.001864
G02  0.000000  1.000000  0.000000    ...     0.001864  0.001864  0.001864
G03  0.000000  0.000000  1.000000    ...     0.001864  0.001864  0.001864
G04  0.000000  0.000000  0.000000    ...     0.001864  0.001864  0.001864
G05  0.000000  0.000000  0.000000    ...     0.001864  0.001864  0.001864
G06  0.000000  0.000000  0.000000    ...     0.001864  0.001864  0.001864
G07  0.000000  0.000000  0.000000    ...     0.001864  0.001864  0.001864
G08  0.000000  0.000000  0.000000    ...     0.001864  0.001864  0.001864
G09  0.000000  0.000000  0.000000    ...     0.001864  0.001864  0.001864
G10  0.000000  0.000000  0.000000    ...     0.001864  0.001864  0.001864
S1   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S2   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S3   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S4   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S5   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S6   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S7   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S8   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S9   0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S10  0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S11  0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S12  0.001864  0.001864  0.001864    ...     0.000000  0.000000  0.000000
S13  0.001864  0.001864  0.001864    ...     1.000000  0.000000  0.000000
S14  0.001864  0.001864  0.001864    ...     0.000000  1.000000  0.000000
S15  0.001864  0.001864  0.001864    ...     0.000000  0.000000  1.000000

[25 rows x 25 columns],
'E_covariance_variable': node       G01       G02       G03    ...          G08       G09       G10
node                                  ...                                 
G01   0.000003  0.000000  0.000000    ...     0.000000  0.000000  0.000000
G02   0.000000  0.000003  0.000000    ...     0.000000  0.000000  0.000000
G03   0.000000  0.000000  0.000003    ...     0.000000  0.000000  0.000000
G04   0.000000  0.000000  0.000000    ...     0.000000  0.000000  0.000000
G05   0.000000  0.000000  0.000000    ...     0.000000  0.000000  0.000000
G06   0.000000  0.000000  0.000000    ...     0.000000  0.000000  0.000000
G07   0.000000  0.000000  0.000000    ...     0.000000  0.000000  0.000000
G08   0.000000  0.000000  0.000000    ...     0.000003  0.000000  0.000000
G09   0.000000  0.000000  0.000000    ...     0.000000  0.000003  0.000000
G10   0.000000  0.000000  0.000000    ...     0.000000  0.000000  0.000003

[10 rows x 10 columns],
'F': sample   S1   S2   S3   S4   S5   S6 ...   S10  S11  S12  S13  S14  S15
node                                 ...                               
G01     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G02     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G03     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G04     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G05     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G06     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G07     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G08     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G09     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G10     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0

[10 rows x 15 columns],
'F_covariance_element':      G01  G02  G03  G04  G05  G06  G07 ...    S9  S10  S11  S12  S13  S14  S15
G01  1.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G02  0.0  1.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G03  0.0  0.0  1.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G04  0.0  0.0  0.0  1.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G05  0.0  0.0  0.0  0.0  1.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G06  0.0  0.0  0.0  0.0  0.0  1.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G07  0.0  0.0  0.0  0.0  0.0  0.0  1.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G08  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G09  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
G10  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S1   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S2   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S3   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S4   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S5   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S6   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S7   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S8   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0
S9   0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   1.0  0.0  0.0  0.0  0.0  0.0  0.0
S10  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  1.0  0.0  0.0  0.0  0.0  0.0
S11  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  1.0  0.0  0.0  0.0  0.0
S12  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  1.0  0.0  0.0  0.0
S13  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  1.0  0.0  0.0
S14  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  1.0  0.0
S15  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  1.0

[25 rows x 25 columns],
'F_covariance_variable': node  G01  G02  G03  G04  G05  G06  G07  G08  G09  G10
node                                                  
G01   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G02   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G03   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G04   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G05   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G06   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G07   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G08   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G09   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
G10   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0,
'M': 15,
'N': 10,
'P': sample   S1   S2   S3   S4   S5   S6 ...   S10  S11  S12  S13  S14  S15
node                                 ...                               
G01     1.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G02     0.0  1.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  1.0  0.0  0.0
G03     0.0  0.0  1.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  1.0  1.0  0.0
G04     0.0  0.0  0.0  1.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  1.0  0.0
G05     0.0  0.0  0.0  0.0  1.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0
G06     0.0  0.0  0.0  0.0  0.0  1.0 ...   0.0  0.0  0.0  0.0  1.0  0.0
G07     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  1.0  0.0  1.0  0.0  0.0
G08     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  1.0  0.0  0.0  0.0
G09     0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  1.0  0.0  0.0  0.0  0.0
G10     0.0  0.0  0.0  0.0  0.0  0.0 ...   1.0  0.0  0.0  0.0  1.0  0.0

[10 rows x 15 columns],
'Y': sample        S1        S2        S3    ...          S13       S14       S15
node                                    ...                                 
G01     0.004491  0.000334  0.000934    ...     0.000664  0.013951  0.000298
G02     0.001145  0.100809  0.000116    ...     0.100497  0.001746  0.001446
G03     0.001796  0.000210  0.100173    ...     0.100685  0.101074  0.001337
G04     0.000962  0.000706  0.001298    ...     0.000916  0.100908  0.000054
G05     0.001347  0.001081  0.000509    ...     0.001760  0.001839  0.000857
G06    -0.012157  0.000180  0.001394    ...     0.000518  0.038331  0.001442
G07     0.001585  0.000752  0.000197    ...     0.100918  0.000703  0.000525
G08     0.011699  0.001223  0.001366    ...     0.001103  0.046890  0.001348
G09     0.001298  0.000705  0.000014    ...     0.000690  0.001257  0.000649
G10     0.000353  0.000890  0.001845    ...     0.001720  0.100523  0.000738

[10 rows x 15 columns],
'__model_eq__': 'X ~ -dot(P, pinv(A).T)',
'created': {'creator': 'Nordlinglab_Justin',
'id': '58909',
'nodes': '10',
'sparsity': 14,
'time': '1555834954',
'type': 'random'},
'description': 'This data set contains 15 simulated experiments with additive '
'white Gaussian noise with variance 0.00028 added to the '
'response in order to make the SNR 7 and the data partly '
'informative for network inference. The singular values of the '
'response matrix are in the range 0.77 to 1.2.',
'lamda': E_variance    0.000003
F_variance    0.000000
dtype: float64,
'name': 'Nordlinglab_Justin-ID58909-D20190422-N10-E15-SNR655-IDY58909',
'network': 'Nordlinglab_Justin-D20190421-random-N10-L14ID58909',
'nodes': 0    G01
1    G02
2    G03
3    G04
4    G05
5    G06
6    G07
7    G08
8    G09
9    G10
Name: node, dtype: object,
'tol': 2.220446049250313e-16}

If the data is generated in silico, the dataset can be connected to a network, so the network is reported in the Data object. We also provide normalisation procedures for the Data object that normalise the expression matrix Y. Three normalisation procedures are available: standard normalisation, min-max range scaling and unit scaling. All methods work over rows or columns, depending on the input, e.g.

For standard normalisation

NewData = Da.std_normalize(Data,2)
np.sum(Da.response(NewData),axis = 1)
np.sum(Da.response(NewData)**2,axis = 1)

The sums over rows should be zero, and the squared values should average 1 per sample, so the sum of squares over each row should equal M.

For unit scaling

NewData = Da.unit_length_scaling(Data,2)
np.sum(Da.response(NewData)**2,axis = 1)

the squared values should sum to 1.

For range scaling

NewData = Da.range_scaling(Data,2)
np.max(Da.response(NewData),axis=1)
np.min(Da.response(NewData),axis=1)

the max and min of each row should be 1 and 0 respectively.

It should be noted that the noise estimates are currently not scaled according to the new data and should therefore not be used as is in subsequent calculations.
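The three checks above can be reproduced on a plain array. The sketch below is a stand-alone numpy illustration of row-wise normalisation, not JATNIpy's implementation; the function names are hypothetical, and the use of the population standard deviation (ddof=0) is an assumption chosen because it makes the squared values of each row sum to M.

```python
import numpy as np

def row_standardize(Y):
    """Centre each row and scale it to unit population standard deviation."""
    Y = np.asarray(Y, dtype=float)
    centered = Y - Y.mean(axis=1, keepdims=True)
    return centered / Y.std(axis=1, keepdims=True)

def row_unit_length(Y):
    """Scale each row to Euclidean length 1."""
    Y = np.asarray(Y, dtype=float)
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def row_range_scale(Y):
    """Map each row linearly onto the interval [0, 1]."""
    Y = np.asarray(Y, dtype=float)
    lo = Y.min(axis=1, keepdims=True)
    hi = Y.max(axis=1, keepdims=True)
    return (Y - lo) / (hi - lo)

rng = np.random.default_rng(1)
Y = rng.normal(size=(10, 15))          # 10 genes, M = 15 samples

print(np.allclose(row_standardize(Y).sum(axis=1), 0.0))
print(np.allclose((row_standardize(Y) ** 2).sum(axis=1), Y.shape[1]))
print(np.allclose((row_unit_length(Y) ** 2).sum(axis=1), 1.0))
```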

save Dataset

Dataset.save(Dataset,Data,'savepath/','.json')

Analyse

The analyse folder provides the code to analyse networks, datasets and benchmark results.

We first demonstrate how to load the example network Nordling-D20100302-random-N10-L25-ID1446937.json and dataset Nordling-ID1446937-D20150825-E15-SNR3291-IDY15968.json from the online repository with the following command:

from datastruct.Dataset import Dataset
Data = Dataset(Dataset.Load('test'))
from datastruct.Network import Network
Net = Network(Network.Load('test'))

Network analysis in JATNIpy:

We input the Net to the analyse.Model to analyse the network:

from analyse.Model import Model
M = Model()
net_prop = M.Model(Net)
net_prop.__dict__

It then produces the output like the following:

{
'network': 'Nordling-D20100302-random-N10-L25-ID1446937',
'interampatteness': 144.6936524435306,
'NetworkComponents': 1,
'AvgPathLength': 2.8777777777777778,
'tauG': 0.08503206402335546,
'CC': 0.05,
'DD': 1.5
}

Six measures are calculated in JATNIpy. interampatteness is the condition number of A, calculated with numpy.linalg.cond(A). NetworkComponents is the number of strongly connected components, calculated by the graphconncomp function. AvgPathLength is the path length of the graph of the network, calculated by the median_path_length function. Both graphconncomp and median_path_length rely on the Python package networkx. In addition, CC is the average clustering coefficient, which can be interpreted as the neighbourhood sparsity of each node in the network, not counting the node itself, and DD is the average degree distribution of the model. The property analyse.Model.type can be set to directed or undirected by using analyse.DataModel.type.
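To show what two of these measures count, the sketch below computes the condition number and the number of strongly connected components of a small matrix using only numpy. It is an illustration under stated assumptions: JATNIpy is described as using networkx for the graph measures, whereas count_strong_components here is a hypothetical helper based on a simple reachability closure that is only practical for small networks.

```python
import numpy as np

def count_strong_components(A):
    """Count strongly connected components of the graph with a link wherever A is nonzero."""
    adj = np.asarray(A) != 0
    n = adj.shape[0]
    reach = adj | np.eye(n, dtype=bool)        # every node reaches itself
    for _ in range(n):                         # repeated squaring closes the relation
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    mutual = reach & reach.T                   # i and j reach each other
    return len({tuple(row) for row in mutual.tolist()})

# Three nodes connected in a cycle, each with a negative self-loop.
A = np.array([[-1.0,  0.0,  2.0],
              [ 1.0, -1.0,  0.0],
              [ 0.0,  3.0, -1.0]])

print("interampatteness:", np.linalg.cond(A))
print("strong components:", count_strong_components(A))   # 1
```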

Besides the average clustering coefficient, the clustering coefficients of all nodes can be calculated with

CCs = M.clustering_coefficient(Net)

Data analysis in JATNIpy:

We input the Data to the analyse.DataAnalysis to analyse the data:

from analyse.DataAnalysis import DataAnalysis
DD = DataAnalysis()
data_prop = DD.Data(Data)
data_prop.__dict__

It then produces the output like the following:

{
'dataset': 'Nordling-ID1446937-D20150825-N10-E15-SNR3291-IDY15968',
'SNR_Phi_true': 6.999999962249559,
'SNR_Phi_gauss': 3.3098514156225645,
'SNR_phi_true': 10.991358740090298,
'SNR_phi_gauss': 10.340857240865667
}

The following two functions calculate the SNR for all variables:

SNRe = DD.calc_SNR_phi_true(Data)
SNRl = DD.calc_SNR_phi_gauss(Data)

Performance evaluation in JATNIpy:

To analyse the performance of an inference method, we first need to generate an output. This is easy to do with wrappers: each method has an associated wrapper that parses the data for the method itself. In JATNIpy, we re-implemented four wrappers: LASSO, Glmnet, LSCO and RNICO. Before running the Glmnet implementation, run these commands

apt-get -y update
apt-get install -y libatlas-base-dev
apt-get install -y python3-tk
apt-get install libgfortran3

To run the Glmnet method we execute:

from Methods.Glmnet import Glmnet
estA,zetavec,zetaRange = Glmnet(Data,'full')

To run the lsco implementation we execute:

from Methods.lsco import lsco
estA,zetavec,zetaRange = lsco(Data,'full')

To run the RNI implementation we execute:

from Methods.RNI import RNI
estA,zetavec,zetaRange = RNI(Data,'full')

To run the Lasso implementation we execute:

from Methods.Lasso import Lasso
estA,zetavec,zetaRange = Lasso(Data,'full')

For Glmnet, you should also install the following package (if you installed in a virtual environment, you do not need this step)

pip3 install python-glmnet

zetavec holds the regularization parameters used within the algorithm. 'full' makes the method generate the complete regularization path from the full to the empty network, with the zeta values scaled between 0 and 1. zetaRange holds the scale factors used for the parameters. In JATNIpy, only the RNI method accepts a user-specified zetavec. For example, with

zetavec = np.logspace(-6,0,100)
estA = RNI(Data,zetavec)[0]

the method will use that vector of values to infer the networks.
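The idea of a regularization path from the full to the empty network can be illustrated with a least-squares estimate followed by a cut-off. The sketch below is not JATNIpy's lsco or RNI code. It assumes the steady-state model A Y = -P, least_squares_cutoff is a hypothetical helper, and scaling the threshold by the largest absolute coefficient is an assumption made so that zeta = 1 yields the empty network.

```python
import numpy as np

def least_squares_cutoff(Y, P, zetavec):
    """Return one thresholded least-squares estimate of A per zeta value."""
    A_ls = -P @ np.linalg.pinv(Y)              # least-squares solution of A Y = -P
    scale = np.abs(A_ls).max()
    estimates = np.zeros(A_ls.shape + (len(zetavec),))
    for k, zeta in enumerate(zetavec):
        keep = np.abs(A_ls) > zeta * scale     # drop links weaker than the cut-off
        estimates[:, :, k] = np.where(keep, A_ls, 0.0)
    return estimates

A_true = np.array([[-1.0,  0.0,  0.5],
                   [ 0.8, -1.0,  0.0],
                   [ 0.0,  0.0, -2.0]])
P = np.eye(3)
Y = -np.linalg.inv(A_true) @ P                 # noise-free response
zetavec = np.logspace(-6, 0, 100)
estA = least_squares_cutoff(Y, P, zetavec)

print(estA.shape)                              # (3, 3, 100)
print(np.count_nonzero(estA[:, :, 0]), np.count_nonzero(estA[:, :, -1]))
```

With noise-free data the smallest zeta recovers the true network, and the number of links shrinks as zeta grows.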

To analyse the performance of the model, we input the network estimates produced by the algorithm to the model comparison method:

from analyse.CompareModels import CompareModels
import numpy as np
M = CompareModels()
M = M.set_A(M,Net)
M = M.CompareModels(M,np.asarray(Net.A),estA)

The max function in CompareModels finds the maximum of each calculated measure:

maxM = M.max(M)

maxM contains the maximum of all measures in CompareModels. If you want the optimal performance for a specific measure such as 'MCC':

maxM[0]['MCC']
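For readers unfamiliar with the measure, the sketch below computes the Matthews correlation coefficient (MCC) between the link structure of a true and an estimated network and takes the maximum over a short list of estimates. It is a stand-alone illustration, not the CompareModels implementation; mcc is a hypothetical helper, and returning 0.0 when the denominator vanishes is a convention assumed here.

```python
import numpy as np

def mcc(A_true, A_est):
    """Matthews correlation coefficient between the nonzero patterns of two matrices."""
    t = np.asarray(A_true) != 0
    e = np.asarray(A_est) != 0
    tp = float(np.sum(t & e))      # links present in both
    tn = float(np.sum(~t & ~e))    # links absent in both
    fp = float(np.sum(~t & e))     # links only in the estimate
    fn = float(np.sum(t & ~e))     # links missed by the estimate
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

A_true = np.array([[-1.0,  0.0,  0.5],
                   [ 0.8, -1.0,  0.0],
                   [ 0.0,  0.0, -2.0]])

# Three estimates along a path: full recovery, one weak link dropped, empty network.
estimates = [A_true,
             np.where(np.abs(A_true) > 0.6, A_true, 0.0),
             np.zeros_like(A_true)]
scores = [mcc(A_true, est) for est in estimates]
print(scores, "best:", max(scores))
```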

Who do I talk to?
