Parenclitic approach with kernels inside

Project description

Parenclitic Network Generalized Algorithm implementation

Parenclitic is a Python package that efficiently builds a network representation from numeric data.

More about parenclitic
Installation
Getting Started
The Team
Acknowledgements

More About parenclitic

The main idea is to consider pairwise feature planes and decide whether there is a connection between two features, based on a control group and a deviated group. The deviated samples differ in some way from the control samples, and we are interested in the features that capture this distinction. Two cases arise: subjects can be distinguished by a single feature, or they can be separated only by two features taken together. First, we identify and exclude the features that distinguish the samples on their own (the linear case). Second, we identify pairs of features and construct a graph representation of those pairwise connections. Each node of the network is a feature, and an edge characterizes the deviation of a subject from the control group with respect to the two features it joins.
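The pairwise step can be sketched in plain NumPy. This is only an illustration of the idea, not the package's implementation: instead of the library's kernels it fits a 2-D Gaussian to the control group in each feature plane and uses the Mahalanobis distance as the deviation measure. The function name `pairwise_edges` and the cutoff `threshold=3.0` are hypothetical, and the single-feature exclusion step is left out.

```python
import itertools
import numpy as np

def pairwise_edges(X, is_control, sample, threshold=3.0):
    """Return the feature pairs in which `sample` deviates from the control group.

    For every pair of features a 2-D Gaussian is fitted to the control samples.
    An edge is drawn when the Mahalanobis distance of `sample` from that fit
    exceeds `threshold` (a hypothetical cutoff chosen for illustration).
    """
    control = X[is_control]
    edges = []
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        plane = control[:, [i, j]]            # control group in the (i, j) plane
        mean = plane.mean(axis=0)
        cov = np.cov(plane, rowvar=False)     # 2 x 2 covariance of the control group
        diff = sample[[i, j]] - mean
        dist = np.sqrt(diff @ np.linalg.solve(cov, diff))
        if dist > threshold:
            edges.append((i, j))
    return edges

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))             # 200 control samples, 4 features
is_control = np.ones(200, dtype=bool)
outlier = np.array([0.0, 0.0, 6.0, 6.0])      # deviates in features 2 and 3
edges = pairwise_edges(X, is_control, outlier)
print(edges)
```

Every pair that involves feature 2 or 3 becomes an edge for this subject, while the pair (0, 1) does not, so the subject's network highlights the features in which it departs from the control group.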

Scatter of 3 groups: Siblings = Control, DS = Deviated, Mothers = Test

The next step is to compute metrics of the graphs and to understand the underlying network complexity. These metrics can serve as a dimensionality reduction for subsequent ML algorithms.
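As a rough illustration of how a graph collapses into a handful of numbers, the sketch below computes a few elementary metrics from an edge list using NumPy only. The function `simple_graph_metrics` and the choice of metrics are assumptions made for this example; the package itself relies on python-igraph, which offers a much wider set of metrics.

```python
import numpy as np

def simple_graph_metrics(num_nodes, edges):
    """Compute a few basic metrics of an undirected graph given as an edge list."""
    adj = np.zeros((num_nodes, num_nodes), dtype=int)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1             # symmetric adjacency matrix
    degrees = adj.sum(axis=1)
    num_edges = int(adj.sum() // 2)           # each edge is counted twice in adj
    max_edges = num_nodes * (num_nodes - 1) // 2
    return {
        "num_edges": num_edges,
        "max_degree": int(degrees.max()),
        "mean_degree": float(degrees.mean()),
        "density": num_edges / max_edges,
    }

# Hypothetical network of one subject: 4 features, 5 edges
metrics = simple_graph_metrics(4, [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
print(metrics)
```

Each subject's network is reduced to a short, fixed-length vector of such values, which is what makes the metrics usable as input features for a downstream model.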

We developed the parenclitic library to handle these tasks.

Our package provides 3 main features:

Build, save and load parenclitic network.
Choose or create kernel to identify edges.
Compute network metrics based on python-igraph package.

Installation

Parenclitic is available on PyPI. You can install it through pip:

pip install parenclitic

Dependencies:

NumPy
python-igraph
Pandas
scikit-learn (imported as sklearn)
SciPy

Please check carefully that python-igraph is installed correctly.
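One way to check the environment is to ask Python whether each dependency can be located. The helper below is a small sketch that uses only the standard library; the name `missing_modules` is made up for this example, and the list assumes the usual import names (python-igraph is imported as `igraph`, scikit-learn as `sklearn`).

```python
import importlib.util

# Import names of the dependencies listed above.
REQUIRED_MODULES = ["numpy", "igraph", "pandas", "sklearn", "scipy"]

def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules()
print("Missing modules:", missing if missing else "none")
```

Finding a module does not guarantee that it imports cleanly, so running `import igraph` once in an interpreter is still a worthwhile final check.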

Getting started

First, load the data. In this example we generate it.

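    import numpy as np
    num_samples = 100
    num_features = 30
    shift = 2  # offset of the control group, in standard deviations
    # Random data: one row per sample, one column per feature
    X = np.random.randn(num_samples, num_features)
    # Random binary label for each sample
    y = np.random.randint(2, size = (num_samples, ))
    # Group mask, initially a copy of the labels
    mask = np.array(y, np.int32)
    # Shift the samples labelled 0, which form the control group
    X[mask == 0, :] += shift
    # Mark the control group with -1; samples labelled 1 keep +1 (deviated)
    mask[y == 0] = -1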

X: data values, 100 samples with 30 features each.
y: vector of sample labels (0 or 1), int type.
mask: vector of group codes, where -1 marks the control group, +1 the deviated group, and +2 the test group (int type).

In this example we shift the control group's data by two standard deviations, so we expect almost complete networks.
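A quick sanity check on data generated this way is to compare the group means. The sketch below regenerates the data with a fixed seed (an addition for reproducibility, using NumPy's `default_rng` rather than the global random state) and prints the mean of each group.

```python
import numpy as np

rng = np.random.default_rng(42)               # fixed seed, chosen arbitrarily
num_samples, num_features, shift = 100, 30, 2

X = rng.standard_normal((num_samples, num_features))
y = rng.integers(2, size=num_samples)         # binary label per sample
mask = np.where(y == 0, -1, 1)                # -1 = control, +1 = deviated
X[mask == -1, :] += shift                     # shift the control group

control_mean = X[mask == -1].mean()
deviated_mean = X[mask == 1].mean()
print(f"control mean:  {control_mean:.2f}")
print(f"deviated mean: {deviated_mean:.2f}")
```

The control mean should sit close to the shift of 2 and the deviated mean close to 0, which confirms that the two groups are separated as intended.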

Running parenclitic takes a few steps.

Import the parenclitic library:

    import parenclitic

Create a kernel that decides whether there is a link between a pair of features for a particular subject. Here it is a PDF kernel with an automatically determined threshold.

    kernel = parenclitic.pdf_kernel()

On some datasets the groups can easily be separated by a single feature. To exclude such features, IG_filter can be applied.

    pair_filter = parenclitic.IG_filter()

Excluding them helps to distinguish pair-based deviation from one-feature deviation.
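The text above does not describe how IG_filter scores a feature. As a rough illustration of single-feature filtering, the sketch below assumes that "IG" refers to information gain and scores a feature by the best gain over all threshold splits. The functions `entropy` and `best_information_gain` are hypothetical helpers written for this example, not part of the package.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a vector of binary 0/1 labels."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    if p == 0.0 or p == 1.0:
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

def best_information_gain(feature, labels):
    """Largest information gain over all threshold splits of one feature."""
    base = entropy(labels)
    best = 0.0
    # The largest value is skipped so that the right-hand side is never empty.
    for threshold in np.unique(feature)[:-1]:
        left = labels[feature <= threshold]
        right = labels[feature > threshold]
        weighted = (left.size * entropy(left) + right.size * entropy(right)) / labels.size
        best = max(best, base - weighted)
    return best

rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 50)
separable = rng.standard_normal(100) + 10 * labels   # groups far apart
uninformative = rng.standard_normal(100)             # same distribution in both groups

gain_separable = best_information_gain(separable, labels)
gain_uninformative = best_information_gain(uninformative, labels)
print(gain_separable, gain_uninformative)
```

A feature whose gain is close to the full label entropy separates the groups on its own and would be a candidate for exclusion before the pairwise analysis.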

Create a parenclitic model that uses the chosen kernel and filter.

    clf = parenclitic.parenclitic(kernel = kernel, pair_filter = pair_filter)

Fit the data using 2 workers, with chunks of 1000 feature pairs per worker.

    clf.fit(X, y, mask, num_workers = 2, chunk_size = 1000)
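To get a feel for what a chunk size of 1000 means, the sketch below enumerates all feature pairs and groups them into chunks. It is only an illustration of the bookkeeping, under the assumption that pairs are unordered and that chunks are filled in order; `chunk_feature_pairs` is a hypothetical helper, and how the library actually schedules its work is not described here.

```python
import itertools

def chunk_feature_pairs(num_features, chunk_size):
    """Yield lists of at most `chunk_size` unordered feature-index pairs."""
    pairs = itertools.combinations(range(num_features), 2)
    while True:
        chunk = list(itertools.islice(pairs, chunk_size))
        if not chunk:
            return
        yield chunk

chunks = list(chunk_feature_pairs(num_features=30, chunk_size=1000))
print(len(chunks), "chunk(s);", sum(len(c) for c in chunks), "pairs in total")
```

With 30 features there are only 30 * 29 / 2 = 435 pairs, so in this sketch everything fits into a single chunk; a smaller chunk size would be needed to spread this example across several workers.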

Save the graphs as text files with tab-separated values (gtype = 'csv'). Alternatively, choose 'npz' to save them as a NumPy zipped file.

    clf.save_graphs(gtype = 'csv')
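For readers unfamiliar with the two formats, the sketch below writes a small adjacency matrix as tab-separated text and as a NumPy zipped archive, then reads both back. It uses NumPy directly and makes no claim about the file layout that save_graphs produces; the matrix, file names, and the array key `adjacency` are made up for this example.

```python
import os
import tempfile
import numpy as np

# Hypothetical 3-node adjacency matrix standing in for one subject's network.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = os.path.join(tmp_dir, "graph.tsv")
    npz_path = os.path.join(tmp_dir, "graph.npz")

    # Plain text, one row per line, values separated by tabs
    np.savetxt(tsv_path, adjacency, fmt="%d", delimiter="\t")
    # Compressed binary archive holding named arrays
    np.savez_compressed(npz_path, adjacency=adjacency)

    from_tsv = np.loadtxt(tsv_path, delimiter="\t", dtype=int)
    with np.load(npz_path) as archive:
        from_npz = archive["adjacency"]

print(np.array_equal(from_tsv, adjacency), np.array_equal(from_npz, adjacency))
```

Text files are easy to inspect and to load in other tools, while the zipped format is more compact and preserves the array dtype.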

A full example is available in src/parenclitic_sample.ipynb.

Parallel computation

Parallel computation is based on the multiprocessing library, which parallelizes the processing of feature pairs over multiple processes.

The Team

The Parenclitic project was started by Krivonosov Mikhail in 2018 at Lobachevsky State University, building on many works by M. Zanin and A. Zaikin.

Acknowledgements

This work was supported by the megagrant "Digital personalized medicine for healthy aging (CPM-aging): network analysis of Large multi-omics data to search for new diagnostic, predictive and therapeutic goals" № 074-02-2018-330 (1).

Project details

Release history

Version	Release date
0.1.7	Mar 9, 2020 (this version)
0.1.5	Dec 26, 2019
0.1.4	Jul 19, 2019
0.1.2	Jun 5, 2019
0.1.1	Apr 29, 2019
0.1	Apr 29, 2019

Download files

Source Distribution

parenclitic-0.1.7.tar.gz (13.6 kB), uploaded Mar 9, 2020

File metadata

Download URL: parenclitic-0.1.7.tar.gz
Upload date: Mar 9, 2020
Size: 13.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.1

File hashes

Hashes for parenclitic-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`cdc1601ff225b20611130e2462ea6075abc8a2353e61514d9b481d4a1ca740c3`
MD5	`a8f9e09f816ff0e7e97ea1b22a99baa7`
BLAKE2b-256	`57b053c7635ead25806a077d294f6767965946e67c6904868056bcd48b8f9995`
