A tool for finding and summarizing modules of highly correlated observations in compositional data

SCNIC

Sparse Cooccurrence Network Investigation for Compositional data. Pronounced 'scenic'.

NOTE: The most up to date version of the SCNIC repo is here!

SCNIC is a package for the generation and analysis of cooccurrence (positive correlation) networks with compositional data. Data generated by many next-generation sequencing experiments are compositional (a subsampling of the total community), which violates the assumptions of typical cooccurrence network analysis techniques. 16S sequencing data are often highly compositional, so methods such as SparCC (https://bitbucket.org/yonatanf/sparcc) have been developed for studying correlations in microbiome data. SCNIC is designed with compositional data in mind and so provides multiple correlation measures, including SparCC.
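To make the problem with compositional data concrete, here is a minimal sketch (not part of SCNIC) that simulates three taxa whose absolute abundances are independent and then converts each sample to relative abundances. The simulated data, the `pearson` helper, and all variable names are illustrative assumptions. Because the relative abundances in each sample must sum to 1, an increase in one taxon forces the others down, so a naive correlation on the relative data is pulled negative even though the taxa are unrelated.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# 500 simulated samples of three taxa with independent absolute abundances.
absolute = [[rng.uniform(50, 150) for _ in range(3)] for _ in range(500)]
# Closure: convert each sample to relative abundances that sum to 1.
relative = [[value / sum(sample) for value in sample] for sample in absolute]

r_absolute = pearson([s[0] for s in absolute], [s[1] for s in absolute])
r_relative = pearson([s[0] for s in relative], [s[1] for s in relative])

print(f"taxon 0 vs taxon 1, absolute abundances: r = {r_absolute:.2f}")
print(f"taxon 0 vs taxon 1, relative abundances: r = {r_relative:.2f}")
```

Methods such as SparCC are designed to estimate correlations while accounting for this effect.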

SCNIC can be run in two ways. It is packaged with scripts for running it on the command line, and it is also available as a QIIME 2 plugin (https://www.github.com/lozuponelab/q2-SCNIC). Either method is valid, but the QIIME 2 plugin is more convenient when working within the QIIME 2 ecosystem.

Overview

Within

The 'within' method takes BIOM-formatted files as input and forms cooccurrence networks using a user-specified correlation metric.
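As a rough illustration of what a within-table analysis computes, the sketch below correlates every pair of observations in a small made-up table and lists each pair as a candidate network edge. It uses a hand-written Pearson correlation and a plain dictionary in place of a BIOM table; `pearson`, `pairwise_correlations`, and the toy data are hypothetical and are not SCNIC's API or its implementation.

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def pairwise_correlations(table):
    """Return {(obs_a, obs_b): r} for every pair of observations (rows)."""
    return {
        (a, b): pearson(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }

# Toy table: observation ID -> counts across four samples (made-up values).
table = {
    "otu1": [10, 20, 30, 40],
    "otu2": [12, 18, 33, 41],
    "otu3": [40, 30, 20, 10],
}

correls = pairwise_correlations(table)
for (a, b), r in correls.items():
    print(f"{a}\t{b}\t{r:.3f}")
```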

Modules

From the correlation network generated in the within step, SCNIC finds modules of cooccurring observations: groups of observations in which every pair meets a minimum pairwise correlation value. Each module is summarized, and a new biom table is returned in which the observations belonging to a module are collapsed into a single observation. This biom table is output along with a list of modules and their contents. SCNIC also writes a gml file of the network, which can be opened in network visualization tools such as Cytoscape and contains all observation metadata provided in the input biom file as well as module information. Please be aware that the networks output by this analysis include only positive correlations, as only positive correlations are used in module finding and summarization.
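The idea of a module as a group in which every pair meets a minimum correlation can be sketched as follows. This is a simple greedy grouping written for illustration, not SCNIC's actual module-finding algorithm. Collapsing a module by summing its members' counts is likewise an assumption made here. The function names, the `module_N` labels, and the toy data are all hypothetical.

```python
def find_modules(correls, min_r):
    """Greedily build groups in which every pair has r >= min_r."""
    def r(a, b):
        # Pairs may be stored in either order; missing pairs count as 0.
        return correls.get((a, b), correls.get((b, a), 0.0))

    names = sorted({name for pair in correls for name in pair})
    modules, assigned = [], set()
    for seed in names:
        if seed in assigned:
            continue
        module = [seed]
        for candidate in names:
            if candidate in assigned or candidate in module:
                continue
            # Only admit a candidate correlated with every current member.
            if all(r(candidate, member) >= min_r for member in module):
                module.append(candidate)
        if len(module) > 1:
            modules.append(module)
            assigned.update(module)
    return modules

def collapse(table, modules):
    """Sum module members into one row each; keep other rows unchanged."""
    in_module = {name for module in modules for name in module}
    collapsed = {k: v for k, v in table.items() if k not in in_module}
    for i, module in enumerate(modules):
        rows = [table[name] for name in module]
        collapsed[f"module_{i}"] = [sum(column) for column in zip(*rows)]
    return collapsed

correls = {
    ("otu1", "otu2"): 0.8, ("otu1", "otu3"): 0.6, ("otu2", "otu3"): 0.5,
    ("otu1", "otu4"): 0.1, ("otu2", "otu4"): 0.2, ("otu3", "otu4"): 0.9,
}
table = {
    "otu1": [10, 0, 5], "otu2": [8, 1, 4],
    "otu3": [6, 0, 3], "otu4": [0, 9, 2],
}

modules = find_modules(correls, min_r=0.35)
collapsed = collapse(table, modules)
print(modules)
print(collapsed)
```

Note that otu4 is strongly correlated with otu3 but not with otu1 or otu2, so it stays outside the module: membership requires meeting the threshold with every member, not just one.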

Between

The 'between' method takes two biom tables as input and calculates all pairwise correlations between the tables using a selection of correlation metrics. A gml correlation network is output as well as a file containing statistics and p-values of all correlations.
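A between-table comparison can be pictured as correlating every observation in one table with every observation in the other, across the samples the two tables share. The sketch below does this with a hand-written Spearman correlation (Pearson correlation of average ranks). It is an illustration only: the function names and toy tables are hypothetical, p-values are not computed, and the samples are assumed to be in the same order in both tables.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def between_correlations(table1, table2):
    """Correlate each row of table1 with each row of table2."""
    return {
        (a, b): spearman(table1[a], table2[b])
        for a in table1 for b in table2
    }

# Toy tables measured on the same five samples (made-up values).
table1 = {"taxonA": [1, 2, 3, 4, 5]}
table2 = {"metab1": [2, 4, 6, 8, 10], "metab2": [5, 4, 3, 2, 1]}

result = between_correlations(table1, table2)
for (a, b), rho in result.items():
    print(f"{a}\t{b}\t{rho:.2f}")
```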

Installation

Installing using environment.yml

We recommend using mamba (or conda) and our environment.yml file to install the full environment in one step. See the mamba documentation for mamba installation steps.

wget https://raw.githubusercontent.com/lozuponelab/SCNIC/master/environment.yml
mamba env create -n SCNIC2 --file environment.yml

# Optional cleanup
rm environment.yml

If not using mamba, you can install using conda env create in place of mamba env create. However, conda will be slower and may struggle to solve the dependencies.

ARM architecture

Users with Apple M1/M2 chips or other ARM architecture should set CONDA_SUBDIR=osx-64 at the beginning of the env create command, as in the following:

CONDA_SUBDIR=osx-64 mamba env create -n SCNIC3 --file environment.yml

Multi-step installation directly from Conda + pip

Step 1

We recommend installing all of SCNIC's dependencies via conda in a new conda environment. To do this, you only need to create a new environment with SCNIC installed. However, because the latest version of SCNIC has not always been available through conda, please then install SCNIC into your conda environment manually via pip. On some computers there are SciPy conflicts when installing via conda, so we also recommend installing SciPy via pip.

conda create -n SCNIC python=3 SCNIC
conda activate SCNIC

Step 2 (Pip)

To install the latest release from PyPI, run these commands:

pip install "scipy>=1.9.0,<=1.10.1"
pip install SCNIC

Dependencies

SCNIC depends on a variety of software, all of which can be installed via conda and most of which can be installed via pip. The recommended installation method is pip, but you must also install fastspar and parallel and have them on your PATH. If you installed using environment.yml, this should be unnecessary.

To do so, create a conda environment, then install both fastspar and parallel into it.

For example:

conda install -c bioconda -c conda-forge fastspar parallel
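To confirm that both tools are reachable before running SCNIC, a quick check along these lines may help. It is a small convenience sketch, not something SCNIC provides, and it assumes the executables are named `fastspar` and `parallel`; the `missing_tools` helper is hypothetical.

```python
import shutil

def missing_tools(tools=("fastspar", "parallel")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("fastspar and parallel are both on PATH")
```

Run it inside the activated conda environment, since PATH differs between environments.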

Install the latest version from github

To download the latest changes to the repository, use the following commands:

git clone https://github.com/lozuponelab/SCNIC.git
cd SCNIC/
python setup.py install

NOTE: This latest code may not be functional and should only be used if you want to experiment with the development code.

Example usage:

'within' mode:

SCNIC_analysis.py within -i example_table.biom -o within_output/ -m sparcc

'modules' mode:

SCNIC_analysis.py modules -i within_output/correls.txt -o modules_output/ --min_r .35 --table example_table.biom

NOTE: We use a minimum R value of .35 when running SparCC with 16S data rather than filtering on p-values, because determining p-values requires a computationally demanding bootstrapping procedure. We have run SparCC with 1 million bootstraps on a variety of datasets and found that an R value of between .3 and .35 will always return FDR-adjusted p-values less than .05 and .1 respectively.
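For readers unfamiliar with FDR adjustment, the sketch below shows one common procedure, Benjamini-Hochberg. The text above does not say which FDR method was used, so treating it as Benjamini-Hochberg is an assumption; the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

Each raw p-value is scaled by the number of tests divided by its rank, then capped so that adjusted values never decrease as raw p-values increase.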

'between' mode:

SCNIC_analysis.py between -1 example_table1.biom -2 example_table2.biom -o output_folder/ -m spearman --min_p .05
