
preprocess microbiome data

Project description

Preprocessing for 16S values.

The input to the preprocessing consists of detailed, unnormalized OTU/feature values as a biom table, the matching taxonomy as a TSV file, and an optional tag file that gives the class of each sample. The tag file is not used for the preprocessing itself. It is used only to report statistics on the relation between the features and the class, so you can also run the preprocessing without it.

Input

Here is an example of what the input OTU file should look like: (file example)


Parameters to the preprocessing

Now you will have to select the parameters for the preprocessing.

  1. Taxonomy level: taxonomy-sensitive dimension reduction that groups the bacteria at a chosen taxonomy level. All features that share the same taxonomy up to that level are grouped and merged using one of three methods: Average, Sum, or Merge (PCA followed by normalization).
  2. Normalization: after grouping, you can apply one of two normalization methods. The first is a base-10 log scale, in which
    x → log10(x + ε), where ε is a small constant that prevents taking the log of zero.
    The second normalizes each bacterium by its relative frequency.
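
Steps 1 and 2 above can be sketched with pandas as follows. This is a minimal illustration, not MIPMLP's own code, and it rests on assumptions: the data is a samples × features DataFrame whose column names are `;`-separated taxonomy strings (for example `k__Bacteria;p__Firmicutes;...`), level 1 is the kingdom and level 7 the species, and "relative frequency" means dividing each sample by its total. Only the Average and Sum merges are shown; the PCA-based Merge is not. The helper names `group_by_taxonomy`, `log_normalize` and `relative_normalize` are hypothetical.

```python
import numpy as np
import pandas as pd


def group_by_taxonomy(df: pd.DataFrame, level: int, method: str = "mean") -> pd.DataFrame:
    """Merge feature columns that share the same taxonomy up to `level` ranks.

    Hypothetical layout: rows are samples, columns are ';'-separated taxonomy strings.
    """
    if method not in ("mean", "sum"):
        raise ValueError(f"unsupported merge method: {method!r}")
    # Truncate every taxonomy string to its first `level` ranks.
    keys = [";".join(r.strip() for r in col.split(";")[:level]) for col in df.columns]
    # Relabel the columns with the truncated keys, then merge columns with equal keys.
    merged = df.set_axis(keys, axis=1).T.groupby(level=0).agg(method)
    return merged.T


def log_normalize(df: pd.DataFrame, epsilon: float) -> pd.DataFrame:
    """x -> log10(x + epsilon); a positive epsilon avoids log10(0)."""
    return np.log10(df + epsilon)


def relative_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Assumption: divide each sample (row) by its total, so every row sums to 1."""
    return df.div(df.sum(axis=1), axis=0)
```

Under these assumptions, `group_by_taxonomy(counts, level=6, method="sum")` would add up all columns that belong to the same genus.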

If you chose log normalization, you then have four standardization options:
a) No standardization
b) Z-score each sample
c) Z-score each bacterium
d) Z-score each sample, then z-score each bacterium (in this order)
With relative normalization, you can either leave the results unstandardized or standardize only the bacteria.
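
The four standardization options listed above can be sketched as follows, assuming samples are rows and bacteria are columns, so that z-scoring "each sample" works along a row and "each bacterium" down a column. The option strings reuse the `z_scoring` values from the parameter list (`No`, `row`, `col`, `both`), but mapping them onto options a) to d) is our assumption, and `standardize` is a hypothetical helper. It uses pandas' default sample standard deviation (ddof=1), which may differ from what MIPMLP does.

```python
import pandas as pd


def _zscore_rows(df: pd.DataFrame) -> pd.DataFrame:
    # Z-score each sample (row) across its bacteria.
    return df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)


def _zscore_cols(df: pd.DataFrame) -> pd.DataFrame:
    # Z-score each bacterium (column) across the samples.
    return (df - df.mean(axis=0)) / df.std(axis=0)


def standardize(df: pd.DataFrame, z_scoring: str = "No") -> pd.DataFrame:
    """Option a) "No", b) "row", c) "col" or d) "both" (rows first, then columns)."""
    if z_scoring == "No":
        out = df.copy()
    elif z_scoring == "row":
        out = _zscore_rows(df)
    elif z_scoring == "col":
        out = _zscore_cols(df)
    elif z_scoring == "both":
        out = _zscore_cols(_zscore_rows(df))
    else:
        raise ValueError(f"unknown z_scoring option: {z_scoring!r}")
    # A row or column with zero standard deviation yields 0/0 = NaN;
    # this sketch maps such NaNs to 0.
    return out.fillna(0.0)
```

Option d) applies the row step before the column step, matching the order given in the list.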

  3. Dimension reduction: after grouping, normalization and standardization, you can choose between two dimension reduction methods, PCA and ICA. If you apply dimension reduction, you also have to decide how many dimensions to keep.
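
For the dimension reduction step, a minimal PCA sketch based on NumPy's SVD is shown below. It is not necessarily how MIPMLP computes its components, and ICA is not covered (scikit-learn's `FastICA` is one widely used implementation). The function name `pca_reduce` is hypothetical; it expects a samples × features array, so a DataFrame would be passed as `df.to_numpy()`.

```python
import numpy as np


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project samples (rows of x) onto their top `n_components` principal axes."""
    max_components = min(x.shape)
    if not 1 <= n_components <= max_components:
        raise ValueError(f"n_components must be between 1 and {max_components}")
    # Center each feature (column) so the axes describe variance around the mean.
    centered = x - x.mean(axis=0)
    # Economy SVD: centered = U @ diag(s) @ Vt. The rows of Vt are the principal
    # axes, ordered by decreasing singular value (i.e. decreasing variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each sample along the kept axes.
    return centered @ vt[:n_components].T
```

Keeping all components preserves the total variance; keeping fewer discards the directions with the smallest variance first.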

How to use

Call MIPMLP.preprocess(input_df). The parameters are:
  • taxonomy_level: 4-7, default is 7
  • taxnomy_group: sub PCA, mean, sum; default is mean
  • epsilon: 0-1
  • z_scoring: row, col, both, No; default is No
  • pca: (0, 'PCA'); the second element is always 'PCA', the first is 0 or 1
  • normalization: log, relative; default is log
  • norm_after_rel: No, relative; default is No
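
Before calling `MIPMLP.preprocess`, the documented values can be checked with a small helper such as the one below. `check_params` and its messages are hypothetical and not part of MIPMLP; the allowed values and defaults are copied from the list above (including the spelling `taxnomy_group`). Reading "epsilon: 0-1" as 0 < epsilon ≤ 1 is our assumption, and since no default is documented for epsilon, it is only checked when given.

```python
DEFAULTS = {
    "taxonomy_level": 7,
    "taxnomy_group": "mean",  # spelled as in the parameter list
    "z_scoring": "No",
    "pca": (0, "PCA"),
    "normalization": "log",
    "norm_after_rel": "No",
}

ALLOWED = {
    "taxonomy_level": {4, 5, 6, 7},
    "taxnomy_group": {"sub PCA", "mean", "sum"},
    "z_scoring": {"row", "col", "both", "No"},
    "normalization": {"log", "relative"},
    "norm_after_rel": {"No", "relative"},
}


def check_params(**params: object) -> dict:
    """Merge user values over the documented defaults and validate them."""
    unknown = set(params) - set(DEFAULTS) - {"epsilon"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    merged = {**DEFAULTS, **params}
    for name, allowed in ALLOWED.items():
        if merged[name] not in allowed:
            raise ValueError(f"{name}={merged[name]!r} is not one of {sorted(allowed)}")
    # epsilon has no documented default, so it is only validated when supplied.
    if "epsilon" in merged and not 0 < merged["epsilon"] <= 1:
        raise ValueError("epsilon must satisfy 0 < epsilon <= 1")
    first, second = merged["pca"]  # assumes a 2-tuple such as (0, 'PCA')
    if first not in (0, 1) or second != "PCA":
        raise ValueError("pca must be (0, 'PCA') or (1, 'PCA')")
    return merged
```

For example, `check_params(taxonomy_level=6, epsilon=0.1)` returns the defaults with those two values filled in, while `check_params(z_scoring="rows")` raises a `ValueError`.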

Output

The output is the processed file.




Download files


Source Distribution

MIPMLP-1.1.6.tar.gz (9.8 kB)

Uploaded Source

Built Distribution


MIPMLP-1.1.6-py3-none-any.whl (11.5 kB)

Uploaded Python 3

File details

Details for the file MIPMLP-1.1.6.tar.gz.

File metadata

  • Download URL: MIPMLP-1.1.6.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.7

File hashes

Hashes for MIPMLP-1.1.6.tar.gz
Algorithm Hash digest
SHA256 b199aef6161af2d78e17b1fbf49a08bc2416f5f3afb60d785f304dbf46fd4012
MD5 ca4ad56ab425cccbcb4ec565457ad4eb
BLAKE2b-256 d1a8171a705ce696577b457858d646b9c6edd4375c76540e95fb83ae4db01749


File details

Details for the file MIPMLP-1.1.6-py3-none-any.whl.

File metadata

  • Download URL: MIPMLP-1.1.6-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.7

File hashes

Hashes for MIPMLP-1.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 27fd59072c429162fa04dd23dc1b4c102556d70e3c4e442cd9d6734cfbf1e008
MD5 80259a0e91e0d3a5b2079737794a5d3d
BLAKE2b-256 785924d3d7fa5d624fe1c96dbee5bce571ecf6b05e15b0c5f25bc730e37a0fc5

