
Python toolkit for working with metabolomics data

Project description

MetabolomicsToolKit

MetabolomicsToolKit (metabotk) is a Python library for working with metabolomics data. It provides tools and methods for importing, exploring, manipulating, and analyzing metabolomics datasets.

The library is designed to be easy to use, with a simple, unified interface for working with metabolomics data. It is also extensible, so users can add new methods and tools with little effort.

Features

  • Import and export metabolomics data from various sources (e.g., Metabolon) and formats (e.g., Excel files, tab-separated tables)
  • Explore and manipulate metabolomics data
  • Impute missing values
  • Normalize data
  • Perform feature selection and dimensionality reduction
  • Plot data (e.g. metabolite abundance, PCA)

How to install

Install with pip:

pip install metabotk

Dependencies

  • pandas
  • numpy
  • scikit-learn
  • seaborn
  • boruta_py for feature selection with Boruta
  • skloess for normalization with LOESS
  • pyserrf for normalization with SERRF

Why metabotk?

Working with metabolomics data means keeping track of three sources of information:

  • Peak values/abundance data for each metabolite in each sample
  • Sample metadata
  • Chemical annotation of the metabolites

Constantly coordinating these three sources of information is time-consuming and error-prone, especially in exploratory analyses.
Additionally, many analyses are repetitive and benefit from a standardized procedure. For example, building and plotting a PCA, or obtaining statistics about the metabolites and samples, is often done both before and after modifying or removing data; updating the metadata or statistics after every modification can be automated.
With metabotk, the three sources are kept in sync automatically: removing a metabolite or sample from the data also removes it from the chemical annotation or sample metadata.
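To see what this bookkeeping entails when done by hand, here is a minimal pandas sketch with toy data (all names hypothetical): dropping one metabolite or sample forces updates to two tables each.

```python
import pandas as pd

# Toy dataset: abundance matrix (samples x metabolites),
# sample metadata, and chemical annotation share their indices.
data = pd.DataFrame(
    {"met_1": [1.0, 2.0], "met_2": [3.0, 4.0]},
    index=["sample_A", "sample_B"],
)
sample_metadata = pd.DataFrame({"treatment": ["ctrl", "drug"]}, index=data.index)
chemical_annotation = pd.DataFrame(
    {"pathway": ["lipid", "amino acid"]}, index=data.columns
)

# Dropping a metabolite by hand means updating two tables...
data = data.drop(columns=["met_2"])
chemical_annotation = chemical_annotation.loc[data.columns]

# ...and dropping a sample means updating two more.
data = data.drop(index=["sample_B"])
sample_metadata = sample_metadata.loc[data.index]
```

metabotk performs these updates automatically whenever the dataset changes.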

metabotk provides a centralized interface for working with metabolomics data from the first to the last analysis step.

metabotk is organized into the following modules:

  • interface - the main entry point for the end user, exposing the MetaboTK class; all other modules can be accessed from here
  • dataset manager - the main module for manipulating, importing, and saving datasets
  • providers - functions to read data from different providers into metabotk
  • statistics - functions for computing statistics at the metabolite or sample level: mean/std, total sum abundance (TSA), coefficient of variation (CV), correlations, and counts of missing or outlier values
  • dimensionality reduction - functions to perform dimensionality reduction, such as PCA
  • imputation - functions to impute missing data
  • normalization - functions to normalize data
  • models - functions to build models from the data (e.g., linear models) and remove variable effects
  • feature selection - functions to perform feature selection and identify metabolites that discriminate between groups
  • visualization - functions to plot metabolite distributions, PCA, and other visualizations
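As an illustration of the summaries the statistics module provides, the CV, TSA, and missing-value counts can be sketched with plain pandas on a toy matrix (sample standard deviation, NaN-aware by default):

```python
import numpy as np
import pandas as pd

# Toy abundance matrix: rows are samples, columns are metabolites.
data = pd.DataFrame(
    {"met_1": [10.0, 12.0, np.nan], "met_2": [5.0, 5.0, 5.0]},
    index=["s1", "s2", "s3"],
)

# Coefficient of variation (CV) per metabolite: std / mean (NaNs skipped).
cv = data.std() / data.mean()

# Total sum abundance (TSA) per sample: row-wise sum, ignoring missing values.
tsa = data.sum(axis=1)

# Number of missing values per metabolite.
n_missing = data.isna().sum()
```

This is only a sketch of the underlying computations; the actual module bundles them behind `dataset.stats` as shown in the usage section below.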

All modules can be extended with different kinds of analyses and methods; this is a work in progress that aims to provide a baseline which can be tailored to users' needs.

How to use

!!! Documentation is still a work in progress !!!

In-depth documentation about each module can be found here:
https://metabolomics-toolkit.readthedocs.io/en/latest/

Import the library and instantiate the class

from metabotk import MetaboTK

dataset = MetaboTK(data_provider="metabolon", sample_id_column="Sample", metabolite_id_column="CHEM_ID")

Import the data in tabular or Excel format

#TABLES
dataset.import_tables(data="data.tsv", sample_metadata="samples.tsv", chemical_annotation="config/metabolites.tsv")

#EXCEL -> the sheet names for data, sample metadata and chemical annotation must be specified
dataset.import_excel(file_path="dataset.xlsx", sample_metadata="sample_metadata", chemical_annotation="chemical_annotation", data_sheet="peak_data")

Get some statistics about the dataset

#SAMPLE LEVEL STATS
sample_stats = dataset.stats.sample_stats()

#METABOLITE LEVEL STATS
metabolite_stats = dataset.stats.metabolite_stats()

Get a PCA from the data and plot it

pca = dataset.dimensionality_reduction.get_pca(n_components=3)

dataset.visualization.plot_pca(pca=pca, hue='treatment')
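For reference, the projection metabotk wraps corresponds to a standard scikit-learn PCA on the scaled abundance matrix; a minimal sketch with toy data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy abundance matrix: 4 samples x 3 metabolites.
X = np.array(
    [
        [1.0, 2.0, 3.0],
        [2.0, 4.0, 6.5],
        [3.0, 6.0, 9.0],
        [4.0, 8.0, 12.5],
    ]
)

# Standardize each metabolite, then project onto the first two components.
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
print(scores.shape)  # (4, 2)
```

The `hue` argument in the plotting call above refers to a column of the sample metadata (here the hypothetical 'treatment' column) used to color the samples.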

Project details


Download files

Download the file for your platform.

Source Distribution

metabotk-0.1.4.3.tar.gz (25.3 kB)


Built Distribution


metabotk-0.1.4.3-py3-none-any.whl (30.3 kB)


File details

Details for the file metabotk-0.1.4.3.tar.gz.

File metadata

  • Download URL: metabotk-0.1.4.3.tar.gz
  • Size: 25.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Linux/6.8.0-49-generic

File hashes

Hashes for metabotk-0.1.4.3.tar.gz
  • SHA256: d38a40858723295686ca41ef526b8991cea2fc84864dc42b1e93e36277f73890
  • MD5: 03aea8a5cb3e166cc5c7005c1b7a2559
  • BLAKE2b-256: 8c9ee935ecbc5cf378866b853d6f119c3bcd2257438ccf2d65c0bd70c9418f9f


File details

Details for the file metabotk-0.1.4.3-py3-none-any.whl.

File metadata

  • Download URL: metabotk-0.1.4.3-py3-none-any.whl
  • Size: 30.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.2 Linux/6.8.0-49-generic

File hashes

Hashes for metabotk-0.1.4.3-py3-none-any.whl
  • SHA256: 37e4d9092e9647254cdc30a30a1f429c9fcd636ccec29673cb4e0043d36f0a8c
  • MD5: 8ce3c9e6fbce51ad220d7aa99be36262
  • BLAKE2b-256: ef153f93aa941e7f5d696ea757a2aca2396f8f032e4547c16c355e79b65f1035

