[I]nteractive [R]etention [T]ime vi[S]ualization for gas chromatography.

STARLINGrt : [I]nteractive [R]etention [T]ime vi[S]ualization for gas chromatography

STARLINGrt is a tool for analyzing retention times from gas chromatography mass spectrometry (GCMS). It can be used to determine a consensus retention time for each compound by visualizing a collection of results. Compound identification(s) made at a given retention time are assumed to be provided by separate software which analyzes the mass spectrometry data collected at that time. Currently, STARLINGrt is configured to work with the outputs from MassHunter(TM) but is extensible by subclassing "data._SampleBase" (see samples.py for an example). The code uses Bokeh to produce an HTML file which can be modified interactively, saved, exported, and shared easily between different users. The name "starling" was selected as a reverse acronym of the tool's purpose.

Installation

We recommend creating a virtual environment (e.g., a conda environment) and then installing starlingrt with pip:

$ conda create -n starlingrt-env python=3.10
$ conda activate starlingrt-env
$ pip install starlingrt

You can also install from source by cloning the GitHub repository:

$ git clone git@github.com:mahynski/starlingrt.git
$ cd starlingrt
$ conda create -n starlingrt-env python=3.10
$ conda activate starlingrt-env
$ pip install .
$ python -m pytest # Optional: run the unit tests

To install this into a Jupyter kernel:

$ conda activate starlingrt-env
$ python -m ipykernel install --user --name starlingrt-kernel --display-name "starlingrt-kernel"
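
To confirm that the active environment (or Jupyter kernel) can see the package, a check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of starlingrt.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if starlingrt is installed in this environment,
# otherwise prints None.
print(installed_version("starlingrt"))
```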

Use Cases

Imagine you have multiple GCMS output files which have been used to identify chemicals at different retention times, e.g., by matching against a library. These files could correspond to analyses of a range of different mixtures; in principle, an individual component should elute at the same time regardless of what it is combined with. However, natural variations in:

  • the retention times can cause confusion when other compounds coelute or elute at very similar times,
  • the mass spectrometry peak location(s) at a given retention time can cause the identification routine to identify the same compound differently.

Given these uncertainties, we would like to learn things like:

  1. What is a consensus value, or at least a natural range, of retention times for each compound identified?
  2. What compounds elute at similar points and are commonly confused with each other?
  3. Are there any analyses that identify a compound at a retention time far away from its consensus value (data cleaning)?
  4. What is a natural "gap" in retention times that can be used to "ideally" divide all compounds from their "neighbors"?

This visualization tool helps users answer these questions by exploring their data with interactive graphs. The output of this tool is an HTML file that acts as a self-contained summary of your data and how you cleaned or modified it, and it can be easily shared between users.
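
To make the questions above concrete, here is a small standalone sketch that does not use starlingrt at all. It assumes hypothetical (compound, retention time) hits, takes the median as the consensus value, flags hits farther than an arbitrary `tolerance` from that median as outliers, and reports the gaps between neighboring compounds. The data, the function names, and the tolerance are all illustrative assumptions, not the method STARLINGrt itself applies.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (compound, retention time in minutes) hits pooled from several runs.
hits = [
    ("toluene", 5.02), ("toluene", 5.05), ("toluene", 4.98), ("toluene", 7.40),
    ("ethylbenzene", 6.51), ("ethylbenzene", 6.48), ("ethylbenzene", 6.55),
    ("p-xylene", 6.60), ("p-xylene", 6.63), ("p-xylene", 6.58),
]


def summarize(hits, tolerance=0.25):
    """Return {compound: (consensus, kept_times, outlier_times)}.

    The consensus is the median retention time; hits farther than
    `tolerance` (an assumed, arbitrary cutoff) from it count as outliers.
    """
    by_compound = defaultdict(list)
    for name, rt in hits:
        by_compound[name].append(rt)

    summary = {}
    for name, times in by_compound.items():
        consensus = median(times)
        kept = sorted(t for t in times if abs(t - consensus) <= tolerance)
        outliers = sorted(t for t in times if abs(t - consensus) > tolerance)
        summary[name] = (consensus, kept, outliers)
    return summary


def neighbor_gaps(summary):
    """Return (earlier, later, gap) for compounds adjacent in retention time.

    The gap is measured between the cleaned ranges: the earliest kept time
    of the later compound minus the latest kept time of the earlier one.
    """
    ranges = sorted(
        (min(kept), max(kept), name)
        for name, (consensus, kept, outliers) in summary.items()
        if kept  # skip compounds whose hits were all rejected
    )
    return [
        (a[2], b[2], round(b[0] - a[1], 3))
        for a, b in zip(ranges, ranges[1:])
    ]


summary = summarize(hits)
gaps = neighbor_gaps(summary)

for name, (consensus, kept, outliers) in summary.items():
    print(name, "consensus:", round(consensus, 3),
          "range:", (kept[0], kept[-1]), "outliers:", outliers)
for earlier, later, gap in gaps:
    print(earlier, "->", later, "gap:", gap)
```

In this toy data the toluene hit at 7.40 is flagged as an outlier (question 3), and the small gap between ethylbenzene and p-xylene marks a pair that is likely to be confused (questions 2 and 4). The interactive plots serve the same purpose, but let you judge the cutoffs visually instead of fixing them in advance.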

Example

Here is a simple example (see docs/_static/example.py):

import os
import starlingrt

from starlingrt import sample, data, functions, visualize

def load_mass_hunter(input_directory):
    """
    Load MassHunter samples from a directory.

    Parameters
    ----------
    input_directory : str
        Directory in which to search for raw folders.

    Returns
    -------
    samples : list(sample.MassHunterSample)
        List of Samples collected from all directories in `input_directory`.
    """
    ...
    return samples

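# Load the samples, build entries from them, and keep the top entries.
samples = load_mass_hunter("path/to/data/")
top_entries = starlingrt.data.Utilities.select_top_entries(
    starlingrt.data.Utilities.create_entries(samples)
)

# Estimate the threshold from the first item returned by get_dataframe().
df = starlingrt.functions.get_dataframe(top_entries)[0]
threshold = starlingrt.functions.estimate_threshold(df)

# Write the interactive summary to an HTML file.
starlingrt.visualize.make(
    top_entries=top_entries,
    width=1200,
    threshold=threshold,
    output_filename='summary.html',
)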

Documentation

Documentation is hosted at https://starlingrt.readthedocs.io/ via readthedocs.

The logo was generated using Google Gemini with the prompt "Design a logo involving a starling and gas chromatography" on Nov. 9, 2024.

Contributors

This code was developed during a collaboration with:
