Breath analysis in python



A python library for breath gas biomarker profiling


BreathPy requires Python >=3.6 but does not yet support Python 3.9, because several dependencies are not yet available for it. It is available through pip. Make sure to activate your local virtual environment or use Anaconda. Rendering decision trees requires the graphviz executable. Either install BreathPy into your current environment using pip install breathpy, or create and activate a new Anaconda environment "breath" and install breathpy and graphviz:

conda create --name breath python=3.8 pip graphviz -y
conda activate breath
pip install breathpy

If you want to use the tutorial Jupyter notebooks, you also need to install Jupyter: conda install jupyter.
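The Python version range and the graphviz requirement stated above can be checked before installing. The following is a minimal sketch using only the standard library. The helper names python_version_supported and graphviz_available are hypothetical and not part of BreathPy, and the sketch assumes that the graphviz executable is the dot command.

```python
import shutil
import sys


def python_version_supported(version_info=None):
    """Return True for Python >=3.6 and <3.9, the range stated above."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return (3, 6) <= (major, minor) < (3, 9)


def graphviz_available(executable="dot"):
    """Return True if the graphviz executable is found on PATH.

    Assumption: graphviz is invoked through its `dot` command.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    print("graphviz found:", graphviz_available())
```

If either check fails, creating the dedicated conda environment shown above is the simpler route, since it pins Python 3.8 and installs graphviz in one step.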


First, prepare the example dataset by creating a subdirectory data, then downloading and extracting the example files into it.

from pathlib import Path
from urllib.request import urlretrieve
from zipfile import ZipFile

# download example zip-archive
url = ''
zip_dst = Path("data/")
dst_dir = Path("data/small_candy_anon/")
dst_dir.mkdir(parents=True, exist_ok=True)
urlretrieve(url, zip_dst)

# unzip archive into data subdirectory
with ZipFile(zip_dst, "r") as archive_handle:
    archive_handle.extractall(dst_dir)
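
The download-and-extract step can also be wrapped in a small reusable helper. This is a sketch under assumptions: the function name fetch_and_extract is hypothetical and not part of BreathPy, the archive URL is left to the caller because it is not given here, and an archive that already exists locally is reused instead of being downloaded again.

```python
from pathlib import Path
from urllib.request import urlretrieve
from zipfile import ZipFile


def fetch_and_extract(url, zip_dst, dst_dir):
    """Download a zip archive (unless already present) and extract it.

    Hypothetical helper, not part of BreathPy.
    Returns the sorted names of the archive members.
    """
    zip_dst = Path(zip_dst)
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    zip_dst.parent.mkdir(parents=True, exist_ok=True)

    # skip the download if the archive already exists locally
    if not zip_dst.is_file():
        urlretrieve(url, zip_dst)

    # extract all members into the destination directory
    with ZipFile(zip_dst, "r") as archive_handle:
        archive_handle.extractall(dst_dir)
        return sorted(archive_handle.namelist())
```

Reusing an existing archive avoids repeated downloads when the example is run several times.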

Then run the example analysis like so:

# import required functions
from breathpy.model.BreathCore import construct_default_parameters, construct_default_processing_evaluation_steps
from breathpy.model.CoreTest import run_start_to_end_pipeline

# define file prefix and default parameters
file_prefix = folder_name = 'small_candy_anon'

# assuming the data directory is in the current directory
plot_parameters, file_parameters = construct_default_parameters(file_prefix, folder_name, make_plots=True)

# create default parameters for preprocessing and evaluation
preprocessing_steps, evaluation_params_dict = construct_default_processing_evaluation_steps()

# run the full pipeline from start to end
run_start_to_end_pipeline(plot_parameters, file_parameters, preprocessing_steps, evaluation_params_dict)

For more complete examples, see CoreTest.run_start_to_end_pipeline and CoreTest.run_resume_analysis. Example data is available at

Usage GC-MS

BreathPy now has experimental support for GC/MS and LC/MS data through pyOpenMS.

Download and extract the example datasets into the current data subdirectory:

# handle imports
from urllib.request import urlretrieve
from pathlib import Path
from zipfile import ZipFile

# download and extract data into data/algae directory
url = ''
zip_dst = Path("data/")
dst_dir = Path("data/algae/")
dst_dir.mkdir(parents=True, exist_ok=True)
urlretrieve(url, zip_dst)

# unzip archive into data subdirectory
with ZipFile(zip_dst, "r") as archive_handle:
    archive_handle.extractall(dst_dir)

import os
from pathlib import Path
from breathpy.model.BreathCore import construct_default_parameters, construct_default_processing_evaluation_steps
from breathpy.model.ProcessingMethods import GCMSPeakDetectionMethod, PerformanceMeasure
from breathpy.model.GCMSTest import run_gcms_platform_multicore
from breathpy.generate_sample_data import generate_train_test_set_helper

This example runs the analysis of the algae sample set (Sun M, Yang Z and Wawrik B (2018) Metabolomic Fingerprints of Individual Algal Cells Using the Single-Probe Mass Spectrometry Technique. Front. Plant Sci. 9:571. doi: 10.3389/fpls.2018.00571).

The set contains 19 samples from four conditions: light, dark, nitrogen-limited and replete (post nitrogen-limited). The samples originate from single-probe mass spectrometry files; the example imports featureXML files created from them.

The code below expects the parameter cross_val_num to be defined beforehand.

# or use your local path to a dataset here
source_dir = Path("data/algae")
target_dir = Path("data")

# will delete previous split and rewrite data
train_df, test_df = generate_train_test_set_helper(source_dir, target_dir, cross_val_num=cross_val_num)
train_dir = Path(target_dir)/"train_algae"

# prepare analysis
set_name = "train_algae"
make_plots = True

# generate parameters
plot_parameters, file_parameters = construct_default_parameters(set_name, set_name, make_plots=make_plots)
preprocessing_params_dict = {GCMSPeakDetectionMethod.ISOTOPEWAVELET: {"hr_data": True}}
_, evaluation_params_dict = construct_default_processing_evaluation_steps(cross_val_num)

# running the full analysis takes less than 30 minutes of computation time using 6 cores - in this example most if not all computations are single core though
		evaluation_parms=evaluation_params_dict, num_cores=6)

Also see model/ for reference.
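
The example above splits the data into a train and a test set and passes cross_val_num on to the evaluation. As an illustration of the general idea only, and not of how generate_train_test_set_helper works internally, the sketch below assigns labelled samples to cross-validation folds so that each class is spread over all folds. It assumes that cross_val_num denotes the number of folds. The function name stratified_folds and the per-condition sample counts are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign each sample index to one of n_folds folds, class by class.

    labels: list of class labels, one per sample.
    Returns a list of n_folds lists of sample indices.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    position = 0
    # deal the samples of each class round-robin over the folds
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % n_folds].append(index)
            position += 1
    return folds


# hypothetical labels: 19 samples from four conditions
labels = ["light"] * 5 + ["dark"] * 5 + ["limited"] * 5 + ["replete"] * 4
folds = stratified_folds(labels, n_folds=3)
for fold_number, test_indices in enumerate(folds):
    print(fold_number, len(test_indices))
```

With a sample set this small, the number of folds is limited by the smallest class: every fold needs at least one sample per condition for the evaluation to cover all conditions.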


BreathPy is licensed under GPLv3 but contains binaries for PEAX, which is free software for academic use only. See

A modular computational framework for automated peak extraction from ion mobility spectra, 2014, D’Addario et al.


If you run into difficulties using BreathPy, please open an issue at our GitHub repository. Alternatively you can write an email to Philipp Weber.

