
WT2 Mass spectrometry data processing library

Project description

1. Installation

WT_2 requires Python 3.10. We recommend creating a virtual environment using conda:

conda create --name WT2 python=3.10
conda activate WT2
pip install WT_2
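After installation, a quick check like the one below can confirm the interpreter version and whether the package is importable. This is a small convenience sketch, not part of WT_2. It only assumes the import name WT_2 used throughout this guide. The README asks for Python 3.10 and does not say whether newer versions also work, so the check reports that fact rather than enforcing it.

```python
import importlib.util
import sys


def check_environment():
    """Report the Python version and whether the WT_2 package can be found."""
    version = sys.version_info[:2]
    return {
        "python_version": f"{version[0]}.{version[1]}",
        # The README targets Python 3.10; other versions are untested here.
        "is_python_3_10": version == (3, 10),
        # find_spec returns None when a top-level package is not installed.
        "wt2_installed": importlib.util.find_spec("WT_2") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```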

2. Usage Guide

2.1 Data Preparation

  • Each sample requires 10 raw MGF files and MS-DIAL processing results.
  • Place all MGF files and MS-DIAL results for the same sample into a folder named after the sample.
  • Multiple samples should be organized in separate folders.
  • Refer to the sample1 folder in test_data as an example.
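The folder layout described above can be checked before processing starts. The hypothetical helper below walks a root folder and reports, for each sample sub-folder, whether it holds 10 MGF files and at least one MS-DIAL result. It assumes MGF files end in .mgf and MS-DIAL results are stored as CSV files (as in the deduplication example in 2.3). Point it at a folder that contains only sample folders, because other sub-folders such as test_data/models would be reported as failing.

```python
from pathlib import Path


def check_sample_folders(root, expected_mgf=10):
    """Return {sample_name: [problems]}; an empty list means the folder passed.

    Assumptions: MGF files use the .mgf suffix and MS-DIAL results are .csv
    files; both suffix checks are case-insensitive.
    """
    report = {}
    for sample_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = [f for f in sample_dir.iterdir() if f.is_file()]
        n_mgf = sum(1 for f in files if f.suffix.lower() == ".mgf")
        n_csv = sum(1 for f in files if f.suffix.lower() == ".csv")
        problems = []
        if n_mgf != expected_mgf:
            problems.append(f"expected {expected_mgf} MGF files, found {n_mgf}")
        if n_csv == 0:
            problems.append("no MS-DIAL result (.csv) found")
        report[sample_dir.name] = problems
    return report
```

For example, check_sample_folders("./my_samples") returns one entry per sample folder, so a batch run can skip or fix folders with a non-empty problem list.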

2.2 Peak Processing

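from WT_2 import MultiprocessingManager

# Folder holding this sample's MGF files and MS-DIAL results (see 2.1)
sample_folder = "./test_data/sample1"
# A "result" folder is created inside out_dir for the outputs
out_dir = "./test_data/sample1"

manager = MultiprocessingManager(
    outer_max_workers=1,   # processes working on MGF files
    inner_max_workers=8,   # processes working on m/z values
    mgf_folder=sample_folder,
    out_dir=out_dir,
)
manager.process_mgf_files()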
  • process_mgf_files() automatically creates a result folder in out_dir to store the peak extraction and clustering results.
  • Parameters:
    • mgf_folder: Sample directory path (sample_folder in the example above).
    • out_dir: Output directory path.
    • outer_max_workers: Number of outer processes for MGF file processing (default: 1).
    • inner_max_workers: Number of inner processes for m/z processing (default: 8).
    • RT_start: Left boundary of retention time (RT) range (seconds).
    • RT_end: Right boundary of RT range (seconds).
    • fp_wid: Peak detection window width (default: 6).
    • fp_sigma: Peak detection sigma (default: 2).
    • fp_min_noise: Noise threshold for peak detection (default: 200).
    • group_wid: Peak clustering window width (default: 6).
    • group_sigma: Peak clustering sigma (default: 0.5).
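The README does not document how fp_wid, fp_sigma and fp_min_noise are used internally. As a rough illustration only, one common approach to chromatographic peak detection is Gaussian smoothing followed by local-maximum picking above a noise floor. The sketch below assumes the window width is read as ±width // 2 scans, sigma is expressed in scans, and the RT bounds are in seconds. It is not WT_2's implementation, and the function names are hypothetical.

```python
import math


def smooth(intensities, width=6, sigma=2.0):
    """Gaussian-smooth an intensity trace.

    Assumption: `width` is a window of +/- width // 2 scans and `sigma` is in
    scans. The window is truncated (and renormalised) at the trace edges.
    """
    radius = width // 2
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(intensities)
    smoothed = []
    for i in range(n):
        acc = norm = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:
                acc += w * intensities[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def find_peaks(rts, intensities, width=6, sigma=2.0, min_noise=200.0,
               rt_start=None, rt_end=None):
    """Return (rt, smoothed intensity) for each local maximum above min_noise.

    width, sigma, min_noise, rt_start and rt_end stand in for fp_wid,
    fp_sigma, fp_min_noise, RT_start and RT_end; the mapping is illustrative.
    """
    smoothed = smooth(intensities, width, sigma)
    peaks = []
    for i in range(1, len(smoothed) - 1):
        rt = rts[i]
        if (rt_start is not None and rt < rt_start) or (rt_end is not None and rt > rt_end):
            continue
        # ">" on the left and ">=" on the right reports a flat top once, at its first point
        if smoothed[i] > smoothed[i - 1] and smoothed[i] >= smoothed[i + 1] and smoothed[i] >= min_noise:
            peaks.append((rt, smoothed[i]))
    return peaks
```

Raising the noise threshold removes small maxima, while widening the window or increasing sigma merges nearby maxima into one. These are the trade-offs such parameters typically control.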

2.3 Peak Deduplication

import os

from WT_2 import Deduplicator

# sample_folder and out_dir are defined in step 2.2
sample_name = os.path.basename(sample_folder)
msdial_path = "./test_data/sample1/CRL_SIF_1_Q1_peak_df.csv"

deduplicator = Deduplicator(
    peak_result_dir=os.path.join(out_dir, "result"),
    msdial_out_path=msdial_path,
    sample_name=sample_name,
    useHrMs1=False,
    HrMs1model_path=None,
)
deduplicator.remove_msdial_duplicate()
peak_outpath, group_outpath = deduplicator.filter_p3_group()
  • Parameters:
    • peak_result_dir: Directory containing peak extraction results (default: result folder in out_dir).
    • msdial_out_path: Path to the MS-DIAL results file.
    • sample_name: Sample name (default: the sample folder name).
    • useHrMs1: Whether to use high-resolution MS1 data (True for high-resolution, False for low-resolution; default: False).
    • HrMs1model_path: Path to the high-resolution MS1 prediction model. If None, the pretrained model is downloaded automatically. The pretrained model is also available at test_data/models/HrMs1.pth and on Hugging Face.
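The deduplication example hard-codes the MS-DIAL file name. When several sample folders are processed (see 2.1), a small helper can build the path-related Deduplicator arguments for each folder. The helper below, locate_dedup_inputs, is hypothetical. It assumes the MS-DIAL result is the only file at the top level of the sample folder matching *_peak_df.csv, a pattern inferred solely from the example file name CRL_SIF_1_Q1_peak_df.csv.

```python
import os
from pathlib import Path


def locate_dedup_inputs(sample_folder, out_dir, pattern="*_peak_df.csv"):
    """Build the path-related Deduplicator arguments for one sample folder.

    Hypothetical helper. `pattern` is a guess based on the example file name;
    change it to match your MS-DIAL export naming.
    """
    folder = Path(sample_folder)
    # Path.glob with a plain pattern is non-recursive: only the top level is searched
    matches = sorted(folder.glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one file matching {pattern!r} in {folder}, found {len(matches)}"
        )
    return {
        "peak_result_dir": os.path.join(out_dir, "result"),
        "msdial_out_path": str(matches[0]),
        # Path.name also copes with a trailing slash, unlike os.path.basename
        "sample_name": folder.name,
    }
```

The returned dictionary can be unpacked into the constructor shown above, for example Deduplicator(**locate_dedup_inputs(sample_folder, out_dir), useHrMs1=False, HrMs1model_path=None).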

2.4 Metabolite Identification

import os

import pandas as pd
from WT_2 import MspGenerator, MspFileLibraryMatcher

# Generate an MSP file from the deduplicated P3 group results
# (group_outpath and sample_name come from step 2.3)
df = pd.read_csv(group_outpath)
out_msp_path = os.path.join(os.path.dirname(group_outpath), sample_name + ".msp")
msp_generator = MspGenerator(df, out_msp_path, useHrMs1=True)
msp_generator.generate()

# Library matching (requires an MSP-format library)
out_match_path = os.path.join(os.path.dirname(group_outpath), sample_name + "_match_library_out.csv")

library_matcher = MspFileLibraryMatcher(
    query_msp_path=out_msp_path,
    library_msp_path="./test_data/library_msp",
    out_path=out_match_path,
)
library_matcher.match()
  • Parameters:
    • query_msp_path: MSP file generated from P3 group results (standard MSP format).
    • library_msp_path: Path to reference MSP library.
    • out_path: Output path for matching results.
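The README does not say how MspFileLibraryMatcher scores query spectra against the library. For orientation, the sketch below parses simple MSP text and ranks library entries by cosine similarity after greedy one-to-one peak matching within an m/z tolerance. It assumes one "Key: value" pair per header line, one "m/z intensity" pair per peak line, and blank lines between records. Cosine scoring is a common choice in spectral library search, used here only as an illustration; WT_2 may score differently, and tol=0.01 is an arbitrary example value.

```python
import math


def parse_msp(text):
    """Parse MSP text into [{"meta": {...}, "peaks": [(mz, intensity), ...]}, ...].

    Assumes one "Key: value" per header line and one "m/z intensity" pair per
    peak line; semicolon-separated peak lists are not handled.
    """
    records, meta, peaks = [], {}, []
    for raw in text.splitlines() + [""]:  # the extra "" flushes the last record
        line = raw.strip()
        if not line:
            if meta or peaks:
                records.append({"meta": meta, "peaks": peaks})
                meta, peaks = {}, []
        elif ":" in line and not line[0].isdigit():
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
        else:
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
    return records


def cosine_score(query, library, tol=0.01):
    """Cosine similarity after greedy one-to-one peak matching within +/- tol (m/z units)."""
    used, dot = set(), 0.0
    for mz_q, int_q in query:
        best = None  # (m/z difference, library index)
        for j, (mz_l, _) in enumerate(library):
            diff = abs(mz_q - mz_l)
            if j not in used and diff <= tol and (best is None or diff < best[0]):
                best = (diff, j)
        if best is not None:
            used.add(best[1])
            dot += int_q * library[best[1]][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_l = math.sqrt(sum(i * i for _, i in library))
    return dot / (norm_q * norm_l) if norm_q and norm_l else 0.0


def best_match(query_record, library_records, tol=0.01):
    """Return (score, library Name) of the highest-scoring library record."""
    scored = [
        (cosine_score(query_record["peaks"], rec["peaks"], tol), rec["meta"].get("Name", ""))
        for rec in library_records
    ]
    return max(scored) if scored else (0.0, "")
```

A score of 1.0 means the matched intensities are proportional. Unmatched peaks still count in the norms, so they pull the score down.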

2.5 Metabolite Quantification

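from WT_2 import SampleQuantity

# Folder containing the deduplicated P3 results of all samples
quantity_folder = "./test_data/quantity_prepared"

quantifier = SampleQuantity(
    quantity_folder=quantity_folder,
    quantity_out_path=quantity_folder,
    ref_file=None,                   # None: the first sample is used as the reference
    useHrMs1=True,
    uesSampleAligmentmodel=True,     # keep this spelling; it is the keyword used in this guide
    SampleAligmentmodel_path=None,   # None: the pretrained model is downloaded
)
quantifier.quantity_processor()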
  • Parameters:
    • quantity_folder: Directory containing deduplicated P3 results for all samples.
    • quantity_out_path: Output directory.
    • ref_file: Reference sample file (if None, the first sample is used as the reference).
    • useHrMs1: Whether to use high-resolution MS1 data (requires that the earlier steps were run on high-resolution data).
    • uesSampleAligmentmodel: Whether to enable the sample alignment model (note the spelling of the keyword).
    • SampleAligmentmodel_path: Path to the sample alignment model. If None, the pretrained model is downloaded automatically. The pretrained model is also available at test_data/models/samplealigment.pth and on Hugging Face.
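SampleQuantity reads the deduplicated P3 results of all samples from a single folder. When each sample was processed in its own folder, those files first need to be collected in one place. The hypothetical helper below copies a list of result files into quantity_folder and stops if two files share a name, since one would overwrite the other. This guide does not state which of the two paths returned by filter_p3_group() the quantification step expects, so the choice of files is left to the caller.

```python
import shutil
from pathlib import Path


def gather_for_quantification(result_paths, quantity_folder):
    """Copy per-sample result files into one folder for SampleQuantity.

    Hypothetical helper: the caller decides which result files to pass.
    Returns the list of destination paths.
    """
    names = [Path(p).name for p in result_paths]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        # Copying both would silently overwrite one sample's results.
        raise ValueError(f"duplicate file names: {duplicates}")
    dest_dir = Path(quantity_folder)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src, name in zip(result_paths, names):
        dest = dest_dir / name
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```

For example, after running step 2.3 for every sample, the collected result paths can be passed together with "./test_data/quantity_prepared" before creating the SampleQuantity object.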

Download files

Source Distribution

wt_2-0.0.1.tar.gz (31.6 kB)

Uploaded Source

Built Distribution

wt_2-0.0.1-py3-none-any.whl (36.5 kB)

Uploaded Python 3

File details

Details for the file wt_2-0.0.1.tar.gz.

File metadata

  • Download URL: wt_2-0.0.1.tar.gz
  • Size: 31.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for wt_2-0.0.1.tar.gz
Algorithm     Hash digest
SHA256        b468435256db9faaef89085fc8b13d082f4d919033670d2357aeb1f3081af1e2
MD5           84680b29e8fb1f3f073856b630a946a9
BLAKE2b-256   b0824a516eb147ff71590989af1ef5f2788a1724eecfdc1f0999bb54257fbc32


File details

Details for the file wt_2-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: wt_2-0.0.1-py3-none-any.whl
  • Size: 36.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for wt_2-0.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        782a835523f17b98eaa48efbfe19a1886c37ebb3385afa30bb363e6b59338661
MD5           056b8e5ed1f6f8b07da11e23f38c857e
BLAKE2b-256   1c61d102908512e657f3a2d5abc91e528d13f5902d02dc4d2860a8af9309dde7
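The standard hashlib module is enough to check a downloaded distribution file against the SHA256 digests listed above. The sketch reads the file in chunks, so it does not need to hold the whole file in memory. The helper names are illustrative, and the expected digests are copied from the hash tables for the two files of version 0.0.1.

```python
import hashlib
import os

# SHA256 digests copied from the "File hashes" tables for wt_2 0.0.1.
EXPECTED_SHA256 = {
    "wt_2-0.0.1.tar.gz": "b468435256db9faaef89085fc8b13d082f4d919033670d2357aeb1f3081af1e2",
    "wt_2-0.0.1-py3-none-any.whl": "782a835523f17b98eaa48efbfe19a1886c37ebb3385afa30bb363e6b59338661",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file name is listed above and its SHA256 digest matches."""
    expected = EXPECTED_SHA256.get(os.path.basename(path))
    return expected is not None and sha256_of(path) == expected
```

For example, verify("wt_2-0.0.1-py3-none-any.whl") run in the download directory should return True for an intact wheel.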

