WT2 Mass spectrometry data processing library

Project description

1. Installation

WT_2 requires Python 3.10. We recommend creating a virtual environment using conda:

conda create --name WT2 python=3.10
conda activate WT2
pip install WT_2

2. Usage Guide

2.1 Data Preparation

  • Each sample requires 10 raw MGF files and MS-DIAL processing results.
  • Place all MGF files and MS-DIAL results for the same sample into a folder named after the sample.
  • Multiple samples should be organized in separate folders.
  • Refer to the sample1 folder in test_data as an example.
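The layout above can be sanity-checked before launching a long processing run. A minimal sketch using only the standard library; the .mgf/.csv naming is an assumption here, so adapt it to your MS-DIAL export settings:

```python
from pathlib import Path

def check_sample_folder(folder, n_mgf=10):
    """Report the MGF and MS-DIAL result files found in one sample folder.

    Assumes raw spectra end in .mgf and MS-DIAL results in .csv
    (hypothetical naming; adjust to your export settings).
    """
    folder = Path(folder)
    mgf = sorted(folder.glob("*.mgf"))
    csv = sorted(folder.glob("*.csv"))
    ok = len(mgf) == n_mgf and len(csv) >= 1
    return ok, mgf, csv
```

For example, `check_sample_folder("./test_data/sample1")` should return ok=True for the bundled example data.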

2.2 Peak Processing

from WT_2 import MultiprocessingManager

sample_folder = "./test_data/sample1" 
out_dir = "./test_data/sample1" 

manager = MultiprocessingManager(
    outer_max_workers=1,
    inner_max_workers=8,
    mgf_folder=sample_folder,
    out_dir=out_dir,
)
manager.process_mgf_files()
  • process_mgf_files() automatically creates a result folder in out_dir to store peak extraction and clustering results.
  • Parameters:
    • mgf_folder: Sample directory path.
    • out_dir: Output directory path.
    • outer_max_workers: Number of outer processes for MGF file processing (default: 1).
    • inner_max_workers: Number of inner processes for m/z processing (default: 8).
    • RT_start: Left boundary of the retention time (RT) range, in seconds.
    • RT_end: Right boundary of the RT range, in seconds.
    • fp_wid: Peak detection window width (default: 6).
    • fp_sigma: Peak detection sigma (default: 2).
    • fp_min_noise: Noise threshold for peak detection (default: 200).
    • group_wid: Peak clustering window width (default: 6).
    • group_sigma: Peak clustering sigma (default: 0.5).
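Note that RT_start and RT_end are given in seconds, while some acquisition software reports retention times in minutes; convert before passing them. A trivial helper (hypothetical, not part of WT_2):

```python
def rt_minutes_to_seconds(rt_min: float) -> float:
    """Convert a retention time from minutes to the seconds WT_2 expects."""
    return rt_min * 60.0

# e.g. an RT window of 0.5-10 min becomes RT_start=30, RT_end=600
```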

2.3 Peak Deduplication

import os

from WT_2 import Deduplicator

sample_name = os.path.basename(sample_folder)
msdial_path = "./test_data/sample1/sample1_Q1_peak_df.csv"


deduplicator = Deduplicator(
    peak_result_dir=os.path.join(out_dir, "result"),
    msdial_out_path=msdial_path,
    sample_name=sample_name,
    useHrMs1=True,
    HrMs1model_path=None
)
deduplicator.remove_msdial_duplicate()
peak_outpath, group_outpath = deduplicator.filter_p3_group()
  • Parameters:
    • peak_result_dir: Directory containing peak extraction results (default: result folder in out_dir).
    • msdial_out_path: Path to MS-DIAL input file.
    • sample_name: Sample name (default: folder name).
    • useHrMs1: Whether to use high-resolution MS1 model (True for high-res, False for low-res, default: False).
    • HrMs1model_path: Path to the high-resolution MS1 prediction model. If None, the pretrained model is downloaded automatically. The pretrained model is available at test_data/models/HrMs1.pth or on Hugging Face.
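Conceptually, deduplication collapses entries that fall within an m/z and RT tolerance of each other. The library's actual algorithm (and its use of the MS1 model) is more involved; the following is only a simplified illustration of tolerance-based deduplication:

```python
def dedup_by_mz_rt(peaks, mz_tol=0.01, rt_tol=5.0):
    """Keep one peak per (m/z, RT) neighborhood.

    peaks: list of (mz, rt, intensity) tuples. The most intense peak
    in each neighborhood is retained. Simplified sketch only; not
    WT_2's actual deduplication logic.
    """
    kept = []
    # Visit peaks from most to least intense so the strongest survives.
    for mz, rt, inten in sorted(peaks, key=lambda p: -p[2]):
        if all(abs(mz - kmz) > mz_tol or abs(rt - krt) > rt_tol
               for kmz, krt, _ in kept):
            kept.append((mz, rt, inten))
    return kept
```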

2.4 Metabolite Identification

import os

from WT_2 import MspGenerator, MspFileLibraryMatcher
import pandas as pd

# Generate MSP file from deduplicated P3 group results
df = pd.read_csv(group_outpath)
out_msp_path = os.path.join(os.path.dirname(group_outpath), sample_name + ".msp")
msp_generator = MspGenerator(df, out_msp_path, useHrMs1=False)


# Library matching (requires MSP-format library)
out_match_path = os.path.join(os.path.dirname(group_outpath), sample_name + "_match_library_out.csv")

library_matcher = MspFileLibraryMatcher(
    query_msp_path=out_msp_path,
    library_msp_path="./test_data/library_msp",
    out_path=out_match_path,
    num=1
)
library_matcher.calculateCosineBoth()
  • MspGenerator Parameters:

    • df: Deduplicated P3 group dataframe.
    • out_msp_path: Output path for MSP file.
    • useHrMs1: Whether to use high-resolution MS1 data (True for high-res, False for low-res; default: False). If set to True, make sure the high-resolution MS1 model was also used during the Peak Deduplication step.
  • MspFileLibraryMatcher Parameters:

    • query_msp_path: MSP file generated from P3 group results (standard MSP format).
    • library_msp_path: Path to reference MSP library.
    • out_path: Output path for matching results.
    • num: Number of top matches to keep (default: 1).
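calculateCosineBoth() scores query spectra against the library by cosine similarity. A simplified, binned version of spectral cosine similarity is sketched below (an illustration of the general technique, not the library's exact scoring):

```python
import math

def spectral_cosine(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity between two spectra given as {mz: intensity} dicts.

    Peaks are binned by m/z so near-identical masses compare as the
    same fragment. Simplified sketch, not WT_2's implementation.
    """
    def binned(spec):
        out = {}
        for mz, inten in spec.items():
            key = round(mz / bin_width)
            out[key] = out.get(key, 0.0) + inten
        return out

    a, b = binned(spec_a), binned(spec_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Identical spectra score 1.0; spectra with no shared fragments score 0.0.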

2.5 Metabolite Quantification

from WT_2 import SampleQuantity


quantity_folder = "./test_data/quantity_prepared"


quantifier = SampleQuantity(
    quantity_folder=quantity_folder,
    quantity_out_path=quantity_folder,
    ref_file=None,
    useHrMs1=True,
    uesSampleAligmentmodel=True,
    SampleAligmentmodel_path=None
)
quantifier.quantity_processor()
  • Parameters:
    • quantity_folder: Directory containing deduplicated P3 results for all samples.
    • quantity_out_path: Output directory.
    • ref_file: Reference sample file (uses first sample as reference if None).
    • useHrMs1: Whether to use high-resolution MS1 data (requires prior high-res processing).
    • uesSampleAligmentmodel: Enable sample alignment model.
    • SampleAligmentmodel_path: Path to the sample alignment model. If None, the pretrained model is downloaded automatically. The pretrained model is available at test_data/models/samplealigment.pth or on Hugging Face.
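SampleQuantity expects the deduplicated P3 results of every sample gathered in one folder. A minimal staging sketch using only the standard library (the result paths here stand in for whatever filter_p3_group() produced for each sample; this helper is not part of WT_2):

```python
import shutil
from pathlib import Path

def stage_for_quantification(result_paths, quantity_folder):
    """Copy each sample's deduplicated P3 result file into one folder.

    result_paths: file paths returned by filter_p3_group() for each
    sample, treated here as opaque paths. Sketch only.
    """
    dest = Path(quantity_folder)
    dest.mkdir(parents=True, exist_ok=True)
    for p in result_paths:
        shutil.copy2(p, dest / Path(p).name)
    return sorted(dest.iterdir())
```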
