
Extract MS2, MGF, and mzML files from Bruker timsTOF .d folders


tdfextractor

A Python package that extracts spectra from Bruker timsTOF .d folders and converts them to standard formats (MS2, MGF, and mzML).

Installation

pip install tdfextractor

Usage

tdfextractor provides three command-line tools for extracting spectra, each with a full name and a shorthand alias:

MS2 Extraction

Extract MS2 format files (compatible with MS-GF+, Comet, etc.):

ms2-extractor /path/to/sample.d

# shorthand
ms2-ex /path/to/sample.d
ms2-ex /path/to/sample.d --output custom_output.ms2 --min-spectra-intensity 100 --min-precursor-charge 2
ms2-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory

MGF Extraction

Extract MGF format files:

mgf-extractor /path/to/sample.d

# shorthand
mgf-ex /path/to/sample.d
mgf-ex /path/to/sample.d --casanovo  # Optimized for Casanovo de novo sequencing
mgf-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
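
For readers unfamiliar with MGF, the following is a minimal sketch of what one spectrum entry looks like in that format. It is not tdfextractor's writer: the function name `format_mgf_entry`, the choice of header fields, and the assumption of a positive precursor charge are all illustrative. The two precision parameters mirror the `--mz-precision` and `--intensity-precision` defaults listed in the argument table.

```python
def format_mgf_entry(title, precursor_mz, charge, rt_seconds, peaks,
                     mz_precision=5, intensity_precision=0):
    """Render one spectrum as an MGF 'BEGIN IONS' ... 'END IONS' block.

    Hypothetical helper for illustration only; assumes a positive charge.
    `peaks` is an iterable of (mz, intensity) pairs.
    """
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mz:.{mz_precision}f}",
        f"CHARGE={charge}+",
        f"RTINSECONDS={rt_seconds:.2f}",
    ]
    # One "m/z intensity" pair per line, rounded to the requested precision.
    for mz, intensity in peaks:
        lines.append(f"{mz:.{mz_precision}f} {intensity:.{intensity_precision}f}")
    lines.append("END IONS")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_mgf_entry("scan=1", 500.25, 2, 12.5,
                           [(100.0, 10.4), (200.5, 99.6)]))
```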

mzML Extraction

Extract mzML format files (includes both MS1 and MS2 PASEF spectra):

mzml-extractor /path/to/sample.d

# shorthand
mzml-ex /path/to/sample.d
mzml-ex /path/to/sample.d --no-ms1  # MS2 spectra only
mzml-ex /path/to/sample.d --mz-compression zstd --intensity-encoding 32
mzml-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
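
The compression and encoding flags control how peak arrays are stored inside the mzML file. In mzML, a binary array is packed as little-endian floats (32- or 64-bit), optionally compressed, and then base64-encoded. The sketch below illustrates that general scheme with the standard library only; it is not tdfextractor's code, the function names are hypothetical, and it covers only `none` and `zlib` because zstd and numpress need third-party libraries.

```python
import base64
import struct
import zlib


def encode_binary_array(values, bits=64, compression="zlib"):
    """Pack floats the way an mzML binary data array is stored (illustrative)."""
    fmt = {32: "f", 64: "d"}[bits]
    raw = struct.pack(f"<{len(values)}{fmt}", *values)  # little-endian floats
    if compression == "zlib":
        raw = zlib.compress(raw)
    elif compression != "none":
        raise ValueError(f"unsupported compression in this sketch: {compression}")
    return base64.b64encode(raw).decode("ascii")


def decode_binary_array(text, bits=64, compression="zlib"):
    """Inverse of encode_binary_array."""
    raw = base64.b64decode(text)
    if compression == "zlib":
        raw = zlib.decompress(raw)
    elif compression != "none":
        raise ValueError(f"unsupported compression in this sketch: {compression}")
    fmt = {32: "f", 64: "d"}[bits]
    count = len(raw) // (bits // 8)
    return list(struct.unpack(f"<{count}{fmt}", raw))


if __name__ == "__main__":
    mz = [100.0, 200.125, 1500.5]
    encoded = encode_binary_array(mz, bits=64, compression="zlib")
    print(encoded)
    print(decode_binary_array(encoded, bits=64, compression="zlib"))
```

Choosing 32-bit encoding halves the uncompressed array size at the cost of precision, which is why intensity arrays are commonly stored at 32 bits while m/z arrays are kept at 64.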

Output Options

The extractors support flexible output options:

  1. No output specified: Files are created within each .D folder with auto-generated names
  2. Specific file path: Use -o filename.ms2 or -o filename.mgf for single .D folder processing
  3. Output directory: Use -o /path/to/output_dir for batch processing multiple .D folders
  4. Overwrite protection: Use --overwrite to replace existing output files
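
The first three rules above can be sketched as a small path-resolution function. This is an illustration of the described behavior, not the package's implementation: `resolve_output` is a hypothetical name, the auto-generated name is assumed to be the .d folder name with the format extension, and an output value is assumed to be a file when it has a suffix and a directory otherwise.

```python
from pathlib import Path


def resolve_output(d_folder, output=None, ext="mgf"):
    """Decide where the output for one .d folder would be written (sketch)."""
    d_folder = Path(d_folder)
    default_name = f"{d_folder.stem}.{ext}"  # assumed naming: sample.d -> sample.mgf
    if output is None:
        # Rule 1: no output given, write inside the .d folder itself.
        return d_folder / default_name
    output = Path(output)
    if output.suffix:
        # Rule 2: looks like a file name, use it as-is.
        return output
    # Rule 3: treat it as an output directory.
    return output / default_name


if __name__ == "__main__":
    print(resolve_output("/data/sample.d"))
    print(resolve_output("/data/sample.d", "custom_output.ms2", ext="ms2"))
    print(resolve_output("/data/sample.d", "/data/out", ext="ms2"))
```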

Batch Processing

When processing multiple .D folders, the extractors will:

  • Automatically find all .D folders in the specified directory
  • Create output files with names matching the .D folder names
  • Skip existing files unless --overwrite is specified
  • Create the output directory if it doesn't exist
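
The batch steps listed above can be sketched with `pathlib`. This is a rough illustration under stated assumptions, not the package's code: the function names are hypothetical, only the top level of the directory is scanned, and the match on the `.d` suffix is case-insensitive.

```python
from pathlib import Path


def find_d_folders(root):
    """Return all .d / .D directories directly under `root`, sorted by path."""
    root = Path(root)
    return sorted(p for p in root.iterdir()
                  if p.is_dir() and p.suffix.lower() == ".d")


def plan_batch(root, output_dir, ext="mgf", overwrite=False):
    """List (d_folder, target_file) pairs that still need processing."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)  # create output dir if missing
    jobs = []
    for d_folder in find_d_folders(root):
        target = output_dir / f"{d_folder.stem}.{ext}"  # name follows the .d folder
        if target.exists() and not overwrite:
            continue  # skip existing files unless overwriting
        jobs.append((d_folder, target))
    return jobs
```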

Command Line Arguments

Both MS2 and MGF extractors share the same arguments, with only a few format-specific options:

| Argument | Type | Default | Description |
|---|---|---|---|
| analysis_dir | str | - | Path to the .D analysis directory or directory containing .D folders |
| -o, --output | str | <analysis_dir_name>.<ext> | Output file path or directory |
| --remove-precursor | flag | False | Remove precursor peaks from MS/MS spectra |
| --precursor-peak-width | float | 2.0 | Width around precursor m/z to remove (Da) |
| --batch-size | int | 100 | Batch size for processing spectra |
| --top-n-peaks | int | None | Keep only top N most intense peaks per spectrum |
| --min-spectra-intensity | float | None | Minimum intensity threshold for MS/MS peaks (absolute or 0.0-1.0 for percentage) |
| --max-spectra-intensity | float | None | Maximum intensity threshold for MS/MS peaks (absolute or 0.0-1.0 for percentage) |
| --min-spectra-mz | float | None | Minimum m/z filter for MS/MS peaks |
| --max-spectra-mz | float | None | Maximum m/z filter for MS/MS peaks |
| --min-precursor-intensity | float | None | Minimum precursor intensity filter |
| --max-precursor-intensity | float | None | Maximum precursor intensity filter |
| --min-precursor-charge | int | None | Minimum precursor charge state filter |
| --max-precursor-charge | int | None | Maximum precursor charge state filter |
| --min-precursor-mz | float | None | Minimum precursor m/z filter |
| --max-precursor-mz | float | None | Maximum precursor m/z filter |
| --min-precursor-rt | float | None | Minimum precursor retention time filter (seconds) |
| --max-precursor-rt | float | None | Maximum precursor retention time filter (seconds) |
| --min-precursor-ccs | float | None | Minimum precursor CCS filter |
| --max-precursor-ccs | float | None | Maximum precursor CCS filter |
| --min-precursor-neutral-mass | float | None | Minimum precursor neutral mass filter |
| --max-precursor-neutral-mass | float | None | Maximum precursor neutral mass filter |
| --mz-precision | int | 5 | Number of decimal places for m/z values |
| --intensity-precision | int | 0 | Number of decimal places for intensity values |
| --keep-empty-spectra | flag | False | Write empty spectra to output file |
| --overwrite | flag | False | Overwrite existing output files |
| --workers | int | 1 | Number of worker threads for processing multiple .d folders |
| -v, --verbose | flag | False | Enable verbose logging |
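
To make the peak-level filters concrete, here is a minimal sketch of how `--remove-precursor`, `--min-spectra-intensity`, and `--top-n-peaks` could act on one spectrum. Several details are assumptions rather than documented behavior: the precursor width is treated as the total window (half on each side), threshold values of 1.0 or less are treated as a fraction of the most intense remaining peak, and the filters run in the order shown. The function name `filter_peaks` is hypothetical.

```python
def filter_peaks(peaks, precursor_mz=None, precursor_peak_width=2.0,
                 min_intensity=None, top_n=None):
    """Filter a list of (mz, intensity) pairs; returns peaks sorted by m/z.

    Illustrative only; the order and exact semantics are assumptions.
    """
    kept = list(peaks)

    # Precursor removal: drop peaks inside the window centred on the precursor.
    if precursor_mz is not None:
        half_width = precursor_peak_width / 2.0
        kept = [(mz, inten) for mz, inten in kept
                if abs(mz - precursor_mz) > half_width]

    # Intensity threshold: absolute, or relative to the base peak if <= 1.0.
    if min_intensity is not None and kept:
        if min_intensity <= 1.0:
            threshold = min_intensity * max(inten for _, inten in kept)
        else:
            threshold = min_intensity
        kept = [(mz, inten) for mz, inten in kept if inten >= threshold]

    # Top-N: keep the N most intense peaks, then restore m/z order.
    if top_n is not None:
        kept = sorted(kept, key=lambda p: p[1], reverse=True)[:top_n]

    return sorted(kept)


if __name__ == "__main__":
    spectrum = [(100.0, 5.0), (200.0, 50.0), (499.5, 80.0),
                (500.0, 1000.0), (600.0, 20.0)]
    print(filter_peaks(spectrum, precursor_mz=500.0, min_intensity=0.5, top_n=1))
```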

Format-Specific Arguments

MS2 Extractor Only:

  • --ip2: Use IP2 preset settings (sets min charge to 2, top 500 peaks)

MGF Extractor Only:

  • --casanovo: Use Casanovo preset settings (enables precursor removal, top-150 peaks, min intensity 0.01, m/z range 50-2500, min charge 2)
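
One way to read these presets is as bundles of the shared arguments. The sketch below spells out that reading by expanding each preset into explicit flags. The mapping is an assumption based on the descriptions above: in particular, "m/z range 50-2500" and "min intensity 0.01" are assumed to refer to the MS/MS peak filters rather than the precursor filters, and the package may apply its presets differently. `PRESETS` and `preset_to_args` are hypothetical names.

```python
# Assumed expansion of each preset into the shared command-line arguments.
PRESETS = {
    "ip2": {
        "--min-precursor-charge": 2,
        "--top-n-peaks": 500,
    },
    "casanovo": {
        "--remove-precursor": True,
        "--top-n-peaks": 150,
        "--min-spectra-intensity": 0.01,
        "--min-spectra-mz": 50,
        "--max-spectra-mz": 2500,
        "--min-precursor-charge": 2,
    },
}


def preset_to_args(name):
    """Flatten a preset into a list of command-line tokens."""
    args = []
    for flag, value in PRESETS[name].items():
        if value is True:
            args.append(flag)  # boolean flags take no value
        else:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    print(" ".join(["mgf-ex", "/path/to/sample.d"] + preset_to_args("casanovo")))
```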

mzML Extractor Only:

| Argument | Type | Default | Description |
|---|---|---|---|
| --no-ms1 | flag | False | Skip MS1 spectra; write only MS2 PASEF spectra |
| --mz-compression | str | zlib | Compression for m/z arrays (none, zlib, zstd, numpress-linear, numpress-slof, numpress-pic) |
| --intensity-compression | str | zlib | Compression for intensity arrays |
| --mobility-compression | str | zlib | Compression for per-peak ion mobility arrays (MS1) |
| --mz-encoding | int | 64 | Bit width for m/z values (32 or 64) |
| --intensity-encoding | int | 32 | Bit width for intensity values (32 or 64) |
| --centroid-noise-filter | str | none | Noise filter before centroiding (none, mad, percentile, histogram, baseline, iterative_median) |
| --centroid-mz-tolerance | float | 8.0 | m/z tolerance for centroiding |
| --centroid-mz-tolerance-type | str | ppm | Unit for m/z tolerance (ppm or da) |
| --centroid-im-tolerance | float | 0.05 | Ion mobility tolerance for centroiding |
| --centroid-im-tolerance-type | str | relative | Unit for ion mobility tolerance (relative or absolute) |
| --centroid-min-peaks | int | 5 | Minimum raw peaks required to form a centroided peak |
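
The tolerance units are easier to compare with numbers. A ppm tolerance scales with m/z (delta = m/z × ppm / 1e6), while a Da tolerance is a fixed width. The sketch below computes the resulting windows; it assumes that a "relative" ion mobility tolerance means a fraction of the mobility value, which the table does not state, and the function names are hypothetical.

```python
def mz_window(mz, tolerance=8.0, tolerance_type="ppm"):
    """Return (low, high) m/z bounds for a given tolerance."""
    if tolerance_type == "ppm":
        delta = mz * tolerance / 1e6  # parts per million of the m/z value
    elif tolerance_type == "da":
        delta = tolerance             # fixed width in Daltons
    else:
        raise ValueError(f"unknown tolerance type: {tolerance_type}")
    return mz - delta, mz + delta


def im_window(im, tolerance=0.05, tolerance_type="relative"):
    """Return (low, high) ion mobility bounds (relative = fraction, assumed)."""
    if tolerance_type == "relative":
        delta = im * tolerance
    elif tolerance_type == "absolute":
        delta = tolerance
    else:
        raise ValueError(f"unknown tolerance type: {tolerance_type}")
    return im - delta, im + delta


if __name__ == "__main__":
    # At the default 8 ppm, the window is 0.008 wide on each side at m/z 1000
    # and half that at m/z 500.
    print(mz_window(1000.0))
    print(mz_window(500.0))
    print(im_window(1.0))
```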

Performance Options

The --workers argument allows parallel processing of multiple .d folders:

# Process multiple .d folders with 4 worker threads
mgf-ex /path/to/directory_with_multiple_d_folders --workers 4

Note: --workers only has an effect when a single run covers multiple .d folders. Each worker processes one complete .d folder independently, so a single folder is not split across workers.
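
The one-folder-per-worker model described here matches a standard thread pool pattern. The following is a generic sketch with `concurrent.futures`, not the package's implementation: `process_all` is a hypothetical name and `convert` stands in for whatever function converts a single .d folder.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(d_folders, convert, workers=1):
    """Run `convert` once per .d folder, using up to `workers` threads.

    Each task handles one complete folder; results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert, d_folders))


if __name__ == "__main__":
    # Stand-in conversion function for demonstration purposes.
    def fake_convert(path):
        return f"converted {path}"

    print(process_all(["a.d", "b.d", "c.d"], fake_convert, workers=4))
```

With a single folder the pool has only one task, so extra workers sit idle, which is consistent with the note above.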
