Extract MS2, MGF, and mzML files from Bruker timsTOF .d folders
tdfextractor
A Python package that extracts MS/MS spectra from Bruker timsTOF .D folders and converts them to standard formats (MS2, MGF, and mzML).
Installation
pip install tdfextractor
Usage
tdfextractor provides three command-line tools for extracting spectra:
MS2 Extraction
Extract MS2 format files (compatible with MS-GF+, Comet, etc.):
ms2-extractor /path/to/sample.d
# shorthand
ms2-ex /path/to/sample.d
ms2-ex /path/to/sample.d --output custom_output.ms2 --min-intensity 100 --min-charge 2
ms2-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
MGF Extraction
Extract MGF format files:
mgf-extractor /path/to/sample.d
# shorthand
mgf-ex /path/to/sample.d
mgf-ex /path/to/sample.d --casanovo # Optimized for Casanovo de novo sequencing
mgf-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
mzML Extraction
Extract mzML format files (includes both MS1 and MS2 PASEF spectra):
mzml-extractor /path/to/sample.d
# shorthand
mzml-ex /path/to/sample.d
mzml-ex /path/to/sample.d --no-ms1 # MS2 spectra only
mzml-ex /path/to/sample.d --mz-compression zstd --intensity-encoding 32
mzml-ex /path/to/directory_with_multiple_d_folders --output /path/to/output_directory
Output Options
The extractors support flexible output options:
- No output specified: Files are created within each .D folder with auto-generated names
- Specific file path: Use `-o filename.ms2` or `-o filename.mgf` for single .D folder processing
- Output directory: Use `-o /path/to/output_dir` for batch processing multiple .D folders
- Overwrite protection: Use `--overwrite` to replace existing output files
Batch Processing
When processing multiple .D folders, the extractors will:
- Automatically find all .D folders in the specified directory
- Create output files with names matching the .D folder names
- Skip existing files unless `--overwrite` is specified
- Create the output directory if it doesn't exist
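The batch behaviour listed above can be sketched in a few lines of Python. This is only an illustration, not the package's own code: the function name `plan_batch` is hypothetical, and the sketch assumes that only the top level of the directory is searched and that the `.d` suffix is matched case-insensitively.

```python
from pathlib import Path


def plan_batch(input_dir: Path, output_dir: Path, ext: str, overwrite: bool = False):
    """Return (d_folder, output_file) pairs for every .d folder that should be processed."""
    # Create the output directory if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)
    jobs = []
    for entry in sorted(input_dir.iterdir()):
        # Assumption: an analysis folder is a directory whose name ends in .d or .D.
        if not (entry.is_dir() and entry.suffix.lower() == ".d"):
            continue
        # Output file name matches the .d folder name, with the format's extension.
        target = output_dir / f"{entry.stem}.{ext}"
        if target.exists() and not overwrite:
            continue  # skip existing outputs unless overwrite is requested
        jobs.append((entry, target))
    return jobs
```

Calling `plan_batch(Path("runs"), Path("out"), "mgf")` would return one pair per folder that still needs an output file, which is a convenient way to preview what a batch run would touch.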
Command Line Arguments
Both MS2 and MGF extractors share the same arguments, with only a few format-specific options:
| Argument | Type | Default | Description |
|---|---|---|---|
| `analysis_dir` | str | - | Path to the .D analysis directory or directory containing .D folders |
| `-o`, `--output` | str | `<analysis_dir_name>.<ext>` | Output file path or directory |
| `--remove-precursor` | flag | False | Remove precursor peaks from MS/MS spectra |
| `--precursor-peak-width` | float | 2.0 | Width around precursor m/z to remove (Da) |
| `--batch-size` | int | 100 | Batch size for processing spectra |
| `--top-n-peaks` | int | None | Keep only top N most intense peaks per spectrum |
| `--min-spectra-intensity` | float | None | Minimum intensity threshold for MS/MS peaks (absolute or 0.0-1.0 for percentage) |
| `--max-spectra-intensity` | float | None | Maximum intensity threshold for MS/MS peaks (absolute or 0.0-1.0 for percentage) |
| `--min-spectra-mz` | float | None | Minimum m/z filter for MS/MS peaks |
| `--max-spectra-mz` | float | None | Maximum m/z filter for MS/MS peaks |
| `--min-precursor-intensity` | float | None | Minimum precursor intensity filter |
| `--max-precursor-intensity` | float | None | Maximum precursor intensity filter |
| `--min-precursor-charge` | int | None | Minimum precursor charge state filter |
| `--max-precursor-charge` | int | None | Maximum precursor charge state filter |
| `--min-precursor-mz` | float | None | Minimum precursor m/z filter |
| `--max-precursor-mz` | float | None | Maximum precursor m/z filter |
| `--min-precursor-rt` | float | None | Minimum precursor retention time filter (seconds) |
| `--max-precursor-rt` | float | None | Maximum precursor retention time filter (seconds) |
| `--min-precursor-ccs` | float | None | Minimum precursor CCS filter |
| `--max-precursor-ccs` | float | None | Maximum precursor CCS filter |
| `--min-precursor-neutral-mass` | float | None | Minimum precursor neutral mass filter |
| `--max-precursor-neutral-mass` | float | None | Maximum precursor neutral mass filter |
| `--mz-precision` | int | 5 | Number of decimal places for m/z values |
| `--intensity-precision` | int | 0 | Number of decimal places for intensity values |
| `--keep-empty-spectra` | flag | False | Write empty spectra to output file |
| `--overwrite` | flag | False | Overwrite existing output files |
| `--workers` | int | 1 | Number of worker threads for processing multiple .d folders |
| `-v`, `--verbose` | flag | False | Enable verbose logging |
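To make the peak-level options more concrete, here is a minimal sketch of how precursor removal, an intensity threshold, and a top-N cut could combine on one spectrum. It is not the package's implementation. The function name `filter_peaks` is hypothetical, and three details are assumptions: the width is treated as a full window centred on the precursor, values between 0.0 and 1.0 are treated as a fraction of the most intense remaining peak, and the filters run in the order shown.

```python
def filter_peaks(peaks, precursor_mz=None, precursor_peak_width=2.0,
                 min_intensity=None, top_n=None):
    """Filter a list of (mz, intensity) tuples and return them sorted by m/z."""
    kept = list(peaks)

    if precursor_mz is not None:
        # Assumption: the width is the full window, so half of it lies on each side.
        half = precursor_peak_width / 2.0
        kept = [(mz, inten) for mz, inten in kept if abs(mz - precursor_mz) > half]

    if min_intensity is not None and kept:
        threshold = min_intensity
        if 0.0 <= min_intensity <= 1.0:
            # Assumption: a value in 0.0-1.0 is relative to the base peak.
            threshold = min_intensity * max(inten for _, inten in kept)
        kept = [(mz, inten) for mz, inten in kept if inten >= threshold]

    if top_n is not None:
        # Keep the N most intense peaks.
        kept = sorted(kept, key=lambda p: p[1], reverse=True)[:top_n]

    return sorted(kept)
```

With these assumptions, a call such as `filter_peaks(peaks, precursor_mz=500.0, min_intensity=0.01, top_n=150)` mirrors the kind of filtering that `--remove-precursor`, `--min-spectra-intensity 0.01`, and `--top-n-peaks 150` describe.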
Format-Specific Arguments
MS2 Extractor Only:
--ip2: Use IP2 preset settings (sets min charge to 2, top 500 peaks)
MGF Extractor Only:
--casanovo: Use Casanovo preset settings (enables precursor removal, top-150 peaks, min intensity 0.01, m/z range 50-2500, min charge 2)
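As a rough illustration, the two presets can be spelled out with the shared arguments from the table above. The mapping below is an assumption based only on the preset descriptions (for example, that "min intensity 0.01" refers to `--min-spectra-intensity`); the `PRESETS` dictionary and the `explicit_args` helper are hypothetical and are not part of tdfextractor.

```python
import shlex

# Preset values as described in the text; which flag each value maps to is an assumption.
PRESETS = {
    "ip2": {"--min-precursor-charge": 2, "--top-n-peaks": 500},
    "casanovo": {
        "--remove-precursor": True,
        "--top-n-peaks": 150,
        "--min-spectra-intensity": 0.01,
        "--min-spectra-mz": 50,
        "--max-spectra-mz": 2500,
        "--min-precursor-charge": 2,
    },
}


def explicit_args(tool, analysis_dir, preset):
    """Build a command line that spells out a preset flag by flag."""
    args = [tool, analysis_dir]
    for flag, value in PRESETS[preset].items():
        if value is True:
            args.append(flag)  # boolean flags take no value
        else:
            args.extend([flag, str(value)])
    return args


print(shlex.join(explicit_args("mgf-ex", "/path/to/sample.d", "casanovo")))
```

Writing the flags out like this is mainly useful when you want a preset with one value changed, since each flag can then be adjusted individually.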
mzML Extractor Only:
| Argument | Type | Default | Description |
|---|---|---|---|
| `--no-ms1` | flag | False | Skip MS1 spectra; write only MS2 PASEF spectra |
| `--mz-compression` | str | `zlib` | Compression for m/z arrays (none, zlib, zstd, numpress-linear, numpress-slof, numpress-pic) |
| `--intensity-compression` | str | `zlib` | Compression for intensity arrays |
| `--mobility-compression` | str | `zlib` | Compression for per-peak ion mobility arrays (MS1) |
| `--mz-encoding` | int | `64` | Bit width for m/z values (32 or 64) |
| `--intensity-encoding` | int | `32` | Bit width for intensity values (32 or 64) |
| `--centroid-noise-filter` | str | `none` | Noise filter before centroiding (none, mad, percentile, histogram, baseline, iterative_median) |
| `--centroid-mz-tolerance` | float | `8.0` | m/z tolerance for centroiding |
| `--centroid-mz-tolerance-type` | str | `ppm` | Unit for m/z tolerance (ppm or da) |
| `--centroid-im-tolerance` | float | `0.05` | Ion mobility tolerance for centroiding |
| `--centroid-im-tolerance-type` | str | `relative` | Unit for ion mobility tolerance (relative or absolute) |
| `--centroid-min-peaks` | int | `5` | Minimum raw peaks required to form a centroided peak |
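The encoding and compression options control how numeric arrays are stored in the mzML file. In mzML, binary arrays are little-endian 32- or 64-bit floats, optionally compressed, and then base64-encoded. The sketch below shows that round trip for the `none` and `zlib` cases using only the standard library. The function names are hypothetical, this is not the package's writer, and zstd and numpress are not covered.

```python
import base64
import struct
import zlib


def _fmt(count, bits):
    """struct format for `count` little-endian floats of the given bit width."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    return f"<{count}{'d' if bits == 64 else 'f'}"


def encode_array(values, bits=64, compress=True):
    """Pack floats, optionally zlib-compress them, and base64-encode the result."""
    raw = struct.pack(_fmt(len(values), bits), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, bits=64, compressed=True):
    """Reverse encode_array: base64-decode, decompress, and unpack the floats."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack(_fmt(len(raw) // (bits // 8), bits), raw))
```

The trade-off behind the defaults is visible here: a 64-bit encoding returns m/z values unchanged, while a 32-bit encoding halves the payload at the cost of rounding, which is usually acceptable for intensities.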
Performance Options
The --workers argument allows parallel processing of multiple .d folders:
# Process multiple .d folders with 4 worker threads
mgf-ex /path/to/directory_with_multiple_d_folders --workers 4
Note: Workers only affect processing when multiple .d folders are being processed simultaneously. Each worker processes one complete .d folder independently.
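The one-folder-per-worker model in the note can be pictured with a thread pool. This is a sketch under assumptions rather than the package's code: `process_folders` is a hypothetical helper, and `convert` stands in for whatever function turns a single .d folder into an output file.

```python
from concurrent.futures import ThreadPoolExecutor


def process_folders(folders, convert, workers=1):
    """Apply convert() to each folder, running at most `workers` folders at a time.

    Results are returned in the same order as the input folders.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task handles one complete folder; a single folder is never split.
        return list(pool.map(convert, folders))
```

Because the unit of work is a whole folder, passing a single folder with `workers=4` leaves three workers idle, which matches the note that workers only help when several .d folders are processed together.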