Minimal library for writing mzTab 1.0 proteomics files
mztabwriter
A minimal, dependency-free Python library for writing mzTab 1.0 proteomics files.
mzTab specification (1.0 Proteomics Release) · Format examples · Russian README
Features
- Generates mzTab 1.0 files (proteomics mode)
- No mandatory runtime dependencies — pure Python 3.10+
- Supports both `Complete` and `Summary` modes
- Supports both `Quantification` and `Identification` types
- Handles label-free, iTRAQ, and SILAC experiments
- Full metadata coverage: instruments, contacts, publications, samples, URIs
- Optional pandas integration for bulk loading from DataFrames
- `to_string()` and `to_file()` output methods
Installation
pip install mztabwriter
With optional pandas support:
pip install mztabwriter[pandas]
mzTab 1.0 File Structure (Proteomics)
An mzTab file consists of tab-separated sections, each identified by a row-type prefix:
| Prefix | Section | Description |
|---|---|---|
| `MTD` | Metadata | Experiment description, instruments, software, ms_runs, assays, modifications |
| `PRH` | Protein Header | Column names for the protein table |
| `PRT` | Protein | One row per identified protein |
| `PSH` | PSM Header | Column names for the PSM table |
| `PSM` | PSM | One row per peptide-spectrum match |
| `COM` | Comment | Ignored by parsers, human-readable notes |
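To make the row-type prefixes concrete, here is a minimal sketch, independent of mztabwriter, that groups the lines of an mzTab text by their first tab-separated field. The helper name `group_by_prefix` and the sample text are illustrative assumptions, not part of the library.

```python
from collections import defaultdict

# Row-type prefixes listed in the table above (proteomics subset).
KNOWN_PREFIXES = {"MTD", "PRH", "PRT", "PSH", "PSM", "COM"}


def group_by_prefix(text: str) -> dict:
    """Group mzTab lines by row-type prefix (hypothetical helper)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between sections
        prefix, *fields = line.split("\t")
        if prefix not in KNOWN_PREFIXES:
            raise ValueError(f"unknown row-type prefix: {prefix!r}")
        groups[prefix].append(fields)
    return dict(groups)


sample = (
    "COM\tillustrative snippet\n"
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tmzTab-mode\tComplete\n"
    "\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP63017\tHeat shock cognate 71 kDa protein\n"
)
groups = group_by_prefix(sample)
print({prefix: len(rows) for prefix, rows in groups.items()})
```

A check like this can be handy for spotting lines that were corrupted or hand-edited after a file was written.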
MTD — Metadata (required)
Key metadata fields:
| Key | Description | Example |
|---|---|---|
| `mzTab-version` | Format version | `1.0.0` |
| `mzTab-mode` | `Complete` or `Summary` | `Complete` |
| `mzTab-type` | `Quantification` or `Identification` | `Quantification` |
| `description` | Free-text experiment description | |
| `ms_run[N]-location` | URI of raw data file | `file:///data/run1.mzML` |
| `assay[N]-quantification_reagent` | CV param of label/reagent | `[MS, MS:1002038, unlabeled sample, ]` |
| `assay[N]-ms_run_ref` | Which ms_run this assay uses | `ms_run[1]` |
| `study_variable[N]-assay_refs` | Assays grouped by condition | `assay[1],assay[2]` |
| `study_variable[N]-description` | Condition description | `heat shock control` |
| `fixed_mod[N]` | Fixed search modification (UNIMOD CV) | `[UNIMOD, UNIMOD:4, Carbamidomethyl, ]` |
| `variable_mod[N]` | Variable search modification | `[UNIMOD, UNIMOD:35, Oxidation, ]` |
| `protein_search_engine_score[N]` | Score type for proteins | `[MS, MS:1001171, Mascot:score, ]` |
| `psm_search_engine_score[N]` | Score type for PSMs | `[MS, MS:1001171, Mascot:score, ]` |
| `quantification_method` | Quantification strategy | `[MS, MS:1001835, SILAC, ]` |
Optional:
| Key | Description |
|---|---|
| `title` | Experiment title |
| `mzTab-ID` | Repository identifier |
| `instrument[N]-name/source/analyzer/detector` | MS instrument details |
| `software[N]` | Analysis software |
| `publication[N]` | `pubmed:XXXXXXX` or `doi:...` |
| `contact[N]-name/affiliation/email` | Contact person |
| `uri[N]` | Link to data repository |
| `sample[N]-species/cell_type/disease/tissue` | Sample description |
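As an illustration of how the metadata keys above end up in a file, the following sketch renders a small dictionary as tab-separated `MTD` rows. It does not use mztabwriter; `cv_param` and `mtd_lines` are hypothetical helpers written for this example, and the values are taken from the Example column.

```python
def cv_param(cv_label: str, accession: str, name: str, value: str = "") -> str:
    """Render a CV param in the bracketed form shown in the Example column."""
    return f"[{cv_label}, {accession}, {name}, {value}]"


def mtd_lines(metadata: dict) -> list:
    """Render key/value pairs as tab-separated MTD rows (hypothetical helper)."""
    return [f"MTD\t{key}\t{value}" for key, value in metadata.items()]


metadata = {
    "mzTab-version": "1.0.0",
    "mzTab-mode": "Complete",
    "mzTab-type": "Quantification",
    "ms_run[1]-location": "file:///data/run1.mzML",
    "assay[1]-quantification_reagent": cv_param("MS", "MS:1002038", "unlabeled sample"),
    "assay[1]-ms_run_ref": "ms_run[1]",
}
lines = mtd_lines(metadata)
print("\n".join(lines))
```

In practice the `MzTabDocument` methods described under API Reference produce these rows; the sketch only shows the shape of the output.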
PRT — Protein rows
Each protein row contains:
| Column | Type | Description |
|---|---|---|
| `accession` | `str` | Database identifier (e.g. `P63017`) |
| `description` | `str\|null` | Protein description |
| `taxid` | `int\|null` | NCBI Taxonomy ID |
| `species` | `str\|null` | Species name |
| `database` | `str\|null` | Database name (e.g. `UniProtKB`) |
| `database_version` | `str\|null` | Database version |
| `search_engine` | `CvParam\|null` | Search engine |
| `best_search_engine_score[1]` | `float\|null` | Best score across all runs |
| `search_engine_score[1]_ms_run[N]` | `float\|null` | Score per run |
| `num_psms_ms_run[N]` | `int\|null` | Number of PSMs per run |
| `num_peptides_distinct_ms_run[N]` | `int\|null` | Distinct peptides per run |
| `num_peptides_unique_ms_run[N]` | `int\|null` | Unique peptides per run |
| `ambiguity_members` | `str\|null` | Comma-separated accessions of ambiguity group |
| `modifications` | `str\|null` | Detected modifications (e.g. `12-UNIMOD:35`) |
| `protein_coverage` | `float\|null` | Sequence coverage fraction (0.0–1.0) |
| `protein_abundance_assay[N]` | `float\|null` | Abundance per assay |
| `protein_abundance_study_variable[N]` | `float\|null` | Mean abundance per condition |
| `protein_abundance_stdev_study_variable[N]` | `float\|null` | Std deviation per condition |
| `protein_abundance_std_error_study_variable[N]` | `float\|null` | Std error per condition |
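The three `study_variable` columns summarize the per-assay abundances of one condition. The sketch below shows one way to derive them with the standard library, assuming the sample (n - 1) standard deviation; the README does not say which estimator to use, and mztabwriter simply writes the numbers it is given. The function name `summarize` and the input values are hypothetical.

```python
import math
import statistics


def summarize(abundances: list) -> tuple:
    """Return (mean, sample standard deviation, standard error).

    Needs at least two values, since the sample standard deviation
    is undefined for a single observation.
    """
    mean = statistics.mean(abundances)
    stdev = statistics.stdev(abundances)  # sample (n - 1) estimator
    std_error = stdev / math.sqrt(len(abundances))
    return mean, stdev, std_error


# Hypothetical abundances of one protein in three replicate assays.
mean, stdev, std_error = summarize([10.0, 12.0, 14.0])
print(mean, stdev, round(std_error, 4))
```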
PSM — Peptide-Spectrum Match rows
| Column | Type | Description |
|---|---|---|
| `sequence` | `str` | Peptide amino acid sequence |
| `PSM_ID` | `int` | Unique PSM identifier within the file |
| `accession` | `str` | Protein accession |
| `unique` | `0\|1\|null` | 1 if peptide is unique to this protein |
| `database` | `str\|null` | Database name |
| `database_version` | `str\|null` | Database version |
| `search_engine` | `CvParam\|null` | Search engine |
| `search_engine_score[1]` | `float\|null` | Score |
| `modifications` | `str\|null` | Modifications (e.g. `0-UNIMOD:214, 9-UNIMOD:4`) |
| `spectra_ref` | `str\|null` | Spectrum reference, e.g. `ms_run[1]:scan=1296` |
| `retention_time` | `float\|null` | Retention time in seconds |
| `charge` | `int\|null` | Precursor charge state |
| `exp_mass_to_charge` | `float\|null` | Experimental m/z |
| `calc_mass_to_charge` | `float\|null` | Theoretical m/z |
| `pre` | `str\|null` | Amino acid before the peptide N-terminus (`-` = protein N-term) |
| `post` | `str\|null` | Amino acid after the peptide C-terminus |
| `start` | `int\|null` | 1-based start position in protein |
| `end` | `int\|null` | 1-based end position in protein |
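Two of the PSM columns lend themselves to quick sanity checks before writing: `spectra_ref` follows the `ms_run[N]:...` pattern, and the 1-based, inclusive `start`/`end` positions should span exactly the peptide length. The helpers below are a hypothetical sketch written for this example and are not part of mztabwriter.

```python
import re

# "ms_run[1]:scan=1296" -> run index 1, native spectrum ID "scan=1296"
SPECTRA_REF = re.compile(r"^ms_run\[(\d+)\]:(.+)$")


def parse_spectra_ref(ref: str) -> tuple:
    """Split a spectra_ref into (ms_run index, native spectrum ID)."""
    match = SPECTRA_REF.match(ref)
    if match is None:
        raise ValueError(f"not a valid spectra_ref: {ref!r}")
    return int(match.group(1)), match.group(2)


def span_matches_sequence(sequence: str, start: int, end: int) -> bool:
    """Check that 1-based inclusive start/end cover len(sequence) residues."""
    return end - start + 1 == len(sequence)


print(parse_spectra_ref("ms_run[1]:scan=1296"))
print(span_matches_sequence("QTQTFTTYSDNQPGVL", 424, 439))
```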
API Reference
CvParam(cv_label, accession, name, value="")
A Controlled Vocabulary parameter — the basic annotation unit in mzTab.
from mztabwriter import CvParam
CvParam("MS", "MS:1001207", "Mascot")
# → [MS, MS:1001207, Mascot, ]
CvParam("UNIMOD", "UNIMOD:4", "Carbamidomethyl")
# → [UNIMOD, UNIMOD:4, Carbamidomethyl, ]
CvParam("PRIDE", "PRIDE:0000131", "Instrument model", "Micromass Q-TOF I")
# → [PRIDE, PRIDE:0000131, Instrument model, Micromass Q-TOF I]
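For reading such strings back, a minimal stand-alone parser for the bracketed form might look like the sketch below. `ParsedCvParam` and `parse_cv_param` are hypothetical names, not mztabwriter API, and the sketch assumes that the CV label, accession, and name contain no commas.

```python
from typing import NamedTuple


class ParsedCvParam(NamedTuple):
    cv_label: str
    accession: str
    name: str
    value: str


def parse_cv_param(text: str) -> ParsedCvParam:
    """Parse "[label, accession, name, value]" into its four fields."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed CV param: {text!r}")
    # Split on the first three commas only, so the value may contain commas.
    parts = [part.strip() for part in text[1:-1].split(",", 3)]
    if len(parts) != 4:
        raise ValueError(f"expected four fields: {text!r}")
    return ParsedCvParam(*parts)


print(parse_cv_param("[MS, MS:1001207, Mascot, ]"))
print(parse_cv_param("[PRIDE, PRIDE:0000131, Instrument model, Micromass Q-TOF I]"))
```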
Modification(position, cv_accession)
A peptide/protein modification at a specific position.
from mztabwriter import Modification
Modification(0, "UNIMOD:214") # → 0-UNIMOD:214
Modification(12, "UNIMOD:35") # → 12-UNIMOD:35
Modification(None, "UNIMOD:4") # → -UNIMOD:4
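The serialized `position-accession` form can be split back into its parts with a few lines of plain Python. This is a sketch for the simple cases shown above; `parse_modifications` is a hypothetical helper, not part of mztabwriter, and it does not handle more elaborate position syntax.

```python
from typing import List, Optional, Tuple


def parse_modifications(text: str) -> List[Tuple[Optional[int], str]]:
    """Parse "0-UNIMOD:214, 9-UNIMOD:4" into (position, accession) pairs.

    A missing position, as in "-UNIMOD:4", is returned as None.
    """
    result = []
    for item in text.split(","):
        position, _, accession = item.strip().partition("-")
        result.append((int(position) if position else None, accession))
    return result


print(parse_modifications("0-UNIMOD:214, 9-UNIMOD:4"))
# → [(0, 'UNIMOD:214'), (9, 'UNIMOD:4')]
```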
MzTabDocument(mode, type_, version, title, description, mztab_id)
The main document class.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `mode` | `"Complete" \| "Summary"` | `"Complete"` | File mode |
| `type_` | `"Quantification" \| "Identification"` | `"Quantification"` | Data type |
| `version` | `str` | `"1.0.0"` | mzTab format version |
| `title` | `str \| None` | `None` | Experiment title |
| `description` | `str \| None` | `None` | Experiment description |
| `mztab_id` | `str \| None` | `None` | Repository ID |
Metadata methods
| Method | Returns | Description |
|---|---|---|
| `add_ms_run(location, format=None, id_format=None)` | `MsRun` | Add a raw data file reference |
| `add_sample(description, species, cell_type, disease, tissue, custom)` | `Sample` | Add sample description |
| `add_assay(ms_run, quantification_reagent, sample=None, quantification_mods=None)` | `Assay` | Add assay (run + label) |
| `add_study_variable(description, assays)` | `StudyVariable` | Group assays into a condition |
| `set_quantification_method(cv)` | `None` | Set experiment-level quantification method |
| `set_protein_quantification_unit(cv)` | `None` | Set abundance unit |
| `add_software(cv)` | `None` | Add analysis software |
| `add_publication(ref)` | `None` | Add `pubmed:XXXXXXX` or `doi:...` |
| `add_contact(name, affiliation=None, email=None)` | `None` | Add contact person |
| `add_uri(uri)` | `None` | Add data repository URI |
| `add_instrument(name, source, analyzer, detector)` | `None` | Add MS instrument description |
| `add_fixed_mod(cv, site=None, position=None)` | `SearchModification` | Add fixed search modification |
| `add_variable_mod(cv, site=None, position=None)` | `SearchModification` | Add variable search modification |
| `add_protein_search_engine_score(cv)` | `SearchEngineScore` | Register protein score type |
| `add_psm_search_engine_score(cv)` | `SearchEngineScore` | Register PSM score type |
Data methods
| Method | Returns | Description |
|---|---|---|
| `add_protein(accession, ...)` | `ProteinRow` | Add a protein row |
| `add_psm(sequence, psm_id, accession, ...)` | `PsmRow` | Add a PSM row |
| `add_proteins_from_dataframe(df)` | `None` | Bulk-load proteins from a pandas DataFrame |
| `add_psms_from_dataframe(df)` | `None` | Bulk-load PSMs from a pandas DataFrame |
Output methods
| Method | Returns | Description |
|---|---|---|
| `to_string()` | `str` | Return the complete mzTab document as a string |
| `to_file(path)` | `None` | Write the document to a file (UTF-8) |
Examples
Label-free quantification (2 conditions × 3 replicates)
from mztabwriter import MzTabDocument, CvParam, Modification
doc = MzTabDocument(
    mode="Complete",
    type_="Quantification",
    title="LFQ heat shock experiment",
    description="Label-free quantification of heat shock proteins, 2 conditions",
)
# Raw data files
r1 = doc.add_ms_run("file:///data/ctrl_rep1.mzML")
r2 = doc.add_ms_run("file:///data/ctrl_rep2.mzML")
r3 = doc.add_ms_run("file:///data/ctrl_rep3.mzML")
r4 = doc.add_ms_run("file:///data/treat_rep1.mzML")
r5 = doc.add_ms_run("file:///data/treat_rep2.mzML")
r6 = doc.add_ms_run("file:///data/treat_rep3.mzML")
reagent = CvParam("MS", "MS:1002038", "unlabeled sample")
a1 = doc.add_assay(r1, reagent)
a2 = doc.add_assay(r2, reagent)
a3 = doc.add_assay(r3, reagent)
a4 = doc.add_assay(r4, reagent)
a5 = doc.add_assay(r5, reagent)
a6 = doc.add_assay(r6, reagent)
doc.add_study_variable("control", [a1, a2, a3])
doc.add_study_variable("heat shock treatment", [a4, a5, a6])
# Scores and modifications
doc.add_protein_search_engine_score(CvParam("MS", "MS:1001171", "Mascot:score"))
doc.add_psm_search_engine_score(CvParam("MS", "MS:1001171", "Mascot:score"))
doc.add_fixed_mod(CvParam("UNIMOD", "UNIMOD:4", "Carbamidomethyl"), site="C", position="Anywhere")
doc.add_variable_mod(CvParam("UNIMOD", "UNIMOD:35", "Oxidation"), site="M", position="Anywhere")
doc.set_quantification_method(CvParam("MS", "MS:1002038", "unlabeled sample"))
doc.set_protein_quantification_unit(CvParam("PRIDE", "PRIDE:0000393", "Relative quantification unit"))
# Proteins
doc.add_protein(
    accession="P63017",
    description="Heat shock cognate 71 kDa protein",
    taxid=10090,
    species="Mus musculus",
    database="UniProtKB",
    database_version="2013_08",
    search_engine=CvParam("MS", "MS:1001207", "Mascot"),
    best_search_engine_score=46.0,
    search_engine_scores={"ms_run[1]": 46, "ms_run[2]": 26, "ms_run[3]": 36,
                          "ms_run[4]": -3, "ms_run[5]": -1, "ms_run[6]": None},
    num_psms={"ms_run[1]": 1, "ms_run[2]": 1, "ms_run[3]": 1,
              "ms_run[4]": 1, "ms_run[5]": 1, "ms_run[6]": 0},
    num_peptides_distinct={"ms_run[1]": 1, "ms_run[2]": 1, "ms_run[3]": 1,
                           "ms_run[4]": 1, "ms_run[5]": 1, "ms_run[6]": 0},
    num_peptides_unique={"ms_run[1]": 1, "ms_run[2]": 1, "ms_run[3]": 1,
                         "ms_run[4]": 1, "ms_run[5]": 1, "ms_run[6]": 0},
    protein_coverage=0.34,
    protein_abundance_assay={
        "assay[1]": 34.3, "assay[2]": 40.4, "assay[3]": 41.1,
        "assay[4]": 267.0, "assay[5]": 234.4, "assay[6]": 271.0,
    },
    protein_abundance_study_variable={"study_variable[1]": 38.6, "study_variable[2]": 257.5},
    protein_abundance_stdev_study_variable={"study_variable[1]": 3.8, "study_variable[2]": 20.1},
    protein_abundance_std_error_study_variable={"study_variable[1]": 2.2, "study_variable[2]": 11.6},
)
# PSMs
doc.add_psm(
    sequence="QTQTFTTYSDNQPGVL",
    psm_id=1,
    accession="P63017",
    unique=1,
    database="UniProtKB",
    database_version="2013_08",
    search_engine=CvParam("MS", "MS:1001207", "Mascot"),
    search_engine_score=46.0,
    modifications=[Modification(0, "UNIMOD:214")],
    spectra_ref="ms_run[1]:scan=1296",
    retention_time=1336.62,
    charge=3,
    exp_mass_to_charge=600.6218923,
    calc_mass_to_charge=600.6197,
    pre="K",
    post="I",
    start=424,
    end=439,
)
print(doc.to_string())
doc.to_file("lfq_experiment.mzTab")
iTRAQ quantification
from mztabwriter import MzTabDocument, CvParam
doc = MzTabDocument(mode="Complete", type_="Quantification")
run = doc.add_ms_run("file:///data/itraq_run1.mzML")
a1 = doc.add_assay(run, CvParam("PRIDE", "PRIDE:0000114", "iTRAQ reagent 114"))
a2 = doc.add_assay(run, CvParam("PRIDE", "PRIDE:0000115", "iTRAQ reagent 115"))
a3 = doc.add_assay(run, CvParam("PRIDE", "PRIDE:0000116", "iTRAQ reagent 116"))
a4 = doc.add_assay(run, CvParam("PRIDE", "PRIDE:0000117", "iTRAQ reagent 117"))
doc.add_study_variable("t=0", [a1])
doc.add_study_variable("t=1", [a2])
doc.add_study_variable("t=2", [a3])
doc.add_study_variable("t=3", [a4])
doc.set_quantification_method(CvParam("PRIDE", "PRIDE:0000313", "iTRAQ"))
doc.add_fixed_mod(CvParam("UNIMOD", "UNIMOD:214", "iTRAQ4plex"), site="K", position="Anywhere")
doc.add_fixed_mod(CvParam("UNIMOD", "UNIMOD:214", "iTRAQ4plex"), site="N-term", position="Any N-term")
SILAC quantification
from mztabwriter import MzTabDocument, CvParam
doc = MzTabDocument(mode="Complete", type_="Quantification")
run = doc.add_ms_run("file:///data/silac.mzML")
light = CvParam("PRIDE", "PRIDE:0000326", "SILAC light")
heavy = CvParam("PRIDE", "PRIDE:0000325", "SILAC heavy")
heavy_mods = [
    CvParam("UNIMOD", "UNIMOD:267", "Label:13C(6)15N(4)"),
    CvParam("UNIMOD", "UNIMOD:259", "Label:13C(6)15N(2)"),
]
a_light = doc.add_assay(run, light)
a_heavy = doc.add_assay(run, heavy, quantification_mods=heavy_mods)
doc.add_study_variable("control", [a_light])
doc.add_study_variable("treatment", [a_heavy])
doc.set_quantification_method(CvParam("MS", "MS:1001835", "SILAC"))
Loading from pandas DataFrame
import pandas as pd
from mztabwriter import MzTabDocument, CvParam
doc = MzTabDocument(mode="Complete", type_="Quantification")
# ... (add ms_runs, assays, study_variables, scores first) ...
df_proteins = pd.DataFrame([
    {
        "accession": "P63017",
        "description": "Heat shock cognate 71 kDa protein",
        "taxid": 10090,
        "species": "Mus musculus",
        "database": "UniProtKB",
        "database_version": "2013_08",
        "search_engine": CvParam("MS", "MS:1001207", "Mascot"),
        "best_search_engine_score": 46.0,
        "protein_coverage": 0.34,
        "protein_abundance_assay[1]": 34.3,
        "protein_abundance_assay[2]": 266.9,
        "protein_abundance_study_variable[1]": 34.3,
        "protein_abundance_study_variable[2]": 266.9,
        "protein_abundance_stdev_study_variable[1]": 3.8,
        "protein_abundance_stdev_study_variable[2]": 20.1,
        "protein_abundance_std_error_study_variable[1]": 2.2,
        "protein_abundance_std_error_study_variable[2]": 11.6,
    },
])
doc.add_proteins_from_dataframe(df_proteins)
doc.to_file("output.mzTab")
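The DataFrame example uses flat column names such as `protein_abundance_assay[1]`, whereas the `add_protein` example passes dictionaries keyed by `assay[1]` or `study_variable[1]`. If your data is in the nested form, a small helper can flatten it before building the DataFrame. The sketch below is a hypothetical helper, not part of mztabwriter, and it only covers the abundance columns, where the flat name is the keyword plus the bracketed index.

```python
def flatten_abundances(row: dict) -> dict:
    """Expand {"protein_abundance_assay": {"assay[1]": 34.3}} into
    {"protein_abundance_assay[1]": 34.3}; other values pass through."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            for ref, item in value.items():
                # Keep only the bracketed index, e.g. "assay[1]" -> "[1]".
                index = ref[ref.index("["):]
                flat[f"{key}{index}"] = item
        else:
            flat[key] = value
    return flat


nested = {
    "accession": "P63017",
    "protein_abundance_assay": {"assay[1]": 34.3, "assay[2]": 266.9},
    "protein_abundance_study_variable": {"study_variable[1]": 34.3},
}
flat = flatten_abundances(nested)
print(flat)
```

A list of such flat dictionaries can then be passed to `pd.DataFrame(...)` as in the example above.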
File Structure Summary
MTD mzTab-version 1.0.0
MTD mzTab-mode Complete
MTD mzTab-type Quantification
MTD description ...
MTD ms_run[1]-location file:///data/run1.mzML
MTD assay[1]-quantification_reagent [MS, MS:1002038, unlabeled sample, ]
MTD assay[1]-ms_run_ref ms_run[1]
MTD study_variable[1]-assay_refs assay[1],assay[2],assay[3]
MTD study_variable[1]-description control
...
PRH accession description ... protein_abundance_assay[1] ...
PRT P63017 Heat shock… ... 34.3 ...
...
PSH sequence PSM_ID accession ... spectra_ref ...
PSM QTQTFTT… 1 P63017 ... ms_run[1]:scan=1296 ...
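Because each header row (`PRH`, `PSH`) names the columns of the data rows that follow it, a written file can be read back into dictionaries with very little code. The sketch below uses only the standard library; `read_table` and the sample text are illustrative assumptions, not part of mztabwriter.

```python
def read_table(text: str, header_prefix: str, row_prefix: str) -> list:
    """Pair each data row with the column names from its header row."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == header_prefix:
            header = fields[1:]
        elif fields[0] == row_prefix:
            if header is None:
                raise ValueError(f"{row_prefix} row found before {header_prefix} header")
            rows.append(dict(zip(header, fields[1:])))
    return rows


sample = (
    "MTD\tmzTab-version\t1.0.0\n"
    "PRH\taccession\tdescription\tprotein_abundance_assay[1]\n"
    "PRT\tP63017\tHeat shock cognate 71 kDa protein\t34.3\n"
    "PSH\tsequence\tPSM_ID\taccession\tspectra_ref\n"
    "PSM\tQTQTFTTYSDNQPGVL\t1\tP63017\tms_run[1]:scan=1296\n"
)
proteins = read_table(sample, "PRH", "PRT")
psms = read_table(sample, "PSH", "PSM")
print(proteins[0]["accession"], psms[0]["spectra_ref"])
```

All values come back as strings; converting them to `int` or `float` is left to the caller.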
License
MIT