mzidentml-reader

mzidentml-reader processes mzIdentML 1.2.0 and 1.3.0 files with the primary aim of extracting crosslink information. It has three use cases:

  1. to validate mzIdentML files against the criteria given here: https://www.ebi.ac.uk/pride/markdownpage/crosslinking
  2. to extract information on crosslinked residue pairs and output it in a form more easily used by modelling software
  3. to populate the database that is accessed by crosslinking-api

It uses the pyteomics library (https://pyteomics.readthedocs.io/en/latest/index.html) as the underlying parser for mzIdentML. Results are written into a relational database (PostgreSQL or SQLite) using sqlalchemy.

Requirements

  • Python 3.10
  • pipenv
  • SQLite3 for validation and residue pair extraction
  • PostgreSQL or SQLite3 for crosslinking-api database creation

Installation

Development Setup

Clone the repository and set up the development environment:

git clone https://github.com/Rappsilber-Laboratory/mzidentml-reader.git
cd mzidentml-reader
pipenv install --python 3.10 --dev
pipenv shell

Production Installation

Install via PyPI:

pip install mzidentml-reader

PyPI project: https://pypi.org/project/mzidentml-reader/

For more installation details, see: https://packaging.python.org/en/latest/tutorials/installing-packages/

Usage

process_dataset.py is the entry point; running it with the -h option lists the available options.

python parser.py -h;

Alternatively:

python -m parser -h;

1. Validate a dataset

Run process_dataset.py with the -v option to validate a dataset. The argument is the path to a specific mzIdentML file or to a directory containing multiple mzIdentML files, in which case all of them are validated. To pass, all the peak list files referenced must be in the same directory as the mzIdentML file(s). The converter creates an SQLite database in a temporary folder for use during validation; the temporary folder can be specified with the -t option.

Examples:

python parser.py -v ~/mydata
python parser.py -v ~/mydata/mymzid.mzid -t ~/mytempdir

The result is written to the console. If the data fails validation but the error message is not informative, please open an issue on the GitHub repository: https://github.com/Rappsilber-Laboratory/mzidentml-reader/issues
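
Because validation requires the referenced peak list files to sit next to the mzIdentML file, it can help to check for them before running the validator. The following is a minimal sketch of such a pre-check and is not part of mzidentml-reader. The function name missing_peaklists is hypothetical, and the sketch assumes that only the file name part of each SpectraData location attribute matters; the tool itself may resolve locations differently.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import unquote


def missing_peaklists(mzid_path):
    """Return names of peak list files referenced by SpectraData elements
    that are not present in the same directory as the mzIdentML file."""
    mzid_path = Path(mzid_path)
    missing = []
    # Walk the elements as they are parsed (default "end" events).
    for _, elem in ET.iterparse(str(mzid_path)):
        # Tags are namespaced, e.g. "{http://psidev.info/...}SpectraData".
        if elem.tag.rsplit("}", 1)[-1] != "SpectraData":
            continue
        location = elem.get("location", "")
        # Assumption: only the last path component of the location is used.
        name = unquote(re.split(r"[\\/]", location)[-1])
        if not name or name in missing:
            continue
        if not (mzid_path.parent / name).exists():
            missing.append(name)
    return missing
```

An empty list means every referenced file name was found alongside the mzIdentML file; it does not guarantee that the dataset passes validation.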

2. Extract summary of crosslinked residue pairs

Run process_dataset.py with the --seqsandresiduepairs option to extract a summary of search sequences and crosslinked residue pairs. The output is JSON, which is written to the console. The argument is the path to an mzIdentML file or a directory containing multiple mzIdentML files, in which case all of them will be processed.

Examples:

python parser.py --seqsandresiduepairs ~/mydata -t ~/mytempdir
python parser.py --seqsandresiduepairs ~/mydata/mymzid.mzid

It can also be accessed programmatically by using the json_sequences_and_residue_pairs(filepath, tmpdir) function in parser.py.
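
When importing the function is not convenient, the command line form shown above can also be driven from another Python script. This is a hedged sketch, not an official API: the helper names residue_pairs_command and load_residue_pairs are hypothetical, the script path defaults to the parser.py used in the examples, and it assumes that standard output contains only the JSON document.

```python
import json
import os
import subprocess
import sys


def residue_pairs_command(data_path, tmpdir=None, script="parser.py"):
    """Build the argument list for the --seqsandresiduepairs invocation."""
    # "~" is not expanded without a shell, so expand it here.
    cmd = [sys.executable, script, "--seqsandresiduepairs",
           os.path.expanduser(str(data_path))]
    if tmpdir is not None:
        cmd += ["-t", os.path.expanduser(str(tmpdir))]
    return cmd


def load_residue_pairs(data_path, tmpdir=None, script="parser.py"):
    """Run the command and decode its console output as JSON."""
    result = subprocess.run(
        residue_pairs_command(data_path, tmpdir, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    # Assumption: nothing but the JSON summary is written to stdout.
    return json.loads(result.stdout)
```

The structure of the decoded JSON is not described here, so inspect the returned object before relying on particular keys.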

3. Populate the crosslinking-api database

Create the database

sudo su postgres;
psql;
create database crosslinking;
create user xiadmin with login password 'your_password_here';
grant all privileges on database crosslinking to xiadmin;
\connect crosslinking;
GRANT ALL PRIVILEGES ON SCHEMA public TO xiadmin;

Find the pg_hba.conf file in the PostgreSQL installation directory and add a line to allow the xiadmin role to access the database, e.g.:

sudo nano /etc/postgresql/13/main/pg_hba.conf

then add the line: local crosslinking xiadmin md5

then restart postgresql:

sudo service postgresql restart

Configure the python environment for the file parser

Edit the file mzidentml-reader/config/database.ini to point to your PostgreSQL database, e.g. so that its content is:

[postgresql]
host=localhost
database=crosslinking
user=xiadmin
password=your_password_here
port=5432
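
To check that the settings in database.ini are what you expect, the file can be read with the standard library and turned into a SQLAlchemy-style connection URL. This is only an illustrative sketch: the function name postgres_url is hypothetical, and how mzidentml-reader itself loads this file is not shown here.

```python
from configparser import ConfigParser
from urllib.parse import quote_plus


def postgres_url(ini_path, section="postgresql"):
    """Build a postgresql:// URL from the given section of an ini file."""
    # Disable interpolation so a "%" in the password is read literally.
    parser = ConfigParser(interpolation=None)
    if not parser.read(ini_path):
        raise FileNotFoundError(ini_path)
    cfg = parser[section]
    return "postgresql://{user}:{password}@{host}:{port}/{database}".format(
        # Quote credentials so special characters do not break the URL.
        user=quote_plus(cfg["user"]),
        password=quote_plus(cfg["password"]),
        host=cfg["host"],
        port=cfg.get("port", "5432"),
        database=cfg["database"],
    )
```

With the example content above, the result would be postgresql://xiadmin:your_password_here@localhost:5432/crosslinking.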

Create the database schema

Run create_db_schema.py to create the database tables:

python parser/database/create_db_schema.py

Populate the database

To parse a test dataset:

python parser.py -d ~/PXD038060

The command line options that populate the database are -d, -f and -p; only one of them can be used at a time. The -d option is a directory to process files from, the -f option is the path to an FTP directory containing mzIdentML files, and the -p option is a ProteomeXchange identifier or a list of ProteomeXchange identifiers separated by spaces.

The -i option is the project identifier to use in the database. It will default to the PXD accession or the name of the directory containing the mzIdentML file.
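
The rule that -d, -f and -p cannot be combined, together with the default for -i, can be illustrated with a small argparse sketch. This is not the project's actual argument handling: the dest names and the helper default_project_identifier are hypothetical, and only the short flags named above are taken from this README.

```python
import argparse
from pathlib import Path


def build_parser():
    """Sketch of the database-population options described above."""
    parser = argparse.ArgumentParser(description="illustrative sketch only")
    # argparse rejects any command line that combines members of this group.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-d", dest="directory",
                        help="directory to process files from")
    source.add_argument("-f", dest="ftp_path",
                        help="FTP directory containing mzIdentML files")
    source.add_argument("-p", dest="pxd_ids", nargs="+",
                        help="one or more ProteomeXchange identifiers")
    parser.add_argument("-i", dest="identifier",
                        help="project identifier to use in the database")
    return parser


def default_project_identifier(pxd_id=None, directory=None):
    """Fall back to the PXD accession, else the directory name."""
    if pxd_id:
        return pxd_id
    return Path(directory).name
```

For example, parsing -d ~/PXD038060 without -i and passing the directory to default_project_identifier gives PXD038060, while supplying both -d and -f makes argparse exit with a usage error.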

Development

Code Quality

This project uses standardized code quality tools:

# Format code
pipenv run black .

# Sort imports
pipenv run isort .

# Check style and syntax
pipenv run flake8

Testing

Make sure the test database user is available:

psql -p 5432 -c "create role ximzid_unittests with password 'ximzid_unittests';"
psql -p 5432 -c 'alter role ximzid_unittests with login;'
psql -p 5432 -c 'alter role ximzid_unittests with createdb;'
psql -p 5432 -c 'GRANT pg_signal_backend TO ximzid_unittests;'

Run tests with coverage:

pipenv run pytest  # Run tests with coverage (80% threshold)
pipenv run pytest --cov-report=html  # Generate HTML coverage report
pipenv run pytest -m "not slow"  # Skip slow tests
