
Project description

xi-mzidentml-converter


xi-mzidentml-converter processes mzIdentML 1.2.0 and 1.3.0 files with the primary aim of extracting crosslink information. It has three use cases:

  1. to validate mzIdentML files against the criteria given here: https://www.ebi.ac.uk/pride/markdownpage/crosslinking
  2. to extract information on crosslinked residue pairs and output it in a form more easily used by modelling software
  3. to populate the database that is accessed by xiview-api

It uses the pyteomics library (https://pyteomics.readthedocs.io/en/latest/index.html) as the underlying parser for mzIdentML. Results are written into a relational database (PostgreSQL or SQLite) using sqlalchemy.

Requirements:

python3.10

pipenv

sqlite3 for validation and residue pair extraction; postgresql or sqlite3 for creation of the xiview-api database (the instructions below use postgresql)

Installation

Clone the git repository and set up the Python environment, or install via PyPI:

git clone https://github.com/Rappsilber-Laboratory/xi-mzidentml-converter.git
cd xi-mzidentml-converter
pipenv install --python 3.10

PyPI project: https://pypi.org/project/xi-mzidentml-converter/

PyPI instructions: https://packaging.python.org/en/latest/tutorials/installing-packages/
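
Alternatively, install the released package from PyPI with pip (the package name matches the PyPI project above):

pip install xi-mzidentml-converter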

Usage

process_dataset.py is the entry point; running it with the -h option lists the available options.

python process_dataset.py -h

1. Validate a dataset

Run process_dataset.py with the -v option to validate a dataset. The argument is the path to a specific mzIdentML file or to a directory containing multiple mzIdentML files, in which case all of them will be validated. To pass, all the peaklist files referenced must be in the same directory as the mzIdentML file(s). The converter creates an sqlite database in a temporary folder which is used in the validation process; the temporary folder can be specified with the -t option.

Examples:

python process_dataset.py -v ~/mydata
python process_dataset.py -v ~/mydata/mymzid.mzid -t ~/mytempdir

The result is written to the console. If the data fails validation but the error message is not informative, please open an issue on the GitHub repository: https://github.com/Rappsilber-Laboratory/xi-mzidentml-converter/issues

2. Extract summary of crosslinked residue pairs

Run process_dataset.py with the --seqsandresiduepairs option to extract a summary of search sequences and crosslinked residue pairs. The output is JSON, written to the console. The argument is the path to an mzIdentML file or a directory containing multiple mzIdentML files, in which case all of them will be processed.

Examples:

python process_dataset.py --seqsandresiduepairs ~/mydata -t ~/mytempdir
python process_dataset.py --seqsandresiduepairs ~/mydata/mymzid.mzid

It can also be accessed programmatically via the json_sequences_and_residue_pairs(filepath, tmpdir) function in process_dataset.py, as sketched below.
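
For illustration, a minimal sketch of the programmatic route (this assumes the function returns the same JSON summary that the CLI writes to the console, as a string; the paths are placeholders):

import json
import os

from process_dataset import json_sequences_and_residue_pairs

# Placeholder paths - adjust to your data; ~ is expanded explicitly.
mzid_path = os.path.expanduser("~/mydata/mymzid.mzid")
tmp_dir = os.path.expanduser("~/mytempdir")

# Assumption: the function returns the JSON summary as a string.
summary = json.loads(json_sequences_and_residue_pairs(mzid_path, tmp_dir))
print(summary)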

3. Populate the xiview-api database

Create the database

sudo su postgres
psql
create database xiview;
create user xiadmin with login password 'your_password_here';
grant all privileges on database xiview to xiadmin;

Find the pg_hba.conf file in the postgresql installation directory and add a line to allow the xiadmin role to access the database, e.g.

sudo nano /etc/postgresql/13/main/pg_hba.conf

Then add the line: local xiview xiadmin md5

Then restart postgresql:

sudo service postgresql restart

Configure the python environment for the file parser

Edit the file xi-mzidentml-converter/config/database.ini to point to your postgresql database, e.g. so its content is:

[postgresql]
host=localhost
database=xiview
user=xiadmin
password=your_password_here
port=5432

Create the database schema

Run create_db_schema.py to create the database tables:

python database/create_db_schema.py

Populate the database

To parse a test dataset:

python process_dataset.py -d ~/PXD038060

The command line options that populate the database are -d, -f and -p; only one of these can be used at a time. The -d option takes a directory to process files from, the -f option takes the path to an FTP directory containing mzIdentML files, and the -p option takes a ProteomeXchange identifier or a list of ProteomeXchange identifiers separated by spaces.

The -i option is the project identifier to use in the database. It will default to the PXD accession or the name of the directory containing the mzIdentML file.
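
For example, to fetch and process a ProteomeXchange dataset under an explicit project identifier (using the test dataset accession from above; shown for illustration):

python process_dataset.py -p PXD038060 -i PXD038060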

To run tests

Make sure the right database user is available:

psql -p 5432 -c "create role ximzid_unittests with password 'ximzid_unittests';"
psql -p 5432 -c 'alter role ximzid_unittests with login;'
psql -p 5432 -c 'alter role ximzid_unittests with createdb;'
psql -p 5432 -c 'GRANT pg_signal_backend TO ximzid_unittests;'

Run the tests:

pipenv run pytest

