FLAMS: Find Lysine Acylations & other Modification Sites

A bioinformatics tool to analyze the conservation of post-translational modifications (PTMs), by means of a position-based search against the Compendium of Protein Lysine Modifications (CPLM database) v.4 and the experimental PTM sites in dbPTM. FLAMS is available as a command-line tool and as a web service.

Table of contents

  1. Introduction
  2. System requirements
    1. General dependencies
    2. Third-party dependencies
  3. Installation
  4. Usage
    1. Example use case
  5. Output
  6. Supported PTMs
    1. Supported PTM databases
    2. Supported PTM types
    3. Local CPLM and dbPTM install
  7. Contact
  8. References
  9. License

Introduction

FLAMS is a bioinformatics tool to analyze the conservation of post-translational modifications, by means of a position-based search against the CPLM database v.4 (Zhang, W. et al. Nucleic Acids Research. 2021, 44 (5): 243–250.) and the part of the dbPTM database with experimental support (Li, Z. et al. Nucleic Acids Research. 2022, 50:D471–D479.). FLAMS can be used (i) to quickly verify whether modifications in a specific protein have been reported before, (ii) to assess whether findings in one species might translate to other species, and (iii) to systematically assess the novelty and conservation of reported modification sites.

The tool takes as input a protein (identifier or sequence) and the position of an amino acid. This repository contains the command-line tool FLAMS, which obtains an overview of the previously reported post-translational modifications matching your query, by using the following scripts:

  • input.py: processing the user-provided input
  • cplmv4.py, dbptm.py and setup.py: downloading and preparing the modification-specific databases
  • run_blast.py: searching your query against the databases of proteins with post-translational modifications
  • display.py: formatting the list of conserved post-translational modifications to a tab delimited output file
  • utils.py: dealing with OS-dependent directory systems

FLAMS is also available as a web service at https://www.biw.kuleuven.be/m2s/cmpg/research/CSB/tools/flams/ .

System requirements

Linux 64-bit, Windows and Mac OS supported.

General dependencies

  • Python3 (>=3.10, <3.12)
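
A quick way to confirm that an interpreter falls inside this range is sketched below. The helper name is_supported_python is purely illustrative and is not part of FLAMS.

```python
import sys


def is_supported_python(version=None):
    """Return True if the (major, minor) version is >=3.10 and <3.12."""
    if version is None:
        version = sys.version_info[:2]
    # Tuple comparison handles major and minor numbers together.
    return (3, 10) <= tuple(version[:2]) < (3, 12)


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```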

Third-party dependencies

  • BLAST+ (has to be installed locally and available in PATH when FLAMS is installed through pip)

Installation

The recommended installation for Mac OS and Linux is through conda:

conda install -c conda-forge -c bioconda flams

It is also possible to install FLAMS through pip (recommended installation for Windows):

pip install flams

Please note that the pip install requires BLAST+ to be installed locally and available in PATH. For more information on how to install BLAST+ on Windows, see the BLAST+ installation documentation.
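
Before running a pip-installed FLAMS, it can help to check that the BLAST+ executables are actually visible in PATH. The sketch below assumes that blastp and makeblastdb are the relevant programs. Both ship with BLAST+, but which ones FLAMS calls is an assumption here, as is the helper name find_blast_tools.

```python
import shutil


def find_blast_tools(tools=("blastp", "makeblastdb")):
    """Map each executable name to its full path, or None if it is not in PATH."""
    # Assumption: these are the BLAST+ programs of interest.
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_blast_tools().items():
        print(f"{tool}: {path or 'NOT FOUND in PATH'}")
```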

Usage

Run the tool:

FLAMS [-h] (--in inputFilePath | --id UniProtID | --batch batchFilePath) [-p position] [--range errorRange] [-o outputFilePath] [-d dataDir] [-t threadsBLAST] [-e evalueBLAST] [-m modification [modification ...]]

Required argument:

  • one of:
    • inputFilePath, used with --in, is the path to a .fasta file with the protein you wish to query against. The file must contain exactly one protein.
    • UniProtID, used with --id, is the UniProt ID of the protein you wish to query against.
    • batchFilePath, used with --batch, is the path to a tab-separated file for batch runs. The file should contain one entry per line, with UniProt IDs in the first column and positions in the second column.
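
As an illustration of the batch file layout described above, the following sketch writes a two-column, tab-separated file. The function name write_batch_file and the file name batch.tsv are hypothetical, and the sketch assumes the file carries no header line.

```python
import csv


def write_batch_file(path, queries):
    """Write (UniProt ID, position) pairs as tab-separated lines, one query per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for uniprot_id, position in queries:
            writer.writerow([uniprot_id, int(position)])


# The two proteins and positions used in the example use cases of this README.
queries = [("A0A916NWA0", 66), ("A0QQ22", 537)]
write_batch_file("batch.tsv", queries)
```

The resulting file could then be passed to FLAMS with --batch batch.tsv.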

Required argument when running FLAMS with --id/--in:

  • position is the position of a lysine in the protein, which you want to query against.

Optional arguments:

  • errorRange is the number of positions before and after the query position in which to also search for modifications. [default: 0]
  • outputFilePath is the path to where the result will be saved (in a .tsv file format). [default: out.tsv] If FLAMS is run with --batch, the specified -o/--output is used as a prefix, followed by '$UniProtID$position.tsv'. [default: '']
  • dataDir is the path to the directory where intermediate files (the UniProt sequence files) are stored. [default: $PWD/data]
  • threadsBLAST is a BLAST parameter, allows you to speed up the search by multithreading. [default: 1]
  • evalueBLAST is a BLAST parameter, allows you to filter out low quality BLAST hits. [default: 0.01]
  • modification is a space-separated list of modifications to search for at the given position. Possible values are any (combinations) of the CPLM and dbPTM modifications. We also provide aggregated combinations for each amino acid ($AA-All), and the CPLM combinations. For a full list of all supported PTMs, and how they are named, see the Supported PTM types section of the README. In general, PTMs are written all lowercase, and spaces within a PTM name are replaced by underscores. [default: K-All]
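
When FLAMS is called from a script, the arguments listed above can be assembled into an argument list first. The sketch below uses only the flags shown in the usage line. The helper build_flams_command and its keyword names are illustrative and not part of FLAMS.

```python
import shlex


def build_flams_command(uniprot_id, position, modifications=("K-All",),
                        error_range=0, output="out.tsv", threads=1, evalue=0.01):
    """Assemble a FLAMS argument list for a single --id query."""
    return [
        "FLAMS",
        "--id", uniprot_id,
        "-p", str(position),
        "--range", str(error_range),
        "-o", output,
        "-t", str(threads),
        "-e", str(evalue),
        # -m accepts several values, so it is placed last, as in the usage line.
        "-m", *modifications,
    ]


cmd = build_flams_command("A0QQ22", 537, modifications=["CPLM-Acylations"],
                          output="FadD2.tsv")
print(shlex.join(cmd))
# To actually run it (requires FLAMS and BLAST+ to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell quoting issues when the command is later handed to subprocess.run.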

Example use case

We provide two example use cases for FLAMS:

With the following command, you check whether the acetylation of TatA (UniProt ID: A0A916NWA0) on K66 in Dehalococcoides mccartyi strain CBDB1, as described by Greiner-Haas (2021), has been detected before.

FLAMS --in A0A916NWA0.fa -p 66 -m acetylation -o tatA.tsv

With the following command, you check whether K537 of Mycobacterium smegmatis FadD2 (UniProt ID: A0QQ22) is known to carry any modifications of the 'acylations' category, similar to what was reported by Xu (2020).

FLAMS --id A0QQ22 -p 537 -m CPLM-Acylations -o FadD2.tsv

You can find the example input and output data in the folder test_data.

For more example use cases, see the Supplementary information of the paper.

Output

The output file is a .tsv containing one row per modification that matched the query, i.e., a modification aligning (within the user-specified range) to the query position, in a protein similar to the query protein. In case of batch jobs (run with --batch), one output file per query (= a single line in the batch job file) will be generated.
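
The notion of "within the user-specified range" can be read as a simple interval around the query position. The sketch below shows that reading only. It is not taken from the FLAMS source, and both function names are hypothetical.

```python
def positions_in_range(position, error_range, protein_length):
    """List the 1-based positions in position +/- error_range, clipped to the protein."""
    start = max(1, position - error_range)
    end = min(protein_length, position + error_range)
    return list(range(start, end + 1))


def matches_query(aligned_position, query_position, error_range=0):
    """Return True if a modification aligned at aligned_position counts as a match."""
    return abs(aligned_position - query_position) <= error_range


print(positions_in_range(66, 2, 100))
print(matches_query(68, 66, error_range=2))
```

With the default range of 0, only a modification aligning exactly to the query position is reported.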

The output file contains 14 columns:

  • UniProt ID: UniProt identifier of the matched protein
  • Protein name: protein name of the matched protein
  • Modification: the type of modification found in the matched protein
  • $AA location: the location of this matched modification in the matched protein
  • $AA window: the local sequence containing the conserved modification (window of five amino acids before and after°)
  • Species: the textual description of the species of the matched protein
  • BLAST E Value: E value of BLASTp search of the matched protein against your query protein
  • BLAST identity: % identity of BLASTp search of the matched protein against your query protein
  • BLAST coverage: % coverage of BLASTp search of the matched protein against your query protein
  • CPLM ID: CPLM ID of matched protein modification (if found in CPLM, otherwise empty)
  • CPLM evidence code: CPLM evidence code of matched protein modification. Can be Exp(erimental), Dat(abase) or both. (if found in CPLM, otherwise empty)
  • CPLM evidence links: CPLM evidence link of matched protein modification. Can be a PubMed ID (for Exp.), a database identifier (for Dat.), or both. (if found in CPLM, otherwise empty)
  • dbPTM evidence code: dbPTM evidence code of matched protein modification (if found in dbPTM, otherwise empty)
  • dbPTM evidence links: dbPTM evidence link of matched protein modification. Refers to PubMed IDs. (if found in dbPTM, otherwise empty)

°: window can be smaller than the [-5;+5] window if the sequence alignment ends sooner, which can happen for modified sites near the start/end of the protein
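
Because the output is a plain tab-separated file, it can be post-processed with the standard library. The sketch below keeps only rows above a given BLAST identity. It assumes that the header strings in the file match the column names listed above and that the identity is stored as a plain number, and it uses a small synthetic table (not real FLAMS output) to exercise the function.

```python
import csv
import io


def filter_by_identity(tsv_handle, min_identity, column="BLAST identity"):
    """Yield rows whose identity column is at least min_identity (in %)."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    for row in reader:
        if float(row[column]) >= min_identity:
            yield row


# Synthetic two-row table, made up purely to demonstrate the function.
example = io.StringIO(
    "UniProt ID\tModification\tBLAST identity\n"
    "EXAMPLE1\tacetylation\t87.5\n"
    "EXAMPLE2\tacetylation\t42.0\n"
)
hits = list(filter_by_identity(example, min_identity=50))
print([row["UniProt ID"] for row in hits])
```

For a real run, the StringIO object would be replaced by an open file handle, for example open("tatA.tsv", newline="").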

Supported PTMs

Supported PTM databases

FLAMS updates its search databases regularly. To get an overview of the supported databases, see the table below.

FLAMS version | CPLM version | dbPTM version | Database available for download
v1.1          | v4           | 2023_November | yes
v1.0          | v4           | -             | yes

Please note that only part of dbPTM is integrated into FLAMS, namely the PTM sites with experimental evidence. Moreover, proteins that carry a UniProt ID that was obsolete or missing at the time of database creation are not included in the database.

Instructions on how to download the CPLM and dbPTM database yourself are in section 'Local CPLM and dbPTM install'. This is not recommended, as it takes multiple hours to generate some databases.

Supported PTM types

FLAMS allows searches for all PTM types included in CPLM, and for those with experimental evidence in dbPTM. An overview of the PTM types, how to call them in FLAMS, how they are called in CPLM and/or dbPTM, and on which amino acid they can be found is given in the table below. This table can also be found as a tab-separated file named FLAMS_supported_ptms_v11.txt .

FLAMS PTM name CPLM name dbPTM name A (Ala) C (Cys) D (Asp) E (Glu) F (Phe) G (Gly) H (His) I (Ile) K (Lys) L (Leu) M (Met) N (Asn) P (Pro) Q (Gln) R (Arg) S (Ser) T (Thr) V (Val) W (Trp) Y (Tyr) CPLM-Acylations CPLM-Ubs CPLM-Others CPLM-All
acetylation Acetylation Acetylation X X X X X X X X X X X X X X X
adp-ribosylation ADP-ribosylation X X X X X X X X X X
amidation Amidation X X X X X X X X X X X X X X X X X X X X
ampylation AMPylation X X X
benzoylation Benzoylation X X X
beta-hydroxybutyrylation β-Hydroxybutyrylation X X X
biotinylation Biotinylation Biotinylation X X X
blocked_amino_end Blocked amino end X X X X X X X X X X X X X X X X
butyrylation Butyrylation Butyrylation X X X
carbamidation Carbamidation X
carboxyethylation Carboxyethylation Carboxyethylation X X X
carboxylation Carboxylation Carboxylation X X X
carboxymethylation Carboxymethylation X X X
cholesterol_ester Cholesterol ester X
citrullination Citrullination X
crotonylation Crotonylation Crotonylation X X X
c-linked_glycosylation C-linked Glycosylation X
deamidation Deamidation X X
deamination Deamination X
decanoylation Decanoylation X X
decarboxylation Decarboxylation X X
dietylphosphorylation Dietylphosphorylation X X X
d-glucuronylation D-glucuronoylation X
farnesylation Farnesylation X
formation_of_an_isopeptide_bond Formation of an isopeptide bond X X
formylation Formylation Formylation X X X X X
gamma-carboxyglutamic_acid Gamma-carboxyglutamic acid X
geranylgeranylation Geranylgeranylation X
glutarylation Glutarylation Glutarylation X X X
glutathionylation Glutathionylation X
glycation Glycation X X X
gpi-anchor GPI-anchor X X X X X X X
hmgylation HMGylation X X X
hydroxyceramide_ester Hydroxyceramide ester X
hydroxylation Hydroxylation Hydroxylation X X X X X X X X X X X X X X X X X X
iodination Iodination X
lactoylation Lactoylation X
lactylation Lactylation Lactylation X X X
lipoylation Lipoylation Lipoylation X X X
malonylation Malonylation Malonylation X X X
methylation Methylation Methylation X X X X X X X X X X X X X X X X X X X X X
mgcylation MGcylation X X X
mgylation MGylation X X X
myristoylation Myristoylation X X X
neddylation Neddylation Neddylation X X X
nitration Nitration X
n-carbamoylation N-carbamoylation X
n-linked_glycosylation N-linked Glycosylation X X X X X X X X X
n-palmitoylation N-palmitoylation X X X
octanoylation Octanoylation X X
oxidation Oxidation X X X X X
o-linked_glycosylation O-linked Glycosylation X X X X X
o-palmitoleoylation O-palmitoleoylation X
o-palmitoylation O-palmitoylation X X
phosphatidylethanolamine_amidation Phosphatidylethanolamine amidation X
phosphoglycerylation Phosphoglycerylation X X X
phosphorylation Phosphorylation X X X X X X X X X X X X X X X X X X X
propionylation Propionylation Propionylation X X X
pupylation Pupylation X X X
pyrrolidone_carboxylic_acid Pyrrolidone carboxylic acid X X
pyrrolylation Pyrrolylation X
pyruvate Pyruvate X X
serotonylation Serotonylation X
stearoylation Stearoylation X
succinylation Succinylation Succinylation X X X X X
sulfation Sulfation X X X X
sulfhydration Sulfhydration X
sulfoxidation Sulfoxidation X
sumoylation Sumoylation Sumoylation X X X
s-archaeol S-archaeol X
s-carbamoylation S-carbamoylation X
s-cyanation S-Cyanation X
s-cysteinylation S-cysteinylation X
s-diacylglycerol S-diacylglycerol X
s-linked_glycosylation S-linked Glycosylation X
s-nitrosylation S-nitrosylation X
s-palmitoylation S-palmitoylation X
thiocarboxylation Thiocarboxylation X
ubiquitination Ubiquitination Ubiquitination X X X X X X
umpylation UMPylation X X X
2-hydroxyisobutyrylation 2-Hydroxyisobutyrylation X X X

Local CPLM and dbPTM install

It is possible to install the CPLM and dbPTM databases directly, instead of using the pre-generated databases that are hosted on Zenodo. This is however not recommended as the download takes several hours for larger databases, such as phosphorylation, ubiquitination and acetylation.

However, if desired, find your local FLAMS installation and follow these instructions to modify the scripts:

  • on a fresh install:

    • go to src/flams/databases/setup.py
    • comment out lines 495-501 (function _generate_blastdb_if_not_up_to_date)
    • uncomment line 505 (function _generate_blastdb_if_not_up_to_date)
  • on a FLAMS version with previously generated BLAST databases:

    • go to src/flams/databases/setup.py
    • change the version numbers of the databases you wish to update on lines 67-453. E.g.:

    "2-hydroxyisobutyrylation": ModificationType( "2-hydroxyisobutyrylation", 1.0, [ModificationDatabase(cplmv4, "2-Hydroxyisobutyrylation")], ["K"] ),

    becomes

    "2-hydroxyisobutyrylation": ModificationType( "2-hydroxyisobutyrylation", 2.0, [ModificationDatabase(cplmv4, "2-Hydroxyisobutyrylation")], ["K"] ),

    • comment out lines 495-501 (function _generate_blastdb_if_not_up_to_date)
    • uncomment line 505 (function _generate_blastdb_if_not_up_to_date)

Contact

Laboratory of Computational Systems Biology, KU Leuven.

References

If you use FLAMS in your work, please cite us.

In addition, FLAMS relies on third-party software & databases:

Altschul, S.F. et al. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.

Li, Z. et al. (2022) "dbPTM in 2022: an updated database for exploring regulatory networks and functional associations of protein post-translational modifications." Nucleic Acids Research. 50:D471–D479.

Zhang, W. et al. (2021) "CPLM 4.0: an updated database with rich annotations for protein lysine modifications." Nucleic Acids Research. 44(5):243–250.

License

FLAMS is freely available under an MIT license.

Use of the third-party software, libraries or code referred to in the References section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.
