Skip to main content

Find Lysine Acylations & other Modification Sites

Project description

FLAMS: Find Lysine Acylations & other Modification Sites

A bioinformatics tool to analyze the conservation of post-translational modifications (PTMs), by means of a position-based search against the Compendium of Protein Lysine Modifications (CPLM database) v.4 and the experimental PTM sites in dbPTM. FLAMS is available as command-line tool and as a web service.

Table of contents

  1. Introduction
  2. System requirements
    1. General dependencies
    2. Third-party dependencies
  3. Installation
  4. Usage
    1. Example use case
  5. Output
  6. Supported PTMs
    1. Supported PTM databases
    2. Supported PTM types
    3. Local CPLM and dbPTM install
  7. Contact
  8. References
  9. License

Introduction

FLAMS is a bioinformatics tool to analyze the conservation of post-translational modifications, by means of a position-based search against the CPLM database v.4 (Zhang, W. et al. Nucleic Acids Research. 2021, 44 (5): 243–250.) and the part of the dbPTM database with experimental support (Li, Z. et al. Nucleic Acids Research. 2022, 50:D471–D479.). FLAMS can be used (i) to quickly verify whether modifications in a specific protein have been reported before, (ii) to assess whether findings in one species might translate to other species, and (iii) to systematically assess the novelty and conservation of reported modification sites.

The tool takes as input a protein (identifier or sequence) and the position of an amino acid. This repository contains the command-line tool FLAMS, which obtains an overview of the previously reported post-translational modifications matching your query, by using the following scripts:

  • input.py: processing the user-provided input
  • cplmv4.py, dbptm.py and setup.py: downloading and preparing the modification-specific databases
  • run_blast.py: searching your query against the databases of proteins with post-translational modifications
  • display.py: formatting the list of conserved post-translational modifications to a tab delimited output file
  • utils.py: dealing with OS-dependent directory systems

FLAMS is also available as a web service at https://www.biw.kuleuven.be/m2s/cmpg/research/CSB/tools/flams/ .

System requirements

Linux 64-bit, Windows and Mac OS supported.

General dependencies

  • Python3 (>=3.10, <3.12)

Third-party dependencies

Installation

The recommended installation for Mac OS and Linux is through conda:

conda install -c conda-forge -c bioconda flams

It is also possible to install FLAMS through pip (recommended installation for Windows):

pip install flams

Please note that the pip install requires users to have BLAST+ installed locally and available in PATH. For more information on how to install BLAST+ on Windows, click here .

Usage

Run the tool:

FLAMS [-h] (--in inputFilePath | --id UniProtID | --batch batchFilePath) [-p position] [--range errorRange] [-o outputFilePath] [-d dataDir] [-t threadsBLAST] [-e evalueBLAST] [-m modification [modification ...]]

Required argument:

  • one of:
    • inputFilePath, used with --in, is the path to a .fasta file with the protein you wish to query against. (has to contain only 1 protein)
    • UniProtID, used with --id, is the UniProt ID of the protein you wish to query against.
    • batchFilePath, used with --batch, is the path to a tab seperated file for batch runs. The file should contain 1 entry per line, with UniProt IDs in the 1st column, and positions in the 2nd column.

Required argument when running FLAMS with --id/--in:

  • position is the position of a lysine in the protein, which you want to query against.

Optional arguments:

  • errorRange is an number of positions before and after position to also search for modifications. [default: 0]
  • outputFilePath is the path to where the result will be saved (in a .tsv file format). [default: out.tsv] If FLAMS is run with --batch, the specified -o/--output is used as preposition, followed by '_$UniProtID_$position.tsv'. [default: '']
  • dataDir is the path to the directory where intermediate files (the UniProt sequence files) are stored. [default: $PWD/data]
  • threadsBLAST is a BLAST parameter, allows you to speed up the search by multithreading. [default: 1]
  • evalueBLAST is a BLAST parameter, allows you to filter out low quality BLAST hits. [default: 0.01]
  • modification is a space-separated list of modifications (all lower case) to search for at the given position. Possible values are any (combinations) of the CPLM and dbPTM modifications. We also provide aggregated combinations for each amino acid ($AA-All), and the CPLM combinations. For a full list of all supported PTMs, and how they are named, see the Supported PTM types section of the README. In general, PTMs are written all lowercase, and spaces within a PTM name are replaced by underscores. [default: K-All]

Example use case

We provide two example use cases for FLAMS:

With the following command, you search whether the TatA (UniProt ID: A0A916NWA0) acetylation on K66 in Dehalococcoide mccartyi strain CBDB1, as described by Greiner-Haas (2021), had been previously detected.

FLAMS --in A0A916NWA0.fa -p 66 -m acetylation -o tatA.tsv

With the following command, you search whether the Mycobabcterium smegmatis' FadD2 (UniProt ID: A0QQ22) K537 is known to carry any modifications of the 'acylations' category, similar to what was reported by Xu (2020).

FLAMS --id A0QQ22 -p 537 -m CPLM-Acylations -o FadD2.tsv

You can find the example input and output data in the folder test_data. The output data is organized in folders reflecting the FLAMS version used to generate it, as the output can vary depending on the exact FLAMS version (due to FLAMS database updates).

For more example use cases, see the Supplementary information of the paper.

Output

The output file is a .tsv containing one row per modification that matched the query, i.e., a modification aligning (within the user-specified range) to the query position, in a protein similar to the query protein. In case of batch jobs (ran with --batch), one output file per query (= a single line in the batch job file) will be generated.

The output file contains 14 columns:

  • UniProt ID: UniProt identifier of the matched protein
  • Protein name: protein name of the matched protein
  • Modification: the type of modification found in the matched protein
  • $AA location: the location of this matched modification in the matched protein
  • $AA window: the local sequence containing the conserved modification (window of five amino acids before and after°)
  • Species: the textual description of the species of the matched protein
  • BLAST E Value: E value of BLASTp search of the matched protein against your query protein
  • BLAST identity: % identity of BLASTp search of the matched protein against your query protein
  • BLAST coverage: % coverage of BLASTp search of the matched protein against your query protein
  • CPLM ID: CPLM ID of matched protein modification (if found in CPLM, otherwise empty)
  • CPLM evidence code: CPLM evidence code of matched protein modification. Can be Exp(erimental), Dat(abase) or both. (if found in CPLM, otherwise empty)
  • CPLM evidence links: CPLM evidence link of matched protein modification. Can be PubMed ID (for Exp.), or a database identified (for Dat) or both. (if found in CPLM, otherwise empty)
  • dbPTM evidence code: dbPTM evidence code of matched protein modification (if found in dbPTM, otherwise empty)
  • dbPTM evidence links: dbPTM evidence link of matched protein modification. Refers to PubMed IDs. (if found in dbPTM, otherwise empty)

°: window can be smaller than the [-5;+5] window if the sequence alignment ends sooner, which can happen for modified sites near the start/end of the protein

Supported PTMs

Supported PTM databases

FLAMS updates its search databases regularly. To get an overview of the supported databases, see the table below.

FLAMS version CPLM version dbPTM version database available for download UniProt release
v1.1.4 v4 2024_April yes 2024_02
v1.1.0-3 v4 2023_November yes 2023_05
v1.0 v4 yes NA

Please note that only part of dbPTM is integrated into FLAMS, namely the PTM sites with experimental evidence, as found here. As dbPTM does not store complete protein sequences, these are fetched during database creation based on UniProt identifiers reported in dbPTM and the UniProt release available at the time of database creation. As a consequence, FLAMS database updates can change the content of the PTM databases, beyond the simple addition of new dbPTM and/or CPLM entries, reflecting changes in UniProt. The most common UniProt changes affecting FLAMS databases are removed UniProt entries (leading to the removal of PTM entries on the affected protein in our database) and sequence updates. We are aware of this issue, impacting the completeness and interpretation of FLAMS' results, and will consider solutions in future FLAMS releases.

Instructions on how to download the CPLM and dbPTM database yourself are in section 'Local CPLM and dbPTM install'. This is not recommended, as it takes multiple hours to generate some databases.

Supported PTM types

FLAMS allows searches for all PTM types included in CPLM, and for those with experimental evidence in dbPTM. An overview of the PTM types, how to call them in FLAMS, how they are called in CPLM and/or dbPTM, and on which amino acid they can be found is given in the table below. This table can also be found as a tab seperated file named FLAMS_supported_ptms_v11.txt .

FLAMS PTM name CPLM name dbPTM name A (Ala) C (Cys) D (Asp) E (Glu) F (Phe) G (Gly) H (His) I (Ile) K (Lys) L (Leu) M (Met) N (Asn) P (Pro) Q (Gln) R (Arg) S (Ser) T (Thr) V (Val) W (Trp) Y (Tyr) CPLM-Acylations CPLM-Ubs CPLM-Others CPLM-All
acetylation Acetylation Acetylation X X X X X X X X X X X X X X X
adp-ribosylation ADP-ribosylation X X X X X X X X X X
amidation Amidation X X X X X X X X X X X X X X X X X X X X
ampylation AMPylation X X X
benzoylation Benzoylation X X X
beta-hydroxybutyrylation β-Hydroxybutyrylation X X X
biotinylation Biotinylation Biotinylation X X X
blocked_amino_end Blocked amino end X X X X X X X X X X X X X X X X
butyrylation Butyrylation Butyrylation X X X
carbamidation Carbamidation X
carboxyethylation Carboxyethylation Carboxyethylation X X X
carboxylation Carboxylation Carboxylation X X X
carboxymethylation Carboxymethylation X X X
cholesterol_ester Cholesterol ester X
citrullination Citrullination X
crotonylation Crotonylation Crotonylation X X X
c-linked_glycosylation C-linked Glycosylation X
deamidation Deamidation X X
deamination Deamination X
decanoylation Decanoylation X X
decarboxylation Decarboxylation X X
dephosphorylation Dephosphorylation X X X
dietylphosphorylation Dietylphosphorylation X X X
disulfide_bond Disulfide bond X
d-glucuronylation D-glucuronoylation X
farnesylation Farnesylation X
formation_of_an_isopeptide_bond Formation of an isopeptide bond X X
formylation Formylation Formylation X X X X X
gamma-carboxyglutamic_acid Gamma-carboxyglutamic acid X
geranylgeranylation Geranylgeranylation X
glutarylation Glutarylation Glutarylation X X X
glutathionylation Glutathionylation X
glycation Glycation X X X
gpi-anchor GPI-anchor X X X X X X X
hmgylation HMGylation X X X
hydroxyceramide_ester Hydroxyceramide ester X
hydroxylation Hydroxylation Hydroxylation X X X X X X X X X X X X X X X X X X
iodination Iodination X
lactoylation Lactoylation X
lactylation Lactylation Lactylation X X X
lipoylation Lipoylation Lipoylation X X X
malonylation Malonylation Malonylation X X X
methylation Methylation Methylation X X X X X X X X X X X X X X X X X X X X X
mgcylation MGcylation X X X
mgylation MGylation X X X
myristoylation Myristoylation X X X
neddylation Neddylation Neddylation X X X
nitration Nitration X
n-carbamoylation N-carbamoylation X
n-linked_glycosylation N-linked Glycosylation X X X X X X X X X
n-palmitoylation N-palmitoylation X X X
octanoylation Octanoylation X X
oxidation Oxidation X X X X X
o-linked_glycosylation O-linked Glycosylation X X X X X
o-palmitoleoylation O-palmitoleoylation X
o-palmitoylation O-palmitoylation X X
phosphatidylethanolamine_amidation Phosphatidylethanolamine amidation X
phosphoglycerylation Phosphoglycerylation X X X
phosphorylation Phosphorylation X X X X X X X X X X X X X X X X X X X
propionylation Propionylation Propionylation X X X
pupylation Pupylation X X X
pyrrolidone_carboxylic_acid Pyrrolidone carboxylic acid X X
pyrrolylation Pyrrolylation X
pyruvate Pyruvate X X
serotonylation Serotonylation X
stearoylation Stearoylation X
succinylation Succinylation Succinylation X X X X X
sulfation Sulfation X X X X
sulfhydration Sulfhydration X
sulfoxidation Sulfoxidation X
sumoylation Sumoylation Sumoylation X X X
s-archaeol S-archaeol X
s-carbamoylation S-carbamoylation X
s-cyanation S-Cyanation X
s-cysteinylation S-cysteinylation X
s-diacylglycerol S-diacylglycerol X
s-linked_glycosylation S-linked Glycosylation X
s-nitrosylation S-nitrosylation X
s-palmitoylation S-palmitoylation X
thiocarboxylation Thiocarboxylation X
ubiquitination Ubiquitination Ubiquitination X X X X X X
umpylation UMPylation X X X
2-hydroxyisobutyrylation 2-Hydroxyisobutyrylation X X X

Local CPLM and dbPTM install

It is possible to install the CPLM and dbPTM databases directly, instead of using the pre-generated databases that are hosted on Zenodo. This is however not recommended as the download takes several hours for larger databases, such as phosphorylation, ubiquitination and acetylation.

However, if desired, follow these instructions to modify the scripts:

  1. Make sure you are working in an environment with the correct dependencies:
  • on Linux/MacOS: create a FLAMS conda environment, get all dependencies by installing FLAMS as specified for Linux/MacOS
  • on Windows: create a FLAMS conda environment, get all dependencies by installing FLAMS as specified for Windows. Make sure BLAST+ is correctly installed.
  1. Download the latest FLAMS version from GitHub.

  2. Adapt the scripts:

  • on a fresh install (= never ran FLAMS before, so no FLAMS databases yet):

    • go to src/flams/databases/setup.py
    • comment out lines 503-508 (function _generate_blastdb_if_not_up_to_date - the try/except _get_fasta_from_zenodo)
    • uncomment line 512 (function _generate_blastdb_if_not_up_to_date - the _get_fasta_for_blast)
  • on a FLAMS version with previously generated BLAST databases:

    • go to src/flams/databases/setup.py
    • change the version numbers of the databases you wish to update on lines 75-458. E.g.:

    "2-hydroxyisobutyrylation": ModificationType( "2-hydroxyisobutyrylation", 1.0, [ModificationDatabase(cplmv4, "2-Hydroxyisobutyrylation")], ["K"] ),

    becomes

    "2-hydroxyisobutyrylation": ModificationType( "2-hydroxyisobutyrylation", 2.0, [ModificationDatabase(cplmv4, "2-Hydroxyisobutyrylation")], ["K"] ),

    • comment out lines 503-508 (function _generate_blastdb_if_not_up_to_date - the try/except _get_fasta_from_zenodo)
    • uncomment line 512 (function _generate_blastdb_if_not_up_to_date - the _get_fasta_for_blast)
  1. Install your adapted FLAMS version locally:

    python -m pip install ./PathToLocalFLAMS

Contact

Laboratory of Computational Systems Biology, KU Leuven.

References

If you use FLAMS in your work, please cite:

Longin, H. et al (2024) "FLAMS: Find Lysine Acylations and other Modification Sites." Bioinformatics. 40(1):btae005.

In addition, FLAMS relies on third-party software & databases:

Altschul, S.F. et al (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.

Li, Z. et al (2022) "dbPTM in 2022: an updated database for exploring regulatory networks and functional associations of protein post-translational modifications." Nucleic Acids Research. 50:D471–D479.

Zhang, W. et al (2021) "CPLM 4.0: an updated database with rich annotations for protein lysine modifications." Nucleic Acids Research. 44(5):243–250.

License

FLAMS is freely available under an MIT license.

Use of the third-party software, libraries or code referred to in the References section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flams-1.1.4.tar.gz (34.5 kB view hashes)

Uploaded Source

Built Distribution

flams-1.1.4-py3-none-any.whl (32.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page