Predict biogeochemical cycles from protein fasta files.

bigecyhmm: Biogeochemical cycle HMMs search

bigecyhmm is a Python package that searches protein sequence fasta files for genes associated with biogeochemical cycles. Its HMMs come from the METABOLIC article, KEGG (KOfam), Pfam and TIGRFAMs.

Dependencies

bigecyhmm is designed to have as few dependencies as possible. It requires:

  • PyHMMER: to perform HMM search.
  • Pillow: to create biogeochemical cycle diagrams.

The HMMs are stored inside the package as a zip file (hmm_files.zip). This makes the Python package somewhat heavy (around 15 MB), but it means you do not have to download any other files and can use it directly.
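As an illustration of why bundling the profiles in a zip file avoids extra downloads, the following minimal sketch reads entries from a zip archive without extracting them to disk. It uses only the Python standard library. The archive built here is a stand-in with placeholder entry names and contents, and the helper names (list_hmm_members, read_member, build_demo_archive) are hypothetical: this is not the way bigecyhmm itself loads hmm_files.zip.

```python
import io
import zipfile


def list_hmm_members(zip_source):
    """Return the sorted names of the .hmm entries stored in a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".hmm"))


def read_member(zip_source, member_name):
    """Return the raw bytes of one archive entry without extracting it to disk."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.read(member_name)


def build_demo_archive():
    """Build a small in-memory zip with placeholder entries (assumption: not real HMMs)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("example_a.hmm", "placeholder profile A\n")
        archive.writestr("example_b.hmm", "placeholder profile B\n")
        archive.writestr("notes.txt", "not a profile\n")
    return buffer.getvalue()


demo_zip = build_demo_archive()
hmm_names = list_hmm_members(io.BytesIO(demo_zip))
first_profile = read_member(io.BytesIO(demo_zip), hmm_names[0])
print(hmm_names)
```

In a real installation the same two helpers could be pointed at the path of the packaged hmm_files.zip instead of the in-memory stand-in.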

Installation

Install it with pip from a clone of the repository:

git clone https://github.com/ArnaudBelcour/bigecyhmm.git

cd bigecyhmm

pip install -e .

Run bigecyhmm

You can run the tool in two ways:

  • by giving as input a protein fasta file:
bigecyhmm -i protein_sequence.faa -o output_dir
  • by giving as input a folder containing multiple fasta files:
bigecyhmm -i protein_sequences_folder -o output_dir

There is one option:

  • -c to indicate the number of cores to use. It is only useful when the input contains multiple protein fasta files, because each additional core runs the HMM search on a different protein fasta file.
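The effect of -c can be pictured with a small sketch: if each worker handles one fasta file at a time, the number of useful workers is capped by the number of input files. The code below is an assumption-laden illustration, not bigecyhmm's implementation. The helper names and the set of recognised file extensions are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: the extensions accepted by bigecyhmm are not stated in this document.
FASTA_SUFFIXES = {".faa", ".fasta", ".fa"}


def collect_fasta_files(input_path):
    """Return a one-item list for a single file, or the fasta files found in a folder."""
    path = Path(input_path)
    if path.is_file():
        return [path]
    return sorted(p for p in path.iterdir() if p.is_file() and p.suffix in FASTA_SUFFIXES)


def effective_workers(fasta_files, requested_cores):
    """One file per worker: cores beyond the number of files stay idle."""
    return max(1, min(requested_cores, len(fasta_files)))


with tempfile.TemporaryDirectory() as tmp:
    # Create three tiny placeholder fasta files for the demonstration.
    for name in ("a.faa", "b.faa", "c.faa"):
        (Path(tmp) / name).write_text(">seq\nMKT\n")
    folder_files = collect_fasta_files(tmp)
    single_file = collect_fasta_files(Path(tmp) / "a.faa")
    folder_names = [p.name for p in folder_files]
    folder_workers = effective_workers(folder_files, 8)
    single_workers = effective_workers(single_file, 8)

print(folder_workers, single_workers)
```

With eight requested cores, the folder of three files can keep at most three workers busy, and the single-file input only one.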

Output

The output folder contains:

  • a folder hmm_results: one tsv file per protein fasta file, listing its hits.
  • function_presence.tsv: a tsv file showing the presence/absence of the generic functions associated with the HMMs that matched.
  • a folder diagram_input: the input needed to draw the Carbon, Nitrogen, Sulfur and other cycle diagrams with the R script modified from the METABOLIC repository. Run it with the following command: Rscript draw_biogeochemical_cycles.R bigecyhmm_output_folder/diagram_input_folder/ diagram_output TRUE. This script requires the R package diagram, which can be installed with install.packages('diagram').
  • a folder diagram_figures: biogeochemical cycle diagrams drawn from the templates located in bigecyhmm/templates.
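A presence/absence table such as function_presence.tsv can be summarised with the standard csv module. The sketch below assumes a layout that this document does not specify: the first column holds the function name, every other column corresponds to one input fasta file, and a present function is marked with 1. The column names and rows are invented placeholders, so check them against a real output file before reusing the code.

```python
import csv
import io

# Placeholder table; the header names and rows are assumptions, not real output.
demo_tsv = (
    "function\tgenome_1\tgenome_2\n"
    "example function A\t1\t0\n"
    "example function B\t1\t1\n"
)


def functions_per_genome(tsv_handle):
    """Map each genome column to the list of functions marked as present ('1')."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    function_column = reader.fieldnames[0]
    presence = {genome: [] for genome in reader.fieldnames[1:]}
    for row in reader:
        for genome in presence:
            if row[genome] == "1":
                presence[genome].append(row[function_column])
    return presence


result = functions_per_genome(io.StringIO(demo_tsv))
print(result)
```

To use it on a real run, replace the in-memory handle with an open file, for example open(path, newline="").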

bigecyhmm_visualisation

A second command, bigecyhmm_visualisation, creates visualisations of the results.

Creating these figures requires additional dependencies:

  • seaborn
  • pandas
  • plotly
  • kaleido

Four inputs are expected:

  • --esmecata: the EsMeCaTa output folder associated with the run (the visualisation works on EsMeCaTa results).
  • --bigecyhmm: the bigecyhmm output folder associated with the run.
  • --abundance-file: an abundance file giving the abundance of each organism selected by EsMeCaTa.
  • -o: an output folder.
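The four inputs above could be combined into a single call as in the following sketch, which only assembles and prints the command line. It assumes the flags are passed together exactly as listed. The folder and file names are placeholders, and build_visualisation_command is a hypothetical helper, not part of bigecyhmm.

```python
import shlex


def build_visualisation_command(esmecata_dir, bigecyhmm_dir, abundance_file, output_dir):
    """Assemble the argument list from the four documented inputs."""
    return [
        "bigecyhmm_visualisation",
        "--esmecata", esmecata_dir,
        "--bigecyhmm", bigecyhmm_dir,
        "--abundance-file", abundance_file,
        "-o", output_dir,
    ]


# Placeholder paths, chosen only for the illustration.
command = build_visualisation_command(
    "esmecata_output", "bigecyhmm_output", "abundance.tsv", "visualisation_output"
)
command_line = shlex.join(command)
print(command_line)
```

The resulting list can be handed to subprocess.run(command, check=True) once bigecyhmm and the visualisation dependencies are installed.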

Citation

If you have used bigecyhmm in an article, please cite:

  • the bigecyhmm GitHub repository (https://github.com/ArnaudBelcour/bigecyhmm).

  • PyHMMER for the search on the HMMs:

Martin Larralde and Georg Zeller. PyHMMER: a python library binding to HMMER for efficient sequence analysis. Bioinformatics, 39(5):btad214, May 2023. https://doi.org/10.1093/bioinformatics/btad214

  • the HMMER website for the search on the HMMs:

HMMER. http://hmmer.org. Accessed: 2022-10-19.

  • the following articles for the creation of the custom HMMs:

Zhou, Z., Tran, P.Q., Breister, A.M. et al. METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks. Microbiome 10, 33 (2022). https://doi.org/10.1186/s40168-021-01213-8

Anantharaman, K., Brown, C., Hug, L. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat Commun 7, 13219 (2016). https://doi.org/10.1038/ncomms13219

  • the following article for KOfam HMMs:

Takuya Aramaki, Romain Blanc-Mathieu, Hisashi Endo, Koichi Ohkubo, Minoru Kanehisa, Susumu Goto, Hiroyuki Ogata, KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold, Bioinformatics, Volume 36, Issue 7, April 2020, Pages 2251–2252, https://doi.org/10.1093/bioinformatics/btz859

  • the following article for TIGRfam HMMs:

Jeremy D. Selengut, Daniel H. Haft, Tanja Davidsen, Anuradha Ganapathy, Michelle Gwinn-Giglio, William C. Nelson, Alexander R. Richter, Owen White, TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes, Nucleic Acids Research, Volume 35, Issue suppl_1, 1 January 2007, Pages D260–D264, https://doi.org/10.1093/nar/gkl1043

  • the following article for Pfam HMMs:

Robert D. Finn, Alex Bateman, Jody Clements, Penelope Coggill, Ruth Y. Eberhardt, Sean R. Eddy, Andreas Heger, Kirstie Hetherington, Liisa Holm, Jaina Mistry, Erik L. L. Sonnhammer, John Tate, Marco Punta, Pfam: the protein families database, Nucleic Acids Research, Volume 42, Issue D1, 1 January 2014, Pages D222–D230, https://doi.org/10.1093/nar/gkt1223
