Updated standalone version of the dbCAN annotation tool for automated CAZyme annotation

run_dbcan - Standalone Tool of dbCAN3

Announcement

Update 5/5/2026: The server issue caused by the recent cyberattack has been resolved by ITS and Revanth, and all services are now back to normal. Users can download the dbCAN databases from either the default server or AWS S3; both sources provide the same database files.

⚠️ Important Notice (5/5/2026: Fixed):
Due to a recent cyberattack, our primary dbCAN web server is currently offline, and you will not be able to access the online database. Our IT team is actively working to resolve the issue. We apologize for any inconvenience this may cause.

In the meantime, you can still obtain the dbCAN database using our AWS S3 backup. Recommended methods:

1. Use the run_dbcan database command (recommended):

run_dbcan database --db_dir db --aws_s3

This command will download and organize the database files automatically.

2. Download via wget (not for folders):

Please note that wget cannot download an entire folder from an S3 bucket; it can only fetch individual files. To get every file, either use the AWS CLI or download the files one by one with wget, specifying each file's URL directly, for example:

wget https://dbcan.s3.us-west-2.amazonaws.com/db_v5-2_9-13-2025/some_file

If you want to download the entire folder, please use the AWS CLI as follows:

aws s3 cp s3://dbcan/db_v5-2_9-13-2025/ ./db --recursive
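
When only a few files are needed, the per-file wget approach can also be scripted. The sketch below builds one HTTPS URL per file from the bucket prefix shown above and fetches each file with Python's standard library. The file names you pass in are placeholders (the actual file list is not given here), and `build_urls` and `download_all` are hypothetical helper names, not part of run_dbcan.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Bucket prefix taken from the wget example above.
BASE_URL = "https://dbcan.s3.us-west-2.amazonaws.com/db_v5-2_9-13-2025/"


def build_urls(file_names, base_url=BASE_URL):
    """Return one download URL per file name (names are caller-supplied)."""
    prefix = base_url.rstrip("/") + "/"
    return [prefix + quote(name) for name in file_names]


def download_all(file_names, dest_dir="db", base_url=BASE_URL):
    """Fetch each file individually, mirroring repeated wget calls."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for name, url in zip(file_names, build_urls(file_names, base_url)):
        urlretrieve(url, str(dest / name))  # one HTTP request per file


if __name__ == "__main__":
    # "some_file" is the placeholder from the text, not a real database file.
    for url in build_urls(["some_file"]):
        print(url)
```

Calling `download_all([...])` with real file names would place the files in `./db`, the same target directory used in the AWS CLI example.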

For more details on database downloads, please refer to our documentation.

If you have any questions or need help, feel free to open an issue.

Update

5/5/2026:

  1. HMM search Z parameter: Following feedback from the community (GitHub issue), we corrected how the pyHMMER Z parameter is set for dbCAN HMM searches. This improves statistical calibration and overall CAZyme annotation performance compared with the previous default.
  2. DeepTMHMM topology (optional): CAZyme_annotation can now run a user-installed DeepTMHMM predict.py via --run_deeptmhmm and --deeptmhmm_dir, and merge transmembrane topology into overview.tsv together with SignalP. SignalP 6.0 and DeepTMHMM are not bundled with run_dbcan—see the documentation (SignalP 6.0 and DeepTMHMM) for install and testing notes.
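
To see why the Z setting matters, recall that HMMER-style searches report an E-value equal to the per-comparison P-value multiplied by the effective search-space size, Z. The sketch below is plain arithmetic with made-up numbers. It does not reproduce the values run_dbcan uses, and `evalue` is a hypothetical helper, not a pyHMMER function.

```python
def evalue(pvalue: float, z: float) -> float:
    """E-value = P-value x Z, where Z is the effective search-space size."""
    return pvalue * z


if __name__ == "__main__":
    p = 1e-6        # illustrative P-value for a single hit
    cutoff = 1e-3   # illustrative E-value threshold, not a dbCAN default
    for z in (100, 10_000):
        e = evalue(p, z)
        # The same hit passes or fails depending only on Z.
        print(f"Z={z}: E-value={e:.1e}, passes={e < cutoff}")
```

With these numbers the same hit passes the cutoff at the smaller Z and fails at the larger one, so a miscalibrated Z shifts which hits are reported.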

10/20/2025:

  1. SignalP6.0 Topology Annotation: Added support for SignalP6.0 signal peptide prediction. Use the --run_signalp flag in the CAZyme_annotation command to enable topology annotation. Results are automatically added to the overview.tsv file.
  2. Global Logging System: Implemented a comprehensive logging system with --log-level, --log-file, and --verbose options for better debugging and monitoring.
  3. Database Download Command: Added a new database command for easy database downloading. It supports both HTTP and AWS S3 sources (use the --aws_s3 flag for faster downloads). Use --cgc/--no-cgc to control CGC-related database downloads.
  4. Code Structure Improvements: Continued refactoring with object-oriented programming, improved modularity, and centralized configuration management.
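
For scripted setups, the database command described above can be assembled programmatically. This is a minimal sketch that uses only the flags named in this README (--db_dir, --aws_s3, --cgc/--no-cgc); `database_command` and `download` are hypothetical helper names, and the sketch assumes `run_dbcan` is on your PATH.

```python
import shlex
import subprocess


def database_command(db_dir, aws_s3=False, cgc=None):
    """Build the argument list for `run_dbcan database`.

    cgc=None leaves the tool's default; True/False adds --cgc/--no-cgc.
    """
    cmd = ["run_dbcan", "database", "--db_dir", str(db_dir)]
    if aws_s3:
        cmd.append("--aws_s3")
    if cgc is not None:
        cmd.append("--cgc" if cgc else "--no-cgc")
    return cmd


def download(db_dir, aws_s3=False, cgc=None):
    """Run the download; raises CalledProcessError if run_dbcan fails."""
    subprocess.run(database_command(db_dir, aws_s3, cgc), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is downloaded here.
    print(shlex.join(database_command("db", aws_s3=True, cgc=False)))
```

Building the command as a list rather than a single string avoids shell-quoting problems when the database directory contains spaces.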

5/12/2025: The dev-dbcan branch is used to test new functions and fix issues. After testing, it will be merged into the main branch, and the Docker, conda, and PyPI packages will be updated. To try these beta functions, replace the code folder (dbcan) in your installed package with the one from the dev-dbcan branch.

3/16/2025:

  1. Rewrote the structure of run_dbcan 4.0 (suggested by Haidong) using object-oriented programming (OOP) to improve maintainability and readability.
  2. Added a new function, cgc_circle, which visualizes CGCs in a genome.

Future plans: add prediction of food consumption through CAZymes. If you have new suggestions, please contact Dr. Yanbin Yin (yyin@unl.edu), Xinpeng Zhang (xzhang55@huskers.unl.edu), or Dr. Haidong Yi (hyi@stjude.org).

Introduction

Notice

This is the updated version of run_dbcan 4.0. Many changes have been made; they are described at https://run-dbcan.readthedocs.io/en/latest/. From now on, this repo is the official run_dbcan site, and the run_dbcan 4.0 site will no longer be maintained.

run_dbcan is the standalone version of the dbCAN3 annotation tool for automated CAZyme annotation. It incorporates pyHMMER (replacing HMMER for better performance), Diamond, and dbCAN-sub to annotate CAZyme families, and it integrates CAZyme Gene Cluster (CGC) and substrate predictions.

Main Commands

The tool provides the following main commands:

  • database - Download dbCAN databases (supports HTTP and AWS S3)
  • CAZyme_annotation - Annotate CAZymes using Diamond, pyHMMER, and dbCAN-sub
  • gff_process - Generate GFF files for CGC identification
  • cgc_finder - Identify CAZyme Gene Clusters (CGCs)
  • substrate_prediction - Predict substrate specificities of CGCs
  • cgc_circle_plot - Generate circular plots for CGCs
  • easy_CGC - Complete CGC analysis pipeline (annotation + GFF processing + CGC identification)
  • easy_substrate - Complete CGC analysis with substrate prediction
  • Pfam_null_cgc - Annotate null genes in CGCs using Pfam

All commands support global logging options: --log-level, --log-file, and --verbose.
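
Because the logging options are shared by every command, they can be added to any command line in one place. The sketch below appends them to an existing argument list. `with_logging` is a hypothetical helper; placing the options after the subcommand's own arguments and using `DEBUG` as a level value are assumptions, so check `--help` for the accepted values and placement.

```python
def with_logging(cmd, log_level=None, log_file=None, verbose=False):
    """Return a copy of cmd with the global logging options appended."""
    extended = list(cmd)  # copy so the caller's list is not modified
    if log_level is not None:
        extended += ["--log-level", str(log_level)]
    if log_file is not None:
        extended += ["--log-file", str(log_file)]
    if verbose:
        extended.append("--verbose")
    return extended


if __name__ == "__main__":
    base = ["run_dbcan", "database", "--db_dir", "db", "--aws_s3"]
    # "DEBUG" and "run_dbcan.log" are illustrative values only.
    print(" ".join(with_logging(base, log_level="DEBUG", log_file="run_dbcan.log")))
```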

For usage discussions, visit our issue tracker. To learn more, read the dbCAN documentation. If you're interested in contributing, whether through issues or pull requests, please review our contribution guide.

Reference

Please cite the following dbCAN publications if you use run_dbcan in your research:

dbCAN3: automated carbohydrate-active enzyme and substrate annotation

Jinfang Zheng, Qiwei Ge, Yuchen Yan, Xinpeng Zhang, Le Huang, Yanbin Yin

Nucleic Acids Research, 2023, gkad328, doi: 10.1093/nar/gkad328.

dbCAN2: a meta server for automated carbohydrate-active enzyme annotation

Han Zhang, Tanner Yohe, Le Huang, Sarah Entwistle, Peizhi Wu, Zhenglu Yang, Peter K Busk, Ying Xu, Yanbin Yin

Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W95–W101, doi: 10.1093/nar/gky418.

dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation

Le Huang, Han Zhang, Peizhi Wu, Sarah Entwistle, Xueqiong Li, Tanner Yohe, Haidong Yi, Zhenglu Yang, Yanbin Yin

Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D516–D521, doi: 10.1093/nar/gkx894.

Download files

Source Distribution

dbcan-5.2.9.tar.gz (3.5 MB)

Uploaded Source

Built Distribution

dbcan-5.2.9-py3-none-any.whl (154.9 kB)

Uploaded Python 3

File details

Details for the file dbcan-5.2.9.tar.gz.

File metadata

  • Download URL: dbcan-5.2.9.tar.gz
  • Upload date:
  • Size: 3.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dbcan-5.2.9.tar.gz
Algorithm Hash digest
SHA256 65f50dd24d8ec779b63b918fb1199b69b40e4daf329dc420c5ac4263091e3c41
MD5 1c29bdbaf1faab1adac9a64e65413644
BLAKE2b-256 6a1ed2da0d35a59735bd4ebb8de75f459dbc203cac3c6794eb437df759282729
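
A downloaded archive can be checked against the SHA256 digest listed above before installing it. This is a minimal sketch using Python's `hashlib`; `sha256_of` and `matches` are hypothetical helper names, and the path to the downloaded file is supplied by you.

```python
import hashlib
import sys

# SHA256 digest listed above for dbcan-5.2.9.tar.gz.
EXPECTED_SHA256 = "65f50dd24d8ec779b63b918fb1199b69b40e4daf329dc420c5ac4263091e3c41"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Usage: python verify_hash.py path/to/dbcan-5.2.9.tar.gz
    if len(sys.argv) == 2:
        print("OK" if matches(sys.argv[1]) else "MISMATCH")
```

To check the wheel instead, pass its SHA256 digest as the `expected` argument.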

Provenance

The following attestation bundles were made for dbcan-5.2.9.tar.gz:

Publisher: pypi_release.yml on bcb-unl/run_dbcan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dbcan-5.2.9-py3-none-any.whl.

File metadata

  • Download URL: dbcan-5.2.9-py3-none-any.whl
  • Upload date:
  • Size: 154.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dbcan-5.2.9-py3-none-any.whl
Algorithm Hash digest
SHA256 daf39033e9921d116f46a374714f6095b71394eb6438035f1754354d7e20d8d2
MD5 b74c14af92df4c89147fe6235c0f781a
BLAKE2b-256 7ff77e33a08b626e496559db4fc1f51d0d5d8e28cf7fa881ab2afb0bb20329f4

Provenance

The following attestation bundles were made for dbcan-5.2.9-py3-none-any.whl:

Publisher: pypi_release.yml on bcb-unl/run_dbcan
