
Updated standalone version of the dbCAN annotation tool for automated CAZyme annotation

Project description


run_dbcan - Standalone Tool of dbCAN3


Announcement

⚠️ Important Notice:
Due to a recent cyberattack, our primary dbCAN web server is currently offline, and you will not be able to access the online database. Our IT team is actively working to resolve the issue. We apologize for any inconvenience this may cause.

In the meantime, you can still obtain the dbCAN database using our AWS S3 backup. Recommended methods:

1. Use the run_dbcan database command (recommended):

run_dbcan database --db_dir db --aws_s3

This command will download and organize the database files automatically.

2. Download individual files via wget (folders are not supported):

Note that wget cannot download an entire folder from an S3 bucket; it can only fetch individual files. To download every file, either list the files and fetch them one by one, or use the AWS CLI. If you still want to use wget, specify each file's URL directly, for example:

wget https://dbcan.s3.us-west-2.amazonaws.com/db_v5-2_9-13-2025/some_file
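If you prefer wget, the per-file URLs can be generated instead of typed by hand. The sketch below is not part of run_dbcan, and its helper names are hypothetical. It assumes the bucket allows anonymous listing through the standard S3 ListObjectsV2 REST call (`?list-type=2&prefix=...`), which may not be enabled; if listing is denied, `urlopen` raises an HTTP 403 error and the AWS CLI route is the fallback.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 REST API list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

BUCKET_URL = "https://dbcan.s3.us-west-2.amazonaws.com"
PREFIX = "db_v5-2_9-13-2025/"


def parse_list_objects(xml_body: str | bytes) -> tuple[list[str], str | None]:
    """Return (object keys, continuation token or None) from one ListObjectsV2 page."""
    root = ET.fromstring(xml_body)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS)
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
    # A token is only meaningful while the listing is still truncated.
    return keys, (token if truncated == "true" else None)


def object_urls(keys: list[str]) -> list[str]:
    """Turn object keys into per-file HTTPS URLs, skipping folder placeholder keys."""
    return [f"{BUCKET_URL}/{urllib.parse.quote(key)}" for key in keys if not key.endswith("/")]


def list_all_keys(prefix: str = PREFIX) -> list[str]:
    """Page through the bucket listing (requires network access and public list permission)."""
    keys: list[str] = []
    token = None
    while True:
        params = {"list-type": "2", "prefix": prefix}
        if token:
            params["continuation-token"] = token
        url = f"{BUCKET_URL}/?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            page_keys, token = parse_list_objects(resp.read())
        keys.extend(page_keys)
        if token is None:
            return keys
```

For example, printing each entry of `object_urls(list_all_keys())` on its own line produces a file that `wget -i` can consume.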

If you want to download the entire folder, please use the AWS CLI as follows:

aws s3 cp s3://dbcan/db_v5-2_9-13-2025/ ./db --recursive

For more details on database downloads, please refer to our documentation.

If you have any questions or need help, feel free to open an issue.

Update

10/20/2025:

  1. SignalP 6.0 Topology Annotation: Added support for SignalP 6.0 signal peptide prediction. Use the --run_signalp flag with the CAZyme_annotation command to enable topology annotation. Results are automatically added to the overview.tsv file.
  2. Global Logging System: Implemented a comprehensive logging system with --log-level, --log-file, and --verbose options for better debugging and monitoring.
  3. Database Download Command: Added a new database command for downloading the databases. It supports both HTTP and AWS S3 sources (use the --aws_s3 flag for faster downloads). Use --cgc/--no-cgc to control whether CGC-related databases are downloaded.
  4. Code Structure Improvements: Continued refactoring with object-oriented programming, improved modularity, and centralized configuration management.
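For scripted setups, the database command can be wrapped from Python. This is a minimal sketch with hypothetical helper names; it uses only the flags mentioned on this page (--db_dir, --aws_s3, --cgc/--no-cgc) and assumes run_dbcan is installed and on PATH. Consult the run_dbcan documentation for the authoritative option list.

```python
import shutil
import subprocess


def build_database_command(db_dir: str = "db", aws_s3: bool = True, cgc: bool = True) -> list[str]:
    """Assemble a `run_dbcan database` invocation from the flags described above."""
    cmd = ["run_dbcan", "database", "--db_dir", db_dir]
    if aws_s3:
        cmd.append("--aws_s3")  # download from the AWS S3 backup instead of HTTP
    cmd.append("--cgc" if cgc else "--no-cgc")  # include or skip CGC-related databases
    return cmd


def download_database(db_dir: str = "db", aws_s3: bool = True, cgc: bool = True) -> None:
    """Run the download; raises if run_dbcan is missing or exits with a non-zero status."""
    if shutil.which("run_dbcan") is None:
        raise FileNotFoundError("run_dbcan is not on PATH; install the dbcan package first")
    subprocess.run(build_database_command(db_dir, aws_s3, cgc), check=True)
```

Keeping command construction separate from execution makes the exact invocation easy to log or inspect before anything is downloaded.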

5/12/2025: The dev-dbcan branch is used to test new functions and fix issues. After testing, this branch will be merged into the main branch, and the Docker, Conda, and PyPI packages will be updated. To use these beta functions, replace the dbcan code folder in your current installation with the one from the dev-dbcan branch.

3/16/2025:

  1. Rewrote the structure of run_dbcan 4.0 (suggested by Haidong) using object-oriented programming (OOP) to improve maintainability and readability.
  2. Added a new function, cgc_circle, which visualizes CGCs across the genome.

Future plans: Add prediction of food consumption through CAZymes. If you have suggestions, please contact Dr. Yanbin Yin (yyin@unl.edu), Xinpeng Zhang (xzhang55@huskers.unl.edu), or Dr. Haidong Yi (hyi@stjude.org).

Introduction

Notice

This is the updated version of run_dbcan 4.0. Many changes have been made, as described in https://run-dbcan.readthedocs.io/en/latest/. From now on, this repository is the official run_dbcan site, and the run_dbcan 4.0 site will no longer be maintained.

run_dbcan is the standalone version of the dbCAN3 annotation tool for automated CAZyme annotation. It incorporates pyHMMER (replacing HMMER for better performance), Diamond, and dbCAN_sub to annotate CAZyme families, and it also identifies CAZyme Gene Clusters (CGCs) and predicts their substrates.

Main Commands

The tool provides the following main commands:

  • database - Download dbCAN databases (supports HTTP and AWS S3)
  • CAZyme_annotation - Annotate CAZymes using Diamond, pyHMMER, and dbCAN-sub
  • gff_process - Generate GFF files for CGC identification
  • cgc_finder - Identify CAZyme Gene Clusters (CGCs)
  • substrate_prediction - Predict substrate specificities of CGCs
  • cgc_circle_plot - Generate circular plots for CGCs
  • easy_CGC - Complete CGC analysis pipeline (annotation + GFF processing + CGC identification)
  • easy_substrate - Complete CGC analysis with substrate prediction
  • Pfam_null_cgc - Annotate null genes in CGCs using Pfam

All commands support global logging options: --log-level, --log-file, and --verbose.

For usage discussions, visit our issue tracker. To learn more, read the dbCAN documentation. If you're interested in contributing, whether through issues or pull requests, please review our contribution guide.

Reference

Please cite the following dbCAN publications if you use run_dbcan in your research:

dbCAN3: automated carbohydrate-active enzyme and substrate annotation

Jinfang Zheng, Qiwei Ge, Yuchen Yan, Xinpeng Zhang, Le Huang, Yanbin Yin

Nucleic Acids Research, 2023, gkad328, doi: 10.1093/nar/gkad328.

dbCAN2: a meta server for automated carbohydrate-active enzyme annotation

Han Zhang, Tanner Yohe, Le Huang, Sarah Entwistle, Peizhi Wu, Zhenglu Yang, Peter K Busk, Ying Xu, Yanbin Yin

Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W95–W101, doi: 10.1093/nar/gky418.

dbCAN-seq: a database of carbohydrate-active enzyme (CAZyme) sequence and annotation

Le Huang, Han Zhang, Peizhi Wu, Sarah Entwistle, Xueqiong Li, Tanner Yohe, Haidong Yi, Zhenglu Yang, Yanbin Yin

Nucleic Acids Research, Volume 46, Issue D1, 4 January 2018, Pages D516–D521, doi: 10.1093/nar/gkx894.


Download files


Source Distribution

dbcan-5.2.7.tar.gz (3.5 MB)

Built Distribution


dbcan-5.2.7-py3-none-any.whl (148.1 kB)

File details

Details for the file dbcan-5.2.7.tar.gz.

File metadata

  • Download URL: dbcan-5.2.7.tar.gz
  • Upload date:
  • Size: 3.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbcan-5.2.7.tar.gz
Algorithm Hash digest
SHA256 14e36bb82a088d9c8dfa0ab6c37f58247e30b8666480f6b5999ca8d9567dc9c0
MD5 01fda5c709a339b5d523ca3d4cb069da
BLAKE2b-256 31c876fa92950f384ce43118990be71ce304af7813c7b85fe2b10db362e6c328
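To check a downloaded file against the digests listed above, hash it locally and compare. The sketch below uses Python's hashlib; "BLAKE2b-256" is taken to mean BLAKE2b with a 32-byte digest. The function names are illustrative and not part of dbcan.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Digests published above for the source distribution dbcan-5.2.7.tar.gz.
EXPECTED_SDIST = {
    "sha256": "14e36bb82a088d9c8dfa0ab6c37f58247e30b8666480f6b5999ca8d9567dc9c0",
    "md5": "01fda5c709a339b5d523ca3d4cb069da",
    "blake2b_256": "31c876fa92950f384ce43118990be71ce304af7813c7b85fe2b10db362e6c328",
}


def file_digests(path: str | Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 of a file in a single streaming pass."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path: str | Path, expected: dict[str, str]) -> bool:
    """True only if every expected digest matches (comparison is case-insensitive)."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in expected.items())
```

For an intact download, `verify("dbcan-5.2.7.tar.gz", EXPECTED_SDIST)` should return True; the wheel can be checked the same way using the digests listed for dbcan-5.2.7-py3-none-any.whl.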


Provenance

The following attestation bundles were made for dbcan-5.2.7.tar.gz:

Publisher: pypi_release.yml on bcb-unl/run_dbcan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dbcan-5.2.7-py3-none-any.whl.

File metadata

  • Download URL: dbcan-5.2.7-py3-none-any.whl
  • Upload date:
  • Size: 148.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbcan-5.2.7-py3-none-any.whl
Algorithm Hash digest
SHA256 372a80d0671310ae6cc125224ddb19b106eee8b6d4cd597d20bd89553bb9b4a8
MD5 8ac8eca59d2cf8a1b6b3feb207c83489
BLAKE2b-256 3652e2d6de5c5e7e089d88f1b30709b98571196d7883dfee907a5d1887415c32


Provenance

The following attestation bundles were made for dbcan-5.2.7-py3-none-any.whl:

Publisher: pypi_release.yml on bcb-unl/run_dbcan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
