
Python package for probe-based gene cluster finding in large microbial genome databases


pyGCAP: a (py)thon (G)ene (C)luster (A)nnotation & (P)rofiling

A Python Package for Probe-based Gene Cluster Finding in Large Microbial Genome Databases


Introduction

Bacterial gene clusters provide insights into metabolism and evolution and facilitate biotechnological applications. We developed pyGCAP, a Python package for probe-based gene cluster discovery. The pipeline combines sequence search and analysis tools with public databases (e.g. BLAST, MMseqs2, UniProt, and NCBI) to predict potential gene clusters from user-provided probe genes. We tested the pipeline on the division and cell wall (dcw) gene cluster, which is crucial for cell division and peptidoglycan biosynthesis.

To evaluate pyGCAP, we used the 17 major dcw genes defined by Megrian et al. [1] as a probe set to search for gene clusters in 716 Lactobacillales genomes. The results were integrated to provide detailed information on gene content, gene order, and cluster types. Besides examining the completeness of the gene clusters, pyGCAP could also suggest novel taxon-specific accessory genes related to dcw clusters in Lactobacillales genomes. The package will be freely available on the Python Package Index, Bioconda, and GitHub.

[1] Megrian, D., et al. Ancient origin and constrained evolution of the division and cell wall gene cluster in Bacteria. Nat Microbiol 7, 2114–2127 (2022).


Pipeline-flow

[Figure: pipeline flowchart]


Prerequisites

  1. Python

  2. conda environment


Usage

  • Install pygcap from PyPI, activate the conda environment, and run it

    pip install pygcap
    conda activate ncbi_datasets
    pygcap [WORKING_DIRECTORY] [TAXON] [PROBE_FILE]
    
  • Input arguments

    ### usage example
    pygcap . Facklamia pygcap/data/probe_sample.tsv
    pygcap . 66831 pygcap/data/probe_sample.tsv
    
    1. working directory

    2. taxon (either a taxon name or a taxid is accepted)

    3. path to the probe TSV file (see the sample file), which describes each probe with:

      • Probe Name (user defined)
      • Prediction (user defined)
      • Accession (UniProt entry)
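
To make the probe file format concrete, here is a minimal Python sketch that parses and sanity-checks such a file before passing it to pygcap. It is not part of pygcap. It assumes the file is tab-separated, has exactly three columns in the order listed above, and has no header row; check the bundled sample file to confirm. The function name `read_probes` and the two example rows are illustrative only. The accession check uses the accession pattern published by UniProt.

```python
import csv
import io
import re

# Accession pattern as published by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def read_probes(handle):
    """Parse tab-separated probe rows into dicts.

    Assumption: three columns (name, prediction, accession), no header row.
    """
    probes = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # ignore blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        name, prediction, accession = (cell.strip() for cell in row)
        if not UNIPROT_ACCESSION.fullmatch(accession):
            raise ValueError(
                f"line {line_no}: {accession!r} does not look like a UniProt accession"
            )
        probes.append(
            {"name": name, "prediction": prediction, "accession": accession}
        )
    return probes


# Illustrative rows only; these are not taken from probe_sample.tsv.
example = "probe_01\tftsZ\tP0A9A6\nprobe_02\tmurG\tP17443\n"
probes = read_probes(io.StringIO(example))
print(len(probes), "probes parsed")
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_probes`.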

Options

  • --skip: Specify steps to skip during the process. Multiple steps can be skipped by using this option multiple times.

    • all: to skip all processes below
    • ncbi: to skip downloading genome data from NCBI
    • mmseqs2: to skip running MMseqs2
    • parsing: to skip parsing genome data
    • uniprot: to skip downloading probe data from UniProt
    • blastdb: to skip running makeblastdb
    pygcap [WORKING_DIRECTORY] [TAXON] [PROBE_FILE] --skip [ARG]    # short form: -s [ARG]
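
When pygcap is driven from a script, the command line above can be assembled programmatically. The following is a minimal sketch, not part of pygcap: `build_command` and `run_pygcap` are hypothetical helper names, and the sketch relies only on the positional arguments and the `--skip` values documented in this README. It repeats `--skip` once per step, as described above.

```python
import subprocess

# Step names accepted by --skip, as listed in this README.
SKIP_STEPS = ("all", "ncbi", "mmseqs2", "parsing", "uniprot", "blastdb")


def build_command(workdir, taxon, probe_file, skip=()):
    """Return the pygcap argument list, repeating --skip once per step."""
    unknown = [step for step in skip if step not in SKIP_STEPS]
    if unknown:
        raise ValueError(f"unknown skip step(s): {', '.join(unknown)}")
    cmd = ["pygcap", str(workdir), str(taxon), str(probe_file)]
    for step in skip:
        cmd += ["--skip", step]
    return cmd


def run_pygcap(workdir, taxon, probe_file, skip=()):
    """Run pygcap; requires pygcap to be installed and on PATH."""
    return subprocess.run(
        build_command(workdir, taxon, probe_file, skip), check=True
    )


# Re-run the Facklamia example (taxid 66831) without re-downloading data.
command = build_command(
    ".", 66831, "pygcap/data/probe_sample.tsv", skip=["ncbi", "uniprot"]
)
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with taxon names that contain spaces.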
    

(WIP) Output

  • A directory named after the TAXON provided as input will be created in your working directory, with the following structure:
    📦 [TAXON_NAME]
    ├─ data
    │  ├─ assembly_report.tsv
    │  ├─ metadata_target.tsv
    │  └─ ...
    ├─ input
    │  ├─ [GENUS_01]
    │  ├─ [GENUS_02]
    │  └─ ...
    ├─ output
    │  ├─ genus
    │  ├─ img
    │  └─ tsv
    └─ seqlib
       ├─ blast_output.tsv
       ├─ seqlib.tsv
       └─ ...
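
As a quick check after a run, the tree above can be compared against what is actually on disk. This is a minimal sketch, not part of pygcap: it covers only the directories shown in the tree, the helper names `expected_layout` and `missing_entries` are hypothetical, and because this section is marked WIP the layout may change. It also assumes the top-level directory is named after the taxon name.

```python
from pathlib import Path


def expected_layout(workdir, taxon_name):
    """Map short labels to the directories shown in the output tree."""
    root = Path(workdir) / taxon_name
    return {
        "data": root / "data",
        "input": root / "input",
        "output_genus": root / "output" / "genus",
        "output_img": root / "output" / "img",
        "output_tsv": root / "output" / "tsv",
        "seqlib": root / "seqlib",
    }


def missing_entries(workdir, taxon_name):
    """Return the labels of expected directories that do not exist."""
    layout = expected_layout(workdir, taxon_name)
    return sorted(label for label, path in layout.items() if not path.is_dir())


# Example: report what is missing for a Facklamia run in the current directory.
print(missing_entries(".", "Facklamia"))
```

An empty list means every directory from the tree is present; it does not verify the contents of the files inside them.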
    

(WIP) Example

  • Profiling dcw genes from pan-genomes of Lactobacillales (LAB)
