cblaster

Outline

cblaster is a tool for finding clusters of co-located homologous sequences in BLAST searches.

[Figure: cblaster search workflow]

Given a collection of protein sequences, cblaster can search sequence databases remotely (via NCBI BLAST API) or locally (via DIAMOND). Search results are parsed and filtered based on user thresholds for identity, coverage and e-value. The genomic coordinates of remaining hits are obtained from the NCBI's Identical Protein Group (IPG) database (or a local database in local searches). Finally, cblaster scans for instances of collocation and generates visualisations:

[Figure: cblaster search results]
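
The filtering and collocation steps described above can be sketched roughly as follows. This is a minimal illustration, not cblaster's actual implementation: the `Hit` class, the `filter_hits` and `find_clusters` functions, and all threshold values (`min_identity`, `min_coverage`, `max_evalue`, `max_gap`, `min_hits`) are hypothetical names and numbers chosen for the example, and the hits are assumed to lie on a single scaffold.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit with its genomic location (hypothetical structure)."""
    query: str
    subject: str
    identity: float  # percent identity
    coverage: float  # percent query coverage
    evalue: float
    start: int
    end: int


def filter_hits(hits, min_identity=30.0, min_coverage=50.0, max_evalue=0.01):
    """Keep only hits that satisfy all three thresholds (illustrative values)."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


def find_clusters(hits, max_gap=20000, min_hits=2):
    """Group hits on one scaffold whose neighbours lie within max_gap bp."""
    clusters, current = [], []
    for hit in sorted(hits, key=lambda h: h.start):
        # A gap larger than max_gap closes the current group.
        if current and hit.start - current[-1].end > max_gap:
            if len(current) >= min_hits:
                clusters.append(current)
            current = []
        current.append(hit)
    if len(current) >= min_hits:
        clusters.append(current)
    return clusters
```

Sorting by start coordinate first means a single left-to-right pass is enough to split a scaffold into groups of neighbouring hits.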

Installation

cblaster can be installed via pip:

$ pip3 install cblaster --user

or by cloning the repository and installing:

$ git clone https://github.com/gamcil/cblaster.git
...
$ cd cblaster/
$ pip3 install .

Additionally, we provide downloadable executables for Windows and Mac.

Once installed, make sure you configure cblaster with your email address:

$ cblaster config --email name@domain.com

Dependencies

cblaster is tested on Python 3.6, and its only external Python dependency is the requests module (used for interaction with NCBI APIs). To perform local searches, diamond must be installed and available on your system $PATH. cblaster will raise an error if a local search is started but neither diamond nor diamond-aligner (its alias when installed via apt) can be found on the system.
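
Before starting a local search, you can check for yourself whether DIAMOND is visible on your $PATH. The snippet below is a small sketch using only the Python standard library; `find_diamond` is a hypothetical helper written for this example and is not part of cblaster, which may perform its own check differently.

```python
import shutil


def find_diamond(names=("diamond", "diamond-aligner")):
    """Return the path of the first DIAMOND executable on $PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    path = find_diamond()
    if path is None:
        print("DIAMOND not found on $PATH; local searches will not work")
    else:
        print(f"Found DIAMOND at {path}")
```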

Usage

cblaster accepts FASTA files and collections of valid NCBI sequence identifiers (GIs, accession numbers) as input. A remote search can be performed as simply as:

$ cblaster search --query_file query.fasta

For example, to remotely search the burnettramic acids gene cluster, bua, against the NCBI's nr database:

$ cblaster search -qf bua.fasta

[12:14:17] INFO - Starting cblaster in remote mode
[12:14:17] INFO - Launching new search
[12:14:19] INFO - Request Identifier (RID): WHS0UGYJ015
[12:14:19] INFO - Request Time Of Execution (RTOE): 25s
[12:14:44] INFO - Polling NCBI for completion status
[12:14:44] INFO - Checking search status...
[12:15:44] INFO - Checking search status...
[12:16:44] INFO - Checking search status...
[12:16:46] INFO - Search has completed successfully!
[12:16:46] INFO - Retrieving results for search WHS0UGYJ015
[12:16:51] INFO - Parsing results...
[12:16:51] INFO - Found 3944 hits meeting score thresholds
[12:16:51] INFO - Fetching genomic context of hits
[12:17:14] INFO - Searching for clustered hits across 705 organisms
[12:17:14] INFO - Writing summary to <stdout>

Aspergillus mulundensis DSM 5745
================================
NW_020797889.1
--------------
Query       Subject         Identity  Coverage  E-value    Bitscore  Start    End      Strand
QBE85641.1  XP_026607259.1  75.56     99.5918   0          742       1717881  1719409  -
QBE85642.1  XP_026607260.1  89.916    100       0          667       1719650  1720797  +
QBE85643.1  XP_026607261.1  89.532    83.1169   0          832       1721494  1722934  +
QBE85644.1  XP_026607262.1  64.829    98.9218   6.51e-157  455       1723252  1724467  -
QBE85645.1  XP_026607263.1  69.97     100       6.93e-157  449       1725113  1726277  -
QBE85646.1  XP_026607264.1  82.759    96.8447   0          670       1726892  1728302  +
QBE85647.1  XP_026607265.1  72.674    99.2048   0          764       1729735  1731338  +
QBE85648.1  XP_026607266.1  56.098    98.324    4.24e-64   205       1731701  1732402  -
QBE85649.1  XP_026607267.1  79.623    99.8746   0          6573      1732820  1745289  +

...
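
If you want to post-process the summary in your own scripts, the hit tables shown above can be read with a few lines of Python. This is only a sketch based on the layout of the example output: `parse_hit_rows` is a hypothetical helper, not a cblaster function, and it assumes that each table begins with a header line starting with `Query` and ends at the next blank line.

```python
FLOAT_FIELDS = {"identity", "coverage", "evalue", "bitscore"}
INT_FIELDS = {"start", "end"}


def parse_hit_rows(lines):
    """Parse whitespace-delimited hit rows that follow a 'Query ...' header."""
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current table
            continue
        if fields[0] == "Query":
            # Normalise column names, e.g. 'E-value' becomes 'evalue'.
            header = [f.lower().replace("-", "") for f in fields]
            continue
        if header and len(fields) == len(header):
            row = dict(zip(header, fields))
            for key in FLOAT_FIELDS & row.keys():
                row[key] = float(row[key])
            for key in INT_FIELDS & row.keys():
                row[key] = int(row[key])
            rows.append(row)
    return rows
```

Organism and scaffold title lines are skipped because they appear before any header line has been seen.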

A query sequence absence/presence matrix can be generated using the --binary argument:

Organism                                   Scaffold        Start    End      QBE85641.1  QBE85642.1  QBE85643.1  QBE85644.1  QBE85645.1  QBE85646.1  QBE85647.1  QBE85648.1  QBE85649.1
Aspergillus mulundensis DSM 5745           NW_020797889.1  1717881  1745289  1           1           1           1           1           1           1           1           1         
Aspergillus versicolor CBS 583.65          KV878126.1      3162095  3187090  1           1           1           0           1           1           1           1           1         
Pseudomassariella vexata CBS 129021        MCFJ01000004.1  1606356  1628483  1           1           1           0           0           1           0           1           1         
Hypoxylon sp. CO27-5                       KZ112517.1      92119    112957   1           1           1           0           0           0           1           0           1         
Hypoxylon sp. EC38                         KZ111255.1      514739   535366   1           1           1           0           0           0           1           0           1         
Epicoccum nigrum ICMP 19927                KZ107839.1      2116719  2142558  1           1           0           0           0           1           1           0           1         
Aureobasidium subglaciale EXF-2481         NW_013566983.1  700476   718693   1           1           0           0           0           1           1           0           0         
Aureobasidium pullulans EXF-6514           QZBF01000009.1  18721    34295    1           1           0           0           0           1           1           0           0         
Aureobasidium pullulans EXF-5628           QZBI01000512.1  329      13401    1           0           0           0           0           1           1           0           0         
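
Conceptually, each row of this table records which query sequences have a hit in a given cluster. The sketch below shows one way such a matrix could be built; `binary_matrix` and its input format (a mapping from a cluster label to the query IDs with hits) are assumptions made for illustration and do not reflect cblaster's internal data structures.

```python
def binary_matrix(queries, clusters):
    """Map each cluster label to a list of 0/1 flags, one per query."""
    matrix = {}
    for label, hit_queries in clusters.items():
        present = set(hit_queries)
        matrix[label] = [int(q in present) for q in queries]
    return matrix


if __name__ == "__main__":
    queries = [f"QBE8564{i}.1" for i in range(1, 10)]
    clusters = {
        "Aureobasidium pullulans EXF-5628": [
            "QBE85641.1", "QBE85646.1", "QBE85647.1",
        ],
    }
    for label, row in binary_matrix(queries, clusters).items():
        print(label, *row)
```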

cblaster can also generate fully interactive visualisations of the binary table.

For further usage examples and API documentation, please refer to the documentation.

Citation

If you found this tool useful, please cite:

Gilchrist, C.L.M., Booth, T.J., Chooi, Y.-H., 2020. cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters. bioRxiv 2020.11.08.370601. https://doi.org/10.1101/2020.11.08.370601

cblaster makes use of the following tools:

Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).

Acland, A. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 42, 7–17 (2014).
