Protein annotation using local PSSM databases from CDD
local-cd-search
A command-line tool for local protein domain annotation using NCBI's Conserved Domain Database (CDD).
Background
NCBI CD-Search is a widely used tool for functional annotation of proteins. It uses RPS-BLAST to search protein sequences against position-specific scoring matrices (PSSMs) from the CDD database. PSSMs offer higher sensitivity for detecting distant homologs than searches against individual protein sequences, while remaining substantially faster than HMM-based annotation.
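To make position-specific scoring concrete, here is a toy sketch in Python. The matrix values, the default penalty, and the function names are all invented for illustration. Real CDD PSSMs cover all 20 residues at every position, and RPS-BLAST applies proper alignment statistics that are not modelled here.

```python
# Toy PSSM for a 3-residue motif: one dict of residue scores per position.
# All scores are made up for illustration; they are not taken from CDD.
TOY_PSSM = [
    {"G": 5, "A": 2},
    {"K": 4, "R": 3},
    {"S": 4, "T": 4},
]
DEFAULT_SCORE = -2  # hypothetical penalty for residues not listed at a position


def window_score(window, pssm=TOY_PSSM):
    """Sum the position-specific score of each residue in the window."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))


def best_window(sequence, pssm=TOY_PSSM):
    """Return (score, start) of the best ungapped window.

    Assumes len(sequence) >= len(pssm).
    """
    width = len(pssm)
    scores = [
        (window_score(sequence[i:i + width], pssm), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)


print(best_window("MAGKSL"))  # the G-K-S motif starts at offset 2
print(window_score("ART"))    # a variant sharing no residue with G-K-S
```

In this toy setup the variant A-R-T still scores positively even though it shares no identical residue with G-K-S, which is the kind of tolerance a single-sequence comparison lacks.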
While the CD-Search web interface is convenient for small queries, it is not well suited for large-scale annotation. local-cd-search enables local protein annotation and automates the entire workflow: downloading PSSM databases from CDD, running RPS-BLAST, and post-processing results with rpsbproc to filter hits using CDD's curated bit-score thresholds.
Installation
The easiest way to install local-cd-search is with Pixi, which will manage dependencies automatically and make local-cd-search available for execution from anywhere.
pixi global install -c conda-forge -c bioconda local-cd-search
Alternatively, you can install it from PyPI. In this case, rpsblast and rpsbproc must be installed separately. To install local-cd-search from PyPI using uv, run:
uv tool install local-cd-search
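After a PyPI install, it can be useful to confirm that the two external executables are actually reachable. The following is a minimal sketch using only the Python standard library; the function name `missing_tools` is hypothetical and not part of local-cd-search.

```python
import shutil

# Executables that must be installed separately for a PyPI install.
REQUIRED_TOOLS = ("rpsblast", "rpsbproc")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("rpsblast and rpsbproc are both available.")
```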
Quick start
1. Download databases
Download the full CDD reference database (recommended for comprehensive annotation):
local-cd-search download database cdd
Or download specific subsets:
# COG database only (for COG functional annotation)
local-cd-search download database cog
# Multiple databases
local-cd-search download database cog pfam tigr
Available databases:
- cdd
- cdd_ncbi
- cog
- kog
- pfam
- prk
- smart
- tigr
2. Annotate proteins
local-cd-search annotate proteins.faa results.tsv database
The tool auto-detects which databases are present in the database directory (database in the commands above) and uses them for annotation.
Output
local-cd-search produces a tab-separated file with hits filtered by CDD's curated bit-score thresholds:
| Column | Description |
|---|---|
| query | Protein identifier |
| hit_type | Specific, Non-specific, or Superfamily |
| pssm_id | CDD PSSM identifier |
| from | Start position in query |
| to | End position in query |
| evalue | E-value |
| bitscore | Bit score |
| accession | Domain accession |
| short_name | Domain short name (e.g., COG0001) |
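As a sketch of how the results table might be consumed downstream, the Python snippet below groups hits by protein. It assumes the file starts with a header row using exactly the column names above and that hit_type values are spelled as in the table; both are assumptions to check against a real output file. The example rows, including the identifiers and scores, are made up.

```python
import csv
import io
from collections import defaultdict


def domains_by_query(handle, hit_types=("Specific",)):
    """Group hits by query, keeping only the requested hit types.

    Assumes a tab-separated file with a header row naming the columns.
    Returns {query: [(from, to, short_name, evalue), ...]} sorted by start.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        if row["hit_type"] not in hit_types:
            continue
        grouped[row["query"]].append(
            (int(row["from"]), int(row["to"]), row["short_name"], float(row["evalue"]))
        )
    for hits in grouped.values():
        hits.sort()
    return dict(grouped)


# Made-up rows in the assumed layout, for demonstration only.
EXAMPLE_TSV = (
    "query\thit_type\tpssm_id\tfrom\tto\tevalue\tbitscore\taccession\tshort_name\n"
    "protA\tSpecific\t1\t150\t300\t1e-20\t90.1\tACC_2\tNAME_2\n"
    "protA\tSpecific\t2\t5\t120\t1e-30\t110.5\tACC_1\tNAME_1\n"
    "protB\tSuperfamily\t3\t10\t80\t1e-05\t40.0\tACC_3\tNAME_3\n"
)
result = domains_by_query(io.StringIO(EXAMPLE_TSV))
print(result)
```

For a real run, the same function can be given an open file handle, for example `open("results.tsv", newline="")`.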
Usage
download subcommand
local-cd-search download [OPTIONS] DB_DIR DATABASE...
| Option | Short | Argument | Description | Default |
|---|---|---|---|---|
| --force | | flag | Force re-download even if files are already present. | |
| --quiet | | flag | Suppress non-error console output. | |
| --help | -h | flag | Show help message and exit. | |
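When downloads are scripted, a typo in a database name is easy to make. The sketch below assembles the download command from Python and checks names against the list under "Available databases" before anything is run. The helper `build_download_cmd` is hypothetical; only the subcommand, positional arguments, and flags come from the usage line and options above.

```python
# Database names as listed under "Available databases".
AVAILABLE_DATABASES = ("cdd", "cdd_ncbi", "cog", "kog", "pfam", "prk", "smart", "tigr")


def build_download_cmd(db_dir, databases, force=False, quiet=False):
    """Return the argv list for `local-cd-search download`."""
    if not databases:
        raise ValueError("at least one database name is required")
    unknown = [name for name in databases if name not in AVAILABLE_DATABASES]
    if unknown:
        raise ValueError(f"unknown database name(s): {', '.join(unknown)}")
    cmd = ["local-cd-search", "download"]
    if force:
        cmd.append("--force")
    if quiet:
        cmd.append("--quiet")
    # Positional arguments follow the options: DB_DIR DATABASE...
    return cmd + [db_dir, *databases]


print(build_download_cmd("database", ["cog", "pfam", "tigr"]))
# To execute: subprocess.run(build_download_cmd(...), check=True)
```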
annotate subcommand
local-cd-search annotate [OPTIONS] INPUT_FILE OUTPUT_FILE DB_DIR
| Option | Short | Argument | Description | Default |
|---|---|---|---|---|
| --evalue | -e | FLOAT (≥ 0) | Maximum allowed E-value for hits. | 0.01 |
| --ns | | flag | Include non-specific hits in the output results table. | |
| --sf | | flag | Include superfamily hits in the output results table. | |
| --threads | | INTEGER | Number of threads to use for rpsblast. | 0 |
| --data-mode | -m | std, rep, or full | Redundancy level of domain hit data passed to rpsbproc: rep (best model per region of the query), std (best model per source per region), full (all models meeting E-value significance). | std |
| --tmp-dir | | DIRECTORY | Directory to store intermediate files. | tmp |
| --keep-tmp | | flag | Keep intermediate temporary files. | |
| --quiet | | flag | Suppress non-error console output. | |
| --help | -h | flag | Show help message and exit. | |
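For batch annotation from a script, the annotate options can be mapped onto keyword arguments. This is a minimal sketch, not part of local-cd-search: the helper `build_annotate_cmd` and its parameter names are hypothetical, while the flags and positional order follow the usage line and options above.

```python
import subprocess

DATA_MODES = ("std", "rep", "full")


def build_annotate_cmd(input_file, output_file, db_dir, evalue=None, threads=None,
                       data_mode=None, include_ns=False, include_sf=False,
                       keep_tmp=False):
    """Return the argv list for `local-cd-search annotate`.

    Options left as None or False are omitted so the tool's defaults apply.
    """
    cmd = ["local-cd-search", "annotate"]
    if evalue is not None:
        if evalue < 0:
            raise ValueError("evalue must be >= 0")
        cmd += ["--evalue", str(evalue)]
    if include_ns:
        cmd.append("--ns")
    if include_sf:
        cmd.append("--sf")
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if data_mode is not None:
        if data_mode not in DATA_MODES:
            raise ValueError(f"data_mode must be one of {DATA_MODES}")
        cmd += ["--data-mode", data_mode]
    if keep_tmp:
        cmd.append("--keep-tmp")
    # Positional arguments: INPUT_FILE OUTPUT_FILE DB_DIR
    return cmd + [input_file, output_file, db_dir]


cmd = build_annotate_cmd("proteins.faa", "results.tsv", "database",
                         evalue=0.001, include_ns=True)
print(" ".join(cmd))
# To execute (raises CalledProcessError on a non-zero exit status):
# subprocess.run(cmd, check=True)
```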