
Protein annotation using local PSSM databases from CDD


local-cd-search

A command-line tool for local protein domain annotation using NCBI's Conserved Domain Database (CDD).

Background

NCBI CD-Search is a widely used tool for functional annotation of proteins. It uses RPS-BLAST to search protein sequences against position-specific scoring matrices (PSSMs) from the CDD database. PSSMs offer higher sensitivity for detecting distant homologs than searches against individual protein sequences, while remaining substantially faster than HMM-based annotation.
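To make the idea of a position-specific scoring matrix concrete, here is a toy sketch in Python. It is not how RPS-BLAST searches (this is a plain ungapped scan), and the matrix values, `DEFAULT_SCORE`, and function names are all made up for illustration. It only shows why a PSSM can still score a diverged sequence well: each column rewards several residues, not just one.

```python
# Toy PSSM: one dict of residue scores per alignment column.
# Values are invented; real CDD PSSMs score all 20 amino acids per column.
TOY_PSSM = [
    {"G": 5, "A": 1},
    {"K": 4, "R": 3},
    {"S": 4, "T": 3},
]
DEFAULT_SCORE = -2  # assumed score for residues not listed in a column


def window_score(pssm, window):
    """Sum the position-specific score of each residue in the window."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))


def best_window(pssm, sequence):
    """Return (score, start) of the best ungapped match in the sequence.

    Assumes the sequence is at least as long as the PSSM.
    """
    width = len(pssm)
    return max(
        (window_score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


print(best_window(TOY_PSSM, "MAGKSL"))  # (13, 2): exact match to the top residues
print(best_window(TOY_PSSM, "MAGRTL"))  # (11, 2): diverged, but still scores well
```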

While the CD-Search web interface is convenient for small queries, it is not well suited for large-scale annotation. local-cd-search enables local protein annotation and automates the entire workflow: downloading PSSM databases from CDD, running RPS-BLAST, and post-processing the results with rpsbproc to filter hits using CDD's curated bit-score thresholds.

Installation

The easiest way to install local-cd-search is with Pixi, which will manage dependencies automatically and make local-cd-search available for execution from anywhere.

pixi global install -c conda-forge -c bioconda local-cd-search

Alternatively, you can install it from PyPI. In this case, rpsblast and rpsbproc must be installed separately. To install local-cd-search from PyPI using uv, run:

uv tool install local-cd-search

Quick start

1. Download databases

Download the full CDD reference database (recommended for comprehensive annotation):

local-cd-search download database cdd

Or download specific subsets:

# COG database only (for COG functional annotation)
local-cd-search download database cog

# Multiple databases
local-cd-search download database cog pfam tigr

Available databases:

  • cdd
  • cdd_ncbi
  • cog
  • kog
  • pfam
  • prk
  • smart
  • tigr
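If you drive the download step from a script, a small wrapper can catch misspelled database names before anything is fetched. The following is a minimal sketch: `build_download_command` and `download` are hypothetical helper names, not part of local-cd-search, and the sketch assumes the `local-cd-search` executable is on your `PATH`.

```python
import subprocess

# Database names copied from the list above.
AVAILABLE_DATABASES = {"cdd", "cdd_ncbi", "cog", "kog", "pfam", "prk", "smart", "tigr"}


def build_download_command(db_dir, databases, force=False):
    """Build the argument list for `local-cd-search download`."""
    unknown = sorted(set(databases) - AVAILABLE_DATABASES)
    if unknown:
        raise ValueError(f"unknown database(s): {', '.join(unknown)}")
    command = ["local-cd-search", "download"]
    if force:
        command.append("--force")  # re-download even if files are present
    command += [db_dir, *databases]
    return command


def download(db_dir, databases, force=False):
    """Run the download; raises CalledProcessError if the tool fails."""
    subprocess.run(build_download_command(db_dir, databases, force), check=True)
```

Calling `download("database", ["cog", "pfam", "tigr"])` would then run the same command as the multi-database example above.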

2. Annotate proteins

local-cd-search annotate proteins.faa results.tsv database

The tool auto-detects which databases are available and uses them for annotation.

Output

local-cd-search produces a tab-separated file with hits filtered by CDD's curated bit-score thresholds:

| Column | Description |
| --- | --- |
| query | Protein identifier |
| hit_type | Specific, Non-specific, or Superfamily |
| pssm_id | CDD PSSM identifier |
| from | Start position in query |
| to | End position in query |
| evalue | E-value |
| bitscore | Bit score |
| accession | Domain accession |
| short_name | Domain short name (e.g., COG0001) |
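The results table can be loaded with Python's standard `csv` module. The sketch below assumes the file starts with a header row using the column names listed above and that `hit_type` values are spelled as shown; `hits_by_query` is a hypothetical helper, and the sample rows are placeholders, not real CDD hits.

```python
import csv
import io
from collections import defaultdict


def hits_by_query(handle, hit_type=None):
    """Group rows of a results table by query protein.

    If hit_type is given (e.g. "Specific"), only rows of that type are kept.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if hit_type is not None and row["hit_type"] != hit_type:
            continue
        # Convert numeric columns from text.
        row["from"] = int(row["from"])
        row["to"] = int(row["to"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        grouped[row["query"]].append(row)
    return dict(grouped)


# Placeholder data for illustration only.
SAMPLE = (
    "query\thit_type\tpssm_id\tfrom\tto\tevalue\tbitscore\taccession\tshort_name\n"
    "prot1\tSpecific\t1\t5\t120\t1e-30\t110.5\tACC1\tname1\n"
    "prot1\tSuperfamily\t2\t5\t125\t1e-28\t105.0\tACC2\tname2\n"
    "prot2\tNon-specific\t3\t40\t90\t1e-05\t42.0\tACC3\tname3\n"
)

specific = hits_by_query(io.StringIO(SAMPLE), hit_type="Specific")
print(sorted(specific))              # ['prot1']
print(specific["prot1"][0]["to"])    # 120
```

For a real run, replace the `io.StringIO(SAMPLE)` handle with `open("results.tsv", newline="")`.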

Usage

download subcommand

local-cd-search download [OPTIONS] DB_DIR DATABASE...

| Option | Short | Argument | Description | Default |
| --- | --- | --- | --- | --- |
| --force | | flag | Force re-download even if files are already present. | |
| --quiet | | flag | Suppress non-error console output. | |
| --help | -h | flag | Show help message and exit. | |

annotate subcommand

local-cd-search annotate [OPTIONS] INPUT_FILE OUTPUT_FILE DB_DIR

| Option | Short | Argument | Description | Default |
| --- | --- | --- | --- | --- |
| --evalue | -e | FLOAT (≥ 0) | Maximum allowed E-value for hits. | 0.01 |
| --ns | | flag | Include non-specific hits in the output results table. | |
| --sf | | flag | Include superfamily hits in the output results table. | |
| --threads | | INTEGER | Number of threads to use for rpsblast. | 0 |
| --data-mode | -m | std \| rep \| full | Redundancy level of domain hit data passed to rpsbproc: rep (best model per region of the query), std (best model per source per region), full (all models meeting E-value significance). | std |
| --tmp-dir | | DIRECTORY | Directory to store intermediate files. | tmp |
| --keep-tmp | | flag | Keep intermediate temporary files. | |
| --quiet | | flag | Suppress non-error console output. | |
| --help | -h | flag | Show help message and exit. | |
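When annotating from a pipeline, it can help to assemble the `annotate` call programmatically and check option values first. This is a sketch under stated assumptions: `build_annotate_command` is a hypothetical helper, not part of local-cd-search. It uses only the options documented above and mirrors their documented defaults.

```python
VALID_DATA_MODES = {"std", "rep", "full"}


def build_annotate_command(input_file, output_file, db_dir, evalue=0.01,
                           threads=0, data_mode="std",
                           include_ns=False, include_sf=False):
    """Build the argument list for `local-cd-search annotate`."""
    if evalue < 0:
        raise ValueError("evalue must be >= 0")
    if data_mode not in VALID_DATA_MODES:
        raise ValueError(f"data_mode must be one of {sorted(VALID_DATA_MODES)}")
    command = [
        "local-cd-search", "annotate",
        "--evalue", str(evalue),
        "--threads", str(threads),
        "--data-mode", data_mode,
    ]
    if include_ns:
        command.append("--ns")  # also report non-specific hits
    if include_sf:
        command.append("--sf")  # also report superfamily hits
    # Positional arguments follow the options, as in the usage line above.
    command += [input_file, output_file, db_dir]
    return command


print(" ".join(build_annotate_command("proteins.faa", "results.tsv", "database",
                                      threads=8, include_sf=True)))
```

The returned list can be passed to `subprocess.run(command, check=True)`, assuming `local-cd-search` is on your `PATH`.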
