
A Python interface for CD-HIT package.


py-cdhit


A Python package for CD-HIT, clustering protein or nucleotide sequences.

This package provides a Python interface for CD-HIT (Cluster Database at High Identity with Tolerance), a suite of fast programs for clustering biological sequences. Specifically, it contains functions that run the CD-HIT commands and read their output files, so you can stay in Python throughout a data analysis workflow instead of switching languages and writing your own parsing code.

Read the documentation here.

Usage

A simple example on Linux is provided below. See the notebook for more details.

from pycdhit import cd_hit, read_clstr

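res = cd_hit(
    i="./docs/examples/apd.fasta",  # input FASTA file
    o="./docs/examples/out",  # output path; clusters are written to out.clstr
    c=0.7,  # sequence identity threshold
    d=0,  # description length in .clstr; 0 keeps the name up to the first space
    sc=1,  # sort clusters by decreasing size
)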

df_clstr = read_clstr("./docs/examples/out.clstr")
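To give a feel for the parsing work that read_clstr saves you, here is a minimal sketch of a hand-written parser for the .clstr text format. It is not this package's implementation: the function name parse_clstr, the record keys, and the sample sequences are all made up for illustration, and the sample is not real CD-HIT output.

```python
import re

# Hypothetical sample in the .clstr layout that CD-HIT writes; not real output.
SAMPLE_CLSTR = (
    ">Cluster 0\n"
    "0\t25aa, >seq1... *\n"
    "1\t24aa, >seq2... at 91.67%\n"
    ">Cluster 1\n"
    "0\t30aa, >seq3... *\n"
)

# A member line holds: index, length with unit, ">name...", then either
# "*" (the cluster representative) or "at <identity>%".
MEMBER_RE = re.compile(
    r"^\d+\s+(?P<length>\d+)(?:aa|nt),\s+>(?P<name>.+?)\.\.\.\s+"
    r"(?P<rest>\*|at\s+.+)$"
)


def parse_clstr(text):
    """Parse .clstr text into a list of dicts, one per sequence (illustrative only)."""
    records = []
    cluster = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line such as ">Cluster 0" starts a new cluster.
            cluster = int(line.split()[1])
            continue
        match = MEMBER_RE.match(line)
        if match is None:
            raise ValueError(f"Unrecognized .clstr line: {line!r}")
        rest = match["rest"]
        is_representative = rest == "*"
        identity = None
        if not is_representative:
            # Take the number directly before "%"; this also covers
            # nucleotide lines such as "at +/98.00%".
            identity = float(re.findall(r"[\d.]+(?=%)", rest)[-1])
        records.append(
            {
                "cluster": cluster,
                "name": match["name"],
                "length": int(match["length"]),
                "is_representative": is_representative,
                "identity": identity,
            }
        )
    return records


for record in parse_clstr(SAMPLE_CLSTR):
    print(record)
```

In practice, prefer read_clstr, which returns the parsed clusters directly; the sketch only shows the kind of code you would otherwise have to maintain yourself.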

See CD-HIT's documentation for installation instructions and the available command options.
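The keyword arguments in the usage example appear to mirror CD-HIT's command-line flags (i for -i, c for -c, and so on). The sketch below shows one way such a mapping could be written. It is an assumption about the general approach, not this package's actual code, and build_command is a hypothetical helper. It only builds and prints the command, so it runs without CD-HIT installed.

```python
import shlex


def build_command(program="cd-hit", **options):
    """Turn keyword options into an argument list (hypothetical helper)."""
    args = [program]
    for flag, value in options.items():
        # Each keyword becomes a single-dash flag followed by its value.
        args.extend([f"-{flag}", str(value)])
    return args


args = build_command(
    i="./docs/examples/apd.fasta",
    o="./docs/examples/out",
    c=0.7,
    d=0,
    sc=1,
)

# Show the equivalent shell command line.
print(shlex.join(args))
```

An argument list like this could be passed to subprocess.run to execute the program. Keeping it as a list rather than a single string avoids shell quoting problems with file paths.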

Installation

First, install CD-HIT. Mamba is recommended. For example, to create an environment and install CD-HIT into it:

mamba create -n myenv python=3.10
mamba activate myenv
mamba install -c bioconda cd-hit cd-hit-auxtools

Then install this package from PyPI:

pip install py-cdhit
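After installing, you may want to confirm that the CD-HIT executables are visible from Python. The following is a minimal sketch using only the standard library; find_programs is a hypothetical helper, and the two program names checked are an assumption about what your workflow needs.

```python
import shutil


def find_programs(names=("cd-hit", "cd-hit-est")):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_programs().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If a program is reported as not found, check that the environment where CD-HIT was installed is the one currently activated.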

Development

To install from source, clone the repository with git, then run:

cd py-cdhit
pip install -e '.[dev]'
pip install -r docs/requirements.txt
python -m pytest --cov-report term-missing --cov=pycdhit tests/
