CD-HIT cluster parser

cdhit-parser

CD-HIT file reader.

Examples

Basic usage

from cdhit_reader import read_cdhit

input_file = "cluster.fa.clstr"
for cluster in read_cdhit(input_file):
    # One summary line per cluster: name, reference sequence, member count
    print(f"{cluster.name} refSequence={cluster.refname} size={len(cluster)}")

    # One line per member; the reference sequence of the cluster is flagged
    for member in cluster.sequences:
        ref_tag = " (Reference sequence)" if member.is_ref else ""
        print(f" {member.name} ({member.length}) identity={member.identity}%{ref_tag}")

Load all clusters into a list:

clusters = read_cdhit(input_file).read_items()
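
Once the clusters are in a list, simple summaries become one-liners. The following is a minimal sketch, not part of the package: `size_histogram` and `singleton_fraction` are hypothetical helper names, and the only thing they assume about a cluster is that `len(cluster)` returns its number of sequences, as in the basic usage example.

```python
from collections import Counter
from typing import Iterable, Sized


def size_histogram(clusters: Iterable[Sized]) -> Counter:
    """Count how many clusters exist for each cluster size."""
    return Counter(len(cluster) for cluster in clusters)


def singleton_fraction(clusters: Iterable[Sized]) -> float:
    """Fraction of clusters holding exactly one sequence (0.0 if there are none)."""
    histogram = size_histogram(clusters)
    total = sum(histogram.values())
    return histogram[1] / total if total else 0.0


# Hypothetical usage with the reader (requires the package to be installed):
# clusters = read_cdhit("cluster.fa.clstr").read_items()
# print(size_histogram(clusters).most_common(5))
```

Because the helpers rely only on `len()`, they can be tried on plain lists before pointing them at real cluster objects.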

Install

pip install cdhit-reader

Demo applications

Cluster stats

The module ships a demo program called cdhit-reader.py.

cdhit-parser -h

Compare two fasta files

Warning: this requires cd-hit to be installed and available on the system PATH.
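
A quick way to verify this requirement from Python is to look the executable up on the PATH. This is a sketch under one assumption: that the binary is called `cd-hit`, the usual name in a CD-HIT installation. Which executable the demo tool actually invokes is not stated here, and `cdhit_available` is a hypothetical helper name.

```python
import shutil


def cdhit_available(executable: str = "cd-hit") -> bool:
    """Return True if `executable` can be found on the system PATH."""
    # shutil.which returns the full path of the executable, or None if absent
    return shutil.which(executable) is not None


if not cdhit_available():
    print("cd-hit was not found on the PATH; install it before comparing files")
```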

cdhit-compare compares two FASTA files and prints the sequences that are common to both, those that are present in only one of the files, and those that are redundant.

cdhit-compare --help

Example:

cdhit-compare data/input1.faa data/input2.faa

will produce:

input1_ IBJJOHBJ_00007
input2_ IBJJOHBJ_00007
input2_ IBJJOHBJ_00002
both    IBJJOHBJ_00003:_IBJJOHBJ_00003
both    IBJJOHBJ_00005:_IBJJOHBJ_00005
both    IBJJOHBJ_00004:_IBJJOHBJ_00004
dupl    IBJJOHBJ_00001:IBJJOHBJ_000F1

where records starting with a file prefix (input1_ or input2_ in this example) are present in only one of the files, records starting with both are present in both files (one per file), records starting with dupl are duplicates (two in one of the files), and records starting with multi are present multiple times in at least one of the datasets.
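
To post-process this report, the records can be grouped by their leading label. The sketch below assumes the layout seen in the example output, a label followed by whitespace and then the record; `group_compare_output` is a hypothetical helper, not something shipped with the package.

```python
from collections import defaultdict
from typing import Dict, List


def group_compare_output(text: str) -> Dict[str, List[str]]:
    """Group cdhit-compare output records by their leading label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        # Split once on whitespace: the label, then the rest of the record
        fields = line.split(None, 1)
        if len(fields) == 2:
            label, record = fields
            groups[label].append(record.strip())
    return dict(groups)


report = """\
input1_ IBJJOHBJ_00007
both    IBJJOHBJ_00003:_IBJJOHBJ_00003
dupl    IBJJOHBJ_00001:IBJJOHBJ_000F1
"""
for label, records in group_compare_output(report).items():
    print(label, len(records))
```

Blank lines and lines without a record are skipped, so the function can be fed the captured standard output of the command directly.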

Author

License

This project is licensed under the MIT License.

Acknowledgments

This module is based on fasta_reader by Danilo Horta.
