CD-HIT cluster parser
cdhit-parser
CD-HIT file reader.
Examples
Basic usage
from cdhit_reader import read_cdhit

cluster_file = "cluster.fa.clstr"
for cluster in read_cdhit(cluster_file):
    print(f"{cluster.name} refSequence={cluster.refname} size={len(cluster)}")
    for member in cluster.sequences:
        print(f"  {member.name} ({member.length}) identity={member.identity}% {'(Reference sequence)' if member.is_ref else ''}")

Load all clusters into a list:

clusters = read_cdhit(cluster_file).read_items()
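For readers who have not seen a `.clstr` file, the following is a minimal standalone sketch of the kind of parsing such a reader has to do. It is not the library's implementation: `parse_clstr`, `Member`, `Cluster` and `MEMBER_RE` are hypothetical names, the attribute names merely mirror those used in the example above, and reporting the reference sequence's identity as `None` is an assumption of this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional

# Matches member lines of a CD-HIT .clstr file, for example:
#   "0\t2799aa, >seqA... *"             (reference sequence)
#   "1\t2214aa, >seqB... at 80.00%"     (protein member)
#   "0\t100nt, >seqC... at +/98.50%"    (nucleotide member with strand)
MEMBER_RE = re.compile(
    r"^\d+\s+(?P<length>\d+)(?:aa|nt),\s+>(?P<name>.+?)\.\.\.\s+"
    r"(?:(?P<ref>\*)|at\s+(?:\S*/)?(?P<identity>[\d.]+)%)\s*$"
)


@dataclass
class Member:
    name: str
    length: int
    identity: Optional[float]  # assumption: None for the reference sequence
    is_ref: bool


@dataclass
class Cluster:
    name: str
    sequences: List[Member] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.sequences)

    @property
    def refname(self) -> Optional[str]:
        # Name of the member flagged with "*", if any.
        return next((m.name for m in self.sequences if m.is_ref), None)


def parse_clstr(lines: Iterable[str]) -> Iterator[Cluster]:
    """Yield one Cluster per ">Cluster N" header found in the lines."""
    current: Optional[Cluster] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous cluster.
            if current is not None:
                yield current
            current = Cluster(name=line[1:])
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            raise ValueError(f"Unrecognized .clstr line: {line!r}")
        is_ref = match.group("ref") is not None
        current.sequences.append(
            Member(
                name=match.group("name"),
                length=int(match.group("length")),
                identity=None if is_ref else float(match.group("identity")),
                is_ref=is_ref,
            )
        )
    if current is not None:
        yield current


# Made-up sample data, for illustration only.
SAMPLE = """\
>Cluster 0
0\t2799aa, >seqA... *
1\t2214aa, >seqB... at 80.00%
>Cluster 1
0\t100nt, >seqC... at +/98.50%
1\t120nt, >seqD... *
"""

clusters = list(parse_clstr(SAMPLE.splitlines()))
for cluster in clusters:
    print(f"{cluster.name} refSequence={cluster.refname} size={len(cluster)}")
```

Yielding clusters one at a time keeps memory use flat on large files, and wrapping the generator in `list(...)` gives the load-everything behaviour when that is more convenient.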
Install
pip install cdhit-reader
Demo applications
Cluster stats
The module ships a demo program called cdhit-reader.py.
Compare two fasta files
Warning: this requires cd-hit to be installed and available in the system path.
cdhit-compare compares two FASTA files and prints the sequences they have in common, those present in only one of the files, and those that are redundant.
Example:
cdhit-compare data/input1.faa data/input2.faa
will produce:
input1_ IBJJOHBJ_00007
input2_ IBJJOHBJ_00007
input2_ IBJJOHBJ_00002
both IBJJOHBJ_00003:_IBJJOHBJ_00003
both IBJJOHBJ_00005:_IBJJOHBJ_00005
both IBJJOHBJ_00004:_IBJJOHBJ_00004
dupl IBJJOHBJ_00001:IBJJOHBJ_000F1
where records starting with an input file label (input1_ or input2_ above) are present in only one of the files; records starting with both are present in both files (one per file); records starting with dupl are duplicates (two in one of the files); and records starting with multi are present multiple times in at least one of the datasets.
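One way to read the labelling rules above is as a function of how many sequences a cluster contains and which file each came from. The sketch below encodes that reading under stated assumptions: `classify_cluster` is a hypothetical helper and not the logic cdhit-compare actually uses, a cluster is treated as `multi` as soon as it holds more than two sequences, and the file of origin given for the `dupl` pair is invented for the example.

```python
from collections import Counter
from typing import List, Tuple


def classify_cluster(members: List[Tuple[str, str]]) -> str:
    """Label a cluster given its (source_label, sequence_name) members."""
    if not members:
        raise ValueError("a cluster needs at least one member")
    # Count how many members each input file contributed.
    per_source = Counter(source for source, _ in members)
    if len(members) == 1:
        # Singleton: present in only one file, so report that file's label.
        return members[0][0]
    if len(members) == 2:
        # One per file -> shared; two from the same file -> duplicate.
        return "both" if len(per_source) == 2 else "dupl"
    # Assumption: anything larger counts as "multi".
    return "multi"


only_first = classify_cluster([("input1_", "IBJJOHBJ_00007")])
shared = classify_cluster(
    [("input1_", "IBJJOHBJ_00003"), ("input2_", "IBJJOHBJ_00003")]
)
# The source file of this pair is an assumption made for the example.
duplicate = classify_cluster(
    [("input1_", "IBJJOHBJ_00001"), ("input1_", "IBJJOHBJ_000F1")]
)
repeated = classify_cluster(
    [("input1_", "a"), ("input1_", "b"), ("input2_", "c")]
)
print(only_first, shared, duplicate, repeated)
```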
Author
License
This project is licensed under the MIT License.
Acknowledgments
This module was based on fasta_reader by Danilo Horta.