A way to extract specific information from CAZy
cazy-parser
A way to extract specific information from the Carbohydrate-Active enZYmes (CAZy) database.
Make sure to visit and cite the CAZy website!
- http://www.cazy.org/
- Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The Carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495. [PMID: 24270786].
License: GNU GPLv3
RV Honorato. CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database. The Journal of Open Source Software, 1(8), Dec 2016. doi:10.21105/joss.00053
Introduction
cazy-parser is a tool that extracts information from CAZy in a more usable and readable format. First, a script reads the HTML structure of the site and creates a mirror of the database as a tab-delimited file. Second, information is extracted from that mirror according to user-supplied parameters and presented to the user as a set of accession codes.
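The second step can be pictured with a small Python sketch. It is not the code cazy-parser actually runs: the column names (`accession`, `family`, `subfamily`, `characterized`), the helper `select_accessions`, and the accession codes are all hypothetical placeholders, chosen only to show how a tab-delimited mirror might be filtered by family and subfamily.

```python
import csv
import io

# Hypothetical mirror content. The real column names and layout may differ,
# and the accession codes below are made-up placeholders.
MIRROR_TSV = (
    "accession\tfamily\tsubfamily\tcharacterized\n"
    "ABC00001.1\tGH43\t1\tFalse\n"
    "ABC00002.1\tGH43\t2\tTrue\n"
    "ABC00003.1\tGH5\t1\tFalse\n"
    "ABC00004.1\tGH43\t1\tTrue\n"
)


def select_accessions(handle, family, subfamily=None):
    """Return accession codes whose row matches the family (and subfamily)."""
    reader = csv.DictReader(handle, delimiter="\t")
    selected = []
    for row in reader:
        if row["family"] != family:
            continue
        # Only filter on subfamily when the caller asked for one.
        if subfamily is not None and row["subfamily"] != str(subfamily):
            continue
        selected.append(row["accession"])
    return selected


codes = select_accessions(io.StringIO(MIRROR_TSV), "GH43", subfamily=1)
print(codes)
```

Reading rows one at a time keeps memory use flat, which matters if a mirror of the full database is large.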
Install / Upgrade
pip install --upgrade cazy-parser
Usage (internet connection required)
cazy-parser -h
usage: cazy-parser [-h] [-f FAMILY] [-s SUBFAMILY] [-c CHARACTERIZED] [-v] {GH,GT,PL,CA,AA}
positional arguments:
{GH,GT,PL,CA,AA}
optional arguments:
-h, --help show this help message and exit
-f FAMILY, --family FAMILY
-s SUBFAMILY, --subfamily SUBFAMILY
-c CHARACTERIZED, --characterized CHARACTERIZED
-v, --version show version
Example
Extract all FASTA sequences from subfamily 1 of Glycoside Hydrolase family 43
$ cazy-parser GH -f 43 -s 1
[2022-05-26 16:39:21,511 91 INFO] ------------------------------------------
[2022-05-26 16:39:21,511 92 INFO]
[2022-05-26 16:39:21,511 93 INFO] ┌─┐┌─┐┌─┐┬ ┬ ┌─┐┌─┐┬─┐┌─┐┌─┐┬─┐
[2022-05-26 16:39:21,511 94 INFO] │ ├─┤┌─┘└┬┘───├─┘├─┤├┬┘└─┐├┤ ├┬┘
[2022-05-26 16:39:21,511 95 INFO] └─┘┴ ┴└─┘ ┴ ┴ ┴ ┴┴└─└─┘└─┘┴└─ v2.0.1
[2022-05-26 16:39:21,511 96 INFO]
[2022-05-26 16:39:21,511 97 INFO] ------------------------------------------
[2022-05-26 16:39:21,511 183 INFO] Fetching links for Glycoside-Hydrolases, url: http://www.cazy.org/Glycoside-Hydrolases.html
[2022-05-26 16:39:22,454 189 INFO] Only using links of family 43 subfamily 1
[2022-05-26 16:39:23,029 26 INFO] Dowloading 1415 fasta sequences...
[2022-05-26 16:40:32,187 51 INFO] Dumping fasta sequences to file GH43_1_26052022.fasta
This will generate the file GH43_1_DDMMYYYY.fasta (for example, GH43_1_26052022.fasta), which contains the FASTA sequences.
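To check the result, the output file can be read back with a few lines of Python. The sketch below is an illustration, not part of cazy-parser: `output_name` and `read_fasta` are hypothetical helpers. The file name pattern is inferred from the single example in the log above and is only assumed to hold when both a family and a subfamily are given. The sequences are placeholders.

```python
import datetime
import io


def output_name(enzyme_class, family, subfamily, day):
    """Build a file name following the pattern seen in the log above.

    Assumption: the date stamp is day, month, four-digit year (DDMMYYYY).
    """
    stamp = day.strftime("%d%m%Y")
    return f"{enzyme_class}{family}_{subfamily}_{stamp}.fasta"


def read_fasta(handle):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may span several lines; join them.
            records[header] += line
    return records


name = output_name("GH", 43, 1, datetime.date(2022, 5, 26))

# Placeholder FASTA text standing in for the downloaded file.
example = io.StringIO(">seq_one\nMKTAYI\nAKQR\n>seq_two\nMSLLT\n")
records = read_fasta(example)
print(name, len(records))
```

With a real download, replace the `io.StringIO` object with `open(name)` to count the sequences that were written.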
To-do and how to contribute
Please refer to CONTRIBUTING 🤓