Complete GenBank parser
GBcrawler
GBcrawler is a complete GenBank parser in Python 3 meant to be used by other applications.
It generates a Python object of the GBcrawler class that stores all relevant information from a GenBank file.
Features
This script was made using the information from:
Installation
Python 3 is required.
Copy GBcrawler.py into your project folder and import it as explained in Usage.
Usage
To get the GenBank data, create an object with the GenBank filename as a parameter:
from GBcrawler import GBcrawler  # module GBcrawler.py, class GBcrawler
GBobject = GBcrawler("tth1.gb")  # parse the GenBank file tth1.gb
The following data can be accessed through attributes or methods.
Attribute references:
| Attribute | Returns |
|---|---|
| sequenceID | sequence identification |
| sequenceLength | length of the sequence |
| strand | strand type |
| moleculeType | molecule type |
| division | division code |
| modDate | modification date |
| definition | definition |
| accession | accession |
| version | version |
| referenceList | a list in which each element is a reference |
| comment | all comments as a single string |
| featureList | a list of GBfeature objects |
| sequenceList | the sequence as a list (see Methods to get the sequence as a string) |
| baseCount | a dictionary with nucleotide counts |
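Several of these attributes (sequenceID, sequenceLength, strand, moleculeType, division, modDate) appear to correspond to fields of the LOCUS line that opens every GenBank record. The sketch below is not GBcrawler's code: `parse_locus` and `LocusInfo` are hypothetical names, and the parser assumes the modern whitespace-separated LOCUS layout. It only shows which token would feed which value.

```python
from dataclasses import dataclass

TOPOLOGIES = {"linear", "circular"}
STRANDS = {"ss", "ds", "ms"}


@dataclass
class LocusInfo:
    name: str           # compare: sequenceID
    length: int         # compare: sequenceLength
    strand: str         # "ss", "ds", "ms" or "" when absent (compare: strand)
    molecule_type: str  # e.g. "DNA", "mRNA"; "" for protein records (compare: moleculeType)
    topology: str       # "linear", "circular" or "" when absent
    division: str       # three-letter division code, e.g. "PLN" (compare: division)
    date: str           # e.g. "21-JUN-1999" (compare: modDate)


def parse_locus(line: str) -> LocusInfo:
    """Split a LOCUS line on whitespace (hypothetical helper, modern layout assumed)."""
    tokens = line.split()
    if len(tokens) < 6 or tokens[0] != "LOCUS":
        raise ValueError(f"not a LOCUS line: {line!r}")
    name, length = tokens[1], int(tokens[2])
    # tokens[3] is the unit ("bp" or "aa"); the last two tokens are division and date
    division, date = tokens[-2], tokens[-1]
    strand = molecule = topology = ""
    for token in tokens[4:-2]:
        if token in TOPOLOGIES:
            topology = token
        else:
            molecule = token
    # a strand prefix is written as e.g. "ss-RNA"
    prefix, sep, rest = molecule.partition("-")
    if sep and prefix in STRANDS:
        strand, molecule = prefix, rest
    return LocusInfo(name, length, strand, molecule, topology, division, date)
```

Taking division and date from the end of the token list keeps the sketch working whether or not the optional molecule type and topology tokens are present.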
Methods:
| Method | Returns |
|---|---|
| getSequence() | the sequence as a string |
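The relationship between sequenceList, getSequence() and baseCount can be illustrated with the ORIGIN block at the end of a GenBank record, where the sequence is printed in numbered lines of ten-base blocks. In this hypothetical sketch, `parse_origin` collects those blocks into a list, `get_sequence` joins them into a string, and `base_count` tallies bases with `collections.Counter`. GBcrawler's own list layout, and whether its baseCount is read from the file or computed, may differ.

```python
from collections import Counter


def parse_origin(lines):
    """Collect the sequence blocks from the lines that follow an ORIGIN line.

    Each line looks like '       61 gatcctccat atacaacggt ...': a 1-based
    position number followed by blocks of up to ten bases. Parsing stops at
    the '//' record terminator.
    """
    chunks = []
    for line in lines:
        if line.startswith("//"):
            break
        tokens = line.split()
        chunks.extend(tokens[1:])  # drop the position number
    return chunks


def get_sequence(chunks):
    """Join the blocks into one sequence string."""
    return "".join(chunks)


def base_count(sequence):
    """Count each base, keeping the case used in the file."""
    return dict(Counter(sequence))
```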
The featureList contains GBfeature objects, whose data can be accessed through the following attribute references:
| Attribute | Returns |
|---|---|
| begin | start position of the feature |
| end | end position of the feature |
| type | type of the feature (gene, CDS, ...) |
| qualifierDict | a dictionary with a key and a value for each qualifier |
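To show how begin, end, type and qualifierDict map onto the raw feature table, here is a minimal sketch of a feature-table reader. The `Feature` class and `parse_features` function are hypothetical, not GBcrawler's implementation. They assume the standard column layout (feature key starting in column 6, location and qualifiers starting in column 22) and expect only the lines between the FEATURES header and the next section. begin and end are simply the first and last numbers in the location string, which ignores the meaning of operators such as complement(), join(), < and >.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Feature:
    type: str      # feature key, e.g. "gene" or "CDS"
    location: str  # raw location string, e.g. "complement(3300..4037)"
    qualifierDict: dict = field(default_factory=dict)

    @property
    def begin(self) -> int:
        # first number in the location (assumes at least one is present)
        return int(re.findall(r"\d+", self.location)[0])

    @property
    def end(self) -> int:
        # last number in the location, i.e. the overall span for join(...)
        return int(re.findall(r"\d+", self.location)[-1])


def parse_features(lines):
    """Parse feature-table lines (without the FEATURES header) into Feature objects."""
    features = []
    last_key = None  # qualifier whose value may continue on the next line
    for line in lines:
        if not line.strip():
            continue
        if line[5:6].strip():
            # a feature key in columns 6-20 starts a new feature
            features.append(Feature(type=line[5:21].strip(), location=line[21:].strip()))
            last_key = None
            continue
        text = line[21:].strip()
        current = features[-1]
        if text.startswith("/"):
            # "/name=value" or a bare flag such as "/pseudo" (stored as "")
            key, _, value = text[1:].partition("=")
            current.qualifierDict[key] = value.strip('"')  # repeated keys overwrite
            last_key = key
        elif last_key is None:
            # the location itself wrapped onto the next line
            current.location += text
        else:
            # wrapped qualifier value; /translation wraps without spaces
            sep = "" if last_key == "translation" else " "
            joined = current.qualifierDict[last_key] + sep + text
            current.qualifierDict[last_key] = joined.strip('"')
    return features
```

Storing one string per qualifier keeps the sketch short; a fuller reader would keep a list for qualifiers that may repeat, such as /db_xref.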
Discrepancies
The latest Flat File release (220.0) lists a set of features that differs from the INSDC feature table. The release notes state: "Any discrepancy between the abbreviated feature table description of these release notes and the complete documentation on the Web should be resolved in favor of the version at the above URL."
For now, both sets of features are used to parse GenBank files, until a large batch of GenBank files can be tested to check how many of them use the "non-standard" features.
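One way to gather the count described above is sketched here. Both key sets are placeholders to be filled in from the release notes and the INSDC feature table, and "non-standard" is assumed to mean a key that appears in the flat-file release notes but not in the INSDC table. `classify_keys` and `count_files_with_nonstandard` are hypothetical helper names.

```python
from typing import Iterable, Mapping


def classify_keys(keys: Iterable[str], insdc_keys: set, flatfile_keys: set):
    """Split feature keys into (INSDC, flat-file-only, unknown) sets."""
    standard, nonstandard, unknown = set(), set(), set()
    for key in keys:
        if key in insdc_keys:
            standard.add(key)
        elif key in flatfile_keys:
            nonstandard.add(key)  # accepted, but only through the release-notes list
        else:
            unknown.add(key)
    return standard, nonstandard, unknown


def count_files_with_nonstandard(files: Mapping[str, Iterable[str]],
                                 insdc_keys: set, flatfile_keys: set) -> int:
    """Count files that use at least one flat-file-only key.

    `files` maps a file name to the feature keys found in that file.
    """
    return sum(
        1 for keys in files.values()
        if classify_keys(keys, insdc_keys, flatfile_keys)[1]
    )
```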
Future features
- export to FASTA
- check features for mandatory qualifiers
- faster performance
- tag features by their locus_tag
- use additional location information for features (positions beyond a given base, sites between bases, etc.)
- create a Reference class for better management of reference data
- improve ACCESSION parsing to return a list
- improve SOURCE parsing
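As an illustration of the planned FASTA export, a hypothetical `to_fasta` helper only needs an identifier and the sequence string. The 70-character line width is an arbitrary assumption, not a GBcrawler setting.

```python
def to_fasta(header: str, sequence: str, width: int = 70) -> str:
    """Return one FASTA record: a '>' header line, then the sequence wrapped at `width`."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines = [">" + header]
    # slice the sequence into fixed-width lines
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

With a parsed record it could, for example, be called as `to_fasta(GBobject.version, GBobject.getSequence())`, assuming version holds a string suitable as a FASTA identifier.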
Download files
File details
Details for the file gbcrawler-0.3.0.tar.gz.
File metadata
- Download URL: gbcrawler-0.3.0.tar.gz
- Upload date:
- Size: 5.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f2f06b4451b25ad891aeaf40e91777603140f289771a6306b1e6b61f37ad2805 |
| MD5 | 650f1fc5f24b7f481603a2e38fc48a84 |
| BLAKE2b-256 | 51b128e36961e435bf5e5652c1305b7c118834dc270970eced5a83ad62d9f1d2 |
File details
Details for the file gbcrawler-0.3.0-py3-none-any.whl.
File metadata
- Download URL: gbcrawler-0.3.0-py3-none-any.whl
- Upload date:
- Size: 21.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14d07a4949ed1c78a6938ce2d4cc6aa593b939411c96955510543485686eaf6b |
| MD5 | 9af3969bbf67a69a55dc00e183577e3e |
| BLAKE2b-256 | c8b8a7621192ecf7034a98e610eeb8d0bce6f85bc453bf1bc7e4dfebefc66fa9 |
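The digests listed above can be checked after downloading. This is a minimal sketch using Python's standard `hashlib`; `file_digest` and `matches` are hypothetical helper names. BLAKE2b-256 corresponds to `hashlib.blake2b(digest_size=32)`, because the default BLAKE2b digest is 512 bits.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hex digest of a file, read in chunks so large files need not fit in memory.

    Accepts the algorithm names used in the tables: "SHA256", "MD5", "BLAKE2b-256".
    """
    name = algorithm.lower()
    if name == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)  # 256-bit BLAKE2b
    else:
        hasher = hashlib.new(name)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published one, ignoring case and surrounding spaces."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches("gbcrawler-0.3.0.tar.gz", "f2f06b4451b25ad891aeaf40e91777603140f289771a6306b1e6b61f37ad2805")` should return True for an unmodified copy of the source distribution.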