Complete GenBank parser
GBcrawler
GBcrawler is a complete GenBank parser in Python 3 meant to be used by other applications.
It generates a Python object of the GBcrawler class that stores all relevant information from a GenBank file.
Features
This script was made using the information from:
Installation
Python 3 is required.
Copy GBcrawler.py to your project folder and import it as explained in Usage.
Usage
To get the GenBank data, create an object with the GenBank filename as a parameter:
from GBcrawler import GBcrawler
GBobject = GBcrawler("tth1.gb")
The following data can be accessed through attributes or methods.
Attribute references:
sequenceID
returns the sequence identification
sequenceLength
returns the length of the sequence
strand
returns the strand type
moleculeType
returns the molecule type
division
returns the division code
modDate
returns the date
definition
returns the definition
accession
returns the accession
version
returns the version
referenceList
returns a list in which each element is a reference
comment
returns all the comments as a single string
featureList
returns a list of GBfeature objects
sequenceList
returns the sequence as a list (see Methods to get the sequence as a string)
baseCount
returns a dictionary with nucleotide counts
Methods:
getSequence()
returns sequence as a string
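The relationship between sequenceList, getSequence() and baseCount can be illustrated without a GenBank file at hand. The following is a minimal sketch, not GBcrawler's own code: join_sequence and count_bases are hypothetical helpers, and the sketch assumes that sequenceList holds string pieces of the sequence.

```python
from collections import Counter


def join_sequence(sequence_list):
    """Concatenate the pieces of a sequence into one string."""
    return "".join(sequence_list)


def count_bases(sequence):
    """Count each base, ignoring case."""
    return dict(Counter(sequence.lower()))


# Stand-in for GBobject.sequenceList (assumed here to be a list of strings).
pieces = ["acgt", "ttga"]
sequence = join_sequence(pieces)
counts = count_bases(sequence)
print(sequence)
print(counts)
```

With a real object, getSequence() and baseCount already provide these values; the helpers only show what kind of data to expect.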
The featureList is composed of GBfeature objects, whose data can be accessed using the following attribute references:
begin
returns the start position of the feature
end
returns the end position of the feature
type
returns the type of the feature (gene, CDS, ...)
qualifierDict
returns a dictionary with keys and values for each qualifier
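A common task is to pick out the features of one type and read their qualifiers. The sketch below uses stand-in objects that carry the four attribute names documented above, so it runs without GBcrawler. The helper features_of_type, the demo values, and the assumption that begin and end hold integer coordinates are illustrative only.

```python
from types import SimpleNamespace


def features_of_type(feature_list, feature_type):
    """Return the features whose type attribute equals feature_type."""
    return [f for f in feature_list if f.type == feature_type]


# Stand-ins for GBfeature objects; real ones come from GBobject.featureList.
demo_features = [
    SimpleNamespace(begin=1, end=90, type="gene",
                    qualifierDict={"locus_tag": "DEMO_0001"}),
    SimpleNamespace(begin=1, end=90, type="CDS",
                    qualifierDict={"locus_tag": "DEMO_0001",
                                   "product": "demo protein"}),
    SimpleNamespace(begin=120, end=300, type="gene",
                    qualifierDict={"locus_tag": "DEMO_0002"}),
]

genes = features_of_type(demo_features, "gene")
for feature in genes:
    # .get() avoids a KeyError when a qualifier is absent.
    print(feature.begin, feature.end, feature.qualifierDict.get("locus_tag"))
```

Passing GBobject.featureList instead of demo_features should work the same way, provided the attributes behave as described.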
Discrepancies
The latest Flat File release, 220.0, has a set of features that differs from the INSDC feature table. The release notes state: "Any discrepancy between the abbreviated feature table description of these release notes and the complete documentation on the Web should be resolved in favor of the version at the above URL."
At the moment, both sets of features are used to parse GenBank files, until a large batch of files can be tested to check how many of them use the "non-standard" features.
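The batch check described here could be sketched as follows. This is an assumption about how such a survey might be done, not part of GBcrawler: files_with_nonstandard_keys is a hypothetical helper, the file names are placeholders, and "made_up_key" is an invented key used only to trigger the report.

```python
def files_with_nonstandard_keys(keys_by_file, standard_keys):
    """Map each file name to its sorted feature keys missing from standard_keys."""
    report = {}
    for filename, keys in keys_by_file.items():
        extra = sorted(set(keys) - set(standard_keys))
        if extra:
            report[filename] = extra
    return report


# Illustrative data only: "gene" and "CDS" are real feature keys,
# "made_up_key" and the file names are placeholders.
standard = {"gene", "CDS"}
observed = {
    "a.gb": ["gene", "CDS", "gene"],
    "b.gb": ["gene", "made_up_key"],
}
report = files_with_nonstandard_keys(observed, standard)
print(report)
```

In practice, each list in observed could be built from the type attribute of every element in a parsed file's featureList.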
Future features
- export to FASTA
- check features for mandatory qualifiers
- faster performance
- tag features by their locus_tag
- use additional info for the features (beyond, between bases, etc.)
- create a Reference class for better management of reference data
- improve ACCESSION to return a list
- improve SOURCE parsing
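Until FASTA export is built in, a caller could format the string returned by getSequence() by hand. The function below is a hypothetical sketch and not part of GBcrawler; passing sequenceID and definition as the header fields is an assumption about what a caller would want.

```python
def to_fasta(seq_id, description, sequence, width=60):
    """Format one record as FASTA text with fixed-width sequence lines."""
    header = f">{seq_id} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


# Demo values; with a real object these might be GBobject.sequenceID,
# GBobject.definition and GBobject.getSequence().
record = to_fasta("DEMO1", "demo record", "acgt" * 20, width=60)
print(record, end="")
```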