Complete GenBank parser

GBcrawler

GBcrawler is a complete GenBank parser in Python 3 meant to be used by other applications.

It generates a Python object of the GBcrawler class that stores all relevant information from a GenBank file.

Table of Contents

Features

Installation

Usage

Discrepancies

Future features

Features

This script was made using the information from:

  • NCBI-GenBank Flat File Release available at NCBI
  • Features table available at INSDC.

Installation

Python 3 is required.

Copy GBcrawler.py to your project folder and import it as explained in Usage.

Usage

To get the GenBank data, create an object with the GenBank filename as a parameter:

from GBcrawler import GBcrawler
GBobject = GBcrawler("tth1.gb")

The following data can be acquired through attributes or methods.

Attribute references:

  • sequenceID returns sequence identification
  • sequenceLength returns length of sequence
  • strand returns the strand type
  • moleculeType returns molecule type
  • division returns division code
  • modDate returns date
  • definition returns definition
  • accession returns accession
  • version returns version
  • referenceList returns a list, each element is a reference
  • comment returns all the comments as a string
  • featureList returns a list of GBfeature objects
  • sequenceList returns the sequence as a list (see Methods to get the sequence as a string)
  • baseCount returns dictionary with nucleotide counts

Methods:

  • getSequence() returns sequence as a string
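As a minimal sketch of how an application might read these fields, the function below collects a few of the attributes listed above into a dictionary. The function name `summarize_record` and the stand-in `demo` object are hypothetical and only let the example run without a GenBank file; in real use you would pass the object returned by `GBcrawler("tth1.gb")`. The length is taken from `getSequence()` because the type of `sequenceLength` is not stated here.

```python
from types import SimpleNamespace


def summarize_record(record):
    """Collect a few header fields from a GBcrawler-like object into a dict."""
    sequence = record.getSequence()  # sequence as a single string
    return {
        "id": record.sequenceID,
        "definition": record.definition,
        "molecule": record.moleculeType,
        "length": len(sequence),
        "n_features": len(record.featureList),
    }


# Hypothetical stand-in that mimics the attribute names documented above.
demo = SimpleNamespace(
    sequenceID="DEMO0001",
    definition="Hypothetical demo record.",
    moleculeType="DNA",
    featureList=[],
    getSequence=lambda: "atgcatgc",
)

summary = summarize_record(demo)
print(summary)
```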

The featureList is composed of GBfeature objects, whose data can be acquired through the following attribute references:

  • begin returns the start position of the feature
  • end returns the end position of the feature
  • type returns the type of the feature (gene, CDS, ...)
  • qualifierDict returns a Dictionary with keys and values for each qualifier
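The sketch below shows one way to filter featureList by feature type and read a qualifier from each match. It is an illustration under assumptions: the helper `features_of_type` and the stand-in objects are hypothetical, and the qualifier key spelling (`"locus_tag"` without a leading slash) and the integer coordinates are guesses about how GBcrawler stores them.

```python
from types import SimpleNamespace


def features_of_type(record, feature_type):
    """Return the features whose type matches feature_type (e.g. "CDS")."""
    return [f for f in record.featureList if f.type == feature_type]


# Hypothetical stand-ins that mimic the GBfeature attribute names above.
demo = SimpleNamespace(featureList=[
    SimpleNamespace(type="gene", begin=1, end=90,
                    qualifierDict={"locus_tag": "DEMO_0001"}),
    SimpleNamespace(type="CDS", begin=1, end=90,
                    qualifierDict={"locus_tag": "DEMO_0001",
                                   "product": "hypothetical protein"}),
    SimpleNamespace(type="CDS", begin=120, end=300,
                    qualifierDict={"product": "hypothetical protein"}),
])

cds = features_of_type(demo, "CDS")
# dict.get returns None when a feature lacks the qualifier.
tags = [f.qualifierDict.get("locus_tag") for f in cds]

for feature, tag in zip(cds, tags):
    print(feature.begin, feature.end, tag)
```

Using `dict.get` avoids a KeyError for features that do not carry the requested qualifier.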

Discrepancies

The latest Flat File release (220.0) has a set of features that differs from the feature table in INSDC. The release indicates "Any discrepancy between the abbreviated feature table description of these release notes and the complete documentation on the Web should be resolved in favor of the version at the above URL."

At the moment, both sets of features are used to parse GenBank files, until a large batch of GenBank files can be tested to check how many of them use the "non-standard" features.

Future features

  • export to FASTA
  • check features for mandatory qualifiers
  • faster performance
  • tag features by their locus_tag
  • use additional info for the features (beyond, between bases, etc.)
  • create a Reference class for better reference data management
  • improve ACCESSION to return a list
  • improve SOURCE parsing
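Until FASTA export is available, something similar can be done in the calling application. The following is a rough sketch, not part of GBcrawler: the function `to_fasta`, its `width` parameter, and the stand-in `demo` object are hypothetical, and it relies only on the sequenceID and definition attributes and the getSequence() method described in Usage.

```python
from types import SimpleNamespace


def to_fasta(record, width=70):
    """Format a GBcrawler-like object as a FASTA string."""
    header = f">{record.sequenceID} {record.definition}".rstrip()
    sequence = record.getSequence()
    # Split the sequence into fixed-width lines.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


# Hypothetical stand-in; in real use pass the object built from a .gb file.
demo = SimpleNamespace(
    sequenceID="DEMO0001",
    definition="Hypothetical demo record.",
    getSequence=lambda: "acgtacgtac",
)

fasta_text = to_fasta(demo, width=4)
print(fasta_text, end="")
```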
