
A package for retrieving exon data for protein sequences from the NCBI RefSeqGene database


CODX


codx is a Python package that retrieves exon data from the NCBI RefSeqGene database.

Installation

pip install codx

Usage Python Package


The package uses gene IDs to retrieve exon data from the NCBI RefSeqGene database. If you only have the UniProt accession ID of a protein, the get_geneids_from_uniprot function can be used to look up the corresponding gene ID in the UniProt database.

# If you only have UniProt accession IDs, first use the get_geneids_from_uniprot function to look up the corresponding gene IDs in UniProt
from codx.components import get_geneids_from_uniprot

gene_ids = get_geneids_from_uniprot(["P35568", "P05019", "Q99490", "Q8NEJ0", "Q13322", "Q15323"])
# The result is a set of gene IDs that UniProt associates with the accession IDs listed above

# Import the create_db function to create a sqlite3 database with gene and exon data from NCBI
from codx.components import create_db


# 120892 is the gene ID of the LRRK2 gene
db = create_db(["120892"], entrez_email="your@email.com") # You need to provide an email address to use the NCBI API

# From the database object, you can retrieve a gene object using its gene name
gene = db.get_gene("LRRK2")

# From the gene object, you can retrieve exon data through the blocks attribute
# Each exon object has its start and end location as well as the associated sequence
for exon in gene.blocks:
    print(exon.start, exon.end, exon.sequence)

# Using the gene object, it is also possible to create all possible ordered combinations of exons
# shuffle_blocks() returns a generator that yields a SeqRecord object for each combination
# The number of combinations can be very large, so avoid this on genes with many exons unless there is no other option
for exon_combination in gene.shuffle_blocks():
    print(exon_combination)

# To create a six-frame translation of any sequence, call the three_frame_translation function twice: once without and once with the reverse_complement option enabled
# Each output is a dictionary with the frame as key and the translatable sequence as value
from codx.components import three_frame_translation
for exon_combination in gene.shuffle_blocks():
    three_frame = three_frame_translation(exon_combination.seq, only_start_at_atg=True)
    three_frame_complement = three_frame_translation(exon_combination.seq, only_start_at_atg=True, reverse_complement=True)
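
To make the idea of a six-frame translation concrete, here is a minimal, dependency-free sketch of the general technique. It is not the codx implementation: the names translate_frames, reverse_complement and six_frame_translation are hypothetical, and it does not model the only_start_at_atg option. It translates each of the three forward offsets with the standard genetic code, then does the same on the reverse complement.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate_frames(dna):
    """Translate the three forward reading frames.

    Keys are the frame offsets 0, 1 and 2. Stop codons are written as '*',
    and codons containing unknown bases are written as 'X'.
    """
    dna = dna.upper()
    frames = {}
    for offset in range(3):
        # Step through complete codons only; a trailing partial codon is ignored.
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames[offset] = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    return frames


def six_frame_translation(dna):
    """Combine forward and reverse-complement translations (illustration only)."""
    return {
        "forward": translate_frames(dna),
        "reverse": translate_frames(reverse_complement(dna)),
    }


if __name__ == "__main__":
    for strand, frames in six_frame_translation("ATGGCCTAA").items():
        for offset, protein in frames.items():
            print(strand, offset, protein)
```

In this sketch the stop codon is kept as `*` rather than truncating the protein, which makes it easy to see where each frame would terminate.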

Usage Command Line


In addition to the Python API, installing the package also provides a command-line interface (CLI) that can be used for the same purpose.

Usage: codx [OPTIONS] IDS

Options:
  -o, --output TEXT              Output file
  -i, --include-intron           Include intron
  -u, --uniprot                  Input is Uniprot accession ids
  -t, --translate                Translate to protein
  -3, --three-frame-translation  Translate to protein in 3 frames
  -6, --six-frame-translation    Translate to protein in 6 frames (3 forward
                                 and 3 reverse complement)
  --help                         Show this message and exit.

Here, the IDS positional argument is a list of gene IDs or UniProt accession IDs separated by commas (,).

Example usage:

codx -o output.fasta -u P35568,P05019,Q99490,Q8NEJ0,Q13322,Q15323
codx -o output.fasta 1190,120892
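
The output can then be read back in Python. The following is a minimal sketch that assumes the output file is plain FASTA, as the output.fasta file name suggests; the exact header layout written by codx is not documented here, so headers are returned unparsed. The read_fasta function is a hypothetical helper, not part of codx, and the records in the demo are made up.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Made-up records standing in for the contents of output.fasta.
    demo = io.StringIO(">record_1\nATGGCC\nTAA\n>record_2\nATGTAG\n")
    for header, sequence in read_fasta(demo):
        print(header, len(sequence))
```

To process a real file, pass an open file object instead of the StringIO, for example `with open("output.fasta") as handle:`.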
