A package for retrieving exons for protein sequences from the NCBI RefSeqGene database
CODX
codx is a Python package that allows retrieval of exon data from the NCBI RefSeqGene database.
Installation
pip install codx
Usage: Python Package
The package uses NCBI gene IDs to retrieve exon data from the RefSeqGene database. If you start from a protein, you can obtain its gene ID through the protein's UniProt accession ID: the get_geneids_from_uniprot function takes a list of UniProt accession IDs and returns the corresponding NCBI gene IDs.
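For readers curious how a UniProt accession can be mapped to an NCBI gene ID, here is a minimal sketch. It assumes the public UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/{accession}.json` and that NCBI gene IDs appear as `GeneID` entries in the entry's `uniProtKBCrossReferences` list. This is not how `get_geneids_from_uniprot` is implemented; `gene_ids_from_entry` and `fetch_gene_ids` are hypothetical helpers.

```python
import json
import urllib.request

# Assumed UniProt REST endpoint; not part of codx.
UNIPROT_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def gene_ids_from_entry(entry: dict) -> set[str]:
    """Collect NCBI gene IDs from a parsed UniProt JSON entry (hypothetical helper)."""
    return {
        ref["id"]
        for ref in entry.get("uniProtKBCrossReferences", [])
        if ref.get("database") == "GeneID"
    }


def fetch_gene_ids(accession_ids: list[str]) -> set[str]:
    """Download each UniProt entry and merge its gene IDs (requires network access)."""
    gene_ids: set[str] = set()
    for accession in accession_ids:
        with urllib.request.urlopen(UNIPROT_URL.format(accession=accession)) as response:
            entry = json.load(response)
        gene_ids |= gene_ids_from_entry(entry)
    return gene_ids
```

Separating parsing from downloading keeps the mapping logic testable offline: for a toy entry such as `{"uniProtKBCrossReferences": [{"database": "GeneID", "id": "120892"}]}`, `gene_ids_from_entry` returns `{"120892"}`.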
Example Usage
Retrieve Gene IDs from UniProt
from codx.components import get_geneids_from_uniprot
# Example UniProt accession IDs
accession_ids = ["P35568", "P05019", "Q99490", "Q8NEJ0", "Q13322", "Q15323"]
gene_ids = get_geneids_from_uniprot(accession_ids)
print(gene_ids) # Output: Set of gene IDs
Create a Database and Retrieve Gene Data
from codx.components import create_db
# Create a database with gene and exon data from NCBI
db = create_db(["120892"], entrez_email="your@email.com") # Provide an email address for NCBI API
# Retrieve a gene object using its gene name
gene = db.get_gene("LRRK2")
# Retrieve exon data from the gene object
for exon in gene.blocks:
    print(exon.start, exon.end, exon.sequence)
# Generate all possible ordered combinations of exons
for exon_combination in gene.shuffle_blocks():
    print(exon_combination)
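The exact semantics of `shuffle_blocks` are not spelled out here. One plausible reading of "ordered combinations" is every non-empty subset of exons kept in genomic order, mirroring how exon skipping removes exons without reordering them. The sketch below implements that reading with `itertools.combinations`; `Exon` and `ordered_exon_combinations` are hypothetical stand-ins, not codx's own classes or functions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Exon:
    """Hypothetical stand-in for a codx exon block."""
    start: int
    end: int
    sequence: str


def ordered_exon_combinations(exons: list[Exon]) -> list[str]:
    """Return the spliced sequence of every non-empty, order-preserving subset of exons."""
    ordered = sorted(exons, key=lambda exon: exon.start)  # put exons in genomic order
    spliced = []
    for size in range(1, len(ordered) + 1):
        # combinations() keeps the input order, so exons are never rearranged
        for subset in combinations(ordered, size):
            spliced.append("".join(exon.sequence for exon in subset))
    return spliced
```

Under this reading, n exons yield 2^n - 1 combinations (7 for three exons), so the output grows quickly for genes with many exons.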
Six-Frame Translation of Sequences
from codx.components import three_frame_translation
# Generate six-frame translation of exon combinations
for exon_combination in gene.shuffle_blocks():
    three_frame = three_frame_translation(exon_combination.seq, only_start_with_codons=["ATG"])
    three_frame_complement = three_frame_translation(exon_combination.seq, only_start_with_codons=["ATG"], reverse=True)
    print(three_frame)
    print(three_frame_complement)
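To show what "six-frame" means, here is a self-contained sketch using the standard genetic code (NCBI table 1): three reading frames on the forward strand plus three on the reverse complement. It is a generic illustration, not `three_frame_translation` itself, and it does not model the `only_start_with_codons` option; `six_frame_translation` and its helpers are hypothetical names.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, offset: int) -> str:
    """Translate complete codons starting at offset; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> list[str]:
    """Forward frames at offsets 0, 1, 2, then the same offsets on the reverse complement."""
    forward = [translate_frame(seq, offset) for offset in range(3)]
    rc = reverse_complement(seq)
    reverse = [translate_frame(rc, offset) for offset in range(3)]
    return forward + reverse
```

For example, `six_frame_translation("ATGGCCTAA")` returns `["MA*", "WP", "GL", "LGH", "*A", "RP"]`; trailing bases that do not fill a whole codon are dropped.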
Usage: Command Line
In addition to the Python API, the package provides a command-line interface (CLI) for the same tasks.
CLI Usage
Usage: codx [OPTIONS] IDS
Options:
  -o, --output TEXT                Output file
  -i, --include-intron             Include intron
  -u, --uniprot                    Input is UniProt accession IDs
  -t, --translate                  Translate to protein
  -3, --three-frame-translation    Translate to protein in 3 frames
  -6, --six-frame-translation      Translate to protein in 6 frames (3 forward and 3 reverse complement)
  --help                           Show this message and exit.
Example CLI Usage
Retrieve data using UniProt accession IDs:
codx -o output.fasta -u P35568,P05019,Q99490,Q8NEJ0,Q13322,Q15323
Retrieve data using gene IDs:
codx -o output.fasta 1190,120892
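If you want to drive the CLI from a Python script, one minimal approach is to build the argument list, run it with `subprocess`, and read the resulting FASTA file. This sketch assumes `codx` is on your `PATH` and uses only the flags listed above; `build_codx_command`, `parse_fasta`, and `run_codx` are hypothetical helpers, and the content of the FASTA headers codx writes is not specified here.

```python
import subprocess
from pathlib import Path


def build_codx_command(ids: list[str], output: str, uniprot: bool = False,
                       translate: bool = False) -> list[str]:
    """Assemble a codx invocation; IDs are joined with commas as in the examples above."""
    command = ["codx", "-o", output]
    if uniprot:
        command.append("-u")  # input is UniProt accession IDs
    if translate:
        command.append("-t")  # translate to protein
    command.append(",".join(ids))
    return command


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its sequence; a repeated header replaces the earlier record."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def run_codx(ids: list[str], output: str = "output.fasta", **flags: bool) -> dict[str, str]:
    """Run the CLI (requires codx to be installed) and parse its FASTA output."""
    subprocess.run(build_codx_command(ids, output, **flags), check=True)
    return parse_fasta(Path(output).read_text())
```

For instance, `build_codx_command(["1190", "120892"], "output.fasta")` produces the same arguments as the gene ID example above.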
Download files
Source Distribution
codx-0.1.4.tar.gz (8.8 kB)
Built Distribution
codx-0.1.4-py3-none-any.whl (10.2 kB)