A package for retrieving exon data for protein sequences from the NCBI RefSeqGene database

Project description

CODX

codx is a Python package that allows retrieval of exon data from the NCBI RefSeqGene database.

Installation

pip install codx

Usage Python Package

The package uses NCBI gene IDs to retrieve exon data from the NCBI RefSeqGene database. If you only have UniProt accession IDs for the proteins of interest, the get_geneids_from_uniprot function looks up the corresponding gene IDs through UniProt.

Example Usage

Retrieve Gene IDs from UniProt

from codx.components import get_geneids_from_uniprot

# Example UniProt accession IDs
accession_ids = ["P35568", "P05019", "Q99490", "Q8NEJ0", "Q13322", "Q15323"]
gene_ids = get_geneids_from_uniprot(accession_ids)
print(gene_ids)  # Output: Set of gene IDs

Create a Database and Retrieve Gene Data

from codx.components import create_db

# Create a database with gene and exon data from NCBI
db = create_db(["120892"], entrez_email="your@email.com")  # Provide an email address for NCBI API

# Retrieve a gene object using its gene name
gene = db.get_gene("LRRK2")

# Retrieve exon data from the gene object
for exon in gene.blocks:
    print(exon.start, exon.end, exon.sequence)

# Generate all possible ordered combinations of exons
for exon_combination in gene.shuffle_blocks():
    print(exon_combination)
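
To illustrate what "ordered combinations" of exons can mean, here is a minimal standalone sketch. It does not use codx and is not the implementation of shuffle_blocks. It assumes that a combination is any non-empty subset of exons kept in their original order. The function name ordered_combinations and the exon labels are hypothetical.

```python
from itertools import combinations


def ordered_combinations(blocks):
    """Yield every non-empty subset of blocks, preserving their original order."""
    for size in range(1, len(blocks) + 1):
        # combinations() emits tuples in input order, so exon order is kept
        yield from combinations(blocks, size)


# Hypothetical exon labels standing in for real exon objects
exons = ["E1", "E2", "E3"]
for combo in ordered_combinations(exons):
    print("-".join(combo))
```

Under this assumption the number of combinations is 2^n - 1 for n exons, so the output grows quickly for genes with many exons.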

Six-Frame Translation of Sequences

from codx.components import three_frame_translation

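# Generate six-frame translation of exon combinations
# (assumes `gene` was retrieved with db.get_gene as in the previous example)
for exon_combination in gene.shuffle_blocks():
    # Three forward frames, keeping only translations that start with ATG
    three_frame = three_frame_translation(exon_combination.seq, only_start_with_codons=["ATG"])
    # Three frames on the reverse strand (reverse=True)
    three_frame_complement = three_frame_translation(exon_combination.seq, only_start_with_codons=["ATG"], reverse=True)
    print(three_frame)
    print(three_frame_complement)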
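
For readers unfamiliar with the term, the following standalone sketch shows the general idea of a six-frame translation: three reading frames on the forward strand and three on the reverse complement. It is written from scratch with the standard genetic code and is not the codx implementation. The function names are hypothetical, and the start-codon filtering done by only_start_with_codons is deliberately left out.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to their translated sequences."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```

Trailing bases that do not form a complete codon are dropped in each frame.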

Usage Command Line

In addition to the Python API, the package provides a command-line interface for the same purpose.

CLI Usage

Usage: codx [OPTIONS] IDS

Options:
  -o, --output TEXT              Output file
  -i, --include-intron           Include intron
  -u, --uniprot                  Input is UniProt accession IDs
  -t, --translate                Translate to protein
  -3, --three-frame-translation  Translate to protein in 3 frames
  -6, --six-frame-translation    Translate to protein in 6 frames (3 forward and 3 reverse complement)
  --help                         Show this message and exit.

Example CLI Usage

Retrieve data using UniProt accession IDs:

codx -o output.fasta -u P35568,P05019,Q99490,Q8NEJ0,Q13322,Q15323

Retrieve data using gene IDs:

codx -o output.fasta 1190,120892
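
The examples above write to a file named output.fasta. Assuming that file follows the usual FASTA layout (a header line starting with ">" followed by one or more sequence lines), which the extension suggests but this page does not state, a minimal reader could look like the sketch below. The read_fasta helper and the sample records are hypothetical and do not reflect the actual header format codx writes.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records used only to demonstrate the parser
sample = [">record_1", "ATGGCC", "TAA", "", ">record_2", "ATGAAA"]
for name, sequence in read_fasta(sample):
    print(name, len(sequence))
```

To read a real file, pass an open file handle instead of the sample list, for example `with open("output.fasta") as handle: records = list(read_fasta(handle))`.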

Download files

Source Distribution

codx-0.1.4.tar.gz (8.8 kB)

Uploaded Source

Built Distribution

codx-0.1.4-py3-none-any.whl (10.2 kB)

Uploaded Python 3
