Package to load genes from GENCODE GTF files

GENCODEGenes

This package loads genes from GENCODE GTF/GFF files, groups transcripts by gene, and provides transcript methods for finding exon coordinates, CDS distances, and sequences.
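A GTF file is tab-separated with nine columns; the last one holds key "value"; attribute pairs. The loading step described above can be sketched with the standard library as follows. This is not gencodegenes' own parser: it assumes GENCODE's gene_name, transcript_id, transcript_type and tag attributes, and parse_attributes and group_transcripts are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# GTF attribute pairs look like: gene_id "ENSG..."; tag "basic"; tag "CCDS";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Parse GTF column 9 into a dict; repeated 'tag' keys are collected in a list."""
    attrs = {}
    for key, value in ATTR_RE.findall(field):
        if key == 'tag':
            attrs.setdefault('tag', []).append(value)
        else:
            attrs[key] = value
    return attrs


def group_transcripts(lines, coding_only=True):
    """Group exon/CDS rows as {gene_name: {transcript_id: record}} (hypothetical helper)."""
    genes = defaultdict(dict)
    for line in lines:
        if line.startswith('#'):
            continue  # header/comment lines
        cols = line.rstrip('\n').split('\t')
        if len(cols) != 9:
            continue  # skip malformed rows
        chrom, _, feature, start, end, _, strand, _, attr_field = cols
        if feature not in ('exon', 'CDS'):
            continue  # gene/transcript/UTR rows are not needed for this sketch
        attrs = parse_attributes(attr_field)
        tx_type = attrs.get('transcript_type')
        if coding_only and tx_type != 'protein_coding':
            continue
        record = genes[attrs['gene_name']].setdefault(
            attrs['transcript_id'],
            {'chrom': chrom, 'strand': strand, 'type': tx_type,
             'tags': attrs.get('tag', []), 'exons': [], 'cds': []},
        )
        record['exons' if feature == 'exon' else 'cds'].append((int(start), int(end)))
    return genes
```

Unquoted values such as level 2 are ignored by the regular expression, which is harmless here because they are not used. Passing coding_only=False mirrors the package's documented option of keeping non-coding transcripts.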

Install

pip install gencodegenes

Usage

from gencodegenes import Gencode

gencode = Gencode(GTF_PATH)
# full function arguments are Gencode(gtf_path, fasta_path=None, coding_only=True)
#  - fasta_path: pass in path to fasta file to get gene transcripts with sequence
#  - coding_only: pass in False to include all transcripts, not just protein coding

# get gene by HGNC symbol
gene = gencode['OR5A1']
transcripts = gene.transcripts
canonical = gene.canonical  # the MANE transcript if available; otherwise the
                            # transcript tagged appris_principal (the one with the
                            # longest CDS if several are tagged); otherwise the
                            # longest protein-coding transcript; otherwise the
                            # longest cDNA
gene.start, gene.end, gene.chrom, gene.strand, gene.symbol  # other gene attributes


# find gene nearest a genomic position, or overlapping a genomic region
gencode.nearest('chr1', 1000000)
gencode.in_region('chr1', 1000000, 2000000)

# transcript methods (pos: a genomic position on tx.chrom,
# cds_pos: a position within the CDS, seq: a DNA sequence)
tx = gene.canonical
tx.in_exons(pos)                         # check if pos in exons
tx.in_coding_region(pos)                 # check if pos in CDS
tx.get_coding_distance(pos)              # get distance in CDS to CDS start
tx.get_closest_exon(pos)                 # find exon closest to position
tx.get_position_on_chrom(cds_pos)        # convert CDS pos to genomic pos
tx.get_codon_info(pos)                   # get info about codon for a site
tx.get_codon_number_for_cds_pos(cds_pos) # convert CDS pos to codon number
tx.translate(seq)                        # translate DNA to AA (if opened with Fasta)

# the transcript also has associated data fields
tx.name         # transcript ID
tx.chrom        # transcript chromosome
tx.start        # transcript start (TSS)
tx.end          # transcript end
tx.cds_start    # CDS start position
tx.cds_end      # CDS end position 
tx.type         # transcript type e.g. protein_coding
tx.strand       # strand (+ or -)
tx.exons        # list of exon coordinates
tx.cds          # list of CDS coordinates
tx.cds_sequence # get cDNA sequence (if Gencode was opened with fasta)
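The canonical-transcript rule given in the gene.canonical comment can be written as a small priority function. This is a sketch, not the package's code: the Tx class is hypothetical, MANE is assumed to be marked by the GENCODE tag MANE_Select, APPRIS by any tag starting with appris_principal, and "longest protein coding" is assumed to mean longest CDS.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    """Hypothetical minimal transcript record (1-based inclusive coordinates)."""
    name: str
    type: str
    exons: list
    cds: list = field(default_factory=list)
    tags: list = field(default_factory=list)


def span_length(blocks):
    """Total length of (start, end) blocks, 1-based inclusive."""
    return sum(end - start + 1 for start, end in blocks)


def pick_canonical(transcripts):
    """Priority order: MANE > APPRIS principal > protein coding > any transcript."""
    mane = [t for t in transcripts if 'MANE_Select' in t.tags]
    if mane:
        return mane[0]
    appris = [t for t in transcripts
              if any(tag.startswith('appris_principal') for tag in t.tags)]
    if appris:
        # several APPRIS principals: keep the one with the longest CDS
        return max(appris, key=lambda t: span_length(t.cds))
    coding = [t for t in transcripts if t.type == 'protein_coding']
    if coding:
        # assumption: "longest protein coding" is measured by CDS length
        return max(coding, key=lambda t: span_length(t.cds))
    # no protein-coding transcript: fall back to the longest cDNA (sum of exons)
    return max(transcripts, key=lambda t: span_length(t.exons))
```

Each tier is only consulted when the tier above it is empty, so a short MANE transcript still wins over a longer APPRIS principal.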

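Methods such as tx.get_position_on_chrom and tx.get_coding_distance come down to walking the CDS blocks in transcript order, which runs downwards through the coordinates on the minus strand. A minimal sketch follows, assuming 1-based inclusive genomic coordinates (as in GTF files) and 0-based CDS offsets; the package's actual conventions may differ, and cds_to_genomic, genomic_to_cds and codon_number are hypothetical helpers, not gencodegenes API.

```python
def _ordered_blocks(cds, strand):
    """CDS blocks in transcript order: ascending on '+', descending on '-'."""
    blocks = sorted(cds)
    return blocks if strand == '+' else blocks[::-1]


def cds_to_genomic(cds, strand, cds_pos):
    """Map a 0-based CDS offset to a 1-based genomic coordinate."""
    if cds_pos < 0:
        raise ValueError('CDS position must be non-negative')
    offset = cds_pos
    for start, end in _ordered_blocks(cds, strand):
        length = end - start + 1
        if offset < length:
            # '+' counts up from the block start, '-' counts down from the block end
            return start + offset if strand == '+' else end - offset
        offset -= length
    raise ValueError(f'CDS position {cds_pos} lies beyond the end of the CDS')


def genomic_to_cds(cds, strand, pos):
    """Inverse mapping: 0-based CDS offset of a genomic position inside the CDS."""
    offset = 0
    for start, end in _ordered_blocks(cds, strand):
        if start <= pos <= end:
            return offset + (pos - start if strand == '+' else end - pos)
        offset += end - start + 1
    raise ValueError(f'position {pos} is not in the CDS')


def codon_number(cds_pos):
    """Return (codon index, position within codon), both 0-based."""
    return divmod(cds_pos, 3)
```

For a CDS made of blocks (100, 104) and (200, 209) on the + strand, CDS offset 5 falls on genomic position 200, the first base of the second block; on the - strand, offset 0 is position 209.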
Download files

Source Distribution

gencodegenes-1.1.2.tar.gz (313.9 kB)

Built Distributions

gencodegenes-1.1.2-cp312-cp312-win_amd64.whl (548.4 kB): CPython 3.12, Windows x86-64
gencodegenes-1.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB): CPython 3.12, manylinux (glibc 2.17+) x86-64
gencodegenes-1.1.2-cp312-cp312-macosx_10_9_x86_64.whl (561.1 kB): CPython 3.12, macOS 10.9+ x86-64
gencodegenes-1.1.2-cp311-cp311-win_amd64.whl (547.4 kB): CPython 3.11, Windows x86-64
gencodegenes-1.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB): CPython 3.11, manylinux (glibc 2.17+) x86-64
gencodegenes-1.1.2-cp311-cp311-macosx_10_9_x86_64.whl (560.1 kB): CPython 3.11, macOS 10.9+ x86-64
gencodegenes-1.1.2-cp310-cp310-win_amd64.whl (546.9 kB): CPython 3.10, Windows x86-64
gencodegenes-1.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
gencodegenes-1.1.2-cp310-cp310-macosx_10_9_x86_64.whl (559.4 kB): CPython 3.10, macOS 10.9+ x86-64
gencodegenes-1.1.2-cp39-cp39-win_amd64.whl (547.1 kB): CPython 3.9, Windows x86-64
gencodegenes-1.1.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB): CPython 3.9, manylinux (glibc 2.17+) x86-64
gencodegenes-1.1.2-cp39-cp39-macosx_10_9_x86_64.whl (559.9 kB): CPython 3.9, macOS 10.9+ x86-64
gencodegenes-1.1.2-cp38-cp38-win_amd64.whl (547.5 kB): CPython 3.8, Windows x86-64
gencodegenes-1.1.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64
gencodegenes-1.1.2-cp38-cp38-macosx_10_9_x86_64.whl (560.4 kB): CPython 3.8, macOS 10.9+ x86-64
