
Access and query the UCSC database with an elegant, human-readable API


UCSC-Genomic-REST-Api-Wrapper

An open-source Python package released under the MIT license. It is a Python API wrapper for the UCSC genomic database that makes it much easier for researchers to access and query the database through an elegant, human-readable API.

MIT License

About The Package

Project Proposal

Features

  • Expressive API

  • Easy to use

  • Can be extended

  • Can be reused

  • No boilerplate

Installation

Install ucsc-genomic-api with pip:

  pip install ucsc-genomic-api

Documentation

Quick Introduction for busy developers

There are 6 primary classes in the package:

from ucsc.api import Hub, Genome, Track, TrackSchema, Chromosome, Sequence  

Each class has the following primary methods:

# check documentation for required and optional parameters

className.get()  # Returns a list of objects of the class

className.find()  # Finds an object by name

className.findBy()  # Finds an object by a specified attribute

className.exists()  # Checks whether an object exists

You can then access the attributes of a returned object using dot notation:

obj.attributeName  # Returns the value of that attribute
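To make the get / find / findBy / exists pattern concrete, here is a minimal in-memory sketch of a class that exposes the same four methods. It is an illustration only: the `Record` class, its `_items` store, the `name` and `organism` attributes, and the `NotFoundError` name are all hypothetical and are not taken from the package's source.

```python
class NotFoundError(LookupError):
    """Hypothetical error raised when no object matches."""


class Record:
    """Toy stand-in for a class such as Hub or Genome (not the real implementation)."""

    _items = []  # in-memory store; the real package queries the UCSC database instead

    def __init__(self, **attributes):
        # Expose every keyword as an attribute so that obj.name works with dot notation.
        self.__dict__.update(attributes)

    @classmethod
    def get(cls):
        """Return all known objects as a list."""
        return list(cls._items)

    @classmethod
    def findBy(cls, attribute, value):
        """Return the first object whose attribute equals value, or raise."""
        for item in cls._items:
            if getattr(item, attribute, None) == value:
                return item
        raise NotFoundError(f"no object with {attribute}={value!r}")

    @classmethod
    def find(cls, name):
        """Return the object with the given name, or raise."""
        return cls.findBy("name", name)

    @classmethod
    def exists(cls, name):
        """Return True if an object with the given name is known."""
        try:
            cls.find(name)
        except NotFoundError:
            return False
        return True


Record._items = [
    Record(name="hg38", organism="Human"),
    Record(name="mm10", organism="Mouse"),
]

print(Record.find("hg38").organism)
print(Record.exists("danRer11"))
```

In this sketch `exists` is built on top of `find`, which keeps the "return an object or raise" behaviour in one place.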

Usage guide

List the available hubs as Python objects

from ucsc.api import Hub  

hubList = Hub.get()
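Assuming `Hub.get()` returns an ordinary Python list and that each hub exposes a `hubName` attribute (as the `findBy('hubName', ...)` call in this guide suggests), the result can be processed with plain list comprehensions. The sketch below uses `SimpleNamespace` stand-ins so that it runs without network access; the second hub name is made up for illustration.

```python
from types import SimpleNamespace

# Stand-ins for the objects returned by Hub.get(); "Example Hub" is a made-up name.
hubList = [
    SimpleNamespace(hubName="ALFA Hub"),
    SimpleNamespace(hubName="Example Hub"),
]

# Collect one attribute from every object.
hubNames = [hub.hubName for hub in hubList]

# Keep only the objects whose attribute matches a condition.
alfaHubs = [hub for hub in hubList if "ALFA" in hub.hubName]

print(hubNames)
print(len(alfaHubs))
```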

Find a hub by name. The function returns the result as an object, or raises a not-found exception.

from ucsc.api import Hub  

hub = Hub.find('ALFA Hub') 

Find a hub by a given attribute. The function returns the result as an object, or raises a not-found exception.

from ucsc.api import Hub  

hub = Hub.findBy('hubName','ALFA Hub') 
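Because find and findBy raise when nothing matches, callers that prefer a default value can wrap the lookup. The exact exception class is not stated in this guide, so the sketch below catches `Exception` broadly. `find_or_default` and the `fake_find` stand-in are hypothetical names, used so that the example runs without network access.

```python
def find_or_default(finder, *args, default=None):
    """Call finder(*args) and return its result, or default if it raises."""
    try:
        return finder(*args)
    except Exception:  # assumption: narrow this once the real exception type is known
        return default


def fake_find(name):
    """Stand-in for a lookup such as Hub.find, so the sketch runs offline."""
    known = {"ALFA Hub": {"hubName": "ALFA Hub"}}
    if name not in known:
        raise LookupError(name)
    return known[name]


print(find_or_default(fake_find, "ALFA Hub"))
print(find_or_default(fake_find, "No Such Hub", default="missing"))
```

With the package installed, the same helper could presumably be called as `find_or_default(Hub.find, 'ALFA Hub')`.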

Get all genomes from a specified hub object

from ucsc.api import Hub  

hub = Hub.find('ALFA Hub')

print(hub.genomes) # prints the list of all genomes in the given hub

Get all genomes in the UCSC database

from ucsc.api import Genome 

genomesList = Genome.get() 

Find a genome by name. The function returns the result as an object, or raises a not-found exception.

from ucsc.api import Genome 

genome = Genome.find('ALFA Genome') 

Find a genome by a given attribute. The function returns the result as an object, or raises a not-found exception.

from ucsc.api import Genome  

genome = Genome.findBy('genomeName','ALFA Genome') 

Check whether a genome exists in the UCSC database

from ucsc.api import Genome

Genome.exists('hg38') 

List the available tracks of the genome object

from ucsc.api import Genome 

genome = Genome.find('ALFA Genome') 
tracks = genome.tracks 

Find a specific track in a genome by name. The return value is a Track object.

from ucsc.api import Track 

track = Track.find('hg38','knownGene') 

Or using a Genome object

from ucsc.api import Genome

genome = Genome.find('ALFA Genome')  # any Genome object, obtained as in the examples above

track = genome.findTrack('knownGene')

Find a specific track by a given attribute. The return value is a Track object.

from ucsc.api import Track

track = Track.findBy('hg38','longLabel','ClinGen curation ') 

Or using a Genome object

from ucsc.api import Genome

genome = Genome.find('ALFA Genome')  # any Genome object, obtained as in the examples above

track = genome.findTrackBy('longLabel','knownGene')

Check whether a track exists in a genome

from ucsc.api import Track 

Track.exists('hg38','knownGene') 

Or using a Genome object

from ucsc.api import Genome

genome = Genome.find('ALFA Genome')  # any Genome object, obtained as in the examples above

genome.isTrackExists('longLabel')

List the schema of a specified track from a given genome

from ucsc.api import Track 

track = Track.find('hg38','knownGene') 

trackSchema = track.schema('hg38')

Get track data. The result depends on the parameters you pass to the trackData function; the possible parameters for each use case are listed below.

from ucsc.api import Track 

track = Track.find('hg38','knownGene') # or you can get the track using the findBy method

# Get track data for specified track in UCSC database genome 

track.trackData(genome='hg38',track='gold',maxItemsOutput=100)

# Get track data for specified track and chromosome in UCSC database genome 

track.trackData(genome='hg38',track='gold',chrom='chrM')

# Get track data for specified track, chromosome and start,end coordinates in UCSC database genome 

track.trackData(genome='hg38',track='gold',chrom='chr1',start=47000,end=48000)

# Get track data for specified track in an assembly hub genome
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'

track.trackData(genome='CAST_EiJ',track='assembly',hubUrl=hubUrl)


# Get track data for specified track and chromosome in an assembly hub genome 
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'

track.trackData(genome='CAST_EiJ',track='assembly',chrom='chr1',hubUrl=hubUrl)

# Get track data for specified track in a track hub

hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'

track.trackData(genome='CAST_EiJ',track='ensGene',hubUrl=hubUrl)


# Get track data for specified track and chromosome in a track hub 

hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'

track.trackData(genome='CAST_EiJ',track='ensGene',chrom='chr1',hubUrl=hubUrl)


# Download track data for specified track and chromosome, with start and end limits, in an assembly hub genome
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'

track.downloadData(genome='CAST_EiJ',track='ensGene',chrom='chr1',hubUrl=hubUrl,start=4321,end=5678)

# Download track data for specified track in a UCSC database genome 
track.downloadData(genome='galGal6',track='gc5BaseBw',maxItemsOutput=100)
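For orientation, the keyword arguments used above (`genome`, `track`, `chrom`, `start`, `end`, `maxItemsOutput`, `hubUrl`) match the query parameter names of the public UCSC REST API. The sketch below builds a `/getData/track` URL from the same arguments. It assumes the wrapper talks to `https://api.genome.ucsc.edu`, which this guide does not state; `track_data_url` is a hypothetical helper and not part of the package.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"  # assumption: public UCSC REST endpoint


def track_data_url(genome, track, chrom=None, start=None, end=None,
                   maxItemsOutput=None, hubUrl=None):
    """Build a getData/track URL, leaving out parameters that were not given."""
    params = {
        "hubUrl": hubUrl,
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
        "maxItemsOutput": maxItemsOutput,
    }
    # Drop unset parameters, then percent-encode the rest into a query string.
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{API_BASE}/getData/track?{query}"


print(track_data_url("hg38", "gold", chrom="chr1", start=47000, end=48000))
```

The helper only constructs the URL and sends no request, so it is safe to run offline.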

List chromosomes from UCSC database genome

from ucsc.api import Chromosome 

chromosomes = Chromosome.get(genome='hg38')

List chromosomes from specified track in UCSC database genome

from ucsc.api import Chromosome

chromosomes = Chromosome.get(genome='hg38', track='knownGene')

# or 

from ucsc.api import Track,Genome

track = Track.find('hg38','knownGene') 

genome = Genome.find('ALFA Genome')

chromosomes = Chromosome.get(genome, track)

List chromosomes from assembly hub genome

from ucsc.api import Chromosome 

chromosomes = Chromosome.get(hub='ALFA Hub')

List chromosomes from specified track in assembly hub genome (deprecated)

from ucsc.api import Chromosome 

chromosomes = Chromosome.get('hg38', 'ALFA Hub','knownGene')

Find a specific chromosome

from ucsc.api import Chromosome 
chromosome = Chromosome.find(genome)

Find DNA sequence

The get method of the Sequence class accepts several parameters; which ones you pass depends on how you want to retrieve the sequence object.

from ucsc.api import Sequence 


# Get DNA sequence from specified chromosome in UCSC database genome

sequence = Sequence.get(genome = 'hg38',chrom= 'chrM')

print(sequence.dna)

# Get DNA sequence from specified chromosome and start,end coordinates in UCSC database genome

sequence = Sequence.get(genome= 'hg38',chrom= 'chrM',start=4321,end=5678)

print(sequence.dna)

# Get DNA sequence from a track hub where 'genome' is a UCSC database genome

hubUrl = 'http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'

sequence = Sequence.get(genome= 'mm10',chrom= 'chrM',hubUrl=hubUrl,start=4321,end=5678)

print(sequence.dna)
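When choosing `start` and `end`, it helps to know how many bases to expect back. The UCSC REST API documents `start` as 0-relative and `end` as 1-relative, that is, a half-open interval. Assuming the wrapper forwards both values unchanged (not confirmed in this guide), the expected length is simply `end - start`. `region_length` below is a hypothetical helper that also rejects obviously invalid ranges before any request is made.

```python
def region_length(start, end):
    """Number of bases in [start, end) under 0-based, half-open coordinates."""
    if start < 0:
        raise ValueError("start must be non-negative")
    if end <= start:
        raise ValueError("end must be greater than start")
    return end - start


print(region_length(4321, 5678))
```

Under the stated assumption, comparing `len(sequence.dna)` against this value is a quick sanity check on a retrieved sequence.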
