Access and query the UCSC database with an elegant, human-readable API
UCSC-Genomic-REST-Api-Wrapper
An open-source Python package, released under the MIT license, that wraps the REST API of the UCSC genomic database. It gives researchers an elegant, human-readable API for accessing and querying the database.
About The Package
Features
- Expressive API
- Easy to use
- Can be extended
- Can be reused
- No boilerplate
Installation
Install ucsc with pip
pip install ucsc-genomic-api
Documentation
Quick Introduction for busy developers
There are 6 primary classes in the package:
from ucsc.api import Hub, Genome, Track, TrackSchema, Chromosome, Sequence
Each class has the following primary methods:
# check documentation for required and optional parameters
className.get() # Returns list of objects of the class
className.find() # Find object by name
className.findBy() # Find object by a specified attribute
className.exists() # Check to see if an object exists
You can then access the attributes of a returned object using dot notation:
className.attributeName # Returns the value of the attribute
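The four methods above follow a common lookup pattern. As a rough illustration only, the sketch below reproduces that pattern on an in-memory list. `Record`, `NotFoundError`, and the sample data are hypothetical stand-ins; they say nothing about how the package itself is implemented or which exception it raises.

```python
class NotFoundError(LookupError):
    """Hypothetical exception; the package's real exception type may differ."""


class Record:
    """Hypothetical stand-in for classes such as Hub or Genome."""

    # In-memory stand-in for data the real package would fetch from UCSC.
    _items = []

    def __init__(self, name, label):
        self.name = name
        self.label = label

    @classmethod
    def get(cls):
        """Return a list of all known objects."""
        return list(cls._items)

    @classmethod
    def findBy(cls, attribute, value):
        """Return the first object whose attribute equals value, or raise."""
        for item in cls._items:
            if getattr(item, attribute) == value:
                return item
        raise NotFoundError(f"no record with {attribute}={value!r}")

    @classmethod
    def find(cls, name):
        """Return the object with the given name, or raise."""
        return cls.findBy("name", name)

    @classmethod
    def exists(cls, name):
        """Return True if an object with the given name is known."""
        try:
            cls.find(name)
        except NotFoundError:
            return False
        return True


Record._items = [Record("hg38", "Human"), Record("mm10", "Mouse")]
print(Record.find("hg38").label)
print(Record.exists("danRer11"))
```

In this sketch `find` and `exists` are both built on `findBy`, so a lookup by name and a lookup by any other attribute share one code path.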
Usage guide
List the available hubs as Python objects
from ucsc.api import Hub
hubList = Hub.get()
Find a hub by name. The function returns the result as an object or raises a not-found exception
from ucsc.api import Hub
hub = Hub.find('ALFA Hub')
Find a hub by a given attribute. The function returns the result as an object or raises a not-found exception
from ucsc.api import Hub
hub = Hub.findBy('hubName','ALFA Hub')
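Because `find` and `findBy` raise when nothing matches, a caller that treats a missing hub as a normal outcome needs a `try`/`except`. The exception class is not named here, so the sketch below catches `Exception` broadly and uses a hypothetical stand-in, `fake_find`, in place of `Hub.find` so that it runs offline. Narrow the `except` clause once the actual exception type is known.

```python
def find_or_default(finder, *args, default=None):
    """Call finder(*args) and return default if it raises."""
    try:
        return finder(*args)
    except Exception:  # assumption: the real exception type is not documented here
        return default


def fake_find(name):
    """Hypothetical stand-in for Hub.find, used so the sketch runs offline."""
    known = {"ALFA Hub": {"shortLabel": "ALFA Hub"}}
    if name not in known:
        raise LookupError(f"hub {name!r} not found")
    return known[name]


print(find_or_default(fake_find, "ALFA Hub"))
print(find_or_default(fake_find, "No Such Hub", default="missing"))
```

With the real package, the call would presumably look like `find_or_default(Hub.find, 'ALFA Hub')`.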
Get all genomes from specified hub object
from ucsc.api import Hub
hub = Hub.find('ALFA Hub')
print(hub.genomes) # prints the list of all genomes in the given hub
Get all genomes across the whole UCSC database
from ucsc.api import Genome
genomesList = Genome.get()
Find a genome by name. The function returns the result as an object or raises a not-found exception
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
Find a genome by a given attribute. The function returns the result as an object or raises a not-found exception
from ucsc.api import Genome
genome = Genome.findBy('genomeName','ALFA Genome')
Check if genome exists in a UCSC database
from ucsc.api import Genome
Genome.exists('hg38')
List the available tracks of the genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome')
tracks = genome.tracks
Find a specific track in a genome by name. The function returns a Track object
from ucsc.api import Track
track = Track.find('hg38','knownGene')
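Or using a Genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome') # obtain a Genome object first, as in the earlier examples
track = genome.findTrack('knownGene')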
Find a specific track by a given attribute. The function returns a Track object
from ucsc.api import Track
track = Track.findBy('hg38','longLabel','ClinGen curation ')
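Or using a Genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome') # obtain a Genome object first, as in the earlier examples
track = genome.findTrackBy('longLabel','knownGene')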
Check if track exists in a genome
from ucsc.api import Track
Track.exists('hg38','knownGene')
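Or using a Genome object
from ucsc.api import Genome
genome = Genome.find('ALFA Genome') # obtain a Genome object first, as in the earlier examples
genome.isTrackExists('knownGene') # pass the track name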
List the schema of specified track from given genome
from ucsc.api import Track
track = Track.find('hg38','knownGene')
trackSchema = track.schema('hg38')
Get track data. The result depends on the parameters you pass to the trackData function; the possible parameters for each use case are listed below
from ucsc.api import Track
track = Track.find('hg38','knownGene') # or you can get the track using the findBy method
# Get track data for specified track in UCSC database genome
track.trackData(genome='hg38',track='gold',maxItemsOutput=100)
# Get track data for specified track and chromosome in UCSC database genome
track.trackData(genome='hg38',track='gold',chrom='chrM')
# Get track data for specified track, chromosome and start,end coordinates in UCSC database genome
track.trackData(genome='hg38',track='gold',chrom='chr1',start=47000,end=48000)
# Get track data for specified track in an assembly hub genome
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.trackData(genome='CAST_EiJ',track='assembly',hubUrl=hubUrl)
# Get track data for specified track and chromosome in an assembly hub genome
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.trackData(genome='CAST_EiJ',track='assembly',chrom='chr1',hubUrl=hubUrl)
# Get track data for specified track in a track hub
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.trackData(genome='CAST_EiJ',track='ensGene',hubUrl=hubUrl)
# Get track data for specified track and chromosome in a track hub
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.trackData(genome='CAST_EiJ',track='ensGene',chrom='chr1',hubUrl=hubUrl)
# Download track data for specified track, chromosome with start and end limits in an assembly hub genome
hubUrl='http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
track.downloadData(genome='CAST_EiJ',track='ensGene',chrom='chr1',hubUrl=hubUrl,start=4321,end=5678)
# Download track data for specified track in a UCSC database genome
track.downloadData(genome='galGal6',track='gc5BaseBw',maxItemsOutput=100)
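The keyword arguments above mirror the query parameters of the public UCSC REST API `getData/track` endpoint. The sketch below shows how such a request URL could be assembled. It illustrates the underlying HTTP interface and is not the package's own code. The helper name `build_track_url` is hypothetical, and the sketch assumes the `https://api.genome.ucsc.edu` host, semicolon-separated parameters as in the UCSC API examples, and that `start`/`end` are only valid together with `chrom`.

```python
from urllib.parse import quote

BASE_URL = "https://api.genome.ucsc.edu"  # public UCSC REST API host


def build_track_url(genome, track, chrom=None, start=None, end=None,
                    hubUrl=None, maxItemsOutput=None):
    """Assemble a getData/track URL (hypothetical helper, not part of the package)."""
    # Assumption: start and end only make sense as a pair, and only with chrom.
    if (start is None) != (end is None):
        raise ValueError("start and end must be given together")
    if start is not None and chrom is None:
        raise ValueError("start and end require chrom")

    params = [("hubUrl", hubUrl), ("genome", genome), ("track", track),
              ("chrom", chrom), ("start", start), ("end", end),
              ("maxItemsOutput", maxItemsOutput)]
    # Skip parameters that were not supplied; keep ':' and '/' readable in hubUrl.
    query = ";".join(f"{key}={quote(str(value), safe=':/')}"
                     for key, value in params if value is not None)
    return f"{BASE_URL}/getData/track?{query}"


print(build_track_url("hg38", "gold", chrom="chr1", start=47000, end=48000))
```

Validating the argument combination before building the URL surfaces mistakes locally instead of as an error response from the server.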
List chromosomes from UCSC database genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(genome='hg38')
List chromosomes from specified track in UCSC database genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(genome='hg38', track='knownGene')
# or
from ucsc.api import Track,Genome
track = Track.find('hg38','knownGene')
genome = Genome.find('ALFA Genome')
chromosomes = Chromosome.get(genome, track)
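For orientation, the UCSC REST API reports chromosomes as a mapping from name to length. The sketch below parses a small hand-written sample of that shape and picks the largest entries. The sample payload, its exact keys, and the helper `largest_chromosomes` are assumptions made for illustration; the attributes exposed by the package's Chromosome objects may differ.

```python
import json

# Hand-written sample, assuming a "chromosomes" object mapping name -> length.
sample = (
    '{"genome": "hg38", '
    '"chromosomes": {"chrM": 16569, "chr1": 248956422, "chr2": 242193529}}'
)


def largest_chromosomes(payload, limit=2):
    """Return the `limit` longest (name, length) pairs from a JSON payload."""
    sizes = json.loads(payload)["chromosomes"]
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:limit]


print(largest_chromosomes(sample))
```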
List chromosomes from assembly hub genome
from ucsc.api import Chromosome
chromosomes = Chromosome.get(hub='ALFA Hub')
List chromosomes from specified track in assembly hub genome (deprecated)
from ucsc.api import Chromosome
chromosomes = Chromosome.get('hg38', 'ALFA Hub','knownGene')
Find a specific chromosome
from ucsc.api import Chromosome
chromosome = Chromosome.find(genome)
Find DNA sequence
The get method of the Sequence class accepts several parameters; which ones you pass depends on how you want to retrieve the sequence object
from ucsc.api import Sequence
# Get DNA sequence from specified chromosome in UCSC database genome
sequence = Sequence.get(genome = 'hg38',chrom= 'chrM')
print(sequence.dna)
# Get DNA sequence from specified chromosome and start,end coordinates in UCSC database genome
sequence = Sequence.get(genome= 'hg38',chrom= 'chrM',start=4321,end=5678)
print(sequence.dna)
# Get DNA sequence from a track hub where 'genome' is a UCSC database
hubUrl = 'http://hgdownload.soe.ucsc.edu/hubs/mouseStrains/hub.txt'
sequence = Sequence.get(genome= 'mm10',chrom= 'chrM',hubUrl=hubUrl,start=4321,end=5678)
print(sequence.dna)
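When a region is requested with `start` and `end`, it helps to know how many bases to expect. The UCSC REST API documents `start` as 0-based and `end` as 1-based, which makes the interval half-open. Assuming the package forwards both values unchanged, the returned sequence should contain `end - start` bases. The helper `expected_length` below is hypothetical and only performs that arithmetic.

```python
def expected_length(start, end):
    """Length of a half-open region [start, end) with a 0-based start."""
    if start < 0 or end <= start:
        raise ValueError("require 0 <= start < end")
    return end - start


# Region used in the examples above: start=4321, end=5678
print(expected_length(4321, 5678))  # 1357
```

A quick sanity check after a real request would be `len(sequence.dna) == expected_length(4321, 5678)`, under the same assumption about how coordinates are forwarded.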