Python package to quickly download genomes from the UCSC.

How do I install this package?

As usual, just install it using pip:

pip install ucsc_genomes_downloader

Tests Coverage

Since coverage tools sometimes report slightly different results, coverage is tracked by three of them: Coveralls, SonarCloud and Code Climate.

Usage examples

Simply instantiate a new genome

Create a new Genome object for the given genome hg19.

from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19")

Lazily downloading a genome’s chromosome

Download the mitochondrial “chromosome” of the genome “sacCer3”. Chromosomes are downloaded only when they are first requested.

from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3")
chrM = sacCer3["chrM"] # Downloads and returns mitochondrial genome
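
To illustrate what downloading only when required means, here is a minimal sketch of lazy, cached access by key. It is not the package's implementation: the class `LazyChromosomes` and the function `fake_fetch` are hypothetical names invented for this example, and the "download" is a stand-in that returns a fixed string.

```python
from typing import Callable, Dict


class LazyChromosomes:
    """Hypothetical illustration of lazy, cached access by chromosome name."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch               # called only on a cache miss
        self._cache: Dict[str, str] = {}  # chromosome name -> sequence

    def __getitem__(self, name: str) -> str:
        # Fetch the chromosome the first time it is requested, then reuse it.
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
        return self._cache[name]


calls = []


def fake_fetch(name: str) -> str:
    """Stand-in for a real download; records each call."""
    calls.append(name)
    return "ACGT"


genome = LazyChromosomes(fake_fetch)
first = genome["chrM"]   # triggers fake_fetch
second = genome["chrM"]  # served from the cache, no second fetch
```

With this pattern, chromosomes that are never requested are never fetched, which is the point of the lazy default.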

Eagerly downloading a genome

Download all of the genome’s chromosomes immediately.

from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3", lazy_download=False)

Eagerly loading a genome

Load all of the genome’s chromosomes into RAM immediately, downloading them first if necessary.

from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3", lazy_load=False)

Testing if a genome is cached

if hg19.is_cached():
    print("Genome is cached!")

Getting gap regions

If you need a BED file containing the regions with gaps, you can use:

all_gaps = hg19.gaps() # Returns gaps for all chromosomes
chrM_gaps = hg19.gaps(chromosomes=["chrM"]) # Returns gaps for chromosome chrM
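
As a rough illustration of what a gap region is, the sketch below finds runs of the unknown-nucleotide symbol N in a toy sequence and reports them as BED-style intervals (0-based start, exclusive end). It assumes that gaps correspond to runs of N; the helper `find_gaps` and the chromosome name `chrToy` are hypothetical and not part of the package.

```python
import re
from typing import List, Tuple


def find_gaps(chrom: str, sequence: str) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for each run of N, 0-based half-open."""
    return [
        (chrom, match.start(), match.end())
        for match in re.finditer(r"[Nn]+", sequence)
    ]


# Toy sequence: two leading Ns, then ACGT, then four Ns, then AC.
toy_gaps = find_gaps("chrToy", "NNACGTNNNNAC")
```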

Getting filled regions

If you need a BED file containing the filled regions, you can use:

all_filled = hg19.filled() # Returns filled regions for all chromosomes
chrM_filled = hg19.filled(chromosomes=["chrM"]) # Returns filled regions for chromosome chrM
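
Filled regions can be thought of as the complement of the gap regions within a chromosome. The sketch below computes that complement for a single toy chromosome, assuming the gaps are sorted, non-overlapping, 0-based half-open intervals. The helper `complement` is a hypothetical name for this example and is not claimed to be how the package computes its result.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def complement(gaps: List[Interval], length: int) -> List[Interval]:
    """Return the parts of [0, length) not covered by sorted, non-overlapping gaps."""
    filled = []
    cursor = 0  # first position not yet accounted for
    for start, end in gaps:
        if start > cursor:
            filled.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        filled.append((cursor, length))
    return filled


# Gaps at [0, 2) and [6, 10) in a chromosome of length 12.
toy_filled = complement([(0, 2), (6, 10)], length=12)
```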

Getting BED sequences

Given a BED-like pandas dataframe, you can get the corresponding sequences as follows:

import pandas as pd

my_bed = pd.read_csv("path/to/my.bed", sep="\t")
sequences = hg19.bed_to_sequence(my_bed)
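
For readers unfamiliar with BED coordinates, the following sketch shows how a region maps to a subsequence: chromStart is 0-based and chromEnd is exclusive, which matches Python slicing. The helper `slice_regions` and the toy data are hypothetical; they use plain tuples and a dictionary rather than the package's own objects or a pandas dataframe.

```python
from typing import Dict, Iterable, List, Tuple


def slice_regions(
    chromosomes: Dict[str, str],
    regions: Iterable[Tuple[str, int, int]],
) -> List[str]:
    """Return the subsequence for each (chrom, chromStart, chromEnd) region."""
    return [chromosomes[chrom][start:end] for chrom, start, end in regions]


toy_genome = {"chrToy": "ACGTACGTAC"}
toy_regions = [("chrToy", 0, 4), ("chrToy", 6, 10)]
toy_sequences = slice_regions(toy_genome, toy_regions)
```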

Removing the genome’s cache

hg19.delete()

Utilities

Retrieving a list of the available genomes

You can get a complete list of the genomes available from the UCSC website with the following function:

from ucsc_genomes_downloader import get_available_genomes
all_genomes = get_available_genomes()

Download files

Source Distribution

ucsc_genomes_downloader-1.1.0.tar.gz (11.1 kB)
