Genomes in Python

Easily install and use genomes in Python and elsewhere!

Installation

Install with pip (the only supported method for now).

$ pip install genomepy

Usage

>>> import genomepy
>>> for row in genomepy.search("human"):
...     print("\t".join(row))
...
UCSC    hg38    Human Dec. 2013 (GRCh38/hg38) Genome at UCSC
UCSC    hg19    Human Feb. 2009 (GRCh37/hg19) Genome at UCSC
UCSC    hg18    Human Mar. 2006 (NCBI36/hg18) Genome at UCSC
UCSC    hg17    Human May 2004 (NCBI35/hg17) Genome at UCSC
UCSC    hg16    Human July 2003 (NCBI34/hg16) Genome at UCSC
Ensembl bacteria_102_collection_core_34_87_1    Brucella melitensis (GCA_000988815)
Ensembl bacteria_94_collection_core_34_87_1 Brucella suis (GCA_000875695)
Ensembl bacteria_131_collection_core_34_87_1    Candidatus Paraburkholderia schumannianae
Ensembl homo_sapiens_core_86_38 Human
Ensembl pediculus_humanus_core_34_87_2  Pediculus humanus
>>> genomepy.install_genome("hg38", "UCSC", "/data/genomes")
downloading...
done...
name: hg38
fasta: /data/genomes/hg38/hg38.fa
>>> g = genomepy.genome("hg38", "/data/genomes")
>>> g["chr6"][166502000:166503000]
tgtatggtccctagaggggccagagtcacagagatggaaagtggatggcgggtgccgggggctggggagctactgtgcagggggacagagctttagttctgcaagatgaaacagttctggagatggacggtggggatgggggcccagcaatgggaacgtgcttaatgccactgaactgggcacttaaacgtggtgaaaactgtaaaagtcatgtgtatttttctacaattaaaaaaaATCTGCCACAGAGTTAAAAAAATAACCACTATTTTCTGGAAATGGGAAGGAAAAGTTACAGCATGTAATTAAGATGACAATTTATAATGAACAAGGCAAATCTTTTCATCTTTGCCTTTTGGGCATATTCAATCTTTGCCCAGAATTAAGCACCTTTCAAGATTAATTCTCTAATAATTCTAGTTGAACAACACAACCTTTTCCTTCAAGCTTGCAATTAAATAAGGCTATTTTTAGCTGTAAGGATCACGCTGACCTTCAGGAGCAATGAGAACCGGCACTCCCGGCCTGAGTGGATGCACGGGGAGTGTGTCTAACACACAGGCGTCAACAGCCAGGGCCGCACGAGGAGGAGGAGTGGCAACGTCCACACAGACTCACAACACGGCACTCCGACTTGGAGGGTAATTAATACCAGGTTAACTTCTGGGATGACCTTGGCAACGACCCAAGGTGACAGGCCAGGCTCTGCAATCACCTCCCAATTAAGGAGAGGCGAAAGGGGACTCCCAGGGCTCAGAGCACCACGGGGTTCTAGGTCAGACCCACTTTGAAATGGAAATCTGGCCTTGTGCTGCTGCTCTTGTGGGGAGACAGCAGCTGCGGAGGCTGCTCTCTTCATGGGATTACTCTGGATAAAGTCTTTTTTGATTCTACgttgagcatcccttatctgaaatgcctgaaaccggaagtgtttaggatttggggattttgcaatatttacttatatataatgagatatcttggagatgggccacaa

The genomepy.genome() function returns a pyfaidx.Fasta object; see the pyfaidx documentation for more examples of how to use it.
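
Because a region can be sliced out and converted to a plain string, it can be summarised with ordinary string methods. The sketch below is a minimal illustration and not part of genomepy or pyfaidx: the helper name summarize_region is hypothetical, and a short literal string stands in for the sequence so the example runs without a downloaded genome. It assumes that lowercase letters mark soft-masked (repeat) sequence, as they do in UCSC FASTA files.

```python
def summarize_region(seq):
    """Return length, GC fraction and soft-masked fraction of a DNA string.

    Hypothetical helper for illustration; not part of genomepy or pyfaidx.
    """
    seq = str(seq)  # a pyfaidx slice can be converted with str()
    length = len(seq)
    if length == 0:
        return {"length": 0, "gc": 0.0, "masked": 0.0}
    upper = seq.upper()
    # GC fraction ignores case so masked bases are counted too.
    gc = (upper.count("G") + upper.count("C")) / length
    # Lowercase bases are assumed to be soft-masked.
    masked = sum(1 for base in seq if base.islower()) / length
    return {"length": length, "gc": gc, "masked": masked}


# Stand-in for str(g["chr6"][166502000:166503000]); any DNA string works.
example = "tgtaTGGC"
print(summarize_region(example))
```

With a real genome, the string passed in would come from a slice such as the one shown in the usage example above.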

Command line

$ genomepy

Usage: genomepy [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  genomes    list available genomes
  install    install genome
  providers  list available providers
  search     search for genomes

List available genomes.

$ genomepy genomes -p UCSC
UCSC    hg38    Human Dec. 2013 (GRCh38/hg38) Genome at UCSC
UCSC    hg19    Human Feb. 2009 (GRCh37/hg19) Genome at UCSC
UCSC    hg18    Human Mar. 2006 (NCBI36/hg18) Genome at UCSC
...
UCSC    danRer4 Zebrafish Mar. 2006 (Zv6/danRer4) Genome at UCSC
UCSC    danRer3 Zebrafish May 2005 (Zv5/danRer3) Genome at UCSC
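
The listing is easy to post-process in a script. As a sketch, assuming each output line holds three tab-separated fields (provider, name, description), like the rows joined with "\t" in the Python usage example, the hypothetical helper below groups genome names by provider. The sample lines are copied from the output above.

```python
from collections import defaultdict


def group_by_provider(lines):
    """Map provider -> list of genome names from tab-separated listing lines.

    Hypothetical helper; assumes 'provider<TAB>name<TAB>description' lines.
    """
    grouped = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        provider, name = fields[0], fields[1]
        grouped[provider].append(name)
    return dict(grouped)


listing = [
    "UCSC\thg38\tHuman Dec. 2013 (GRCh38/hg38) Genome at UCSC",
    "UCSC\thg19\tHuman Feb. 2009 (GRCh37/hg19) Genome at UCSC",
    "UCSC\tdanRer4\tZebrafish Mar. 2006 (Zv6/danRer4) Genome at UCSC",
]
print(group_by_provider(listing))
```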

Install a genome.

$ genomepy install hg38 UCSC /data/genomes/
downloading...
done...
name: hg38
fasta: /data/genomes/hg38/hg38.fa
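
Judging from the output above, the FASTA file ends up at <genome_dir>/<name>/<name>.fa. The sketch below builds that path so a script can check whether a genome is already present before installing it again. Both helper names are hypothetical, and the directory layout is inferred from this single example rather than from a documented guarantee.

```python
import os


def expected_fasta_path(genome_dir, name):
    """Build <genome_dir>/<name>/<name>.fa (layout inferred from the example output)."""
    return os.path.join(genome_dir, name, name + ".fa")


def is_installed(genome_dir, name):
    """Return True if the inferred FASTA file exists on disk."""
    return os.path.isfile(expected_fasta_path(genome_dir, name))


print(expected_fasta_path("/data/genomes", "hg38"))
```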

Todo

  • Tests!

  • Ensembl bacteria

  • Automatic indexing (such as bwa)

  • Caching of UCSC/Ensembl genome listings

  • Configurable default genome installation directory

License

This module is licensed under the terms of the MIT license.
