Easily download and store genomes and BLAST DBs from NCBI

Project description

Genome Collector is a Python library to download and manage reference genome data for specific TaxIDs, in particular nucleotide and protein sequences (in fasta/genbank/gff formats), and alignment databases (BLAST, Bowtie1/2).

The data is downloaded automatically on an as-needed basis, making it easy for Python projects to use and re-use reference genomes of E. coli, S. cerevisiae, and so on, without having to download them manually from NCBI.
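The download-on-demand behaviour described above follows a common cache-or-fetch pattern. The following is a minimal sketch of that pattern using only the standard library. It is not Genome Collector's actual implementation, and `get_or_fetch`, its `fetch` argument, and the file layout are hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Callable


def get_or_fetch(path: Path, fetch: Callable[[], bytes]) -> Path:
    """Return `path`, calling `fetch` to create the file only if it is missing.

    `fetch` is a hypothetical callable standing in for a real download.
    """
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated file at the final location.
        tmp_path = path.with_name(path.name + ".part")
        tmp_path.write_bytes(fetch())
        tmp_path.replace(path)
    return path
```

With this pattern, the first call pays the download cost and every later call returns the stored file immediately.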

Examples

Let’s get Biopython records of all protein sequences in E. coli:

from genome_collector import GenomeCollection
collection = GenomeCollection()
records = collection.get_taxid_biopython_records(511145, "protein_fasta")

And that’s it! If the protein data wasn’t already on your machine, Genome Collector downloaded it from NCBI and stored it in your “collection” for the next time you need it.
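Once loaded, the records can be processed like any other sequence of Biopython records. The sketch below assumes each record exposes `.id` and `.seq`, as Biopython `SeqRecord` objects do. `longest_records` is a hypothetical helper written for this example, not part of Genome Collector.

```python
def longest_records(records, n=5):
    """Return (record id, sequence length) pairs for the n longest records."""
    lengths = [(record.id, len(record.seq)) for record in records]
    # Sort by decreasing length; records of equal length keep their input order.
    return sorted(lengths, key=lambda pair: pair[1], reverse=True)[:n]
```

For instance, `longest_records(records, n=3)` would list the three longest proteins among the records returned above.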

Now let’s get a path to a local BLAST database for S. cerevisiae:

from genome_collector import GenomeCollection
collection = GenomeCollection()
db_path = collection.get_taxid_blastdb_path(taxid=559292, db_type='nucl')

If there was no S. cerevisiae database on your machine, Genome Collector downloaded the genome data and built the database. It is now in your collection, and you can use the returned db_path to start a BLAST process:

import subprocess
process = subprocess.run([
    'blastn', '-db', db_path, '-query', 'queries.fa', '-out', 'results.txt'
])
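If the results need to be processed in Python, one option is BLAST's tabular output (`-outfmt 6`), which writes one tab-separated line per hit with twelve default columns. The sketch below parses the text of such a file. It assumes `'-outfmt', '6'` was added to the command above, which the original example does not do, and `BlastHit` and `parse_blast_tabular` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    """One line of BLAST tabular output, using the default outfmt 6 columns."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_blast_tabular(text: str) -> List[BlastHit]:
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        # Columns 3 to 9 (length through send) are integers.
        int_columns = [int(value) for value in fields[3:10]]
        hits.append(BlastHit(
            fields[0], fields[1], float(fields[2]),
            *int_columns,
            float(fields[10]), float(fields[11]),
        ))
    return hits
```

The hits could then be loaded with something like `parse_blast_tabular(open('results.txt').read())` and filtered by `evalue` or `pident`.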

Infos

Everyone is welcome to contribute!

More biology software

Genome Collector is part of the EGF Codons synthetic biology software suite for DNA design, manufacturing and validation.

