
# Makura: NCBI Genome downloader

## Requirements

  • rsync (command-line tool; required for downloading NCBI genomes)

  • Python 3.8 or later

## Installation

### Rsync

```
conda install -c conda-forge rsync
# or
sudo apt install rsync
```

### Python packages

https://pypi.org/project/makura/

Install from PyPI:

```
pip install makura
```

Install locally from the source directory:

```
python setup.py install
```

Install from Docker:

```
docker pull hunglin59638/makura
```

## Usage

Before first use, update the assembly summary and taxonomy information:

```
makura update
```

This step is optional: makura updates automatically if the assembly summary is not found.

Download bacteria and fungi genomes at the complete assembly level from the RefSeq database:

```
makura download --group bacteria,fungi --assembly-level complete --assembly-source refseq --out_dir /path/to/dir
```
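When the download is part of a larger pipeline, the same command can be launched from Python. The sketch below only reuses the flags shown above; the helper names `build_download_cmd` and `run_makura` are hypothetical and not part of makura. Passing an argument list to `subprocess.run` avoids shell quoting problems with paths.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_download_cmd(groups: Sequence[str], assembly_level: str,
                       assembly_source: str, out_dir: str) -> List[str]:
    """Turn the options shown above into an argument list (hypothetical helper)."""
    return [
        "makura", "download",
        "--group", ",".join(groups),  # groups are comma-separated, e.g. "bacteria,fungi"
        "--assembly-level", assembly_level,
        "--assembly-source", assembly_source,
        "--out_dir", out_dir,
    ]


def run_makura(cmd: List[str]) -> int:
    """Run a makura command and return its exit code; raise if makura is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("makura was not found on PATH")
    return subprocess.run(cmd, check=False).returncode
```

Usage: `run_makura(build_download_cmd(["bacteria", "fungi"], "complete", "refseq", "/path/to/dir"))`.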

Print genome records in JSON Lines format (the default output is tab-separated):

```
makura summary --accession GCF_016700215.2 --as-json-lines
```
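JSON Lines output is convenient for scripting because each line is one self-contained JSON object. A minimal sketch, assuming the records are written to standard output; the field names are not documented here, so the code does not rely on any of them. `parse_json_lines` and `summary_records` are hypothetical helpers.

```python
import json
import subprocess
from typing import Any, Dict, Iterable, Iterator, List


def parse_json_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-blank line of JSON Lines text."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


def summary_records(accession: str) -> List[Dict[str, Any]]:
    """Run `makura summary --accession ... --as-json-lines` and parse stdout."""
    result = subprocess.run(
        ["makura", "summary", "--accession", accession, "--as-json-lines"],
        capture_output=True, text=True, check=True,
    )
    return list(parse_json_lines(result.stdout.splitlines()))
```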

Download genomes for selected taxids:

```
makura download --taxid 2209
```

If you have many items, you can supply them in a file with one item per line. Example `taxid_list.txt`:

```
61645
69218
550
```

```
makura download --taxid-list taxid_list.txt --out_dir /path/to/dir
```
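An ID list like `taxid_list.txt` can be generated from Python instead of by hand. A small sketch, assuming the one-ID-per-line layout shown above; `format_id_list` and `write_id_list` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable


def format_id_list(ids: Iterable[int]) -> str:
    """One ID per line, with a trailing newline, like taxid_list.txt above."""
    return "".join(f"{i}\n" for i in ids)


def write_id_list(ids: Iterable[int], path: str) -> Path:
    """Write the IDs to path and return it for use with --taxid-list."""
    target = Path(path)
    target.write_text(format_id_list(ids))
    return target
```

Usage: `write_id_list([61645, 69218, 550], "taxid_list.txt")`, then pass the file to `--taxid-list`.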

Tips:

Multiple downloads can run in parallel (default: 4). The maximum is 8, to avoid NCBI blocking the downloads.

```
makura download --group bacteria,fungi --parallel 4
```
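The idea behind a capped parallel setting can be illustrated with a bounded thread pool. This is a generic sketch, not makura's implementation: the default of 4 and the cap of 8 come from the text above, but whether makura clamps or rejects larger values is not stated, so clamping here is an assumption. `clamp_parallel` and `run_bounded` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

DEFAULT_PARALLEL = 4  # default stated in the README
MAX_PARALLEL = 8      # upper limit stated in the README


def clamp_parallel(requested: int) -> int:
    """Keep the worker count between 1 and MAX_PARALLEL (assumed behaviour)."""
    return max(1, min(requested, MAX_PARALLEL))


def run_bounded(task: Callable[[T], R], items: Iterable[T],
                parallel: int = DEFAULT_PARALLEL) -> List[R]:
    """Apply task to every item with at most clamp_parallel(parallel) in flight.

    Results are returned in input order.
    """
    with ThreadPoolExecutor(max_workers=clamp_parallel(parallel)) as pool:
        return list(pool.map(task, items))
```

Threads suit this kind of workload because each download mostly waits on the network (or on an `rsync` subprocess) rather than on the CPU.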

While downloading, makura can check the MD5 checksums of the genomes. The MD5 values are stored in a file named `md5checksums.txt` in the output directory.
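The stored checksums can also be re-verified later, for example after copying the output directory elsewhere. A sketch, assuming `md5checksums.txt` uses the md5sum-style layout `<md5 digest><whitespace><file path>` with paths relative to the output directory; that layout is an assumption, not confirmed by the text above. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks so large genomes need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_listing(text: str) -> Dict[str, str]:
    """Map file path -> expected lowercase digest from '<md5>  <path>' lines."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:  # skip blank or malformed lines
            digest, name = parts
            expected[name.strip()] = digest.lower()
    return expected


def verify_checksums(out_dir: str, listing: str = "md5checksums.txt") -> Dict[str, bool]:
    """Return {file path: True if the file exists and its MD5 matches}."""
    root = Path(out_dir)
    expected = parse_listing((root / listing).read_text())
    return {
        name: (root / name).is_file() and md5_of(root / name) == digest
        for name, digest in expected.items()
    }
```

Usage: `[name for name, ok in verify_checksums("/path/to/dir").items() if not ok]` lists files that are missing or corrupted.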

## Features under development

### Using the RESTful API to get assembly summaries

1. Run the API server:

   ```
   docker run --rm -p 5000:5000 hunglin59638/makura:1.1.0 makura api --port 5000
   ```

2. Get the summary of assembly accessions (quote the URL so the shell does not interpret `?`):

   ```
   curl "http://localhost:5000/summary?accessions=GCA_002287175.1,GCA_000762265.1"
   ```

## Features in the future

- Creating minimap2 and bwa indexes from downloaded genomes.
- Downloading genomes by organism name, BioSample, BioProject, etc.
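The `/summary` endpoint of the API server started with `makura api --port 5000` can also be queried from Python using only the standard library. A sketch, assuming the server is reachable at `http://localhost:5000` as in the Docker command for that API; the response format is not documented here, so `fetch_summary` returns the raw body. Both helper names are hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # matches the `-p 5000:5000` port mapping


def summary_url(accessions: Sequence[str], base_url: str = BASE_URL) -> str:
    """Build /summary?accessions=A,B with the commas left unescaped, as in the curl example."""
    query = urlencode({"accessions": ",".join(accessions)}, safe=",")
    return f"{base_url.rstrip('/')}/summary?{query}"


def fetch_summary(accessions: Sequence[str], base_url: str = BASE_URL,
                  timeout: float = 30.0) -> str:
    """GET the summary and return the response body as text."""
    with urlopen(summary_url(accessions, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage: `print(fetch_summary(["GCA_002287175.1", "GCA_000762265.1"]))` while the server is running.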



## Download files


  • Source distribution: makura-1.2.0.tar.gz (23.3 kB)
  • Built distribution (Python 3 wheel): makura-1.2.0-py3-none-any.whl (22.7 kB)

### File details: makura-1.2.0.tar.gz

File metadata

  • Download URL: makura-1.2.0.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

Hashes for makura-1.2.0.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fc0ea7da973f6f7591c1ab65fa6e7d98077751f989feb0926e42b99d833dcb3 |
| MD5 | cd394d6db33ed8cb0500c4ef78c7b941 |
| BLAKE2b-256 | 27726835cc5f6dd34807d94db41d00a2df0e630859d440976096c35ffb346901 |


### File details: makura-1.2.0-py3-none-any.whl

File metadata

  • Download URL: makura-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 22.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

Hashes for makura-1.2.0-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9112ac4323bdd39766ca9585c68d2c6293f9c468c968366eefe629083d78f780 |
| MD5 | bb6d14891778d31a0e68aa42297141c6 |
| BLAKE2b-256 | 77449658f0ee196594a9ed2c9739bb53bbc25a569b715dc85e09eb78f5044e8e |

