A command-line tool for downloading genome data from the UCSC FTP site.
pygenomes (formerly named pygenomes) is a Python module and a command-line tool for downloading genome data from the UCSC FTP site.
With this tool you can list the available taxa, or the taxa in a particular group, by typing the name of the tool on its own or by adding the -g flag with a group name.
Downloading genome or chromosome data is just as simple. You can request data by common name, scientific name, or a specific assembly name (e.g. human, Homo sapiens, hg38).
If the FTP site has data for the requested taxon, you can also download it in interactive mode, which lets you choose a specific genome assembly. To do this, pass a true value for assembly; the tool then prints the list of available assemblies and prompts you to enter the one you want.
The tool has only been tested on Python 2.7 and may not work on Python 3.
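To illustrate how a single argument can accept a common name, a scientific name, or an assembly name, here is a minimal sketch. The KNOWN_GENOMES table and the resolve_assemblies function are hypothetical names invented for this example; pygenomes itself reads the available taxa from the UCSC site, and its internal lookup may work differently.

```python
# Hypothetical lookup table for illustration only; the real tool gets its
# list of taxa and assemblies from the UCSC site.
KNOWN_GENOMES = [
    # (common name, scientific name, assembly names)
    ("human", "Homo sapiens", ["hg38", "hg19"]),
    ("mouse", "Mus musculus", ["mm10", "mm9"]),
]


def resolve_assemblies(name):
    """Return candidate assemblies for a common, scientific, or assembly name."""
    key = name.strip().lower()
    for common, scientific, assemblies in KNOWN_GENOMES:
        # A common or scientific name matches every assembly of that taxon.
        if key in (common.lower(), scientific.lower()):
            return list(assemblies)
        # An explicit assembly name selects exactly that assembly.
        for assembly in assemblies:
            if key == assembly.lower():
                return [assembly]
    return []
```

With this table, resolve_assemblies("human") and resolve_assemblies("Homo sapiens") return the same candidates, while resolve_assemblies("hg19") narrows the result to that one assembly.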
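The interactive step can be pictured roughly as follows. This is only a sketch of the idea (list the assemblies, read a choice, return it); choose_assembly and its ask parameter are hypothetical and are not part of the pygenomes API.

```python
def choose_assembly(assemblies, ask=input):
    """Print a numbered list of assemblies and return the one the user picks.

    `ask` is the function used to read the answer; on Python 2.7 pass
    raw_input instead of the default.
    """
    for number, name in enumerate(assemblies, 1):
        print("{0}. {1}".format(number, name))
    answer = ask("Assembly to download: ").strip()
    # Accept either the list number or the assembly name itself.
    if answer.isdigit() and 1 <= int(answer) <= len(assemblies):
        return assemblies[int(answer) - 1]
    if answer in assemblies:
        return answer
    raise ValueError("unknown assembly: {0}".format(answer))
```

Passing the reader function as an argument keeps the sketch easy to try without a terminal, for example choose_assembly(["hg38", "hg19"], ask=lambda prompt: "2").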
- Install a Python module named pygenomes.py
- Install a Python command-line script named pygenomes
- Easily check available taxa and genomes on the command line
- Simply download genomes by common, scientific, or assembly name
- Using interactive mode, simply download a specific genome assembly
- Easily download genomes and chromosomes in a specific file format
- Pure Python module without any third-party dependencies
You can use pygenomes as a Python module, as in the following interactive Python session (the function's signature might still change a bit in the future):
>>> from pygenomes import genomes
>>> # print the available taxa in the mammals group
>>> genomes(group='mammals')
>>> # download the human reference genome (hg19) in .2bit format
>>> genomes(taxa='human')
>>> # interactive mode: prompt for a specific assembly, then download it
>>> genomes(taxa='cow', assembly='1')
>>> # try to download chromosome data, one fa.gz file per chromosome
>>> genomes(taxa='cow', chrs='1')
In addition, there is a script named pygenomes that can be used directly from the system command line like this (typing pygenomes -h shows many more examples):
$ pygenomes -g -1                # print all available taxa
$ pygenomes -g mammal            # print all available taxa in the mammal group
$ pygenomes -t yeast             # download the yeast reference genome in .2bit format
$ pygenomes -t yeast -a 1        # interactive mode: prompt for a specific assembly, then download it
$ pygenomes -t cow -f fa.gz      # try to download the cow reference genome in fa.gz format
$ pygenomes -t dog -a 1 -o /tmp  # interactive mode; the file is stored in /tmp
$ pygenomes -t cat -c 1          # download cat chromosome data, one fa.gz file per chromosome
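For readers curious where such files live, the sketch below builds download locations for a whole-genome file and for a single chromosome file. It assumes the goldenPath layout of the UCSC download server (a bigZips directory for whole-genome files and a chromosomes directory for per-chromosome fa.gz files), which not every assembly follows. The host name, genome_url, and chromosome_url are assumptions made for this example, not something pygenomes is documented to use.

```python
# Assumed base location of the UCSC download area (not taken from pygenomes).
BASE = "ftp://hgdownload.soe.ucsc.edu/goldenPath"


def genome_url(assembly, fmt="2bit"):
    """Build the assumed location of a whole-genome file, e.g. hg38.2bit."""
    return "{0}/{1}/bigZips/{1}.{2}".format(BASE, assembly, fmt)


def chromosome_url(assembly, chromosome):
    """Build the assumed location of one chromosome's fa.gz file."""
    return "{0}/{1}/chromosomes/chr{2}.fa.gz".format(BASE, assembly, chromosome)


print(genome_url("hg38"))
print(genome_url("hg38", fmt="fa.gz"))
print(chromosome_url("hg38", "X"))
```

Keeping the file format as a parameter mirrors the -f option shown above: the same assembly name yields a different file name depending on the requested format.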