

download_giab

Utility Python package to download Genome-in-a-Bottle (GIAB) data from their index files.

This requires Python 3.6 or later.

To install, run the following:

pip install download_giab

If you're installing on a cluster where you can't write to the system-wide site-packages, use a per-user install instead:

pip install --user download_giab

To use it, pass the URL of an index file, for example:

download_giab https://raw.githubusercontent.com/genome-in-a-bottle/giab_data_indexes/master/AshkenazimTrio/sequence.index.AJtrio_Illumina300X_wgs_07292015.HG002

This downloads every file listed in the linked index into the directory the utility is run from. You can also give the path to a local index file instead of a URL.
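The basic flow (read an index from a URL or local path, collect the file URLs it lists, and download each one into the current directory) can be sketched as below. This is a minimal illustration, not download_giab's own code: the helper names are hypothetical, and it assumes the index is tab-separated with file locations stored as full ftp:// or http(s):// URLs in some of its columns.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def read_index(source):
    """Return the text of an index file given either a URL or a local path."""
    if urlparse(source).scheme in ("http", "https", "ftp", "file"):
        with urllib.request.urlopen(source) as response:
            return response.read().decode("utf-8")
    # Anything without a recognised URL scheme is treated as a local file.
    with open(source, encoding="utf-8") as handle:
        return handle.read()


def urls_in_index(text):
    """Collect every tab-separated field that looks like a remote file URL.

    Assumption (hypothetical): file locations appear as full ftp:// or
    http(s):// URLs; header cells and checksums are skipped because they
    don't start with one of those prefixes.
    """
    urls = []
    for line in text.splitlines():
        for field in line.split("\t"):
            field = field.strip()
            if field.startswith(("ftp://", "http://", "https://")):
                urls.append(field)
    return urls


def download_all(urls, dest_dir="."):
    """Download each URL into dest_dir, keeping the remote file name."""
    paths = []
    for url in urls:
        name = os.path.basename(urlparse(url).path)
        path = os.path.join(dest_dir, name)
        with urllib.request.urlopen(url) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)  # stream to disk in chunks
        paths.append(path)
    return paths
```

Usage would look like `download_all(urls_in_index(read_index(index_url_or_path)))`. Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for multi-gigabyte FASTQ files.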

For large downloads, you can run the program in the background (with &) under nohup so that it keeps running if your session disconnects:

nohup download_giab https://raw.githubusercontent.com/genome-in-a-bottle/giab_data_indexes/master/AshkenazimTrio/sequence.index.AJtrio_Illumina300X_wgs_07292015.HG002 &

If you are downloading paired-end reads and want to concatenate all FASTQ files into two files, use the --cat-paired flag. This generates two files per sample: [sample]_1.fastq.gz and [sample]_2.fastq.gz. If no sample ID is present, the literal text paired is used in place of [sample].

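The concatenated files will not work with some tools (e.g. bwa mem) if the FASTQ files in a pair set are of different lengths (contain different numbers of reads), because the mates in the two output files no longer line up.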
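One way to produce such paired outputs is shown below. It is a sketch, not necessarily how --cat-paired works internally. It relies on the fact that byte-concatenating gzip files yields a valid multi-member gzip stream, so no recompression is needed, and it then compares read counts to catch the mismatch described above. The function names are hypothetical; the output naming follows the [sample]_1.fastq.gz / [sample]_2.fastq.gz convention and the paired fallback.

```python
import gzip
import shutil


def concat_gzip(inputs, output):
    """Byte-concatenate gzip files; the result is a valid multi-member gzip."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(path):
    """Count FASTQ records (4 lines each) in a gzip-compressed file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def cat_paired(r1_files, r2_files, sample=None):
    """Concatenate mates into [sample]_1.fastq.gz and [sample]_2.fastq.gz.

    r1_files and r2_files must be listed in the same pair order so that
    the Nth read of output 1 is the mate of the Nth read of output 2.
    """
    prefix = sample or "paired"  # fall back to the literal text "paired"
    out1, out2 = f"{prefix}_1.fastq.gz", f"{prefix}_2.fastq.gz"
    concat_gzip(r1_files, out1)
    concat_gzip(r2_files, out2)
    n1, n2 = count_reads(out1), count_reads(out2)
    if n1 != n2:
        print(f"warning: {out1} has {n1} reads but {out2} has {n2}; "
              "aligners such as bwa mem may reject this pair")
    return out1, out2
```

Counting reads requires decompressing both outputs, which is slow for large runs; comparing the per-file read counts from the index, where available, would be a cheaper check.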

If you instead want to record each read pair along with a suggested common name, use the --store-paired-names flag. This writes the pairs to a file called paired_names.txt.
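The idea of a suggested common name can be sketched as follows: take the longest shared prefix of the two mate file names and trim a trailing read marker such as _R or _. This is purely illustrative. The naming rule, the helper names, and the tab-separated line layout written here are hypothetical and are not a description of the real contents of paired_names.txt.

```python
import os
import re


def common_name(r1, r2):
    """Hypothetical rule: shared file-name prefix minus a trailing _R/_/./-."""
    a, b = os.path.basename(r1), os.path.basename(r2)
    prefix = os.path.commonprefix([a, b])
    # Drop trailing separators and an optional trailing "R" read marker.
    name = re.sub(r"[_.-]*R?$", "", prefix)
    return name or "paired"


def write_paired_names(pairs, path="paired_names.txt"):
    """Write one assumed tab-separated line per pair: r1, r2, suggested name."""
    with open(path, "w", encoding="utf-8") as out:
        for r1, r2 in pairs:
            out.write(f"{r1}\t{r2}\t{common_name(r1, r2)}\n")
```

For example, mates named HG002_L001_R1.fastq.gz and HG002_L001_R2.fastq.gz would get the suggested name HG002_L001 under this rule.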

To restrict which files are downloaded, pass the --filter flag a case-insensitive string or regular expression (in Python syntax).
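A case-insensitive filter of this kind can be expressed with Python's `re` module as sketched below. The sketch assumes the pattern is searched for anywhere in each file URL (search rather than whole-string match); whether download_giab applies it to the full URL or only to the file name is an assumption here, and `filter_urls` is a hypothetical helper.

```python
import re


def filter_urls(urls, pattern):
    """Keep URLs in which the pattern matches anywhere, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [url for url in urls if regex.search(url)]
```

Because a plain string is also interpreted as a regular expression, characters such as . or + act as metacharacters; wrapping a literal string in `re.escape()` before compiling makes it match verbatim.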

Download files

Source distribution: download_giab-0.7.0.tar.gz (16.7 kB)

Built distribution: download_giab-0.7.0-py3-none-any.whl (17.2 kB), Python 3
