sra-downloader
A script for batch-downloading and automatic compression of data from NCBI Sequence Read Archive. Built on SRA-Toolkit.
Features:
- Downloading SRA runs using either accession IDs or NCBI-generated files
- Organizing sequences by the project they come from
- Detecting which runs have already been downloaded
Requirements
- python >= 3.6
- sra-toolkit >= 2.9.6
- pigz
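Before installing, it can help to confirm that the external tools are reachable. The sketch below is not part of sra-downloader; it checks the interpreter version and looks for executables on `PATH`. The executable names `fastq-dump` and `prefetch` are assumptions (both ship with SRA-Toolkit, but this README does not say which ones the script calls), and the sketch does not verify the `sra-toolkit >= 2.9.6` version requirement.

```python
import shutil
import sys

# Assumed executable names: `fastq-dump` and `prefetch` are SRA-Toolkit
# binaries, but which of them sra-downloader invokes is not stated here.
REQUIRED_TOOLS = ("fastq-dump", "prefetch", "pigz")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


def python_ok(version_info=sys.version_info):
    """True when the interpreter satisfies the python >= 3.6 requirement."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print("Python >= 3.6:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.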
Installation
0. Docker
- Run
docker run wwydmanski/sra-downloader -h
1. Conda (recommended, also downloads sra-toolkit)
- Run
conda install -c bioconda -c bioinf-mcb sra-downloader
Note: if you don't specify the bioconda channel you will get a dependency error.
2. From PyPI
- Run
pip install sra-downloader
3. From source
- Clone or download the repository into a folder and change into it
- Run
pip install .
Usage
usage: sra-downloader [-h] [--fname FILENAME] [--save-dir SAVE_DIRECTORY] [--uncompressed [UNCOMPRESSED]] [--cores [CORES]] [sra_id [sra_id ...]]
Download SRA data and organize them by projects
positional arguments:
sra_id SRA IDs to download
optional arguments:
-h, --help show this help message and exit
--fname FILENAME CSV file with list of SRAs to download. Header must include `Run` and `BioProject`.
--save-dir SAVE_DIRECTORY
a directory that the files will be saved to. (default: ./downloaded)
--uncompressed [UNCOMPRESSED]
if present, the files will not be compressed. (default: False)
--cores [CORES] Cores used for compression. (default is the number of online processors, or 8 if unknown)
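The `--fname` option expects a CSV whose header includes `Run` and `BioProject`. As a rough illustration of what that requirement means, the sketch below reads such a table and groups run accessions by project. It is not the tool's own parser; the function name `group_runs_by_project` and the extra `Platform` column in the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_runs_by_project(csv_file):
    """Map each BioProject to the list of Run accessions listed under it."""
    reader = csv.DictReader(csv_file)
    missing = {"Run", "BioProject"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    projects = defaultdict(list)
    for row in reader:
        projects[row["BioProject"]].append(row["Run"])
    return dict(projects)


# Made-up table built from the accessions used elsewhere in this README;
# the Platform column is illustrative only.
sample = io.StringIO(
    "Run,BioProject,Platform\n"
    "ERR1551967,PRJEB14961,ILLUMINA\n"
    "ERR2177760,PRJEB20463,ILLUMINA\n"
)
print(group_runs_by_project(sample))
```

Running a check like this before a long batch download catches a table that lacks one of the two required columns.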
Examples
sra-downloader ERR2177760 --uncompressed
docker run -v $(pwd)/downloads:/downloaded wwydmanski/sra-downloader ERR1551967
sra-downloader --fname SraRunTable.txt --save-dir ./SRAs --cores 4
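The same invocations can be driven from Python. The following is a minimal sketch that assembles the argument list from the options documented under Usage and hands it to `subprocess.run`. The helper names `build_command` and `run_download` are hypothetical, and the sketch assumes the CLI exits with a non-zero status on failure, which this README does not confirm.

```python
import subprocess


def build_command(sra_ids=(), fname=None, save_dir=None, cores=None, uncompressed=False):
    """Assemble an sra-downloader argument list from the documented options."""
    # Positional IDs go first and --uncompressed last: the flag accepts an
    # optional value, so an ID placed after it could be read as that value.
    cmd = ["sra-downloader", *sra_ids]
    if fname is not None:
        cmd += ["--fname", str(fname)]
    if save_dir is not None:
        cmd += ["--save-dir", str(save_dir)]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    if uncompressed:
        cmd.append("--uncompressed")
    return cmd


def run_download(**options):
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_command(**options), check=True)


print(build_command(fname="SraRunTable.txt", save_dir="./SRAs", cores=4))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.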
Sample output
└─── save_folder
    ├── PRJEB14961
    │   ├── ERR1551967.sra_1.fastq.gz  # - raw read archived files from SRA
    │   ├── ERR1551967.sra_2.fastq.gz
    │   └── SraRunTable.txt            # - original SraRunTable.txt with useful metadata about samples
    └── PRJEB20463
        ├── ERR2177760.sra_1.fastq.gz
        ├── ERR2177760.sra_2.fastq.gz
        ├── absent.txt                 # - entries that were inaccessible for various reasons
        └── SraRunTable.txt
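Given the layout in the sample output, a short script can list which runs are already present in each project folder. This is a sketch only and not the detection logic sra-downloader itself uses. The file name pattern is inferred from the sample tree (`<run>.sra_<n>.fastq.gz`), so files produced with `--uncompressed` or named differently will not match.

```python
import re
import tempfile
from pathlib import Path

# Pattern inferred from the sample output: <run>.sra_<n>.fastq.gz
# Assumption: other output layouts exist and are simply ignored here.
FASTQ_NAME = re.compile(r"^(?P<run>[A-Z]+\d+)\.sra_\d+\.fastq\.gz$")


def downloaded_runs(save_dir):
    """Map each project folder name to the sorted run accessions found in it."""
    result = {}
    for project in sorted(Path(save_dir).iterdir()):
        if not project.is_dir():
            continue
        runs = set()
        for path in project.iterdir():
            match = FASTQ_NAME.match(path.name)
            if match:
                runs.add(match.group("run"))
        result[project.name] = sorted(runs)
    return result


# Build a throwaway copy of one project folder from the sample output.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "PRJEB14961"
    project.mkdir()
    for name in ("ERR1551967.sra_1.fastq.gz", "ERR1551967.sra_2.fastq.gz", "SraRunTable.txt"):
        (project / name).touch()
    print(downloaded_runs(tmp))
```

Paired-end files share one accession, so each run appears once per project even though it has two files on disk.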
File details
Details for the file sra-downloader-1.0.7.tar.gz.
File metadata
- Download URL: sra-downloader-1.0.7.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/3.10.0 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1b94ebfc4cd7b55d8ba15c149aa055b0050b36b3d2c0a301e67ab4563f6806b |
| MD5 | 1042732ad3fb274cb0c537f9b6212aa6 |
| BLAKE2b-256 | 60cc3b87418232fdb614aab55b265708bc117a0fa3985356b4c154282352f093 |
File details
Details for the file sra_downloader-1.0.7-py3-none-any.whl.
File metadata
- Download URL: sra_downloader-1.0.7-py3-none-any.whl
- Upload date:
- Size: 6.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/3.10.0 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 04cbf8a472b4ec6d1cd84a633ea1fea158dcca540e1445d6115a4cd932a1e990 |
| MD5 | 8ab0a441a034426bcd7095fe1c11cf52 |
| BLAKE2b-256 | 114e98494ca0554eea20c219fe853029f148382c1fa01aee22e715cd6d857f11 |