Utility to fetch public and private RAW read and assembly files from the ENA
Microbiome Informatics ENA fetch tool
A set of tools for fetching RAW read and assembly files from the European Nucleotide Archive (ENA).
How to set up your development environment
We recommend using miniconda or conda to manage the environment.
Clone the repo and install the requirements.
$ git clone git@github.com:EBI-Metagenomics/fetch_tool.git
$ cd fetch_tool
$ # activate env (conda activate xxx)
$ pip install -r requirements-dev.txt
Pre-commit hooks
Set up the git pre-commit hook:
pre-commit install
Why?
pre-commit runs a set of pre-configured tools before allowing you to commit files. You can find the currently configured hooks and their settings in .pre-commit-config.yaml.
Tests
This repo uses pytest.
It requires the Aspera CLI to be installed in the default location (run install-aspera.sh with no parameters).
To run the test suite:
pytest
Install fetch tool
Using Conda
$ conda create -q -n fetch_tool python=3.8
$ conda activate fetch_tool
Install from requirements file
$ git clone git@github.com:EBI-Metagenomics/fetch_tool.git
$ pip install -r fetch_tool/requirements.txt
$ pip install -U fetch_tool/
Install from Git repo
$ pip install git+ssh://git@github.com/EBI-Metagenomics/fetch_tool.git
Install from private Git repo with access token (access token can be found in centralised password file)
$ pip install -U git+https://{access_token}@github.com/EBI-Metagenomics/fetch_tool@master
Configuration file
Set up the configuration file, using fetchdata-config-template.json as the template.
The required fields are:
- For Aspera
- aspera_bin (the path to ascp, usually in the aspera installation under /cli/bin)
- aspera_cert (the path to the ascp provided cert, usually in the aspera installation under /cli/etc/asperaweb_id_dsa.openssh)
- To pull private ENA data
- ena_api_user
- ena_api_password
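The required fields above can be written out with a few lines of Python. This is a minimal sketch, not part of fetch_tool: the helper name `build_config` is hypothetical, the binary is assumed to be called `ascp` under `cli/bin`, and the template file may contain further fields that this sketch does not cover.

```python
import json
from pathlib import Path


def build_config(aspera_root, ena_user=None, ena_password=None):
    """Build a dict holding only the fields the README lists as required.

    aspera_root is the Aspera installation folder (assumed layout:
    <root>/cli/bin/ascp and <root>/cli/etc/asperaweb_id_dsa.openssh).
    """
    root = Path(aspera_root)
    config = {
        "aspera_bin": str(root / "cli" / "bin" / "ascp"),
        "aspera_cert": str(root / "cli" / "etc" / "asperaweb_id_dsa.openssh"),
    }
    # Credentials are only needed to pull private ENA data.
    if ena_user is not None and ena_password is not None:
        config["ena_api_user"] = ena_user
        config["ena_api_password"] = ena_password
    return config


def write_config(config, path):
    """Serialise the config dict as indented JSON."""
    Path(path).write_text(json.dumps(config, indent=2) + "\n")
```

For example, `write_config(build_config(Path.cwd() / "aspera-cli"), "fetchdata-config-local.json")` would produce a file suitable for passing with `-c`, assuming Aspera was installed in the default location.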
Install Aspera
Install
Run the install-aspera.sh script. It takes one optional parameter, the installation folder:
./install-aspera.sh path/to/installation-i-want
Without the parameter, it installs into $PWD/aspera-cli.
Fetch read files (amplicon and WGS data)
Usage
$ fetch-read-tool -h
usage: fetch-read-tool [-h] [-p PROJECTS]
[-ru PROJECT_RUNS [PROJECT_RUNS ...]] [-c CONFIG_FILE]
-d DDIR [-f] [-w] [-r] [-l PLIST] [-v] [-o OUTPUT_FILE]
[-i]
Tool to fetch project sequence data from ENA
optional arguments:
-h, --help show this help message and exit
-p PROJECTS, --project PROJECTS
Project accession(s)
-ru PROJECT_RUNS [PROJECT_RUNS ...], --runs PROJECT_RUNS [PROJECT_RUNS ...]
Run accession(s) whitespace separated. That option is
useful if you want to download only certain project
runs
-c CONFIG_FILE, --config CONFIG_FILE
Configuration file [json]
-d DDIR, --dir DDIR Base directory for downloads
-f, --force Force the download, even if it hits cases of submitted
files only. But it will not download the submitted
files.
-w, --use_view Use DB views rather than FTP
-r, --private Data is private
-l PLIST, --project_list PLIST
Project list
-v, --verbose Verbose
-o OUTPUT_FILE, --output OUTPUT_FILE
Output summary [json]
-i, --interactive interactive mode - allows you to skip failed
downloads.
Example
Download amplicon study:
$ fetch-read-tool -p SRP062869 -c fetchdata-config-local.json -v -d /home/<user>/temp/
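When the tool is driven from a script, the same command line can be assembled programmatically. The sketch below is an illustration only: `build_fetch_read_command` is a hypothetical helper, and it uses just the flags shown in the help output above (`-d`, `-p`, `-ru`, `-c`, `-r`, `-v`).

```python
def build_fetch_read_command(download_dir, project=None, runs=None,
                             config_file=None, private=False, verbose=False):
    """Return the argument list for one fetch-read-tool invocation."""
    # -d is the only option the usage line shows without brackets.
    cmd = ["fetch-read-tool", "-d", str(download_dir)]
    if project:
        cmd += ["-p", project]
    if runs:
        # -ru accepts one or more whitespace-separated run accessions.
        cmd += ["-ru", *runs]
    if config_file:
        cmd += ["-c", str(config_file)]
    if private:
        cmd.append("-r")
    if verbose:
        cmd.append("-v")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.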
Fetch assembly files
Usage
$ fetch-assembly-tool -h
usage: fetch-assembly-tool [-h] -p PROJECT
[-as PROJECT_ASSEMBLIES [PROJECT_ASSEMBLIES ...]]
[-c CONFIG_FILE] [-d DDIR] [-s {ftp,filesystem}]
[-v] [-pr {1.0,2.0,3.0,4.0,4.1}] [-i]
Tool to fetch assemblies from ENA
optional arguments:
-h, --help show this help message and exit
-p PROJECT, --project PROJECT
Project accession
-as PROJECT_ASSEMBLIES [PROJECT_ASSEMBLIES ...], --assemblies PROJECT_ASSEMBLIES [PROJECT_ASSEMBLIES ...]
Analysis accession(s) (e.g. ERZ773283) whitespace
separated. That option is useful if you want to
download only certain project analyses
-c CONFIG_FILE, --config CONFIG_FILE
Configuration file [json]
-d DDIR, --dir DDIR Base directory for downloads
-s {ftp,filesystem}, --source {ftp,filesystem}
Source of the RAW files.
-v, --verbose Verbose
-pr {1.0,2.0,3.0,4.0,4.1}, --pipeline-version {1.0,2.0,3.0,4.0,4.1}
Specify pipeline version e.g. 4.1
-i, --interactive interactive mode - allows you to skip failed
downloads.
Example
Download assembly study:
$ fetch-assembly-tool -p ERP111288 -c fetchdata-config-local.json -v -d /home/<user>/temp/
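Because `-s` and `-pr` only accept a fixed set of values, a wrapper script may want to check them before launching a download. The following is a sketch under that assumption; `build_fetch_assembly_command` is a hypothetical helper, and the allowed values are copied from the help output above.

```python
# Allowed values as listed in the fetch-assembly-tool help output.
SOURCES = ("ftp", "filesystem")
PIPELINE_VERSIONS = ("1.0", "2.0", "3.0", "4.0", "4.1")


def build_fetch_assembly_command(project, download_dir=None, assemblies=None,
                                 config_file=None, source=None,
                                 pipeline_version=None, verbose=False):
    """Return the argument list for one fetch-assembly-tool invocation."""
    if source is not None and source not in SOURCES:
        raise ValueError(f"source must be one of {SOURCES}, got {source!r}")
    if pipeline_version is not None and pipeline_version not in PIPELINE_VERSIONS:
        raise ValueError(
            f"pipeline version must be one of {PIPELINE_VERSIONS}, "
            f"got {pipeline_version!r}"
        )
    # -p is the only option the usage line shows without brackets.
    cmd = ["fetch-assembly-tool", "-p", project]
    if assemblies:
        # -as accepts one or more whitespace-separated analysis accessions.
        cmd += ["-as", *assemblies]
    if download_dir:
        cmd += ["-d", str(download_dir)]
    if config_file:
        cmd += ["-c", str(config_file)]
    if source:
        cmd += ["-s", source]
    if pipeline_version:
        cmd += ["-pr", pipeline_version]
    if verbose:
        cmd.append("-v")
    return cmd
```

The tool performs its own argument checking, so the validation here only serves to fail early, before any subprocess is started.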