
Utility to fetch public and private RAW read and assembly files from the ENA


Microbiome Informatics ENA fetch tool

A set of tools that lets you fetch raw read and assembly files from the European Nucleotide Archive (ENA).

How to set up your development environment

We recommend using miniconda or conda to manage the environment.

Clone the repo and install the requirements.

$ git clone git@github.com:EBI-Metagenomics/fetch_tool.git
$ cd fetch_tool
$ # activate the env (conda activate xxx)
$ pip install -r requirements-dev.txt

Pre-commit hooks

Set up the git pre-commit hook:

pre-commit install

Why?

pre-commit runs a set of preconfigured tools before allowing you to commit files. You can find the currently configured hooks and their settings in .pre-commit-config.yaml.

Tests

This repo uses pytest.

The tests require the Aspera CLI to be installed in the default location (that is, by running install-aspera.sh with no parameters).

To run the test suite:

pytest

Install fetch tool

Using Conda

$ conda create -q -n fetch_tool python=3.8
$ conda activate fetch_tool

Install from PyPI

$ pip install fetch-tool

Install from the git repo

$ pip install git+ssh://git@github.com/EBI-Metagenomics/fetch_tool.git

Configuration file

Set up the configuration file, using fetchdata-config-template.json as a template.

The required fields are:

  • For Aspera
    • aspera_bin (the path to the ascp binary, usually cli/bin/ascp inside the Aspera installation folder)
    • aspera_cert (the path to the certificate provided with ascp, usually cli/etc/asperaweb_id_dsa.openssh inside the Aspera installation folder)
  • To pull private ENA data
    • ena_api_user
    • ena_api_password

Install Aspera

Install

Run the install-aspera.sh script from this repository. It takes a single optional parameter: the installation folder.

./install-aspera.sh path/to/installation-i-want

Without a parameter, it installs Aspera in $PWD/aspera-cli.
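Because both the configuration and the test suite depend on where Aspera lands, a quick check after installing can save a failed download later. The sketch below assumes the layout described in the configuration section (cli/bin/ascp and cli/etc/asperaweb_id_dsa.openssh inside the installation folder); check_aspera is a hypothetical helper, not part of the tool.

```shell
#!/usr/bin/env bash
# Sketch: check that an Aspera installation contains the two files that
# aspera_bin and aspera_cert point to.
# ASSUMPTION: the <install-dir>/cli/bin/ascp and
# <install-dir>/cli/etc/asperaweb_id_dsa.openssh layout described above.
check_aspera() {
  # Default to the location install-aspera.sh uses without a parameter.
  local dir="${1:-$PWD/aspera-cli}"
  local status=0
  if [ ! -x "$dir/cli/bin/ascp" ]; then
    echo "missing or not executable: $dir/cli/bin/ascp" >&2
    status=1
  fi
  if [ ! -f "$dir/cli/etc/asperaweb_id_dsa.openssh" ]; then
    echo "missing: $dir/cli/etc/asperaweb_id_dsa.openssh" >&2
    status=1
  fi
  return "$status"
}

# Example (not run here):
#   ./install-aspera.sh "$HOME/aspera-cli" && check_aspera "$HOME/aspera-cli"
```

Calling check_aspera with no argument checks the default $PWD/aspera-cli folder, which is the one the test suite expects.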

Fetch read files (amplicon and WGS data)

Usage

$ fetch-read-tool -h
usage: fetch-read-tool [-h] [-p PROJECTS [PROJECTS ...] | -l PROJECT_LIST] [-d DIR] [-v] [--version] [-f] [--ignore-errors] [--private] [-i] [-c CONFIG_FILE] [--fix-desc-file] [-ru RUNS [RUNS ...]
                       | --run-list RUN_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
                        Whitespace separated list of project accession(s)
  -l PROJECT_LIST, --project-list PROJECT_LIST
                        File containing line-separated project list
  -d DIR, --dir DIR     Base directory for downloads
  -v, --verbose         Verbose
  --version             Version
  -f, --force           Ignore download errors and force re-download all files
  --ignore-errors       Ignore download errors and continue
  --private             Use when fetching private data
  -i, --interactive     interactive mode - allows you to skip failed downloads.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        Alternative config file
  --fix-desc-file       Fixed runs in project description file
  -ru RUNS [RUNS ...], --runs RUNS [RUNS ...]
                        Run accession(s), whitespace separated. Use to download only certain project runs
  --run-list RUN_LIST   File containing line-separated run accessions

Example

Download an amplicon study:

$ fetch-read-tool -p SRP062869 -c fetchdata-config-local.json -v -d /home/<user>/temp/
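The synopsis marks -p/-l and -ru/--run-list as alternatives (the | between them), and the help text describes the list files as line-separated. A minimal sketch for downloading only some runs of a project: write the run accessions one per line and pass the file with --run-list. Combining it with -p follows the help text ("download only certain project runs") and is an assumption; write_accession_list is a hypothetical helper and RUN_ACC_1/RUN_ACC_2 are placeholders, not real ENA records.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write accessions one per line, the "line-separated"
# format the help text gives for --project-list, --run-list and --assembly-list.
write_accession_list() {
  if [ "$#" -lt 2 ]; then
    echo "usage: write_accession_list FILE ACCESSION..." >&2
    return 1
  fi
  local out="$1"
  shift
  printf '%s\n' "$@" > "$out"
}

# Example (not run here; RUN_ACC_1 and RUN_ACC_2 are placeholders):
#   write_accession_list runs.txt RUN_ACC_1 RUN_ACC_2
#   fetch-read-tool -p SRP062869 --run-list runs.txt \
#     -c fetchdata-config-local.json -v -d /home/<user>/temp/
# Add --private when the data is private; the ENA credentials then come
# from ena_api_user and ena_api_password in the config file.
```

The same helper works for a project list passed with -l.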

Fetch assembly files

Usage

$ fetch-assembly-tool -h
usage: fetch-assembly-tool [-h] [-p PROJECTS [PROJECTS ...] | -l PROJECT_LIST] [-d DIR] [-v] [--version] [-f] [--ignore-errors] [--private] [-i] [-c CONFIG_FILE] [--fix-desc-file]
                           [-as ASSEMBLIES [ASSEMBLIES ...]] [--assembly-type {primary metagenome,binned metagenome,metatranscriptome}] [--assembly-list ASSEMBLY_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -p PROJECTS [PROJECTS ...], --projects PROJECTS [PROJECTS ...]
                        Whitespace separated list of project accession(s)
  -l PROJECT_LIST, --project-list PROJECT_LIST
                        File containing line-separated project list
  -d DIR, --dir DIR     Base directory for downloads
  -v, --verbose         Verbose
  --version             Version
  -f, --force           Ignore download errors and force re-download all files
  --ignore-errors       Ignore download errors and continue
  --private             Use when fetching private data
  -i, --interactive     interactive mode - allows you to skip failed downloads.
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        Alternative config file
  --fix-desc-file       Fixed runs in project description file
  -as ASSEMBLIES [ASSEMBLIES ...], --assemblies ASSEMBLIES [ASSEMBLIES ...]
                        Assembly ERZ accession(s), whitespace separated. Use to download only certain project assemblies
  --assembly-type {primary metagenome,binned metagenome,metatranscriptome}
                        Assembly type
  --assembly-list ASSEMBLY_LIST
                        File containing line-separated assembly accessions

Example

Download an assembly study:

$ fetch-assembly-tool -p ERP111288 -c fetchdata-config-local.json -v -d /home/<user>/temp/
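Two of the three --assembly-type choices in the help text contain a space, so they must be quoted on the command line. A minimal sketch, assuming you want to restrict a project download to one assembly type (the help text only says "Assembly type", so the filtering behaviour is an assumption): valid_assembly_type is a hypothetical helper that rejects values outside the listed choices before calling the tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept only the --assembly-type values listed in the
# help text. "primary metagenome" and "binned metagenome" contain spaces,
# so always quote the value.
valid_assembly_type() {
  case "${1:-}" in
    "primary metagenome"|"binned metagenome"|"metatranscriptome")
      return 0
      ;;
    *)
      echo "unknown assembly type: ${1:-<empty>}" >&2
      return 1
      ;;
  esac
}

# Example (not run here):
#   type="primary metagenome"
#   valid_assembly_type "$type" &&
#     fetch-assembly-tool -p ERP111288 --assembly-type "$type" \
#       -c fetchdata-config-local.json -v -d /home/<user>/temp/
```

Without the quotes, the shell would split "primary metagenome" into two arguments and the tool would receive only "primary" as the type.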



Download files


Source Distribution

fetch-tool-0.9.0.tar.gz (18.0 kB)

Built Distribution

fetch_tool-0.9.0-py3-none-any.whl (21.6 kB)

File details

Details for the file fetch-tool-0.9.0.tar.gz.

File metadata

  • Download URL: fetch-tool-0.9.0.tar.gz
  • Upload date:
  • Size: 18.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for fetch-tool-0.9.0.tar.gz
Algorithm Hash digest
SHA256 6861088815d890d6da70237900f712bf4e62ba1cc96326c8fa626d34e8cb84d2
MD5 c90c3edc459d73e7fe498fdacf8d58d4
BLAKE2b-256 6b41329433328795051d2e6eae6c00c35fc54dc5be763724476734f9a8e6890e


File details

Details for the file fetch_tool-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: fetch_tool-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 21.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for fetch_tool-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e20f035a6e70b5245ba96923ebd38d7785a3436c9f39d2df0a740c13aa1cf8a7
MD5 ee7b953dba603b1c528018326d4d2ecd
BLAKE2b-256 7981f9cc803e42ca3d1ab0bb01e0ab8fe295077c794dc71721300cc762506acf

