getENA

Sometimes we need to download a sequencing project from ENA (the European Nucleotide Archive). Fortunately, ENA provides a link to each file that we need. However, downloading the files manually can take a lot of time when there are many of them.

I have developed a small Python project that automates this work and downloads the files in parallel to improve performance.

Installation

pip install getENA

Alternatively, install from GitHub:

pip install git+https://github.com/EnzoAndree/getENA

Usage

Let's say I'm interested in Clostridium perfringens sequencing projects. First, we search ENA for public sequencing projects at https://www.ebi.ac.uk/ena/browser/text-search?query=clostridium%20perfringens. There, we choose the project codes that we need, for example:

PRJNA350702 PRJNA285473 PRJNA508810

We have two options to download the FASTQ files: (1) pass the project codes on the command line as arguments separated by spaces, or (2) create a file containing a list of all the project codes that we need.

For the first option (recommended for a few projects, for example < 5), we run the following:

getENA.py -p PRJNA350702 PRJNA285473 PRJNA508810

For the second option (recommended for many projects, for example >= 5), we run the following:

getENA.py -pfile ena.list.txt

Here, ena.list.txt is the file containing the list of all the project codes.
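The list file can also be generated from a script. The following is a minimal sketch that assumes one project code per line; the README does not state the expected layout, so check it against your getENA version. The helper name write_project_list is hypothetical and not part of getENA.

```python
from pathlib import Path

# Project codes taken from the example above.
project_codes = ["PRJNA350702", "PRJNA285473", "PRJNA508810"]


def write_project_list(codes, path="ena.list.txt"):
    """Write one project code per line (assumed layout) and return the path."""
    out = Path(path)
    out.write_text("\n".join(codes) + "\n")
    return out


list_file = write_project_list(project_codes)
```

The resulting file can then be passed to the -pfile option shown above.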

You can improve performance by increasing the number of reads that are downloaded in parallel (the -t option). Be careful, however: ENA aborts the connection if it detects too many simultaneous connections to its FTP server. Empirically, I have observed that 12 parallel connections work reliably without ENA cancelling the download.

An extreme example of the above command, using many parallel connections, would be the following:

getENA.py -t 64 -p PRJNA350702 PRJNA285473 PRJNA508810
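To illustrate what a cap on parallel connections means, here is a minimal sketch of a bounded worker pool using Python's standard library. It is not getENA's actual implementation, which the README does not describe. The names fetch and download_all are hypothetical, and fetch is a stand-in that performs no network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Value the text above reports as working reliably with ENA's FTP server.
MAX_CONNECTIONS = 12


def fetch(url):
    """Hypothetical stand-in for one download; returns only the file name."""
    return url.rsplit("/", 1)[-1]


def download_all(urls, workers=MAX_CONNECTIONS, fetch_one=fetch):
    """Run fetch_one over urls with at most `workers` calls in flight."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit everything; the pool itself limits how many run at once.
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Because the pool never runs more than `workers` tasks at the same time, raising that number trades speed against the risk of the server dropping connections.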

One of the main features of getENA.py is that it automatically verifies the integrity of each FASTQ file as it is downloaded. If the connection is lost, if ENA cancels the connection, or if getENA.py is stopped, you can run the program again and resume the download without losing the files that were already downloaded.
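The README does not say how the integrity check is performed. One common approach, sketched below as an assumption rather than a description of getENA's code, is to compare each file's MD5 checksum with the checksum ENA publishes for it, and to download again only when the file is missing or does not match. The function names md5sum and needs_download are hypothetical.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_md5):
    """True if the file is missing or its checksum does not match."""
    path = Path(path)
    return not path.is_file() or md5sum(path) != expected_md5.lower()
```

With a check like this, rerunning after an interruption skips every file that is already complete and fetches only the missing or truncated ones.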

Licence

GPL v3

Author
