Python scripts to upload primary metagenome and metatranscriptome assemblies to ENA on a per-study basis. The scripts generate the XMLs needed to register a new study and create the manifests required for submission with webin-cli.

Project description

Public ENA Assembly uploader

Upload of metagenome and metatranscriptome assemblies to ENA

Pre-requisites:

  • CSV metadata file. One per study. See test/fixtures/test_metadata for an example
  • Compressed assembly fasta files in the locations defined in the metadata file

Set the following environment variables with your Webin details:

ENA_WEBIN

export ENA_WEBIN=Webin-0000

ENA_WEBIN_PASSWORD

export ENA_WEBIN_PASSWORD=password
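
For scripts that wrap these tools, the credentials can be read from the environment rather than hard-coded. The following is a minimal sketch that assumes only the two variable names above; the helper name `get_webin_credentials` is hypothetical and is not part of this package.

```python
import os


def get_webin_credentials(env=None):
    """Return (username, password) from ENA_WEBIN and ENA_WEBIN_PASSWORD.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    # Collect every variable that is unset or empty so the error names them all.
    missing = [
        name for name in ("ENA_WEBIN", "ENA_WEBIN_PASSWORD") if not env.get(name)
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["ENA_WEBIN"], env["ENA_WEBIN_PASSWORD"]
```

Failing early with a clear message avoids a less obvious authentication error later in the submission.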

Register study and generate pre-upload files

The script needs python, requests, and ena-webin-cli to run. Install the package:

python3 -m pip install -i https://test.pypi.org/simple/ --no-deps assemblyuploader==0.0.0

If you already have a registered study accession for your assembly files, skip to Step 3.

Step 1. This step will generate a folder STUDY_upload and a project XML and submission XML within it:

study_xmls
  --study STUDY         raw reads study ID
  --library LIBRARY     metagenome or metatranscriptome
  --center CENTER       center for upload e.g. EMG
  --hold HOLD           hold date (private) if it should be different from the provided study in format dd-mm-yyyy. Will inherit the release date of the raw read study if not
                        provided.
  --tpa                 is the study a third party assembly. Default True
  --publication PUBLICATION
                        pubmed ID for connected publication if available
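
When Step 1 is driven from Python, the argument list can be assembled from the options listed above. This is only a sketch: the function name `study_xmls_command`, the study accession and the hold date are hypothetical, and `--tpa` is left out because its exact behaviour is not described here.

```python
import shlex
from datetime import date


def study_xmls_command(study, library, center, hold=None, publication=None):
    """Build the argument list for the study_xmls entry point (Step 1)."""
    if library not in ("metagenome", "metatranscriptome"):
        raise ValueError("library must be 'metagenome' or 'metatranscriptome'")
    cmd = ["study_xmls", "--study", study, "--library", library, "--center", center]
    if hold is not None:
        # --hold expects the date as dd-mm-yyyy.
        cmd += ["--hold", hold.strftime("%d-%m-%Y")]
    if publication is not None:
        cmd += ["--publication", str(publication)]
    return cmd


# Hypothetical accession and hold date, for illustration only.
cmd = study_xmls_command("SRP000000", "metagenome", "EMG", hold=date(2030, 1, 31))
print(shlex.join(cmd))
```

With the package installed, the list could be passed to `subprocess.run(cmd, check=True)` to run the step.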

Step 2. This step submits the XML to ENA and generates a new assembly study accession. Keep a note of the newly generated study accession:

submit_study
  --study STUDY         raw reads study ID
  --test                run test submission only

Step 3. This step will generate manifest files in the folder STUDY_upload for the runs specified in the metadata file:

assembly_manifest
  --study STUDY         raw reads study ID
  --data DATA           metadata CSV - run_id, coverage, assembler, version, filepath
  --assembly_study ASSEMBLY_STUDY
                        pre-existing study ID to submit to if available. Must exist in the webin account
  --force               overwrite all existing manifests
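
The metadata CSV passed to `--data` can be written with Python's standard `csv` module. The sketch below assumes the column names are spelled exactly as in the `--data` help text; check test/fixtures/test_metadata for the headers the package actually expects. The helper `write_metadata` and the sample row values (coverage, assembler, version, file path) are made up for illustration.

```python
import csv
import io

# Column names follow the --data help text; the real header spelling may differ.
FIELDS = ["run_id", "coverage", "assembler", "version", "filepath"]


def write_metadata(rows, handle):
    """Write assembly metadata rows (dicts keyed by FIELDS) as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Reject rows with an absent or blank value before writing them.
        missing = [f for f in FIELDS if not str(row.get(f, "")).strip()]
        if missing:
            raise ValueError(f"row {row!r} is missing: {', '.join(missing)}")
        writer.writerow(row)


# Illustrative row only; these values are not taken from a real submission.
buffer = io.StringIO()
write_metadata(
    [
        {
            "run_id": "SRR12240187",
            "coverage": "20.0",
            "assembler": "metaspades",
            "version": "3.15.3",
            "filepath": "assemblies/SRR12240187.fasta.gz",
        }
    ],
    buffer,
)
print(buffer.getvalue())
```

Writing to a file instead only requires opening it with `open(path, "w", newline="")` and passing the handle in place of the buffer.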

Upload assemblies

Once the manifest files are generated, use ENA's webin-cli to upload the assemblies.

To test your submission, add the -test argument.

An example of a live submission from within this repo:

ena-webin-cli \
  -context=genome \
  -manifest=SRR12240187.manifest \
  -userName=$ENA_WEBIN \
  -password=$ENA_WEBIN_PASSWORD \
  -submit
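
When a study has many manifests, the same command can be built once per file. The following sketch assumes the manifests sit in a single upload folder and carry the `.manifest` extension, as in the example above; the function name `webin_commands` is hypothetical. It uses only the webin-cli arguments shown in this section and defaults to a test submission.

```python
import os
from pathlib import Path


def webin_commands(upload_dir, test=True):
    """Return one ena-webin-cli argument list per *.manifest file in upload_dir."""
    user = os.environ.get("ENA_WEBIN", "")
    password = os.environ.get("ENA_WEBIN_PASSWORD", "")
    commands = []
    # Sort so the submission order is stable between runs.
    for manifest in sorted(Path(upload_dir).glob("*.manifest")):
        cmd = [
            "ena-webin-cli",
            "-context=genome",
            f"-manifest={manifest}",
            f"-userName={user}",
            f"-password={password}",
            "-submit",
        ]
        if test:
            # -test sends the submission to the test service instead of live.
            cmd.append("-test")
        commands.append(cmd)
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Defaulting to `test=True` means a live submission has to be requested explicitly with `test=False`.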

More information on ENA's webin-cli can be found in the ENA documentation.
