
Python scripts to upload primary metagenome and metatranscriptome assemblies to ENA on a per-study basis. The scripts generate the XMLs needed to register a new study and create the manifests necessary for submission with webin-cli.


Public ENA Assembly uploader

Upload of metagenome and metatranscriptome assemblies to the European Nucleotide Archive (ENA)

Pre-requisites:

  • CSV metadata file, one per study. See test/fixtures/test_metadata for an example.
  • Compressed assembly fasta files in the locations defined in the metadata file
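As a rough illustration of the metadata file, the sketch below writes a one-row CSV using the column names listed for the --data option further down (run_id, coverage, assembler, version, filepath). The exact header spelling the tool expects, the example values, and the helper name write_metadata are all assumptions made for illustration; test/fixtures/test_metadata in the repository remains the authoritative example.

```python
import csv
from pathlib import Path

# Column names taken from the --data help text. The exact header spelling
# expected by the tool is an assumption, so compare with the fixture file.
COLUMNS = ["run_id", "coverage", "assembler", "version", "filepath"]


def write_metadata(rows, destination):
    """Write one metadata CSV for a single study and return its path."""
    destination = Path(destination)
    with destination.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return destination


# Hypothetical values, for illustration only.
example_rows = [
    {
        "run_id": "SRR12240187",
        "coverage": "20.0",
        "assembler": "metaspades",
        "version": "3.15.3",
        "filepath": "assemblies/SRR12240187.fasta.gz",
    }
]
```

Calling write_metadata(example_rows, "metadata.csv") would produce a file with one header line and one data row.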

Set the following environment variables with your Webin account details:

ENA_WEBIN

export ENA_WEBIN=Webin-0000

ENA_WEBIN_PASSWORD

export ENA_WEBIN_PASSWORD=password
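Because the later steps rely on both variables, it may be worth confirming they are set before starting. The following is a minimal sketch of such a check; the helper name missing_webin_variables is hypothetical and is not part of assembly_uploader.

```python
import os

# The two variables named in this README.
REQUIRED_VARIABLES = ("ENA_WEBIN", "ENA_WEBIN_PASSWORD")


def missing_webin_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

An empty list from missing_webin_variables() means both variables are available in the current environment; otherwise the returned names show which export commands still need to be run.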

Installation

Install the package:

pip install assembly_uploader

Register study and generate pre-upload files

If you already have a registered study accession for your assembly files, skip to Step 3.

Step 1

This step generates a folder, STUDY_upload, containing a project XML and a submission XML:

study_xmls
  --study STUDY         raw reads study ID
  --library LIBRARY     metagenome or metatranscriptome
  --center CENTER       center for upload e.g. EMG
  --hold HOLD           hold date (private) if it should be different from the provided study in format dd-mm-yyyy. Will inherit the release date of the raw read study if not
                        provided.
  --tpa                 is the study a third party assembly. Default True
  --publication PUBLICATION
                        pubmed ID for connected publication if available
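When scripting this step, the dd-mm-yyyy format of --hold is easy to get wrong. The sketch below builds an argument list for study_xmls from the options listed above and formats the hold date accordingly. The function name study_xmls_args and the accession in the usage note are hypothetical, and --tpa is deliberately left out because its exact behaviour is not spelled out here.

```python
from datetime import date


def study_xmls_args(study, library, center, hold=None, publication=None):
    """Build an argument list for study_xmls from the documented options."""
    args = ["study_xmls", "--study", study, "--library", library, "--center", center]
    if hold is not None:
        # --hold is documented as dd-mm-yyyy.
        args += ["--hold", hold.strftime("%d-%m-%Y")]
    if publication is not None:
        # PubMed ID of a connected publication, if there is one.
        args += ["--publication", str(publication)]
    return args
```

For instance, study_xmls_args("ERP000000", "metagenome", "EMG", hold=date(2030, 1, 5)) yields a list ending in "--hold", "05-01-2030", which could then be passed to subprocess.run(args, check=True).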

Step 2

This step submits the XML to ENA and generates a new assembly study accession. Keep a note of the newly generated study accession:

submit_study
  --study STUDY         raw reads study ID
  --test                run test submission only

Step 3

This step generates manifest files in the folder STUDY_upload for the runs specified in the metadata file:

assembly_manifest
  --study STUDY         raw reads study ID
  --data DATA           metadata CSV - run_id, coverage, assembler, version, filepath
  --assembly_study ASSEMBLY_STUDY
                        pre-existing study ID to submit to if available. Must exist in the webin account
  --force               overwrite all existing manifests
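Since manifests are built from the metadata CSV, a quick sanity check of that file before running assembly_manifest can catch empty fields or missing assembly files early. This is only a sketch: the header names are assumed to match the column list in the --data help text, the helper find_metadata_problems is hypothetical, and relative file paths are resolved against the current working directory.

```python
import csv
from pathlib import Path

# Assumed header names, taken from the --data help text.
REQUIRED_COLUMNS = ["run_id", "coverage", "assembler", "version", "filepath"]


def find_metadata_problems(metadata_csv):
    """Return human-readable problems found in a metadata CSV."""
    problems = []
    with open(metadata_csv, newline="") as handle:
        # Data rows start on line 2 because line 1 holds the header.
        for line_number, row in enumerate(csv.DictReader(handle), start=2):
            empty = [
                name for name in REQUIRED_COLUMNS
                if not (row.get(name) or "").strip()
            ]
            if empty:
                problems.append(f"line {line_number}: missing {', '.join(empty)}")
            filepath = (row.get("filepath") or "").strip()
            if filepath and not Path(filepath).is_file():
                problems.append(f"line {line_number}: file not found: {filepath}")
    return problems
```

An empty list suggests every row has all five fields filled in and points at an existing file; it does not verify that the files are valid compressed FASTA.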

Upload assemblies

Once the manifest files are generated, use ENA's webin-cli tool to upload the assemblies.

To test your submission, add the -test argument to the command.

The following example performs a live submission using a manifest from this repository:

ena-webin-cli \
  -context=genome \
  -manifest=SRR12240187.manifest \
  -userName=$ENA_WEBIN \
  -password=$ENA_WEBIN_PASSWORD \
  -submit
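With many runs in a study, the same command has to be repeated for each manifest. The sketch below builds one ena-webin-cli argument list per .manifest file in an upload folder, using only the flags shown in the example above plus -test. The function names are hypothetical, and defaulting to test mode is a choice of this sketch rather than behaviour of webin-cli.

```python
from pathlib import Path


def webin_cli_args(manifest, username, password, test=True):
    """Build the ena-webin-cli argument list for a single manifest."""
    args = [
        "ena-webin-cli",
        "-context=genome",
        f"-manifest={manifest}",
        f"-userName={username}",
        f"-password={password}",
        "-submit",
    ]
    if test:
        # Add -test for a test submission instead of a live one.
        args.append("-test")
    return args


def commands_for_folder(upload_dir, username, password, test=True):
    """Return one argument list per *.manifest file, in sorted order."""
    manifests = sorted(Path(upload_dir).glob("*.manifest"))
    return [webin_cli_args(m, username, password, test) for m in manifests]
```

Each returned list can be passed to subprocess.run(args, check=True), with the username and password read from ENA_WEBIN and ENA_WEBIN_PASSWORD. Passing test=False gives the live form of the command.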

More information on ENA's webin-cli can be found in the ENA documentation.
