Python scripts to upload primary metagenome and metatranscriptome assemblies to ENA on a per-study basis. The scripts generate the XML files needed to register a new study and create the manifests required for submission with webin-cli.

ENA Assembly uploader

Upload of metagenome and metatranscriptome assemblies to the European Nucleotide Archive (ENA)

Pre-requisites:

  • CSV metadata file, one per study. See tests/fixtures/test_metadata for an example
  • Compressed assembly fasta files in the locations defined in the metadata file
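
Before running the tools, it can help to check that the metadata file and the assembly files line up. The following is a minimal sketch and not part of assembly_uploader: the function name check_metadata_csv is hypothetical, and the column names are an assumption taken from the assembly_manifest help text (run_id, coverage, assembler, version, filepath). Compare them with the example fixture before relying on this.

```python
import csv
from pathlib import Path

# Assumed header names, taken from the assembly_manifest help text;
# the real fixture file may spell or capitalise them differently.
EXPECTED_COLUMNS = ["run_id", "coverage", "assembler", "version", "filepath"]


def check_metadata_csv(csv_path):
    """Return a list of human-readable problems found in the metadata CSV."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [col for col in EXPECTED_COLUMNS if col not in header]
        if missing:
            # Without the expected columns the rows cannot be checked.
            return [f"missing columns: {', '.join(missing)}"]
        # Data rows start on line 2 because line 1 is the header.
        for line_number, row in enumerate(reader, start=2):
            if not Path(row["filepath"]).is_file():
                problems.append(
                    f"line {line_number}: file not found: {row['filepath']}"
                )
    return problems
```

An empty list means every row points at an existing file; anything else is worth fixing before generating manifests.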

Set the following environment variables with your Webin account details:

ENA_WEBIN

export ENA_WEBIN=Webin-0000

ENA_WEBIN_PASSWORD

export ENA_WEBIN_PASSWORD=password
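
If you drive the tools from your own Python code, a quick check that both variables are set can give a clearer error than a failed submission. This is a minimal sketch; missing_webin_variables is a hypothetical helper, not a function provided by assembly_uploader.

```python
import os

# The two variable names come from this README.
REQUIRED_VARIABLES = ("ENA_WEBIN", "ENA_WEBIN_PASSWORD")


def missing_webin_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


# Typical use at the top of a workflow script:
#     missing = missing_webin_variables()
#     if missing:
#         raise SystemExit(f"Please set: {', '.join(missing)}")
```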

Installation

Install the package:

pip install assembly-uploader

Usage

From the command line

Register study and generate pre-upload files

If you already have a registered study accession for your assembly files, skip to Step 3.

Step 1: generate XML files for a new assembly study submission

This step will generate a folder STUDY_upload and a project XML and submission XML within it:

study_xmls
  --study STUDY         raw reads study ID
  --library LIBRARY     metagenome or metatranscriptome
  --center CENTER       center for upload e.g. EMG
  --hold HOLD           hold date (private) if it should be different from the provided study in format dd-mm-yyyy. Will inherit the release date of the raw read study if not
                        provided.
  --tpa                 use this flag if the study is a third-party assembly. Default False
  --publication PUBLICATION
                        pubmed ID for connected publication if available
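
The --hold option expects a date in dd-mm-yyyy format. The sketch below shows one way to check a value against that format before passing it on. It is an illustration only: parse_hold_date is a hypothetical helper, and how study_xmls itself parses the date is not shown here.

```python
from datetime import datetime


def parse_hold_date(value):
    """Parse a dd-mm-yyyy string, raising ValueError if it is malformed.

    Note: strptime also accepts non-zero-padded days and months
    (for example "1-2-2030"), so this is a rough check only.
    """
    return datetime.strptime(value, "%d-%m-%Y").date()
```

A value written in another order, such as yyyy-mm-dd, raises ValueError.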

Step 2: submit the new assembly study to ENA

This step submits the XML to ENA and generates a new assembly study accession. Make a note of the newly generated study accession:

submit_study
  --study STUDY         raw reads study ID
  --test                run test submission only

Step 3: make a manifest file for each assembly

This step will generate manifest files in the folder STUDY_upload for the runs specified in the metadata file:

assembly_manifest
  --study STUDY         raw reads study ID
  --data DATA           metadata CSV - run_id, coverage, assembler, version, filepath
  --assembly_study ASSEMBLY_STUDY
                        pre-existing study ID to submit to if available. Must exist in the webin account
  --force               overwrite all existing manifests
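
After this step you may want to list the manifests that were written, for example to feed them to webin-cli one by one. A minimal sketch, assuming the manifests sit directly in the upload folder and end in .manifest (as in the SRR12240187.manifest example used in Step 4); find_manifests is a hypothetical helper.

```python
from pathlib import Path


def find_manifests(upload_dir):
    """Return the *.manifest files directly inside upload_dir, sorted by name."""
    return sorted(Path(upload_dir).glob("*.manifest"))
```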

Step 4: upload assemblies

Once the manifest files have been generated, use ENA's webin-cli to upload the assemblies.

To test your submission, add the -test argument.

An example of a live submission using a manifest from this repo:

ena-webin-cli \
  -context=genome \
  -manifest=SRR12240187.manifest \
  -userName=$ENA_WEBIN \
  -password=$ENA_WEBIN_PASSWORD \
  -submit

More information on ENA's webin-cli can be found in the ENA docs.

From a Python script

assembly_uploader can also be used as a Python library, so that you can integrate the steps into another Python workflow or tool.

from pathlib import Path

from assembly_uploader.study_xmls import StudyXMLGenerator, METAGENOME
from assembly_uploader.submit_study import submit_study
from assembly_uploader.assembly_manifest import AssemblyManifestGenerator

# Generate new assembly study XML files
StudyXMLGenerator(
    study="SRP272267",
    center_name="EMG",
    library=METAGENOME,
    tpa=True,
    output_dir=Path("my-study"),
).write()

# Submit new assembly study to ENA
new_study_accession = submit_study("SRP272267", is_test=True, directory=Path("my-study"))
print(f"My assembly study has the accession {new_study_accession}")

# Create manifest files for the assemblies to be uploaded
# This assumes you have a CSV file detailing the assemblies with their assembler and coverage metadata
# see tests/fixtures/test_metadata for an example
AssemblyManifestGenerator(
    study="SRP272267",
    assembly_study=new_study_accession,
    assemblies_csv=Path("/path/to/my/assemblies.csv"),
    output_dir=Path("my-study"),
).write()

The ENA submission requires webin-cli, so follow Step 4 above. (You could still call this from Python, e.g. with subprocess.Popen.)
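
The call to webin-cli from Python could look like the sketch below. It uses subprocess.run (a higher-level wrapper around subprocess.Popen) and only the webin-cli arguments shown in Step 4. The helper names build_webin_command and upload_assembly are hypothetical, and the sketch assumes ena-webin-cli is on your PATH and both ENA_WEBIN variables are set.

```python
import os
import subprocess


def build_webin_command(manifest, test=False):
    """Assemble the ena-webin-cli argument list shown in Step 4."""
    command = [
        "ena-webin-cli",
        "-context=genome",
        f"-manifest={manifest}",
        f"-userName={os.environ['ENA_WEBIN']}",
        f"-password={os.environ['ENA_WEBIN_PASSWORD']}",
        "-submit",
    ]
    if test:
        # Step 4: add -test to try the submission out.
        command.append("-test")
    return command


def upload_assembly(manifest, test=True):
    """Run webin-cli for one manifest; raises CalledProcessError on failure."""
    return subprocess.run(build_webin_command(manifest, test=test), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the password. Defaulting upload_assembly to test=True is a cautious choice for a sketch; switch it off for a live submission.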

Development setup

Prerequisites: a functioning conda or pixi installation.

To install the assembly uploader codebase in "editable" mode:

conda env create -f requirements.yml
conda activate assemblyuploader
pip install -e '.[dev,test]'
pre-commit install

Testing

pytest
