Python scripts to upload primary metagenome and metatranscriptome assemblies to ENA on a per-study basis. The scripts generate the XML files needed to register a new study and the manifests required for submission with webin-cli.
ENA Assembly uploader
Upload of metagenome and metatranscriptome assemblies to the European Nucleotide Archive (ENA)
Pre-requisites:
- CSV metadata file. One per study. See tests/fixtures/test_metadata for an example
- Compressed assembly fasta files in the locations defined in the metadata file
Set the following environment variables with your Webin details:
ENA_WEBIN
export ENA_WEBIN=Webin-0000
ENA_WEBIN_PASSWORD
export ENA_WEBIN_PASSWORD=password
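Before running any of the steps below, it can help to confirm that both variables are actually set. The following is a minimal sketch, not part of assembly_uploader: the helper name `missing_webin_vars` is hypothetical, and it only checks that the two variables named above are present and non-empty.

```python
import os

# The two variables named in the pre-requisites above.
REQUIRED_VARS = ("ENA_WEBIN", "ENA_WEBIN_PASSWORD")


def missing_webin_vars(environ=None):
    """Return the required Webin variables that are unset or empty.

    `environ` defaults to the process environment; a plain dict can be
    passed instead, which makes the check easy to try out.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_webin_vars()` with no arguments inspects the current shell environment; an empty list means both credentials are available.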
Installation
Install the package:
pip install assembly-uploader
Usage
From the command line
Register study and generate pre-upload files
If you already have a registered study accession for your assembly files skip to step 3.
Step 1: generate XML files for a new assembly study submission
This step will generate a folder <STUDY>_upload and a project XML and submission XML within it:
study_xmls
--study STUDY raw reads study ID
--library LIBRARY metagenome or metatranscriptome
--center CENTER center for upload e.g. EMG
--hold HOLD hold date (private) if it should be different from the provided study in format dd-mm-yyyy. Will inherit the release date of the raw read study if not
provided.
--tpa use this flag if the study is a third party assembly. Default False
--publication PUBLICATION
pubmed ID for connected publication if available
--private use flag if your data is private
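If you drive this step from another tool, the argument list can be assembled programmatically. The sketch below is illustrative only: the helper names `format_hold_date` and `build_study_xmls_args` are hypothetical, and it uses just the options listed above, formatting the hold date as dd-mm-yyyy as the `--hold` help text describes.

```python
from datetime import date


def format_hold_date(hold: date) -> str:
    """Format a date as dd-mm-yyyy, the format described for --hold."""
    return hold.strftime("%d-%m-%Y")


def build_study_xmls_args(study, library, center, hold=None, tpa=False, private=False):
    """Build an argument list for the study_xmls command.

    Only the options documented above are used; nothing is executed here.
    """
    args = ["study_xmls", "--study", study, "--library", library, "--center", center]
    if hold is not None:
        args += ["--hold", format_hold_date(hold)]
    if tpa:
        args.append("--tpa")
    if private:
        args.append("--private")
    return args


# Example: a third party metagenome assembly study held until 5 March 2026.
example_args = build_study_xmls_args(
    "SRP272267", "metagenome", "EMG", hold=date(2026, 3, 5), tpa=True
)
```

The resulting list could then be passed to `subprocess.run` in an environment where the package is installed.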
Step 2: submit the new assembly study to ENA
This step submits the XML to ENA and generates a new assembly study accession. Keep a note of the newly generated study accession:
submit_study
--study STUDY raw reads study ID
--directory PATH directory containing study XML
--test run test submission only
Step 3: make a manifest file for each assembly
[!IMPORTANT] Please read carefully before creating manifest files for co-assemblies:
- Co-assemblies cannot be generated from a mix of private and public runs - all runs used in a co-assembly must have the same privacy status (all private or all public).
- If your co-assembly was assembled from runs generated from multiple biological samples, you must first register a co-assembly sample (see ENA FAQ on co-assemblies) and then specify it in the Sample column of your metadata CSV file.
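The privacy rule for co-assemblies can be checked before you build the metadata file. This is a minimal sketch under stated assumptions: the helper `same_privacy_status` is hypothetical, and you are assumed to already know, for each run accession, whether it is private. It does not query ENA.

```python
def same_privacy_status(run_is_private):
    """Return True if every run has the same privacy status.

    `run_is_private` maps a run accession to True (private) or
    False (public). An empty mapping is treated as consistent.
    """
    return len(set(run_is_private.values())) <= 1


# A co-assembly built from one private and one public run breaks the rule.
mixed = {"SRR1234": True, "SRR5678": False}
all_public = {"SRR1234": False, "SRR5678": False}
```

Here `same_privacy_status(mixed)` is `False`, so that co-assembly should not be submitted as is, while `same_privacy_status(all_public)` is `True`.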
This step will generate manifest files in the folder <STUDY>_upload for runs specified in the metadata file:
assembly_manifest
--study STUDY raw reads study ID
--data DATA metadata CSV - runs (comma-separated and in quotes, example: "SRR1234,SRR5678"), coverage, assembler, version, filepath and optionally sample
--assembly_study ASSEMBLY_STUDY
pre-existing study ID to submit to if available. Must exist in the webin account
--force overwrite all existing manifests
--private use flag if your data is private
--tpa use this flag if the study is a third party assembly. Default False
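The `--data` description asks for the runs of a co-assembly to be comma-separated and quoted. Python's `csv` module applies that quoting automatically, as the sketch below shows. The column headers in `FIELDS` are an assumption made for illustration; check tests/fixtures/test_metadata for the exact header names the tool expects.

```python
import csv
import io

# Assumed header names, for illustration only; see tests/fixtures/test_metadata.
FIELDS = ["Runs", "Coverage", "Assembler", "Version", "Filepath", "Sample"]


def metadata_csv(rows):
    """Serialise assembly metadata rows to CSV text.

    Fields that contain a comma (such as a list of runs) are quoted by
    the csv module; missing optional fields are written as empty strings.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical co-assembly row with two runs and no Sample value.
text = metadata_csv([
    {
        "Runs": "SRR1234,SRR5678",
        "Coverage": "20.5",
        "Assembler": "metaspades",
        "Version": "3.15.3",
        "Filepath": "assemblies/coassembly.fasta.gz",
    }
])
```

In `text`, the runs field is written as `"SRR1234,SRR5678"`, so it stays a single column when the file is read back.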
Step 4: upload assemblies
Once the manifest files are generated, use ENA's webin-cli to upload the assemblies.
To test your submission add the -test argument.
A live execution example within this repo is the following:
ena-webin-cli \
-context=genome \
-manifest=SRR12240187.manifest \
-userName=$ENA_WEBIN \
-password=$ENA_WEBIN_PASSWORD \
-submit
Optional step 5: publicly releasing a private study
release_study
--study STUDY study ID (e.g. of the assembly study)
--test run test submission only
More information on ENA's webin-cli can be found in the ENA docs.
From a Python script
assembly_uploader can also be used as a Python library, so that you can integrate the steps into another Python workflow or tool.
from pathlib import Path
from assembly_uploader.study_xmls import StudyXMLGenerator, METAGENOME
from assembly_uploader.submit_study import submit_study
from assembly_uploader.assembly_manifest import AssemblyManifestGenerator
# Generate new assembly study XML files
StudyXMLGenerator(
    study="SRP272267",
    center_name="EMG",
    library=METAGENOME,
    tpa=True,
    output_dir=Path("my-study"),
).write()
# Submit new assembly study to ENA
new_study_accession = submit_study("SRP272267", is_test=True, directory=Path("my-study"))
print(f"My assembly study has the accession {new_study_accession}")
# Create manifest files for the assemblies to be uploaded
# This assumes you have a CSV file detailing the assemblies with their assembler and coverage metadata
# see tests/fixtures/test_metadata for an example
AssemblyManifestGenerator(
    study="SRP272267",
    assembly_study=new_study_accession,
    assemblies_csv=Path("/path/to/my/assemblies.csv"),
    output_dir=Path("my-study"),
).write()
The ENA submission requires webin-cli, so follow Step 4 above.
(You could still call this from Python, e.g. with subprocess.Popen.)
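One way to make that call from Python is sketched below. It assumes `ena-webin-cli` is on your `PATH` and reuses only the arguments shown in Step 4; the helper names `webin_cli_command` and `run_webin_cli` are hypothetical and not part of assembly_uploader. It uses `subprocess.run` rather than `Popen` for brevity.

```python
import os
import subprocess


def webin_cli_command(manifest, test=False, environ=None):
    """Build the ena-webin-cli argument list from Step 4.

    Credentials are read from ENA_WEBIN and ENA_WEBIN_PASSWORD; a dict
    can be passed as `environ` instead of using the process environment.
    """
    env = os.environ if environ is None else environ
    command = [
        "ena-webin-cli",
        "-context=genome",
        f"-manifest={manifest}",
        f"-userName={env['ENA_WEBIN']}",
        f"-password={env['ENA_WEBIN_PASSWORD']}",
        "-submit",
    ]
    if test:
        # -test runs the submission in test mode, as described in Step 4.
        command.append("-test")
    return command


def run_webin_cli(manifest, test=True):
    """Run webin-cli for one manifest; raises CalledProcessError on failure."""
    return subprocess.run(webin_cli_command(manifest, test=test), check=True)
```

Passing the arguments as a list avoids shell quoting issues with passwords that contain special characters. `run_webin_cli` defaults to `test=True` so that a first run does not make a live submission.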
Finally, you can also publicly release a private/embargoed/held study:
from assembly_uploader.release_study import release_study
release_study("SRP272267")
Development setup
Prerequisites: a functioning conda or pixi installation.
To install the assembly uploader codebase in "editable" mode:
conda env create -f requirements.yml
conda activate assemblyuploader
pip install -e '.[dev,test]'
pre-commit install
Testing
pytest