Command Line Interface to upload data to the European Nucleotide Archive

ENA upload tool

About

The program submits experimental data and the associated metadata to the European Nucleotide Archive (ENA). The metadata should be provided in separate tables, one for each of the following ENA objects:

  • STUDY
  • SAMPLE
  • EXPERIMENT
  • RUN

The program can perform the following actions:

  • add: add an object to the archive
  • modify: modify an object in the archive
  • cancel: cancel a private object and its dependent objects (under development)
  • release: release a private object immediately to the public (under development)
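ENA receives each action in a SUBMISSION XML document that is sent together with the object XML. The sketch below shows roughly what that document looks like for each action. It assumes the element names ADD, MODIFY, CANCEL and RELEASE from ENA's submission schema, with a `target` accession for cancel and release; the tool itself renders its XML from Genshi templates, so its real output may differ in detail. The helper name `build_submission_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical mapping from the CLI --action value to the ENA action element.
ACTION_ELEMENTS = {"add": "ADD", "modify": "MODIFY", "cancel": "CANCEL", "release": "RELEASE"}


def build_submission_xml(action: str, target: Optional[str] = None) -> str:
    """Return a SUBMISSION document containing a single action (sketch)."""
    if action not in ACTION_ELEMENTS:
        raise ValueError(f"unsupported action: {action}")
    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    action_el = ET.SubElement(ET.SubElement(actions, "ACTION"), ACTION_ELEMENTS[action])
    # CANCEL and RELEASE act on an object that already exists, identified by accession.
    if action in ("cancel", "release"):
        if target is None:
            raise ValueError(f"{action} requires the accession of the target object")
        action_el.set("target", target)
    return ET.tostring(submission, encoding="unicode")


print(build_submission_xml("release", "PRJEB00000"))
```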

After a successful submission, new TSV tables are generated with the ENA accession numbers filled in, along with a submission receipt.
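The accession numbers come from the receipt XML that ENA returns. Below is a minimal sketch of extracting them with the standard library. It assumes a receipt whose root carries a `success` flag and in which each submitted object appears as an element with `alias` and `accession` attributes; the receipt content and the function name are illustrative, not taken from this tool.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Illustrative receipt only: all aliases and accessions below are made up.
EXAMPLE_RECEIPT = """<RECEIPT receiptDate="2020-05-01T14:21:00.000+01:00" success="true">
  <STUDY accession="PRJEB00001" alias="study_1" status="PRIVATE"/>
  <SAMPLE accession="ERS0000001" alias="sample_1" status="PRIVATE"/>
  <EXPERIMENT accession="ERX0000001" alias="exp_1" status="PRIVATE"/>
  <RUN accession="ERR0000001" alias="run_1" status="PRIVATE"/>
  <SUBMISSION accession="ERA0000001" alias="submission_1"/>
</RECEIPT>"""


def accessions_from_receipt(receipt_xml: str) -> Dict[Tuple[str, str], str]:
    """Map (object type, alias) to the accession assigned by ENA."""
    root = ET.fromstring(receipt_xml)
    if root.get("success") != "true":
        raise RuntimeError("submission failed; check the messages in the receipt")
    accessions = {}
    for obj_type in ("STUDY", "SAMPLE", "EXPERIMENT", "RUN"):
        for element in root.findall(obj_type):
            accessions[(obj_type, element.get("alias"))] = element.get("accession")
    return accessions


print(accessions_from_receipt(EXAMPLE_RECEIPT))
```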

Tool dependencies

  • Python 3.5+ with the following packages:
    • Genshi
    • lxml
    • pandas
    • requests

Installation

pip install ena-upload-cli

Usage

Minimal: ena-upload-cli --action {add,modify,cancel,release} --center CENTER_NAME --secret SECRET

All supported arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --action {add,modify,cancel,release}
                        add: add an object to the archive
                        modify: modify an object in the archive
                        cancel: cancel a private object and its dependent objects
                        release: release a private object immediately to public
  --study STUDY         table of STUDY object
  --sample SAMPLE       table of SAMPLE object
  --experiment EXPERIMENT
                        table of EXPERIMENT object
  --run RUN             table of RUN object
  --data [FILE [FILE ...]]
                        data for submission
  --center CENTER_NAME  specific to your Webin account
  --tool TOOL_NAME      Specify the name of the tool this submission is done with. Default: ena-upload-cli
  --tool_version TOOL_VERSION
                        Specify the version of the tool this submission is done with. Default: current version of tool
  --secret SECRET       .secret file containing the password of your Webin account
  -d, --dev             Flag to use the dev/sandbox endpoint of ENA.
  --vir                 Flag to use the viral sample template.

Mandatory arguments: --action, --center and --secret.
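The interface above can be mirrored with `argparse` to see which options are required and how `--data` accepts several files. This is a reconstruction from the help text, not the tool's source; any behaviour beyond what the help states is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Reconstructed from the help text; the real tool's parser may differ in detail.
    parser = argparse.ArgumentParser(prog="ena-upload-cli")
    parser.add_argument("--action", required=True,
                        choices=["add", "modify", "cancel", "release"])
    parser.add_argument("--study", help="table of STUDY object")
    parser.add_argument("--sample", help="table of SAMPLE object")
    parser.add_argument("--experiment", help="table of EXPERIMENT object")
    parser.add_argument("--run", help="table of RUN object")
    # Zero or more data files, matching "--data [FILE [FILE ...]]".
    parser.add_argument("--data", nargs="*", metavar="FILE", default=[])
    parser.add_argument("--center", required=True, metavar="CENTER_NAME")
    parser.add_argument("--tool", metavar="TOOL_NAME", default="ena-upload-cli")
    # The real default is the tool's current version; left as None here.
    parser.add_argument("--tool_version", metavar="TOOL_VERSION")
    parser.add_argument("--secret", required=True)
    parser.add_argument("-d", "--dev", action="store_true")
    parser.add_argument("--vir", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--action", "add", "--center", "my_center", "--secret", ".secret.yml", "--dev"]
)
print(args.action, args.center, args.dev)
```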

ENA Webin

If you don't have a Webin account yet, you can register for one on the ENA website. Use the full Webin username, which looks like Webin-XXXXX. Visit the Webin portal to check on your submissions, or the dev Webin portal to check on test submissions.

The .secret.yml file

To avoid exposing your credentials in the terminal history, it is recommended to store them in a .secret.yml file containing the username and password keywords. An example is given in the root of the repository.
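The file only needs two flat `key: value` entries. As a hedged sketch, assuming one `username: Webin-XXXXX` line and one `password: ...` line, the credentials can be read without a YAML library; a full YAML parser such as PyYAML is the more robust choice. `read_secret` is a hypothetical helper.

```python
from pathlib import Path
from typing import Tuple


def read_secret(path: str) -> Tuple[str, str]:
    """Return (username, password) from a flat 'key: value' file.

    Assumed layout:
        username: Webin-XXXXX
        password: your_password
    Only single-line scalar values are handled, optionally wrapped in quotes.
    """
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so passwords may contain ':'.
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip().strip("'\"")
    try:
        return values["username"], values["password"]
    except KeyError as missing:
        raise ValueError(f"{path} is missing the {missing} keyword") from None
```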

Dev instance

By default, submissions are sent to the following ENA URL: https://www.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA

Use the --dev flag to make a test submission to the ENA sandbox (dev) instance instead: https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA. Test submissions are discarded within 24 hours.
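Choosing between the two endpoints and authenticating could look like the sketch below. It assumes HTTP Basic authentication with the Webin username and password, as in ENA's curl-based submission examples; `submission_endpoint` and `basic_auth_header` are hypothetical names. With the `requests` package, passing `auth=(username, password)` to `requests.post` produces the same header.

```python
import base64
from typing import Dict

# Endpoints as listed in this section.
PRODUCTION_URL = "https://www.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA"
DEV_URL = "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/?auth=ENA"


def submission_endpoint(dev: bool) -> str:
    """Return the sandbox URL when the --dev flag is set, production otherwise."""
    return DEV_URL if dev else PRODUCTION_URL


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header (assumed auth scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(submission_endpoint(dev=True))
```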

Supported columns for viral sample submissions

Viral samples, submitted with the --vir flag, are validated by ENA against the ENA virus pathogen checklist. The sample TSV table used by this tool supports the following columns:

| Column name | ENA field name | Field format | Cardinality |
|---|---|---|---|
| alias | alias | free text | mandatory |
| status | | | auto_filled |
| accession | accession | | auto_filled |
| title | TITLE | free text | mandatory |
| scientific_name | SCIENTIFIC_NAME | free text | mandatory |
| taxon_id | TAXON_ID | | auto_filled |
| sample_description | DESCRIPTION | free text | mandatory |
| submission_date | | | auto_filled |
| geographic_location | geographic location (country and/or sea) | text choice | mandatory |
| host_common_name | host common name | free text | mandatory |
| host_subject_id | host subject id | free text | mandatory |
| host_health_state | host health state | text choice | mandatory |
| host_sex | host sex | text choice | mandatory |
| host_scientific_name | host scientific name | free text | mandatory |
| collector_name | collector name | free text | mandatory |
| collecting_institution | collecting institution | free text | mandatory |
| isolate | isolate | free text | mandatory |
| collection_date | collection date | restricted text | recommended |
| geographic_location_latitude | geographic location (latitude) | restricted text | recommended |
| geographic_location_longitude | geographic location (longitude) | restricted text | recommended |
| geographic_location_region | geographic location (region and locality) | free text | recommended |
| sample_capture_status | sample capture status | text choice | recommended |
| host_disease_outcome | host disease outcome | text choice | recommended |
| host_age | host age | restricted text | recommended |
| virus_identifier | virus identifier | free text | recommended |
| receipt_date | receipt date | restricted text | recommended |
| definition_for_seropositive_sample | definition for seropositive sample | free text | recommended |
| serotype | serotype (required for a seropositive sample) | free text | recommended |
| host_habitat | host habitat | text choice | recommended |
| isolation_source_host_associated | isolation source host-associated | free text | recommended |
| host_behaviour | host behaviour | text choice | recommended |
| isolation_source_non_host_associated | isolation source non-host-associated | free text | recommended |
| subject_exposure | subject exposure | free text | optional |
| subject_exposure_duration | subject exposure duration | free text | optional |
| type_exposure | type exposure | free text | optional |
| personal_protective_equipment | personal protective equipment | free text | optional |
| hospitalisation | hospitalisation | text choice | optional |
| illness_duration | illness duration | free text | optional |
| illness_symptoms | illness symptoms | free text | optional |
| sample_storage_conditions | sample storage conditions | free text | optional |
| strain | strain | free text | optional |
| host_description | host description | free text | optional |
| gravidity | gravidity | free text | optional |

Consult the ENA virus pathogen checklist on the ENA website for the values allowed in the restricted text and text choice fields.
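Checking the sample table locally for empty mandatory fields can save a failed round trip to ENA. The sketch below takes the mandatory (not auto-filled) columns from the table above and reports missing columns or empty values. It does not check the allowed values of text choice or restricted text fields, which still require the ENA checklist. The function name is hypothetical.

```python
import csv
import io
from typing import List, Tuple

# Mandatory columns from the table above; auto-filled columns are excluded.
MANDATORY_VIRAL_COLUMNS = [
    "alias", "title", "scientific_name", "sample_description",
    "geographic_location", "host_common_name", "host_subject_id",
    "host_health_state", "host_sex", "host_scientific_name",
    "collector_name", "collecting_institution", "isolate",
]


def missing_mandatory_values(tsv_text: str) -> List[Tuple[str, str]]:
    """Return (alias, column) pairs for mandatory fields that are absent or empty.

    A column missing from the header is reported once as ("<header>", column).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    problems = [("<header>", col) for col in MANDATORY_VIRAL_COLUMNS if col not in header]
    for row in reader:
        for col in MANDATORY_VIRAL_COLUMNS:
            if col in header and not (row.get(col) or "").strip():
                problems.append((row.get("alias") or "<no alias>", col))
    return problems
```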

The data files

Supported data

  • Read data
  • Genome Assembly
  • Transcriptome Assembly
  • Template Sequence
  • Other Analyses

Most files uploaded to the ENA FTP server need to be compressed.

More information on how ENA expects to receive the files can be found in the ENA documentation.
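Compressing a file and computing its checksum can be done with the standard library. This sketch assumes gzip compression (as with the example_data/*gz files used in the test commands) and MD5, the checksum ENA records for submitted files. The checksum is computed on the compressed file, since that is the file that gets uploaded. Both helper names are hypothetical.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def gzip_file(source: Path) -> Path:
    """Compress `source` to `<name>.gz` next to it and return the new path."""
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as raw, gzip.open(target, "wb") as compressed:
        shutil.copyfileobj(raw, compressed)
    return target


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks so large read files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```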

Tool overview

inputs:

  • metadata tables
    • examples in example_tables
    • define the action for each row in the status column, e.g. add, modify, cancel, release
    • to submit all objects in bulk, the objects must be linked through their aliases: each experiment refers to the study and sample it belongs to by their aliases, and each run refers to its experiment, so the experiment aliases link all objects together
  • experimental data
    • examples in example_data
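A quick consistency check of the alias links can catch broken references before anything is sent. In this sketch each table is a list of row dicts (for example from `csv.DictReader`), and the column names `study_alias`, `sample_alias` and `experiment_alias` are assumptions for illustration; check them against the templates in example_tables. The function name is hypothetical.

```python
from typing import Dict, List

Rows = List[Dict[str, str]]


def broken_alias_links(studies: Rows, samples: Rows, experiments: Rows, runs: Rows) -> List[str]:
    """Describe every cross-table alias reference that points at nothing."""
    study_aliases = {row["alias"] for row in studies}
    sample_aliases = {row["alias"] for row in samples}
    experiment_aliases = {row["alias"] for row in experiments}
    problems = []
    # Experiments tie a study and a sample together.
    for exp in experiments:
        if exp["study_alias"] not in study_aliases:
            problems.append(f"experiment {exp['alias']}: unknown study {exp['study_alias']}")
        if exp["sample_alias"] not in sample_aliases:
            problems.append(f"experiment {exp['alias']}: unknown sample {exp['sample_alias']}")
    # Runs hang off an experiment.
    for run in runs:
        if run["experiment_alias"] not in experiment_aliases:
            problems.append(f"run {run['alias']}: unknown experiment {run['experiment_alias']}")
    return problems
```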

outputs:

  • written to the same directory as the inputs
  • metadata tables with updated info in the status and other relevant columns, e.g.:
    • updated status: added, modified, canceled, released
    • accession ids
    • submission date
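The update of the output tables can be pictured as follows. This is a sketch under assumptions: rows are dicts keyed by the column names used in the sample table above (`alias`, `status`, `accession`, `submission_date`), accessions come from an alias-to-accession mapping such as one parsed from the receipt, and the ISO date format is a guess. `mark_submitted` is a hypothetical name.

```python
from datetime import date
from typing import Dict, List, Optional

# Past-tense status values listed above, keyed by the requested action.
DONE = {"add": "added", "modify": "modified", "cancel": "canceled", "release": "released"}


def mark_submitted(rows: List[Dict[str, str]], action: str, accessions: Dict[str, str],
                   submitted_on: Optional[date] = None) -> List[Dict[str, str]]:
    """Return copies of rows with status, accession and submission_date updated.

    Only rows whose status equals the requested action are changed; the input
    rows are left untouched.
    """
    submitted_on = submitted_on or date.today()
    updated = []
    for original in rows:
        row = dict(original)
        if row.get("status") == action:
            row["status"] = DONE[action]
            row["accession"] = accessions.get(row["alias"], row.get("accession", ""))
            row["submission_date"] = submitted_on.isoformat()
        updated.append(row)
    return updated
```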

Test the tool

test command: add metadata and sequence data

ena_upload --action add --center 'your_center_name' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --secret .secret.yml

test command: modify metadata

ena_upload --action modify --center 'your_center_name' --study example_tables/ENA_template_studies-2020-05-01T1421.tsv --dev --secret .secret.yml

test command for viral data

ena_upload --action add --center 'your_center_name' --study example_tables/ENA_template_studies.tsv --sample example_tables/ENA_template_samples_vir.tsv --experiment example_tables/ENA_template_experiments.tsv --run example_tables/ENA_template_runs.tsv --data example_data/*gz --dev --vir --secret .secret.yml
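When the tool is called from a Python script rather than a shell, the `example_data/*gz` pattern is not expanded automatically. The hypothetical helper below builds the argument list and expands the glob itself. The executable name is a parameter because this page uses both `ena-upload-cli` and `ena_upload`; use whichever command your installation provides.

```python
import glob
import shlex
from typing import Dict, List


def build_add_command(tables: Dict[str, str], data_pattern: str, center: str, secret: str,
                      dev: bool = True, vir: bool = False,
                      executable: str = "ena-upload-cli") -> List[str]:
    """Assemble the argument list for an `add` submission.

    `tables` maps object names (study, sample, experiment, run) to TSV paths.
    The data glob is expanded here because no shell is involved when the list
    is passed to subprocess.run.
    """
    argv = [executable, "--action", "add", "--center", center]
    for name in ("study", "sample", "experiment", "run"):
        if name in tables:
            argv += [f"--{name}", tables[name]]
    data_files = sorted(glob.glob(data_pattern))
    if data_files:
        argv += ["--data", *data_files]
    if dev:
        argv.append("--dev")
    if vir:
        argv.append("--vir")
    argv += ["--secret", secret]
    return argv


cmd = build_add_command(
    {"study": "example_tables/ENA_template_studies.tsv",
     "run": "example_tables/ENA_template_runs.tsv"},
    "example_data/*gz", "your_center_name", ".secret.yml",
)
print(shlex.join(cmd))
# To actually submit: subprocess.run(cmd, check=True)  (needs `import subprocess`)
```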

Download files

Source Distribution

ena-upload-cli-0.3.1.tar.gz (48.8 kB)

Built Distribution

ena_upload_cli-0.3.1-py3-none-any.whl (53.0 kB, Python 3)
