Helper scripts for submission to ena (microbial + sarscov2) and gisaid (sarscov2 only)

subhelper

Helper scripts for submission to ENA (microbial + SARS-CoV-2) and GISAID (SARS-CoV-2 only).

enahelper

  • Interactive site: https://www.ebi.ac.uk/ena/submit/sra/#home
  • Webin (XML) submission: https://www.ebi.ac.uk/ena/submit/webin/

gisaidsub USAGE

usage: gisaidsub.py [-h] [-v] [--version] [--template TEMPLATE]
                    [--outputdir OUTPUTDIR] [--fasta_output FASTA_OUTPUT]
                    [--field_mappings FIELD_MAPPINGS]
                    [--global_values GLOBAL_VALUES]
                    meta_sheet fasta_dir

gisaidsub prepares files for gisaid sub using the interactive batch
submission.

positional arguments:
  meta_sheet            path to metadata sheet
  fasta_dir             directory of fasta files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output
  --version             show program's version number and exit
  --template TEMPLATE   Path to GISAID template
  --outputdir OUTPUTDIR
                        output directory
  --fasta_output FASTA_OUTPUT
                        fasta output filename
  --field_mappings FIELD_MAPPINGS
                        field mappings YAML
  --global_values GLOBAL_VALUES
                        global values YAML

Licence: GPLv3 by Nabil-Fareed Alikhan <nabil@happykhan.com>
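
For readers who want to see how an interface like the one above fits together, here is a minimal `argparse` sketch that accepts the same arguments. It is an illustration only: the real `gisaidsub.py` may define defaults, types, or extra checks that are not visible in the help text, and the version string used here is a placeholder.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="gisaidsub.py",
        description="Prepares files for GISAID interactive batch submission.",
    )
    parser.add_argument("-v", "--verbose", action="store_true", help="verbose output")
    # Placeholder version string; the real value is not shown in the help text.
    parser.add_argument("--version", action="version", version="%(prog)s (placeholder)")
    parser.add_argument("--template", help="Path to GISAID template")
    parser.add_argument("--outputdir", help="output directory")
    parser.add_argument("--fasta_output", help="fasta output filename")
    parser.add_argument("--field_mappings", help="field mappings YAML")
    parser.add_argument("--global_values", help="global values YAML")
    parser.add_argument("meta_sheet", help="path to metadata sheet")
    parser.add_argument("fasta_dir", help="directory of fasta files")
    return parser


# Parse an example command line without touching sys.argv.
args = build_parser().parse_args(
    ["metadata.csv", "all_fasta_dir", "--outputdir", "my_output", "-v"]
)
print(args.meta_sheet, args.fasta_dir, args.outputdir, args.verbose)
```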

gisaidsub explained

The script first needs all of your consensus FASTA files in a single directory.

You then need an existing sheet of metadata. This is usually provided to you.

You then need to write two YAML files that tell the script which fields map to what. The first holds the field mappings: each key is the name that GISAID wants in its table, and each value is what that field is called in your sheet, e.g.

covv_location: Location
covv_collection_date: Date_of_Collection
covv_gender: Gender
covv_patient_age: Age
sample_name: Sample
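
As a rough illustration of what such a mapping does, the sketch below renames the columns of a metadata row to the GISAID field names. It is not the script's actual code: the mapping is written as a plain dict (what you would get after loading the YAML), and `rename_fields` and the one-row sheet are hypothetical names and data made up for the example.

```python
import csv
import io

# Mapping as it would look once the YAML is loaded:
# GISAID field name -> column name in your sheet.
FIELD_MAPPINGS = {
    "covv_location": "Location",
    "covv_collection_date": "Date_of_Collection",
    "covv_gender": "Gender",
    "covv_patient_age": "Age",
    "sample_name": "Sample",
}


def rename_fields(row, mappings):
    """Return a new record keyed by GISAID field names (hypothetical helper)."""
    return {gisaid: row[sheet] for gisaid, sheet in mappings.items() if sheet in row}


# Made-up one-row metadata sheet, for illustration only.
SHEET = (
    "Sample,Location,Date_of_Collection,Gender,Age\n"
    "S1,Europe / United Kingdom,2021-01-05,Female,42\n"
)

records = [
    rename_fields(row, FIELD_MAPPINGS)
    for row in csv.DictReader(io.StringIO(SHEET))
]
print(records[0])
```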

The second YAML file holds "globals", values that apply to every record, such as:

sample_prefix: MYSample-
submitter: <Your_gisaid_id>
covv_seq_technology: Illumina
covv_orig_lab: <originating lab>
covv_orig_lab_addr: <originating lab address>
covv_subm_lab: <submitting lab>
covv_subm_lab_addr: <submitting lab address> 
covv_authors: <authors>
country: <country collection>
continent: <continent>

You can add as many of the standard GISAID fields as you need. See gisaidschema or the GISAID documentation for what those fields could be.
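
Conceptually, global values are copied onto every record. The sketch below shows one way that could work. It assumes that a value already present on a record takes precedence over the global one, which the original description does not state, and `apply_globals` is a hypothetical helper rather than a function from the package.

```python
# Global values as they would look once the YAML is loaded.
# Placeholders are kept as-is from the example above.
GLOBAL_VALUES = {
    "covv_seq_technology": "Illumina",
    "covv_authors": "<authors>",
    "continent": "<continent>",
}


def apply_globals(record, global_values):
    """Copy global values onto a record (hypothetical helper).

    Assumption: per-record values win if a key appears in both.
    """
    merged = dict(global_values)
    merged.update(record)
    return merged


record = {"sample_name": "S1", "covv_gender": "Female"}
full_record = apply_globals(record, GLOBAL_VALUES)
print(full_record)
```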

You then run gisaidsub:

python gisaidsub.py metadata_they_gave.csv all_fasta_dir --outputdir my_output --field_mappings my_first_file.yaml --global_values something_global.yaml

The script then:

  • Takes your CSV input, swaps the field names as per the mapping YAML, and adds in the global info.
  • Validates the result with gisaidscheme.py and produces a CSV for submission.
  • Goes to the FASTA directory and merges the sequences into a single file (this is what GISAID wants).
  • Renames each sequence so that it is consistent with the metadata, i.e. changes it to hCoV-19/X/X/2021.
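
The merge-and-rename step can be sketched as follows. This is a minimal illustration, not the script's implementation: it assumes one sequence per file, that each file is named after its sample (e.g. `S1.fasta`), and that the new names are supplied as a dict. `merge_fasta` and the demo data are hypothetical.

```python
import tempfile
from pathlib import Path


def merge_fasta(fasta_dir, output_path, new_names):
    """Concatenate every .fasta file in fasta_dir into output_path.

    Each header line is replaced by new_names[file stem]; files without
    an entry in new_names are skipped. Returns the number of files written.
    """
    written = 0
    with open(output_path, "w") as out:
        for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
            new_name = new_names.get(fasta.stem)
            if new_name is None:
                continue  # no metadata for this file
            for line in fasta.read_text().splitlines():
                if line.startswith(">"):
                    out.write(f">{new_name}\n")
                elif line.strip():
                    out.write(line.strip() + "\n")
            written += 1
    return written


# Demo on two made-up consensus files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fasta_dir = Path(tmp) / "fasta"
    fasta_dir.mkdir()
    (fasta_dir / "S1.fasta").write_text(">S1 consensus\nACGT\nTTGA\n")
    (fasta_dir / "S2.fasta").write_text(">S2 consensus\nGGCC\n")
    names = {"S1": "hCoV-19/England/S1/2021", "S2": "hCoV-19/England/S2/2021"}
    merged_path = Path(tmp) / "merged.fa"
    count = merge_fasta(fasta_dir, merged_path, names)
    merged_text = merged_path.read_text()

print(merged_text)
```

Keying the new names by file stem keeps the FASTA headers and the metadata rows in step, which is the consistency the last bullet describes.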

Download files

Source Distribution

subhelper-1.0.8.tar.gz (24.0 kB)

Uploaded Source

Built Distribution

subhelper-1.0.8-py3-none-any.whl (25.6 kB)

Uploaded Python 3

File details

Details for the file subhelper-1.0.8.tar.gz.

File metadata

  • Download URL: subhelper-1.0.8.tar.gz
  • Size: 24.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for subhelper-1.0.8.tar.gz
Algorithm    Hash digest
SHA256       0e335684415eeefaaf605429ca1fa51bdf57c4c7a4b81de460565cf10f9c103b
MD5          0b287d864888a0776bc1f5848b7d3a43
BLAKE2b-256  5e36069f547db366a11c9b9e241f3a15f2f9f76d13eec3946e8775151f6be3bb


File details

Details for the file subhelper-1.0.8-py3-none-any.whl.

File metadata

  • Download URL: subhelper-1.0.8-py3-none-any.whl
  • Size: 25.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for subhelper-1.0.8-py3-none-any.whl
Algorithm    Hash digest
SHA256       8de53b79d5ed8e38a03a123ca81f4a09728ff02a54b7d2379ba12cf0fd5f84b4
MD5          e9a43734cf7218e8d3ee804c3ca98134
BLAKE2b-256  ad74a678c8dac537a4b4121206888d0bbb817dde8529c627e38dd719f96b80dc

