Helper scripts for submission to ENA (microbial + SARS-CoV-2) and GISAID (SARS-CoV-2 only)
subhelper
enahelper
Interactive site: https://www.ebi.ac.uk/ena/submit/sra/#home
Webin (XML) submission: https://www.ebi.ac.uk/ena/submit/webin/
gisaidsub USAGE
usage: gisaidsub.py [-h] [-v] [--version] [--template TEMPLATE]
                    [--outputdir OUTPUTDIR] [--fasta_output FASTA_OUTPUT]
                    [--field_mappings FIELD_MAPPINGS]
                    [--global_values GLOBAL_VALUES]
                    meta_sheet fasta_dir

gisaidsub prepares files for gisaid sub using the interactive batch
submission.

positional arguments:
  meta_sheet            path to metadata sheet
  fasta_dir             directory of fasta files

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output
  --version             show program's version number and exit
  --template TEMPLATE   Path to GISAID template
  --outputdir OUTPUTDIR
                        output directory
  --fasta_output FASTA_OUTPUT
                        fasta output filename
  --field_mappings FIELD_MAPPINGS
                        field mappings YAML
  --global_values GLOBAL_VALUES
                        global values YAML

Licence: GPLv3 by Nabil-Fareed Alikhan <nabil@happykhan.com>
gisaidsub explained
The script expects all of the FASTA consensus files to be in a single directory.
You also need an existing metadata sheet; this is usually provided to you.
You then need to make two YAML files that tell the script which fields map to what. In the first file, each key is the name GISAID wants in its table and each value is what that field is called in your sheet, e.g.
covv_location: Location
covv_collection_date: Date_of_Collection
covv_gender: Gender
covv_patient_age: Age
sample_name: Sample
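As a rough illustration of what such a mapping implies, the sketch below renames the columns of a metadata sheet to GISAID field names. It is not the code gisaidsub uses: `remap_row` is a hypothetical helper, the mapping is written as a Python dict standing in for the parsed YAML, and the one-record sheet is made up for the example.

```python
import csv
import io

# Parsed form of the field-mapping YAML: GISAID field -> column name in your sheet.
FIELD_MAPPINGS = {
    "covv_location": "Location",
    "covv_collection_date": "Date_of_Collection",
    "covv_gender": "Gender",
    "covv_patient_age": "Age",
    "sample_name": "Sample",
}


def remap_row(row, field_mappings):
    """Return a new dict keyed by GISAID field names (hypothetical helper).

    Columns missing from the sheet become empty strings; this is an
    assumption made for the sketch, not documented gisaidsub behaviour.
    """
    return {gisaid: row.get(sheet, "") for gisaid, sheet in field_mappings.items()}


# A made-up one-record metadata sheet, for illustration only.
SHEET = (
    "Sample,Location,Date_of_Collection,Gender,Age\n"
    "S1,Europe / United Kingdom,2021-01-05,Female,42\n"
)

rows = [remap_row(r, FIELD_MAPPINGS) for r in csv.DictReader(io.StringIO(SHEET))]
print(rows[0])
```

Columns in the sheet that are not listed in the mapping are simply dropped in this sketch.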
Then you want a second YAML file of "globals": values that apply to every record, such as:
sample_prefix: MYSample-
submitter: <Your_gisaid_id>
covv_seq_technology: Illumina
covv_orig_lab: <originating lab>
covv_orig_lab_addr: <originating lab address>
covv_subm_lab: <submitting lab>
covv_subm_lab_addr: <submitting lab address>
covv_authors: <authors>
country: <country collection>
continent: <continent>
You can add as many of the standard GISAID fields as you need. See gisaidschema or the GISAID documentation for the available fields.
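Conceptually, the global values are copied onto every record. The sketch below shows one way this could be done; `add_globals` is a hypothetical helper, the records are invented, and the rule that a per-record value overrides a global one is an assumption rather than documented gisaidsub behaviour.

```python
# Parsed form of a "globals" YAML (placeholders kept as placeholders).
GLOBAL_VALUES = {
    "covv_seq_technology": "Illumina",
    "covv_orig_lab": "<originating lab>",
    "covv_subm_lab": "<submitting lab>",
}


def add_globals(record, global_values):
    """Return a copy of record with the global values filled in (hypothetical helper)."""
    merged = dict(global_values)  # start from the values shared by every record
    merged.update(record)         # assumption: a per-record value wins over a global one
    return merged


# Invented records, already keyed by GISAID field names.
records = [
    {"sample_name": "S1", "covv_gender": "Female"},
    {"sample_name": "S2", "covv_gender": "Male", "covv_seq_technology": "Nanopore"},
]

completed = [add_globals(r, GLOBAL_VALUES) for r in records]
for rec in completed:
    print(rec["sample_name"], rec["covv_seq_technology"])
```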
You then run gisaidsub:
python gisaidsub.py metadata_they_gave.csv all_fasta_dir --outputdir my_output --field_mappings my_first_file.yaml --global_values something_global.yaml
The script then:
- takes your CSV input, swaps the field names as per the mapping YAML, and adds in the global values.
- validates the result with gisaidscheme.py and produces a CSV for submission.
- goes to the FASTA directory and merges the sequences into a single file (this is what GISAID wants).
- renames each sequence so it is consistent with the metadata, i.e. changes it to hcov-19/X/X/2021.
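The FASTA merging and renaming steps can be pictured with a short sketch. This is not gisaidsub's implementation: `merge_fastas` is a hypothetical helper, and it assumes one consensus sequence per file, a `.fasta` extension, and that the file stem identifies the sample. The headers in `new_names` are placeholders following the pattern mentioned above.

```python
import tempfile
from pathlib import Path


def merge_fastas(fasta_dir, out_path, new_names):
    """Concatenate the *.fasta files in fasta_dir into out_path, renaming headers.

    new_names maps a file stem to the header to write (hypothetical convention).
    Files without an entry in new_names are skipped.
    """
    written = []
    with open(out_path, "w") as out:
        for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
            name = new_names.get(fasta.stem)
            if name is None:
                continue  # no metadata for this file
            for line in fasta.read_text().splitlines():
                if line.startswith(">"):
                    out.write(f">{name}\n")  # replace the original header
                elif line.strip():
                    out.write(line + "\n")   # copy sequence lines unchanged
            written.append(name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    fasta_dir = Path(tmp) / "fastas"
    fasta_dir.mkdir()
    # Two tiny made-up consensus files, for illustration only.
    (fasta_dir / "S1.fasta").write_text(">S1 consensus\nACGT\nTTGA\n")
    (fasta_dir / "S2.fasta").write_text(">S2 consensus\nGGCC\n")

    merged_path = Path(tmp) / "all_sequences.fasta"  # kept outside fasta_dir
    names = {"S1": "hcov-19/X/S1/2021", "S2": "hcov-19/X/S2/2021"}
    written = merge_fastas(fasta_dir, merged_path, names)
    merged_text = merged_path.read_text()

print(merged_text)
```

Writing the merged file outside the input directory keeps it from being picked up as an input on a later run.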