annonex2embl
Converts an annotated DNA multi-sequence alignment (in NEXUS format) to an EMBL flatfile for submission to ENA via the Webin-CLI submission tool.
INSTALLATION
First, make sure that Python 3 is installed on your system.
To install the most recent stable version of annonex2embl, run:
pip install annonex2embl
Alternatively, to install the latest development version of annonex2embl, run:
pip install git+https://github.com/michaelgruenstaeudl/annonex2embl.git
INPUT, OUTPUT AND PREREQUISITES
- Input: an annotated DNA multiple sequence alignment in NEXUS format; a comma-delimited metadata table
- Output: a submission-ready, multi-record EMBL flatfile
Requirements / Input preparation
The annotations of a NEXUS file are specified via a SETS-block, which is located beneath the DATA-block and defines sets of characters in the DNA alignment. In such a SETS-block, every gene charset and every exon charset must be accompanied by one CDS charset. Other charsets may be defined without an accompanying charset.
Example of a complete SETS-block
BEGIN SETS;
CHARSET matK_gene_forward = 929-2530;
CHARSET matK_CDS_forward = 929-2530;
CHARSET trnK_intron_forward = 1-928 2531-2813;
END;
Example of a corresponding DESCR variable
DESCR="tRNA-Lys (trnK) intron, partial sequence; maturase K (matK) gene, complete sequence"
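The pairing rule for charsets can be checked before running the conversion. The following is a minimal sketch, not part of annonex2embl itself: the function names `parse_charsets` and `unaccompanied_features` are hypothetical, and the check assumes that charset names follow the `<name>_<type>_<orientation>` pattern used in the example above and that a gene or exon charset is "accompanied" when a CDS charset with the same name prefix exists. Only simple ranges such as `929-2530` are handled.

```python
import re

# Matches lines such as "CHARSET matK_gene_forward = 929-2530;"
CHARSET_RE = re.compile(
    r"^\s*CHARSET\s+(\S+)\s*=\s*([^;]+);", re.IGNORECASE | re.MULTILINE
)


def parse_charsets(sets_block):
    """Return {charset_name: [(start, end), ...]} for a SETS-block string."""
    charsets = {}
    for name, ranges in CHARSET_RE.findall(sets_block):
        spans = []
        for token in ranges.split():
            start, _, end = token.partition("-")
            # A single position such as "5" becomes the span (5, 5).
            spans.append((int(start), int(end or start)))
        charsets[name] = spans
    return charsets


def unaccompanied_features(charsets):
    """List gene/exon charsets that lack a CDS charset with the same prefix.

    Assumption: names look like <name>_<type>_<orientation>.
    """
    cds_prefixes = {
        n.split("_")[0] for n in charsets if n.split("_")[1:2] == ["CDS"]
    }
    missing = []
    for n in charsets:
        parts = n.split("_")
        if len(parts) >= 2 and parts[1] in ("gene", "exon"):
            if parts[0] not in cds_prefixes:
                missing.append(n)
    return missing


example = """BEGIN SETS;
CHARSET matK_gene_forward = 929-2530;
CHARSET matK_CDS_forward = 929-2530;
CHARSET trnK_intron_forward = 1-928 2531-2813;
END;"""

charsets = parse_charsets(example)
print(charsets["trnK_intron_forward"])   # [(1, 928), (2531, 2813)]
print(unaccompanied_features(charsets))  # []
```

An empty list means that every gene and exon charset in the block has a matching CDS charset under the naming assumption stated above.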
EXAMPLE USAGE
On Linux / MacOS
SCRPT=$PWD/scripts/annonex2embl_launcher_CLI.py
INPUT=examples/input/TestData1.nex
METAD=examples/input/Metadata.csv
OTPUT=examples/temp/TestData1.embl
DESCR='description of alignment here' # Do not use double-quotes
EMAIL=your_email_here@yourmailserver.com
AUTHR='your name here' # Do not use double-quotes
MNFTS=PRJEB00000
MNFTD=${DESCR//[^[:alnum:]]/_}
python3 $SCRPT -n $INPUT -c $METAD -d "$DESCR" -e $EMAIL -a "$AUTHR" -o $OTPUT --productlookup --manifeststudy $MNFTS --manifestdescr $MNFTD --compress
On Windows
SET SCRPT=%CD%\scripts\annonex2embl_launcher_CLI.py
SET INPUT=examples\input\TestData1.nex
SET METAD=examples\input\Metadata.csv
SET OTPUT=examples\temp\TestData1.embl
SET DESCR=description of alignment here
SET EMAIL=your_email_here@yourmailserver.com
SET AUTHR=your name here
SET MNFTS=PRJEB00000
SET MNFTD=a_unique_description_here
python %SCRPT% -n %INPUT% -c %METAD% -d "%DESCR%" -e %EMAIL% -a "%AUTHR%" -o %OTPUT% --productlookup --manifeststudy %MNFTS% --manifestdescr %MNFTD% --compress
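The Linux / MacOS example derives MNFTD from DESCR with a bash substitution that replaces every non-alphanumeric character with an underscore, whereas the Windows example sets MNFTD by hand. The sketch below shows one way to produce the same kind of value in Python on any platform. The function name `manifest_description` is hypothetical and not part of annonex2embl, and the pattern assumes ASCII letters and digits only (the bash class `[:alnum:]` depends on the locale).

```python
import re


def manifest_description(descr):
    """Replace every character that is not an ASCII letter or digit with '_'."""
    return re.sub(r"[^A-Za-z0-9]", "_", descr)


print(manifest_description("description of alignment here"))
# description_of_alignment_here
```

The printed value can then be passed to `--manifestdescr`.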
TO DO
- In the title line of each output record, the mol_type "; DNA;" must be automatically replaced with "; genomic DNA;".
- Implement a check for duplicated sequence names. Duplicates occur occasionally and must raise an error at the start of the software execution.
- Implement a check for sequence names that lack a corresponding entry in the metadata file. Such omissions occur occasionally and must raise an error at the start of the software execution.
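The three items above could be approached roughly as follows. This is a sketch under stated assumptions, not the implementation planned for annonex2embl: all function names are hypothetical, the metadata table is assumed to have a header row and to hold the sequence name in its first column, the "title line" is assumed to be the EMBL line that starts with `ID`, and the column names and sample names in the usage example are invented for illustration.

```python
import csv
import io
from collections import Counter


def duplicated_names(seq_names):
    """Return the sequence names that occur more than once, in first-seen order."""
    counts = Counter(seq_names)
    return [name for name in counts if counts[name] > 1]


def names_without_metadata(seq_names, metadata_handle):
    """Return sequence names that have no row in the comma-delimited table.

    Assumption: first row is a header, first column is the sequence name.
    """
    reader = csv.reader(metadata_handle)
    next(reader, None)  # skip the header row
    known = {row[0].strip() for row in reader if row}
    return [name for name in seq_names if name not in known]


def fix_mol_type(line):
    """Replace '; DNA;' with '; genomic DNA;' on lines that start with 'ID'."""
    if line.startswith("ID"):
        return line.replace("; DNA;", "; genomic DNA;", 1)
    return line


def validate(seq_names, metadata_handle):
    """Raise ValueError early if names are duplicated or lack metadata."""
    dupes = duplicated_names(seq_names)
    if dupes:
        raise ValueError("Duplicated sequence names: " + ", ".join(dupes))
    missing = names_without_metadata(seq_names, metadata_handle)
    if missing:
        raise ValueError("No metadata entry for: " + ", ".join(missing))


# Hypothetical metadata table and sequence names, for illustration only.
metadata = io.StringIO("isolate,country\nsample_A,Germany\nsample_B,France\n")
names = ["sample_A", "sample_B", "sample_B", "sample_C"]

print(duplicated_names(names))                  # ['sample_B']
print(names_without_metadata(names, metadata))  # ['sample_C']
```

Running both checks through `validate` before any record is written would surface these problems at the beginning of the execution rather than in a partially written flatfile.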
CHANGELOG
See CHANGELOG.md for a list of recent changes to the software.