Calculates seguid, lseguid & cseguid checksums for biological sequences

Project description

Seguid calculator is a small GUI application for calculating the SEGUID, lSEGUID and cSEGUID checksums for a biological sequence (DNA, RNA or protein). It is available as executables for Windows, MacOSX and Linux (see below).

The SEGUID checksum is defined as the SHA-1 cryptographic hash of a primary biological sequence in uppercase. SEGUID was suggested by Babnigg and Giometti as a way to provide stable identifiers of protein sequences in databases for cross referencing.
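The definition above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the code used by Seguid calculator. It assumes the convention used by Biopython's seguid function, where the SHA-1 digest is Base64-encoded and the trailing "=" padding is removed; the function name is chosen here for illustration.

```python
import base64
import hashlib


def seguid(seq: str) -> str:
    """Return a SEGUID-style checksum for a sequence.

    Assumption: the SHA-1 digest of the uppercased sequence is
    Base64-encoded and the trailing '=' padding is stripped.
    """
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # The checksum does not depend on the case of the input.
    print(seguid("ACGT") == seguid("acgt"))
```

Because a SHA-1 digest is 20 bytes long, the resulting string is always 27 characters.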

There are several implementations of SEGUID calculation available, such as the one in Biopython (Bio.SeqUtils.CheckSum). See slides and the Biopython wiki. See also this blog post on the subject.

The lSEGUID is the SEGUID of the lexicographically smaller of the sense and antisense strands of a double-stranded DNA sequence. This means that a sequence and its reverse complement have the same lSEGUID. This can be useful for identifying double-stranded DNA sequences regardless of the form in which they are presented.
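As a rough sketch of this idea, the Python below picks the smaller of the two strands before hashing. It is an illustration under stated assumptions rather than the application's own code: the helper names are hypothetical, only the unambiguous bases A, C, G and T are complemented, and the checksum is assumed to be the Base64-encoded SHA-1 digest without "=" padding.

```python
import base64
import hashlib

# Assumption: only unambiguous DNA bases are handled in this sketch.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def seguid(seq: str) -> str:
    """SHA-1 of the uppercased sequence, Base64-encoded without padding."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def lseguid(seq: str) -> str:
    """SEGUID of the lexicographically smaller of the two strands."""
    seq = seq.upper()
    return seguid(min(seq, reverse_complement(seq)))


if __name__ == "__main__":
    # "CGTT" is the reverse complement of "AACG", so both give one checksum.
    print(lseguid("AACG") == lseguid("CGTT"))
```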

Circular SEGUID, or cSEGUID, is the SEGUID checksum for circular (DNA) sequences. As there are many circular permutations of a circular sequence, using the SEGUID checksum directly is impractical because there would be many checksums for the same sequence. The cSEGUID is the SEGUID of the lexicographically minimal string rotation of a sequence or of its reverse complement (whichever is lexicographically smaller). The cSEGUID provides a unique and stable identifier for circular sequences, such as plasmids.
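The rotation step can be illustrated with the short Python sketch below. It is not the application's implementation: the helper names are hypothetical, only A, C, G and T are complemented, the checksum is assumed to be the Base64-encoded SHA-1 digest without "=" padding, and the minimal rotation is found by simply comparing all rotations, which takes quadratic time. A linear-time method such as Booth's algorithm would be preferable for long plasmids.

```python
import base64
import hashlib

# Assumption: only unambiguous DNA bases are handled in this sketch.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def smallest_rotation(seq: str) -> str:
    """Return the lexicographically minimal rotation (naive comparison)."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def seguid(seq: str) -> str:
    """SHA-1 of the uppercased sequence, Base64-encoded without padding."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def cseguid(seq: str) -> str:
    """SEGUID of the smaller minimal rotation of either strand."""
    seq = seq.upper()
    candidates = (
        smallest_rotation(seq),
        smallest_rotation(reverse_complement(seq)),
    )
    return seguid(min(candidates))


if __name__ == "__main__":
    # Shifting the origin of a circular sequence leaves the checksum unchanged.
    print(cseguid("AACG") == cseguid("CGAA"))
```

With this approach, the same circular molecule yields one checksum regardless of where the origin is set or which strand is given.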

Example

The cSEGUID checksum can be useful for quickly determining whether two sequences refer to the same vector. The sequence of the plasmid pFA6a-GFPS65T-kanMX6 is available from GenBank and from other sources such as the Forsburg lab, sequence here or here.

Both sequences are the same size and claim to describe the same vector, although the origins seem to have been set differently. Analysis of both sequences in seguid_calculator shows that they are in fact representations of the same sequence, as their cSEGUIDs are identical.

Implementation

Seguid_calculator is written in Python 2.7 with wxPython 3. Development happens on GitHub, where the source code is available.

Executables

Executables are available for

• Windows 64 bit

• Mac OSX dmg and a zip file containing an app

• Linux deb package

Automatic build status

Windows standalone executables (32 and 64 bit) are built on AppVeyor using pyinstaller and Miniconda.

Standalone executables (64 bit) for MacOSX are built on TravisCI using pyinstaller and Miniconda.

A Debian package (.deb) is built offline, currently on Ubuntu 16.04 using stdeb. See the script “run_this_scritp_to_create_deb_package.sh”. The package installs system shortcuts as well.
