Calculates seguid, lseguid & cseguid checksums for biological sequences
Project description
Seguid calculator is a small GUI application for calculating
the SEGUID, lSEGUID and cSEGUID checksums for a biological sequence
(DNA, RNA or protein). It is available as executables for Windows,
MacOSX and Linux (see below).
The SEGUID checksum is defined as the SHA-1 cryptographic hash of a primary biological sequence in uppercase. SEGUID was suggested by Babnigg and Giometti as a way to provide stable identifiers of protein sequences in databases for cross referencing.
There are several implementations of SEGUID calculation available, such as the one in Biopython (Bio.SeqUtils.CheckSum). See slides and the Biopython wiki. See also this blog post on the subject.
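The definition above can be sketched in a few lines of Python 3. This is an illustration, not the code used by seguid_calculator: the function name `seguid` is chosen here for the example, and the sketch assumes the usual SEGUID text form, in which the SHA-1 digest is Base64 encoded and the trailing `=` padding is removed.

```python
import base64
import hashlib


def seguid(seq: str) -> str:
    """Return a SEGUID-style checksum for a sequence given as text.

    Assumption: the checksum is the Base64 text of the SHA-1 digest of the
    uppercased sequence, with the trailing "=" padding removed.
    """
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    # A 20-byte digest encodes to 28 Base64 characters ending in a single "=";
    # stripping the padding leaves 27 characters.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # Case does not matter because the sequence is uppercased first.
    print(seguid("acgt") == seguid("ACGT"))
```

Because the input is uppercased before hashing, `seguid("acgt")` and `seguid("ACGT")` give the same 27-character string.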
The lSEGUID is the SEGUID of the lexicographically smaller of the sense and antisense strands of a double stranded DNA sequence. This means that a sequence and its reverse complement have the same lSEGUID. This can be useful to identify double stranded DNA sequences, regardless of the strand in which they are presented.
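A minimal Python 3 sketch of the lSEGUID rule described above follows. The names `seguid`, `reverse_complement` and `lseguid` are hypothetical helpers written for this example, the complement table covers only the unambiguous bases A, C, G and T, and the SEGUID text form (Base64 of the SHA-1 digest without padding) is an assumption.

```python
import base64
import hashlib

# Assumption: unambiguous DNA only; IUPAC ambiguity codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seguid(seq: str) -> str:
    """Base64 text of the SHA-1 digest of the uppercased sequence, padding removed."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, in uppercase."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def lseguid(seq: str) -> str:
    """SEGUID of whichever strand sorts first lexicographically."""
    sense = seq.upper()
    antisense = reverse_complement(sense)
    return seguid(min(sense, antisense))


if __name__ == "__main__":
    # AACG and its reverse complement CGTT describe the same duplex.
    print(lseguid("AACG") == lseguid("CGTT"))
```

Since both strands are reduced to the same representative before hashing, either strand of the duplex yields the same checksum.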
Circular SEGUID or cSEGUID is the SEGUID checksum for circular (DNA) sequences. As there are many circular permutations of a circular sequence, using the SEGUID checksum directly is impractical, since there would be many checksums for the same sequence. The cSEGUID is the SEGUID of the lexicographically minimal string rotation of a sequence or of its reverse complement (whichever is lexicographically smaller). The cSEGUID provides a unique and stable identifier for circular sequences, such as plasmids.
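The cSEGUID rule can be illustrated with the Python 3 sketch below. It is not the seguid_calculator implementation: all function names are hypothetical, only the bases A, C, G and T are complemented, the SEGUID text form (Base64 of the SHA-1 digest without padding) is an assumption, and the minimal rotation is found by naively comparing every rotation, which is quadratic in sequence length and meant only for short examples.

```python
import base64
import hashlib

# Assumption: unambiguous DNA only; IUPAC ambiguity codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seguid(seq: str) -> str:
    """Base64 text of the SHA-1 digest of the uppercased sequence, padding removed."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, in uppercase."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def min_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation (naive, O(n^2))."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def cseguid(seq: str) -> str:
    """SEGUID of the smallest rotation of the sequence or its reverse complement."""
    upper = seq.upper()
    candidates = (min_rotation(upper), min_rotation(reverse_complement(upper)))
    return seguid(min(candidates))


if __name__ == "__main__":
    # The same circle written from a different origin, and on the other strand.
    print(cseguid("GATTACA") == cseguid("TACAGAT") == cseguid("TGTAATC"))
```

Shifting the origin of a circular sequence or switching to the other strand does not change the result, because every representation is reduced to the same minimal string before hashing.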
Example
The cSEGUID checksum can be useful to quickly determine if two sequences refer to the same vector. The sequence of the plasmid pFA6a-GFPS65T-kanMX6 is available from Genbank and from other sources such as the Forsburg lab, sequence here or here.
Both sequences are the same size and claim to describe the same vector, although the origin appears to have been set at a different position in each. Analysis in seguid_calculator shows that they are in fact representations of the same sequence, since their cSEGUIDs are identical:
Genbank
[Figure: seguid_calculator result for the Genbank sequence]
Forsburg
[Figure: seguid_calculator result for the Forsburg sequence]
Implementation
Seguid_calculator is written in Python 2.7 with wxPython 3. Development happens on GitHub, where the source code is available.
Executables
Executables are available for
Windows 64 bit
Mac OSX dmg and a zip file containing an app
Linux deb package
The executables can be downloaded using the button called releases at the top of this page.
Visit the website of Bjorn Johansson’s group at CBMA for more information.
Automatic build status
Windows standalone executables (32 and 64 bit) are built on AppVeyor using pyinstaller and Miniconda.
Standalone executables (64 bit) for MacOSX are built on TravisCI using pyinstaller and Miniconda.
A Debian package (.deb) is built offline, currently on Ubuntu 16.04 using stdeb. Look at the script “run_this_scritp_to_create_deb_package.sh”. The package installs system shortcuts as well.