Calculates seguid, lseguid & cseguid checksums for biological sequences

Project description

Seguid calculator is a small GUI application for calculating the SEGUID, lSEGUID and cSEGUID checksums of a biological sequence (DNA, RNA or protein). It is available as executables for Windows, Mac OSX and Linux (see below).

The SEGUID checksum is defined as the SHA-1 cryptographic hash of a primary biological sequence in uppercase. SEGUID was suggested by Babnigg and Giometti as a way to provide stable identifiers of protein sequences in databases for cross referencing.
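As a minimal sketch of the definition above, the function below hashes the uppercased sequence with SHA-1. The text encoding of the digest (base64 with the trailing padding removed) is an assumption based on the convention used by Biopython's implementation; it is not stated in this description. The function name `seguid` is chosen for illustration, and the code targets Python 3 rather than the Python 2.7 used by the application.

```python
import base64
import hashlib


def seguid(seq):
    """Return a SEGUID-style checksum for a sequence string."""
    # Uppercase first so that the checksum does not depend on letter case.
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    # Assumption: base64 text with the trailing "=" padding stripped.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


print(seguid("ACGTACGTACGT"))
```

A SHA-1 digest is 20 bytes, so the checksum is always 27 characters long once the single padding character is removed, and `seguid("acgt")` equals `seguid("ACGT")`.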

There are several implementations of SEGUID calculation available, such as the one in Biopython (Bio.SeqUtils.CheckSum). See slides and the Biopython wiki. See also this blog post on the subject.

The lSEGUID is the SEGUID of the lexicographically smaller of the sense and antisense strands of a double-stranded DNA sequence. This means that a sequence and its reverse complement have the same lSEGUID. This is useful for identifying double-stranded DNA sequences regardless of which strand is presented.
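The lSEGUID rule can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not the application's own code: the helper names `seguid`, `reverse_complement` and `lseguid` are hypothetical, only the unambiguous bases A, C, G and T are handled, and the base64 encoding of the digest follows the Biopython convention.

```python
import base64
import hashlib

# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seguid(seq):
    """SHA-1 of the uppercased sequence, base64 encoded without padding."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def lseguid(seq):
    """SEGUID of the lexicographically smaller of the two strands."""
    sense = seq.upper()
    antisense = reverse_complement(sense)
    return seguid(min(sense, antisense))


# Both strands of the same duplex give the same checksum.
print(lseguid("AAAC") == lseguid("GTTT"))
```

Because `min` picks the same strand whichever one is supplied, the result does not depend on the orientation of the input.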

Circular SEGUID or cSEGUID is the SEGUID checksum for circular (DNA) sequences. As there are many circular permutations of a circular sequence, using the SEGUID checksum directly is impractical, since there would be many checksums for the same sequence. The cSEGUID is the SEGUID of the lexicographically minimal string rotation of a sequence or of its reverse complement (whichever is lexicographically smaller). The cSEGUID provides a unique and stable identifier for circular sequences, such as plasmids.
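The cSEGUID definition can be illustrated with the following Python 3 sketch. The helper names are hypothetical, only A, C, G and T are handled, and the minimal rotation is found by naively comparing every rotation, which is quadratic in the sequence length. A linear-time method such as Booth's algorithm would be preferable for long plasmids; how the application itself finds the rotation is not stated here.

```python
import base64
import hashlib

# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seguid(seq):
    """SHA-1 of the uppercased sequence, base64 encoded without padding."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def min_rotation(seq):
    """Lexicographically smallest rotation, found by brute force."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def cseguid(seq):
    """SEGUID of the smallest rotation of either strand."""
    sense = seq.upper()
    antisense = reverse_complement(sense)
    return seguid(min(min_rotation(sense), min_rotation(antisense)))


# The same circular molecule written with a different origin or strand.
original = "GATTACA"
shifted = original[2:] + original[:2]
print(cseguid(original) == cseguid(shifted))
print(cseguid(original) == cseguid(reverse_complement(original)))
```

The plain SEGUIDs of `original` and `shifted` differ, while their cSEGUIDs agree, which is the property that makes the checksum suitable for circular sequences.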

Example

The cSEGUID checksum can be useful to quickly determine if two sequences refer to the same vector. The sequence of the plasmid pFA6a-GFPS65T-kanMX6 is available from Genbank and from other sources such as the Forsburg lab, sequence here or here.

Both sequences are the same size and claim to describe the same vector, although the origins seem to have been set differently. Analysis in seguid_calculator shows that they are in fact representations of the same sequence, since their cSEGUIDs are identical:

Genbank

[Screenshot: seguid_calculator output for the Genbank sequence]

Forsburg

[Screenshot: seguid_calculator output for the Forsburg sequence]

Implementation

Seguid_calculator is written in Python 2.7 with wxPython 3. Development happens on GitHub, where the source code is available.

Executables

Executables are available for

  • Windows 64 bit
  • Mac OSX dmg and a zip file containing an app
  • Linux deb package

The executables can be downloaded via the "releases" button at the top of this page.

Visit the website of Bjorn Johansson’s group at CBMA for more information.

Automatic build status

Windows standalone executables (32 and 64 bit) are built on AppVeyor using pyinstaller and Miniconda.

Standalone executables (64 bit) for MacOSX are built on TravisCI using pyinstaller and Miniconda.

A Debian package (.deb) is built offline, currently on Ubuntu 16.04 using stdeb. See the script “run_this_scritp_to_create_deb_package.sh”. The package also installs system shortcuts.

This version

1.1.0

Download files

Files for seguid-calculator, version 1.1.0:

  • seguid_calculator-1.1.0.zip (31.9 kB), source