
Calculates SEGUID, cSEGUID & lSEGUID checksums for biological sequences


seguid_calculator



Seguid_calculator is a small GUI for calculating the SEGUID, lSEGUID and cSEGUID checksums for a biological sequence (DNA, RNA or protein).

Installation

The quickest way to use seguid_calculator is to download one of the executables, which require no installation at all. Executables are available from the releases page:

  • seguid_calculator.exe for Windows
  • seguid_calculator_for_mac.zip for MacOS
  • seguid_calculator for Linux

These executables are built automatically using GitHub Actions. Unfortunately, there are no DEB or RPM packages yet (these are a planned feature for when I figure out how to make them). There is also an online version (see the links at the end of this page).

Source installation

The setuptools package can be installed with pip like this:

pip install seguid_calculator

This should work well on Windows and macOS. On Linux, wxPython may have to be installed separately.

Alternatively, there is a conda package that should install on all platforms on Python 3.7, 3.8 or 3.9:

conda install -c bjornfjohansson seguid_calculator

For this, you need to install the Anaconda scientific Python distribution.

What does it do?

The SEGUID checksum is defined as the SHA-1 cryptographic hash of a primary biological sequence in uppercase. SEGUID was suggested by Babnigg and Giometti as a way to provide stable identifiers of protein sequences in databases for cross referencing.
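The definition above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration and not necessarily how seguid_calculator itself computes the checksum. One assumption goes beyond the text above: the SHA-1 digest is written as base64 text with the trailing "=" padding removed, which is the convention followed by the Biopython implementation.

```python
import base64
import hashlib


def seguid(seq: str) -> str:
    """Return a SEGUID-style checksum for a linear sequence (sketch)."""
    # Uppercase first so that "acgt" and "ACGT" give the same checksum.
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    # Assumption: base64 text with the trailing "=" padding stripped.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


print(seguid("ACGT"))
```

Because SHA-1 always produces 20 bytes, the resulting string is always 27 characters long, whatever the length of the sequence.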

There are several implementations of SEGUID calculation available, such as the one in Biopython (Bio.SeqUtils.CheckSum). See the slides and the Biopython wiki.

See also this blog post on the subject.

cSEGUID

Circular SEGUID or cSEGUID is the SEGUID checksum for circular (DNA) sequences. A circular sequence can be written starting from any position and from either strand, so using the SEGUID checksum directly is impractical: each of these rotations of the same circular sequence would have a different checksum. The cSEGUID is instead defined as the SEGUID of the lexicographically minimal string rotation of a sequence or its reverse complement (whichever is lexicographically smaller).

The cSEGUID provides a unique and stable identifier for circular sequences, such as plasmids.
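The cSEGUID definition can be sketched as follows. This is an illustration under stated assumptions, not the code used by seguid_calculator: the helper names are hypothetical, the rotation search is a naive quadratic scan chosen for readability, only the unambiguous DNA letters A, C, G and T are complemented, and the base64 output format is the same assumption as for SEGUID.

```python
import base64
import hashlib

# Only unambiguous DNA letters are complemented; other characters pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seguid(seq: str) -> str:
    # Assumption: base64-encoded SHA-1 digest with "=" padding removed.
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def min_rotation(seq: str) -> str:
    """Lexicographically smallest rotation (naive O(n^2) scan)."""
    seq = seq.upper()
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def cseguid(seq: str) -> str:
    # Take the smaller of the minimal rotations of the two strands.
    candidate = min(min_rotation(seq), min_rotation(reverse_complement(seq)))
    return seguid(candidate)


# Different starting points and strands of the same circle agree.
print(cseguid("GATTACA") == cseguid("ATTACAG"))  # True
print(cseguid("GATTACA") == cseguid("TGTAATC"))  # True
```

For plasmid-sized sequences a linear-time minimal rotation algorithm would be a better choice than the scan shown here.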

Example

The cSEGUID checksum can be useful to quickly determine if two sequences refer to the same vector. The sequence of the plasmid pFA6a-GFPS65T-kanMX6 is available from GenBank and from other sources such as the Forsburg lab (sequence here, a copy of which was saved here).

Both sequences are the same size and claim to describe the same vector. Analysis of both sequences in seguid_calculator shows that they are in fact representations of the same sequence, since their cSEGUIDs are identical:

[Screenshot: seguid_calculator result for the GenBank sequence]

[Screenshot: seguid_calculator result for the Forsburg sequence]

lSEGUID

The lSEGUID is the SEGUID of the lexicographically smaller of the sense and antisense strands of a blunt double-stranded DNA sequence. This can be useful to identify double-stranded DNA sequences regardless of which strand is presented.
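A minimal sketch of this definition is shown below. As with the other examples, the function names are hypothetical, only A, C, G and T are complemented, and the base64 output format is an assumption rather than something taken from the seguid_calculator source.

```python
import base64
import hashlib

# Only unambiguous DNA letters are complemented; other characters pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seguid(seq: str) -> str:
    # Assumption: base64-encoded SHA-1 digest with "=" padding removed.
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def lseguid(seq: str) -> str:
    """SEGUID of whichever strand sorts first lexicographically."""
    watson = seq.upper()
    crick = reverse_complement(watson)
    return seguid(min(watson, crick))


# Both strands of the same blunt fragment give the same checksum.
print(lseguid("GATTACA") == lseguid("TGTAATC"))  # True
```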

Implementation

Seguid_calculator is written in Python 3 with wxPython 4, which is its only dependency. Development happens on GitHub.

Online version

There is also an online version built with Flask and hosted on PythonAnywhere.

seguid_calculator_flask

Click here or on the image above to go to the website.

