GUI for calculating SEGUID checksums for biological sequences.

seguid_calculator

Seguid_calculator is a GUI for calculating checksums for DNA or RNA sequences. The four checksums in the table below are each defined for a specific use case.

            ssDNA          dsDNA
linear      slSEGUID(*)    dlSEGUID
circular    scSEGUID       dcSEGUID

(*) The slSEGUID checksum is also useful for protein sequences.

Installation

The quickest way to use seguid_calculator is to download one of the apps in the table below. They require no installation at all.

Pick the appropriate file for your operating system:

OS         File
Windows    seguid_calculator.exe
macOS      seguid_calculator_for_mac.zip
Linux      seguid_calculator

There is also an online version (see links at the end of this page).

Source Python installation

Installation from PyPI:

pip install seguid_calculator

What does it do?

The SEGUID checksum was defined as the base64 encoded SHA-1 cryptographic checksum of a primary biological sequence in uppercase.
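The definition above can be sketched with the Python standard library alone. This is a minimal illustration, not the code used by seguid_calculator; the function name seguid and the removal of the trailing "=" padding character are assumptions made for this example.

```python
import base64
import hashlib


def seguid(seq: str) -> str:
    """Base64-encoded SHA-1 digest of the uppercase sequence (sketch)."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    # SHA-1 produces 20 bytes, so standard base64 gives 28 characters
    # ending in a single "=" pad. Stripping the pad is an assumption here.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Case does not matter because the sequence is uppercased first.
print(seguid("gattaca") == seguid("GATTACA"))
```

Because the sequence is uppercased before hashing, the same sequence typed in lowercase or uppercase gives the same checksum.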

SEGUID was suggested by Babnigg and Giometti as a stable identifier for cross referencing protein sequences in databases.

An implementation of the SEGUID checksum can be found in Biopython, in the Bio.SeqUtils.CheckSum module.

For more information, see these slides and the Biopython wiki (scroll down to the "Using the SEGUID checksum" header) as well as this blog post.

slSEGUID

The single-strand linear SEGUID or slSEGUID is meant for single-stranded DNA or protein sequences, which share the same basic topology, i.e. the sequence has a beginning and an end and only one strand.

slSEGUID is fundamentally a base64url encoded version of the original SEGUID checksum, where the plus and forward slash characters (+ and /) of the standard base64 encoding are replaced by - and _, respectively. This makes the checksum directly useful as a part of a URL.
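The relation between the two encodings can be sketched as follows. The names slseguid and to_urlsafe are hypothetical helpers for this example, and uppercasing the input and stripping the "=" padding are assumptions carried over from the original SEGUID definition.

```python
import base64
import hashlib


def slseguid(seq: str) -> str:
    """SHA-1 digest of the uppercase sequence in base64url form (sketch)."""
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    # urlsafe_b64encode uses "-" instead of "+" and "_" instead of "/".
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def to_urlsafe(checksum: str) -> str:
    """Convert a standard base64 checksum to the URL-safe alphabet."""
    return checksum.replace("+", "-").replace("/", "_")
```

Applying to_urlsafe to a standard base64 SEGUID string gives the same result as slseguid for the same sequence, which is why the two checksums carry the same information.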

scSEGUID

The single-strand circular SEGUID or scSEGUID is useful for single-stranded circular DNA sequences and other molecules sharing the same properties. A circular sequence of this type has no identifiable beginning or end and no complementary strand. A real-world example of this kind of molecule is the M13 phage, which maintains its genome as a circular single-stranded molecule.

A circular sequence can be written starting from any position, so using the slSEGUID checksum directly would be impractical: the same molecule could yield several different checksums. The scSEGUID algorithm first finds the lexicographically minimal string rotation and then applies the same checksum algorithm as for the slSEGUID.
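The two steps can be sketched as below. This is an illustration under assumptions: min_rotation and scseguid are hypothetical names, and the rotation search is a naive quadratic scan chosen for clarity (faster algorithms for the minimal rotation exist), not necessarily what seguid_calculator uses.

```python
import base64
import hashlib


def min_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation of seq (naive scan)."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def scseguid(seq: str) -> str:
    """Checksum of the minimal rotation, encoded as for slSEGUID (sketch)."""
    rotated = min_rotation(seq.upper())
    digest = hashlib.sha1(rotated.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# All rotations of the same circular sequence give one checksum.
print(scseguid("TTAGA") == scseguid("AGATT"))
```

Any starting point chosen when writing down the circular sequence leads to the same minimal rotation, and therefore to the same checksum.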

N.B. Plasmids are usually not this kind of molecule, see dcSEGUID below.

dlSEGUID

The double-strand linear SEGUID or dlSEGUID is useful for double-stranded DNA sequences such as the one depicted below. The two drawings are equivalent representations of the same DNA molecule.

            5'-GATTACA-3'
               |||||||
            3'-CTAATGT-5'

            5'-TGTAATC-3'
               |||||||
            3'-ACATTAG-5'

The molecule is made up of two antiparallel complementary strands and has a beginning and an end. As the strands are complementary, each strand completely identifies the other in the case of a blunt molecule such as the one depicted.

For this reason, most databases only store one of the strands, as the other one is easy to infer. The dlSEGUID algorithm compares the two top strands GATTACA and TGTAATC and chooses the lexicographically smaller one (GATTACA).

The chosen strand GATTACA is concatenated with a line break character and the reverse of the complementary strand, giving GATTACA\nCTAATGT. This string is then processed as for the slSEGUID checksum.
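For a blunt molecule, the construction above can be sketched as follows. The names COMPLEMENT, dlseguid_message and dlseguid are hypothetical, only the unambiguous bases A, C, G and T are handled, and molecules with single-stranded overhangs are outside the scope of this example.

```python
import base64
import hashlib

# Complement table for unambiguous DNA bases only (an assumption).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dlseguid_message(strand: str) -> str:
    """Build the string that is hashed for a blunt dsDNA molecule."""
    watson = strand.upper()
    crick = watson.translate(COMPLEMENT)[::-1]  # reverse complement, 5'-3'
    top = min(watson, crick)  # lexicographically smaller top strand
    # Complementing without reversing gives the other strand read 3'-5'.
    bottom = top.translate(COMPLEMENT)
    return top + "\n" + bottom


def dlseguid(strand: str) -> str:
    """Checksum of the message, encoded as for slSEGUID (sketch)."""
    digest = hashlib.sha1(dlseguid_message(strand).encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


print(repr(dlseguid_message("TGTAATC")))  # 'GATTACA\nCTAATGT'
```

Either strand of the molecule can be given as input; both GATTACA and TGTAATC lead to the same message and therefore to the same checksum.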

dcSEGUID

The dcSEGUID (double-strand circular SEGUID) checksum is defined for circular dsDNA molecules such as most plasmids and bacterial chromosomes.

The smallest rotation is found for each of the two strands, in a manner similar to that of the scSEGUID checksum. A string in uppercase letters is constructed from the Watson sequence starting at its minimum point, a line break, and the complementary sequence in 3'-5' order. Another string is constructed in the same way from the Crick sequence starting at its minimum point, a line break, and the Watson sequence in 3'-5' order. The two strings are compared, and the checksum is calculated from the smaller one.
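A minimal sketch of this procedure is given below. It rests on several assumptions: the helper names are hypothetical, only A, C, G and T are handled, the rotation search is a naive scan, and the smaller of the two candidate strings is the one that is hashed.

```python
import base64
import hashlib

# Complement table for unambiguous DNA bases only (an assumption).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def min_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation of seq (naive scan)."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def dcseguid(strand: str) -> str:
    """Checksum for a circular dsDNA molecule given one strand (sketch)."""
    watson = strand.upper()
    crick = watson.translate(COMPLEMENT)[::-1]  # reverse complement
    candidates = []
    for top in (min_rotation(watson), min_rotation(crick)):
        # The complement of the rotated strand is the other strand,
        # aligned base by base and read in 3'-5' order.
        candidates.append(top + "\n" + top.translate(COMPLEMENT))
    message = min(candidates)
    digest = hashlib.sha1(message.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
```

With this construction, neither the choice of starting position nor the choice of strand affects the result, which is the property used in the plasmid comparison described in this section.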

The dcSEGUID checksum can be useful to determine if two sequences refer to the same plasmid vector. The sequence of the plasmid pFA6a-GFPS65T-kanMX6 is available from GenBank and from other sources on the web, such as the Forsburg lab (sequence here). A copy of the Forsburg lab sequence was saved here.

Both sequences are understood to describe the same vector. Both are 4882 bp long, but the GenBank sequence starts and ends with GAAC...TATA and the Forsburg lab sequence with ACGC...TAGA.

The two screenshots below show that the dcSEGUID checksums are identical, which shows that the two sequences describe the same double-stranded circular DNA molecule.

GenBank sequence for pFA6a-GFPS65T-kanMX6 (screenshot)

Forsburg lab sequence for pFA6a-GFPS65T-kanMX6 (screenshot)

Implementation

Seguid_calculator is written in Python and depends on wxPython and seguid.

Online version

seguid_calculator_flask

Click on the image above to go to the website. The online version was built with Flask and is hosted on PythonAnywhere.

How to install the online version on PythonAnywhere:

16:33 ~ $ mkvirtualenv --python=python3.9 MyVirtualenv
created virtual environment CPython3.9.5.final.0-64 in 13108ms
  creator CPython3Posix(dest=/home/seguidcalculator/.virtualenvs/MyVirtualenv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/seguidcalculator/.local/share/virtualenv)
    added seed packages: pip==21.3, setuptools==58.2.0, wheel==0.37.0
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
virtualenvwrapper.user_scripts creating /home/seguidcalculator/.virtualenvs/MyVirtualenv/bin/predeactivate
virtualenvwrapper.user_scripts creating /home/seguidcalculator/.virtualenvs/MyVirtualenv/bin/postdeactivate
virtualenvwrapper.user_scripts creating /home/seguidcalculator/.virtualenvs/MyVirtualenv/bin/preactivate
virtualenvwrapper.user_scripts creating /home/seguidcalculator/.virtualenvs/MyVirtualenv/bin/postactivate
virtualenvwrapper.user_scripts creating /home/seguidcalculator/.virtualenvs/MyVirtualenv/bin/get_env_details
(MyVirtualenv) 16:36 ~ $ pip install flask flask-wtf wtforms
Looking in links: /usr/share/pip-wheels
Collecting flask
  Downloading Flask-2.2.2-py3-none-any.whl (101 kB)
     |████████████████████████████████| 101 kB 2.1 MB/s
Collecting flask-wtf
(MyVirtualenv) 16:43 ~ $ git clone https://github.com/BjornFJohansson/seguid_calculator.git
Cloning into 'seguid_calculator'...
remote: Enumerating objects: 1555, done.
remote: Counting objects: 100% (441/441), done.
remote: Compressing objects: 100% (159/159), done.
remote: Total 1555 (delta 236), reused 437 (delta 232), pack-reused 1114
Receiving objects: 100% (1555/1555), 76.46 MiB | 53.41 MiB/s, done.
Resolving deltas: 100% (879/879), done.
Updating files: 100% (48/48), done.
(MyVirtualenv) 16:44 ~ $ ls
README.txt  seguid_calculator
(MyVirtualenv) 16:44 ~ $
