An implementation of the homo-edit distance algorithm.

Homo-Edit-Distance

A homo-insertion is an insertion of a string of equal characters, which we also call a block, into another string. A homo-deletion is the inverse operation, that is, the deletion of such a block. We consider the following problem: Given two strings, what is the minimum number of homo-insertions or homo-deletions needed to convert one into the other? We refer to this number as the homo-edit distance.
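
The two operations defined above can be sketched in a few lines of Python. This is only an illustration of the definitions, not code from the package: the helper names `is_block`, `homo_insert`, and `homo_delete` are hypothetical, and positions are assumed to be zero-based with half-open slices.

```python
def is_block(block: str) -> bool:
    """Return True if block is a non-empty string of equal characters."""
    return len(block) > 0 and len(set(block)) == 1


def homo_insert(s: str, pos: int, block: str) -> str:
    """Apply one homo-insertion: insert block into s before index pos."""
    if not is_block(block):
        raise ValueError("only a block of equal characters can be inserted")
    if not 0 <= pos <= len(s):
        raise ValueError("insert position out of range")
    return s[:pos] + block + s[pos:]


def homo_delete(s: str, start: int, end: int) -> str:
    """Apply one homo-deletion: remove s[start:end], which must be a block."""
    if not 0 <= start < end <= len(s):
        raise ValueError("invalid deletion range")
    if not is_block(s[start:end]):
        raise ValueError("only a block of equal characters can be deleted")
    return s[:start] + s[end:]


# A homo-deletion followed by the inverse homo-insertion restores the string.
shorter = homo_delete("TCAGACT", 1, 2)   # removes the block "C"
restored = homo_insert(shorter, 1, "C")  # puts it back
print(shorter, restored)
```

Each call counts as one operation regardless of the block length, so inserting `"CCC"` costs the same as inserting `"C"`.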

References

The algorithm is described in the following publication:

  • M. Brand, N. K. Tran, P. Spohr, S. Schrinner, G. W. Klau. The homo-edit distance problem. bioRxiv, Cold Spring Harbor Laboratory, DOI: tbd

Installation from PyPI

pip3 install homoeditdistance

Installation from Source

git clone https://github.com/AlBi-HHU/homo-edit-distance.git
cd homo-edit-distance
python3 setup.py install

How to Run on the Command Line

The Python package comes with a command line tool, hed, which runs a demonstration of the algorithm. Its source code is located in demonstration.py, which also shows how to invoke the functions. If you have just cloned the repository, you can start the demonstration from inside the cloned repository using

python3 -m homoeditdistance

Help

usage: hed [-h] -s STRING1 -t STRING2 [-a] [-b]

Given two strings, find their homo-edit distance

optional arguments:
  -h, --help            show this help message and exit
  -s STRING1, --string1 STRING1
                        first string. Use quotation marks around your string
                        (e.g. "STRING")for the empty string or strings with
                        special characters
  -t STRING2, --string2 STRING2
                        second string
  -a, --all             show all optimal subsequences
  -b, --backtrace       print transformation steps

Example

Output of hed -s "TCAGACT" -t "TAGGCTT" -a -b

The homo-edit distance between TCAGACT and TAGGCTT is 4

The following optimal subsequences were found, and obtained using the listed operations:

TAGCT
Possible optimal sequence of operations:
s: TCAGACT t: TAGGCTT
Deleting substring 1 -> 2 (C) from s
Deleting: C       Result: T-AGACT
Deleting substring 4 -> 5 (A) from s
Deleting: A       Result: T-AG-CT
Deleting substring 3 -> 4 (G) from t
Deleting: G       Result: TAG-CTT
Deleting substring 6 -> 7 (T) from t
Deleting: T       Result: TAG-CT-
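
The backtrace above can be replayed by hand to check that both strings really reduce to the same subsequence. The sketch below is not part of the package. It assumes that the positions in the output are zero-based, half-open ranges into the original string, which is consistent with the characters shown, and it uses a hypothetical helper `delete_block_keep_gap` that marks deleted characters with `-` so that later positions stay valid. It does not handle deletions that span an earlier gap.

```python
GAP = "-"


def delete_block_keep_gap(s: str, start: int, end: int) -> str:
    """Replace the block s[start:end] with gap characters.

    Keeping the gaps preserves the original indexing, as in the
    "Result:" lines of the demonstration output.
    """
    block = s[start:end]
    if not block or len(set(block)) != 1 or block[0] == GAP:
        raise ValueError("range does not cover a block of equal characters")
    return s[:start] + GAP * (end - start) + s[end:]


s = "TCAGACT"
t = "TAGGCTT"
s_steps = [(1, 2), (4, 5)]  # delete C, then A, from s
t_steps = [(3, 4), (6, 7)]  # delete G, then T, from t

for start, end in s_steps:
    s = delete_block_keep_gap(s, start, end)
for start, end in t_steps:
    t = delete_block_keep_gap(t, start, end)

common = s.replace(GAP, "")
operations = len(s_steps) + len(t_steps)
print(s, t, common, operations)  # T-AG-CT TAG-CT- TAGCT 4
```

Because a homo-insertion is the inverse of a homo-deletion, the two deletions on t can be read backwards as insertions: deleting C and A from s and then inserting G and T into TAGCT converts s into t in four operations.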

How to Use in Your Own Code

Homo-Edit-Distance between Two Strings

from homoeditdistance import homoEditDistance

string1 = "TCAGACT"
string2 = "TAGGCTT"
print('The homo-edit-distance of {} and {} is {}.'.format(string1, string2, homoEditDistance(string1, string2, 0)['hed']))

How to Run the Unit Tests

The tests use unittest, which is part of the Python standard library and needs no separate installation. Run python3 -m unittest from inside the cloned repository.
