An implementation of the homo-edit distance algorithm.
Project description
Homo-Edit-Distance
A homo-insertion is an insertion of a string of equal characters, which we also call a block, into another string. A homo-deletion is the inverse operation, that is, the deletion of such a block. We consider the following problem: Given two strings, what is the minimum number of homo-insertions or homo-deletions needed to convert one into the other? We refer to this number as the homo-edit distance.
References
The algorithm is described in the following publication:
- M. Brand, N. K. Tran, P. Spohr, S. Schrinner, G. W. Klau. The homo-edit distance problem. bioRxiv, Cold Spring Harbor Laboratory, DOI: tbd
Installation from Repository
pip3 install homoeditdistance
Installation from Source
git clone https://github.com/AlBi-HHU/homo-edit-distance.git
cd homo-edit-distance
python3 setup.py install
How to Run on the Command Line
The Python package comes with a command line tool hed
, which can be used to run a demonstration of the algorithm. Its source code is located in demonstration.py. It may also help you to see how to invoke the functions. If you just cloned the repository you can start the demonstration from inside the cloned repository using
python3 -m homoeditdistance
Help
usage: hed [-h] -s STRING1 -t STRING2 [-a] [-b]
Given two strings, find their homo-edit distance
optional arguments:
-h, --help show this help message and exit
-s STRING1, --string1 STRING1
first string. Use quotation marks around your string
(e.g. "STRING")for the empty string or strings with
special characters
-t STRING2, --string2 STRING2
second string
-a, --all show all optimal subsequences
-b, --backtrace print transformation steps
Example
Output of hed -s "TCAGACT" -t "TAGGCTT" -a -b
The homo-edit distance between TCAGACT and TAGGCTT is 4
The following optimal subsequences were found, and obtained using the listed operations:
TAGCT
Possible optimal sequence of operations:
s: TCAGACT t: TAGGCTT
Deleting substring 1 -> 2 (C) from s
Deleting: C Result: T-AGACT
Deleting substring 4 -> 5 (A) from s
Deleting: A Result: T-AG-CT
Deleting substring 3 -> 4 (G) from t
Deleting: G Result: TAG-CTT
Deleting substring 6 -> 7 (T) from t
Deleting: T Result: TAG-CT-
How to Use in Your Own Code
Homo-Edit-Distance between Two Strings
from homoeditdistance import homoEditDistance
string1 = "TCAGACT"
string2 = "TAGGCTT"
print('The homo-edit-distance of {} and {} is {}.'.format(string1, string2, homoEditDistance(string1, string2, 0)['hed']))
How to Run the Unit Tests
Make sure that unittest
Python package is installed, and run python3 -m unittest
from inside the cloned repository.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file homoeditdistance-0.0.1.tar.gz
.
File metadata
- Download URL: homoeditdistance-0.0.1.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9021fe9094fc0348214ceddd4ff4c1bc828f0dcc3ecb1e8269ff45e07ae753f1 |
|
MD5 | 3cd34a3301253ed4b78b8424f13e7f80 |
|
BLAKE2b-256 | d340b9214b3f2ea209a241af3a4b640b03819244c6444c189477ba2bb56e3306 |