Empirical determination of approximate values for Levenshtein distances between random strings
This repository contains empirically determined approximate expected Levenshtein distances between random strings over alphabets of different sizes, as well as simple Python code to generate them.
To use the code, you will need
Simply clone this repo:
git clone https://github.com/nickmachnik/expected-levenshtein.git [TARGET DIR]
and then install it via pip:
pip install [TARGET DIR]
Test the cloned package:
cd [TARGET DIR]
python -m unittest
Computing average Levenshtein distances
To compute approximate expected Levenshtein distances between random strings of all lengths from 1 to n, use random_average_levenshtein from the sample module.
This example shows how to compute the distances of random strings up to length 100 over a 4-letter alphabet, averaged over 1000 replicates.
from sample import random_average_levenshtein
import numpy as np

# strings up to length 100, 1000 replicates, 4-letter alphabet {0, 1, 2, 3}
average_distances = random_average_levenshtein(100, 1000, np.arange(4))
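The package's internals are not described in this README. Purely to illustrate the sampling idea, the following independent NumPy sketch (not the package's code; levenshtein_matrix and monte_carlo_average are hypothetical names) averages full dynamic-programming tables over random string pairs. It relies on two facts: entry [i, j] of the table is the distance between the length-i and length-j prefixes, and prefixes of uniformly random strings are themselves uniformly random. One pair of length-n strings therefore yields a sample for every pair of lengths up to n.

```python
import numpy as np


def levenshtein_matrix(a, b):
    """Full dynamic-programming table: entry [i, j] is the Levenshtein
    distance between the prefixes a[:i] and b[:j]."""
    n, m = len(a), len(b)
    d = np.zeros((n + 1, m + 1), dtype=np.int64)
    d[:, 0] = np.arange(n + 1)  # delete all i characters of a[:i]
    d[0, :] = np.arange(m + 1)  # insert all j characters of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution or match
    return d


def monte_carlo_average(n, replicates, alphabet, seed=None):
    """Average the DP tables of `replicates` pairs of independent, uniformly
    random length-n strings over `alphabet`. Entry [i, j] of the result
    estimates the expected distance between random strings of lengths i and j."""
    rng = np.random.default_rng(seed)
    total = np.zeros((n + 1, n + 1))
    for _ in range(replicates):
        a = rng.choice(alphabet, size=n)
        b = rng.choice(alphabet, size=n)
        total += levenshtein_matrix(a, b)
    return total / replicates
```

For example, monte_carlo_average(20, 200, np.arange(4))[20, 20] estimates the expected distance between two random strings of length 20 over a 4-letter alphabet. The pure-Python double loop is slow for large n, so this is only a reference for the idea, not a substitute for the package.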
Generating models for expected distances
For long sequences, the distance matrix returned by random_average_levenshtein can get quite large. If you prefer not to load and query a large matrix object every time you need an expected distance, fit.model_average_levenshtein generates a polynomial model for each row of the distance matrix. That way, the information needed to compute approximate expected Levenshtein distances is reduced to the polynomial coefficients. Once computed, these coefficients can be used to predict expected distances with fit.poly.
This example shows how to generate and use such models for random strings from length 25 to length 50.
from sample import random_average_levenshtein
from fit import poly, model_average_levenshtein
import numpy as np

# sample distances
average_distances = random_average_levenshtein(50, 1000, np.arange(4))

# make models
row_indices, coefficients, mean_squared_deviations = model_average_levenshtein(
    average_distances, model_rows=np.arange(25, 51))

# predict expected distance for n=50, m=44
coeff_n_50 = coefficients[-1]
predicted_expected_distance = poly(44, coeff_n_50)
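The exact fitting procedure used by fit.model_average_levenshtein is not documented here. As a hedged sketch of the general idea only, each selected row can be reduced to a handful of coefficients with NumPy's polynomial module. The function fit_rows and its default degree of 3 are hypothetical choices, and numpy.polynomial orders coefficients from lowest to highest degree, which may differ from what fit.poly expects.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fit_rows(distances, rows, degree=3):
    """Fit one polynomial per selected row of a distance matrix.

    Hypothetical sketch: row n is modelled as a function of the column
    index m. Returns {n: (coefficients, mean squared deviation)}, with
    coefficients ordered lowest degree first."""
    m = np.arange(distances.shape[1])
    models = {}
    for n in rows:
        y = distances[n]
        coeffs = P.polyfit(m, y, degree)
        residuals = y - P.polyval(m, coeffs)
        models[int(n)] = (coeffs, float(np.mean(residuals ** 2)))
    return models
```

Applied to a matrix such as average_distances above, fit_rows(average_distances, range(25, 51)) stores degree + 1 numbers per row instead of the whole row, and P.polyval(44, fit_rows(...)[50][0]) then plays the role of the prediction step. The stored mean squared deviation gives a quick check of how well the chosen degree fits each row.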