A toolkit for quickly calculating longest common substrings with specific relevance to entropy estimation.
Project description
FastMatch
A package for quickly calculating longest common substrings with a fixed starting location of one substring. Once the two strings, $s_1$ and $s_2$ are defined, this package can be used to find the length of the longest substring that starts in the range s2[0..j) and matches a prefix of the string s1[i..n). This prefix must begin at index $i$ in $s_1$ and must end before index $j$ in $s_2$. The indices $(i,j)$ are passed as a list of tuples with increasing $i,j$, allowing many of these matches to be computed at a time.
This algorithm employs properties of a sorted suffix array to allow the longest match length to be found in O(1) with O(N) precomputation.
This function is designed to be used within a modified Kontoyannis Shannon entropy estimator, to improve computational speed. This implementation is currently provided in the ProcessEntropy package.
Example Usage
# load packages
import LCSFinder as lcs
import numpy as np
# initialise strings
list_source = np.random.randint(1,10,100)
list_target = np.random.randint(1,10,100)
# set up objects
source = lcs.Vector1D([int(x) for x in ([np.floor(x) for x in test1])])
target = lcs.Vector1D([int(x) for x in ([np.floor(x) for x in test2])])
ob = lcs.LCSFinder(source,target)
# set up indices to search from
l_t = lcs.Vector2D(tuple((i,i+1) for i in range(len(list_source))))
ob.ComputeAllLCSs(l_t)
Requirements
-
C++ compiler C++11 or greater
-
Python 3.x
Installation
pip install FastMatch
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file LCSFinder-1.1.0.tar.gz
.
File metadata
- Download URL: LCSFinder-1.1.0.tar.gz
- Upload date:
- Size: 55.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 99e7c0a7018e6a2fee54171e1125f789f5933e02c4830f446be5ac3e23673034 |
|
MD5 | 560bc82def67acc0e447e6d1b0ed17e3 |
|
BLAKE2b-256 | ffd015c29551a62a21975e8c949769c80038593869d450ae26e57062c04490ac |