Skip to main content

A toolkit for quickly calculating longest common substrings with specific relevance to entropy estimation.

Project description

FastMatch

A package for quickly calculating longest common substrings with a fixed starting location of one substring. Once the two strings, $s_1$ and $s_2$ are defined, this package can be used to find the length of the longest substring that starts in the range s2[0..j) and matches a prefix of the string s1[i..n). This prefix must begin at index $i$ in $s_1$ and must end before index $j$ in $s_2$. The indices $(i,j)$ are passed as a list of tuples with increasing $i,j$, allowing many of these matches to be computed at a time.

This algorithm employs properties of a sorted suffix array to allow the longest match length to be found in O(1) with O(N) precomputation.

This function is designed to be used within a modified Kontoyannis Shannon entropy estimator, to improve computational speed. This implementation is currently provided in the ProcessEntropy package.

Example Usage

# load packages
import LCSFinder as lcs
import numpy as np

# initialise strings
list_source = np.random.randint(1,10,100)
list_target = np.random.randint(1,10,100)

# set up objects
source = lcs.Vector1D([int(x) for x in ([np.floor(x) for x in test1])])
target = lcs.Vector1D([int(x) for x in ([np.floor(x) for x in test2])])
ob = lcs.LCSFinder(source,target)

# set up indices to search from
l_t =  lcs.Vector2D(tuple((i,i+1) for i in range(len(list_source))))

ob.ComputeAllLCSs(l_t)

Requirements

  • C++ compiler C++11 or greater

  • Python 3.x

Installation

pip install FastMatch

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

LCSFinder-1.1.0.tar.gz (55.7 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page