LCSFinder

A toolkit for quickly calculating longest common substrings with specific relevance to entropy estimation.

Project description

FastMatch

A package for quickly calculating longest common substrings where the starting location of one substring is fixed. Given two strings $s_1$ and $s_2$, where $n$ is the length of $s_1$, the package finds the length of the longest prefix of s1[i..n) that also occurs in s2[0..j). The match must begin at index $i$ in $s_1$ and must end before index $j$ in $s_2$. The index pairs $(i,j)$ are passed as a list of tuples with increasing $i$ and $j$, so many matches can be computed in a single call.
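To make the definition concrete, here is a minimal brute-force sketch in plain Python. It is not the package's implementation. The function name `longest_match_length` is hypothetical, and the sketch assumes the matched substring must lie entirely inside s2[0..j).

```python
def longest_match_length(s1, s2, i, j):
    """Length of the longest prefix of s1[i:] that occurs inside s2[:j].

    Brute-force reference only; assumes the match must end before index j in s2.
    """
    window = s2[:j]
    best = 0
    for start in range(len(window)):
        k = 0
        # Extend the match while both sequences agree and stay in bounds.
        while (i + k < len(s1) and start + k < len(window)
               and s1[i + k] == window[start + k]):
            k += 1
        best = max(best, k)
    return best


# Several (i, j) pairs evaluated in one pass, mirroring the list-of-tuples input.
s1 = [1, 2, 3, 4]
s2 = [9, 1, 2, 3, 9]
pairs = [(0, 3), (1, 5)]
lengths = [longest_match_length(s1, s2, i, j) for i, j in pairs]
```

A slow reference like this is mainly useful for checking the fast routine on small inputs.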

This algorithm employs properties of a sorted suffix array so that each longest match length can be found in O(1) after O(N) precomputation.
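The suffix-array property this relies on can be illustrated with a short sketch. It uses a naive sort-based construction, which is far slower than the O(N) precomputation described above, and it is not the package's internal algorithm. All function names are hypothetical. It shows that, in sorted order, the suffix sharing the longest common prefix with a given suffix is always one of its immediate neighbours.

```python
def suffix_array(seq):
    """Start indices of all suffixes of seq, in sorted suffix order (naive build)."""
    return sorted(range(len(seq)), key=lambda k: seq[k:])


def common_prefix_len(a, b):
    """Number of leading elements shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def best_match_via_neighbours(seq, pos):
    """Longest common prefix between seq[pos:] and any other suffix of seq.

    Only the two neighbours in the sorted suffix array need to be checked.
    """
    sa = suffix_array(seq)
    rank = sa.index(pos)
    best = 0
    for nb in (rank - 1, rank + 1):
        if 0 <= nb < len(sa):
            best = max(best, common_prefix_len(seq[pos:], seq[sa[nb]:]))
    return best


order = suffix_array("banana")
match = best_match_via_neighbours("banana", 1)
```

Restricting the search to neighbouring suffixes is what removes the need to scan every candidate start position.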

This function is designed to be used within a modified Kontoyiannis Shannon entropy estimator to improve computational speed. That estimator is currently provided in the ProcessEntropy package.
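For context, the sketch below shows how match lengths feed into the standard, unmodified form of the Kontoyiannis estimator, n * log2(n) / sum(Lambda_i), where Lambda_i is one plus the longest match starting at position i. It is not the modified estimator from ProcessEntropy. The function names are hypothetical, and it assumes each match must lie entirely in the past, seq[0..i).

```python
import math


def match_length(seq, i):
    """Longest prefix of seq[i:] that occurs entirely within seq[:i] (brute force)."""
    past = seq[:i]
    best = 0
    for start in range(len(past)):
        k = 0
        while (i + k < len(seq) and start + k < len(past)
               and seq[i + k] == past[start + k]):
            k += 1
        best = max(best, k)
    return best


def lambdas(seq):
    """Lambda_i = match length at i plus one, for every position i."""
    return [match_length(seq, i) + 1 for i in range(len(seq))]


def entropy_estimate(seq):
    """Textbook estimate in bits per symbol; requires len(seq) >= 1."""
    n = len(seq)
    return n * math.log2(n) / sum(lambdas(seq))


estimate = entropy_estimate([1, 2, 1, 2, 3, 1, 2, 3])
```

The brute-force `match_length` here costs quadratic time per position, which is the step a fast match-length routine is meant to replace.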

Example Usage

# load packages
import LCSFinder as lcs
import numpy as np

# initialise strings
list_source = np.random.randint(1,10,100)
list_target = np.random.randint(1,10,100)

# set up objects (convert NumPy integers to plain Python ints)
source = lcs.Vector1D([int(x) for x in list_source])
target = lcs.Vector1D([int(x) for x in list_target])
ob = lcs.LCSFinder(source,target)

# set up indices to search from
l_t = lcs.Vector2D(tuple((i,i+1) for i in range(len(list_source))))

ob.ComputeAllLCSs(l_t)

Requirements

C++ compiler supporting C++11 or later
Python 3.x

Installation

pip install LCSFinder

Release history

1.1.0 (Mar 24, 2023), this version
1.0.0 (Mar 22, 2023), yanked. Reason: 32 bit Integer support
0.6.0 (Mar 16, 2023)
0.5.0 (Mar 16, 2023), yanked
0.1.0 (Mar 14, 2023), yanked
0.0.4 (Feb 21, 2023), yanked. Reason: Build not working on all systems
0.0.3 (Feb 21, 2023)
0.0.2 (Mar 16, 2023)

Download files

Source Distribution

LCSFinder-1.1.0.tar.gz (55.7 kB), uploaded Mar 24, 2023
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.9.10

File hashes

Hashes for LCSFinder-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`99e7c0a7018e6a2fee54171e1125f789f5933e02c4830f446be5ac3e23673034`
MD5	`560bc82def67acc0e447e6d1b0ed17e3`
BLAKE2b-256	`ffd015c29551a62a21975e8c949769c80038593869d450ae26e57062c04490ac`
