Suffix Array for all-substring search
Project description
sufarray: implementation of suffix array
It uses prefix doubling for calculating the suffix array, which is O(n log n) time. When searching for all occurrences of a string, it takes O(log n) time. Further occurrences takes constant time.
At present the implementation is Python only. This has the drawback that the speed is not comparable with those of other languages.
Usage
To construct a suffix array of string s
:
import sufarray
sarray = sufarray.SufArray(s)
After that, you can get the actual array with:
sarray.get_array()
This produces a permutation a
of range(len(s))
, where the values
a[0]
, a[1]
, etc. are such that the suffixes s[a[i]:]
is sorted.
Other than that, you can find the positions where a string n
appears
within s
using:
sarray.find_all(n)
This is a generator function, yielding all the possible positions.
Implementations
Two implementations are provided. The simple version is called
SufArrayBruteForce
, and a version SufArrayPD
is provided which
calculates the suffix array using prefix doubling, and augment the
result with a LCP (Longest common prefix) array for use with
searching. The latter has better bounded performance guarantees, but
with arrays of small text it is slower. The default is SufArrayPD
.
You can use the brute force version using
sufarray.SufArrayBruteForce
.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for sufarray_kkto-0.1.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 07b689566ce3e0448318d21b27a97340aa2decf1a254bbae3805613283a7422f |
|
MD5 | 9dd9b285ca7a91323cb371a6d3b6bfcf |
|
BLAKE2b-256 | dab72dfdbf65d52ce6ee0186df5ab358743321903d1b99c95d151e074f6f215f |