String algorithms

pydivsufsort: bindings to libdivsufsort

pydivsufsort prebuilds libdivsufsort as a shared library and includes it in a Python package with bindings.

Features:

  • bindings to divsufsort that return numpy arrays
  • handle almost any integer data type (e.g. int64) and not only char
  • additional string algorithms

Installation

On Linux, macOS and Windows:

python -m pip install pydivsufsort

We provide precompiled wheels for common systems, and a source distribution for Unix systems. Manual compilation on Windows might require some tweaking; if it fails, please create an issue.

Usage

Using String Inputs

from pydivsufsort import divsufsort, kasai

string_inp = "banana$"
string_suffix_array = divsufsort(string_inp)
string_lcp_array = kasai(string_inp, string_suffix_array)
print(string_suffix_array, string_lcp_array)
# [6 5 3 1 0 4 2] [0 1 3 0 0 2 0]
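To make the two arrays concrete, the pure-Python reference below reproduces the output above. It is an illustrative sketch, not pydivsufsort's implementation, and the function names are hypothetical. The suffix array is built by naively sorting the suffixes, and the LCP array follows Kasai's algorithm under the convention visible in the printed output: lcp[i] is the length of the longest common prefix of suffixes sa[i] and sa[i + 1], and the last entry is 0.

```python
def naive_suffix_array(text):
    """Sort suffix start positions by the suffixes themselves (slow, for illustration)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def kasai_lcp(text, sa):
    """Kasai's algorithm: lcp[i] = common prefix length of suffixes sa[i] and sa[i + 1]."""
    n = len(text)
    rank = [0] * n
    for i, start in enumerate(sa):
        rank[start] = i
    lcp = [0] * n
    h = 0  # current match length, carried over between consecutive text positions
    for p in range(n):
        r = rank[p]
        if r + 1 == n:
            # The last suffix in sorted order has no successor; its entry stays 0
            h = 0
            continue
        q = sa[r + 1]
        while p + h < n and q + h < n and text[p + h] == text[q + h]:
            h += 1
        lcp[r] = h
        # Dropping the first character shortens the next match by at most one
        if h > 0:
            h -= 1
    return lcp


sa = naive_suffix_array("banana$")
print(sa, kasai_lcp("banana$", sa))
# [6, 5, 3, 1, 0, 4, 2] [0, 1, 3, 0, 0, 2, 0]
```

Because each step reuses the previous match length minus one, Kasai's algorithm runs in linear time once the suffix array is known; only the naive suffix sort here is slow.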

Using Integer Inputs

import numpy as np
from pydivsufsort import divsufsort, kasai

string_inp = "banana$"

# Convert the string input to integers first
int_inp = np.unique(np.array(list(string_inp)), return_inverse=True)[1]
int_suffix_array = divsufsort(int_inp)
int_lcp_array = kasai(int_inp, int_suffix_array)
print(int_suffix_array, int_lcp_array)
# [6 5 3 1 0 4 2] [0 1 3 0 0 2 0]
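The conversion works because np.unique returns the distinct characters in sorted order, and return_inverse=True gives each character's index in that sorted list, i.e. its rank. Ranks compare the same way as the characters they replace, so the suffix array and LCP array are unchanged. A quick check of that property, using only NumPy:

```python
import numpy as np

string_inp = "banana$"
# alphabet: sorted distinct characters; ranks: index of each character in alphabet
alphabet, ranks = np.unique(np.array(list(string_inp)), return_inverse=True)
print(alphabet, ranks)
# ['$' 'a' 'b' 'n'] [2 1 3 1 3 1 0]

# Rank order agrees with character order for every pair of positions
for i, a in enumerate(string_inp):
    for j, b in enumerate(string_inp):
        assert (a < b) == (ranks[i] < ranks[j])
```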

Using Multiple Sentinel Characters Within A String

from pydivsufsort import divsufsort, kasai

sentinel_inp = "a$banana#and@a*bandana+"
sentinel_suffix_array = divsufsort(sentinel_inp)
sentinel_lcp_array = kasai(sentinel_inp, sentinel_suffix_array)
print(sentinel_suffix_array, sentinel_lcp_array)
# [ 8  1 14 22 12  7  0 13 21  5 19  3  9 16  2 15 11 18  6 20  4 10 17] [0 0 0 0 0 1 1 1 1 3 3 2 3 0 3 0 1 0 2 2 1 2 0]
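A string like the one above is typically built by joining several strings, each terminated by its own sentinel, so that no common prefix can extend across a string boundary. The sketch below uses hypothetical helpers (not part of pydivsufsort) to build such a text and to map a position from the suffix array back to the string it came from. It assumes every sentinel is distinct and does not occur in the input strings.

```python
from bisect import bisect_right


def concat_with_sentinels(strings, sentinels):
    """Join strings, terminating each one with its own sentinel character."""
    if len(sentinels) < len(strings):
        raise ValueError("need one sentinel per string")
    if len(set(sentinels)) != len(sentinels):
        raise ValueError("sentinels must be distinct")
    starts = []  # offset of each string inside the joined text
    parts = []
    offset = 0
    for s, sep in zip(strings, sentinels):
        if sep in s:
            raise ValueError(f"sentinel {sep!r} occurs in {s!r}")
        starts.append(offset)
        parts.append(s + sep)
        offset += len(s) + 1
    return "".join(parts), starts


def locate(position, starts):
    """Map a position in the joined text to (string index, offset in that string)."""
    k = bisect_right(starts, position) - 1
    return k, position - starts[k]


text, starts = concat_with_sentinels(["a", "banana", "and", "a", "bandana"], "$#@*+")
print(text)                # a$banana#and@a*bandana+
print(locate(16, starts))  # (4, 1)
```

Since each sentinel appears exactly once, two different suffixes can never share it at the same offset, which is why their common prefixes stop at string boundaries.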

Testing

pytest

Technical details (for performance tweaks)

libdivsufsort is compiled in both 32-bit and 64-bit versions, as the 32-bit version is faster. pydivsufsort automatically uses the 32-bit version when possible, i.e. when the input size is less than 2**31-1.
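The size rule above can be restated as a small sketch. The helper is hypothetical and not part of pydivsufsort's API; it only encodes the documented threshold. Assuming the suffix array is stored with the index width of the version used, the 32-bit path also halves the memory for the output, since each entry takes 4 bytes instead of 8.

```python
import numpy as np

INT32_LIMIT = 2**31 - 1


def index_dtype_for(n):
    """Hypothetical helper: index type implied by the documented size rule."""
    return np.dtype(np.int32) if n < INT32_LIMIT else np.dtype(np.int64)


for n in (10**6, 3 * 10**9):
    dtype = index_dtype_for(n)
    # One index per suffix, so the suffix array alone needs n * itemsize bytes
    print(n, dtype, n * dtype.itemsize / 1e9, "GB")
```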

For best performance, use contiguous arrays. If you pass a non-contiguous array (e.g. a strided slice), pydivsufsort converts it automatically with numpy.ascontiguousarray, which makes a copy.
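NumPy reports contiguity through the flags attribute. If the same non-contiguous array is presumably going to be converted on every call, converting it once yourself avoids repeating the copy. This sketch uses only NumPy; the array contents are made up for illustration.

```python
import numpy as np

data = np.arange(10, dtype=np.int64)
sliced = data[::2]  # every other element: a strided view, not contiguous
print(sliced.flags["C_CONTIGUOUS"])  # False

# ascontiguousarray copies only when needed; the result is contiguous
packed = np.ascontiguousarray(sliced)
print(packed.flags["C_CONTIGUOUS"], np.shares_memory(packed, data))  # True False

# An already contiguous array is not copied, so it still shares memory
print(np.shares_memory(np.ascontiguousarray(data), data))  # True
```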

The precompiled libraries use OpenMP. You can effectively disable it by setting the environment variable OMP_NUM_THREADS=1, which yields the same performance as a version compiled without OpenMP.
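If you cannot set the variable in the shell, one common approach is to set it from Python before pydivsufsort is imported, since OpenMP runtimes generally read OMP_NUM_THREADS when they are first loaded. This is a sketch under that assumption; setting the variable before starting Python remains the most reliable option.

```python
import os

# Must run before the OpenMP runtime is loaded, i.e. before importing
# pydivsufsort (or any other OpenMP-based extension) in this process.
os.environ["OMP_NUM_THREADS"] = "1"

# from pydivsufsort import divsufsort  # import only after the line above
```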

The original libdivsufsort only supports char as the base type. pydivsufsort can handle arrays of any integer type (even signed) by encoding each element as multiple chars, which makes the computation slower. If your values use an integer type that is larger than required but span a small contiguous range, pydivsufsort automatically converts them to a smaller type (see #6).
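One way to encode wide integers as chars without changing the sort order is sketched below. It is illustrative only and not necessarily the encoding pydivsufsort uses: each int64 is written as 8 big-endian bytes with its sign bit flipped, so byte-wise comparison matches signed integer comparison, and only suffixes starting on an 8-byte boundary are kept. The 8x longer input is why this path is slower. The last helper mirrors the range-based type reduction by subtracting the minimum, which preserves order. All function names are hypothetical, and a naive suffix sort stands in for libdivsufsort.

```python
import numpy as np

WIDTH = 8  # bytes per int64 element


def naive_suffix_array(seq):
    """Stand-in for libdivsufsort: sort suffix start positions by the suffixes."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def encode_int64(values):
    """Encode int64 values as bytes whose lexicographic order matches the integers."""
    arr = np.asarray(values, dtype=np.int64)
    # Flipping the sign bit maps signed order onto unsigned order
    unsigned = arr.view(np.uint64) ^ np.uint64(1 << 63)
    # Big-endian layout puts the most significant byte first
    return unsigned.astype(">u8").tobytes()


def suffix_array_via_bytes(values):
    byte_sa = naive_suffix_array(encode_int64(values))
    # Keep only suffixes that start at an element boundary
    return [i // WIDTH for i in byte_sa if i % WIDTH == 0]


def shrink_small_range(values):
    """Cast to uint8 when max - min fits in one byte; order is preserved."""
    arr = np.asarray(values)
    lo, hi = int(arr.min()), int(arr.max())
    if hi - lo < 256:
        return (arr - lo).astype(np.uint8)
    return arr


values = [3, -1, 3, -1, 0, 1 << 40, -5]
assert suffix_array_via_bytes(values) == naive_suffix_array(values)

print(shrink_small_range(np.array([1000, 1003, 1001], dtype=np.int64)))
# [0 3 1]
```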

Acknowledgements

Download files

Source Distribution

  • pydivsufsort-0.0.5.tar.gz (222.0 kB)

Built Distributions

  • pydivsufsort-0.0.5-cp310-cp310-win_amd64.whl (212.7 kB), CPython 3.10 Windows x86-64
  • pydivsufsort-0.0.5-cp310-cp310-win32.whl (179.0 kB), CPython 3.10 Windows x86
  • pydivsufsort-0.0.5-cp310-cp310-musllinux_1_1_x86_64.whl (2.0 MB), CPython 3.10 musllinux: musl 1.1+ x86-64
  • pydivsufsort-0.0.5-cp310-cp310-musllinux_1_1_i686.whl (2.0 MB), CPython 3.10 musllinux: musl 1.1+ i686
  • pydivsufsort-0.0.5-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.3 MB), CPython 3.10 manylinux: glibc 2.12+ x86-64
  • pydivsufsort-0.0.5-cp310-cp310-manylinux_2_12_i686.manylinux2010_i686.whl (1.2 MB), CPython 3.10 manylinux: glibc 2.12+ i686
  • pydivsufsort-0.0.5-cp310-cp310-macosx_10_9_x86_64.whl (265.5 kB), CPython 3.10 macOS 10.9+ x86-64
  • pydivsufsort-0.0.5-cp39-cp39-win_amd64.whl (212.6 kB), CPython 3.9 Windows x86-64
  • pydivsufsort-0.0.5-cp39-cp39-win32.whl (179.0 kB), CPython 3.9 Windows x86
  • pydivsufsort-0.0.5-cp39-cp39-musllinux_1_1_x86_64.whl (2.0 MB), CPython 3.9 musllinux: musl 1.1+ x86-64
  • pydivsufsort-0.0.5-cp39-cp39-musllinux_1_1_i686.whl (2.0 MB), CPython 3.9 musllinux: musl 1.1+ i686
  • pydivsufsort-0.0.5-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.3 MB), CPython 3.9 manylinux: glibc 2.12+ x86-64
  • pydivsufsort-0.0.5-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl (1.2 MB), CPython 3.9 manylinux: glibc 2.12+ i686
  • pydivsufsort-0.0.5-cp39-cp39-macosx_10_9_x86_64.whl (265.5 kB), CPython 3.9 macOS 10.9+ x86-64
  • pydivsufsort-0.0.5-cp38-cp38-win_amd64.whl (232.2 kB), CPython 3.8 Windows x86-64
  • pydivsufsort-0.0.5-cp38-cp38-win32.whl (189.0 kB), CPython 3.8 Windows x86
  • pydivsufsort-0.0.5-cp38-cp38-musllinux_1_1_x86_64.whl (2.1 MB), CPython 3.8 musllinux: musl 1.1+ x86-64
  • pydivsufsort-0.0.5-cp38-cp38-musllinux_1_1_i686.whl (2.1 MB), CPython 3.8 musllinux: musl 1.1+ i686
  • pydivsufsort-0.0.5-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.4 MB), CPython 3.8 manylinux: glibc 2.12+ x86-64
  • pydivsufsort-0.0.5-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.whl (1.3 MB), CPython 3.8 manylinux: glibc 2.12+ i686
  • pydivsufsort-0.0.5-cp38-cp38-manylinux2010_x86_64.whl (1.4 MB), CPython 3.8 manylinux: glibc 2.12+ x86-64
  • pydivsufsort-0.0.5-cp38-cp38-manylinux2010_i686.whl (1.1 MB), CPython 3.8 manylinux: glibc 2.12+ i686
  • pydivsufsort-0.0.5-cp38-cp38-macosx_10_13_x86_64.whl (275.5 kB), CPython 3.8 macOS 10.13+ x86-64
  • pydivsufsort-0.0.5-cp38-cp38-macosx_10_9_x86_64.whl (261.4 kB), CPython 3.8 macOS 10.9+ x86-64
  • pydivsufsort-0.0.5-cp37-cp37m-win_amd64.whl (224.0 kB), CPython 3.7m Windows x86-64
  • pydivsufsort-0.0.5-cp37-cp37m-win32.whl (184.5 kB), CPython 3.7m Windows x86
  • pydivsufsort-0.0.5-cp37-cp37m-musllinux_1_1_x86_64.whl (1.9 MB), CPython 3.7m musllinux: musl 1.1+ x86-64
  • pydivsufsort-0.0.5-cp37-cp37m-musllinux_1_1_i686.whl (1.9 MB), CPython 3.7m musllinux: musl 1.1+ i686
  • pydivsufsort-0.0.5-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB), CPython 3.7m manylinux: glibc 2.12+ x86-64
  • pydivsufsort-0.0.5-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.whl (1.1 MB), CPython 3.7m manylinux: glibc 2.12+ i686
  • pydivsufsort-0.0.5-cp37-cp37m-manylinux2010_x86_64.whl (1.2 MB), CPython 3.7m manylinux: glibc 2.12+ x86-64
  • pydivsufsort-0.0.5-cp37-cp37m-manylinux2010_i686.whl (926.3 kB), CPython 3.7m manylinux: glibc 2.12+ i686
  • pydivsufsort-0.0.5-cp37-cp37m-macosx_10_13_x86_64.whl (266.5 kB), CPython 3.7m macOS 10.13+ x86-64
  • pydivsufsort-0.0.5-cp37-cp37m-macosx_10_9_x86_64.whl (256.0 kB), CPython 3.7m macOS 10.9+ x86-64
  • pydivsufsort-0.0.5-cp36-cp36m-win_amd64.whl (223.9 kB), CPython 3.6m Windows x86-64
  • pydivsufsort-0.0.5-cp36-cp36m-win32.whl (184.5 kB), CPython 3.6m Windows x86
  • pydivsufsort-0.0.5-cp36-cp36m-musllinux_1_1_x86_64.whl (1.9 MB), CPython 3.6m musllinux: musl 1.1+ x86-64
  • pydivsufsort-0.0.5-cp36-cp36m-musllinux_1_1_i686.whl (1.9 MB), CPython 3.6m musllinux: musl 1.1+ i686
  • pydivsufsort-0.0.5-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.2 MB), CPython 3.6m manylinux: glibc 2.12+ x86-64
  • pydivsufsort-0.0.5-cp36-cp36m-manylinux_2_12_i686.manylinux2010_i686.whl (1.1 MB), CPython 3.6m manylinux: glibc 2.12+ i686
  • pydivsufsort-0.0.5-cp36-cp36m-manylinux2010_x86_64.whl (1.2 MB), CPython 3.6m manylinux: glibc 2.12+ x86-64
  • pydivsufsort-0.0.5-cp36-cp36m-manylinux2010_i686.whl (930.9 kB), CPython 3.6m manylinux: glibc 2.12+ i686
  • pydivsufsort-0.0.5-cp36-cp36m-macosx_10_13_x86_64.whl (266.1 kB), CPython 3.6m macOS 10.13+ x86-64
  • pydivsufsort-0.0.5-cp36-cp36m-macosx_10_9_x86_64.whl (255.1 kB), CPython 3.6m macOS 10.9+ x86-64