String algorithms

pydivsufsort: bindings to libdivsufsort

pydivsufsort prebuilds libdivsufsort as a shared library and includes it in a Python package with bindings.

Features:

  • bindings to divsufsort that return numpy arrays
  • handle almost any integer data type (e.g. int64), not only char
  • additional string algorithms

Installation

On Linux, macOS and Windows:

python -m pip install pydivsufsort

We provide precompiled wheels for common systems and a source distribution for Unix systems. Manual compilation on Windows may require some tweaking; if it fails for you, please create an issue.

Usage

Using String Inputs

import numpy as np
from pydivsufsort import divsufsort, kasai

string_inp = "banana$"
string_suffix_array = divsufsort(string_inp)
string_lcp_array = kasai(string_inp, string_suffix_array)
print(string_suffix_array, string_lcp_array)
# [6 5 3 1 0 4 2] [0 1 3 0 0 2 0]
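
To see what the suffix array above encodes, here is a naive reference construction in plain Python. It is a sketch for checking small inputs by hand, not how libdivsufsort works internally, and the name naive_suffix_array is hypothetical. It sorts start positions by the suffix each one begins; because Python compares strings by code point, `$` sorts before the letters, which matches the byte order used above. Every suffix is copied, so it needs O(n²) memory and is only suitable for tiny examples.

```python
def naive_suffix_array(text):
    """Sort start positions by the suffix they begin (small inputs only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


sa = naive_suffix_array("banana$")
for rank, start in enumerate(sa):
    print(rank, start, "banana$"[start:])
# 0 6 $
# 1 5 a$
# 2 3 ana$
# 3 1 anana$
# 4 0 banana$
# 5 4 na$
# 6 2 nana$
```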

Using Integer Inputs

import numpy as np
from pydivsufsort import divsufsort, kasai

string_inp = "banana$"

# Convert the string input to integers first
int_inp = np.unique(np.array(list(string_inp)), return_inverse=True)[1]
int_suffix_array = divsufsort(int_inp)
int_lcp_array = kasai(int_inp, int_suffix_array)
print(int_suffix_array, int_lcp_array)
# [6 5 3 1 0 4 2] [0 1 3 0 0 2 0]
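
Judging from the outputs above, entry i of the LCP array is the length of the longest common prefix of the suffixes at positions i and i+1 of the suffix array, with a trailing 0. The following sketch of Kasai et al.'s linear-time algorithm uses that convention; it is written for illustration (kasai_lcp is a hypothetical name, not the code pydivsufsort ships). Since it only compares elements for equality, it gives the same result for the string and for its integer encoding.

```python
def kasai_lcp(seq, sa):
    """Return lcp where lcp[i] = length of the longest common prefix of the
    suffixes starting at sa[i] and sa[i + 1]; the last entry is 0."""
    n = len(seq)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r                 # rank[pos]: where suffix `pos` sits in sa
    lcp = [0] * n
    h = 0                             # current match length, carried between suffixes
    for pos in range(n):              # visit suffixes in text order
        r = rank[pos]
        if r == n - 1:                # last suffix in sorted order has no successor
            h = 0
            continue
        nxt = sa[r + 1]
        while pos + h < n and nxt + h < n and seq[pos + h] == seq[nxt + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1                    # dropping one leading char loses at most 1 of the match
    return lcp


# Same inputs as above: the string and its integer encoding share one suffix array.
text = "banana$"
ints = [2, 1, 3, 1, 3, 1, 0]          # '$' -> 0, 'a' -> 1, 'b' -> 2, 'n' -> 3
sa = sorted(range(len(text)), key=lambda i: text[i:])
print(sa, kasai_lcp(text, sa), kasai_lcp(ints, sa))
# [6, 5, 3, 1, 0, 4, 2] [0, 1, 3, 0, 0, 2, 0] [0, 1, 3, 0, 0, 2, 0]
```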

Using Multiple Sentinel Characters Within a String

import numpy as np
from pydivsufsort import divsufsort, kasai

sentinel_inp = "a$banana#and@a*bandana+"
sentinel_suffix_array = divsufsort(sentinel_inp)
sentinel_lcp_array = kasai(sentinel_inp, sentinel_suffix_array)
print(sentinel_suffix_array, sentinel_lcp_array)
# [ 8  1 14 22 12  7  0 13 21  5 19  3  9 16  2 15 11 18  6 20  4 10 17] [0 0 0 0 0 1 1 1 1 3 3 2 3 0 3 0 1 0 2 2 1 2 0]
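
A common reason to use several distinct sentinels is to index a collection of documents at once: each document ends with its own separator, so no common prefix can extend across a document boundary. As one illustration (an application chosen here, not something this README documents), the sketch below finds the longest substring shared by at least two documents by scanning adjacent suffix-array entries that come from different documents. It uses naive pure-Python stand-ins for divsufsort and kasai so it runs without the package; all function names are hypothetical.

```python
from bisect import bisect_right


def naive_suffix_array(text):
    # Stand-in for divsufsort: sort start positions by suffix (small inputs only).
    return sorted(range(len(text)), key=lambda i: text[i:])


def kasai_lcp(seq, sa):
    # Stand-in for kasai: lcp[i] = common prefix of suffixes sa[i] and sa[i + 1].
    n = len(seq)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    lcp, h = [0] * n, 0
    for pos in range(n):
        r = rank[pos]
        if r == n - 1:
            h = 0
            continue
        nxt = sa[r + 1]
        while pos + h < n and nxt + h < n and seq[pos + h] == seq[nxt + h]:
            h += 1
        lcp[r] = h
        h = max(h - 1, 0)
    return lcp


def longest_shared_substring(docs, separators):
    # Each separator must be unique and absent from every document.
    text = "".join(d + s for d, s in zip(docs, separators))
    starts, offset = [], 0
    for d, s in zip(docs, separators):
        starts.append(offset)                       # where each document begins in `text`
        offset += len(d) + len(s)
    sa = naive_suffix_array(text)
    lcp = kasai_lcp(text, sa)
    best_len, best_pos = 0, 0
    for i in range(len(sa) - 1):
        doc_a = bisect_right(starts, sa[i]) - 1     # document owning each suffix
        doc_b = bisect_right(starts, sa[i + 1]) - 1
        if doc_a != doc_b and lcp[i] > best_len:    # ties keep the first pair found
            best_len, best_pos = lcp[i], sa[i]
    return text[best_pos:best_pos + best_len]


docs = ["a", "banana", "and", "a", "bandana"]
print(longest_shared_substring(docs, "$#@*+"))  # joined text is "a$banana#and@a*bandana+"
# ana
```

Here "and" and "ban" are also shared and also have length 3; the scan returns "ana" because its pair comes first in suffix-array order. The scan only needs adjacent pairs: any two suffixes from different documents that share L characters have some adjacent pair between them, from different documents, whose LCP is at least L.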

Testing

pytest

Technical details (for performance tweaks)

libdivsufsort is compiled in both 32-bit and 64-bit versions, because the 32-bit version is faster. pydivsufsort automatically uses the 32-bit version when possible, i.e. when the input size is less than 2**31-1.
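
The selection rule can be written out as a small sketch. The helper names are hypothetical (pydivsufsort's internal dispatch function is not described here), and the memory estimate assumes the suffix array stores one index per input element at the chosen width.

```python
INT32_MAX = 2**31 - 1


def pick_index_width(n):
    """Hypothetical helper mirroring the rule above: 32-bit build when the
    input length is below 2**31 - 1, otherwise the 64-bit build."""
    return 32 if n < INT32_MAX else 64


def suffix_array_bytes(n):
    # One index per suffix: 4 bytes each for int32, 8 bytes each for int64.
    return n * pick_index_width(n) // 8


for n in (1_000_000, 2**31 - 1):
    print(n, pick_index_width(n), suffix_array_bytes(n))
# 1000000 32 4000000
# 2147483647 64 17179869176
```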

For best performance, pass contiguous arrays. If you pass a sliced (non-contiguous) array, pydivsufsort automatically converts it with numpy.ascontiguousarray, which makes an extra copy.
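
To check whether an array will need that conversion, NumPy exposes its memory layout in `ndarray.flags`. A minimal sketch with a strided slice; converting once yourself is mainly worthwhile if you call pydivsufsort several times on the same data (an assumption about your workload, not a requirement of the library).

```python
import numpy as np

data = np.arange(10, dtype=np.int32)
sliced = data[::2]                          # every other element: a strided view
print(sliced.flags["C_CONTIGUOUS"])         # False

contiguous = np.ascontiguousarray(sliced)   # copies into a fresh, packed buffer
print(contiguous.flags["C_CONTIGUOUS"])     # True
print(np.shares_memory(contiguous, data))   # False: it is a copy, not a view
```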

The precompiled libraries use OpenMP. You can effectively disable it by setting the environment variable OMP_NUM_THREADS=1, which gives the same performance as the version compiled without OpenMP.
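
From Python, one way to do this is to set the variable in `os.environ` before pydivsufsort is first imported, since OpenMP runtimes generally read `OMP_NUM_THREADS` when they initialize; exporting it in the shell before launching Python works as well. A minimal sketch:

```python
import os

# Must run before the OpenMP runtime inside pydivsufsort is loaded,
# i.e. before the first `import pydivsufsort` in this process.
os.environ["OMP_NUM_THREADS"] = "1"

# Import only after setting the variable, e.g.:
# from pydivsufsort import divsufsort
```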

The original libdivsufsort only supports char as the base type. pydivsufsort handles arrays of any integer type (including signed types) by encoding each element as multiple chars, which makes the computation slower. If your values are stored in a wider integer type than needed but span a small contiguous range, pydivsufsort automatically converts them to a smaller type (see #6).
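
Both ideas, encoding integers as several chars and shrinking an oversized type, can be illustrated in plain Python. This is one possible scheme chosen for illustration, not necessarily pydivsufsort's actual encoding. Subtracting the minimum makes all values non-negative while preserving their order, which is how signed inputs fit. The encoder then uses the fewest bytes that cover the value range and writes each value big-endian, so byte-wise comparison matches integer comparison. Finally it sorts the byte suffixes and keeps only those that start on an element boundary. The naive sort stands in for libdivsufsort, and the function names are hypothetical.

```python
def encode_as_bytes(values):
    """Map a list of ints to a byte string whose byte order matches element order."""
    lo = min(values)
    span = max(values) - lo
    width = max(1, (span.bit_length() + 7) // 8)    # fewest bytes that fit the range
    # Subtracting the minimum keeps relative order and removes negative values;
    # big-endian bytes then compare lexicographically like the integers do.
    data = b"".join((v - lo).to_bytes(width, "big") for v in values)
    return data, width


def suffix_array_of_ints(values):
    data, width = encode_as_bytes(values)
    # Naive byte-level suffix sort stands in for libdivsufsort here.
    byte_sa = sorted(range(len(data)), key=lambda i: data[i:])
    # Only suffixes starting on an element boundary correspond to element suffixes.
    return [i // width for i in byte_sa if i % width == 0]


print(encode_as_bytes([1000, -5, 700, -5, 700, 0])[1])  # 2 bytes per element
print(encode_as_bytes([2, 1, 3, 1, 3, 1, 0])[1])        # 1 byte per element
print(suffix_array_of_ints([2, 1, 3, 1, 3, 1, 0]))      # [6, 5, 3, 1, 0, 4, 2]
```

A range of 1005 needs two bytes per element while a range of 3 needs one, independent of the declared dtype, which is why narrowing the type pays off.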

Acknowledgements

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydivsufsort-0.0.4.tar.gz (201.2 kB)

Uploaded Source

Built Distributions

pydivsufsort-0.0.4-cp38-cp38-win_amd64.whl (195.9 kB)

Uploaded CPython 3.8 Windows x86-64

pydivsufsort-0.0.4-cp38-cp38-win32.whl (158.2 kB)

Uploaded CPython 3.8 Windows x86

pydivsufsort-0.0.4-cp38-cp38-manylinux2010_x86_64.whl (1.1 MB)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

pydivsufsort-0.0.4-cp38-cp38-manylinux2010_i686.whl (818.4 kB)

Uploaded CPython 3.8 manylinux: glibc 2.12+ i686

pydivsufsort-0.0.4-cp38-cp38-macosx_10_13_x86_64.whl (223.9 kB)

Uploaded CPython 3.8 macOS 10.13+ x86-64

pydivsufsort-0.0.4-cp37-cp37m-win_amd64.whl (191.6 kB)

Uploaded CPython 3.7m Windows x86-64

pydivsufsort-0.0.4-cp37-cp37m-win32.whl (154.9 kB)

Uploaded CPython 3.7m Windows x86

pydivsufsort-0.0.4-cp37-cp37m-manylinux2010_x86_64.whl (905.1 kB)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

pydivsufsort-0.0.4-cp37-cp37m-manylinux2010_i686.whl (697.7 kB)

Uploaded CPython 3.7m manylinux: glibc 2.12+ i686

pydivsufsort-0.0.4-cp37-cp37m-macosx_10_13_x86_64.whl (221.0 kB)

Uploaded CPython 3.7m macOS 10.13+ x86-64

pydivsufsort-0.0.4-cp36-cp36m-win_amd64.whl (191.4 kB)

Uploaded CPython 3.6m Windows x86-64

pydivsufsort-0.0.4-cp36-cp36m-win32.whl (155.0 kB)

Uploaded CPython 3.6m Windows x86

pydivsufsort-0.0.4-cp36-cp36m-manylinux2010_x86_64.whl (911.7 kB)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

pydivsufsort-0.0.4-cp36-cp36m-manylinux2010_i686.whl (698.6 kB)

Uploaded CPython 3.6m manylinux: glibc 2.12+ i686

pydivsufsort-0.0.4-cp36-cp36m-macosx_10_13_x86_64.whl (220.9 kB)

Uploaded CPython 3.6m macOS 10.13+ x86-64

File details

Details for the file pydivsufsort-0.0.4.tar.gz.

File metadata

  • Download URL: pydivsufsort-0.0.4.tar.gz
  • Upload date:
  • Size: 201.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.9

File hashes

Hashes for pydivsufsort-0.0.4.tar.gz
Algorithm Hash digest
SHA256 f4e9b2b4f0d3c884d4a04ec8813b7b75f321a97556f72a9301bb80014d1885f4
MD5 958412e08aacd8c6e92b31cb2025e8c9
BLAKE2b-256 4652b80eace77a6856e1d0434189403156aca525bdb48d29b8f8dc90da61eb8e

File details

Details for the file pydivsufsort-0.0.4-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 195.9 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0

File hashes

Hashes for pydivsufsort-0.0.4-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 21464c2543eb48fefbd63c287c00040f308d3988c24389f232079a635e401b8a
MD5 ad8f40c8c11600c2b8a9dbc1cd548a4c
BLAKE2b-256 99d6842062f871d0a219d6f2713d2f258d760b6273165f6479472b171c845744

File details

Details for the file pydivsufsort-0.0.4-cp38-cp38-win32.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp38-cp38-win32.whl
  • Upload date:
  • Size: 158.2 kB
  • Tags: CPython 3.8, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0

File hashes

Hashes for pydivsufsort-0.0.4-cp38-cp38-win32.whl
Algorithm Hash digest
SHA256 5e46324b65e26a2e8adcb483b39bd8049baf383744c49e60c540779951f19490
MD5 6ad3f63f85ca29671108474ae5d0a6ff
BLAKE2b-256 c8535d1ef1f1f286cfc0e6962b6348839261fd0c019bf657fba1ed593e0f7fb5

File details

Details for the file pydivsufsort-0.0.4-cp38-cp38-manylinux2010_x86_64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp38-cp38-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.8, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.7

File hashes

Hashes for pydivsufsort-0.0.4-cp38-cp38-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 f2ca44716a0f43fe0a8a29d7575786bf4d75d08a63f90e6636044860d9e79678
MD5 c60004f17bb6652fff221297eb2f7479
BLAKE2b-256 4e646122ed541e6bb748355f279239778eb23ec50be266741db16c8b2405a7b8

File details

Details for the file pydivsufsort-0.0.4-cp38-cp38-manylinux2010_i686.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp38-cp38-manylinux2010_i686.whl
  • Upload date:
  • Size: 818.4 kB
  • Tags: CPython 3.8, manylinux: glibc 2.12+ i686
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.7

File hashes

Hashes for pydivsufsort-0.0.4-cp38-cp38-manylinux2010_i686.whl
Algorithm Hash digest
SHA256 6048338e0413abcd0fce12d5d0bea7a7a2598d46e4937dd51814392f7fbaf8bb
MD5 fef9224864f64f2e97ce0e946893c87f
BLAKE2b-256 a461d63bdbac3c39e74a654d1731828253464b0b4b8305964056fa3922d596a5

File details

Details for the file pydivsufsort-0.0.4-cp38-cp38-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp38-cp38-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 223.9 kB
  • Tags: CPython 3.8, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for pydivsufsort-0.0.4-cp38-cp38-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 98897daf80ae1060690b0ab09f8b8dd2469edf67d537f3c6327c93791e697e7e
MD5 db9c59aa612ea86ef60a2a0f5cff5967
BLAKE2b-256 33e9e21cb964c0b7435857b087557570604cb66f4c271b28457aa0e5aae58641

File details

Details for the file pydivsufsort-0.0.4-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 191.6 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0

File hashes

Hashes for pydivsufsort-0.0.4-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 178a02deee19c69dc58ce63891dc9b24f045aecd4be68e5a6d960fea24eb4c96
MD5 7d6a728cd1c84c782b418517b6cfb1d7
BLAKE2b-256 14b9af4e885c4b62267033f4bfdc7cbdbc33e0f1845bff9729f025213e11ca3e

File details

Details for the file pydivsufsort-0.0.4-cp37-cp37m-win32.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp37-cp37m-win32.whl
  • Upload date:
  • Size: 154.9 kB
  • Tags: CPython 3.7m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0

File hashes

Hashes for pydivsufsort-0.0.4-cp37-cp37m-win32.whl
Algorithm Hash digest
SHA256 608c76f6c9424d64bfc3da8aeef0623bfcc497f772ceae7ca4bfedb836c8d319
MD5 d2198148f86b050c1d12282aa62c532b
BLAKE2b-256 efbddc79762bf8f2767566924b6bafdd3836b16904e80ec123a0a5788ccdc6ac

File details

Details for the file pydivsufsort-0.0.4-cp37-cp37m-manylinux2010_x86_64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp37-cp37m-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 905.1 kB
  • Tags: CPython 3.7m, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.7

File hashes

Hashes for pydivsufsort-0.0.4-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 3bff53534ee753c50442eeb56463281c0ae450674eab3c3b143c24cd3562ea89
MD5 ff02cf4b1d792aeb421e54160f02047f
BLAKE2b-256 044d9e52dc1a56cba0a692e15b645e9f7fd01d8e06e2ce5290d90a919225a4c5

File details

Details for the file pydivsufsort-0.0.4-cp37-cp37m-manylinux2010_i686.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp37-cp37m-manylinux2010_i686.whl
  • Upload date:
  • Size: 697.7 kB
  • Tags: CPython 3.7m, manylinux: glibc 2.12+ i686
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.7

File hashes

Hashes for pydivsufsort-0.0.4-cp37-cp37m-manylinux2010_i686.whl
Algorithm Hash digest
SHA256 44c12e0ae71f043dc92597b0324d777f0d7db3486d5c4bfb34d31d5f19f533e4
MD5 4543fe502911b2ef59394e09bc5b3099
BLAKE2b-256 36cce0953d95fc828db92ad4bdada8bcafa9bc794949be75a159e573c1f34d04

File details

Details for the file pydivsufsort-0.0.4-cp37-cp37m-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp37-cp37m-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 221.0 kB
  • Tags: CPython 3.7m, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for pydivsufsort-0.0.4-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 810aa0c2b5202126f93bfdbe0635f78d532b94f612db5e4e54ad3c9776b2e6a5
MD5 faddd732e40bf7081a4688834b84bbf2
BLAKE2b-256 caa22fbe8e44c44922123f55a49e73dfab5bccf6d3d96082bd9c0af3c3cd6107

File details

Details for the file pydivsufsort-0.0.4-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp36-cp36m-win_amd64.whl
  • Upload date:
  • Size: 191.4 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0

File hashes

Hashes for pydivsufsort-0.0.4-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 2be665f5dad535f4084712e1a743fa6e746f732b77900fba9a0cdf9e9cf4dc78
MD5 697d466d697834cb755d6dc3a8a4b405
BLAKE2b-256 b0075484a286a52184b89b2191f788a17609ef67f9dff8bbfb2bcad0121a17ce

File details

Details for the file pydivsufsort-0.0.4-cp36-cp36m-win32.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp36-cp36m-win32.whl
  • Upload date:
  • Size: 155.0 kB
  • Tags: CPython 3.6m, Windows x86
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0

File hashes

Hashes for pydivsufsort-0.0.4-cp36-cp36m-win32.whl
Algorithm Hash digest
SHA256 572cd26d85173f4691947d2ca4668693c34833a3bd0a50067f13188a21af2563
MD5 2e65c09ecf9008fdca0b7e0d77f5786b
BLAKE2b-256 39a46471a9e28980eb4b058cce61bdaeea73a489d60299d056d189e8fc00c540

File details

Details for the file pydivsufsort-0.0.4-cp36-cp36m-manylinux2010_x86_64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp36-cp36m-manylinux2010_x86_64.whl
  • Upload date:
  • Size: 911.7 kB
  • Tags: CPython 3.6m, manylinux: glibc 2.12+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.7

File hashes

Hashes for pydivsufsort-0.0.4-cp36-cp36m-manylinux2010_x86_64.whl
Algorithm Hash digest
SHA256 5796ab174ca2f451ae60260d9b47866cad9431edf4d1c5fa17066f0b3af30a03
MD5 0f43481a1682c048c21c76ba08396c62
BLAKE2b-256 688bf2a48b81627eb68142f912ee5cff3d50e812238ea27a20f251a7d511b9d6

File details

Details for the file pydivsufsort-0.0.4-cp36-cp36m-manylinux2010_i686.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp36-cp36m-manylinux2010_i686.whl
  • Upload date:
  • Size: 698.6 kB
  • Tags: CPython 3.6m, manylinux: glibc 2.12+ i686
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.7

File hashes

Hashes for pydivsufsort-0.0.4-cp36-cp36m-manylinux2010_i686.whl
Algorithm Hash digest
SHA256 0e63bbd05a103bc5732f9d784748b31ace2d273893d13e965d83838b179c6e48
MD5 55b925ba29a904600b9dc3d9b6819f71
BLAKE2b-256 252eaec3e382a629ea3e6e06290a765e268b7a0b4514ae29b74328164be0268c

File details

Details for the file pydivsufsort-0.0.4-cp36-cp36m-macosx_10_13_x86_64.whl.

File metadata

  • Download URL: pydivsufsort-0.0.4-cp36-cp36m-macosx_10_13_x86_64.whl
  • Upload date:
  • Size: 220.9 kB
  • Tags: CPython 3.6m, macOS 10.13+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for pydivsufsort-0.0.4-cp36-cp36m-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 97e25c3992900ae91bd374424c4bfd8a11b1048f81f97c0e9a312be25df4364b
MD5 54180a1541733d6bafaf316db2bd5dfe
BLAKE2b-256 16bdf1aa7b59ee0eeb9587a64abba3f0dd07905042a98e80be74722ccede0950
