
A Python library written in Rust that searches for substrings quickly using a Suffix Array



About The Project

PySubstringSearch is a library for searching an index file for substring patterns. For speed and efficiency, the library is written in Rust. For string indexing, it uses the libsais suffix array construction library. The index consists of the original text and a 32-bit suffix array. To work around the size limits of the suffix array construction implementation, the library uses a proprietary container protocol that holds the original text and its index in chunks of 512MB.
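To illustrate how a suffix array answers substring queries, here is a minimal pure-Python sketch. It is not the library's Rust implementation: the naive construction stands in for libsais, and the names `build_suffix_array`, `find_occurrences`, and `entry_at`, as well as the newline separator between entries, are assumptions made for this example only.

```python
from __future__ import annotations


def build_suffix_array(text: str) -> list[int]:
    """Return suffix start offsets ordered by the suffix they begin.

    Naive construction for illustration; real builders such as libsais
    run in linear time.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, suffix_array: list[int], pattern: str) -> list[int]:
    """Return the sorted start offsets of every occurrence of pattern."""
    if not pattern:
        return []
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


def entry_at(text: str, offset: int, separator: str = "\n") -> str:
    """Return the separator-delimited entry that contains offset."""
    start = text.rfind(separator, 0, offset) + 1
    end = text.find(separator, offset)
    return text[start:] if end == -1 else text[start:end]


entries = ["some short string", "another but now a longer string", "more text to add"]
text = "\n".join(entries)
suffix_array = build_suffix_array(text)
offsets = find_occurrences(text, suffix_array, "string")
matches = [entry_at(text, offset) for offset in offsets]
```

Because the suffixes are sorted, all suffixes that start with the pattern sit in one contiguous range, so two binary searches are enough to find every match.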

The module implements two methods for searching:

  • search - Find the different entries that contain the same substring. Inner chunks are searched concurrently, so concurrency increases as the index file grows and holds more chunks.
  • search_multiple - same as search, but accepts multiple substrings in a single call
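The fan-out over chunks can be sketched in plain Python as follows. This is only an illustration of the shape of the computation, not the library's code: each chunk is modelled as a list of entries and scanned naively, and treating `search_multiple` as an order-preserving, de-duplicated union of the individual searches is an assumption. Python threads also do not give the parallelism that the Rust implementation can achieve.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def search_chunk(chunk: list[str], substring: str) -> list[str]:
    """Scan a single chunk and return the entries containing substring."""
    return [entry for entry in chunk if substring in entry]


def search(chunks: list[list[str]], substring: str) -> list[str]:
    """Search every chunk concurrently and merge the per-chunk results."""
    with ThreadPoolExecutor() as pool:
        per_chunk = list(pool.map(lambda chunk: search_chunk(chunk, substring), chunks))

    # Merge in chunk order, dropping duplicate entries.
    merged: list[str] = []
    seen: set[str] = set()
    for chunk_matches in per_chunk:
        for entry in chunk_matches:
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged


def search_multiple(chunks: list[list[str]], substrings: list[str]) -> list[str]:
    """Assumed semantics: union of the results for each substring."""
    merged: list[str] = []
    seen: set[str] = set()
    for substring in substrings:
        for entry in search(chunks, substring):
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged


chunks = [
    ["some short string", "another but now a longer string"],
    ["more text to add"],
]
```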


Performance

500MB File

| Library | Function | Time | #Results | Improvement Factor |
| --- | --- | --- | --- | --- |
| ripgrepy | Ripgrepy('google', '500mb').run().as_string.split('\n') | 47.2ms | 5943 | 1.0x |
| PySubstringSearch | reader.search('google') | 497µs | 5943 | 95x |
| ripgrepy | Ripgrepy('text_two', '500mb').run().as_string.split('\n') | 44.7ms | 159 | 1.0x |
| PySubstringSearch | reader.search('text_two') | 14.9µs | 159 | 3000x |

7500MB File

| Library | Function | Time | #Results | Improvement Factor |
| --- | --- | --- | --- | --- |
| ripgrepy | Ripgrepy('google', '6000mb').run().as_string.split('\n') | 900ms | 62834 | 1.0x |
| PySubstringSearch | reader.search('google') | 10.1ms | 62834 | 89.1x |
| ripgrepy | Ripgrepy('text_two', '6000mb').run().as_string.split('\n') | 820ms | 0 | 1.0x |
| PySubstringSearch | reader.search('text_two') | 200µs | 0 | 4100x |
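The Improvement Factor column is the baseline time divided by the PySubstringSearch time. The sketch below reproduces that arithmetic from the figures in the tables and adds a small timing helper that could be used to repeat such a measurement; `best_time` and `improvement_factor` are hypothetical helper names, and the source does not describe how the original timings were taken.

```python
import time
from typing import Callable


def best_time(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over several runs of func."""
    best = float("inf")
    for _ in range(repeats):
        started = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - started)
    return best


def improvement_factor(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Figures taken from the tables above, converted to seconds.
google_500mb = improvement_factor(47.2e-3, 497e-6)     # about 95
text_two_500mb = improvement_factor(44.7e-3, 14.9e-6)  # about 3000
google_large = improvement_factor(900e-3, 10.1e-3)     # about 89.1
text_two_large = improvement_factor(820e-3, 200e-6)    # about 4100
```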

Installation

pip3 install PySubstringSearch

Usage

Create an index

import pysubstringsearch

# creating a new index file
# if a file with this name already exists, it will be overwritten
writer = pysubstringsearch.Writer(
    index_file_path='output.idx',
)

# adding entries to the new index
writer.add_entry('some short string')
writer.add_entry('another but now a longer string')
writer.add_entry('more text to add')

# adding entries from file lines
writer.add_entries_from_file_lines('input_file.txt')

# making sure the data is dumped to the file
writer.finalize()
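When entries come from an arbitrary iterable, a small wrapper can make sure `finalize()` is called once at the end. The sketch below relies only on the `add_entry` and `finalize` methods shown above; the helper name `write_entries` and the choice to skip empty strings are assumptions for illustration, not library behavior.

```python
from typing import Iterable


def write_entries(writer, entries: Iterable[str]) -> int:
    """Add each non-empty entry to writer, then finalize it.

    Returns the number of entries that were added. Empty strings are
    skipped; that is a choice made for this sketch.
    """
    count = 0
    for entry in entries:
        if entry:
            writer.add_entry(entry)
            count += 1
    # Flush the index to disk once every entry has been added.
    writer.finalize()
    return count
```

It could be called as `write_entries(pysubstringsearch.Writer(index_file_path='output.idx'), lines)`, where `lines` is any iterable of strings.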

Search a substring within an index

import pysubstringsearch

# opening an index file for searching
reader = pysubstringsearch.Reader(
    index_file_path='output.idx',
)

# look up a substring
reader.search('short')
# returns ['some short string']

# look up another substring
reader.search('string')
# returns ['some short string', 'another but now a longer string']

# look up multiple substrings
reader.search_multiple(
    [
        'short',
        'longer',
    ],
)
# returns ['some short string', 'another but now a longer string']

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/Intsights/PySubstringSearch


