Wavelet Matrix/Tree succinct data structure for full text search (using shellinford C++ library)

shellinford

Shellinford is an implementation of a Wavelet Matrix/Tree succinct data structure for document retrieval.

It is based on the shellinford C++ library.

NOTE: This module requires a C++11 compiler.

Installation

$ pip install shellinford

Usage

Create a new FM-index instance

>>> import shellinford
>>> fm = shellinford.FMIndex()
  • shellinford.FMIndex([use_wavelet_tree=True, filename=None])

    • When given a filename, Shellinford loads FM-index data from the file

Build FM-index

>>> fm.build(['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky'], 'milky.fm')
  • build([docs, filename])

    • When given a filename, Shellinford stores FM-index data to the file

Search word from FM-index

>>> for doc in fm.search('Milky'):
...     print('doc_id:', doc.doc_id)
...     print('count:', doc.count)
...     print('text:', doc.text)
doc_id: 0
count: [1]
text: Milky Holmes
doc_id: 2
count: [1]
text: Milky

>>> for doc in fm.search(['Milky', 'Holmes']):
...     print('doc_id:', doc.doc_id)
...     print('count:', doc.count)
...     print('text:', doc.text)
doc_id: 0
count: [1]
text: Milky Holmes
  • search(query, [_or=False, ignores=[]])

    • If _or = True, then “OR” search is executed, else “AND” search

    • Given ignores, “NOT” search is also executed

    • NOTE: The search function is available after FM-index is built or loaded
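
The AND/OR/NOT behaviour described above can be modelled with plain substring matching. The following is a minimal sketch of those semantics only, not shellinford's implementation: `naive_search` and `Result` are hypothetical names, the per-term `count` list is an assumption modelled on the `count: [1]` output shown earlier, and treating `ignores` as "drop any document containing one of these strings" is likewise an assumption.

```python
from collections import namedtuple

# Hypothetical result type mirroring the doc_id / count / text attributes above.
Result = namedtuple("Result", ["doc_id", "count", "text"])


def naive_search(docs, query, _or=False, ignores=()):
    """Linear-scan model of AND / OR / NOT document search (illustration only)."""
    # A single string is treated as a one-term query.
    terms = [query] if isinstance(query, str) else list(query)
    results = []
    for doc_id, text in enumerate(docs):
        # One occurrence count per query term (assumed result shape).
        counts = [text.count(term) for term in terms]
        # OR: at least one term occurs; AND: every term occurs.
        matched = any(counts) if _or else all(counts)
        # NOT: skip documents containing any ignored string.
        excluded = any(word in text for word in ignores)
        if matched and not excluded:
            results.append(Result(doc_id, counts, text))
    return results


docs = ['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky']
and_hits = naive_search(docs, ['Milky', 'Holmes'])
or_hits = naive_search(docs, ['Holmes', 'Sherlock'], _or=True)
not_hits = naive_search(docs, 'Milky', ignores=['Holmes'])
```

This scan touches every document on every query; the point of an FM-index is to answer the same questions without scanning the text.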

Count word from FM-index

>>> fm.count('Milky')
2

>>> fm.count(['Milky', 'Holmes'])
1
  • count(query, [_or=False])

    • If _or = True, then “OR” search is executed, else “AND” search

    • NOTE: The count function is available after FM-index is built or loaded

    • This function is slightly faster than the search function
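
One plausible reason counting can be cheaper than searching is that an FM-index can count pattern occurrences by backward search alone, without locating the matching documents. The sketch below illustrates that general technique in pure Python. It is not shellinford's code: the function names are hypothetical, the suffix array is built naively, and `rank` is a linear scan where a wavelet matrix/tree would answer the same query far more quickly.

```python
def build_bwt(text):
    """Burrows-Wheeler transform of text, built from a naively sorted suffix array."""
    s = text + "\0"  # sentinel; assumes "\0" does not occur in text
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character for a suffix is the character just before it (cyclically).
    return "".join(s[i - 1] for i in suffix_array)


def count_occurrences(bwt, pattern):
    """Count occurrences of pattern in the original text via backward search."""
    # first_row[c] = number of characters in the text strictly smaller than c.
    freq = {}
    for ch in bwt:
        freq[ch] = freq.get(ch, 0) + 1
    first_row, total = {}, 0
    for ch in sorted(freq):
        first_row[ch] = total
        total += freq[ch]

    def rank(ch, i):
        # Occurrences of ch in bwt[:i]; a linear scan stands in for a wavelet matrix.
        return bwt.count(ch, 0, i)

    lo, hi = 0, len(bwt)  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + rank(ch, lo)
        hi = first_row[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


bwt = build_bwt("abracadabra")
n_abra = count_occurrences(bwt, "abra")
```

Each pattern character narrows the row range with two rank queries, so the work depends on the pattern length rather than on the number of matches.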

Add a document

>>> fm.push_back('Baritsu')
  • push_back(doc)

    • NOTE: A document added by this method is not searchable until build() is called
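
The staging behaviour in the note above can be pictured with a toy model. This is a sketch under assumptions, not the library's code: `StagedIndex` is a hypothetical class, membership is plain substring matching, and it assumes that calling build() again folds newly pushed documents into the searchable set.

```python
class StagedIndex:
    """Toy model: pushed documents stay pending until build() is called."""

    def __init__(self):
        self._pending = []  # added via push_back, not yet searchable
        self._built = []    # searchable after build()

    def push_back(self, doc):
        # The changelog notes that empty strings are ignored.
        if doc:
            self._pending.append(doc)

    def build(self, docs=()):
        # Optional docs are staged first, then everything pending becomes searchable.
        for doc in docs:
            self.push_back(doc)
        self._built.extend(self._pending)
        self._pending = []

    def __contains__(self, query):
        # Only built documents are visible to the "in" operator.
        return any(query in doc for doc in self._built)


index = StagedIndex()
index.build(['Milky Holmes'])
index.push_back('Baritsu')
visible_before_build = 'Baritsu' in index   # False: still pending
index.build()
visible_after_build = 'Baritsu' in index    # True: now part of the built set
```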

Read FM-index from a binary file

>>> fm.read('milky_holmes.fm')
  • read(path)

Write FM-index binary to a file

>>> fm.write('milky_holmes.fm')
  • write(path)

Check whether the FM-index contains a string

>>> 'baritsu' in fm

License

  • Wrapper code is licensed under the New BSD License.

  • Bundled shellinford C++ library (c) 2012 echizen_tm is licensed under the New BSD License.

CHANGES

0.4.1 (2019-02-08)

  • Make “in” operator faster

0.4.0 (2018-09-30)

  • FMIndex.count() is added

  • No longer support Python 2.6

  • bug fix

0.3.5 (2018-09-05)

  • FMIndex.build() and FMIndex.push_back() ignore empty strings

  • FMIndex supports “in” operator. (e.g., ‘a’ in fm)

  • Support Python 3.5, 3.6 and 3.7

0.3.4 (2016-10-28)

  • FMIndex.search() returns list

0.3 (2014-11-24)

  • “OR” search and “NOT” search are available in FMIndex.search().

  • FMIndex.size and FMIndex.docsize are available as property

0.2 (2014-03-28)

  • “AND” search is available by passing a sequence (list, tuple, etc.) to FMIndex.search()

0.1 (2014-03-11)

  • First release.

