A general purpose indexer written in Python. Licensed under the MIT license.
Features
The indexr.buildr package is capable of constructing an inverted index.
The indexr.utils package contains utilities, such as a tokenization method for converting a text to tokens.
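To illustrate the two ideas above, here is a minimal, standard-library-only sketch of tokenization and an inverted index with per-document frequencies. It is not the indexr implementation: the names tokenize and build_inverted_index, the regular expression, and the sample documents are assumptions made for this illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into word tokens (runs of letters, digits and underscores).

    Hypothetical helper; case is preserved, so "The" and "the" stay distinct.
    """
    return re.findall(r"\w+", text)


def build_inverted_index(documents):
    """Map each token to a {document name: occurrence count} dictionary.

    `documents` is a plain dict of {name: text}; this is an assumption of
    the sketch, not the input format used by indexr.
    """
    index = defaultdict(dict)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token][name] = index[token].get(name, 0) + 1
    return dict(index)


# Made-up sample documents for the illustration.
sample_documents = {
    "a.txt": "red green red",
    "b.txt": "green blue",
}
sample_index = build_inverted_index(sample_documents)
print(sample_index["red"])
print(sample_index["green"])
```

The index is "inverted" because it maps from a token to the documents that contain it, rather than from a document to its tokens, which makes looking up a single word a dictionary access.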
Setup
This package can be installed using pip:
pip install indexr
Examples
In this example, an index is built for 3 files. The example uses the following 3 files:
0.txt:
The 0th document.
1.txt:
The 1st document.
2.txt:
The 2nd document. Some words: repeat, repeat, repeat.
The following code sample can be found in the demo directory (demo/buildr.py).
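# build_index and SPIMI come from the indexr package; see demo/buildr.py for the exact imports.
# Assumption: files is a list of paths to the three documents above.
files = ['0.txt', '1.txt', '2.txt']
# Build the index
index = build_index(files, 'index', force_rebuild=True, indexer=SPIMI(show_progress=True))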
# Try to find the word "1st"
print('All found occurrences of "1st":')
print(index.find('1st', frequencies=True), "\n")
# Try to find the word "The"
print('All found occurrences of "The":')
print(index.find('The', frequencies=True), "\n")
# Try to find the word "repeat"
print('All found occurrences of "repeat":')
print(index.find('repeat', frequencies=True), "\n")
It gives the following output:
>>> All found occurrences of "1st":
>>> {'1.txt': 1}
>>>
>>> All found occurrences of "The":
>>> {'0.txt': 1, '1.txt': 1, '2.txt': 1}
>>>
>>> All found occurrences of "repeat":
>>> {'2.txt': 3}
So indeed, it finds 1 occurrence of “1st”, 3 occurrences of “The” (1 occurrence in each file) and 3 occurrences of “repeat” (3 occurrences in one file).
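The example above selects the SPIMI (single-pass in-memory indexing) indexer. As a rough sketch of the general technique, and not of the code in this package, the snippet below builds small in-memory blocks, "flushes" each block as a sorted list once it holds a fixed number of distinct terms, and then merges the sorted blocks into one index. The function names, the max_terms_per_block parameter, and the use of in-memory lists in place of on-disk block files are all assumptions of this illustration.

```python
import heapq
import re
from collections import Counter


def token_stream(documents):
    """Yield (term, document name) pairs in a single pass over all documents."""
    for name, text in documents.items():
        for term in re.findall(r"\w+", text):
            yield term, name


def spimi_blocks(stream, max_terms_per_block):
    """Collect postings in a dictionary and emit it as a sorted block when full.

    A real SPIMI indexer would write each block to disk; here a block is
    simply a sorted list of (term, Counter) pairs kept in memory.
    """
    block = {}
    for term, doc in stream:
        if term not in block and len(block) >= max_terms_per_block:
            yield sorted(block.items())
            block = {}
        block.setdefault(term, Counter())[doc] += 1
    if block:
        yield sorted(block.items())


def merge_blocks(blocks):
    """Merge sorted blocks, summing the counts of terms that span several blocks."""
    index = {}
    for term, postings in heapq.merge(*blocks, key=lambda item: item[0]):
        index.setdefault(term, Counter()).update(postings)
    return {term: dict(counts) for term, counts in index.items()}


# The three example files from above, inlined as strings.
documents = {
    "0.txt": "The 0th document.",
    "1.txt": "The 1st document.",
    "2.txt": "The 2nd document. Some words: repeat, repeat, repeat.",
}
blocks = list(spimi_blocks(token_stream(documents), max_terms_per_block=3))
index = merge_blocks(blocks)
for word in ("1st", "The", "repeat"):
    print(word, index[word])
```

With this case-preserving tokenizer, the merged counts for "1st", "The" and "repeat" agree with the output shown above, even though the terms are spread over several blocks before merging.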
Documentation
Credits
Tools used in rendering this package:
History
1.0.0 (2015-12-07)
First release, including the BSB algorithm and the SPIMI algorithm.
0.1.0 (2015-12-04)
First release on PyPI.