Python implementation of the Pass-join index
Passjoin
Python implementation of the Pass-join index.
This index makes it efficient to find the words of a corpus that lie within a given distance threshold of a query word.
The implementation is based on the Pass-join paper and on the existing JavaScript implementation in the mnemonist package.
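The core idea can be sketched in a few lines of Python. This is a hypothetical, simplified re-implementation, not the passjoin package's code: the names `PassjoinSketch`, `even_partition` and `levenshtein` are illustrative only. Each indexed word is cut into max_edit_distance + 1 segments. By the pigeonhole principle, any word within max_edit_distance edits keeps at least one of those segments intact, so a query only has to look up its own substrings near each segment's position and then verify the candidates with the distance function. As an assumed simplification, the paper's tighter substring selection is replaced by a plain window of ±max_edit_distance positions. This produces more candidates but, with the default Levenshtein distance, misses none.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def even_partition(length: int, parts: int) -> List[Tuple[int, int]]:
    """Split positions [0, length) into `parts` contiguous (start, size) segments.

    The first segments get length // parts characters; the last
    length % parts segments get one extra character.
    """
    base, extra = divmod(length, parts)
    segments, start = [], 0
    for i in range(parts):
        size = base + (1 if i >= parts - extra else 0)
        segments.append((start, size))
        start += size
    return segments


class PassjoinSketch:
    """Hypothetical, simplified Pass-join index (not the passjoin package's API)."""

    def __init__(self, corpus: Iterable[str], max_edit_distance: int,
                 distance: Callable[[str, str], int] = levenshtein) -> None:
        self.k = max_edit_distance
        self.distance = distance
        self.lengths: Set[int] = set()
        # (word length, segment number, segment text) -> words having that segment
        self.index: Dict[Tuple[int, int, str], Set[str]] = defaultdict(set)
        for word in corpus:
            self.lengths.add(len(word))
            for i, (start, size) in enumerate(even_partition(len(word), self.k + 1)):
                self.index[(len(word), i, word[start:start + size])].add(word)

    def get_word_variations(self, query: str) -> Set[str]:
        k = self.k
        candidates: Set[str] = set()
        # Length filter: words more than k characters longer or shorter cannot match.
        for length in range(max(0, len(query) - k), len(query) + k + 1):
            if length not in self.lengths:
                continue
            for i, (start, size) in enumerate(even_partition(length, k + 1)):
                # An unedited segment can shift by at most k positions in the query.
                for pos in range(max(0, start - k), min(len(query) - size, start + k) + 1):
                    candidates |= self.index.get((length, i, query[pos:pos + size]), set())
        # Verification: the substring filter only yields candidates.
        return {word for word in candidates if self.distance(query, word) <= k}
```

With the corpus and threshold from the usage example, `PassjoinSketch(['pierre', 'pierr', 'jean', 'jeanne'], 1).get_word_variations('jeanine')` returns {'jeanne'}. The lookup key includes the word length and segment number, so a segment only matches words of the length it was cut from.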
Installation
pip install passjoin
Usage
Index creation
from passjoin import Passjoin
from Levenshtein import distance # or any string distance function
max_edit_distance = 1 # maximum edit distance for retrieval
corpus = ['pierre', 'pierr', 'jean', 'jeanne']
passjoin_index = Passjoin(corpus, max_edit_distance, distance)
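The third argument is described only as "any string distance function". This sketch rests on an assumption that this page does not confirm: the index calls it as distance(a, b) and only compares the result with max_edit_distance. Under that assumption, a pure-Python function can stand in for Levenshtein.distance when that package is not installed. The hypothetical `bounded_levenshtein` below also stops early once every cell of a dynamic-programming row exceeds the threshold, because a threshold check never needs the exact value of a larger distance.

```python
from typing import Callable


def bounded_levenshtein(max_distance: int) -> Callable[[str, str], int]:
    """Return a two-argument edit-distance function that gives up early.

    Exact distances up to `max_distance` are returned as-is; anything larger
    is reported as max_distance + 1, which is enough for `d <= max_distance`.
    """
    def distance(a: str, b: str) -> int:
        if abs(len(a) - len(b)) > max_distance:
            return max_distance + 1  # the length difference alone exceeds the bound
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            cur = [i]
            for j, cb in enumerate(b, start=1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            if min(cur) > max_distance:
                return max_distance + 1  # row minima never decrease in later rows
            prev = cur
        return min(prev[-1], max_distance + 1)

    return distance
```

It could then be passed as Passjoin(corpus, max_edit_distance, bounded_levenshtein(max_edit_distance)). If the index uses the returned value for anything other than the threshold comparison, the clamped values would change its behaviour.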
Index querying
passjoin_index.get_word_variations('pierre')
>> {'pierre', 'pierr'}
passjoin_index.get_word_variations('jeann')
>> {'jean', 'jeanne'}
passjoin_index.get_word_variations('jeanine')
>> {'jeanne'}
Contributing
Clone the project.
Install pipenv.
Run pipenv install --dev
Run the tests with pipenv run pytest.
Download files
Source Distribution
passjoin-0.0.1.tar.gz (2.8 kB)
Built Distribution
passjoin-0.0.1-py2.py3-none-any.whl
Hashes for passjoin-0.0.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f59a009f9aa665eb52f9dd56e36420a0c7c4f2e74f35bf20819677ee4e838c5f
MD5 | 78269923df40a6c8911b453d877cf0d3
BLAKE2b-256 | c8a5bf00ecaad1310fcd4d8ca1754e9aefcba1e83c01d45aa70a4eefe69901f1