A `set` subclass providing fuzzy search based on N-grams.
Here is the documentation and the tutorial.
How does it work?
The NGram class extends the Python set class with the ability to search for set members ranked by their N-gram string similarity to the query. There are also methods for comparing a pair of strings.
The set can store arbitrary items: a specified “key” function produces the string representation of each set member, and that string is what gets N-gram indexed.
N-grams are obtained by splitting strings into overlapping substrings of N (usually N=3) characters in length.
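As a rough illustration of the splitting step, here is a minimal sketch in plain Python. The function name `ngrams` is hypothetical and is not taken from the module; the sketch also ignores any padding or normalization the library may apply before splitting.

```python
def ngrams(text, n=3):
    """Return the overlapping substrings of length n in text, in order.

    Illustrative helper only; not the module's own implementation.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A window of width n slides one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams("spam"))        # ['spa', 'pam']
print(ngrams("ngram", n=2))  # ['ng', 'gr', 'ra', 'am']
```

A string shorter than N yields no N-grams in this sketch.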
To find items similar to a query string, the set splits the query into N-grams, collects all items sharing at least one N-gram with the query, and ranks those items by a score based on the ratio of shared to unshared N-grams between the two strings.
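The search procedure described above can be sketched as a small inverted index. Everything here is an assumption made for illustration: the class `TinyNGramIndex`, its methods, and the scoring formula (shared N-grams divided by all distinct N-grams of the two strings) are hypothetical stand-ins, not the module's actual API or exact similarity measure.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Set of overlapping substrings of length n (no padding applied)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class TinyNGramIndex:
    """Hypothetical toy index; not the real NGram class."""

    def __init__(self, items=(), key=str, n=3):
        self.key = key                  # maps an item to the string that is indexed
        self.n = n
        self.grams = {}                 # item -> its set of N-grams
        self.index = defaultdict(set)   # N-gram -> items containing it
        for item in items:
            self.add(item)

    def add(self, item):
        grams = ngrams(self.key(item), self.n)
        self.grams[item] = grams
        for gram in grams:
            self.index[gram].add(item)

    def search(self, query):
        query_grams = ngrams(query, self.n)
        # Candidates are the items sharing at least one N-gram with the query.
        candidates = set()
        for gram in query_grams:
            candidates |= self.index.get(gram, set())
        scored = []
        for item in candidates:
            shared = len(query_grams & self.grams[item])
            total = len(query_grams | self.grams[item])
            scored.append((item, shared / total))  # assumed scoring formula
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Items are (id, name) tuples; the key function picks the name for indexing.
index = TinyNGramIndex(
    [(1, "spam"), (2, "spams"), (3, "eggs")],
    key=lambda item: item[1],
)
results = index.search("spam")
print(results)
```

With this scoring, "spam" matches itself with a score of 1.0, "spams" scores lower because it has one N-gram the query lacks, and "eggs" is never considered because it shares no N-gram with the query.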
The starting point was the Perl String::Trigram module by Tarek Ahmed. In 2007, Michel Albert (exhuma) wrote the ngram module and submitted 2.0.0b2 to SourceForge. Since late 2008, python-ngram has been developed by Graham Poulter, who added features, documentation, performance improvements and Python 3 support.