Python implementation of SymSpell Compound
pip install sympound
If you want a quick complete example, see example.py.
Creating the sympound object
The first step is to create a sympound object. The constructor takes two main arguments:
distancefun is a function that will be used to compute the distance between two strings. It takes two arguments (the two strings to compare). You typically want to use a function computing the Damerau-Levenshtein distance, but you can get more creative and use, for example, a keyboard distance.
maxDictionaryEditDistance is the maximum edit distance that will be pre-computed. Increasing this parameter returns more suggestions, but also makes the memory footprint much larger.
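As an illustration of what a distancefun can look like, here is a minimal pure-Python sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment). The name osa_distance is hypothetical and not part of sympound; any function taking two strings and returning a number should fit the description above.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of two
    adjacent characters. Hypothetical helper, not shipped with sympound.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Such a function would then be passed as the distancefun argument of the constructor, together with the maxDictionaryEditDistance of your choice; see example.py for the exact import and call.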
Dictionaries can then be added with the load_dictionary(filename) function, typically taking a file path as argument. The format of the dictionary is typically either a list of words (one per line), or a list of word and frequency pairs (separated by a space). See example-dict.txt for an example.
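To make the two dictionary formats concrete, the sketch below parses lines that are either "word" or "word frequency". It is only an illustration of the format, not the parser used by load_dictionary; the function name parse_dictionary_lines and the default count of 1 for lines without a frequency are assumptions.

```python
from typing import Iterable, Iterator, Tuple


def parse_dictionary_lines(
    lines: Iterable[str], default_count: int = 1
) -> Iterator[Tuple[str, int]]:
    """Yield (word, count) pairs from 'word' or 'word count' lines.

    Hypothetical helper; default_count is an assumption for lines
    that carry no frequency.
    """
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        count = int(parts[1]) if len(parts) > 1 else default_count
        yield word, count
```

Since a file object iterates over its lines, the same function can be applied to an open dictionary file, for instance to inspect it before loading.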
You can also add entries directly with create_dictionary_entry(key, count), where key is the valid string and count is the frequency associated with it. This is the recommended method if your data is not in a simple format like the dictionary described above.
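For data that is not already a word list, one possible approach is to count word frequencies yourself and feed them one by one. The sketch below assumes only the create_dictionary_entry(key, count) signature described above; count_words, feed_entries and RecordingChecker are hypothetical names, and RecordingChecker is a stand-in used so the example runs without sympound installed.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Count lowercase word occurrences in raw text (simple \\w+ tokenizer)."""
    return Counter(re.findall(r"\w+", text.lower()))


def feed_entries(checker, counts) -> int:
    """Call checker.create_dictionary_entry(key, count) for every pair.

    `checker` is any object exposing that method; returns the number of
    entries that were passed on.
    """
    added = 0
    for key, count in counts.items():
        checker.create_dictionary_entry(key, count)
        added += 1
    return added


class RecordingChecker:
    """Stand-in for a sympound object: it only records what it receives."""

    def __init__(self):
        self.entries = {}

    def create_dictionary_entry(self, key, count):
        self.entries[key] = count
```

With a real sympound object in place of RecordingChecker, feed_entries(checker, count_words(corpus_text)) would add every word of the corpus with its frequency.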
A lot of computation happens at this stage, and adding a large number of entries can easily take more than a minute, so we provide two functions to save the analyzed dictionaries as a pickle and load them back, the latter being load_pickle(filename); both take a file path as argument. Note that the pickle is gzipped.
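For readers unfamiliar with gzipped pickles, the following standard-library sketch shows the general technique of writing and reading one. It is not sympound's own code; the helper names save_gzipped_pickle and load_gzipped_pickle are hypothetical, and the layout of the library's pickle may differ.

```python
import gzip
import pickle


def save_gzipped_pickle(obj, filename):
    """Serialize obj with pickle and compress it with gzip."""
    with gzip.open(filename, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzipped_pickle(filename):
    """Read back an object written by save_gzipped_pickle."""
    with gzip.open(filename, "rb") as fh:
        return pickle.load(fh)
```

As with any pickle, only load files that you created yourself or that come from a source you trust, since unpickling can execute arbitrary code.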
Once the dictionaries are loaded, you can get suggestions for a string by calling lookup_compound(str, edit_distance_max), where str is the string you want to analyze and edit_distance_max is the maximum distance you want suggestions for.
The function returns a sorted list of SuggestItems, each containing three fields:
term being the suggested corrected string
distance being the distance from the original string
count being the frequency, if given in the dictionary
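The shape of such a result can be sketched with a namedtuple carrying the same three fields. This is a stand-in, not the library's SuggestItem class, and the ordering used here (smallest distance first, then highest count) is an assumption, since the text above only says that the list is sorted.

```python
from collections import namedtuple

# Stand-in with the three documented fields; not sympound's own class.
SuggestItem = namedtuple("SuggestItem", ["term", "distance", "count"])


def rank(items):
    """Sort by ascending distance, then by descending count (assumed order)."""
    return sorted(items, key=lambda s: (s.distance, -s.count))


suggestions = rank([
    SuggestItem("their", 1, 50),
    SuggestItem("the", 2, 900),
    SuggestItem("there", 1, 200),
])
best = suggestions[0]  # closest candidate, ties broken by frequency
```

With real results, the fields are read the same way: item.term, item.distance and item.count.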
Upload on pip:
python setup.py sdist
twine upload dist/*
The code is Copyright Esukhia, 2018, and is distributed under the MIT License.