Python implementation of SymSpell Compound

sympound-python

This library is an implementation of the SymSpellCompound algorithm in Python. It was initially forked from rcourivaud/symspellcompound, although most of the code has since been rewritten.

Installation

pip install sympound

Documentation

For a quick, complete example, see example.py.

Creating the sympound object

The first step is to create a sympound object. The constructor takes two main arguments:

  • distancefun is the function used to compute the distance between two strings. It takes two arguments (the two strings to compare). You will typically want a function that computes the Damerau-Levenshtein distance, but you can get more creative, for example by using keyboard distances.
  • maxDictionaryEditDistance is the maximum edit distance that is pre-computed. Increasing it returns more suggestions, but also makes the memory footprint much larger.
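As an illustration of what a distancefun can look like, here is a minimal sketch of the optimal string alignment (OSA) variant of the Damerau-Levenshtein distance, which counts insertions, deletions, substitutions and transpositions of two adjacent characters. The name osa_distance is hypothetical: sympound does not necessarily ship such a function, and whether it expects this exact variant is an assumption.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Hypothetical helper intended to be passed as sympound's distancefun.
    """
    # d[i][j] holds the distance between a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters of a
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(osa_distance("kitten", "sitting"))  # 3
print(osa_distance("ca", "ac"))           # 1 (one transposition)
```

A function like this would be passed as the distancefun argument. OSA is cheaper than the unrestricted Damerau-Levenshtein distance but never edits a substring twice, so for example osa_distance("ca", "abc") is 3 rather than 2.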

Adding dictionaries

Dictionaries can then be added with the load_dictionary(filename) function, usually with a file path as argument. The dictionary format is typically either a list of words (one per line) or a list of words with their frequencies (word and frequency separated by a space). See example-dict.txt for an example.
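The sketch below parses the two dictionary layouts just described into a word-to-frequency mapping, which can help when checking a file before loading it. It is not sympound's own loader: parse_dictionary_lines is a hypothetical name, and both the default frequency of 1 for bare words and the summing of repeated words are assumptions.

```python
from typing import Dict, Iterable


def parse_dictionary_lines(lines: Iterable[str], default_count: int = 1) -> Dict[str, int]:
    """Parse 'word' or 'word frequency' lines into {word: frequency}.

    Hypothetical helper; default_count for bare words is an assumption.
    """
    counts: Dict[str, int] = {}
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        count = int(parts[1]) if len(parts) > 1 else default_count
        # assumption: a word listed twice has its frequencies summed
        counts[word] = counts.get(word, 0) + count
    return counts


sample = ["hello 120", "world 85", "spell", ""]
print(parse_dictionary_lines(sample))  # {'hello': 120, 'world': 85, 'spell': 1}
```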

You can also add entries directly with create_dictionary_entry(key, count), where key is the valid string and count is the frequency associated with it. This is the recommended method if your data is not in one of the simple formats described above.
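For data that is not in either file format, such as raw text, one option is to count word frequencies yourself and register them one entry at a time. In this sketch add_counts and RecordingStub are hypothetical: the stub only stands in for a real sympound object so the snippet runs on its own, and it assumes create_dictionary_entry(key, count) behaves as described above. The lowercasing and the \w+ tokenization are simplifications that may not suit every language or script.

```python
import re
from collections import Counter


def add_counts(speller, text: str) -> Counter:
    """Count lowercased words in text and register each via create_dictionary_entry."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    for word, count in counts.items():
        speller.create_dictionary_entry(word, count)
    return counts


class RecordingStub:
    """Stand-in for a sympound object that just records the entries it receives."""

    def __init__(self):
        self.entries = {}

    def create_dictionary_entry(self, key, count):
        self.entries[key] = count


stub = RecordingStub()
add_counts(stub, "The cat saw the other cat. The end.")
print(stub.entries)  # {'the': 3, 'cat': 2, 'saw': 1, 'other': 1, 'end': 1}
```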

A lot of computation happens at this stage, and adding a large number of entries can easily take more than a minute, so two functions are provided to save and restore the analyzed dictionaries as a pickle: save_pickle(filename) and load_pickle(filename), both taking a file path as argument. Note that the pickle is gzipped.
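Because the saved file is a gzipped pickle, Python's standard gzip and pickle modules are enough to write and read a file of that general kind. The sketch below shows the pattern with a plain dictionary; save_gz_pickle and load_gz_pickle are hypothetical names, and the internal structure that sympound actually pickles is not shown here. As with any pickle, only load files from a source you trust.

```python
import gzip
import os
import pickle
import tempfile


def save_gz_pickle(obj, path: str) -> None:
    """Serialize obj with pickle and compress the result with gzip."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_gz_pickle(path: str):
    """Decompress and unpickle a file written by save_gz_pickle."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


data = {"hello": 120, "world": 85}
path = os.path.join(tempfile.mkdtemp(), "dict.pickle.gz")
save_gz_pickle(data, path)
print(load_gz_pickle(path) == data)  # True
```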

Lookup

Once the dictionaries are loaded, you can get suggestions for a string by calling lookup_compound(str, edit_distance_max), where str is the string you want to analyze and edit_distance_max is the maximum distance you want suggestions for.

The function returns a sorted list of SuggestItem objects, each containing three fields:

  • term: the suggested corrected string
  • distance: the distance from the original string
  • count: the frequency, if given in the dictionary
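To show how these three fields arise, here is a minimal single-word sketch of SymSpell's symmetric delete approach. Each dictionary word is indexed under every string obtained by deleting up to max_distance characters (this precomputation is what makes loading slow and memory-hungry as the maximum distance grows); a lookup generates the deletes of the input, gathers the matching words, and verifies each one with an edit distance. It is not sympound's lookup_compound, which also handles splitting and merging words. Suggestion, deletes, build_index, lookup and levenshtein are hypothetical names, plain Levenshtein distance stands in for distancefun, and sorting by distance and then by descending count is an assumption about the ordering.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, NamedTuple, Set


class Suggestion(NamedTuple):
    """Hypothetical stand-in mirroring the three SuggestItem fields."""
    term: str
    distance: int
    count: int


def deletes(word: str, max_distance: int) -> Set[str]:
    """All strings obtained from word by deleting up to max_distance characters."""
    variants = {word}
    for n in range(1, min(max_distance, len(word)) + 1):
        for positions in combinations(range(len(word)), n):
            variants.add("".join(c for i, c in enumerate(word) if i not in positions))
    return variants


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(words: Dict[str, int], max_distance: int) -> Dict[str, Set[str]]:
    """Precompute: map every delete variant to the dictionary words producing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in words:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def lookup(term: str, words: Dict[str, int], index: Dict[str, Set[str]],
           max_distance: int) -> List[Suggestion]:
    """Single-word lookup; max_distance must not exceed the one used by build_index."""
    candidates: Set[str] = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    results = []
    for word in candidates:
        d = levenshtein(term, word)  # verify the real distance of each candidate
        if d <= max_distance:
            results.append(Suggestion(word, d, words[word]))
    # closest first; ties broken by higher frequency
    results.sort(key=lambda s: (s.distance, -s.count))
    return results


words = {"hello": 120, "help": 60, "world": 85}
index = build_index(words, max_distance=2)
print(lookup("helo", words, index, max_distance=2))
# [Suggestion(term='hello', distance=1, count=120), Suggestion(term='help', distance=1, count=60)]
```

Only the deletes of the input are generated at query time, which is why lookups stay fast even though the index itself can grow large.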

Maintenance

Uploading to PyPI:

python setup.py sdist
twine upload dist/*

Copyright

The code is Copyright Esukhia, 2018, and is distributed under the MIT License.

Download files

Source Distribution

sympound-0.6.0.tar.gz (6.9 kB)

File details

Details for the file sympound-0.6.0.tar.gz.

File metadata

  • Download URL: sympound-0.6.0.tar.gz
  • Upload date:
  • Size: 6.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.6rc1

File hashes

Hashes for sympound-0.6.0.tar.gz
  • SHA256: 247f59f3c11d5ad6a59911a5a615b5a8aef8c8dd4736d419a29dc01e1ee8ad56
  • MD5: ceab0b40dafe72d4a471f97a3cb41655
  • BLAKE2b-256: 4c363b463aeec546bf99bb6f74582019fabb710726a1d58342a5e8414515addd
