
A simple Python spellchecker built on BK trees and the Damerau-Levenshtein distance

Project description

Description

This package is intended as a lightweight, efficient, and customizable spelling corrector. Using the Damerau-Levenshtein distance and a cached BK tree, it narrows down the candidate corrections for a misspelled word and ranks them by their order in the dictionary (word list) file.
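
The two ideas above can be sketched as follows. This is a minimal illustration, not pyspell's own code, and the names damerau_levenshtein and BKTree are hypothetical. It uses the unrestricted Damerau-Levenshtein distance (Lowrance-Wagner algorithm) because, unlike the simpler "optimal string alignment" variant, it satisfies the triangle inequality that BK tree pruning relies on; which variant pyspell uses is not stated here. Candidates are then ranked by their index in the word list, assuming that a lower index means a preferred word.

```python
from typing import Dict, List, Optional, Tuple


def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance (Lowrance-Wagner algorithm).

    Counts insertions, deletions, substitutions and transpositions of two
    adjacent characters; the result satisfies the triangle inequality.
    """
    maxdist = len(a) + len(b)
    last_row: Dict[str, int] = {}  # last row (1-based) in which each char of `a` was seen
    # d is shifted by one: d[i + 1][j + 1] is the distance between a[:i] and b[:j];
    # row 0 and column 0 form a sentinel border filled with maxdist.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = maxdist
    for i in range(len(a) + 1):
        d[i + 1][0] = maxdist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = maxdist
        d[1][j + 1] = j
    for i in range(1, len(a) + 1):
        last_match_col = 0  # last column in this row where a[i-1] == b[j-1]
        for j in range(1, len(b) + 1):
            prev_row = last_row.get(b[j - 1], 0)
            prev_col = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,  # match or substitution
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                # transposition, with any characters in between deleted/inserted
                d[prev_row][prev_col] + (i - prev_row - 1) + 1 + (j - prev_col - 1),
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


class BKTree:
    """Burkhard-Keller tree: each child edge is labelled with its distance to the parent."""

    def __init__(self) -> None:
        # A node is (word, {edge_distance: child_node}).
        self.root: Optional[Tuple[str, dict]] = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            node_word, children = node
            dist = damerau_levenshtein(word, node_word)
            if dist == 0:
                return  # word is already in the tree
            if dist not in children:
                children[dist] = (word, {})
                return
            node = children[dist]

    def search(self, word: str, tolerance: int) -> List[Tuple[str, int]]:
        """Return every (candidate, distance) pair within `tolerance` of `word`."""
        results: List[Tuple[str, int]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_word, children = stack.pop()
            dist = damerau_levenshtein(word, node_word)
            if dist <= tolerance:
                results.append((node_word, dist))
            # Triangle inequality: matches can only sit under edges labelled
            # within [dist - tolerance, dist + tolerance].
            for edge, child in children.items():
                if dist - tolerance <= edge <= dist + tolerance:
                    stack.append(child)
        return results


# Usage: build a tree from a word list whose order encodes priority.
words = ["the", "great", "difficult", "grate", "treat"]
tree = BKTree()
for w in words:
    tree.add(w)
rank = {w: i for i, w in enumerate(words)}  # lower index = preferred word
matches = sorted(tree.search("grat", 1), key=lambda m: rank[m[0]])
print([w for w, _ in matches])  # ['great', 'grate']
```

The pruning step is what makes the tree fast: at each node only the child edges inside the tolerance window are followed, so most of the dictionary is never compared against the query.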

Usage

Checker

The Checker class is the core of the spellchecker and should be used in most circumstances. Its constructor takes two parameters: the path to a word list and a pickling location for the cached BK tree (defaults to bktree.pickle in the root directory). To load the cached BK tree, call the load method; if there is no cache at the given location, it generates a new one there.

Methods:

Checker.repickle()

Call this after modifying the dictionary by any means other than the built-in updateDict method. The tree must be rebuilt whenever the word list changes.

Checker.load()

Loads the BK tree and sets up the dictionary.
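
The load-or-generate caching that load performs can be illustrated with the standard pickle module. This is a minimal sketch of the pattern only; load_or_build and save_cache are hypothetical helpers, not part of pyspell.

```python
import os
import pickle
from typing import Any, Callable


def save_cache(cache_path: str, obj: Any) -> None:
    """(Re)write the cache, e.g. after the word list has changed."""
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)


def load_or_build(cache_path: str, build: Callable[[], Any]) -> Any:
    """Load the pickled object at cache_path, or build and cache it if absent."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    obj = build()  # expensive step: e.g. read the word list and build the tree
    save_cache(cache_path, obj)
    return obj
```

Passing a function that reads the word list and builds the tree as build gives the load-or-generate behaviour, and calling save_cache again after edits plays the role of repickling. Only unpickle cache files you trust: pickle.load can execute arbitrary code embedded in a crafted file.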

Checker.check(word,returnNum,returnType,repeat,forcePrecision)

Checks a word or a list of words. Parameters:

- word: a str or a list of str.
- returnNum: the number of results to return. 0 returns all of the items found within the tolerance of the tree; 1 returns only the best match, as determined by the order of the given dictionary.
- returnType: "pairings", "rankings", or "words" (default).
  - "pairings" returns a list of (word, ranking) tuples, where the ranking is the word's position in the dictionary, e.g. [("cow", 16), ("frog", 11)].
  - "rankings" returns only the rankings.
  - "words" returns only the words.
- repeat: when enabled, the tolerance is increased repeatedly until at least one match is found for an unknown word. This is primarily a speed trade-off: for extremely heavy programs or very large dictionaries, set it to False.
- forcePrecision: manually sets the tolerance. The internal mechanism is almost always sufficient, so this should seldom be changed from its default, False.

The method returns a string, a list, or None.
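
The effect of repeat, returnNum, and returnType can be sketched as two small functions. This is a hypothetical illustration, not pyspell's implementation: search_with_growing_tolerance, shape_results, and fake_search are invented names, the search function is passed in as a stand-in for a BK tree query, and the max_tolerance cap is an added assumption so the loop always terminates.

```python
from typing import Callable, Dict, List, Sequence

SearchFn = Callable[[str, int], List[str]]


def search_with_growing_tolerance(
    search: SearchFn, word: str, tolerance: int = 1, max_tolerance: int = 5
) -> List[str]:
    """Raise the tolerance one step at a time until a match appears (cf. repeat)."""
    while tolerance <= max_tolerance:
        found = search(word, tolerance)
        if found:
            return found
        tolerance += 1
    return []  # nothing within max_tolerance


def shape_results(
    candidates: Sequence[str],
    rank: Dict[str, int],
    return_num: int = 0,
    return_type: str = "words",
):
    """Order candidates by dictionary position, trim them, and format them."""
    ordered = sorted(candidates, key=lambda w: rank[w])  # lower index = better
    if return_num > 0:
        ordered = ordered[:return_num]
    if return_type == "pairings":
        out: list = [(w, rank[w]) for w in ordered]
    elif return_type == "rankings":
        out = [rank[w] for w in ordered]
    else:  # "words"
        out = list(ordered)
    if return_num == 1:
        return out[0] if out else None  # a single best result, or None
    return out


def fake_search(word: str, tolerance: int) -> List[str]:
    """Toy stand-in for a tree query: it only 'finds' matches at tolerance >= 2."""
    return ["grate", "great"] if tolerance >= 2 else []


rank = {"the": 0, "great": 1, "grate": 2}
found = search_with_growing_tolerance(fake_search, "grt")
print(shape_results(found, rank, return_num=1))  # great
print(shape_results(found, rank, return_type="pairings"))  # [('great', 1), ('grate', 2)]
```

Each extra round of a growing-tolerance search repeats a full tree query with a wider window, which is why disabling it saves time on large dictionaries.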

Checker.updateDict(word,priority,pickle)

Inserts a word, a dictionary, or a list of words at a chosen position in the word list.

- word: a str, a list, or a dict. A string is simply inserted. Lists are inserted in reverse chronological order for priority. A dictionary's keys are the words to insert, and the value for each key gives the position at which to insert that word.
- priority: the position in the dictionary at which to insert the item; -1 (default) places it at the very end (lowest priority).
- pickle: defaults to True, which repickles the tree after the word(s) are added. Set it to False if you intend to make further modifications and call repickle afterwards.
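
Priority-based insertion into a plain Python list can be sketched as below. insert_words is a hypothetical helper, not pyspell's updateDict, and its handling of lists (keeping their own order at the insertion point) is one plausible reading of the description above. One detail worth noting: Python's list.insert(-1, x) places x before the last element, so "-1 means the very end" has to be handled as an append.

```python
from typing import Dict, List, Union


def insert_words(
    wordlist: List[str],
    word: Union[str, List[str], Dict[str, int]],
    priority: int = -1,
) -> None:
    """Insert word(s) into wordlist in place; earlier positions rank higher."""

    def put(w: str, pos: int) -> None:
        # list.insert(-1, w) would land before the last item, so treat -1 as append.
        if pos == -1:
            wordlist.append(w)
        else:
            wordlist.insert(pos, w)

    if isinstance(word, dict):
        # Each key is a word and its value the position to insert it at;
        # insertions happen in dict order, so earlier ones shift later indices.
        for w, pos in word.items():
            put(w, pos)
    elif isinstance(word, list):
        if priority == -1:
            wordlist.extend(word)  # lowest priority: append in the given order
        else:
            # Inserting in reverse at one index leaves the items in their
            # original order, starting at `priority`.
            for w in reversed(word):
                wordlist.insert(priority, w)
    else:
        put(word, priority)


words = ["the", "great", "difficult"]
insert_words(words, ["grate", "treat"], priority=1)
insert_words(words, "zebra")
print(words)  # ['the', 'grate', 'treat', 'great', 'difficult', 'zebra']
```

After the list changes, the tree still has to be rebuilt and recached; in pyspell that is what the pickle flag or a later repickle call takes care of.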

Basic usage example:

from pyspell.checker import Checker

check = Checker("./pyspell/data/wordlist.txt", "./pyspell/data/bktree.pickle")
check.load()
print(check.check("grat"))  # -> great
print(check.check("diiffficult"))  # -> difficult

(example.py)



Download files


Source Distribution

pythonspell-0.7.tar.gz (5.1 kB)
