Pure python spell checker based on work by Peter Norvig
Project description
# pyspellchecker
Pure Python Spell Checking based on
[Peter Norvig's](https://norvig.com/spell-correct.html) blog post on setting up
a simple spell checking algorithm.
It uses a [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
algorithm to find permutations within an edit distance of 2 from the
original word. It then compares all permutations (insertions, deletions,
replacements, and transpositions) to known words in a word frequency list.
Those words that are found more often in the frequency list are `more likely`
the correct results.
## Installation
The easiest method to install is using pip:
``` bash
pip install pyspellchecker
```
To install from source:
``` bash
git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install
```
As always, I highly recommend using the [Pipenv](https://github.com/pypa/pipenv)
package to help manage dependencies!
## Quickstart
After installation, using pyspellchecker should be fairly straight forward:
``` python
from spellchecker import SpellChecker
spell = SpellChecker()
# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])
for word in misspelled:
# Get the one `most likely` answer
print(spell.correction(word))
# Get a list of `likely` options
print(spell.candidates(word))
```
If the Word Frequency list is not to your liking, you can add additional text
to generate a more appropriate list for your use case.
``` python
from spellChecker import SpellChecker
spell = SpellChecker() # loads default word frequency list
spell.word_frequency.load_text_file('./my_free_text_doc.txt')
# if I just want to make sure some words are not flagged as misspelled
spell.word_frequency.load_words(['microsoft', 'apple', 'google'])
spell.known(['microsoft', 'google']) # will return both now!
```
More work in storing and loading word frequency lists is planned; stay tuned.
## Additional Methods
On-line documentation is in the future; until then you can find SpellChecker
here:
`correction(word)`: Returns the most probable result for the misspelled word
`candidates(word)`: Returns a set of possible candidates for the misspelled
word
`known([words])`: Returns those words that are in the word frequency list
`unknown([words])`: Returns those words that are not in the frequency list
`word_probability(word)`: The frequency of the given word out of all words in
the frequency list
#### The following are less likely to be needed by the user but are available:
`edit_distance_1(word)`: Returns a set of all strings at a Levenshtein Distance
of one
`edit_distance_2(word)`: Returns a set of all strings at a Levenshtein Distance
of two
Pure Python Spell Checking based on
[Peter Norvig's](https://norvig.com/spell-correct.html) blog post on setting up
a simple spell checking algorithm.
It uses a [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
algorithm to find permutations within an edit distance of 2 from the
original word. It then compares all permutations (insertions, deletions,
replacements, and transpositions) to known words in a word frequency list.
Those words that are found more often in the frequency list are `more likely`
the correct results.
## Installation
The easiest method to install is using pip:
``` bash
pip install pyspellchecker
```
To install from source:
``` bash
git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install
```
As always, I highly recommend using the [Pipenv](https://github.com/pypa/pipenv)
package to help manage dependencies!
## Quickstart
After installation, using pyspellchecker should be fairly straight forward:
``` python
from spellchecker import SpellChecker
spell = SpellChecker()
# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])
for word in misspelled:
# Get the one `most likely` answer
print(spell.correction(word))
# Get a list of `likely` options
print(spell.candidates(word))
```
If the Word Frequency list is not to your liking, you can add additional text
to generate a more appropriate list for your use case.
``` python
from spellChecker import SpellChecker
spell = SpellChecker() # loads default word frequency list
spell.word_frequency.load_text_file('./my_free_text_doc.txt')
# if I just want to make sure some words are not flagged as misspelled
spell.word_frequency.load_words(['microsoft', 'apple', 'google'])
spell.known(['microsoft', 'google']) # will return both now!
```
More work in storing and loading word frequency lists is planned; stay tuned.
## Additional Methods
On-line documentation is in the future; until then you can find SpellChecker
here:
`correction(word)`: Returns the most probable result for the misspelled word
`candidates(word)`: Returns a set of possible candidates for the misspelled
word
`known([words])`: Returns those words that are in the word frequency list
`unknown([words])`: Returns those words that are not in the frequency list
`word_probability(word)`: The frequency of the given word out of all words in
the frequency list
#### The following are less likely to be needed by the user but are available:
`edit_distance_1(word)`: Returns a set of all strings at a Levenshtein Distance
of one
`edit_distance_2(word)`: Returns a set of all strings at a Levenshtein Distance
of two
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pyspellchecker-0.1.0.tar.gz
(2.4 MB
view hashes)
Built Distribution
Close
Hashes for pyspellchecker-0.1.0-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | fd3577dbfd42e46090c10bc7a41a54ad08655d580fb32ff2c77725d1d3fe12a5 |
|
MD5 | 84d6a60845aa015a09f471d1d3ac7d45 |
|
BLAKE2b-256 | 7f1825ae6e5dc6b9a885df9ea646fd607a050f083d595cd949b2083e27e73e8c |