Probabilistic Spell Checker in Python
Spell checker based on edit distance (Levenshtein-ish) and word frequency in a training corpus.
For theoretical background, see Peter Norvig's article: https://norvig.com/spell-correct.html
Comes with a dictionary trained from German Wikipedia articles.
Usage:
import json
import re

import probspellchecker

# Load the word -> count mapping (closing the file afterwards).
with open("dictionary_dewiki_full10plus.json", encoding="utf-8") as f:
    word_counts = json.load(f)

spellchecker = probspellchecker.ProbabilisticSpellChecker(
    word_counts=word_counts,
    word_whitelist=["my", "custom", "wordlist"],
)

text = "lorem ipsum whatever"
for word in re.findall(r"\w+", text):
    correction = spellchecker.correction(word.lower())
    # Fall back to the original word if no correction came back.
    if correction:
        print(correction)
    else:
        print(word)
word_counts is just that: a dict mapping each word to its count in the training corpus. To build your own dictionaries, see probdict-from-dewiki.py and probdict-from-text.py. You may also specify a whitelist of words that the spell checker should simply accept, which is useful if, say, your name is not in the dictionary.
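To give a rough idea of what such a dictionary looks like, here is a minimal sketch that builds a word-to-count dict from plain text. It is not the code of probdict-from-text.py; the function name build_word_counts and the min_count threshold are hypothetical and only serve the illustration.

```python
import re
from collections import Counter


def build_word_counts(text, min_count=1):
    """Count lowercased words; drop words seen fewer than min_count times.

    Hypothetical helper, not part of probspellchecker.
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    return {word: n for word, n in counts.items() if n >= min_count}


corpus = "Der Hund jagt die Katze. Die Katze jagt die Maus."
word_counts = build_word_counts(corpus)
print(word_counts["die"])    # 3
print(word_counts["katze"])  # 2
```

A dict like this can be written out with json.dump and loaded again as shown in the usage example. Raising min_count drops rare words, which tends to keep typos from the corpus out of the dictionary.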
If your language has special characters such as the German umlauts, you may need to pass an additional charset parameter to ProbabilisticSpellChecker: a string containing all allowed characters of your language. It is used to generate candidate words.
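Why the character set matters can be seen in a small sketch of candidate generation in the style of Norvig's article. This is an assumption about the general technique, not the package's actual implementation: the names edits1, best_candidate and GERMAN_CHARSET are hypothetical, and the real checker may also consider the word itself and candidates two edits away.

```python
import string

# Assumption: an example German character set; the package's own default may differ.
GERMAN_CHARSET = string.ascii_lowercase + "äöüß"


def edits1(word, charset):
    """Return all strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in charset]
    inserts = [left + c + right for left, right in splits for c in charset]
    return set(deletes + transposes + replaces + inserts)


def best_candidate(word, word_counts, charset):
    """Pick the most frequent known word one edit away, or None if there is none."""
    known = [w for w in edits1(word, charset) if w in word_counts]
    return max(known, key=word_counts.get) if known else None


word_counts = {"grün": 12, "grau": 7}
print(best_candidate("grun", word_counts, string.ascii_lowercase))  # None
print(best_candidate("grun", word_counts, GERMAN_CHARSET))          # grün
```

With a plain ASCII character set, "grün" is never generated as a candidate for "grun", so no correction is found; adding the umlauts to the character set makes it reachable with a single replacement.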
Logging
If the logging annoys you, just shut it up:
import logging

spell_log = logging.getLogger("probspellchecker")
spell_log.setLevel(logging.ERROR)  # log only errors