probabilistic spell checker

Probabilistic Spell Checker in Python

Spell checker based on edit distance (Levenshtein-ish) and word frequency in a training corpus.

For theoretical background, see Peter Norvig's article: https://norvig.com/spell-correct.html
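The frequency half of the idea can be sketched in a few lines. This is a minimal illustration in the spirit of Norvig's article, not the code of this package: the toy counts and the names `probability` and `best_candidate` are invented for the example, and the candidate words are passed in by hand instead of being generated by edit distance.

```python
from collections import Counter

# Toy word counts, invented for illustration only.
word_counts = Counter({"haus": 120, "maus": 40, "hals": 15})
total = sum(word_counts.values())


def probability(word):
    """Relative frequency of `word` in the toy corpus (0.0 if unseen)."""
    return word_counts[word] / total


def best_candidate(candidates):
    """Return the known candidate with the highest frequency, or None."""
    known = [c for c in candidates if c in word_counts]
    return max(known, key=probability) if known else None


# "haus" and "maus" are both one replacement away from the typo "xaus";
# the more frequent word wins.
print(best_candidate(["haus", "maus", "xaus"]))
```

With these toy counts, "haus" is chosen over "maus" because it occurs more often; a candidate list without any known word yields None.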

Comes with a dictionary trained from German Wikipedia articles.

Usage:

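import json
import re
import probspellchecker

# Load the word -> count mapping; the context manager closes the file again.
with open("dictionary_dewiki_full10plus.json", encoding="utf-8") as f:
    word_counts = json.load(f)

spellchecker = probspellchecker.ProbabilisticSpellChecker(
    word_counts=word_counts,
    word_whitelist=["my", "custom", "wordlist"],
)

text = "lorem ipsum whatever"
for word in re.findall(r"\w+", text):
    # Lowercase the word before looking it up.
    correction = spellchecker.correction(word.lower())
    # Fall back to the original word when no correction is returned.
    if correction:
        print(correction)
    else:
        print(word)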

word_counts is just that: a dict mapping each word to its count in the training corpus. To build your own dictionaries, see probdict-from-dewiki.py and probdict-from-text.py. You may also specify a whitelist of words that the spell checker should always accept, which is useful if, say, your name is not in the dictionary.
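As a rough idea of what such a dictionary looks like, here is a minimal sketch that counts words in a piece of text and writes the result as JSON. It is not the implementation of probdict-from-text.py; the function names and the `min_count` cutoff are assumptions made for the example.

```python
import json
import re
from collections import Counter


def count_words(text, min_count=1):
    """Map each lowercased word in `text` to its number of occurrences.

    `min_count` is a hypothetical cutoff that drops rare words.
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    return {word: n for word, n in counts.items() if n >= min_count}


def save_word_counts(counts, path):
    """Write the word -> count dict as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(counts, f, ensure_ascii=False)


counts = count_words("Der Hund und der Ball")
print(counts)
```

A file written with `save_word_counts` can then be loaded with `json.load` and passed as `word_counts`, as in the usage example.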

If your language sports special characters like the German umlauts, you might need to pass an additional charset parameter to the ProbabilisticSpellChecker: a string containing all allowed characters of your language. It is then used to generate candidate words.
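To see why the character set matters, consider how candidates are generated in Norvig's article: every word one edit away from the input is built from deletions, transpositions, replacements and insertions, and the last two draw letters from a fixed alphabet. The sketch below follows that scheme with the alphabet as a parameter; how this package uses charset internally is assumed here, not taken from its source, and `edits1` is Norvig's name, not necessarily this package's.

```python
import string


def edits1(word, charset):
    """All strings one edit away from `word`, using letters from `charset`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + c + right[1:] for left, right in splits if right for c in charset
    ]
    inserts = [left + c + right for left, right in splits for c in charset]
    return set(deletes + transposes + replaces + inserts)


ascii_only = string.ascii_lowercase
german = string.ascii_lowercase + "äöüß"

# "grün" can only be proposed for "grun" if "ü" is in the character set.
print("grün" in edits1("grun", ascii_only))
print("grün" in edits1("grun", german))
```

With the plain ASCII alphabet the umlaut spelling is never generated, so it cannot be suggested no matter how frequent it is in the dictionary.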

Logging

If the logging annoys you, just shut it up:

import logging

spell_log = logging.getLogger("probspellchecker")
spell_log.setLevel(logging.ERROR)  # log only errors

Download files

Source distribution: probspellchecker-0.1.3.tar.gz (4.2 kB)
Built distribution: probspellchecker-0.1.3-py3-none-any.whl (5.4 kB, Python 3)
