spelchek

A pure-python Bayesian spellchecker

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
Programming Language
Topic
- Text Processing

Project description

A cheap-ass, pure-python spellchecker based on Peter Norvig’s Python Bayes demo. All the interesting work is his.

The interesting external methods are:

- known() filters a list of words and returns only those in the dictionary
- correct() returns the best guess for the supplied word
- guesses() returns all guesses for the supplied word
- add() adds a word to the dictionary, with an optional priority value

So simple uses would be something like

import spelchek
print(spelchek.correct('eaxmple'))
# 'example'
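
For readers curious how known(), guesses() and correct() might fit together, here is a minimal sketch in the style of Norvig's demo. It is not spelchek's actual source: the WORDS table, the edits1() helper and the exact ranking rule are assumptions made for illustration, and the real module may order or filter its guesses differently.

```python
# Illustrative Norvig-style corrector; NOT the spelchek implementation.
import string

# Hypothetical toy dictionary: word -> frequency score (higher = more common).
WORDS = {"example": 50, "ample": 5, "sample": 20, "spell": 30}


def known(words):
    """Return only the candidate words that appear in the dictionary."""
    return {w for w in words if w in WORDS}


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def guesses(word):
    """Known candidates at the smallest edit distance, most frequent first."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1)))
    return sorted(candidates, key=WORDS.get, reverse=True)


def correct(word):
    """Best guess for word, or the word itself if no candidate is known."""
    found = guesses(word)
    return found[0] if found else word


print(correct("eaxmple"))  # prints: example
```

The frequency score only breaks ties between candidates at the same edit distance in this sketch, which is why a closer but rarer word wins over a more distant common one.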

The current corpus of words includes about 75,000 entries. It does not include punctuation such as hyphens, apostrophes or spaces. The module also supports optional user-supplied dictionaries; see the documentation of spelchek.py for details.

Important Caveat

The heart of a spell checker is the dictionary, and the dictionary here is cadged together out of a bunch of free online sources. No real effort has been made to check it for accuracy, and although it’s trivially correct, with several tens of thousands of words involved errors are pretty much inevitable. If you find one, feel free to submit a pull request and I’ll update corpus.txt as needed.

The algorithm is language-agnostic, so it should be easy to create dictionaries for languages other than English. If you come up with a non-English dictionary, submit a pull request and we can extend the module to support language choice.

Installation

The module is a simple Python module with no binary dependencies. The default dictionary is the file corpus.txt, which lives inside the spelchek package.

You can extend the built-in dictionary in two ways.

You can add words to the corpus.txt file; it’s a plain text file with words and frequency scores separated by a comma. High frequency scores make a word more likely to be suggested as a correction, while low frequency scores mark a word as ‘rarer’ and so less likely to be suggested. This method is easiest if you are working with a source distribution from the GitHub repository.
You can add a custom dictionary of your own using the same word,frequency format and point to it by setting an environment variable called SPELCHEK. These entries will be added to the default dictionary at import time (note that they will replace the assigned priorities of existing words). This is a low-friction way to try adding non-English language support.
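
The merge behaviour described above can be sketched roughly as follows. This is an illustration only, not the module's own loader: load_entries() is a hypothetical helper, the integer scores and the sample words are made up, and the in-memory dictionary stands in for the built-in corpus.

```python
# Sketch of merging a custom word,frequency file named by SPELCHEK.
# load_entries() and the sample data are assumptions, not spelchek's code.
import os
import tempfile


def load_entries(path):
    """Parse 'word,frequency' lines into a dict, skipping malformed lines."""
    entries = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word, sep, score = line.strip().rpartition(",")
            if not sep or not word:
                continue  # blank line or no comma
            try:
                entries[word.strip().lower()] = int(score)
            except ValueError:
                continue  # frequency was not an integer
    return entries


# Stand-in for the built-in corpus (hypothetical values).
dictionary = {"example": 50, "colour": 1}

# Write a small custom dictionary and point SPELCHEK at it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("colour,40\nbonjour,25\n")
os.environ["SPELCHEK"] = tmp.name

# Custom entries are added, and replace the scores of existing words.
custom_path = os.environ.get("SPELCHEK")
if custom_path:
    dictionary.update(load_entries(custom_path))

os.remove(tmp.name)
print(dictionary)  # {'example': 50, 'colour': 40, 'bonjour': 25}
```

Because the custom entries are read at import time, in real use the SPELCHEK variable would need to be set before import spelchek runs.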

Release history

0.54 (this version): Dec 21, 2018
0.52: Dec 21, 2018

Download files

Source Distribution

spelchek-0.54.tar.gz (234.5 kB)

Uploaded Dec 21, 2018 Source

File details

Details for the file spelchek-0.54.tar.gz.

File metadata

Download URL: spelchek-0.54.tar.gz
Upload date: Dec 21, 2018
Size: 234.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.5

File hashes

Hashes for spelchek-0.54.tar.gz
Algorithm	Hash digest
SHA256	`2a3b1e5cdc447585aa09a446f5c253140f2f95e7d34cd3afdcf254a6c61f9ed1`
MD5	`f6c0dbe4aabd0b30179cc0490bdfa224`
BLAKE2b-256	`5918d977458016aa9cc7065369aef1f22701c13c3c60d6bc708b0ac7bd37aa62`
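
One way to check a downloaded archive against the SHA256 digest listed above is sketched below. The helper names sha256_of() and matches_published_hash() are hypothetical, and the file name in the usage comment assumes the archive sits in the current directory.

```python
# Sketch: compare a local file's SHA256 with the digest published above.
import hashlib

EXPECTED_SHA256 = (
    "2a3b1e5cdc447585aa09a446f5c253140f2f95e7d34cd3afdcf254a6c61f9ed1"
)


def sha256_of(path, chunk_size=65536):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the archive has been downloaded locally:
# matches_published_hash("spelchek-0.54.tar.gz")
```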
