
Manual Spell Checker

A manual spell checker built on pyenchant that allows you to swiftly correct misspelled words.

Why does this exist?

While working on a text-based multi-class classification competition, I noticed that the data contained many misspelled words that the automated spell-check packages out there couldn't fix. This was because the data had been compiled from a survey of people who weren't native English speakers. Since there weren't many samples in the dataset (~1000), I decided to write some code for automated detection of spelling errors which I could then fix manually, and thus this package was born.

How to install?

pip install manual_spellchecker

Features

  • All features provided by pyenchant
  • Quickly analyze and get a list of all misspelled words
  • Replace, skip or delete misspelled words
  • Use your favourite tokenizer for splitting words
  • Replace misspelled words with the provided suggestions by simply typing in their indices
  • Can checkpoint the current set of corrections
  • Contextualized pretty printing for easy visual correction (works on both the command line and in notebooks)

Functions and Parameters

# Initialize the spell checking object
__init__(dataframe, column_names, tokenizer=None, num_n_words_dis=5, save_path=None)
  • dataframe - Takes a pandas dataframe as input
  • column_names - Pass the column name(s) upon which you want to perform spelling correction
  • tokenizer=None - Pass your favourite tokenizer, e.g. nltk's or spacy's (Default: splits on spaces)
  • num_n_words_dis=5 - The number of neighbouring words to display on either side of the error
  • save_path=None - If a save path is provided, the final corrected dataframe is saved as a CSV (Default: the dataframe is not saved, only returned)
# For quick analysis of all the misspelled words
spell_check()
# Returns a list of all the misspelled words
get_all_errors()
# Starts the process of correcting erroneous words
correct_words()
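
Putting the pieces together, the snippet below is a minimal sketch of an initialization that uses every parameter documented above; the column name "text", the choice of nltk's word_tokenize and the output file name are placeholder assumptions, not requirements of the package.

import pandas as pd
from nltk import word_tokenize
from manual_spellchecker import spell_checker

# Read the data (hypothetical file with a free-text column named "text")
df = pd.read_csv("Train.csv")
# Use nltk's tokenizer, show 3 neighbouring words on each side of an error,
# and save the corrected dataframe as a CSV once correction ends
ob = spell_checker(df, "text", tokenizer=word_tokenize, num_n_words_dis=3, save_path="corrected.csv")
# Analyze the errors, then correct them interactively
ob.spell_check()
ob.correct_words()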

Important Note:

  • Type -999 in the input box to stop error correction and save the current progress (if a save_path was provided)
  • Simply press enter to skip the current word
  • Type "" or '' in the input box to delete a misspelled word

Usage

How to import?

from manual_spellchecker import spell_checker

Quick analysis of the total number of errors

import pandas as pd
# Read the data
df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, "text")
# Quick analysis
ob.spell_check()

Multiple columns can be passed for spelling correction

# Read the data
df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, ["text", "label"])
# Quick analysis
ob.spell_check()

You can pass your own tokenizers

# Import nltk's word tokenizer
from nltk import word_tokenize
# Read the data
df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, "text", word_tokenize)
# Quick analysis
ob.spell_check()
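
Since the tokenizer is just a callable that maps a string to a list of word tokens (as nltk's word_tokenize does above), a spaCy pipeline can presumably be plugged in through a small wrapper. The wrapper below is an illustrative sketch and not part of the package.

import spacy
import pandas as pd
from manual_spellchecker import spell_checker

# Lightweight, tokenizer-only English pipeline
nlp = spacy.blank("en")

# Adapt spaCy to the expected interface: str -> list of token strings
def spacy_tokenize(text):
    return [token.text for token in nlp(text)]

# Read the data
df = pd.read_csv("Train.csv")
# Initialize the model with the wrapped spaCy tokenizer
ob = spell_checker(df, "text", spacy_tokenize)
# Quick analysis
ob.spell_check()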

Get a list of all the errors

# Read the data
df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, "text")
# Quick analysis. This needs to be performed before getting all errors
ob.spell_check()
# Returns a list of all errors
ob.get_all_errors()
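
Because get_all_errors() returns a plain Python list of the misspelled words, it can be inspected with ordinary tooling before any manual correction. The sketch below counts the most frequent errors, assuming the list contains one entry per occurrence of an error.

from collections import Counter

# List of misspelled words found during spell_check()
errors = ob.get_all_errors()
# Show the ten most frequent errors (assumes one list entry per occurrence)
for word, count in Counter(errors).most_common(10):
    print(word, count)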

Make corrections

# Read the data
df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, "text")
# Start corrections
ob.correct_words()

To save

# Read the data
df = pd.read_csv("Train.csv")
# Initialize the model with a save path
ob = spell_checker(df, "text", save_path="correct_train_data.csv")
# The corrected dataframe is saved to save_path once correction ends (or when -999 is typed)
ob.correct_words()

Future Ideas

  • Will be adding automated, contextual error corrections

Feature Request

Drop me an email at atif.hit.hassan@gmail.com if you want any particular feature

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

manual_spellchecker-1.2.tar.gz (14.1 kB)

Uploaded Source

Built Distribution

manual_spellchecker-1.2-py3-none-any.whl (14.3 kB)

Uploaded Python 3

File details

Details for the file manual_spellchecker-1.2.tar.gz.

File metadata

  • Download URL: manual_spellchecker-1.2.tar.gz
  • Upload date:
  • Size: 14.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.4

File hashes

Hashes for manual_spellchecker-1.2.tar.gz
  • SHA256: 5b24731b9f905b2ea29c0d15a21c5a6162e22a8db3a357afb50175ab08e939b1
  • MD5: 3e031733d503a08a85ccdb1171560bda
  • BLAKE2b-256: 6934ab82f6526bdc1b45953751b9d8f82521aaef36146c9086724fa83459a375

See more details on using hashes here.

File details

Details for the file manual_spellchecker-1.2-py3-none-any.whl.

File metadata

  • Download URL: manual_spellchecker-1.2-py3-none-any.whl
  • Upload date:
  • Size: 14.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.4

File hashes

Hashes for manual_spellchecker-1.2-py3-none-any.whl
  • SHA256: c3dde9011e5f657b4a72d4f8dbff87220aae85fb6f75294cfea2501feceac7b9
  • MD5: ac5cda2ee8dece5f0b28acea1adb490c
  • BLAKE2b-256: be0e25f455176ff6f26bcd4379530143c7d6f9f621767d9d543972e7d404324a

See more details on using hashes here.
