Manual Spell Checker

A manual spell checker built on pyenchant that allows you to swiftly correct misspelled words.

Why does this exist?

While I was working on a text-based multi-class classification competition, I noticed that the data contained a lot of misspelled words, errors that the automated spell-check packages available couldn't fix. This was because the data had been compiled from a survey of people who weren't native English speakers. As there weren't many samples in the dataset (~1000), I decided to write some code to detect spelling errors automatically so that I could then fix them manually, and thus this package was born.

How to install?

pip install manual_spellchecker

How to use it?

Parameters

dataframe - Takes a pandas dataframe as input
column_names - Pass the column names upon which you want to perform spelling correction
tokenizer=None - Pass your favourite tokenizer function, e.g. one from nltk or spacy. (Default: splits on space)
num_n_words_dis=5 - This decides how many neighbouring words to display on either side of the error
save_path=None - If a save path is provided, the final corrected dataframe is saved as a csv. (Default: the dataframe is not saved externally)
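
To make num_n_words_dis more concrete, here is a minimal sketch of how a window of neighbouring words around a suspected error could be put together. The helper name context_window and the square-bracket marking are hypothetical, chosen only for illustration; the package's own display logic may well differ.

```python
def context_window(tokens, index, n=5):
    """Return up to n tokens on either side of tokens[index],
    with the suspected error wrapped in square brackets."""
    start = max(0, index - n)               # clamp at the start of the text
    end = min(len(tokens), index + n + 1)   # clamp at the end of the text
    window = list(tokens[start:end])
    window[index - start] = f"[{tokens[index]}]"
    return " ".join(window)


tokens = "the quick brown fox jumsp over the lazy dog".split()
print(context_window(tokens, tokens.index("jumsp"), n=2))
# brown fox [jumsp] over the
```

A larger window gives more context for deciding on the right correction, at the cost of more text to read per error.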

Functions

spell_check - Prints the total number of suspected errors
get_all_errors - Returns a list of all the suspected errors
correct_words - Starts the process of manual correction

Important Note: Type -999 into the input box to stop the error correction and save the current progress (if save_path is provided).

P.S.: As the package is built on pyenchant, it also provides suggestions while performing corrections.
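
The stop-on--999 behaviour described above can be pictured with a small, self-contained loop. This is only a sketch under stated assumptions, not the package's code: correct_tokens, KNOWN_WORDS and the ask parameter are hypothetical names, and difflib from the standard library stands in for the suggestions that pyenchant would supply.

```python
import difflib

STOP_TOKEN = "-999"  # sentinel that ends the correction session early
KNOWN_WORDS = ["people", "survey", "answer", "the"]  # stand-in dictionary


def correct_tokens(tokens, ask=input):
    """Prompt for a replacement for every token not in KNOWN_WORDS.

    Returns the token list with all corrections made before the
    session ended (either naturally or via STOP_TOKEN).
    """
    corrected = list(tokens)
    for i, token in enumerate(tokens):
        if token.lower() in KNOWN_WORDS:
            continue
        # difflib is an assumption here; the package relies on pyenchant
        hints = difflib.get_close_matches(token.lower(), KNOWN_WORDS, n=3)
        reply = ask(f"'{token}' (suggestions: {hints}) -> ").strip()
        if reply == STOP_TOKEN:
            break  # keep the progress made so far
        if reply:  # an empty reply leaves the token unchanged
            corrected[i] = reply
    return corrected


# Canned replies replace keyboard input so the example runs unattended.
replies = iter(["people", "-999"])
result = correct_tokens(["the", "peple", "surevy", "anwser"],
                        ask=lambda prompt: next(replies))
print(result)  # ['the', 'people', 'surevy', 'anwser']
```

Only the first error is fixed because the second reply is the stop value; the remaining tokens are left as they were.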

How to import?

from manual_spellchecker import spell_checker

Quick analysis of the total number of errors

import pandas as pd
df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, "text")
# Quick analysis
ob.spell_check()

Multiple columns can be passed for spelling correction

df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, ["text", "label"])
# Quick analysis
ob.spell_check()

Tokenizers affect the type/number of error(s)

from nltk.tokenize import word_tokenize
df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, "text", tokenizer=word_tokenize)
# Quick analysis
ob.spell_check()
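
Why the tokenizer matters can be seen without any external packages. In this rough sketch, a tiny hard-coded vocabulary stands in for the pyenchant dictionary, and suspected_errors and word_only_tokenizer are hypothetical helpers, not part of manual_spellchecker. Splitting on spaces leaves punctuation attached to words, so correctly spelled words can be flagged.

```python
import re

# Tiny stand-in vocabulary; the real package consults a pyenchant dictionary.
KNOWN_WORDS = {"the", "survey", "was", "short", "and", "clear"}


def suspected_errors(text, tokenizer=None):
    """Return the tokens that are not found in KNOWN_WORDS."""
    tokens = tokenizer(text) if tokenizer else text.split(" ")
    return [t for t in tokens if t.lower() not in KNOWN_WORDS]


def word_only_tokenizer(text):
    """Hypothetical tokenizer keeping only runs of letters and apostrophes."""
    return re.findall(r"[A-Za-z']+", text)


sentence = "The survey was short, and cleer."
print(suspected_errors(sentence))                       # ['short,', 'cleer.']
print(suspected_errors(sentence, word_only_tokenizer))  # ['cleer']
```

With the default split, "short," is reported even though the word itself is fine; a tokenizer that separates punctuation leaves only the genuine misspelling.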

Get a list of all the errors

df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, "text")
# Get all the errors as a list
ob.get_all_errors()

Make corrections

df = pd.read_csv("Train.csv")
# Initialize the model
ob = spell_checker(df, "text")
# Perform correction
ob.correct_words()

To save

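df = pd.read_csv("Train.csv")
# Initialize the model with a save path
ob = spell_checker(df, "text", save_path="correct_train_data.csv")
# Perform correction; the corrected dataframe is saved to save_path as a csv
ob.correct_words()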

Future Ideas

Will be adding automated, contextual error corrections

Feature Request

Drop me an email at atif.hit.hassan@gmail.com if you want any particular feature

Release history

1.2 - Jun 16, 2020
1.1 - Jun 16, 2020
1.0 - May 27, 2020 (this version)
