Spello: Fast and Smart Spell Correction
A fast and smart spell-correction library that combines sound-based and edit-distance-based correction, available for English and 10 Indian languages.
What is it • Installation • Getting Started
What is it
Spello is a spell-correction model that combines two models, Phoneme and SymSpell, to find the best possible suggestion for misspelled words in a text. The Phoneme model uses the Soundex algorithm under the hood and suggests corrections for sound-related mistakes. A modified version of the SymSpell model provides suggestions based on edit distance.
Currently, this module is available for English (en) and 10 Indian languages: Hindi (hi), Marathi (mr), Bengali (bn), Punjabi (pa), Gujarati (gu), Oriya (or), Tamil (ta), Telugu (tl), Kannada (kn), and Malayalam (ml).
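To make the two ideas concrete, here is a minimal sketch of textbook American Soundex and plain Levenshtein edit distance. It is not spello's internal code: the function names `soundex` and `edit_distance` are made up for this illustration, and SymSpell-style correctors often use a variant of edit distance that also counts a transposition as a single edit.

```python
# Illustrative only: textbook algorithms, not spello's implementation.
SOUNDEX_CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """Return the 4-character American Soundex code of an English word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    code = word[0].upper()
    prev = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if digit and digit != prev:
            code += digit
        # 'h' and 'w' do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = digit
    return (code + "000")[:4]


def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


print(soundex("naem"), soundex("name"))   # N500 N500: same sound code
print(edit_distance("naman", "aman"))     # 1: one deleted character
```

In this sketch, 'naem' and 'name' share a Soundex code, so a sound-based lookup can pair them. 'naman' and 'aman' start with different letters and get different codes, but they are only one edit apart, which is the kind of mistake an edit-distance model can catch.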
💾 Installation
Install spello via `pip`:
$ pip install spello
⚡ Getting Started
1. Model Initialisation
Initialise the model for one of the supported languages.
>>> from spello.model import SpellCorrectionModel
>>> sp = SpellCorrectionModel(language='en')
2. Model Training
You can provide two types of training data to the model:
- A list of texts or sentences.
- A dict mapping each word to its count.
Training with a list of sentences
>>> sp.train(['my name is aman', 'this is a text corpus'])
Training with a word counter
>>> sp.train({'my': 2, 'name': 1, 'aman': 1, 'text': 10, 'corpus': 5})
A list of texts is the recommended type of training data, because the model then also tries to learn the context in which words appear. This context helps it pick the best suggestion when the SymSpell or Phoneme model returns more than one candidate.
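To see what the word-count form looks like for a given corpus, the counts can be derived with the standard library. This is only a sketch: `build_word_counts` is a hypothetical helper, and lowercasing plus whitespace splitting is an assumed tokenisation that may differ from what spello does internally.

```python
from collections import Counter


def build_word_counts(sentences):
    """Count whitespace-separated, lowercased tokens across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return dict(counts)


corpus = ['my name is aman', 'this is a text corpus']
word_counts = build_word_counts(corpus)
print(word_counts['is'])    # 2
print(word_counts['aman'])  # 1
```

The resulting dict has the same word-to-count shape as the training dict described above. It carries no word order, which is why training on the sentences themselves gives the model more to work with.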
3. Model Prediction
>>> sp.spell_correct('my naem is naman')
{'original_text': 'my naem is naman',
'spell_corrected_text': 'my name is aman',
'correction_dict': {'naem': 'name', 'naman': 'aman'}
}
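The returned dict can be post-processed like any other Python dict. As a small sketch, the hypothetical helper `summarize_corrections` below lists each replacement; it runs on a copy of the result shown above rather than calling the model.

```python
def summarize_corrections(result):
    """Return one 'wrong -> right' string per entry in correction_dict."""
    return [f"{wrong} -> {right}"
            for wrong, right in result['correction_dict'].items()]


# Copy of the example output above, used here as plain data.
result = {
    'original_text': 'my naem is naman',
    'spell_corrected_text': 'my name is aman',
    'correction_dict': {'naem': 'name', 'naman': 'aman'},
}

for line in summarize_corrections(result):
    print(line)
```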
4. Save Model
Call the save method to save the trained model to the given model directory.
>>> sp.save(model_save_dir='/home/ubuntu/')
'/home/ubuntu/model.pkl' # saved model path
5. Load Model
To load a trained model from a saved path, first initialise the model and then call the load method.
>>> from spello.model import SpellCorrectionModel
>>> sp = SpellCorrectionModel(language='en')
>>> sp.load('/home/ubuntu/model.pkl')
6. Customize Configuration of Model
You can also customize various configuration options of the model, such as:
- Setting the minimum and maximum word length eligible for spell correction
>>> sp.config.min_length_for_spellcorrection = 2 # default is 3
>>> sp.config.max_length_for_spellcorrection = 20 # default is 15
- Setting the maximum edit distance allowed for each word length, for the SymSpell and Phoneme models
>>> sp.config.symspell_allowed_distance_map = {2: 0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4, 9: 5, 10: 5, 11: 5, 12: 5, 13: 6, 14: 6, 15: 6, 16: 6, 17: 6, 18: 6, 19: 6, 20: 6}
# the dict above means the max edit distance for a word of length 6 is 3, for length 7 is 4, and so on
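When the length limits and the distance map are changed together, it can help to check that the map has an entry for every eligible word length. The sketch below is plain Python and does not touch spello; `missing_lengths` is a hypothetical helper, and how spello treats a length that is absent from the map is not stated here, so the check is only a precaution.

```python
def missing_lengths(distance_map, min_length, max_length):
    """Return word lengths in [min_length, max_length] with no map entry."""
    return [n for n in range(min_length, max_length + 1)
            if n not in distance_map]


# Same values as the symspell_allowed_distance_map example above.
distance_map = {2: 0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4, 9: 5, 10: 5,
                11: 5, 12: 5, 13: 6, 14: 6, 15: 6, 16: 6, 17: 6, 18: 6,
                19: 6, 20: 6}

print(missing_lengths(distance_map, 2, 20))  # []
print(missing_lengths(distance_map, 2, 22))  # [21, 22]
```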
To restore the default config:
>>> sp.set_default_config()
There are many more configurations you can set; check this file for more details.
Download Models
To get started, here are a few simple models. They are trained on 30K news + 30K Wikipedia sentences.
To train a model for other languages, you can download data from here and follow the training process.
Credits
This software uses the following open source packages:
This project follows the all-contributors specification. Contributions of any kind welcome!
Please read the contribution guidelines first.
Citing
If you use spello in a scientific publication, we would appreciate references to the following BibTeX entry:
@misc{haptik2020spello,
title={spello},
author={Srivastava, Aman},
howpublished={\url{https://github.com/hellohaptik/spello}},
year={2020}
}