Spello: Fast and Smart Spell Correction

A fast and smart spell correction library using sound-based and edit-distance-based correction, available in English and 10 Indian languages.

What is it

Spello is a spell correction model that combines two models, Symspell and Phoneme, in the backend to get the best possible spelling suggestions for misspelled words in a text. The Phoneme model uses the Soundex algorithm in the background and suggests corrections for sound-related mistakes in words. A modified version of the Symspell model provides suggestions based on edit distance.
Currently, this module is available for English(en) and 10 Indian languages: Hindi(hi), Marathi(mr), Bengali(bn), Punjabi(pa), Gujarati(gu), Oriya(or), Tamil(ta), Telugu(tl), Kannada(kn), Malayalam(ml).
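
To give a feel for what "sound related mistakes" means, here is a minimal sketch of the classic American Soundex encoding in plain Python. It is an illustration only: the function name `soundex` is hypothetical, and spello's Phoneme model may implement or adapt the algorithm differently, especially for Indian languages.

```python
def soundex(word: str) -> str:
    """Return a 4-character American Soundex code (illustrative sketch)."""
    # Consonant groups that share a digit; vowels, h, w and y have no digit.
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for chars, digit in groups for ch in chars}

    chars = [ch for ch in word.lower() if ch.isalpha()]
    if not chars:
        return ""

    result = chars[0].upper()          # the first letter is kept as-is
    prev = codes.get(chars[0], "")     # digit of the previous coded letter
    for ch in chars[1:]:
        if ch in "hw":
            continue                   # h and w do not break a run of equal digits
        digit = codes.get(ch, "")      # vowels map to "" and reset the run
        if digit and digit != prev:
            result += digit
        prev = digit
    return (result + "000")[:4]        # pad with zeros, truncate to 4 characters


code_naem = soundex("naem")
code_name = soundex("name")
```

Words that sound alike get the same code, so `naem` and `name` both encode to `N500`, which is how a sound-based model can propose one as a correction for the other.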

💾 Installation

Install spello via `pip`:
$ pip install spello

⚡ Getting Started

1. Model Initialisation

Initialise the model for one of the supported languages.

>>> from spello.model import SpellCorrectionModel  
>>> sp = SpellCorrectionModel(language='en')  

2. Model Training

You can provide two types of training data to the model:

  • A list of texts or sentences.
  • A dict mapping words to their corresponding counts.

Training with a list of sentences

>>> sp.train(['my name is aman', 'this is a text corpus'])

Training with a word counter

>>> sp.train({'my': 2, 'name': 1, 'aman': 1, 'text': 10, 'corpus': 5})

A list of texts is the recommended type of training data, because the model then also tries to learn the context in which words appear. This context helps it pick the best suggestion when the symspell or phoneme model returns more than one candidate.
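
If only word frequencies can be kept (for example, when the raw corpus cannot be stored), a word-count dict of the shape shown above can be derived from sentences with the standard library. This is a minimal sketch: the helper name `count_words` is hypothetical, and lowercasing plus whitespace splitting is an assumed tokenisation, not necessarily the one spello applies internally.

```python
from collections import Counter


def count_words(sentences):
    """Build a {word: count} dict from an iterable of sentences."""
    counts = Counter()
    for sentence in sentences:
        # Assumed tokenisation: lowercase, then split on whitespace.
        counts.update(sentence.lower().split())
    return dict(counts)


word_counts = count_words(['my name is aman', 'this is a text corpus'])
# word_counts has the same shape as the dict passed to sp.train(...) above.
```

Note that a dict built this way no longer carries any word order, so the context information described above is lost.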

3. Model Prediction

>>> sp.spell_correct('my naem is naman')  
{'original_text': 'my naem is naman',
 'spell_corrected_text': 'my name is aman',
 'correction_dict': {'naem': 'name', 'naman': 'aman'}
}
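
The returned value is a plain dict, so it can be post-processed with ordinary Python. The sketch below works on the example output above, copied in as a literal; the helper name `summarize_corrections` is hypothetical and not part of spello.

```python
# Example output copied from the spell_correct call above.
result = {
    'original_text': 'my naem is naman',
    'spell_corrected_text': 'my name is aman',
    'correction_dict': {'naem': 'name', 'naman': 'aman'},
}


def summarize_corrections(result):
    """List each applied correction as 'wrong -> right'."""
    return [f"{wrong} -> {right}"
            for wrong, right in result['correction_dict'].items()]


changes = summarize_corrections(result)
```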

4. Save Model

Call the save method to save the trained model to the given model directory.

>>> sp.save(model_save_dir='/home/ubuntu/')
'/home/ubuntu/model.pkl' # saved model path

5. Load Model

Load the trained model from the saved path. First initialise the model, then call the load method.

>>> from spello.model import SpellCorrectionModel
>>> sp = SpellCorrectionModel(language='en')
>>> sp.load('/home/ubuntu/model.pkl')

6. Customize Configuration of Model

You can also customize various configuration options of the model, for example:

  1. Setting the minimum and maximum word length eligible for spell correction
>>> sp.config.min_length_for_spellcorrection = 2 # default is 3
>>> sp.config.max_length_for_spellcorrection = 20 # default is 15
  2. Setting the maximum edit distance allowed for each word length for the symspell and phoneme models
>>> sp.config.symspell_allowed_distance_map = {2: 0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4, 9: 5, 10: 5, 11: 5, 12: 5, 13: 6, 14: 6, 15: 6, 16: 6, 17: 6, 18: 6, 19: 6, 20: 6}
# the dict above means the max edit distance allowed for a word of length 6 is 3, for length 7 is 4, and so on
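
To make the effect of these settings concrete, here is a rough sketch of how a length window and a length-to-distance map could gate candidate corrections. It is not spello's internal logic: the names `levenshtein`, `is_acceptable`, `ALLOWED`, `MIN_LEN` and `MAX_LEN` are hypothetical, plain Levenshtein distance is assumed (a Symspell-style model may count transpositions differently), and the map is assumed to be keyed by the length of the input word.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Shortened copy of the map above, for illustration only.
ALLOWED = {2: 0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4}
MIN_LEN, MAX_LEN = 2, 20


def is_acceptable(word, candidate, allowed=ALLOWED):
    """Accept a candidate only if the word is eligible and close enough."""
    if not (MIN_LEN <= len(word) <= MAX_LEN):
        return False                      # outside the eligible length window
    max_dist = allowed.get(len(word), 0)  # unknown lengths allow no edits
    return levenshtein(word, candidate) <= max_dist
```

Under these assumptions, `is_acceptable('naem', 'name')` is true (length 4 allows distance 2), while `is_acceptable('is', 'it')` is false because words of length 2 allow no edits.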

To reset to the default config:

>>> sp.set_default_config()

Many more configuration options can be set; check this file for more details.

Download Models

To get started, here are a few simple models. They are trained on 30K news + 30K Wikipedia sentences.

To train a model for other languages, you can download data from here and follow the training process.

Credits

This software uses the following open source packages:

This project follows the all-contributors specification. Contributions of any kind welcome!

Please read the contribution guidelines first.

Citing

If you use spello in a scientific publication, we would appreciate references to the following BibTeX entry:

@misc{haptik2020spello,
  title={spello},
  author={Srivastava, Aman},
  howpublished={\url{https://github.com/hellohaptik/spello}},
  year={2020}
}
