
Spello: Fast and Smart Spell Correction


A fast and accurate spell-correction library that combines sound-based (phonetic) and edit-distance-based correction, available for English and Hindi.


What is it

Spello is a spell-correction model built from a combination of two models, Phoneme and Symspell. The Phoneme model uses the Soundex algorithm in the background and suggests correct spellings by using phonetic concepts to identify similar-sounding words. The Symspell model, on the other hand, uses the concept of edit distance to suggest correct spellings. Spello gets you the best of both, and also takes the context of the word into consideration.
Currently, this module is available for English (en) and Hindi (hi).
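
As a rough illustration of the two ideas, the sketch below implements a simplified Soundex code and a Levenshtein edit distance in plain Python. It is not spello's internal code: the function names `soundex` and `edit_distance` are chosen here for illustration only, and the library's own phoneme and symspell models may differ in detail.

```python
# Letter groups of the classic American Soundex scheme.
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(word):
    """Return a 4-character Soundex code (illustrative, English letters only)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    digits = []
    previous = _SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # vowels reset the previous code
    return (word[0].upper() + "".join(digits) + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current_row = [i]
        for j, cb in enumerate(b, 1):
            current_row.append(min(previous_row[j] + 1,        # deletion
                                   current_row[j - 1] + 1,     # insertion
                                   previous_row[j - 1] + (ca != cb)))  # substitution
        previous_row = current_row
    return previous_row[-1]


print(soundex("plai"), soundex("play"))   # P400 P400
print(edit_distance("wnt", "want"))       # 1
```

Here "plai" and "play" sound alike and share a Soundex code, while "wnt" is one edit away from "want", so each idea catches a different kind of mistake.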

💾 Installation


Install spello via `pip`
$ pip install spello

You can either train a new model from scratch or use a pre-trained model. Alternatively, you can train a model for your domain and use it with priority, while using a pre-trained model as a fallback.
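
One possible way to combine a domain model with a pre-trained fallback is sketched below. This is an assumption about how you might wire the two together, not a feature of spello itself: the helper `correct_with_fallback` is hypothetical, and `StubModel` is only a stand-in that mimics the `spell_correct()` result shape shown in this README so the sketch runs without trained models.

```python
class StubModel:
    """Stand-in for a trained model; replace with real SpellCorrectionModel objects."""

    def __init__(self, fixes):
        self.fixes = fixes

    def spell_correct(self, text):
        words = text.split()
        return {
            'original_text': text,
            'spell_corrected_text': ' '.join(self.fixes.get(w, w) for w in words),
            'correction_dict': {w: self.fixes[w] for w in words if w in self.fixes},
        }


def correct_with_fallback(text, domain_model, general_model):
    """Apply the domain model first, then pass its output to the general model."""
    first = domain_model.spell_correct(text)
    second = general_model.spell_correct(first['spell_corrected_text'])
    corrections = dict(first['correction_dict'])
    corrections.update(second['correction_dict'])
    return {
        'original_text': text,
        'spell_corrected_text': second['spell_corrected_text'],
        'correction_dict': corrections,
    }


domain = StubModel({'kricket': 'cricket'})
general = StubModel({'wnt': 'want'})
result = correct_with_fallback('i wnt kricket', domain, general)
print(result['spell_corrected_text'])  # i want cricket
```

Note that with real models the second pass could also alter words the domain model already handled, so check the merged `correction_dict` on your own data before relying on this chaining.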

⚡️ Getting Started


1. Model Initialisation

Initialise the model for one of the supported languages.

>>> from spello.model import SpellCorrectionModel  
>>> sp = SpellCorrectionModel(language='en')  

2. Model Training - New Model

You can train the model by providing data in one of the following formats:

  • A list of texts or sentences.
  • A dict mapping each word to its count.

Training with a list of sentences

>>> sp.train(['I want to play cricket', 'this is a text corpus'])

Training with a word counter

>>> sp.train({'i': 2, 'want': 1, 'play': 1, 'cricket': 10, 'mumbai': 5})

A list of text is the recommended type of training data, because the model then also tries to learn the context in which words appear. This context helps it pick the best suggestion when the symspell or phoneme model proposes more than one candidate.
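
To see why sentences carry more information than a word counter, the sketch below derives both word counts and adjacent-word pair counts from the same sentences using only the standard library. The lowercasing, whitespace tokenisation and use of word pairs are assumptions made for illustration; the README does not say how spello represents context internally.

```python
from collections import Counter

sentences = ['I want to play cricket', 'this is a text corpus']

# Naive tokenisation: lowercase and split on whitespace (an assumption).
tokenized = [sentence.lower().split() for sentence in sentences]

# What a word counter can express: how often each word occurs.
word_counts = Counter(word for tokens in tokenized for word in tokens)

# What only sentences can express: which words appear next to each other.
pair_counts = Counter(pair for tokens in tokenized
                      for pair in zip(tokens, tokens[1:]))

print(word_counts['cricket'])              # 1
print(pair_counts[('play', 'cricket')])    # 1
```

A word counter can always be computed from sentences, but the neighbouring-word information cannot be recovered from a counter alone.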

3. Model Prediction

>>> sp.spell_correct('i wnt to plai kricket')  
{'original_text': 'i wnt to plai kricket',
 'spell_corrected_text': 'i want to play cricket',
 'correction_dict': {'wnt': 'want', 'plai': 'play', 'kricket': 'cricket'}
}
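
The returned dict can be post-processed with ordinary Python. As a small sketch, the hypothetical helper `list_corrections` below (not part of spello) lists the corrected words in the order they appear in the original text, using the result shown above as input.

```python
def list_corrections(result):
    """Return (original, corrected) pairs in the order of the original text."""
    corrections = result['correction_dict']
    return [(word, corrections[word])
            for word in result['original_text'].split()
            if word in corrections]


# Result copied from the example above.
result = {
    'original_text': 'i wnt to plai kricket',
    'spell_corrected_text': 'i want to play cricket',
    'correction_dict': {'wnt': 'want', 'plai': 'play', 'kricket': 'cricket'},
}

for original, corrected in list_corrections(result):
    print(original, '->', corrected)
```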

4. Save Model

Call the save method to save the trained model to the given model directory.

>>> sp.save(model_save_dir='/home/ubuntu/')
'/home/ubuntu/model.pkl' # saved model path

5. Load Model

To load a trained model from its saved path, first initialise the model and then call the load method.

>>> from spello.model import SpellCorrectionModel
>>> sp = SpellCorrectionModel(language='en')
>>> sp.load('/home/ubuntu/model.pkl')

6. Customize Configuration of Model (Optional)

You can also customize various configuration settings of the model, such as:

  1. Setting the minimum and maximum word length eligible for spell correction
>>> sp.config.min_length_for_spellcorrection = 4 # default is 3
>>> sp.config.max_length_for_spellcorrection = 12 # default is 15
  2. Setting the maximum edit distance allowed for each word length, for the symspell and phoneme models
>>> sp.config.symspell_allowed_distance_map = {2: 0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4, 9: 5, 10: 5, 11: 5, 12: 5, 13: 6, 14: 6, 15: 6, 16: 6, 17: 6, 18: 6, 19: 6, 20: 6}
# the dict above means the max edit distance allowed for a word of length 6 is 3, for length 7 is 4, and so on
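
The way these settings interact can be sketched as a simple lookup. The helper `allowed_edit_distance` below is hypothetical and only mirrors the description above; treating the length bounds as inclusive and returning 0 for lengths missing from the map are assumptions, not documented spello behaviour.

```python
# Distance map copied from the example above: word length -> max edit distance.
DISTANCE_MAP = {2: 0, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4, 9: 5, 10: 5,
                11: 5, 12: 5, 13: 6, 14: 6, 15: 6, 16: 6, 17: 6, 18: 6,
                19: 6, 20: 6}


def allowed_edit_distance(word, distance_map, min_length=4, max_length=12):
    """Return the max edit distance for a word, or None if it is not eligible."""
    length = len(word)
    if length < min_length or length > max_length:
        return None  # outside the configured length range (assumed inclusive)
    return distance_map.get(length, 0)  # assumption: unknown lengths allow no edits


print(allowed_edit_distance('kricket', DISTANCE_MAP))  # 4
print(allowed_edit_distance('wnt', DISTANCE_MAP))      # None
```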

To reset to default config

>>> sp.set_default_config()

There are many more configuration options you can set; check this file for more details.

Get Started with Pre-trained Models

We have trained basic models on 30K news + 30K Wikipedia sentences.
Follow the steps below to get started with these models.

  1. Download a pretrained model from the links below

    language  model             size  md5 hash
    en        en.pkl.zip        84M   ec55760a7e25846bafe90b0c9ce9b09f
    en        en_large.pkl.zip  284M  9a4f5069b2395c9d5a1e8b9929e0c0a9
    hi        hi.pkl.zip        75M   ad8681161932fdbb8b1368bb16b9644a
    hi        hi_large.pkl.zip  341M  0cc73068f88a73612e7dd84434ad61e6

  2. Unzip the downloaded file

  3. Initialise and load the model by specifying the path of the unzipped file

>>> from spello.model import SpellCorrectionModel
>>> sp = SpellCorrectionModel(language='en')
>>> sp.load('/path/to/file/en.pkl')
  4. Run the spell correction
>>> sp.spell_correct('i wnt to plei futbal')
{'original_text': 'i wnt to plei futbal',
 'spell_corrected_text': 'i want to play football',
 'correction_dict': {'wnt': 'want', 'plei': 'play', 'futbal': 'football'}
}
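
Before unzipping, you may want to check the download against the md5 hash listed in step 1. The sketch below uses only the standard library; the helper names `md5_of_file` and `verify_and_unzip` are hypothetical, and the file paths in the usage comment are placeholders.

```python
import hashlib
import zipfile
from pathlib import Path

# MD5 hashes copied from the table in step 1.
EXPECTED_MD5 = {
    'en.pkl.zip': 'ec55760a7e25846bafe90b0c9ce9b09f',
    'en_large.pkl.zip': '9a4f5069b2395c9d5a1e8b9929e0c0a9',
    'hi.pkl.zip': 'ad8681161932fdbb8b1368bb16b9644a',
    'hi_large.pkl.zip': '0cc73068f88a73612e7dd84434ad61e6',
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_unzip(zip_path, target_dir):
    """Check the archive against the listed hash, then extract it."""
    zip_path = Path(zip_path)
    expected = EXPECTED_MD5[zip_path.name]
    actual = md5_of_file(zip_path)
    if actual != expected:
        raise ValueError(f'MD5 mismatch for {zip_path.name}: {actual} != {expected}')
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return Path(target_dir)


# Example usage (placeholder paths):
# verify_and_unzip('/path/to/file/en.pkl.zip', '/path/to/file/')
```

MD5 is used here only because it is the hash the table publishes; it detects corrupted downloads but is not a strong defence against tampering.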

To train a model for another language, you can download data from here and follow the training process described above.

Credits

This software uses the following open source packages:

Contribution guidelines

This project follows the all-contributors specification. Contributions of any kind welcome!

Please read the contribution guidelines first.

Future Scope / Limitations

One limitation of the current model is that it does not suggest corrections for grammatical mistakes or for words that are already in the model's vocabulary. For example, in the sentence "I want to by Apple", it will not suggest any correction for "by", because it is a valid English word, even though the correct replacement is "buy".

In a future release, we will add features to suggest corrections for words that are used improperly in a sentence.
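
This limitation can be pictured with a toy vocabulary check. The sketch below is not spello's code: the function `words_considered_for_correction` and the tiny vocabulary are made up for illustration, and simply show that a word already in the vocabulary is never flagged.

```python
def words_considered_for_correction(text, vocabulary):
    """Return the words a vocabulary-based corrector would even look at."""
    return [word for word in text.lower().split() if word not in vocabulary]


# Toy vocabulary for illustration only.
vocabulary = {'i', 'want', 'to', 'by', 'buy', 'apple'}

print(words_considered_for_correction('I want to by Apple', vocabulary))   # []
print(words_considered_for_correction('I wnt to by Apple', vocabulary))    # ['wnt']
```

In both sentences "by" passes unnoticed, because detecting it would require understanding the sentence rather than checking spelling.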

Citing


If you use spello in a scientific publication, we would appreciate references to the following BibTeX entry:

@misc{haptik2020spello,
  title={spello},
  author={Srivastava Aman, Reddy SL Ruthvik },
  howpublished={\url{https://github.com/hellohaptik/spello}},
  year={2020}
}

