spaCy pipeline component for spelling correction using Symspell.
# spaCy Symspell
## Spelling correction implementation in spaCy via Symspell
This package is a [spaCy 2.0 extension](https://spacy.io/usage/processing-pipelines#section-extensions) that adds sentence and spelling correction via Symspell to spaCy's text processing pipeline.
## Installation
    pip install spacy_symspell
## Notes
This package is still in alpha, and there may be unforeseen errors. Dictionary loading time is also significant: it can take up to 30 seconds on slow machines.
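Because loading is slow, it usually makes sense to build the pipeline once per process and reuse it. The following is a minimal sketch of that idea using only the standard library. `build_pipeline` and `get_pipeline` are hypothetical names, not part of spacy_symspell, and the placeholder object stands in for the real `nlp` object.

```python
from functools import lru_cache

load_count = 0  # how many times the slow load actually ran


def build_pipeline():
    """Hypothetical stand-in for the slow setup step.

    In real use this body would hold the setup shown under Usage:
    spacy.load(...), SpellingCorrector(), nlp.add_pipe(...).
    """
    global load_count
    load_count += 1
    return object()  # placeholder for the loaded nlp object


@lru_cache(maxsize=1)
def get_pipeline():
    """Build the pipeline on the first call, then return the cached instance."""
    return build_pipeline()


first = get_pipeline()
second = get_pipeline()
print(first is second, load_count)
```

Every caller that goes through `get_pipeline()` shares one instance, so the dictionary loading cost is paid only on the first call.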
## Usage
Adding the component to the processing pipeline is relatively simple:
    import spacy
    from spacy_symspell import SpellingCorrector

    # Load a spaCy model and append the corrector to its pipeline
    nlp = spacy.load('en_core_web_sm')
    corrector = SpellingCorrector()
    nlp.add_pipe(corrector)
    doc = nlp('What doyuoknowabout antyhing')

    for s in doc._.suggestions:  # iterable
        print(s)
        # What doyon about anything

    doc._.segmentation
    # ::segmented_string - What doyouk now about antyhing
    # ::corrected_string - that dook now about anything
spacy_symspell operates on spaCy `Doc` and `Span` objects. When the component is called on a `Doc` or `Span`, the object is given two attributes, both available under `._.`: `suggestions` (a list of all spelling suggestions found) and `segmentation` (a corrected sentence for input with omitted spaces).
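To make the idea of segmentation concrete, here is a minimal sketch of splitting text with omitted spaces using a word-frequency table and dynamic programming. It is an illustration only, not the algorithm Symspell uses. The `segment` function, the penalty value, and the tiny frequency table are all assumptions made for the example.

```python
import math


def segment(text, frequencies, max_word_length=10):
    """Split text into the most probable word sequence (illustrative only)."""
    total = sum(frequencies.values())
    # best[i] holds (score, words) for the best split of text[:i]
    best = [(0.0, [])]
    for end in range(1, len(text) + 1):
        options = []
        for start in range(max(0, end - max_word_length), end):
            piece = text[start:end]
            if piece in frequencies:
                score = math.log(frequencies[piece] / total)
            else:
                # Unknown pieces are allowed but heavily penalised per character
                score = -20.0 * len(piece)
            prev_score, prev_words = best[start]
            options.append((prev_score + score, prev_words + [piece]))
        best.append(max(options, key=lambda option: option[0]))
    return best[-1][1]


# Made-up frequencies, for illustration only
FREQUENCIES = {"what": 60, "do": 90, "you": 95, "know": 80,
               "now": 85, "about": 70, "a": 100}

print(segment("doyouknowabout", FREQUENCIES))
```

With this toy table, the only split made entirely of known words is `do you know about`, so it scores highest. Misspelled input such as `doyuoknowabout` is harder, because segmentation and correction then have to be solved together.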
## Todo
Symspell accuracy can be improved with the help of spaCy by extracting and analyzing the resulting n-grams and cross-referencing them with the n-grams deducible from the character groups in the Symspell result. For example, the correction ‘that dook now’ leaves us with a verbless sentence; closer analysis reveals that the character group ‘now’ is related to the verb ‘know’, and that ‘know’ is associated with the n-gram ‘you know’.
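One piece of that idea, ranking candidate corrections by how familiar their word pairs are, could be sketched as follows. This is not implemented in the package. The function names, the bigram counts, and the second candidate sentence are invented for illustration; a real version would take its counts from a corpus and its candidates from Symspell and spaCy.

```python
def bigrams(words):
    """Return the adjacent word pairs of a token list."""
    return list(zip(words, words[1:]))


def score(sentence, bigram_counts):
    """Sum the counts of every bigram in the sentence; unseen pairs count as 0."""
    words = sentence.lower().split()
    return sum(bigram_counts.get(pair, 0) for pair in bigrams(words))


def rerank(candidates, bigram_counts):
    """Order candidate corrections from most to least plausible."""
    return sorted(candidates, key=lambda s: score(s, bigram_counts), reverse=True)


# Made-up counts, for illustration only
BIGRAM_COUNTS = {
    ("do", "you"): 120,
    ("you", "know"): 95,
    ("know", "about"): 40,
    ("about", "anything"): 12,
    ("now", "about"): 3,
}

CANDIDATES = [
    "that dook now about anything",
    "what do you know about anything",
]

print(rerank(CANDIDATES, BIGRAM_COUNTS)[0])
```

With these counts the candidate containing ‘you know’ ranks first, which is the behaviour the Todo item describes.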
## Under the hood
[spacy_symspell](https://github.com/xwiz/spacy_symspell) is currently a wrapper of the [python port](https://github.com/mammothb/symspellpy) for [Symspell](https://github.com/wolfgarbe/SymSpell). For additional details, see the linked project pages.
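For intuition, Symspell is built around a symmetric delete approach: delete-only variants of every dictionary term are precomputed, and a lookup generates the same kind of variants for the input word and intersects the two sets. The sketch below illustrates that idea in plain Python. It is not code from Symspell or symspellpy; all names are hypothetical, the frequencies are made up, and it verifies candidates with plain Levenshtein distance, whereas the real libraries may use a different distance measure.

```python
from itertools import combinations


def deletes(word, max_distance):
    """Return word plus every string made by deleting up to max_distance characters."""
    results = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            results.add("".join(word[i] for i in keep))
    return results


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


def build_index(frequencies, max_distance=2):
    """Map each delete variant to the dictionary terms that produce it."""
    index = {}
    for term in frequencies:
        for variant in deletes(term, max_distance):
            index.setdefault(variant, set()).add(term)
    return index


def lookup(word, index, frequencies, max_distance=2):
    """Return terms within max_distance, closest first, then most frequent."""
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    scored = [(edit_distance(word, t), -frequencies[t], t) for t in candidates]
    return [t for d, _, t in sorted(scored) if d <= max_distance]


# Made-up frequencies, for illustration only
FREQUENCIES = {"anything": 50, "know": 80, "now": 90, "about": 70}
INDEX = build_index(FREQUENCIES)

print(lookup("antyhing", INDEX, FREQUENCIES))
```

The index is more expensive to build than a plain word list, which is consistent with the loading time mentioned in the notes, but each lookup then only has to generate deletes of the input word.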