spaCy pipeline component for spelling correction using Symspell.
# spaCy Symspell
## Spelling correction implementation in spaCy via Symspell
This package is a [spaCy 2.0 extension](https://spacy.io/usage/processing-pipelines#section-extensions) that adds sentence and spelling correction via Symspell to spaCy’s text processing pipeline.
## Installation
    pip install spacy_symspell

## Notes
This package is still in alpha, and there may be unforeseen errors. Loading the dictionary is also slow: it can take up to 30 seconds on slow machines.
## Usage
Adding the component to the processing pipeline is relatively simple:

    import spacy
    from spacy_symspell import SpellingCorrector

    nlp = spacy.load('en_core_web_sm')
    corrector = SpellingCorrector()
    nlp.add_pipe(corrector)
    doc = nlp('What doyuoknowabout antyhing')

    for s in doc._.suggestions:  # iterable
        print(s)  # What doyon about anything

    doc._.segmentation
    # ::segmented_string - What doyouk now about antyhing
    # ::corrected_string - that dook now about anything

spacy_symspell operates on spaCy Doc and Span objects. When called on a Doc or Span, the object is given two extension attributes: suggestions (a list of all spelling suggestions found) and segmentation (a corrected sentence for cases where spaces were omitted).
## Todo
Symspell accuracy could be improved with the help of spaCy by extracting the n-grams in the result and cross-referencing them with the n-grams that can be deduced from the character groups in the Symspell output. For example, the correction ‘that dook now’ leaves us with a verbless sentence. Closer analysis reveals that the character group ‘now’ is related to the verb ‘know’, and that ‘know’ is associated with the n-gram ‘you know’.
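The n-gram idea can be illustrated with a rough sketch. This is not part of the package: the candidate strings and the `KNOWN_BIGRAMS` table are hypothetical stand-ins, and plain Python string splitting is used in place of spaCy tokenization. It only shows how competing corrections might be ranked by how many familiar bigrams they contain.

```python
def bigrams(tokens):
    """Return the adjacent token pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def score(tokens, known_bigrams):
    """Count how many adjacent pairs appear in the known-bigram table."""
    return sum(1 for pair in bigrams(tokens) if pair in known_bigrams)


def best_candidate(candidates, known_bigrams):
    """Pick the candidate sentence containing the most known bigrams."""
    return max(candidates, key=lambda text: score(text.split(), known_bigrams))


# Hypothetical bigram table; a real one would be derived from a corpus.
KNOWN_BIGRAMS = {
    ("do", "you"),
    ("you", "know"),
    ("know", "about"),
    ("about", "anything"),
}

# Hypothetical candidate corrections for the same input.
candidates = [
    "that dook now about anything",
    "what do you know about anything",
]

print(best_candidate(candidates, KNOWN_BIGRAMS))
```

In this toy setup the second candidate wins because it contains four known bigrams against one for the first.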
## Under the hood
[spacy_symspell](https://github.com/xwiz/spacy_symspell) is currently a wrapper around the [Python port](https://github.com/mammothb/symspellpy) of [Symspell](https://github.com/wolfgarbe/SymSpell). For additional details, see the linked project pages.
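For readers curious about the underlying approach, Symspell is based on symmetric delete lookup: delete variants of the dictionary words are indexed ahead of time, and the delete variants of an input term are matched against that index. The toy sketch below is an illustration only, not the code of Symspell or symspellpy. The function names and vocabulary are made up for the example, and it skips the edit-distance verification and frequency ranking that the real library applies to candidates.

```python
from itertools import combinations


def deletes(word, max_distance):
    """Return the word plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep, preserving their order.
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(vocabulary, max_distance=2):
    """Map each delete variant to the dictionary words that produce it."""
    index = {}
    for word in vocabulary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def candidates(term, index, max_distance=2):
    """Collect dictionary words that share at least one delete variant with the term."""
    found = set()
    for variant in deletes(term, max_distance):
        found |= index.get(variant, set())
    return found


# Toy vocabulary for illustration only.
index = build_index(["anything", "know", "about", "what"])
print(candidates("antyhing", index))
```

Building the index is the expensive part, which is consistent with the slow dictionary loading mentioned in the notes; lookups afterwards are only set and dictionary operations.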