spaCy pipeline component for spelling correction using sysmepll.
Project description
# spaCy Symspell
## Spelling correction implementation in spaCy via Symspell
This package is a [spaCy 2.0 extension](https://spacy.io/usage/processing-pipelines#section-extensions) that adds sentnece/spelling corrections via Symspell to spaCy's text processing pipeline.
## Installation
`pip install spacy_symspell`
## Notes
This package is still in Alpha and there may be unforeseen errors. Performance improvement still needs to be done for the symspell python port.
## Usage
Adding the component to the processing pipeline is relatively simple:
```
import spacy
from spacy_symspell import SpellingCorrector
nlp = spacy.load('en_core_web_sm')
corrector = SpellingCorrector()
nlp.add_pipe(corrector)
doc = nlp('What doyuoknowabout antyhing')
doc._.suggestions #iterable
doc._.segmentation #that dook now about anything
```
spaCy_symspell operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `suggestions` (a list of all found spelling suggestions) and `segmentation` (a corrected sentence in the case of ommitted spaces).
## Todo
Symspell accuracy can be improved with the help of spaCy by extracting and analyzing resulting n-grams and cross-referencing with possible n-grams deductible from the character groups in the symspell result. For example the correction 'that dook now' leaves us with a verbless sentence, and on closer analysis will reveal that the character group 'now' is related with the verb 'know', and the verb know is associated with the n-gram 'you know'.
## Under the hood
spacy_symspell is currently a wrapper of the [python port](https://github.com/mammothb/symspellpy) for [Symspell](https://github.com/wolfgarbe/SymSpell). For additional details, see the linked project pages.
## Spelling correction implementation in spaCy via Symspell
This package is a [spaCy 2.0 extension](https://spacy.io/usage/processing-pipelines#section-extensions) that adds sentnece/spelling corrections via Symspell to spaCy's text processing pipeline.
## Installation
`pip install spacy_symspell`
## Notes
This package is still in Alpha and there may be unforeseen errors. Performance improvement still needs to be done for the symspell python port.
## Usage
Adding the component to the processing pipeline is relatively simple:
```
import spacy
from spacy_symspell import SpellingCorrector
nlp = spacy.load('en_core_web_sm')
corrector = SpellingCorrector()
nlp.add_pipe(corrector)
doc = nlp('What doyuoknowabout antyhing')
doc._.suggestions #iterable
doc._.segmentation #that dook now about anything
```
spaCy_symspell operates on `Doc` and `Span` spaCy objects. When called on a `Doc` or `Span`, the object is given two attributes: `suggestions` (a list of all found spelling suggestions) and `segmentation` (a corrected sentence in the case of ommitted spaces).
## Todo
Symspell accuracy can be improved with the help of spaCy by extracting and analyzing resulting n-grams and cross-referencing with possible n-grams deductible from the character groups in the symspell result. For example the correction 'that dook now' leaves us with a verbless sentence, and on closer analysis will reveal that the character group 'now' is related with the verb 'know', and the verb know is associated with the n-gram 'you know'.
## Under the hood
spacy_symspell is currently a wrapper of the [python port](https://github.com/mammothb/symspellpy) for [Symspell](https://github.com/wolfgarbe/SymSpell). For additional details, see the linked project pages.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
spacy_symspell-0.1.0.tar.gz
(3.7 MB
view hashes)
Built Distribution
Close
Hashes for spacy_symspell-0.1.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 940b0eee74148ad114af1608cd17a973aa03e065359ab878fa1c9b959afc7f39 |
|
MD5 | f5c7077afdb34870e9ce6c5c0daf31e4 |
|
BLAKE2b-256 | 50f2dec4e9f992328a00da4cfd15ad7e4413943cbea7fb8bdda3136b43417c44 |