spaCy pipeline component for spelling correction using Symspell.
Project description
# spaCy Symspell
## Spelling correction implementation in spaCy via Symspell
This package is a [spaCy 2.0 extension](https://spacy.io/usage/processing-pipelines#section-extensions) that adds sentence and spelling correction via Symspell to spaCy’s text processing pipeline.
## Installation

    pip install spacy_symspell

## Notes
This package is still in alpha, and there may be unforeseen errors. Dictionary loading time is also significant: it can take up to 30 seconds on slow machines.
## Usage
Adding the component to the processing pipeline is relatively simple:

    import spacy
    from spacy_symspell import SpellingCorrector

    nlp = spacy.load('en_core_web_sm')
    corrector = SpellingCorrector()
    # spaCy 2.x style: the component object is passed directly to add_pipe
    nlp.add_pipe(corrector)
    doc = nlp('What doyuoknowabout antyhing')

    # doc._.suggestions is iterable
    for s in doc._.suggestions:
        print(s)
    # What doyon about anything

    doc._.segmentation
    # ::segmented_string - What doyouk now about antyhing
    # ::corrected_string - that dook now about anything

spacy_symspell operates on spaCy `Doc` and `Span` objects. When called on a `Doc` or `Span`, the object is given two attributes, accessed through the `._` namespace: `suggestions` (a list of all spelling suggestions found) and `segmentation` (a corrected sentence for input with omitted spaces).
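
To give a feel for what a segmentation step has to do when spaces are omitted, here is a toy sketch in plain Python. It is not the code used by this package or by symspellpy; the vocabulary, the frequencies, and the `segment` helper are all illustrative assumptions. It picks the most probable split with a small dynamic program over split points.

```python
import math

# Toy vocabulary with made-up frequencies (an assumption for illustration only).
WORD_FREQ = {
    "what": 50, "do": 40, "you": 60, "know": 30, "now": 45,
    "about": 35, "anything": 10, "any": 25, "thing": 15,
}


def segment(text, word_freq=WORD_FREQ, max_word_len=10):
    """Split `text` into known words, maximizing the sum of log-probabilities.

    Returns a list of words, or None if no full segmentation exists.
    """
    total = sum(word_freq.values())
    n = len(text)
    # best[i] holds (score, words) for the prefix text[:i]; words is None if unreachable.
    best = [(0.0, [])] + [(float("-inf"), None)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            piece = text[start:end]
            if piece in word_freq and best[start][1] is not None:
                score = best[start][0] + math.log(word_freq[piece] / total)
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [piece])
    return best[n][1]


print(segment("whatdoyouknowaboutanything"))
# ['what', 'do', 'you', 'know', 'about', 'anything']
```

A real segmenter also has to tolerate misspelled pieces, which this sketch does not attempt: any input containing a fragment outside the vocabulary simply yields `None`.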
## Todo
Symspell accuracy could be improved with the help of spaCy by extracting and analyzing the resulting n-grams and cross-referencing them with the n-grams deducible from the character groups in the Symspell result. For example, the correction ‘that dook now’ leaves us with a verbless sentence. Closer analysis reveals that the character group ‘now’ is related to the verb ‘know’, and that the verb ‘know’ is associated with the n-gram ‘you know’.
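
One possible shape for this idea, sketched in plain Python under several assumptions: the bigram counts are invented, tokens are split on whitespace rather than by spaCy, and the second candidate sentence is hypothetical (the package does not currently produce it). The helpers `bigram_score` and `rerank` are illustrative names, not part of spacy_symspell.

```python
# Hypothetical bigram counts; in practice these might come from a corpus.
BIGRAM_COUNTS = {
    ("what", "do"): 90,
    ("do", "you"): 200,
    ("you", "know"): 120,
    ("know", "about"): 40,
    ("about", "anything"): 15,
}


def bigram_score(sentence, counts=BIGRAM_COUNTS):
    """Sum the counts of every adjacent word pair found in the table."""
    tokens = sentence.lower().split()
    return sum(counts.get(pair, 0) for pair in zip(tokens, tokens[1:]))


def rerank(candidates, counts=BIGRAM_COUNTS):
    """Order candidate corrections from most to least plausible by bigram score."""
    return sorted(candidates, key=lambda s: bigram_score(s, counts), reverse=True)


candidates = [
    "that dook now about anything",     # correction shown in the usage example
    "what do you know about anything",  # hypothetical alternative candidate
]
print(rerank(candidates)[0])
# what do you know about anything
```

Here the verbless candidate scores 15 (only ‘about anything’ is a known bigram) while the alternative scores 465, so the n-gram ‘you know’ pulls the better reading to the top.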
## Under the hood
[spacy_symspell](https://github.com/xwiz/spacy_symspell) is currently a wrapper around the [Python port](https://github.com/mammothb/symspellpy) of [Symspell](https://github.com/wolfgarbe/SymSpell). For additional details, see the linked project pages.
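
For readers unfamiliar with Symspell, its core idea is the symmetric delete technique: index every dictionary word under all of its deletion variants, then generate only deletion variants of the input term and look them up. The following is a minimal sketch of that idea in plain Python, not the symspellpy implementation; the function names and the tiny vocabulary are assumptions, and the edit-distance verification and frequency ranking that a full implementation performs are left out.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """Return `word` plus every string obtained by deleting up to `max_distance` characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(vocabulary, max_distance=2):
    """Map each deletion variant to the set of dictionary words that produce it."""
    index = defaultdict(set)
    for word in vocabulary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def candidates(term, index, max_distance=2):
    """Collect dictionary words sharing at least one deletion variant with `term`."""
    found = set()
    for variant in deletes(term, max_distance):
        found |= index.get(variant, set())
    return found


index = build_index(["anything", "know", "about"])
print(candidates("antyhing", index))
# {'anything'}
```

Building the index up front is what makes lookups fast, and it is also a plausible reason why dictionary loading takes noticeable time.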
Source Distribution

File details
Details for the file spacy_symspell-0.1.2.tar.gz.

File metadata
- Download URL: spacy_symspell-0.1.2.tar.gz
- Upload date:
- Size: 3.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.5.3

File hashes

Algorithm | Hash digest
---|---
SHA256 | 79337bb996f182a9c9e84a4d3d19258020d51820065e3a0291de095d8cd7b608
MD5 | 9d7556b4c09cdedf8889ddd91225d205
BLAKE2b-256 | c18a900a5a4f55aeb75daf107789e70bae27475f82b104436378fc1fea3eb4f1

Built Distribution

File details
Details for the file spacy_symspell-0.1.2-py3-none-any.whl.

File metadata
- Download URL: spacy_symspell-0.1.2-py3-none-any.whl
- Upload date:
- Size: 3.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.5.3

File hashes

Algorithm | Hash digest
---|---
SHA256 | d2c30268095fba6c344bd16c92f1362478402872874069b6224b068c27b7c694
MD5 | 29d52a759a66ddfcee3d67f40c92c841
BLAKE2b-256 | 82c19dce7f0f0e2d02692d444ef8b1746b837cf4c1532a2dd3012a8be7f7ba58