
spaCy pipeline component for spelling correction using Symspell.


# spaCy Symspell

## Spelling correction implementation in spaCy via Symspell

This package is a [spaCy 2.0 extension](https://spacy.io/usage/processing-pipelines#section-extensions) that adds sentence and spelling correction via Symspell to spaCy’s text processing pipeline.

## Installation

    pip install spacy_symspell

## Notes

This package is still in alpha and there may be unforeseen errors. Dictionary loading time is also significant and can take up to 30 seconds on slow machines.

## Usage

Adding the component to the processing pipeline is relatively simple:

    import spacy
    from spacy_symspell import SpellingCorrector

    nlp = spacy.load('en_core_web_sm')
    corrector = SpellingCorrector()
    nlp.add_pipe(corrector)
    doc = nlp('What doyuoknowabout antyhing')

    # doc._.suggestions is iterable
    for s in doc._.suggestions:
        print(s)  # What doyon about anything

    doc._.segmentation
    # segmented_string: What doyouk now about antyhing
    # corrected_string: that dook now about anything

spaCy_symspell operates on spaCy `Doc` and `Span` objects. When called on a `Doc` or `Span`, the object is given two attributes, available under `._`: `suggestions` (a list of all spelling suggestions found) and `segmentation` (a corrected sentence for input with omitted spaces).
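To make the idea behind `segmentation` concrete, here is a minimal standalone sketch of restoring omitted spaces with a toy vocabulary. It is not the package's implementation: the `segment` function and the `VOCAB` set are hypothetical names invented for this illustration, and unlike Symspell this sketch does not correct misspellings while it segments.

```python
from functools import lru_cache

# Toy vocabulary, purely illustrative (an assumption, not the package's dictionary).
VOCAB = {"what", "do", "you", "know", "about", "anything", "a", "bout", "now"}


def segment(text, vocab=VOCAB):
    """Split a space-free string into vocabulary words, preferring fewer words.

    Returns None when the string cannot be fully covered by vocabulary words.
    """

    @lru_cache(maxsize=None)
    def best(start):
        # Reached the end of the string: nothing left to split.
        if start == len(text):
            return ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocab:
                rest = best(end)
                if rest is not None:
                    candidates.append((word,) + rest)
        # Fewer words implies longer words, a simple segmentation heuristic.
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return " ".join(result) if result is not None else None


print(segment("doyouknowabout"))
```

With exact dictionary matching only, a misspelled input such as `doyuoknowabout` has no valid split and yields `None`, which is why a spelling-aware segmenter is needed for real text.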

## Todo

Symspell accuracy can be improved with the help of spaCy by extracting and analyzing the resulting n-grams and cross-referencing them with the n-grams deducible from the character groups in the Symspell result. For example, the correction ‘that dook now’ leaves us with a verbless sentence. Closer analysis reveals that the character group ‘now’ is related to the verb ‘know’, and that ‘know’ is associated with the n-gram ‘you know’.
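One way this idea could look in practice is to re-rank candidate corrections by how many familiar bigrams they contain. The sketch below is only an illustration of that approach, not a planned design: `BIGRAM_COUNTS`, `bigram_score`, and `pick_best` are hypothetical names, and the counts are invented for the example rather than taken from any corpus.

```python
# Hypothetical bigram counts, invented for illustration only.
BIGRAM_COUNTS = {
    ("do", "you"): 50,
    ("you", "know"): 40,
    ("know", "about"): 20,
    ("about", "anything"): 10,
    ("now", "about"): 5,
}


def bigram_score(tokens, counts=BIGRAM_COUNTS):
    """Sum the counts of every adjacent token pair; unseen pairs score 0."""
    return sum(counts.get(pair, 0) for pair in zip(tokens, tokens[1:]))


def pick_best(candidates, counts=BIGRAM_COUNTS):
    """Return the candidate token sequence with the highest bigram score."""
    return max(candidates, key=lambda tokens: bigram_score(tokens, counts))


candidates = [
    ["that", "dook", "now", "about", "anything"],
    ["what", "do", "you", "know", "about", "anything"],
]
print(" ".join(pick_best(candidates)))
```

In a real pipeline the counts would have to come from a corpus, and spaCy's part-of-speech tags could supply a further signal, such as penalizing candidates that contain no verb.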

## Under the hood

[spacy_symspell](https://github.com/xwiz/spacy_symspell) is currently a wrapper of the [python port](https://github.com/mammothb/symspellpy) for [Symspell](https://github.com/wolfgarbe/SymSpell). For additional details, see the linked project pages.
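For readers curious about the underlying technique, Symspell is built on the symmetric delete approach: delete-only variants of each dictionary word are indexed ahead of time, and a query is matched by generating its own delete variants. The sketch below is a simplified illustration of that idea, not the code of any of the linked projects. `TinySymmetricDelete` and its helpers are hypothetical names, and plain Levenshtein distance is used for brevity, which may differ from the distance measure the real implementations use.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """Return the word plus every string made by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete from a
                current[j - 1] + 1,            # insert into a
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


class TinySymmetricDelete:
    """Toy index mapping delete variants back to dictionary words."""

    def __init__(self, words, max_distance=2):
        self.max_distance = max_distance
        self.index = defaultdict(set)
        for word in words:
            for variant in deletes(word, max_distance):
                self.index[variant].add(word)

    def lookup(self, term):
        """Return (distance, word) pairs within max_distance, closest first."""
        candidates = set()
        for variant in deletes(term, self.max_distance):
            candidates |= self.index.get(variant, set())
        scored = [(edit_distance(term, c), c) for c in candidates]
        return sorted((d, c) for d, c in scored if d <= self.max_distance)


speller = TinySymmetricDelete(["anything", "know", "about", "what"])
print(speller.lookup("antyhing"))
```

Building the index up front is what makes lookups fast, and it is also a plausible reason dictionary loading takes noticeable time on large word lists.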
