
Spelling Correction

Project description

SymSpellJPy

This is a Python wrapper module for a Java implementation of the SymSpell library.

Dependencies

  1. Python 3.6: conda create --name <ENV_NAME> python=3.6
  2. Java 1.8 SDK

Install

  1. Install the dependencies listed above
  2. Activate the Python virtual environment: conda activate <ENV_NAME>
  3. Install SymSpellJPy: pip install symspelljpy

Usage

import symspelljpy

spell_client = symspelljpy.SymSpellClient(distance_type='QWE')
print(spell_client.lookup('plase correcme'))
{"inputText":"plase correcme","output":[{"outputText":"please correct me ","mode":"COMPOUND","distance":3.0,"count":4.4467949E7}]}

This Python wrapper is built on top of the following jar file:

$ java -jar ./symspell-console/target/spellcheckclient-jar-with-dependencies.jar -h
usage: java -jar
            symspell-console-6.6-SNAPSHOT-jar-with-dependencies.jar
 -b,--bigram <arg>     bi-gram dictionary file path
 -d,--distance <arg>   spelling correction distance type:
                       'VDL': vanilla Damerau Levenshtein distance.
                       'WDL': weighted Damerau Levenshtein distance.
                       'QWE': qwerty distance.
 -e,--edits <arg>      maximum number of edits (default 2)
 -h,--help             this help message
 -k,--topk <arg>       number of candidates to output (default 5)
 -m,--mode <arg>       spelling correction mode: 'SMART'(Default), 'ALL',
                       'WORD', 'COMPOUND' or 'SEGMENTATION'.
                       WORD: Individual word spelling correction.
                       COMPOUND: Compound splitting/decompounding +
                       Automatic spelling correction. Space can only be
                       inserted/deleted for a token once.
                       SEGMENTATION: Word segmentation  + Automatic
                       spelling correction. Existing spaces are allowed
                       and considered for optimum segmentation.
                       SMART: when there is no space in the input text and
                       the text length is over the maximum word length,
                       enable word segmentation. Otherwise choose COMPOUND
                       word correction model.
                       ALL: COMPOUND + SEGMENTATION.
 -t,--timer            execution time per input in milliseconds.
 -u,--unigram <arg>    uni-gram dictionary file path
 -w,--word <arg>       maximum word length for word segmentation (default
                       10)
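
The console client can also be driven directly from Python with subprocess, using only the flags documented in the help output above. This is a hedged sketch: the jar path matches the example invocation above, the dictionary flags -u/-b are omitted, and how the client consumes input text is not shown in the help, so reading from standard input is an assumption.

import subprocess

# Path taken from the example invocation above; adjust to your build output.
JAR = './symspell-console/target/spellcheckclient-jar-with-dependencies.jar'

cmd = ['java', '-jar', JAR,
       '-d', 'QWE',       # distance type: qwerty distance
       '-m', 'COMPOUND',  # correction mode
       '-k', '3']         # number of candidates to output

# Assumption: the console client reads input lines from stdin.
proc = subprocess.run(cmd, input='plase correcme\n',
                      stdout=subprocess.PIPE, universal_newlines=True)
print(proc.stdout)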

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distribution

symspelljpy-0.4-py3-none-any.whl (15.1 MB, Python 3)

File details

Details for the file symspelljpy-0.4-py3-none-any.whl.

File metadata

  • Download URL: symspelljpy-0.4-py3-none-any.whl
  • Upload date:
  • Size: 15.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.10

File hashes

Hashes for symspelljpy-0.4-py3-none-any.whl:

  • SHA256: 91927419cf382e4244c6b55460137cb1278412e8198dce26974314e095a813d7
  • MD5: 238e2ecb6d6e2b67d890ac872ec0a982
  • BLAKE2b-256: e05e7100af6eaead5f501a1e9e573360defd502bbce75c97498940e91c985385
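
To verify a downloaded wheel against the SHA256 digest above, here is a short sketch using the standard hashlib module; it assumes the file sits in the current directory under its published name.

import hashlib

EXPECTED_SHA256 = '91927419cf382e4244c6b55460137cb1278412e8198dce26974314e095a813d7'

# Hash the wheel in chunks and compare with the published digest.
digest = hashlib.sha256()
with open('symspelljpy-0.4-py3-none-any.whl', 'rb') as fh:
    for chunk in iter(lambda: fh.read(8192), b''):
        digest.update(chunk)

print('OK' if digest.hexdigest() == EXPECTED_SHA256 else 'hash mismatch')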

