Spelling Correction

SymSpellJPy

This is a Python wrapper module around a Java implementation of the SymSpell spelling-correction library.

Dependencies

  1. Python 3.6, e.g. in a conda environment: conda create --name <ENV_NAME> python=3.6
  2. Java 1.8 (Java 8) SDK

Install

  1. Install the dependencies listed above.
  2. Activate the conda environment: conda activate <ENV_NAME>
  3. Install SymSpellJPy from PyPI: pip install symspelljpy

Usage

import symspelljpy

# Create a client that scores edits with the qwerty-keyboard distance ('QWE')
spell_client = symspelljpy.SymSpellClient(distance_type='QWE')
print(spell_client.lookup('plase correcme'))
# Output:
# {"inputText":"plase correcme","output":[{"outputText":"please correct me ","mode":"COMPOUND","distance":3.0,"count":4.4467949E7}]}
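The printed result appears to be a JSON string with one entry per candidate under `output`. Assuming `lookup` returns that string (rather than an already parsed dict), a minimal sketch for picking the single best correction could look like this. `best_correction` is a hypothetical helper, not part of symspelljpy.

```python
import json

# Sample result copied from the usage example above.
SAMPLE = ('{"inputText":"plase correcme","output":[{"outputText":"please correct me ",'
          '"mode":"COMPOUND","distance":3.0,"count":4.4467949E7}]}')


def best_correction(result_json: str) -> str:
    """Return the outputText of the lowest-distance candidate (hypothetical helper).

    Ties on distance are broken by the higher corpus count. If there are
    no candidates, the original input text is returned unchanged.
    """
    result = json.loads(result_json)
    candidates = result.get("output", [])
    if not candidates:
        return result["inputText"]
    best = min(candidates, key=lambda c: (c["distance"], -c["count"]))
    # The sample output carries a trailing space, so strip surrounding whitespace.
    return best["outputText"].strip()


print(best_correction(SAMPLE))  # please correct me
```

With the library installed, the string returned by `spell_client.lookup(...)` could be passed straight to `best_correction`. Ordering by distance and then by count is a choice made in this sketch, not documented behaviour of the wrapper.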

This Python wrapper is built on top of the following JAR file, whose command-line help is shown below:

$ java -jar ./symspell-console/target/spellcheckclient-jar-with-dependencies.jar -h
usage: java -jar
            symspell-console-6.6-SNAPSHOT-jar-with-dependencies.jar.jar
 -b,--bigram <arg>     bi-gram dictionary file path
 -d,--distance <arg>   spelling correction distance type:
                       'VDL': vanilla Damerau Levenshtein distance.
                       'WDL': weighted Damerau Levenshtein distance.
                       'QWE': qwerty distance.
 -e,--edits <arg>      maximum number of edits (default 2)
 -h,--help             this help message
 -k,--topk <arg>       number of candidates to output (default 5)
 -m,--mode <arg>       spelling correction mode: 'SMART'(Default), 'ALL',
                       'WORD', 'COMPOUND' or 'SEGMENTATION'.
                       WORD: Individual word spelling correction.
                       COMPOUND: Compound splitting/decompounding +
                       Automatic spelling correction. Space can only be
                       inserted/deleted for a token once.
                       SEGMENTATION: Word segmentation  + Automatic
                       spelling correction. Existing spaces are allowed
                       and considered for optimum segmentation.
                       SMART: when there is no space in the input text and
                       the text length is over the maximum word length,
                       enable word segmenation. Otherwise choose COMPOUND
                       word correction model.
                       ALL: COMPOUND + SEGMENTATION.
 -t,--timer            execution time per input in milliseconds.
 -u,--unigram <arg>    uni-gram dictionary file path
 -w,--word <arg>       maximum word length for word segmentation (default
                       10)
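The SMART mode rule in the help text amounts to a simple dispatch between SEGMENTATION and COMPOUND. The sketch below restates that rule in Python purely as an illustration. `choose_mode` is hypothetical. It treats only the ASCII space as "a space" and compares the character length of the input against the `-w` maximum word length (default 10). This is a reading of the help text, not of the Java source.

```python
def choose_mode(text: str, max_word_length: int = 10) -> str:
    """Restate the documented SMART rule (hypothetical helper).

    Returns "SEGMENTATION" when the input contains no space and is longer
    than the maximum word length; otherwise returns "COMPOUND".
    """
    if " " not in text and len(text) > max_word_length:
        return "SEGMENTATION"
    return "COMPOUND"


print(choose_mode("thequickbrownfox"))  # SEGMENTATION (16 chars, no space)
print(choose_mode("plase correcme"))    # COMPOUND (contains a space)
print(choose_mode("correcme"))          # COMPOUND (8 chars, within the default of 10)
```

Raising or lowering `max_word_length` mirrors changing `-w`. For example, `choose_mode("correcme", max_word_length=5)` would select SEGMENTATION under this reading.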

Built Distribution

symspelljpy-0.4-py3-none-any.whl (15.1 MB, py3)