
Spelling Correction

Project description

SymSpellJPy

This is a Python wrapper for a Java implementation of the SymSpell library.

Dependencies

  1. Python 3.6: conda create --name <ENV_NAME> python=3.6
  2. Java 1.8 SDK

Install

  1. Install Dependencies
  2. Activate the Python virtual environment: conda activate <ENV_NAME>
  3. Install SymSpellJPy: pip install symspelljpy
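Before running the install steps, it can help to confirm that both dependencies are visible from the active environment. The following is a minimal sketch, not part of SymSpellJPy: the function name check_prerequisites is hypothetical, and treating Python versions newer than 3.6 as acceptable is an assumption (the list above names 3.6 specifically). It only checks that a java executable is on PATH and does not verify that it is Java 1.8.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6)):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper: the (3, 6) minimum is an assumption based on the
    dependency list, not something SymSpellJPy itself enforces.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    # shutil.which returns None when no `java` executable is found on PATH.
    if shutil.which("java") is None:
        problems.append("no `java` executable found on PATH")
    return problems
```

Calling check_prerequisites() inside the activated conda environment and printing each returned string gives a quick checklist before running pip install symspelljpy.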

Usage

import symspelljpy

spell_client = symspelljpy.SymSpellClient(distance_type='QWE')
print(spell_client.lookup('plase correcme'))
# Output:
# {"inputText":"plase correcme","output":[{"outputText":"please correct me ","mode":"COMPOUND","distance":3.0,"count":4.4467949E7}]}
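The printed result has the shape of a JSON document, so it can be decoded with the standard json module. The sketch below works on the sample output copied from the usage example rather than calling the library. It assumes that lookup returns (or can be converted to) a JSON string of this shape, which this page does not confirm; the helper name first_suggestion is hypothetical, and whether the first entry in "output" is the best-ranked one is also an assumption.

```python
import json

# Sample output copied verbatim from the usage example.
raw = ('{"inputText":"plase correcme","output":[{"outputText":"please correct me ",'
       '"mode":"COMPOUND","distance":3.0,"count":4.4467949E7}]}')


def first_suggestion(raw_json):
    """Return the first candidate's text without surrounding whitespace, or None.

    Hypothetical helper; assumes the JSON layout shown in the sample output.
    """
    result = json.loads(raw_json)
    candidates = result.get("output", [])
    if not candidates:
        return None
    # The sample "outputText" ends with a trailing space, so strip it.
    return candidates[0]["outputText"].strip()


print(first_suggestion(raw))  # prints: please correct me
```

The other fields of each candidate ("mode", "distance", "count") are available in the same decoded dictionary.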

This Python wrapper is built on top of the following jar file:

$ java -jar ./symspell-console/target/spellcheckclient-jar-with-dependencies.jar -h
usage: java -jar
            symspell-console-6.6-SNAPSHOT-jar-with-dependencies.jar.jar
 -b,--bigram <arg>     bi-gram dictionary file path
 -d,--distance <arg>   spelling correction distance type:
                       'VDL': vanilla Damerau Levenshtein distance.
                       'WDL': weighted Damerau Levenshtein distance.
                       'QWE': qwerty distance.
 -e,--edits <arg>      maximum number of edits (default 2)
 -h,--help             this help message
 -k,--topk <arg>       number of candidates to output (default 5)
 -m,--mode <arg>       spelling correction mode: 'SMART'(Default), 'ALL',
                       'WORD', 'COMPOUND' or 'SEGMENTATION'.
                       WORD: Individual word spelling correction.
                       COMPOUND: Compound splitting/decompounding +
                       Automatic spelling correction. Space can only be
                       inserted/deleted for a token once.
                       SEGMENTATION: Word segmentation  + Automatic
                       spelling correction. Existing spaces are allowed
                       and considered for optimum segmentation.
                       SMART: when there is no space in the input text and
                       the text length is over the maximum word length,
                       enable word segmenation. Otherwise choose COMPOUND
                       word correction model.
                       ALL: COMPOUND + SEGMENTATION.
 -t,--timer            execution time per input in milliseconds.
 -u,--unigram <arg>    uni-gram dictionary file path
 -w,--word <arg>       maximum word length for word segmentation (default
                       10)
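The options in the help text above can be assembled into a command line programmatically. This is a minimal sketch under stated assumptions: build_command is a hypothetical helper that is not part of SymSpellJPy, it uses only the long flags and the distance and mode values listed in the help output, and it only builds the argument list. How the jar receives its input text is not documented on this page, so the sketch does not run the command.

```python
# Values taken from the help text above.
DISTANCE_TYPES = {"VDL", "WDL", "QWE"}
MODES = {"SMART", "ALL", "WORD", "COMPOUND", "SEGMENTATION"}


def build_command(jar_path, distance=None, mode=None, edits=None, topk=None,
                  unigram=None, bigram=None, word=None, timer=False):
    """Build an argv list for the console jar (hypothetical helper)."""
    if distance is not None and distance not in DISTANCE_TYPES:
        raise ValueError("unknown distance type: %r" % (distance,))
    if mode is not None and mode not in MODES:
        raise ValueError("unknown mode: %r" % (mode,))
    cmd = ["java", "-jar", jar_path]
    optional = [("--distance", distance), ("--mode", mode), ("--edits", edits),
                ("--topk", topk), ("--unigram", unigram), ("--bigram", bigram),
                ("--word", word)]
    for flag, value in optional:
        # Options left as None are omitted so the jar's own defaults apply.
        if value is not None:
            cmd += [flag, str(value)]
    if timer:
        cmd.append("--timer")  # flag without an argument
    return cmd


jar = "./symspell-console/target/spellcheckclient-jar-with-dependencies.jar"
print(build_command(jar, distance="QWE", mode="COMPOUND", topk=3))
```

The returned list is in the form accepted by subprocess.run, which avoids shell quoting issues with file paths.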

Files for symspelljpy, version 0.4
Filename: symspelljpy-0.4-py3-none-any.whl (15.1 MB)
File type: Wheel
Python version: py3
