A common toolkit for Grammatical Error Correction

GECOMMON: A common toolkit for Grammatical Error Correction

You can install from PyPI:

pip install gecommon
python -m spacy download en_core_web_sm

Or, from GitHub:

git clone https://github.com/gotutiyan/gecommon.git
cd gecommon
pip install -e ./
python -m spacy download en_core_web_sm

Features

  • gecommon.CachedERRANT: A class that speeds up ERRANT by caching its results.
  • gecommon.Parallel: A class that handles parallel and M2 formats through the same interface.
  • gecommon.utils.apply_edits: A function to apply an errant.edit.Edit sequence to a sentence.

Use cases

gecommon.CachedERRANT

A single call to .extract_edits() replaces the parse() and annotate() calls of the original ERRANT.
It also caches the results of parse() and annotate(), so it runs faster when the same sentence or sentence pair is processed more than once.

from gecommon import CachedERRANT
errant = CachedERRANT()
edits = errant.extract_edits('This is a sample sentences .', 'These are sample sentences .')
print(edits)
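
The caching idea can be illustrated without ERRANT itself. The following is a minimal sketch, not gecommon's actual implementation: expensive_parse and extract_token_diff are hypothetical stand-ins for parse() and annotate(), and functools.lru_cache is assumed as the caching mechanism.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive step really runs


@lru_cache(maxsize=None)
def expensive_parse(sentence: str) -> tuple:
    """Hypothetical stand-in for a slow parser such as ERRANT's parse()."""
    global call_count
    call_count += 1
    return tuple(sentence.split())


@lru_cache(maxsize=None)
def extract_token_diff(src: str, trg: str) -> tuple:
    """Hypothetical stand-in for annotate(): tokens that differ by position."""
    s, t = expensive_parse(src), expensive_parse(trg)
    return tuple((i, a, b) for i, (a, b) in enumerate(zip(s, t)) if a != b)


src = 'This is a sample sentences .'
trg = 'These are sample sentences .'
first = extract_token_diff(src, trg)
second = extract_token_diff(src, trg)  # served from the cache
print(call_count)  # 2: each sentence was parsed only once
```

Caching at both levels means a repeated sentence pair costs nothing, and a new pair that reuses a known sentence only pays for the part it has not seen before.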

gecommon.Parallel

  • The most important feature is the ability to handle both M2 and parallel formats in the same interface.
from gecommon import Parallel
# If the input is in M2 format
# (replace the placeholder strings with real file paths)
gec = Parallel.from_m2(
    m2='<an M2 file path>',
    ref_id=0
)
# If the input is in parallel format
gec = Parallel.from_parallel(
    src='<a src file path>',
    trg='<a trg file path>'
)
# After that, you can handle the input data in the same interface.
assert gec.srcs is not None
assert gec.trgs is not None
assert gec.edits_list is not None
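
For readers unfamiliar with M2, the sketch below shows what such a file contains and how an annotator id like ref_id selects one reference. It is an illustration only, not how gecommon reads files: read_m2 is a hypothetical helper, and the sample sentence and edits are made up for this example.

```python
# One M2 block: an "S" line with the tokenized source, then one "A" line per edit.
# The last field of each "A" line is the annotator id.
M2_SAMPLE = """\
S This are gramamtical sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 2|||M:DET|||a|||REQUIRED|||-NONE-|||0
A 2 3|||R:SPELL|||grammatical|||REQUIRED|||-NONE-|||0
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||1
"""


def read_m2(text, ref_id=0):
    """Return [(source, [(start, end, type, correction), ...]), ...]."""
    results = []
    for block in text.strip().split('\n\n'):
        lines = block.split('\n')
        src = lines[0][2:]  # drop the leading "S "
        edits = []
        for line in lines[1:]:
            fields = line[2:].split('|||')  # drop the leading "A "
            start, end = map(int, fields[0].split())
            etype, correction, annotator = fields[1], fields[2], int(fields[-1])
            # Keep only the chosen annotator; "noop" means "no correction needed".
            if annotator != ref_id or etype == 'noop':
                continue
            edits.append((start, end, etype, correction))
        results.append((src, edits))
    return results


for src, edits in read_m2(M2_SAMPLE, ref_id=0):
    print(src, edits)
```

A parallel-format input carries the same information as a source file and a target file with one sentence per line, which is why both formats can sit behind one interface.
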
  • To generate grammatical error detection (GED) labels
    • Besides binary labels, 4-class, 25-class, and 55-class labels are available, following [Yuan+ 21].
from gecommon import Parallel
gec = Parallel.from_demo()
# Sentence-level labels
print(gec.ged_labels_sent()) 
# [['INCORRECT'], ['INCORRECT'], ['CORRECT']]

# Token-level labels
print(gec.ged_labels_token(mode='cat3'))
# [['CORRECT', 'R:VERB:SVA', 'R:SPELL', 'CORRECT', 'CORRECT'],
# ['CORRECT', 'CORRECT', 'U:VERB', 'CORRECT', 'R:ORTH', 'R:ORTH', 'CORRECT', 'CORRECT'],
# ['CORRECT', 'CORRECT', 'CORRECT', 'CORRECT', 'CORRECT']]
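
As a rough illustration of how detection labels relate to edit spans, here is a minimal sketch of the binary case. It is not gecommon's implementation: both functions are hypothetical, and flagging the token at the insertion point for an insertion edit (start == end) is an assumption of this sketch.

```python
def binary_token_labels(n_tokens, spans):
    """Label each source token from (start, end) edit spans."""
    labels = ['CORRECT'] * n_tokens
    for start, end in spans:
        # Assumption: an insertion (start == end) flags the token at that position.
        last = max(end, start + 1)
        for i in range(start, min(last, n_tokens)):
            labels[i] = 'INCORRECT'
    return labels


def binary_sentence_label(spans):
    """A sentence is INCORRECT as soon as it has at least one edit."""
    return 'INCORRECT' if spans else 'CORRECT'


print(binary_token_labels(5, [(1, 2), (2, 2), (2, 3)]))
# ['CORRECT', 'INCORRECT', 'INCORRECT', 'CORRECT', 'CORRECT']
print(binary_sentence_label([]))
# CORRECT
```

Finer-grained schemes replace the single INCORRECT label with the error type of the edit that covers the token.
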
  • To access the edits of each sentence pair
from gecommon import Parallel
gec = Parallel.from_demo()
for edits in gec.edits_list:
    for e in edits:
        print(e.o_start, e.o_end, e.c_str)
    print('---')

# 1 2 is
# 2 2 a
# 2 3 grammatical
# ---
# 2 3 
# 4 6 grammatical
# ---
# ---
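
The (o_start, o_end, c_str) triples printed above are enough to rebuild a corrected sentence from its source tokens. The sketch below shows one way to do that. It is not the code of gecommon.utils.apply_edits: SimpleEdit and apply_token_edits are hypothetical names, and the source sentence is an assumption chosen to fit the first group of triples.

```python
from dataclasses import dataclass


@dataclass
class SimpleEdit:
    """Hypothetical stand-in exposing the three attributes printed above."""
    o_start: int  # start token index in the source
    o_end: int    # end token index in the source (exclusive)
    c_str: str    # correction string; empty for a deletion


def apply_token_edits(src: str, edits: list) -> str:
    """Apply token-level edits to a whitespace-tokenized sentence."""
    tokens = src.split()
    # Work from right to left so earlier offsets stay valid after each splice.
    for e in sorted(edits, key=lambda e: (e.o_start, e.o_end), reverse=True):
        tokens[e.o_start:e.o_end] = e.c_str.split()
    return ' '.join(tokens)


edits = [
    SimpleEdit(1, 2, 'is'),
    SimpleEdit(2, 2, 'a'),            # insertion: empty source span
    SimpleEdit(2, 3, 'grammatical'),
]
print(apply_token_edits('This are gramamtical sentence .', edits))
# This is a grammatical sentence .
```

Applying edits from the end of the sentence backwards avoids having to shift the offsets of the remaining edits after each replacement.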
