A common toolkit for Grammatical Error Correction

GECOMMON: A common toolkit for Grammatical Error Correction

You can install from PyPI:

pip install gecommon
python -m spacy download en_core_web_sm

Or, from GitHub:

git clone https://github.com/gotutiyan/gecommon.git
cd gecommon
pip install -e ./
python -m spacy download en_core_web_sm

Features

  • CachedERRANT: A class that speeds up ERRANT by caching its results.
  • Parallel: A class that handles both parallel and M2 formats through the same interface.

Use cases

gecommon.CachedERRANT

You can replace the parse() and annotate() calls of the original ERRANT with a single .extract_edits() call.
It also caches the results of parse() and annotate(), so it runs faster when the same sentence or sentence pair is processed more than once.

from gecommon import CachedERRANT
errant = CachedERRANT()
edits = errant.extract_edits('This is a sample sentences .', 'These are sample sentences .')
print(edits)

gecommon.Parallel

  • The most important feature is the ability to handle both M2 and parallel formats in the same interface.
from gecommon import Parallel
# If the input is in M2 format
gec = Parallel.from_m2(
    m2='<a m2 file path>',
    ref_id=0
)
# If the input is in parallel format
gec = Parallel.from_parallel(
    src='<a src file path>',
    trg='<a trg file path>'
)
# After that, you can handle the input data through the same interface.
assert gec.srcs is not None
assert gec.trgs is not None
assert gec.edits_list is not None
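
For readers unfamiliar with M2, the sketch below shows how an M2 string can be turned into source sentences and per-sentence edit lists for one annotator. It is only an illustration of the file format under stated assumptions, not the code behind `Parallel.from_m2()`: the `Edit` tuple, the `parse_m2` helper, and the sample data are all hypothetical, and treating `ref_id` as the annotator id in the last M2 field is an assumption.

```python
from typing import List, NamedTuple, Tuple


class Edit(NamedTuple):
    o_start: int  # start token offset in the source
    o_end: int    # end token offset in the source (exclusive)
    c_str: str    # correction string
    type: str     # error type, e.g. "R:SPELL"


def parse_m2(text: str, ref_id: int = 0) -> Tuple[List[str], List[List[Edit]]]:
    """Parse M2 text, keeping only the edits of annotator `ref_id`."""
    srcs, edits_list = [], []
    for block in text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        srcs.append(lines[0][2:])  # drop the leading "S "
        edits = []
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = map(int, fields[0].split())
            etype, c_str, annotator = fields[1], fields[2], int(fields[-1])
            # "noop" lines mark a sentence that needs no correction.
            if annotator != ref_id or etype == "noop":
                continue
            edits.append(Edit(start, end, c_str, etype))
        edits_list.append(edits)
    return srcs, edits_list


# Made-up sample data with two annotators for the first sentence.
SAMPLE_M2 = """S This are gramamtical sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 2|||M:DET|||a|||REQUIRED|||-NONE-|||0
A 2 3|||R:SPELL|||grammatical|||REQUIRED|||-NONE-|||0
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||1

S This is fine .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
"""

srcs, edits_list = parse_m2(SAMPLE_M2, ref_id=0)
print(srcs)
print(edits_list)
```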
  • To generate error detection labels
    • Besides binary labels, 4-class, 25-class, and 55-class labels as in [Yuan+ 21] are also available.
from gecommon import Parallel
gec = Parallel.from_demo()
# Sentence-level labels
print(gec.ged_labels_sent()) 
# [['INCORRECT'], ['INCORRECT'], ['CORRECT']]

# Token-level labels
print(gec.ged_labels_token(mode='cat3'))
# [['CORRECT', 'R:VERB:SVA', 'R:SPELL', 'CORRECT', 'CORRECT'],
# ['CORRECT', 'CORRECT', 'U:VERB', 'CORRECT', 'R:ORTH', 'R:ORTH', 'CORRECT', 'CORRECT'],
# ['CORRECT', 'CORRECT', 'CORRECT', 'CORRECT', 'CORRECT']]
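
To make the relation between edits and detection labels concrete, here is a minimal sketch of one way such labels could be derived. It is not gecommon's implementation. The `Edit` tuple, the `coarsen` and `token_labels` helpers, the granularity names, and the example sentence are hypothetical, and two behaviors are assumptions: an insertion labels the token at its insertion point, and a later edit overwrites an earlier label on the same token.

```python
from typing import List, NamedTuple


class Edit(NamedTuple):
    o_start: int
    o_end: int
    type: str  # full ERRANT-style type, e.g. "R:VERB:SVA"


def coarsen(etype: str, granularity: str) -> str:
    """Map a full type to a coarser label (granularity names are made up)."""
    if granularity == "binary":
        return "INCORRECT"
    op, _, main = etype.partition(":")
    if granularity == "operation":  # M / R / U
        return op
    if granularity == "main":       # e.g. "VERB:SVA"
        return main or op
    return etype                    # "full": operation plus main type


def token_labels(src: str, edits: List[Edit], granularity: str = "full") -> List[str]:
    tokens = src.split()
    labels = ["CORRECT"] * len(tokens)
    for e in edits:
        label = coarsen(e.type, granularity)
        if e.o_start == e.o_end:
            # Assumption: an insertion marks the token at the insertion point.
            span = range(e.o_start, min(e.o_start + 1, len(tokens)))
        else:
            span = range(e.o_start, e.o_end)
        for i in span:
            labels[i] = label  # assumption: later edits overwrite earlier ones
    return labels


def sentence_label(edits: List[Edit]) -> str:
    return "INCORRECT" if edits else "CORRECT"


# Made-up example sentence and edits.
src = "This are gramamtical sentence ."
edits = [Edit(1, 2, "R:VERB:SVA"), Edit(2, 2, "M:DET"), Edit(2, 3, "R:SPELL")]
print(sentence_label(edits))
print(token_labels(src, edits, "full"))
print(token_labels(src, edits, "operation"))
```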
  • To access the edits of each sentence
from gecommon import Parallel
gec = Parallel.from_demo()
for edits in gec.edits_list:
    for e in edits:
        print(e.o_start, e.o_end, e.c_str)
    print('---')

# 1 2 is
# 2 2 a
# 2 3 grammatical
# ---
# 2 3 
# 4 6 grammatical
# ---
# ---
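
The `o_start`, `o_end`, and `c_str` fields printed above are enough to rebuild a corrected sentence from its source. The sketch below shows one way to do that. The `Edit` tuple and the `apply_edits` helper are hypothetical, and the example sentences are made up to fit the offsets; they are not taken from gecommon.

```python
from typing import List, NamedTuple


class Edit(NamedTuple):
    o_start: int  # start token offset in the source
    o_end: int    # end token offset in the source (exclusive)
    c_str: str    # correction string ("" for a deletion)


def apply_edits(src: str, edits: List[Edit]) -> str:
    """Apply token-level edits to a whitespace-tokenized source sentence."""
    tokens = src.split()
    # Go right to left so that offsets of edits further left stay valid.
    # Sorting on (o_start, o_end) also applies a replacement before an
    # insertion at the same position, which keeps the inserted tokens.
    for e in sorted(edits, key=lambda e: (e.o_start, e.o_end), reverse=True):
        tokens[e.o_start:e.o_end] = e.c_str.split()
    return " ".join(tokens)


# Made-up sentences consistent with the offsets printed above.
print(apply_edits(
    "This are gramamtical sentence .",
    [Edit(1, 2, "is"), Edit(2, 2, "a"), Edit(2, 3, "grammatical")],
))
print(apply_edits(
    "This is are a gram matical sentence .",
    [Edit(2, 3, ""), Edit(4, 6, "grammatical")],
))
```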
