A common toolkit for Grammatical Error Correction
GECOMMON: A common toolkit for Grammatical Error Correction
You can install from PyPI:
pip install gecommon
python -m spacy download en_core_web_sm
Or from GitHub:
git clone https://github.com/gotutiyan/gecommon.git
cd gecommon
pip install -e ./
python -m spacy download en_core_web_sm
Features
- gecommon.CachedERRANT: A class that speeds up ERRANT by caching.
- gecommon.Parallel (docs): A class that handles parallel and M2 formats through the same interface.
- gecommon.utils.apply_edits: A function that applies an errant.edit.Edit sequence to a sentence.
Use cases
gecommon.CachedERRANT
You can replace the parse() and annotate() calls of the original ERRANT with a single .extract_edits() call.
It also caches the results of parse() and annotate(), so it runs faster when the same sentence or the same sentence pair is processed more than once.
from gecommon import CachedERRANT
errant = CachedERRANT()
edits = errant.extract_edits('This is a sample sentences .', 'These are sample sentences .')
print(edits)
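The caching described above can be pictured with a small stand-in. The sketch below is not gecommon's implementation: `CachedExtractor`, `toy_parse`, and `toy_annotate` are hypothetical names, and the toy functions only imitate the roles of ERRANT's parse() and annotate() so that the example runs without spaCy. It assumes one cache keyed by sentence and one keyed by sentence pair.

```python
from typing import Callable, Dict, List, Tuple


class CachedExtractor:
    """Hypothetical illustration of caching parse and annotate results."""

    def __init__(self, parse: Callable, annotate: Callable) -> None:
        self._parse = parse
        self._annotate = annotate
        self.parse_cache: Dict[str, object] = {}
        self.edit_cache: Dict[Tuple[str, str], object] = {}

    def parse(self, sentence: str):
        # Parse each distinct sentence only once.
        if sentence not in self.parse_cache:
            self.parse_cache[sentence] = self._parse(sentence)
        return self.parse_cache[sentence]

    def extract_edits(self, src: str, trg: str):
        # Annotate each distinct (src, trg) pair only once.
        key = (src, trg)
        if key not in self.edit_cache:
            self.edit_cache[key] = self._annotate(self.parse(src), self.parse(trg))
        return self.edit_cache[key]


parse_calls: List[str] = []


def toy_parse(sentence: str) -> List[str]:
    # Stand-in for an expensive parser; records every real call.
    parse_calls.append(sentence)
    return sentence.split()


def toy_annotate(src_tokens: List[str], trg_tokens: List[str]):
    # Stand-in for edit extraction: position-wise token mismatches only.
    return [(i, s, t) for i, (s, t) in enumerate(zip(src_tokens, trg_tokens)) if s != t]


extractor = CachedExtractor(toy_parse, toy_annotate)
first = extractor.extract_edits('He go home .', 'He goes home .')
second = extractor.extract_edits('He go home .', 'He goes home .')  # served from cache
third = extractor.extract_edits('He go home .', 'He went home .')   # source parse reused
print(len(parse_calls))
```

In this toy setup the source sentence is parsed once even though it appears in three calls, which is the kind of saving the cache is meant to provide when a source is compared against several hypotheses.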
gecommon.Parallel
- The most important feature is the ability to handle both M2 and parallel formats in the same interface.
from gecommon import Parallel
# If the input is in M2 format
gec = Parallel.from_m2(
    m2='<an M2 file path>',
    ref_id=0
)
# If the input is in parallel format
gec = Parallel.from_parallel(
    src='<a src file path>',
    trg='<a trg file path>'
)
# After that, you can handle the input data in the same interface.
assert gec.srcs is not None
assert gec.trgs is not None
assert gec.edits_list is not None
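For readers unfamiliar with M2, the sketch below shows how an M2 block maps to a source sentence plus a list of edits for one annotator, which is the kind of information the common interface exposes. `parse_m2`, the tuple-based `Edit`, and the sample text are illustrative assumptions and not part of gecommon; how gecommon reads M2 internally is not shown here.

```python
from typing import List, Tuple

# (start, end, error_type, correction): a plain tuple standing in for errant.edit.Edit
Edit = Tuple[int, int, str, str]

# Hypothetical sample in the standard M2 layout: one "S" line, then "A" lines.
SAMPLE_M2 = """S This are gramamtical sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 2|||M:DET|||a|||REQUIRED|||-NONE-|||0
A 2 3|||R:SPELL|||grammatical|||REQUIRED|||-NONE-|||0

S This is fine .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
"""


def parse_m2(text: str, ref_id: int = 0) -> List[Tuple[str, List[Edit]]]:
    """Return (source, edits) pairs, keeping only the edits of annotator ref_id."""
    entries = []
    for block in text.strip().split('\n\n'):
        lines = block.strip().split('\n')
        src = lines[0][2:]  # drop the leading "S "
        edits: List[Edit] = []
        for line in lines[1:]:
            fields = line[2:].split('|||')  # drop the leading "A "
            start, end = (int(x) for x in fields[0].split())
            error_type, correction = fields[1], fields[2]
            annotator = int(fields[-1])
            # "noop" marks a sentence that the annotator left unchanged.
            if annotator != ref_id or error_type == 'noop':
                continue
            edits.append((start, end, error_type, correction))
        entries.append((src, edits))
    return entries


entries = parse_m2(SAMPLE_M2)
for src, edits in entries:
    print(src, edits)
```

The `ref_id` argument here plays the same role as in `Parallel.from_m2` above: an M2 file can hold several annotators' edits, and one of them has to be selected.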
- To generate error detection labels
- In addition to binary labels, 4-class, 25-class, and 55-class labels are available, as in [Yuan+ 21].
from gecommon import Parallel
gec = Parallel.from_demo()
# Sentence-level labels
print(gec.ged_labels_sent())
# [['INCORRECT'], ['INCORRECT'], ['CORRECT']]
# Token-level labels
print(gec.ged_labels_token(mode='cat3'))
# [['CORRECT', 'R:VERB:SVA', 'R:SPELL', 'CORRECT', 'CORRECT'],
# ['CORRECT', 'CORRECT', 'U:VERB', 'CORRECT', 'R:ORTH', 'R:ORTH', 'CORRECT', 'CORRECT'],
# ['CORRECT', 'CORRECT', 'CORRECT', 'CORRECT', 'CORRECT']]
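To make the relation between edits and token-level labels concrete, here is a minimal sketch of one way such labels could be derived. It is not gecommon's implementation. The helper names, the numeric `level` argument, and the rule that an insertion marks the token that follows it are all assumptions; the coarsening follows the usual reading of ERRANT type strings (operation, main category, full type).

```python
from typing import List, Tuple

# (o_start, o_end, error_type): hypothetical stand-in for the fields of an edit
Edit = Tuple[int, int, str]


def coarsen(error_type: str, level: int) -> str:
    """Reduce an ERRANT-style type such as 'R:VERB:SVA' to a coarser label."""
    operation, _, category = error_type.partition(':')
    if level == 1:
        return operation              # e.g. 'R'
    if level == 2:
        return category or operation  # e.g. 'VERB:SVA'
    return error_type                 # e.g. 'R:VERB:SVA'


def token_labels(num_tokens: int, edits: List[Edit], level: int = 3) -> List[str]:
    labels = ['CORRECT'] * num_tokens
    if num_tokens == 0:
        return labels
    for o_start, o_end, error_type in edits:
        if o_start == o_end:
            # Assumption: an insertion is attached to the following token
            # (clamped to the last token at the end of the sentence).
            positions = [min(o_start, num_tokens - 1)]
        else:
            positions = list(range(o_start, o_end))
        for i in positions:
            labels[i] = coarsen(error_type, level)  # later edits overwrite earlier ones
    return labels


def sentence_label(edits: List[Edit]) -> str:
    return 'INCORRECT' if edits else 'CORRECT'


# Hypothetical edits for a five-token source sentence.
demo_edits = [(1, 2, 'R:VERB:SVA'), (2, 2, 'M:DET'), (2, 3, 'R:SPELL')]
fine = token_labels(5, demo_edits, level=3)
coarse = token_labels(5, demo_edits, level=1)
print(fine)
print(coarse)
```

Other conventions for insertions are possible (for example, marking the preceding token), so the library's actual output should be checked before relying on this rule.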
- To use edit information
- This is useful for pre-processing that requires edit information, as in [Chen+ 20], [Li+ 23] and [Bout+ 23].
from gecommon import Parallel
gec = Parallel.from_demo()
for edits in gec.edits_list:
    for e in edits:
        print(e.o_start, e.o_end, e.c_str)
    print('---')
# 1 2 is
# 2 2 a
# 2 3 grammatical
# ---
# 2 3
# 4 6 grammatical
# ---
# ---
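The three fields printed above (o_start, o_end, c_str) are enough to rebuild a corrected sentence from its source, which is the task the feature list assigns to gecommon.utils.apply_edits. The sketch below illustrates the idea on whitespace tokens. `ToyEdit` and `apply_toy_edits` are hypothetical names, the example sentence is made up, and the actual signature of gecommon.utils.apply_edits is not shown in this document.

```python
from typing import List, NamedTuple


class ToyEdit(NamedTuple):
    """Hypothetical stand-in carrying the three fields used above."""
    o_start: int  # start token index in the source
    o_end: int    # end token index in the source (exclusive)
    c_str: str    # correction string ('' for a deletion)


def apply_toy_edits(src: str, edits: List[ToyEdit]) -> str:
    tokens = src.split()
    # Apply from right to left so earlier offsets stay valid after each splice.
    for e in sorted(edits, key=lambda e: (e.o_start, e.o_end), reverse=True):
        tokens[e.o_start:e.o_end] = e.c_str.split()
    return ' '.join(tokens)


src = 'This are gramamtical sentence .'
edits = [ToyEdit(1, 2, 'is'), ToyEdit(2, 2, 'a'), ToyEdit(2, 3, 'grammatical')]
corrected = apply_toy_edits(src, edits)
print(corrected)  # This is a grammatical sentence .
```

Sorting by (o_start, o_end) in descending order matters when an insertion and a replacement share a start index: the replacement is applied first, so the inserted token ends up in front of it instead of being overwritten.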
File details
Details for the file gecommon-0.2.0.tar.gz.
File metadata
- Download URL: gecommon-0.2.0.tar.gz
- Upload date:
- Size: 23.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f35c9f88fa2903fed37128eb0ebcac49f1954fd3e53f14c4db137da15aa7498 |
| MD5 | c22f8fc5529dcc739c728c8d41bff896 |
| BLAKE2b-256 | af62e3125c4387bbdc605307f21236d6a9adac5b639210c89cb8db234a5d0663 |
File details
Details for the file gecommon-0.2.0-py3-none-any.whl.
File metadata
- Download URL: gecommon-0.2.0-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73fc473a484fd3b5758f77aae4c71ed4c60f76a2085160e8bf33aa08163672e3 |
| MD5 | d8d0fcf50156ca14c3b50251dffe96e7 |
| BLAKE2b-256 | 828d785cbae78665b5e7bddb6ac07158be0f62eebab185192efa42abf8af6e61 |