Easy Natural Language Processing

eznlp is a PyTorch-based package for neural natural language processing, currently supporting:

  • Text Classification
  • Named Entity Recognition
    • Sequence Tagging
    • Span Classification
    • Boundary Selection
  • Relation Extraction
  • Attribute Extraction
  • Machine Translation
  • Image Captioning
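The three entity recognition formulations listed above differ mainly in how entity spans are encoded as prediction targets. The following is a minimal sketch in plain Python, not eznlp's actual API: the function names `bio_to_spans` and `span_targets`, the BIO tagging scheme, and the example sentence are all assumptions made for illustration. It shows how a tag sequence (sequence tagging) maps to the `(label, start, end)` spans that span-based formulations score directly.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (label, start, end) spans, with end exclusive.

    Hypothetical helper for illustration; ill-formed "I-" tags that do not
    continue an open span of the same label are dropped.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.append((label, start, i))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                start, label = None, None
    return spans


def span_targets(spans, num_tokens):
    """Label every candidate (start, end) span; non-entities get "O"."""
    gold = {(start, end): label for label, start, end in spans}
    return {
        (start, end): gold.get((start, end), "O")
        for start in range(num_tokens)
        for end in range(start + 1, num_tokens + 1)
    }


# Tokens: John lives in New York
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
spans = bio_to_spans(tags)
print(spans)  # [('PER', 0, 1), ('LOC', 3, 5)]
targets = span_targets(spans, len(tags))
```

Sequence tagging predicts one tag per token, whereas the span-based view enumerates candidate spans, so the number of targets grows quadratically with sentence length unless span length is capped.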

Experiment Results

Text Classification

Dataset        Language   Our Best Acc.   Model
IMDb           English    95.78           RoBERTa-base + Attention
Yelp Full      English    71.55           RoBERTa-base + Attention
Yelp 2013      English    70.80           RoBERTa-base + Attention
ChnSentiCorp   Chinese    95.83           BERT-base + Attention
THUCNews-10    Chinese    98.98           RoBERTa-base + Attention

See Text Classification for more details.

Named Entity Recognition

Dataset        Language   Our Best F1   Model
CoNLL 2003     English    93.26         RoBERTa-large + LSTM + CRF
OntoNotes 5    English    91.05         RoBERTa-base + LSTM + CRF
MSRA           Chinese    96.18         BERT + LSTM + CRF
WeiboNER v2    Chinese    70.48         BERT + LSTM + CRF
ResumeNER      Chinese    95.97         BERT + LSTM + CRF
OntoNotes 4    Chinese    82.29         BERT + LSTM + CRF
OntoNotes 5    Chinese    80.31         BERT + LSTM + CRF

See Named Entity Recognition for more details.
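NER F1 is conventionally reported as micro-averaged F1 over exact-match entities, where a prediction counts only if its label and both boundaries agree with the gold annotation. The README does not state how its scores are computed, so the sketch below assumes that convention; `micro_f1` is a hypothetical helper, not part of eznlp.

```python
def micro_f1(gold, pred):
    """Micro-averaged precision, recall and F1 over exact-match entities.

    gold, pred: one collection per sentence of (label, start, end) tuples.
    """
    n_gold = sum(len(set(g)) for g in gold)
    n_pred = sum(len(set(p)) for p in pred)
    n_true = sum(len(set(g) & set(p)) for g, p in zip(gold, pred))
    precision = n_true / n_pred if n_pred else 0.0
    recall = n_true / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


gold = [{("PER", 0, 1), ("LOC", 3, 5)}]
pred = [{("PER", 0, 1), ("LOC", 3, 4)}]  # second entity has a wrong end boundary
print(micro_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under exact matching, a prediction with one wrong boundary counts as both a false positive and a false negative.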

Relation Extraction

Dataset      Language   Our Best F1 (Ent / Rel / Rel+)   Model
CoNLL 2004   English    89.17 / - / 75.03                SpERT (w/ RoBERTa-base + LSTM)
SciERC       English    69.29 / 48.93 / 36.65            SpERT (w/ RoBERTa-base)

See Relation Extraction for more details.
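In the relation extraction literature, "Rel" usually denotes boundaries evaluation (a relation is correct if both argument spans and the relation type match) and "Rel+" denotes strict evaluation (the argument entity types must match as well). The README does not define these columns, so the sketch below assumes that common convention; `relation_f1` and the example labels are hypothetical and not taken from eznlp.

```python
def relation_f1(gold, pred, strict):
    """F1 over relations given as (head, tail, label) triples.

    head and tail are (entity_type, start, end) tuples. With strict=False
    (assumed "Rel"), entity types are ignored; with strict=True (assumed
    "Rel+"), they must match too.
    """
    def key(rel):
        (h_type, h_start, h_end), (t_type, t_start, t_end), label = rel
        if strict:
            return rel
        return ((h_start, h_end), (t_start, t_end), label)

    gold_keys = {key(r) for r in gold}
    pred_keys = {key(r) for r in pred}
    n_true = len(gold_keys & pred_keys)
    precision = n_true / len(pred_keys) if pred_keys else 0.0
    recall = n_true / len(gold_keys) if gold_keys else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


gold_rels = [(("PER", 0, 1), ("LOC", 3, 5), "Live_In")]
pred_rels = [(("ORG", 0, 1), ("LOC", 3, 5), "Live_In")]  # wrong head entity type
print(relation_f1(gold_rels, pred_rels, strict=False))  # 1.0
print(relation_f1(gold_rels, pred_rels, strict=True))   # 0.0
```

Because strict evaluation adds a condition, Rel+ can never exceed Rel on the same predictions, which is consistent with the SciERC row above.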

Installation

With pip

$ pip install eznlp

From source

$ python setup.py sdist
$ pip install dist/eznlp-<version>.tar.gz
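After either installation route, the installed version can be checked from Python. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is an assumption for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(name="eznlp"):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# Prints the version string if eznlp is installed, otherwise None.
print(installed_version())
```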

Running the Code

Text classification

$ python scripts/text_classification.py --dataset <dataset> [options]

Entity recognition

$ python scripts/entity_recognition.py --dataset <dataset> [options]

Relation extraction

$ python scripts/relation_extraction.py --dataset <dataset> [options]

Attribute extraction

$ python scripts/attribute_extraction.py --dataset <dataset> [options]
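The four commands above share one shape, so they can be assembled programmatically when running several experiments. The sketch below only builds the argument list; the helper name `build_command` and the dataset name `my_dataset` are hypothetical, and valid values for `--dataset` and `[options]` must come from the scripts themselves.

```python
import subprocess
import sys

# Script names as listed in this README.
TASKS = (
    "text_classification",
    "entity_recognition",
    "relation_extraction",
    "attribute_extraction",
)


def build_command(task, dataset, extra_options=()):
    """Build the argument list for one of the scripts/ entry points."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [sys.executable, f"scripts/{task}.py", "--dataset", dataset, *extra_options]


cmd = build_command("entity_recognition", "my_dataset")  # placeholder dataset name
print(" ".join(cmd))
# To launch it from the repository root:
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, and `check=True` raises an error if the script exits with a non-zero status.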

Citation

If you find our code useful, please cite the following papers:

@article{zhu2021boundary,
  title={Boundary Smoothing for Named Entity Recognition},
  author={Zhu, Enwei and Cai, Ting and Li, Jinpeng},
  journal={Working Paper},
  year={2021}
}
@article{zhu2021framework,
  title={A Unified Framework of Medical Information Annotation and Extraction for {Chinese} Clinical Text},
  author={Zhu, Enwei and Sheng, Qilin and Yang, Huanwan and Li, Jinpeng},
  journal={Working Paper},
  year={2021}
}

Source distribution: eznlp-0.2.3rc1.tar.gz (101.7 kB)
