Easy Natural Language Processing

eznlp is a PyTorch-based package for neural natural language processing, currently supporting:

  • Text classification
  • Named entity recognition
    • Sequence tagging
    • Span classification
    • Boundary selection
  • Relation extraction
  • Attribute extraction
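Sequence tagging and span classification encode the same entities differently: one label per token versus a list of typed spans. As a minimal sketch, assuming the BIO tagging scheme (eznlp may support other schemes), the hypothetical helper `bio_to_spans` below converts a tag sequence into (type, start, end) spans with exclusive ends.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (type, start, end) spans; `end` is exclusive.

    Assumption: tags follow the BIO scheme ("B-X", "I-X", "O"); this is an
    illustrative helper, not part of eznlp's API.
    """
    spans = []
    start, ent_type = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        closes_current = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != ent_type)
        )
        if closes_current:
            if ent_type is not None:
                spans.append((ent_type, start, i))
                ent_type = None
            if tag != "O":
                # "B-X", or an "I-X" with no matching open entity, starts a new span
                start, ent_type = i, tag[2:]
        # Otherwise the tag is "I-X" continuing the open entity: nothing to do.
    return spans
```

Span-based approaches instead enumerate candidate spans and classify each one directly, which also lets them represent nested entities that a single BIO sequence cannot.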

Experiment Results

Text Classification

Dataset        Language   Our Best Acc. (%)   Model Specification
IMDb           English    95.78               RoBERTa-base + Attention
Yelp Full      English    71.55               RoBERTa-base + Attention
Yelp 2013      English    70.80               RoBERTa-base + Attention
ChnSentiCorp   Chinese    95.83               BERT-base + Attention
THUCNews-10    Chinese    98.98               RoBERTa-base + Attention

See Text Classification for more details.
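The model specifications above pair a pretrained encoder with an attention layer. One common reading of "+ Attention" is attention pooling: each token vector receives a scalar score, the scores are softmax-normalised over the sentence, and the sentence vector is the weighted sum of token vectors. The pure-Python sketch below assumes that formulation with a single query vector `query` (a hypothetical parameter that would be learned in a real model); eznlp's actual pooling layer may differ.

```python
import math
from typing import List


def attention_pool(hidden: List[List[float]], query: List[float]) -> List[float]:
    """Pool token vectors into one sentence vector via dot-product attention.

    Assumption: a single query vector scores every token; in a trained model
    it would be a learned parameter, here it is passed in explicitly.
    """
    # Score each token by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden]
    # Numerically stable softmax over tokens.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of token vectors, dimension by dimension.
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
```

A batched implementation would also mask padding positions before the softmax; that is omitted here to keep the sketch short.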

Named Entity Recognition

Dataset        Language   Our Best F1 (%)   Model Specification
CoNLL 2003     English    93.26             RoBERTa-large + LSTM + CRF
OntoNotes v5   English    91.05             RoBERTa-base + LSTM + CRF
MSRA           Chinese    96.18             BERT + LSTM + CRF
WeiboNER v2    Chinese    70.48             BERT + LSTM + CRF
ResumeNER      Chinese    95.97             BERT + LSTM + CRF
OntoNotes v5   Chinese    80.31             BERT + LSTM + CRF

See Named Entity Recognition for more details.
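The NER models above end in a CRF layer. At inference time, a linear-chain CRF selects the highest-scoring tag sequence with Viterbi decoding over per-token emission scores and tag-to-tag transition scores. The sketch below is a minimal pure-Python version under simplifying assumptions: scores are plain nested lists (in a PyTorch model they would come from the LSTM output and a learned transition matrix), and start/end transition scores are omitted.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> Tuple[float, List[int]]:
    """Return (best score, best tag path) for a linear-chain CRF.

    emissions[t][j]:   score of tag j at position t
    transitions[i][j]: score of moving from tag i to tag j
    Assumption: no separate start/end transition scores.
    """
    num_tags = len(emissions[0])
    score = list(emissions[0])  # best score of a path ending in each tag at t = 0
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(num_tags):
            # Best previous tag i for a path that ends in tag j at position t.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace the best path backwards from the highest-scoring final tag.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return score[best_last], path
```

For example, a strongly negative `transitions[i][j]` (such as "O" to "I-PER" under BIO) steers decoding away from invalid tag bigrams even when the emission scores favour them.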

Relation Extraction

Dataset        Language   Our Best F1 (%)   Model Specification
CoNLL 2004     English    89.17 / 75.03     SpERT (with RoBERTa-base + LSTM)
SciERC         English    69.29 / 36.65     SpERT (with RoBERTa-base)

See Relation Extraction for more details.
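F1 scores like those above come from matching predicted structures against gold annotations. A common strict criterion for relation extraction counts a predicted relation as correct only if its type and both argument spans match exactly. The sketch below computes micro-averaged precision, recall and F1 under that assumption; the tuple layout and the example relations are made up for illustration, and eznlp's own evaluation may apply additional criteria (for instance, requiring entity types to match as well).

```python
from typing import Hashable, Iterable, Tuple


def micro_prf(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 with exact matching of items.

    When pooling over a corpus, include a sentence index in each tuple so that
    identical relations in different sentences are not merged.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical relations as (type, head span, tail span); spans are (start, end) offsets.
gold = [("Work_For", (0, 2), (5, 6)), ("Live_In", (0, 2), (8, 9))]
pred = [("Work_For", (0, 2), (5, 6)), ("Live_In", (0, 1), (8, 9))]
print(micro_prf(gold, pred))  # (0.5, 0.5, 0.5): the second head span is off by one token
```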

Installation

With pip

$ pip install eznlp

From source

$ python setup.py sdist
$ pip install dist/eznlp-<version>.tar.gz
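To confirm which release the current interpreter picks up after either installation route, the standard-library `importlib.metadata` module (Python 3.8+) can be queried. This is a generic packaging check rather than an eznlp feature; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "eznlp") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper built only on the standard library.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())  # the installed eznlp version string, or None
```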

Running the Code

Text classification

$ python scripts/text_classification.py --dataset <dataset> [options]

Entity recognition

$ python scripts/entity_recognition.py --dataset <dataset> [options]

Relation extraction

$ python scripts/relation_extraction.py --dataset <dataset> [options]

Attribute extraction

$ python scripts/attribute_extraction.py --dataset <dataset> [options]
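All four entry points share one calling convention, so they can also be driven from Python, for example to loop over several datasets. The sketch below only builds and runs the documented `--dataset` form from the repository root; the `TASK_SCRIPTS` mapping and the `build_command`/`run` helpers are hypothetical, valid dataset names are not listed here, and any extra options should be checked against each script's own help output.

```python
import shlex
import subprocess
import sys
from typing import List

# Hypothetical mapping from task name to the script paths documented above.
TASK_SCRIPTS = {
    "text_classification": "scripts/text_classification.py",
    "entity_recognition": "scripts/entity_recognition.py",
    "relation_extraction": "scripts/relation_extraction.py",
    "attribute_extraction": "scripts/attribute_extraction.py",
}


def build_command(task: str, dataset: str, *options: str) -> List[str]:
    """Build the argv list for one run; `options` are passed through unchanged."""
    if task not in TASK_SCRIPTS:
        raise ValueError(f"unknown task: {task!r}")
    # sys.executable ensures the same interpreter (and environment) is used.
    return [sys.executable, TASK_SCRIPTS[task], "--dataset", dataset, *options]


def run(task: str, dataset: str, *options: str) -> int:
    """Echo and run the command from the repository root; return its exit code."""
    cmd = build_command(task, dataset, *options)
    print(shlex.join(cmd))
    return subprocess.run(cmd).returncode
```

Using `sys.executable` rather than a bare `python` avoids accidentally running the scripts under a different interpreter than the one eznlp is installed in.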

Citation

If you find our code useful, please cite the following paper:

@article{zhu2021framework,
  title={A Unified Framework of Medical Information Annotation and Extraction for {Chinese} Clinical Text},
  author={Zhu, Enwei and Sheng, Qilin and Yang, Huanwan and Li, Jinpeng},
  journal={Working Paper},
  year={2021}
}

Future Plans

  • SoftLexicon
  • Radical-Level Features
  • Experiments on Chinese NER datasets
  • Experiments on text classification datasets
  • Focal loss (possibly combined with CRF)
  • Dice loss
  • Relation Extraction
  • Span-based models (e.g., SpERT)
  • NER / RE as MRC
  • Pair selection (multi-head selection; RE for flat entities)
  • Data Augmentation
  • LR finder for MSRA
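Focal loss and dice loss appear in the plans above as alternatives to plain cross-entropy for imbalanced labels. For reference, the standard binary formulations are FL(p_t) = -(1 - p_t)^γ · log(p_t), where p_t is the probability assigned to the true class, and soft dice loss = 1 - (2·Σ p·y + ε) / (Σ p + Σ y + ε). The pure-Python sketch below implements these textbook versions; it is not eznlp code, and the defaults γ = 2 and ε = 1 are assumptions (γ = 2 is a common choice for focal loss).

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], labels: Sequence[int], gamma: float = 2.0) -> float:
    """Mean binary focal loss; probs are predicted P(y = 1), labels are 0 or 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p       # probability of the true class
        p_t = min(max(p_t, 1e-12), 1.0)      # clamp to avoid log(0)
        # (1 - p_t)^gamma down-weights easy, well-classified examples.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def soft_dice_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1.0) -> float:
    """Soft dice loss: 1 - (2 * sum(p * y) + eps) / (sum(p) + sum(y) + eps)."""
    intersection = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(labels) + eps)
```

With γ = 0 the focal loss reduces to ordinary binary cross-entropy, which is a convenient sanity check.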

Source Distribution

eznlp-0.2.1rc1.tar.gz (89.4 kB)
