Easy Natural Language Processing

eznlp is a PyTorch-based package for neural natural language processing, currently supporting:

  • Text classification
  • Named Entity Recognition
    • Sequence tagging
    • Span classification
    • Boundary Selection
  • Relation extraction
  • Attribute extraction

Experiment Results

Text Classification

| Dataset      | Language | Our Best Imp. Acc. | Model Specification      |
|--------------|----------|--------------------|--------------------------|
| IMDb         | English  | 95.78              | RoBERTa-base + Attention |
| Yelp Full    | English  | 71.55              | RoBERTa-base + Attention |
| Yelp 2013    | English  | 70.80              | RoBERTa-base + Attention |
| ChnSentiCorp | Chinese  | 95.83              | BERT-base + Attention    |
| THUCNews-10  | Chinese  | 98.98              | RoBERTa-base + Attention |

See Text Classification for more details.

Named Entity Recognition

| Dataset      | Language | Our Best Imp. F1 | Model Specification        |
|--------------|----------|------------------|----------------------------|
| CoNLL 2003   | English  | 93.26            | RoBERTa-large + LSTM + CRF |
| OntoNotes v5 | English  | 91.05            | RoBERTa-base + LSTM + CRF  |
| MSRA         | Chinese  | 96.18            | BERT + LSTM + CRF          |
| WeiboNER v2  | Chinese  | 70.48            | BERT + LSTM + CRF          |
| ResumeNER    | Chinese  | 95.97            | BERT + LSTM + CRF          |
| OntoNotes v5 | Chinese  | 80.31            | BERT + LSTM + CRF          |

See Named Entity Recognition for more details.

Relation Extraction

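| Dataset    | Language | Our Best Imp. F1 | Model Specification                |
|------------|----------|------------------|------------------------------------|
| CoNLL 2004 | English  | 89.17 / 75.03    | SpERT (with RoBERTa-base + LSTM)   |
| SciERC     | English  | 69.29 / 36.65    | SpERT (with RoBERTa-base)          |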

See Relation Extraction for more details.

Installation

With pip

$ pip install eznlp

From source

$ python setup.py sdist
$ pip install dist/eznlp-<version>.tar.gz
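
After either installation route, it can be useful to confirm that the package is visible to the current Python environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of eznlp.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not an eznlp function.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("eznlp")
    if version is None:
        print("eznlp is not installed in this environment")
    else:
        print(f"eznlp version: {version}")
```

This only checks the installed distribution metadata; it does not import the package or verify that PyTorch is available.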

Running the Code

Text classification

$ python scripts/text_classification.py --dataset <dataset> [options]

Entity recognition

$ python scripts/entity_recognition.py --dataset <dataset> [options]

Relation extraction

$ python scripts/relation_extraction.py --dataset <dataset> [options]

Attribute extraction

$ python scripts/attribute_extraction.py --dataset <dataset> [options]
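
When the same script has to be run over several datasets, the commands above can be assembled programmatically. The sketch below is one possible wrapper, assuming it is executed from the repository root; the names `SCRIPTS`, `build_command` and `run_all` are hypothetical, and `my_dataset` is a placeholder rather than a dataset name the scripts are known to accept. Only the `--dataset` flag shown above is used; anything passed through `extra_args` stands in for the unspecified `[options]`.

```python
import subprocess
import sys
from typing import List, Sequence

# Task names mapped to the script paths listed in this section.
SCRIPTS = {
    "text_classification": "scripts/text_classification.py",
    "entity_recognition": "scripts/entity_recognition.py",
    "relation_extraction": "scripts/relation_extraction.py",
    "attribute_extraction": "scripts/attribute_extraction.py",
}


def build_command(task: str, dataset: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for one run, without executing it."""
    if task not in SCRIPTS:
        raise ValueError(f"unknown task: {task!r}")
    return [sys.executable, SCRIPTS[task], "--dataset", dataset, *extra_args]


def run_all(task: str, datasets: Sequence[str], extra_args: Sequence[str] = ()) -> None:
    """Run the script for each dataset in turn, stopping at the first failure."""
    for dataset in datasets:
        subprocess.run(build_command(task, dataset, extra_args), check=True)


if __name__ == "__main__":
    # Placeholder dataset name; replace it with one the script supports.
    print(" ".join(build_command("entity_recognition", "my_dataset")))
```

Separating `build_command` from `run_all` makes it possible to inspect or log the exact command before launching a long training run.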

Citation

If you find our code useful, please cite the following paper:

@article{zhu2021framework,
  title={A Unified Framework of Medical Information Annotation and Extraction for {Chinese} Clinical Text},
  author={Zhu, Enwei and Sheng, Qilin and Yang, Huanwan and Li, Jinpeng},
  journal={Working Paper},
  year={2021}
}

Future Plans

  • SoftLexicon
  • Radical-Level Features
  • Experiments on Chinese NER datasets
  • Experiments on text classification datasets
  • Focal loss (and combined with CRF?)
  • Dice loss
  • Relation Extraction
  • Span-based models (e.g., SpERT)
  • NER / RE as MRC
  • Pair selection (multi-head selection; RE for flat entities)
  • Data Augmentation
  • LR finder for MSRA
