CLaF: Clova Language Framework

CLaF is a language framework built on PyTorch that provides the following two high-level features:

  • Experiment controls the training flow for general NLP tasks and offers various TokenMaker methods.
    • CLaF is inspired by AllenNLP's design principles, such as higher-level concepts and reusable code, but is mostly built on PyTorch's common modules so that users can easily modify the code to suit their needs.
  • Machine combines various modules to build an NLP Machine in one place.
    • Modules can be knowledge-based, components, or trained experiments that run 1-example inference.

Features

| Task | Language | Dataset | Model |
| --- | --- | --- | --- |
| Natural Language Understanding | English | GLUE Benchmark | BERT, RoBERTa |
| Named Entity Recognition | English | CoNLL 2003 | BERT |
| Question Answering | Korean | KorQuAD v1.0 | BiDAF, DocQA, BERT |
| Question Answering | English | SQuAD v1.1 and v2.0 | v1.1: BiDAF, DrQA, DocQA, DocQA+ELMo, QANet; v2.0: BiDAF + No Answer, DocQA + No Answer |
| Semantic Parsing | English | WikiSQL | SQLNet |

Installation

Requirements

  • Python 3.6
  • PyTorch >= 0.4.1
  • MeCab for Korean Tokenizer
    • sh script/install_mecab.sh

We recommend using a virtual environment.
Conda is the easiest way to set one up.

conda create -n claf python=3.6
conda activate claf

(claf) ✗ pip install -r requirements.txt

Install via pip

Commands to install via pip

pip install claf

Overview

  • Multilingual modeling support (currently English and Korean).
  • Lightweight systemization and modularization.
  • Easy extension and implementation of models.
  • A wide variety of experiments with reproducible and comprehensive logging.
  • Service-oriented metrics such as "1-example inference latency" are provided.
  • Easy construction of an NLP Machine by combining modules.

Experiment

  • Training Flow

Usage

Training

  1. only Arguments

    python train.py --train_file_path {file_path} --valid_file_path {file_path} --model_name {name} ...
    
  2. only BaseConfig (omit the base_config/ directory prefix from the path)

    python train.py --base_config {base_config}
    
  3. BaseConfig + Arguments

    python train.py --base_config {base_config} --learning_rate 0.002
    
    • Load BaseConfig then overwrite learning_rate to 0.002
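
The third mode can be pictured as a dictionary merge. The following is a minimal sketch, not CLaF's actual implementation: the function name `merge_config`, the flat key layout, and the sample values are assumptions made for illustration.

```python
import argparse
import json


def merge_config(base_config: dict, cli_args: dict) -> dict:
    """Return base_config with every explicitly given CLI value laid on top."""
    merged = dict(base_config)
    for key, value in cli_args.items():
        if value is not None:  # None means the flag was not passed
            merged[key] = value
    return merged


# Hypothetical BaseConfig contents; a real one would be read from a .json file.
base_config = json.loads('{"model_name": "bidaf", "learning_rate": 0.001}')

parser = argparse.ArgumentParser()
parser.add_argument("--model_name", default=None)
parser.add_argument("--learning_rate", type=float, default=None)
args = parser.parse_args(["--learning_rate", "0.002"])

config = merge_config(base_config, vars(args))
print(config)  # {'model_name': 'bidaf', 'learning_rate': 0.002}
```

Using None as the "not passed" default keeps values from the config file intact unless a flag is given explicitly.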

BaseConfig

Declarative experiment config (.json)

  • Config keys simply match the parameters of the corresponding objects
  • Sample configs are available in the base_config/ directory
Defined BaseConfig
Base Config:
  --base_config BASE_CONFIG
    Use pre-defined base_config:
    []


    * CoNLL 2003:
    ['conll2003/bert_large_cased']

    * GLUE:
    ['glue/qqp_roberta_base', 'glue/qnli_bert_base', 'glue/rte_bert_base', 'glue/wnli_roberta_base', 'glue/mnlim_roberta_base', 'glue/wnli_bert_base', 'glue/mnlimm_roberta_base', 'glue/cola_bert_base', 'glue/mrpc_bert_base', 'glue/mnlimm_bert_base', 'glue/stsb_bert_base', 'glue/mnlim_bert_base', 'glue/qqp_bert_base', 'glue/rte_roberta_base', 'glue/qnli_roberta_base', 'glue/sst_bert_base', 'glue/mrpc_roberta_base', 'glue/cola_roberta_base', 'glue/stsb_roberta_base', 'glue/sst_roberta_base']

    * KorQuAD:
    ['korquad/bert_base_multilingual_cased', 'korquad/bidaf', 'korquad/bert_base_multilingual_uncased', 'korquad/docqa']

    * SQuAD:
    ['squad/bert_large_uncased', 'squad/bidaf', 'squad/drqa_paper', 'squad/drqa', 'squad/bert_base_uncased', 'squad/qanet', 'squad/docqa+elmo', 'squad/bidaf_no_answer', 'squad/docqa_no_answer', 'squad/qanet_paper', 'squad/bidaf+elmo', 'squad/docqa']

    * WikiSQL:
    ['wikisql/sqlnet']

Evaluate

python eval.py <data_path> <model_checkpoint_path>
  • Example
✗ python eval.py data/squad/dev-v1.1.json logs/squad/bidaf/checkpoint/model_19.pkl
...
[INFO] - {
    "valid/loss": 2.59111491665019,
    "valid/epoch_time": 60.7434446811676,
    "valid/start_acc": 63.17880794701987,
    "valid/end_acc": 67.19016083254493,
    "valid/span_acc": 54.45600756859035,
    "valid/em": 68.10785241248817,
    "valid/f1": 77.77963381714842
}
# write predictions files (<log_dir>/predictions/predictions-valid-19.json)
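
For reference, valid/em and valid/f1 above are the exact-match and token-level F1 scores customary for SQuAD. The sketch below follows the normalization of the official SQuAD v1.1 evaluation script for a single prediction and a single reference answer; whether CLaF computes them in exactly this way is an assumption, and the dataset-level numbers are averages expressed as percentages.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Levi's Stadium", "levis stadium"))  # 1.0
print(round(f1("in 2003", "2003"), 4))  # 0.6667
```
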
  • 1-example Inference Latency (Summary)
✗ python eval.py data/squad/dev-v1.1.json logs/squad/bidaf/checkpoint/model_19.pkl
...
# Evaluate Inference Latency Mode.
...
[INFO] - saved inference_latency results. bidaf-cpu.json  # file_format: {model_name}-{env}.json
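
The idea behind a 1-example inference latency metric is to time each prediction on its own rather than in a batch. A minimal sketch under stated assumptions: `measure_latency` and `dummy_predict` are hypothetical names, and the summary fields are illustrative and do not describe the format of the saved results file.

```python
import statistics
import time


def measure_latency(predict, examples):
    """Time predict() on each example separately and summarise in milliseconds."""
    timings_ms = []
    for example in examples:
        start = time.perf_counter()
        predict(example)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "count": len(timings_ms),
        "mean_ms": statistics.mean(timings_ms),
        "max_ms": max(timings_ms),
    }


# Hypothetical stand-in for a model's single-example inference call.
def dummy_predict(example):
    return sum(len(token) for token in example.split())


summary = measure_latency(dummy_predict, ["a short question", "another question"])
print(summary["count"])  # 2
```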

Predict

python predict.py <model_checkpoint_path> --<arguments>
  • Example
✗ python predict.py logs/squad/bidaf/checkpoint/model_19.pkl \
    --question "When was the last Super Bowl in California?" \
    --context "On May 21, 2013, NFL owners at their spring meetings in Boston voted and awarded the game to Levi's Stadium. The \$1.2 billion stadium opened in 2014. It is the first Super Bowl held in the San Francisco Bay Area since Super Bowl XIX in 1985, and the first in California since Super Bowl XXXVII took place in San Diego in 2003."

>>> Predict: {'text': '2003', 'score': 4.1640071868896484}
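
The context string above contains characters the shell treats specially (an apostrophe and a dollar sign), so calling predict.py from Python with an argument list avoids quoting problems. This is a minimal sketch: `build_predict_command` is a hypothetical helper, the context is shortened, and only the --question and --context flags shown above are used.

```python
import shlex
import sys


def build_predict_command(checkpoint_path: str, question: str, context: str) -> list:
    """Assemble the argv list for predict.py without running it."""
    return [
        sys.executable, "predict.py", checkpoint_path,
        "--question", question,
        "--context", context,
    ]


command = build_predict_command(
    "logs/squad/bidaf/checkpoint/model_19.pkl",
    "When was the last Super Bowl in California?",
    "The $1.2 billion stadium opened in 2014 at Levi's Stadium.",
)
# To execute it, pass `command` to subprocess.run(command, check=True)
# from the repository root. The line below prints a shell-safe equivalent.
print(" ".join(shlex.quote(part) for part in command))
```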

Docker Images

  • Docker Hub
  • Run with Docker Image
    • Pull the Docker image: docker pull claf/claf:latest
    • Run: docker run --rm -i -t claf/claf:latest /bin/bash

Machine

  • Machine Architecture

Usage

  • Define a config file (.json), similar to a BaseConfig, in the machine_config/ directory
  • Run the CLaF Machine (omit the machine_config/ directory prefix from the path)
✗ python machine.py --machine_config {machine_config}
  • The list of pre-defined Machine:
Machine Config:
  --machine_config MACHINE_CONFIG
    Use pre-defined machine_config (.json (.json))

    ['ko_wiki', 'nlu']

Open QA (DrQA Style)

DrQA is a system for reading comprehension applied to open-domain question answering. The system has to combine the challenges of document retrieval (finding the relevant documents) with that of machine comprehension of text (identifying the answers from those documents).

  • ko_wiki: Korean Wiki Version
✗ python machine.py --machine_config ko_wiki
...
Completed!
Question > 동학의 2대 교주 이름은?
--------------------------------------------------
Doc Scores:
 - 교주 : 0.5347289443016052
 - 이교주 : 0.4967213571071625
 - 교주도 : 0.49036136269569397
 - 동학 : 0.4800325632095337
 - 동학중학교 : 0.4352934956550598
--------------------------------------------------
Answer: [
    {
        "text": "최시형",
        "score": 11.073444366455078
    },
    {
        "text": "충주목",
        "score": 9.443866729736328
    },
    {
        "text": "반월동",
        "score": 9.37778091430664
    },
    {
        "text": "환조 이자춘",
        "score": 4.64817476272583
    },
    {
        "text": "합포군",
        "score": 3.3186707496643066
    }
]
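
The "Doc Scores" stage corresponds to the document-retrieval half of the pipeline. DrQA's retriever is based on TF-IDF weighted bag-of-words vectors; the sketch below shows plain TF-IDF with cosine similarity on toy English documents. It is an illustration only: the function names, the smoothing of the IDF term, and the sample data are assumptions, not the ko_wiki machine's actual retriever.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    # Smoothed IDF so that terms present in every document keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, titles, docs, k=2):
    """Return the k (title, score) pairs whose documents best match the question."""
    vectors, idf = tfidf_vectors(docs)
    q_counts = Counter(question.lower().split())
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scored = [(title, cosine(q_vec, vec)) for title, vec in zip(titles, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


titles = ["Seoul", "Boston", "Incheon"]
docs = [
    "seoul is the capital of south korea",
    "boston is a city in massachusetts",
    "incheon is a port city near seoul",
]
top = retrieve("what is the capital of south korea", titles, docs)
print(top[0][0])  # Seoul
```

The retrieved documents would then be passed to a reading-comprehension model, which produces the scored answer candidates.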

NLU (Dialog)

The NLU machine does not return a full response because response generation may require various task-specific post-processing techniques or additional logic (e.g. API calls, template-decision rules, template-filling rules, or an NN-based response generation model). Therefore, for flexible usage, the NLU machine returns only the NLU result.

✗ python machine.py --machine_config nlu
...
Utterance > "looking for a flight from Boston to Seoul or Incheon"

NLU Result: {
    "intent": "flight",
    "slots": {
        "city.depart": ["Boston"],
        "city.dest": ["Seoul", "Incheon"]
    }
}
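
Because only the NLU result is returned, response generation is left to the caller. As one illustration of the template-filling rules mentioned above, the sketch below turns the result into a sentence; `fill_template` and the template table are hypothetical and not part of CLaF.

```python
def fill_template(nlu_result: dict, templates: dict) -> str:
    """Pick a template by intent and fill it with the joined slot values."""
    template = templates.get(nlu_result["intent"])
    if template is None:
        return "Sorry, I cannot handle that request yet."
    # Join multi-valued slots such as ["Seoul", "Incheon"] into "Seoul or Incheon".
    values = {
        name.replace(".", "_"): " or ".join(items)
        for name, items in nlu_result["slots"].items()
    }
    return template.format(**values)


nlu_result = {
    "intent": "flight",
    "slots": {"city.depart": ["Boston"], "city.dest": ["Seoul", "Incheon"]},
}
# Hypothetical template table; slot names use "_" because str.format
# treats "." inside a field name as attribute access.
templates = {"flight": "Searching flights from {city_depart} to {city_dest}."}
print(fill_template(nlu_result, templates))
# Searching flights from Boston to Seoul or Incheon.
```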

Contributing

Thanks for your interest in contributing! There are many ways to contribute to this project.
Get started here.

Maintainers

CLaF is currently maintained by

Citing

If you use CLaF for your work, please cite:

@misc{claf,
  author = {Lee, Dongjun and Yang, Sohee and Kim, Minjeong},
  title = {CLaF: Open-Source Clova Language Framework},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/naver/claf}}
}

We will update this bibtex with our paper.

Acknowledgements

The docs/ directory includes documentation created by Sphinx.

License

MIT license

Copyright (c) 2019-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy 
of this software and associated documentation files (the "Software"), to deal 
in the Software without restriction, including without limitation the rights 
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 
copies of the Software, and to permit persons to whom the Software is 
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all 
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 
SOFTWARE.
