
A package that uses Chinese Word Net to perform word sense disambiguation


Word Sense Disambiguation by Chinese Word Net

Chinese word sense disambiguation has long been known to be a very difficult problem, because Chinese is a complicated language: a single word can have dozens or even hundreds of meanings depending on the context. Manually labeling the senses of words is labor-intensive and inefficient.

In this project, we address this problem with the state-of-the-art BERT model. It yields large performance gains and reaches roughly 82% accuracy on Chinese word sense disambiguation.

Prerequisites

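  • Input should be tokenized first. POS tagging is preferred but not required.
  • Suppose we have $m$ sentences and the $i$-th sentence has $n_i$ words. The input is a list of $m$ sentences, each sentence is a list of $n_i$ words, and each word is a four-element list [target, pos, sense_id, sense]. For untagged input, sense_id and sense are empty strings, as in the example below.
    • list_of_sentence = [ list_of_word × $m$ ], where list_of_word = [ [target, pos, sense_id, sense] × $n_i$ ]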

    • The following example contains two sentences. Input data should be formatted as follows:

        [[["他","Nh","",""],["由","P","",""],["昏沈","VH","",""],["的","DE","",""],["睡夢","Na","",""],["中","Ng","",""],["醒來","VH","",""],[",","COMMACATEGORY","",""]],
         [["臉","Na","",""],["上","Ncd","",""],["濕涼","VH","",""],["的","DE","",""],["騷動","Nv","",""],["是","SHI","",""],["淚","Na","",""],["。","PERIODCATEGORY","",""]]]
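If your tokenizer produces (word, POS) pairs, a small helper can build this nested list and check its shape before it is passed to the tagger. The sketch below is not part of the package. `build_wsd_input` and `check_wsd_input` are hypothetical names, and storing a missing POS tag as an empty string is an assumption, because the description only says that POS tagging is optional.

```python
from typing import List, Optional, Sequence, Tuple

# One word entry: [target, pos, sense_id, sense]
WordEntry = List[str]


def build_wsd_input(
    sentences: Sequence[Sequence[Tuple[str, Optional[str]]]],
) -> List[List[WordEntry]]:
    """Hypothetical helper: turn tokenized sentences into the nested-list format.

    Each sentence is a sequence of (word, pos) pairs. Assumption: a missing
    POS tag (None) is stored as an empty string. sense_id and sense are left
    empty, as in the README example.
    """
    data: List[List[WordEntry]] = []
    for sentence in sentences:
        words: List[WordEntry] = []
        for word, pos in sentence:
            if not word:
                raise ValueError("empty token in sentence")
            words.append([word, pos or "", "", ""])
        data.append(words)
    return data


def check_wsd_input(data: object) -> None:
    """Hypothetical check: raise ValueError unless data is m sentences of 4-field word lists."""
    if not isinstance(data, list):
        raise ValueError("input must be a list of sentences")
    for i, sentence in enumerate(data):
        if not isinstance(sentence, list):
            raise ValueError(f"sentence {i} is not a list")
        for j, entry in enumerate(sentence):
            if not (
                isinstance(entry, list)
                and len(entry) == 4
                and all(isinstance(field, str) for field in entry)
            ):
                raise ValueError(
                    f"word {j} of sentence {i} is not [target, pos, sense_id, sense]"
                )
```

For example, `build_wsd_input([[("他", "Nh"), ("醒來", "VH")]])` returns `[[["他", "Nh", "", ""], ["醒來", "VH", "", ""]]]`, which passes `check_wsd_input`.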
      

How to get senses

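  • From the project root directory (the one containing setup.py), install the package:

      pip3 install .

  • Then call the tagger from Python:

      import CWN_WSD

      # read_somewhere() is a placeholder for your own loading code; it must return
      # a list of sentences, each a list of [target, pos, sense_id, sense] entries
      data = read_somewhere()
      sense = CWN_WSD.wsd(data)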
    
  • Examples can be found in the example folder.
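`read_somewhere()` in the snippet above is only a placeholder. One possible stand-in is the hypothetical loader below. It assumes a plain-text file with one sentence per line, tokens separated by whitespace, and each token written as `word/POS`, or as a bare word when untagged. This file format is an assumption and is not defined by the package.

```python
from typing import Iterable, List


def read_tagged_lines(lines: Iterable[str]) -> List[List[List[str]]]:
    """Hypothetical loader for an assumed "word/POS word/POS ..." text format.

    Returns a list of sentences, each a list of [target, pos, sense_id, sense]
    entries with sense_id and sense left empty.
    """
    data: List[List[List[str]]] = []
    for line in lines:
        tokens = line.split()  # splits on any Unicode whitespace
        if not tokens:
            continue  # skip blank lines
        sentence: List[List[str]] = []
        for token in tokens:
            # Split on the last "/" so that words containing "/" keep their prefix
            word, sep, pos = token.rpartition("/")
            if not sep or not word:
                # No usable "/POS" suffix: keep the whole token and leave POS empty
                word, pos = token, ""
            sentence.append([word, pos, "", ""])
        data.append(sentence)
    return data
```

Under these assumptions, an open file can be passed directly, for example `with open("input.txt", encoding="utf-8") as f: data = read_tagged_lines(f)`. The result can then be handed to `CWN_WSD.wsd`.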

Acknowledgement

We thank Po-Wen Chen (b05902117@ntu.edu.tw) and Yu-Yu Wu (b06902104@ntu.edu.tw) for their contributions to model development.



Download files


Files for CwnSenseTagger, version 0.1.6
Filename                                 Size      File type   Python version
CwnSenseTagger-0.1.6.tar.gz              12.4 kB   Source      None
CwnSenseTagger-0.1.6-py3-none-any.whl    28.6 kB   Wheel       py3
