A package that uses Chinese Word Net to perform word sense disambiguation

Project description

Word Sense Disambiguation with Chinese Word Net

Chinese word sense disambiguation is known to be a very difficult problem because Chinese is a complicated language: a word can have dozens or even hundreds of meanings depending on the context. Manually labeling word senses is labor-intensive and inefficient.

In this project, we address this problem with the state-of-the-art BERT model. It gives large performance gains and reaches roughly 82% accuracy on Chinese word sense disambiguation.

Prerequisites

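  • Input must be tokenized first. POS tagging is preferred but not required.
  • Suppose there are m sentences and the i-th sentence has $n_i$ words. The input is a list of m sentences, each sentence is a list of its $n_i$ words, and each word is a four-element list [target, pos, sense_id, sense].
    • list_of_sentence[ list_of_word[ [target, pos, sense_id, sense] * $n_i$ ] * m ]

    • The following example has 2 sentences. Input data should be formed as follows (here sense_id and sense are empty strings):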

        [[["他","Nh","",""],["由","P","",""],["昏沈","VH","",""],["的","DE","",""],["睡夢","Na","",""],["中","Ng","",""],["醒來","VH","",""],[",","COMMACATEGORY","",""]],
         [["臉","Na","",""],["上","Ncd","",""],["濕涼","VH","",""],["的","DE","",""],["騷動","Nv","",""],["是","SHI","",""],["淚","Na","",""],["。","PERIODCATEGORY","",""]]]
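The nested format above can be built from tokenized, POS-tagged text with a small helper. The following is a minimal sketch, not part of the package: `to_wsd_input` is a hypothetical name, and representing a missing POS tag as an empty string is an assumption, since the source only says POS tagging is optional.

```python
from typing import List, Optional, Sequence, Tuple

# (word, POS tag or None when no tag is available)
Token = Tuple[str, Optional[str]]


def to_wsd_input(sentences: Sequence[Sequence[Token]]) -> List[List[List[str]]]:
    """Hypothetical helper: wrap each token as [target, pos, sense_id, sense]."""
    data = []
    for sentence in sentences:
        words = []
        for word, pos in sentence:
            # sense_id and sense are left empty, as in the example above.
            # Assumption: a missing POS tag is written as an empty string.
            words.append([word, pos or "", "", ""])
        data.append(words)
    return data


tagged = [
    [("他", "Nh"), ("由", "P"), ("昏沈", "VH")],
    [("臉", "Na"), ("上", None)],
]
data = to_wsd_input(tagged)
```

Each inner list has exactly four string fields, so the result has the same shape as the two-sentence example.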
      

How to get sense

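  • From the project root directory (the one containing setup.py), install the package:

      pip3 install .

  • Then, in Python, pass the prepared data to the tagger:

      import CWN_WSD
      data = read_somewhere()  # placeholder: a list of sentences, each sentence a list of words in the format above
      sense = CWN_WSD.wsd(data)

  • Examples can be found in the example folder.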
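The `read_somewhere()` call in the snippet is a placeholder. One possible way to fill it in, assuming the sentences are stored as a JSON file in the nested-list format described under the prerequisites, is sketched below. The name `load_sentences` and the JSON storage format are assumptions made for illustration, not part of the package.

```python
import json
from typing import List


def load_sentences(path: str) -> List[List[List[str]]]:
    """Hypothetical loader: read sentences from JSON and check their shape."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for sentence in data:
        for word in sentence:
            # Every word must be [target, pos, sense_id, sense].
            if not (isinstance(word, list) and len(word) == 4):
                raise ValueError(
                    f"expected [target, pos, sense_id, sense], got {word!r}"
                )
    return data
```

The returned list could then stand in for `read_somewhere()` and be passed to `CWN_WSD.wsd` as in the snippet above. Checking the shape up front makes a malformed entry fail with a clear message before it reaches the model.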

Acknowledgement

We thank Po-Wen Chen (b05902117@ntu.edu.tw) and Yu-Yu Wu (b06902104@ntu.edu.tw) for their contributions to model development.

Download files

  • Source distribution: CwnSenseTagger-0.1.6.tar.gz (12.4 kB)
  • Built distribution: CwnSenseTagger-0.1.6-py3-none-any.whl (28.6 kB, Python 3)
