Multi-lingual Word Sense Disambiguation.

License: MIT

Word Sense Disambiguation

Installation

The easiest way to install wsd is to use pip:

pip install wsd

You will also need the JMDict Project dictionary. You can download the file with the following helper:

python -m wsd download jmdict

Getting Started

Currently, only the JMDict model is available. The model has not been trained yet, so for now it returns all matching entries found in the JMDict Project dictionary.

The JMDict model can be imported from the wsd.models module:

from wsd.models import JMDict

jmdict = JMDict()

From there, you can use it to search for all matching entries in the dictionary:

for entry in jmdict.search("かんじ"):
    print(entry)
# Output:
# Entry(ent_seq='1210280', ...
# Entry(ent_seq='1211690', ...
# ...

Alternatively, you can use the predict method to get the unique ent_seq of the best entry:

jmdict.predict("かんじ")
# Output:
# '1210280'
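
To illustrate the difference between returning every match and returning a single ent_seq, here is a minimal, self-contained sketch. ToyEntry and ToyDictionary are hypothetical stand-ins, not the wsd classes, and the "first match wins" rule in predict is only an assumption about how an untrained model might behave. The two ent_seq values are the ones shown in the output above; the third entry is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyEntry:
    """Hypothetical stand-in for a dictionary entry (not the wsd Entry class)."""
    ent_seq: str
    reading: str


class ToyDictionary:
    """Hypothetical stand-in for an untrained dictionary model."""

    def __init__(self, entries: List[ToyEntry]) -> None:
        self._entries = list(entries)

    def search(self, text: str) -> List[ToyEntry]:
        # Return every entry whose reading matches exactly, in stored order.
        return [e for e in self._entries if e.reading == text]

    def predict(self, text: str) -> Optional[str]:
        # Assumption: with no trained ranking, fall back to the first match.
        matches = self.search(text)
        return matches[0].ent_seq if matches else None


toy = ToyDictionary([
    ToyEntry(ent_seq="1210280", reading="かんじ"),
    ToyEntry(ent_seq="1211690", reading="かんじ"),
    ToyEntry(ent_seq="placeholder-id", reading="べつ"),  # made-up placeholder entry
])

print([e.ent_seq for e in toy.search("かんじ")])  # ['1210280', '1211690']
print(toy.predict("かんじ"))                       # 1210280
```

A trained model would presumably replace the first-match fallback with a ranking over the candidates returned by the search step.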

Adding more data

The training data for JMDict is sourced from the WSD Data Annotation Project.

To contribute more data:

  • Create an account on Linhub
  • Add your Linhub token to the .env file as LINHUB_TOKEN
  • Run the annotation interface with the following command: task annotate

The annotation interface will be available at https://localhost:32123/.
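
Before starting the annotation interface, it can help to confirm that the token is actually present in the .env file. The sketch below is a small standalone check, assuming the file uses plain KEY=VALUE lines; read_env_value is a hypothetical helper and is not part of wsd or Linhub. The demo writes a throwaway file with a dummy value rather than touching a real .env.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_env_value(path: Path, key: str) -> Optional[str]:
    """Return the value for `key` from a KEY=VALUE file, or None if absent."""
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes around the value.
            return value.strip().strip('"').strip("'")
    return None


# Demo against a temporary file with a dummy token value.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text('# local settings\nLINHUB_TOKEN="dummy-token"\n', encoding="utf-8")
    token = read_env_value(env_path, "LINHUB_TOKEN")
    missing = read_env_value(env_path, "OTHER_KEY")

print(token)    # dummy-token
print(missing)  # None
```

Pointing read_env_value at the project's own .env file and checking for a non-empty result is a quick way to catch a missing or misspelled LINHUB_TOKEN.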

Training a model from scratch

TODO: Add instructions.

Build using Docker

See Using Docker

Attribution and LICENSE
