Multi-lingual Word Sense Disambiguation.
Word Sense Disambiguation
Installation
The easiest way to install wsd is to use pip:
pip install wsd
You will also need the JMDict Project dictionary. You can use the following helper to download the file:
python -m wsd download jmdict
Getting Started
Currently, only the JMDict model is available.
The model has not been trained yet and currently returns all matching
entries found in the JMDict Project dictionary.
The JMDict model can be imported from the wsd.models module:
from wsd.models import JMDict
jmdict = JMDict()
From there, you can use it to search all relevant entries in the dictionary:
for entry in jmdict.search("かんじ"):
    print(entry)
# Output:
# Entry(ent_seq='1210280', ...
# Entry(ent_seq='1211690', ...
# ...
Alternatively, you can use the predict method to get the unique ent_seq of
the best entry:
jmdict.predict("かんじ")
# Output:
# '1210280'
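As a rough illustration of how the two calls relate, the sketch below collects the ent_seq of every search result. It uses a stand-in class instead of the real JMDict model so that it runs without the dictionary file. `StubEntry`, `StubJMDict` and `candidate_ids` are hypothetical names, and the assumption that entries expose an `ent_seq` attribute is inferred from the printed output above rather than from the wsd documentation.

```python
from dataclasses import dataclass


@dataclass
class StubEntry:
    # Hypothetical stand-in for the library's Entry; only ent_seq is modelled.
    ent_seq: str


class StubJMDict:
    # Hypothetical stand-in for wsd.models.JMDict, backed by an in-memory index.
    def __init__(self, index):
        self._index = index

    def search(self, text):
        # Return every entry stored under the query string.
        return list(self._index.get(text, []))


def candidate_ids(model, text):
    # Collect the ent_seq of each matching entry, preserving search order.
    return [entry.ent_seq for entry in model.search(text)]


stub = StubJMDict({"かんじ": [StubEntry("1210280"), StubEntry("1211690")]})
ids = candidate_ids(stub, "かんじ")
print(ids)
```

If the real entries do carry `ent_seq`, passing the `jmdict` object instead of the stub should list the candidates that a trained model would have to choose between.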
Adding more data
The training data for JMDict is sourced from the WSD Data Annotation Project.
To contribute more data:
- Create an account on Linhub
- Add your Linhub token to the .env file as LINHUB_TOKEN
- Run the annotation interface with the following command:
task annotate
The annotation interface will be available at https://localhost:32123/.
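Before starting the annotation interface, it can help to confirm that the token is actually present in the .env file. The following is a minimal sketch that assumes a plain KEY=VALUE file format; `read_env_value` is a hypothetical helper and not part of wsd.

```python
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value assigned to `key` in a simple KEY=VALUE file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip("'\"")
    return None


token = read_env_value(".env", "LINHUB_TOKEN")
print("LINHUB_TOKEN found" if token else "LINHUB_TOKEN missing from .env")
```

The check only reads the file; it does not validate the token against Linhub.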
Training a model from scratch
TODO: Add instructions.
Build using Docker
See Using Docker
Attribution and LICENSE
File details
Details for the file wsd-0.0.1rc0.tar.gz.
File metadata
- Download URL: wsd-0.0.1rc0.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e091adb81647cdc654b7d0de4deac46008f20ce314f4c86edd95877ce7f0cf3 |
| MD5 | 79b693fc63f9f5f7c8ea7b052048a503 |
| BLAKE2b-256 | 181031053da3d4c5cf8d7ad80be7b0ba1e932600b1f9a3b3acf8d51f066d1fc4 |
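A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using only the standard library; `sha256_of` and `matches_published_digest` are hypothetical helper names, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest published for wsd-0.0.1rc0.tar.gz.
EXPECTED_SHA256 = "0e091adb81647cdc654b7d0de4deac46008f20ce314f4c86edd95877ce7f0cf3"


def sha256_of(path, chunk_size=65536):
    # Hash the file in chunks so large archives are not loaded into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    # Compare case-insensitively and ignore stray whitespace around the digest.
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the archive sits in the current directory:
# matches_published_digest("wsd-0.0.1rc0.tar.gz", EXPECTED_SHA256)
```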
Provenance
The following attestation bundles were made for wsd-0.0.1rc0.tar.gz:
Publisher: pypi.yml on linalgo/wsd
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: wsd-0.0.1rc0.tar.gz
  - Subject digest: 0e091adb81647cdc654b7d0de4deac46008f20ce314f4c86edd95877ce7f0cf3
- Sigstore transparency entry: 188767626
- Sigstore integration time:
- Permalink: linalgo/wsd@9e332876fabc8e6865055804b7eff5d8a6c82bdd
- Branch / Tag: refs/heads/prod
- Owner: https://github.com/linalgo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@9e332876fabc8e6865055804b7eff5d8a6c82bdd
- Trigger Event: push
File details
Details for the file wsd-0.0.1rc0-py3-none-any.whl.
File metadata
- Download URL: wsd-0.0.1rc0-py3-none-any.whl
- Upload date:
- Size: 3.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4eb2aad6d344cd4380df59fcf2c10e2a434ac8c69655840c1b5ca08c5a8bbea2 |
| MD5 | 9bd9e7298eb5ea7f1a930e0cd27322dc |
| BLAKE2b-256 | b88ba4d96467351027657c0bf05f643ab2b0c126882db65391b0e0d0c0d93ca8 |
Provenance
The following attestation bundles were made for wsd-0.0.1rc0-py3-none-any.whl:
Publisher: pypi.yml on linalgo/wsd
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: wsd-0.0.1rc0-py3-none-any.whl
  - Subject digest: 4eb2aad6d344cd4380df59fcf2c10e2a434ac8c69655840c1b5ca08c5a8bbea2
- Sigstore transparency entry: 188767627
- Sigstore integration time:
- Permalink: linalgo/wsd@9e332876fabc8e6865055804b7eff5d8a6c82bdd
- Branch / Tag: refs/heads/prod
- Owner: https://github.com/linalgo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@9e332876fabc8e6865055804b7eff5d8a6c82bdd
- Trigger Event: push