A fast Python implementation of the extended LESK algorithm for Word-Sense Disambiguation (WSD)
Project description
Le's Lesk
A fast Python 3 Word-Sense Disambiguation (WSD) package using the extended LESK algorithm
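For readers new to Lesk-style disambiguation, the sketch below illustrates the basic gloss-overlap idea: pick the sense whose definition shares the most words with the sentence. It is a toy simplification and not lelesk's implementation. The glosses, the stopword list, and the names `TOY_GLOSSES`, `tokenize`, and `simplified_lesk` are all made up for illustration; the extended algorithm used by lelesk, and the WordNet data it draws on, are not reproduced here.

```python
# Toy illustration of the simplified Lesk idea: choose the sense whose
# gloss shares the most words with the sentence. The glosses below are
# hand-written placeholders, not WordNet data and not lelesk internals.
STOPWORDS = {"a", "an", "the", "to", "at", "i", "of", "and", "or", "that", "in"}

TOY_GLOSSES = {
    "bank (financial)": "an institution that accepts deposits of money and lends money",
    "bank (river)": "sloping land at the side of a river or lake",
}


def tokenize(text):
    """Lowercase the text, strip basic punctuation, and drop stopwords."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(sentence, glosses):
    """Return the sense label whose gloss overlaps most with the sentence."""
    context = tokenize(sentence)
    return max(glosses, key=lambda sense: len(context & tokenize(glosses[sense])))


print(simplified_lesk("I go to the bank to withdraw money.", TOY_GLOSSES))  # bank (financial)
print(simplified_lesk("I sat at the river bank.", TOY_GLOSSES))             # bank (river)
```

In the first sentence the only shared content word is "money", and in the second it is "river", which is enough to separate the two toy senses.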
Install
lelesk is available on PyPI and can be installed using pip:
pip install lelesk
Lelesk uses the NLTK lemmatizer and the yawlib WordNet API.
To install the NLTK data, start a Python prompt, import nltk, and then download the required data:
$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download(['stopwords', 'punkt', 'averaged_perceptron_tagger', 'wordnet'])
Download and extract the yawlib pre-built databases to ~/wordnet.
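Before running lelesk, it can help to confirm that something was actually extracted into ~/wordnet. The sketch below only lists the files in that folder. The helper name `list_wordnet_files` is hypothetical, and because the exact database file names are not stated here, the check does not look for any particular file.

```python
from pathlib import Path


def list_wordnet_files(root=None):
    """Return sorted file names found directly under the data folder.

    Defaults to ~/wordnet, the location named in the install steps.
    An empty list means the folder is missing or still empty.
    """
    folder = Path(root) if root is not None else Path.home() / "wordnet"
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    files = list_wordnet_files()
    if files:
        print("Found:", ", ".join(files))
    else:
        print("No files in ~/wordnet; download and extract the yawlib databases first.")
```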
For more information:
- Installing NLTK data: https://www.nltk.org/data.html
- Installing Yawlib wordnets: https://pypi.org/project/yawlib/
Command-line tools
To disambiguate a sentence, run this command on the terminal:
python3 -m lelesk wsd "I go to the bank to get money."
To perform word-sense disambiguation on a text file, prepare a text file in which each line is one sentence.
For example, here is the content of the file demo.txt:
I go to the bank to withdraw money.
I sat at the river bank.
You can then run one of the following commands:
# output to TTL/JSON (a single file)
python3 -m lelesk file demo.txt demo_wsd_output.json --ttl json
# output to TTL/TSV (multiple TSV files)
python3 -m lelesk file demo.txt demo_wsd_output.json --ttl tsv
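The same workflow can be scripted. The sketch below writes a one-sentence-per-line input file and assembles the `lelesk file` command shown above as an argument list. The helper names `write_sentences` and `build_file_command` are hypothetical and not part of lelesk, and `sys.executable` stands in for `python3` on the assumption that lelesk is installed for the interpreter running the script.

```python
import shlex
import sys
from pathlib import Path


def write_sentences(path, sentences):
    """Write one sentence per line, the input layout lelesk expects."""
    # Collapse stray newlines and repeated spaces so each sentence stays on one line.
    lines = [" ".join(s.split()) for s in sentences]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_file_command(input_path, output_path, ttl_format="json"):
    """Build the `lelesk file` command as an argument list (nothing is run here)."""
    return [sys.executable, "-m", "lelesk", "file",
            str(input_path), str(output_path), "--ttl", ttl_format]


# Show the command that would be run for the demo file.
cmd = build_file_command("demo.txt", "demo_wsd_output.json")
print(" ".join(shlex.quote(part) for part in cmd))
```

Once lelesk and its data are installed, the list can be passed to `subprocess.run(cmd, check=True)` to run the disambiguation.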
Issues
If you run into any issues, please report them at https://github.com/letuananh/lelesk/issues
Source Distribution
File details
Details for the file lelesk-0.1.tar.gz.
File metadata
- Download URL: lelesk-0.1.tar.gz
- Upload date:
- Size: 15.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | bf45ae30f9b39232def0aaf4a620d855667452cb12e4a1c69007d667b73c76a8
MD5 | a2e9334fdec5e59ea09b12aa1d439638
BLAKE2b-256 | f761397f475c65557b82aaf725e9e5724eeb38208fa1b44d925c9482e5946617
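A downloaded copy of lelesk-0.1.tar.gz can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names `sha256_of` and `matches_expected` are hypothetical, and the file is assumed to be in the current directory under its original name.

```python
import hashlib

# SHA256 digest published for lelesk-0.1.tar.gz.
EXPECTED_SHA256 = "bf45ae30f9b39232def0aaf4a620d855667452cb12e4a1c69007d667b73c76a8"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("lelesk-0.1.tar.gz")` should return True for an intact download.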