Skip to main content

Python library for accessing major wordnet releases using relational databases for high performance batch processing

Project description

YAWlib - Yet Another WordNet library for Python

A Python library for accessing major WordNet releases using relational databases for high performance batch processing.

  • Princeton Wordnet 3.0
  • NTU Open Multilingual WordNet
  • Gloss WordNet
  • and more to be added in future versions

Installation

Yawlib is available on PyPI

pip install yawlib

Prebuilt database files are available on the author's Open Science Framework project page: https://osf.io/9udjk/. Download them and extract those to your home folder at ~/wordnet/. On Linux it should look something like:

/home/username/wordnet/
    - glosstag.db
    - sqlite-30.db
    - wn-ntumc.db

# or on Mac OS
/Users/username/wordnet/
    - glosstag.db
    - sqlite-30.db
    - wn-ntumc.db

On Windows

C:\Users\<username>\wordnet\
    - glosstag.db
    - sqlite-30.db
    - wn-ntumc.db

To verify that yawlib is working properly, you can use the info command.

# Show yawlib information
python3 -m yawlib info

Command-line tools

yawlib includes a command-line tool for querying wordnets directly from terminal.

For example, to search synsets by the lemma research one may use python3 -m yawlib lemma research

python3 -m yawlib lemma research

Looking for synsets by term (Provided: research | pos = None)

〔Synset〕00636921-n 〔Lemmas〕research 〔Keys〕research%1:04:00::
------------------------------------------------------------
(def) “systematic investigation to establish facts;”


〔Synset〕05797597-n 〔Lemmas〕inquiry; enquiry; research 〔Keys〕inquiry%1:09:01:: enquiry%1:09:00:: research%1:09:00::
------------------------------------------------------------
(def) “a search for knowledge;”
(ex) their pottery deserves more research than it has received;


〔Synset〕00648224-v 〔Lemmas〕research; search; explore 〔Keys〕research%2:31:00:: search%2:31:00:: explore%2:31:00::
------------------------------------------------------------
(def) “inquire into;”
(ex) the students had to research the history of the Second World War for their history project;
(ex) He searched for information on his relatives on the web;
(ex) Scientists are exploring the nature of consciousness;


〔Synset〕00877327-v 〔Lemmas〕research 〔Keys〕research%2:32:00::
------------------------------------------------------------
(def) “attempt to find out in a systematically and scientific manner;”
(ex) The student researched the history of that word;

Found 4 synset(s)

Development

Go to yawlib folder, execute the config script and then run wntk.sh to generate the glosstab DB file.

git clone https://github.com/letuananh/yawlib
cd yawlib

# create virtual environment
python3 -m venv yawlib_py3
. yawlib_py3/bin/activate

# install required packages
pip install -r requirements.txt
pip install -r requirements-optional.txt

# to show information
python -m yawlib info

Compiling glosstag.db from source

Make sure that glosstag source folder and sqlite-30.db are available in ~/wordnet. The directory should look like this:

/home/user/wordnet
├── glosstag
│   ├── dtd
│   │   └── glosstag.dtd
│   ├── LICENSE.txt
│   ├── merged
│   │   ├── adj.xml
│   │   ├── adv.xml
│   │   ├── noun.xml
│   │   └── verb.xml
│   ├── README.txt
│   ├── standoff
│   │   ├── 00
│   │   ├── 01
│   │   ├── 02
│   │   ├── ....
│   │   ├── index.byid.tab
│   │   ├── index.bylem.adj.tab
│   │   ├── index.bylem.adv.tab
│   │   ├── index.bylem.noun.tab
│   │   ├── index.bylem.tab
│   │   ├── index.bylem.verb.tab
│   │   └── index.bysk.tab
│   └── statistics.tab
├── glosstag.db
├── sqlite-30.db
├── wn-ntumc.db

The run the create command to generate the database

python -m yawlib create

Original sources

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yawlib-0.1.tar.gz (38.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page