Wordnet interface library (forked version with performance improvements)
a Python library for wordnets
Available Wordnets | Documentation | FAQ | Migrating from NLTK | Roadmap
Wn is a Python library for exploring information in wordnets. Install it from PyPI and download some data:
$ pip install wn
$ python -m wn download oewn:2021 # the Open English WordNet 2021
Then start exploring:
>>> import wn
>>> en = wn.Wordnet('oewn:2021') # Create Wordnet object to query
>>> ss = en.synsets('win')[0] # Get the first synset for 'win'
>>> ss.definition() # Get the synset's definition
'be the winner in a contest or competition; be victorious'
Features
- Multilingual by design; first-class support for wordnets in any language
- Interlingual queries via the Collaborative Interlingual Index
- Six similarity metrics
- Functions for exploring taxonomies
- Support for lemmatization (Morphy for English is built-in) and unicode normalization
- Full support of the WN-LMF 1.1 format, including word pronunciations and lexicon extensions
- SQL-based backend offers very fast startup and improved performance on many kinds of queries
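As a rough illustration of the taxonomy and similarity features listed above, the sketch below looks up two nouns and compares them. It assumes that `oewn:2021` has already been downloaded and that this fork keeps the upstream Wn API (`Wordnet.synsets`, `Synset.hypernyms`, `Synset.lemmas`, and `wn.similarity.path`). The helper name `describe` is hypothetical and not part of the library.

```python
def describe(word, other, lexicon="oewn:2021"):
    """Return the hypernym lemmas of `word` and its path similarity to `other`.

    Hypothetical helper; assumes the lexicon is installed and that both
    words have at least one noun synset.
    """
    # Imported lazily so that defining the helper does not require the database.
    import wn
    from wn import similarity

    en = wn.Wordnet(lexicon)
    first = en.synsets(word, pos="n")[0]    # first noun synset of `word`
    second = en.synsets(other, pos="n")[0]  # first noun synset of `other`

    # One list of lemmas per direct hypernym of the first synset.
    hypernym_lemmas = [h.lemmas() for h in first.hypernyms()]
    return hypernym_lemmas, similarity.path(first, second)
```

Calling `describe('dog', 'cat')` would then return the lemmas of the direct hypernyms of the first "dog" synset together with a path-similarity score; the actual values depend on the installed lexicon.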
Available Wordnets
Any WN-LMF-formatted wordnet can be added to Wn's database from a local file or remote URL, but Wn also maintains an index (see wn/index.toml) of available projects, similar to a package manager for software, to aid in the discovery and downloading of new wordnets. The projects in this index are listed below.
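The two ways of adding data described above could be wrapped roughly as follows. This is a minimal sketch that assumes the fork exposes the same `wn.add()` (local WN-LMF file) and `wn.download()` (index specifier or URL) functions as upstream Wn. The helper name `install` is hypothetical.

```python
from pathlib import Path


def install(source):
    """Add a local WN-LMF file, or download an index specifier or URL.

    Hypothetical convenience wrapper; returns which route was taken.
    """
    import wn  # imported lazily; assumes the package is installed

    if Path(source).exists():
        wn.add(source)       # local file already on disk
        return "added"
    wn.download(source)      # e.g. 'oewn:2021' or a remote URL
    return "downloaded"
```

For example, `install('oewn:2021')` would go through the index, while passing the path of an existing local file would add that file directly.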
English Wordnets
There are several English wordnets available. In general it is recommended to use the latest Open English Wordnet, but if you have stricter compatibility needs for, e.g., experiment replicability, you may try the OMW English Wordnet based on WordNet 3.0 (compatible with the Princeton WordNet 3.0 and with the NLTK), or OpenWordnet-EN (for use with the Portuguese wordnet OpenWordnet-PT).
Name | Specifier | # Synsets | Notes |
---|---|---|---|
Open English WordNet | oewn:2021, ewn:2020, ewn:2019 | 120039, 120053, 117791 | Recommended |
OMW English Wordnet based on WordNet 3.0 | omw-en:1.4 | 117659 | Included with omw:1.4 |
OMW English Wordnet based on WordNet 3.1 | omw-en31:1.4 | 117791 | |
OpenWordnet-EN | own-en:1.0.0 | 117659 | Included with own:1.0.0 |
Other Wordnets and Collections
These are standalone non-English wordnets and collections. The wordnets of each collection are listed further down.
Name | Specifier | # Synsets | Language |
---|---|---|---|
Open Multilingual Wordnet | omw:1.4 | n/a | multiple [mul] |
Open German WordNet | odenet:1.4, odenet:1.3 | 36268, 36159 | German [de] |
Open Wordnets for Portuguese and English | own:1.0.0 | n/a | multiple [mul] |
KurdNet | kurdnet:1.0 | 2144 | Kurdish [ckb] |
Open Multilingual Wordnet (OMW) Collection
The Open Multilingual Wordnet collection (omw:1.4) installs the following lexicons (from here) which can also be downloaded and installed independently:
Name | Specifier | # Synsets | Language |
---|---|---|---|
Albanet | omw-sq:1.4 | 4675 | Albanian [sq] |
Arabic WordNet (AWN v2) | omw-arb:1.4 | 9916 | Arabic [arb] |
BulTreeBank Wordnet (BTB-WN) | omw-bg:1.4 | 4959 | Bulgarian [bg] |
Chinese Open Wordnet | omw-cmn:1.4 | 42312 | Mandarin (Simplified) [cmn-Hans] |
Croatian Wordnet | omw-hr:1.4 | 23120 | Croatian [hr] |
DanNet | omw-da:1.4 | 4476 | Danish [da] |
FinnWordNet | omw-fi:1.4 | 116763 | Finnish [fi] |
Greek Wordnet | omw-el:1.4 | 18049 | Greek [el] |
Hebrew Wordnet | omw-he:1.4 | 5448 | Hebrew [he] |
IceWordNet | omw-is:1.4 | 4951 | Icelandic [is] |
Italian Wordnet | omw-iwn:1.4 | 15563 | Italian [it] |
Japanese Wordnet | omw-ja:1.4 | 57184 | Japanese [ja] |
Lithuanian WordNet | omw-lt:1.4 | 9462 | Lithuanian [lt] |
Multilingual Central Repository | omw-ca:1.4 | 45826 | Catalan [ca] |
Multilingual Central Repository | omw-eu:1.4 | 29413 | Basque [eu] |
Multilingual Central Repository | omw-gl:1.4 | 19312 | Galician [gl] |
Multilingual Central Repository | omw-es:1.4 | 38512 | Spanish [es] |
MultiWordNet | omw-it:1.4 | 35001 | Italian [it] |
Norwegian Wordnet | omw-nb:1.4 | 4455 | Norwegian (Bokmål) [nb] |
Norwegian Wordnet | omw-nn:1.4 | 3671 | Norwegian (Nynorsk) [nn] |
OMW English Wordnet based on WordNet 3.0 | omw-en:1.4 | 117659 | English [en] |
Open Dutch WordNet | omw-nl:1.4 | 30177 | Dutch [nl] |
OpenWN-PT | omw-pt:1.4 | 43895 | Portuguese [pt] |
plWordNet | omw-pl:1.4 | 33826 | Polish [pl] |
Romanian Wordnet | omw-ro:1.4 | 56026 | Romanian [ro] |
Slovak WordNet | omw-sk:1.4 | 18507 | Slovak [sk] |
sloWNet | omw-sl:1.4 | 42583 | Slovenian [sl] |
Swedish (SALDO) | omw-sv:1.4 | 6796 | Swedish [sv] |
Thai Wordnet | omw-th:1.4 | 73350 | Thai [th] |
WOLF (Wordnet Libre du Français) | omw-fr:1.4 | 59091 | French [fr] |
Wordnet Bahasa | omw-id:1.4 | 38085 | Indonesian [id] |
Wordnet Bahasa | omw-zsm:1.4 | 36911 | Malaysian [zsm] |
Open Wordnet (OWN) Collection
The Open Wordnets for Portuguese and English collection (own:1.0.0) installs the following lexicons (from here) which can also be downloaded and installed independently:
Name | Specifier | # Synsets | Language |
---|---|---|---|
OpenWordnet-PT | own-pt:1.0.0 | 52670 | Portuguese [pt] |
OpenWordnet-EN | own-en:1.0.0 | 117659 | English [en] |
Collaborative Interlingual Index
While not a wordnet, the Collaborative Interlingual Index (CILI) represents the interlingual backbone of many wordnets. Wn, including interlingual queries, will function without CILI loaded, but adding it to the database makes available the full list of concepts, their status (active, deprecated, etc.), and their definitions.
Name | Specifier | # Concepts |
---|---|---|
Collaborative Interlingual Index | cili:1.0 | 117659 |
Changes to the Index
ewn → oewn
The 2021 version of the Open English WordNet (oewn:2021) has changed its lexicon ID from ewn to oewn, so the index is updated accordingly. The previous versions are still available as ewn:2019 and ewn:2020.
pwn → omw-en, omw-en31
The wordnet formerly called the Princeton WordNet (pwn:3.0, pwn:3.1) is now called the OMW English Wordnet based on WordNet 3.0 (omw-en) and the OMW English Wordnet based on WordNet 3.1 (omw-en31). This is more accurate, as it is an OMW-produced derivative of the original WordNet data, and it also avoids license or trademark issues.
*wn → omw-* for OMW wordnets
All OMW wordnets have changed their ID scheme from ...wn to omw-.. and the version no longer includes +omw (e.g., bulwn:1.3+omw is now omw-bg:1.4).
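Scripts that still use the old specifiers could be updated with a small lookup table. The sketch below covers only the renames named in this section, with target versions taken from the tables above. The table `LEGACY_TO_CURRENT` and the function `modernize` are hypothetical helpers, not part of Wn.

```python
# Hypothetical lookup table built only from the renames described above.
LEGACY_TO_CURRENT = {
    "pwn:3.0": "omw-en:1.4",        # Princeton WordNet 3.0 -> OMW English (3.0)
    "pwn:3.1": "omw-en31:1.4",      # Princeton WordNet 3.1 -> OMW English (3.1)
    "bulwn:1.3+omw": "omw-bg:1.4",  # example of the *wn -> omw-* scheme
}


def modernize(specifier: str) -> str:
    """Return the current index specifier for a legacy one.

    Specifiers that were not renamed (e.g. 'ewn:2020') pass through unchanged.
    """
    return LEGACY_TO_CURRENT.get(specifier, specifier)
```

Other OMW wordnets would need their own entries, since the new language codes cannot be derived mechanically from the old `...wn` names.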
Hashes for wn_fast-0.9.5.post4-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 9e51d948803f1244363cfb352ff15bc5e87d2fd45b873397073581934f723e06 |
MD5 | 4eccb1eea8c451e20a445bd1abff3721 |
BLAKE2b-256 | dc9479583567073b57b295eee0078621d0ab718c172d5fc5248762806a6c9be5 |
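A downloaded wheel can be checked against these digests with Python's standard `hashlib` module. The sketch below assumes that "BLAKE2b-256" means BLAKE2b with a 32-byte digest. The function names `file_digests` and `matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digests(path):
    """Compute the three digests listed above for the file at `path`."""
    data = Path(path).read_bytes()
    return {
        "SHA256": hashlib.sha256(data).hexdigest(),
        "MD5": hashlib.md5(data).hexdigest(),
        # Assumption: BLAKE2b-256 is BLAKE2b truncated to a 32-byte digest.
        "BLAKE2b-256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }


def matches(path, algorithm, expected):
    """Return True if the file's digest equals `expected` (case-insensitive)."""
    return file_digests(path)[algorithm] == expected.lower()
```

For instance, `matches('wn_fast-0.9.5.post4-py3-none-any.whl', 'SHA256', '9e51d948…')` with the full digest from the table would confirm that the file was not altered in transit.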