Library for using the Polish Wordnet in the plwnxml format
Polish Wordnet Python library
A simple, easy-to-use and reasonably fast library for working with Słowosieć, a lexico-semantic database of the Polish language.
I created this library because, since version 2.9, Słowosieć cannot be easily loaded into Python (for example with nltk), as it is only provided in a custom plwnxml format.
Usage
Load the wordnet from an XML file (this takes about 20 seconds) and print basic statistics.
import plwordnet
wn = plwordnet.load('plwordnet_4_2.xml')
print(wn)
Expected output:
PlWordnet
lexical units: 513410
synsets: 353586
relation types: 306
synset relations: 1477849
lexical relations: 393137
Find lexical units with the name leśny and print all relations where that unit is in the subject/parent position.
for lu in wn.lemmas('leśny'):
    for s, p, o in wn.lexical_relations_where(subject=lu):
        print(p.format(s, o))
Expected output:
leśny.2 tworzy kolokację z polana.1
leśny.2 jest synonimem mpar. do las.1
leśny.3 przypomina las.1
leśny.4 jest derywatem od las.1
leśny.5 jest derywatem od las.1
leśny.6 przypomina las.1
Print all relation types and their ids:
# rel_id avoids shadowing the built-in id()
for rel_id, rel in wn.relation_types.items():
    print(rel_id, rel.name)
Expected output:
10 hiponimia
11 hiperonimia
12 antonimia
13 konwersja
...
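Since relation types are keyed by id, it can be handy to go the other way and look them up by name. The following is a minimal sketch, not part of the library: the helper names relation_ids_by_name and synset_relations_named are hypothetical, and it assumes that synset_relations_where accepts predicate as a lone keyword argument, in the same way the earlier example passes subject to lexical_relations_where. A name maps to a list of ids because nothing in this description guarantees that names are unique.

```python
from collections import defaultdict


def relation_ids_by_name(wn):
    """Map each relation type name to the list of ids that carry it."""
    by_name = defaultdict(list)
    for rel_id, rel in wn.relation_types.items():
        by_name[rel.name].append(rel_id)
    return dict(by_name)


def synset_relations_named(wn, name):
    """Collect synset relation triples whose predicate has the given name.

    Assumption: synset_relations_where(predicate=...) works on its own.
    """
    triples = []
    for rel_id in relation_ids_by_name(wn).get(name, []):
        triples.extend(wn.synset_relations_where(predicate=rel_id))
    return triples
```

With a loaded wordnet, synset_relations_named(wn, 'hiponimia') would then gather the triples for every relation type with that name.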
Installation
Note: plwordnet requires Python 3.7 or newer.
pip install plwordnet
Version support
This library should be able to read future versions of Słowosieć without modification, even if more relation types are added. Still, if you use it with a version of Słowosieć that is not listed below, please consider reporting whether it is supported.
- Słowosieć 4.2 - YES (requires manually correcting the XML file)
  - Simple XML syntax errors
  - Typo in one attribute key
  - Typo in one id attribute
- Słowosieć 3.2 - YES
- Słowosieć 3.0 - YES
Documentation
See plwordnet/wordnet.py for the RelationType, Synset and LexicalUnit class definitions.
Wordnet instance properties
- lexical_relations: list of (subject, predicate, object) triples
- synset_relations: list of (subject, predicate, object) triples
- relation_types: mapping from relation type id to object
- lexical_units: mapping from lexical unit id to unit object
- synsets: mapping from synset id to object
- (lexical|synset)_relations_(s|o|p): mapping from id of subject/object/predicate to a set of matching lexical unit/synset relation ids
- lexical_units_by_name: mapping from lexical unit name to a set of matching lexical unit ids
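To illustrate what the (lexical|synset)_relations_(s|o|p) mappings describe, here is a toy triple index written from scratch. It is only a sketch of the general idea (one id-to-set mapping per triple position, intersected at query time) and is not the library's actual implementation. The class name TripleIndex, its attribute names, and the choice of list position as the relation id are all assumptions made for this example.

```python
from collections import defaultdict


class TripleIndex:
    """Toy triple store; subjects, predicates and objects are plain ints."""

    def __init__(self, triples):
        self.relations = list(triples)  # relation id = position in this list
        self.by_s = defaultdict(set)    # subject id   -> set of relation ids
        self.by_p = defaultdict(set)    # predicate id -> set of relation ids
        self.by_o = defaultdict(set)    # object id    -> set of relation ids
        for rel_id, (s, p, o) in enumerate(self.relations):
            self.by_s[s].add(rel_id)
            self.by_p[p].add(rel_id)
            self.by_o[o].add(rel_id)

    def where(self, subject=None, predicate=None, object=None):
        """Return triples matching every argument that is not None."""
        candidates = None
        lookups = ((self.by_s, subject), (self.by_p, predicate), (self.by_o, object))
        for index, key in lookups:
            if key is None:
                continue
            ids = index.get(key, set())
            # Intersect with what earlier constraints already allowed.
            candidates = ids if candidates is None else candidates & ids
        if candidates is None:
            # No constraint given: every relation matches.
            candidates = range(len(self.relations))
        return [self.relations[i] for i in sorted(candidates)]
```

Keeping one mapping per position means a query with a single constraint is a dictionary lookup, and a query with several constraints is a set intersection rather than a scan over all triples.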
Wordnet methods
- lemmas(value): returns a list of LexicalUnit objects whose name is equal to value
- load(source): reads and indexes Wordnet, where source is a path to the wordnet XML file, or a file object opened in binary mode (useful for loading compressed XML files)
- lexical_relations_where(subject, predicate, object): returns lexical relation triples matching the given subject and/or predicate and/or object. The subject, predicate and object arguments can be integer ids or LexicalUnit and RelationType objects.
- synset_relations_where(subject, predicate, object): returns synset relation triples matching the given subject and/or predicate and/or object. The subject, predicate and object arguments can be integer ids or Synset and RelationType objects.
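Because load accepts a file object opened in binary mode, a compressed copy of the XML file can be passed in without unpacking it first. The helper below is a sketch using only the standard library. The name open_wordnet_source, the suffix table, and the file name in the trailing comment are hypothetical, and the compression format is chosen from the file extension alone.

```python
import bz2
import gzip
import lzma

# Suffix -> opener; each opener returns a binary file object in 'rb' mode.
_OPENERS = {'.gz': gzip.open, '.bz2': bz2.open, '.xz': lzma.open}


def open_wordnet_source(path):
    """Return a binary file object for a plain or compressed XML file."""
    for suffix, opener in _OPENERS.items():
        if str(path).endswith(suffix):
            return opener(path, 'rb')
    return open(path, 'rb')


# Intended use (hypothetical file name, requires plwordnet to be installed):
#
#     import plwordnet
#     with open_wordnet_source('plwordnet_4_2.xml.gz') as f:
#         wn = plwordnet.load(f)
```

Decompression happens as the parser reads, so expect loading to take at least as long as for the uncompressed file.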
RelationType methods
- format(x, y, short=False): substitutes x and y into the RelationType display format (display). If short is true, x and y are separated by the short relation name (shortcut).
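As a usage sketch for the short argument, the helper below collects the formatted relations of a lemma as strings instead of printing them. It calls only lemmas, lexical_relations_where and format as described above. The function name describe_relations is hypothetical, and the exact text produced depends on the display and shortcut values stored in the wordnet file, so no output is shown.

```python
def describe_relations(wn, lemma, short=False):
    """Return formatted lexical relations where `lemma` is the subject.

    `wn` is expected to be a loaded wordnet object; `short` is passed
    through to RelationType.format unchanged.
    """
    lines = []
    for lu in wn.lemmas(lemma):
        for s, p, o in wn.lexical_relations_where(subject=lu):
            lines.append(p.format(s, o, short=short))
    return lines
```

For example, describe_relations(wn, 'leśny', short=True) would return the same relations as the earlier leśny example, rendered with the short relation names.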