
Library for using the Polish Wordnet in the plwnxml format

Project description

Polish Wordnet Python library

A simple, easy-to-use and reasonably fast library for working with Słowosieć (plWordNet), a lexico-semantic database of the Polish language.

I created this library because, since version 2.9, Słowosieć can no longer be easily loaded into Python (for example with NLTK): it is only distributed in a custom plwnxml format.

Usage

Load the wordnet from an XML file (this takes about 20 seconds) and print basic statistics:

import plwordnet
wn = plwordnet.load('plwordnet_4_2.xml')
print(wn)

Expected output:

PlWordnet
  lexical units: 513410
  synsets: 353586
  relation types: 306
  synset relations: 1477849
  lexical relations: 393137

Find the lexical units named leśny and print all relations in which that unit is in the subject (parent) position:

for lu in wn.lemmas('leśny'):
    for s, p, o in wn.lexical_relations_where(subject=lu):
        print(p.format(s, o))

Expected output:

leśny.2 tworzy kolokację z polana.1
leśny.2 jest synonimem mpar. do las.1
leśny.3 przypomina las.1
leśny.4 jest derywatem od las.1
leśny.5 jest derywatem od las.1
leśny.6 przypomina las.1

Print all relation types and their ids:

for rel_id, rel in wn.relation_types.items():
    print(rel_id, rel.name)

Expected output:

10 hiponimia
11 hiperonimia
12 antonimia
13 konwersja
...
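Relation names are easier to remember than numeric ids, so it can help to look ids up by name before filtering relations by predicate. This is a minimal sketch, assuming only that each value in relation_types has a name attribute (as the loop above uses). The helper ids_by_name is hypothetical, not part of plwordnet, and the stand-in dictionary reuses the ids and names printed above so the code runs without the XML file; against a loaded wordnet you would pass wn.relation_types instead.

```python
from collections import defaultdict
from types import SimpleNamespace


def ids_by_name(relation_types):
    """Map each relation type name to the sorted list of ids carrying it.

    A list is used instead of a single id because nothing guarantees
    that relation type names are unique.
    """
    index = defaultdict(list)
    for rel_id, rel in relation_types.items():
        index[rel.name].append(rel_id)
    return {name: sorted(ids) for name, ids in index.items()}


# Stand-in for wn.relation_types, built from the ids and names printed above.
relation_types = {
    10: SimpleNamespace(name="hiponimia"),
    11: SimpleNamespace(name="hiperonimia"),
    12: SimpleNamespace(name="antonimia"),
    13: SimpleNamespace(name="konwersja"),
}

by_name = ids_by_name(relation_types)
print(by_name["hiperonimia"])  # [11]
```

Mapping names to lists rather than single ids keeps the lookup correct even if two relation types happen to share a name.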

Installation

Note: plwordnet requires Python 3.7 or newer.

pip install plwordnet

Version support

This library should be able to read future versions of Słowosieć without modification, even if they add new relation types. If you use it with a version of Słowosieć that is not listed below, please consider reporting whether it works.

  • Słowosieć 4.2 - YES (the XML file must first be corrected by hand for:)
    • simple XML syntax errors
    • a typo in one attribute key
    • a typo in one id attribute
  • Słowosieć 3.2 - YES
  • Słowosieć 3.0 - YES

Documentation

See plwordnet/wordnet.py for RelationType, Synset and LexicalUnit class definitions.

Wordnet instance properties

  • lexical_relations: list of (subject, predicate, object) triples
  • synset_relations: list of (subject, predicate, object) triples
  • relation_types: mapping from relation type id to RelationType object
  • lexical_units: mapping from lexical unit id to LexicalUnit object
  • synsets: mapping from synset id to Synset object
  • (lexical|synset)_relations_(s|o|p): mapping from id of subject/object/predicate to a set of matching lexical unit/synset relation ids
  • lexical_units_by_name: mapping from lexical unit name to a set of matching lexical unit ids
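The (lexical|synset)_relations_(s|o|p) properties describe an inverted index over the relation triples: each maps a subject, predicate or object id to the set of ids of the relations that contain it. The sketch below illustrates, under stated assumptions, how such indexes can be built and intersected to answer a query. It assumes that a relation's id is its position in the relations list, and the helpers build_index and matching are hypothetical; this is not the library's actual code.

```python
from collections import defaultdict

# Hypothetical (subject id, predicate id, object id) triples.
# ASSUMPTION: a relation's id is its position in this list.
relations = [
    (1, 10, 2),
    (1, 11, 3),
    (4, 10, 2),
]


def build_index(relations, position):
    """Map each id found at `position` (0=subject, 1=predicate, 2=object)
    to the set of ids of the relations where it appears there."""
    index = defaultdict(set)
    for rel_id, triple in enumerate(relations):
        index[triple[position]].add(rel_id)
    return index


relations_s = build_index(relations, 0)
relations_p = build_index(relations, 1)
relations_o = build_index(relations, 2)


def matching(subject=None, predicate=None, object=None):
    """Return the triples matching every given id; None acts as a wildcard.

    The parameter name `object` mirrors the documented keyword.
    """
    candidates = None
    for value, index in ((subject, relations_s),
                         (predicate, relations_p),
                         (object, relations_o)):
        if value is None:
            continue
        ids = index.get(value, set())  # .get does not insert missing keys
        candidates = ids if candidates is None else candidates & ids
    if candidates is None:  # no filter given: every relation matches
        candidates = range(len(relations))
    return [relations[i] for i in sorted(candidates)]


print(matching(predicate=10))             # [(1, 10, 2), (4, 10, 2)]
print(matching(subject=1, predicate=10))  # [(1, 10, 2)]
```

Intersecting small id sets avoids scanning every triple, which matters when there are over a million synset relations.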

Wordnet methods

  • lemmas(value): returns a list of the LexicalUnit objects whose name equals value
  • load(source): reads and indexes the wordnet, where source is either a path to the wordnet XML file or a file object opened in binary mode (useful for loading compressed XML files)
  • lexical_relations_where(subject, predicate, object): returns the lexical relation triples that match the given subject, predicate and/or object. The subject and object can be integer ids or LexicalUnit objects; the predicate can be an integer id or a RelationType object.
  • synset_relations_where(subject, predicate, object): returns the synset relation triples that match the given subject, predicate and/or object. The subject and object can be integer ids or Synset objects; the predicate can be an integer id or a RelationType object.
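Because load also accepts a file object opened in binary mode, a compressed dump can be read without unpacking it first. The helper open_wordnet_source below is hypothetical (not part of plwordnet): it picks a decompressor from the file extension using the standard gzip, bz2 and lzma modules, whose open functions return binary file objects in "rb" mode. Which compression formats Słowosieć is actually distributed in is an assumption left to the reader.

```python
import bz2
import gzip
import lzma
from typing import IO

# Hypothetical helper, not part of plwordnet: map file extensions to the
# matching decompressing open() function from the standard library.
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_wordnet_source(path: str) -> IO[bytes]:
    """Open a (possibly compressed) wordnet XML file in binary mode."""
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rb")
    return open(path, "rb")  # plain, uncompressed XML


# Usage with the library (requires plwordnet and the XML dump):
#     import plwordnet
#     with open_wordnet_source("plwordnet_4_2.xml.xz") as f:
#         wn = plwordnet.load(f)
```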

RelationType methods

  • format(x, y, short=False): substitutes x and y into the relation type's display format (display). If short is true, x and y are instead joined by the relation's short name (shortcut).
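The exact placeholder syntax of the display format is defined in plwordnet/wordnet.py and is not described here. The stand-in class below only illustrates the documented behaviour of format: it assumes hypothetical {x}/{y} placeholders in display, a space-separated short form, and a made-up name and shortcut. It is not the library's RelationType; the long form reproduces one line of the expected output shown earlier.

```python
from dataclasses import dataclass


@dataclass
class RelationTypeSketch:
    """Hypothetical stand-in for plwordnet's RelationType."""
    name: str
    display: str   # ASSUMPTION: uses {x} and {y} as placeholders
    shortcut: str  # short relation name

    def format(self, x, y, short=False):
        if short:
            # x and y separated by the short relation name
            return f"{x} {self.shortcut} {y}"
        # x and y substituted into the display format
        return self.display.format(x=x, y=y)


derivative = RelationTypeSketch(
    name="derywacyjność",  # hypothetical name
    display="{x} jest derywatem od {y}",
    shortcut="der",        # hypothetical shortcut
)
print(derivative.format("leśny.4", "las.1"))              # leśny.4 jest derywatem od las.1
print(derivative.format("leśny.4", "las.1", short=True))  # leśny.4 der las.1
```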


Download files


Source Distribution

plwordnet-0.1.1.tar.gz (4.8 kB)

Built Distribution

plwordnet-0.1.1-py3-none-any.whl (5.5 kB)

File details

Details for the file plwordnet-0.1.1.tar.gz.

File metadata

  • Download URL: plwordnet-0.1.1.tar.gz
  • Upload date:
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for plwordnet-0.1.1.tar.gz
Algorithm Hash digest
SHA256 97508d1e72a5f986d2b00c0ce749604e59a18d702faff1687bcc5916b27792be
MD5 02e68c78fe295abef1352908f4896b72
BLAKE2b-256 32cab02fd4fc3ef2afb0bc6cc411a49ef31b372c14c9217f4486890e42c970b4


File details

Details for the file plwordnet-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: plwordnet-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for plwordnet-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a3c7572218ebc9d3c9f197bef0ff76d47cd05ca4265d68f2a4fd9aa90ff556ac
MD5 ce514ca195dd88f4bdf754a5951daefd
BLAKE2b-256 42f72f2d4599c2d9e85f57a35ac271315b7931c4f900a36662a0210ad52a23e7

