
Python wrapper for obtaining synonyms in the German language from OpenThesaurus



When working in Natural Language Processing (NLP), synonyms can be an essential part of the data augmentation process. Obtaining synonyms for German is currently difficult because there are no easily accessible lexical databases for the language. While the WordNet lexical database for English is available as an nltk package, German has only one comparable lexical database, GermaNet. However, a license must be obtained manually before GermaNet can be used for research purposes.

This repository provides a Python wrapper that makes obtaining synonyms faster and easier, using the German synonym database and API from OpenThesaurus.

Installation

The library can be installed from PyPI:

pip install py-openthesaurus

Download open-thesaurus database dump

Download the official open-thesaurus database dump from the following link. If the link is not working, please visit the following page and download the up-to-date database dump.

Setup mysql and import open-thesaurus database dump

To install mysql-server on Ubuntu run:

sudo apt-get update
sudo apt-get install mysql-server

Create a new database:

mysql -u user_name -p
mysql> create database database_name;
mysql> exit

Extract the downloaded database dump file, then import it using the following command:

mysql -u user_name -p database_name < openthesaurus_dump.sql

To use the mysqlclient library in Python on Ubuntu, install the following dependencies:

sudo apt-get install python3-pip python3-dev libmysqlclient-dev

This library depends on the mysqlclient Python library. For support on other systems, please check the following link.

Usage

As a Python library, retrieving results from a previously imported database:

from py_openthesaurus import OpenThesaurusDb

open_thesaurus = OpenThesaurusDb(host="host", user="user", passwd="passwd", db_name="database_name")

# to get the short version of synonyms as a list
synonyms = open_thesaurus.get_synonyms(word="München")

# to get the long version of synonyms as a list
synonyms_long = open_thesaurus.get_synonyms(word="München", form="long")

As a Python library, retrieving results from a web end-point:

from py_openthesaurus import OpenThesaurusWeb

open_thesaurus = OpenThesaurusWeb()

# to get the short version of synonyms as a list
synonyms = open_thesaurus.get_synonyms(word="München")

# to get the long version of synonyms as a list
synonyms_long = open_thesaurus.get_synonyms(word="München", form="long")
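
Since synonyms are mainly motivated here by data augmentation, the following is a minimal sketch of synonym replacement built on top of `get_synonyms`. The `augment` function and its parameters are hypothetical and not part of py_openthesaurus. The sketch assumes that `get_synonyms(word=...)` returns a list of strings and an empty list when nothing is found, and it tokenizes naively on whitespace.

```python
import random


def augment(sentence, thesaurus, rng=None, max_replacements=1):
    """Replace up to max_replacements words in sentence with a synonym.

    thesaurus is any object exposing get_synonyms(word=...), for example
    an OpenThesaurusDb or OpenThesaurusWeb instance.
    Assumption: get_synonyms returns a (possibly empty) list of strings.
    """
    rng = rng or random.Random()
    tokens = sentence.split()  # naive whitespace tokenization

    # Visit word positions in random order so replacements vary between calls.
    positions = list(range(len(tokens)))
    rng.shuffle(positions)

    replaced = 0
    for i in positions:
        if replaced >= max_replacements:
            break
        # Ignore candidates identical to the original word.
        candidates = [s for s in thesaurus.get_synonyms(word=tokens[i]) if s != tokens[i]]
        if candidates:
            tokens[i] = rng.choice(candidates)
            replaced += 1
    return " ".join(tokens)
```

It could be called as `augment("Das Auto ist schnell", open_thesaurus)`. With `OpenThesaurusWeb`, each inspected word triggers one `get_synonyms` call, so long sentences should be processed with the API request limit in mind.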

As a command-line tool (which currently obtains results from a web API):

usage: py_openthesaurus [-h] [--form {long,short}] --word WORD

Get synonyms of German words from www.openthesaurus.de

optional arguments:
  -h, --help           show this help message and exit
  --form {long,short}  Defaults to form=short which means that short versions
                       of synonyms will be returned, without nach/zu
                       prefixes/suffixes. On the other hand, form=long returns
                       the full versions of synonyms, including nach/zu, sich
                       prefixes/suffixes.

required arguments:
  --word WORD          A word from which synonyms will be obtained
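
Based on the usage line above, the tool can also be driven from a script. The sketch below builds the argument list and runs the command with `subprocess`; the helper names `build_command` and `run_cli` are hypothetical, and it assumes the `py_openthesaurus` executable is on the `PATH` after installation. The format of the printed output is not specified here, so the raw text is returned unparsed.

```python
import subprocess


def build_command(word, form="short"):
    """Build the argument list for the py_openthesaurus command-line tool."""
    if form not in ("short", "long"):
        raise ValueError("form must be 'short' or 'long'")
    return ["py_openthesaurus", "--form", form, "--word", word]


def run_cli(word, form="short"):
    """Run the tool and return whatever it prints to standard output.

    Assumption: the py_openthesaurus executable is available on the PATH.
    """
    result = subprocess.run(
        build_command(word, form),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_cli("München", form="long")` corresponds to running `py_openthesaurus --form long --word München` in a shell.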

Acknowledgments

  • OpenThesaurus, for developing a German synonym database with an API from which synonyms for the German language can be obtained

Licence

Although this project is released under the MIT license, please check the information about OpenThesaurus licensing and API limitations (currently only 60 requests per minute are supported) at the following link if your software needs to make a large number of web API requests in a short period of time.
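
One way to stay under the request limit on the client side is a small sliding-window throttle, sketched below. The `RateLimiter` class is hypothetical and not part of py_openthesaurus; the default of 60 calls per 60 seconds simply mirrors the limit mentioned above. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Block until another call is allowed within a sliding time window."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def _drop_expired(self, now):
        # Forget calls that have left the sliding window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Call this immediately before each web API request."""
        now = self.clock()
        self._drop_expired(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._drop_expired(now)
        self.calls.append(now)
```

In use, a single `limiter = RateLimiter()` would be shared across the program, with `limiter.wait()` called before every `open_thesaurus.get_synonyms(...)` request made through `OpenThesaurusWeb`. For bulk workloads, the locally imported database avoids the web limit altogether.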
