Python wrapper for obtaining synonyms in the German language from OpenThesaurus
When working in the Natural Language Processing (NLP) area, synonyms can be an essential part of the data augmentation process. Obtaining synonyms for German is currently difficult because there are no easily accessible lexical databases for the language. English has the WordNet lexical database, available through the nltk package. For German, GermaNet is essentially the only comparable lexical database, and using it for research purposes requires obtaining a license manually.
This repository provides a Python wrapper that makes obtaining German synonyms faster and easier, using the German synonym database and API from OpenThesaurus.
Installation
The library can be installed from PyPI:
pip install py-openthesaurus
Download the OpenThesaurus database dump
Download the official OpenThesaurus database dump from the following link. If the link does not work, please visit the following page and download the up-to-date database dump.
Set up MySQL and import the OpenThesaurus database dump
To install mysql-server on Ubuntu, run:
sudo apt-get update
sudo apt-get install mysql-server
Create a new database:
mysql -u user_name -p
mysql> create database database_name;
mysql> exit
Extract the downloaded database dump file, then import it with the following command:
mysql -u user_name -p database_name < openthesaurus_dump.sql
To use the mysqlclient library in Python on Ubuntu, install the following dependencies:
sudo apt-get install python3-pip python3-dev libmysqlclient-dev
This library depends on the mysqlclient Python library. For support on other systems, please check the following link.
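Before moving on to the Python usage, it can help to confirm that the import worked. The sketch below is a minimal check, assuming the dump creates a table named `term` (an assumption about the OpenThesaurus schema; list the real tables with `SHOW TABLES;` in the mysql client). `count_rows` is a hypothetical helper, not part of py-openthesaurus, and works with any DB-API 2.0 connection, including one opened with mysqlclient's `MySQLdb.connect`.

```python
def count_rows(connection, table: str = "term") -> int:
    """Return the number of rows in `table` using any DB-API 2.0 connection.

    The default table name "term" is an assumption about the OpenThesaurus
    dump schema; confirm it with SHOW TABLES before relying on it.
    """
    if not table.isidentifier():
        # Table names cannot be passed as query parameters, so reject odd names.
        raise ValueError(f"suspicious table name: {table!r}")
    cursor = connection.cursor()
    try:
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        (count,) = cursor.fetchone()
    finally:
        cursor.close()
    return count
```

For example, `count_rows(MySQLdb.connect(host="host", user="user_name", passwd="passwd", db="database_name"))` should return a non-zero count after a successful import.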
Usage
As a Python library, retrieving results from a previously imported database:
from py_openthesaurus import OpenThesaurusDb
open_thesaurus = OpenThesaurusDb(host="host", user="user", passwd="passwd", db_name="database_name")
# to get the short version of synonyms as a list
synonyms = open_thesaurus.get_synonyms(word="München")
# to get the long version of synonyms as a list
synonyms_long = open_thesaurus.get_synonyms(word="München", form="long")
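The example above hard-codes the database credentials. A minimal sketch for keeping them out of source code reads them from environment variables and returns keyword arguments matching the `OpenThesaurusDb(host=..., user=..., passwd=..., db_name=...)` call shown above. The helper `db_settings` and the `OPENTHESAURUS_DB_*` variable names are hypothetical conventions; py-openthesaurus does not read them on its own.

```python
import os
from typing import Mapping, Optional


def db_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for OpenThesaurusDb from environment variables.

    The variable names are hypothetical; adapt them to your deployment.
    """
    env = os.environ if env is None else env
    names = {
        "host": "OPENTHESAURUS_DB_HOST",
        "user": "OPENTHESAURUS_DB_USER",
        "passwd": "OPENTHESAURUS_DB_PASSWD",
        "db_name": "OPENTHESAURUS_DB_NAME",
    }
    # Fail early with a clear message instead of a confusing connection error.
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {kwarg: env[var] for kwarg, var in names.items()}
```

With the variables exported, the connection becomes `open_thesaurus = OpenThesaurusDb(**db_settings())`.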
As a Python library, retrieving results from a web end-point:
from py_openthesaurus import OpenThesaurusWeb
open_thesaurus = OpenThesaurusWeb()
# to get the short version of synonyms as a list
synonyms = open_thesaurus.get_synonyms(word="München")
# to get the long version of synonyms as a list
synonyms_long = open_thesaurus.get_synonyms(word="München", form="long")
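Since the motivation for this library is data augmentation, the following is a minimal sketch of synonym replacement built on top of either client. `augment` is a hypothetical helper, not part of py-openthesaurus; it accepts any callable that maps a word to a list of synonyms, such as `lambda w: open_thesaurus.get_synonyms(word=w)`. Tokenization is plain whitespace splitting, a simplification chosen for illustration.

```python
import random
from typing import Callable, List, Optional


def augment(
    sentence: str,
    lookup: Callable[[str], List[str]],
    n_replacements: int = 1,
    rng: Optional[random.Random] = None,
) -> str:
    """Replace up to `n_replacements` words in `sentence` with a random synonym.

    Whitespace tokenization means punctuation attached to a word
    (e.g. "München,") is looked up as-is and will usually find nothing.
    """
    rng = rng if rng is not None else random.Random()
    tokens = sentence.split()
    # Collect positions whose word has at least one synonym different from itself.
    candidates = []
    for i, token in enumerate(tokens):
        options = [s for s in lookup(token) if s != token]
        if options:
            candidates.append((i, options))
    # Pick which positions to replace at random, then a random synonym for each.
    rng.shuffle(candidates)
    for i, options in candidates[:n_replacements]:
        tokens[i] = rng.choice(options)
    return " ".join(tokens)
```

With `OpenThesaurusWeb`, every distinct word triggers a web request, so long corpora run into the API limits described in the License section.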
As a command-line tool (which currently obtains results from a web API):
usage: py_openthesaurus [-h] [--form {long,short}] --word WORD
Get synonyms of German words from www.openthesaurus.de
optional arguments:
-h, --help show this help message and exit
--form {long,short} Defaults to form=short which means that short versions
of synonyms will be returned, without nach/zu
prefixes/suffixes. On the other hand, form=long returns
the full versions of synonyms, including nach/zu, sich
prefixes/suffixes.
required arguments:
--word WORD A word from which synonyms will be obtained
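The help text above describes the command-line interface, e.g. `py_openthesaurus --word München --form long`. To make the flag behaviour concrete (`--form` defaults to `short` and accepts only `long` or `short`; `--word` is mandatory), here is a hypothetical argparse reconstruction that produces a matching usage line. It is a sketch for illustration and does not claim to be the package's actual source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse setup mirroring the documented help text."""
    parser = argparse.ArgumentParser(
        prog="py_openthesaurus",
        description="Get synonyms of German words from www.openthesaurus.de",
    )
    parser.add_argument(
        "--form",
        choices=["long", "short"],
        default="short",
        help="short omits nach/zu prefixes/suffixes; long keeps them (and sich)",
    )
    # A separate group makes --word appear under "required arguments" in --help.
    required = parser.add_argument_group("required arguments")
    required.add_argument(
        "--word", required=True, help="A word from which synonyms will be obtained"
    )
    return parser
```

Omitting `--word` or passing an unknown `--form` value makes argparse print an error and exit, which matches the "required arguments" section of the help output.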
Acknowledgments
- OpenThesaurus for developing a German synonym database with API from which synonyms for the German language can be obtained
License
Although this project is released under the MIT license, please check the OpenThesaurus licensing information and API limitations (currently only 60 requests per minute are supported) at the following link, especially if your software needs to make a large number of web API requests in a short period of time.
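To stay within the 60-requests-per-minute limit when using the web client, one option is a client-side throttle with a cache. The sketch below is a hypothetical helper, not part of py-openthesaurus: `RateLimitedLookup` wraps any word-to-synonyms callable (for example `lambda w: open_thesaurus.get_synonyms(word=w)`), answers repeated words from memory, and uses a sliding window to sleep whenever another request would exceed `max_calls` per `period` seconds. The clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List


class RateLimitedLookup:
    """Cache lookups and allow at most `max_calls` real requests per `period` seconds."""

    def __init__(
        self,
        lookup: Callable[[str], List[str]],
        max_calls: int = 60,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._lookup = lookup
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent real requests
        self._cache: Dict[str, List[str]] = {}

    def _prune(self, now: float) -> None:
        # Forget requests that have left the sliding window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()

    def __call__(self, word: str) -> List[str]:
        if word in self._cache:
            return self._cache[word]  # cached words cost no request
        now = self._clock()
        self._prune(now)
        while len(self._calls) >= self._max_calls:
            # Wait until the oldest request in the window expires.
            self._sleep(max(0.0, self._period - (now - self._calls[0])))
            now = self._clock()
            self._prune(now)
        self._calls.append(now)
        result = self._lookup(word)
        self._cache[word] = result
        return result
```

The limit is enforced per process only; several processes sharing one IP address would need a shared counter instead.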
Hashes for py_openthesaurus-1.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 068d466b1d898fb9d109494227d38f51caa0fbb9568c08692cf2aacaf353ba33
MD5 | 5ab786399ff6c9e581602b065acb3fe1
BLAKE2b-256 | 20922f54f974aee078e37ca964478ce40beb7853d060865f81df9516c3309df0