A comprehensive lexical discovery application that is useful for finding semantic relationships such as antonyms, synonyms, hypernyms, hyponyms, homophones and definitions for a specific word.

Project description

Overview

The Oxford Dictionary defines wordhoard as a supply of words or a lexicon. Wordhoard is a Python 3 module that can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones and definitions for words in the English language.

This Python package was spawned from a Stack Overflow bountied question. That question forced me to look into the best practices for obtaining comprehensive lists of synonyms for a given word. During my research, I developed the repository synonym discovery and aggregation, which led me to create wordhoard.

Primary Use Case

Textual analysis is a broad term for various research methodologies used to qualitatively describe, interpret and understand text data. These methodologies are mainly used in academic research to analyze content related to media and communication studies, popular culture, sociology, and philosophy. Textual analysis allows these researchers to quickly obtain relevant insights from unstructured data. All types of information can be gleaned from textual data, especially from social media posts or news articles. Some of this information includes the overall concept of the subtext, symbolism within the text, assumptions being made and potential relative value to a subject (e.g. data science). In some cases it is possible to deduce the relative historical and cultural context of a body of text using analysis techniques coupled with knowledge from different disciplines, like linguistics and semiotics.

Word frequency is a technique used in textual analysis to measure how often a specific word or word grouping occurs within unstructured data. Measuring the number of word occurrences in a corpus allows a researcher to garner interesting insights about the text. A subset of word frequency analysis is the correlation between a given word and that word's relationship to either antonyms or synonyms within the specific corpus being analyzed. Knowing these relationships is critical to improving word frequency analysis and topic modeling.
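As a rough illustration of why synonym lists matter for word frequency, the sketch below counts a head word on its own and then together with a handful of its synonyms. The synonym list is hard-coded (taken from wordhoard's example Synonyms output for 'mother') so the snippet runs offline; `tokenize` and `grouped_frequency` are hypothetical helpers, not part of wordhoard.

```python
import re
from collections import Counter

# Hypothetical, hard-coded synonym list. In practice it could come from
# Synonyms('mother').find_synonyms(); these entries appear in that example output.
MOTHER_SYNONYMS = ['ma', 'mama', 'mom', 'mum', 'mummy']


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def grouped_frequency(text, head_word, synonyms):
    """Return (count of head_word alone, count of head_word plus its synonyms)."""
    counts = Counter(tokenize(text))
    group = {head_word, *synonyms}
    return counts[head_word], sum(counts[w] for w in group)


corpus = "My mom called. Mother said mum and ma were fine, and my mother agreed."
print(grouped_frequency(corpus, "mother", MOTHER_SYNONYMS))  # (2, 5)
```

This simple tokenizer only matches single-word synonyms; multi-word entries such as 'birth mother' would need phrase matching.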

Wordhoard was designed to assist researchers performing textual analysis to build more comprehensive lists of antonyms, synonyms, hypernyms, hyponyms and homophones.

Installation

Install the distribution via pip:

pip3 install wordhoard

General Package Utilization

Antonyms Module Usage

An antonym is a word that has the opposite meaning of another word.

Antonym examples:

  • bad and good
  • fast and slow
  • stop and go

from wordhoard import Antonyms

antonym = Antonyms('mother')
antonym_results = antonym.find_antonyms()
print(antonym_results)
# ['dad', 'daddy', 'father', 'old man', 'pa', 'papa', 'pop', 'poppa']

Antonyms written to Python dictionary

from wordhoard import Antonyms

antonyms_results = {}
list_of_words = ['mother', 'daughter', 'father', 'son']

for word in list_of_words:
    antonym = Antonyms(word)
    results = antonym.find_antonyms()
    antonyms_results[word] = results

for key, value in antonyms_results.items():
    print(key, value)
    
# mother ['dad', 'daddy', 'father', 'old man', 'pa', 'papa', 'pop', 'poppa']
# daughter ['son']
# father ['biological mother', 'birth mother', 'ma', 'mama', 'mom', 'momma', 'mommy', 'mother', 'mum', 'mummy',
#   'progenitress', 'progenitrix']
# son ['daughter']

Synonyms Module Usage

A synonym is a word or phrase that means exactly or nearly the same as another word or phrase in the same language.

Synonym examples:

  • happy, joyful, elated, cheerful
  • bad, evil, rotten, corrupt
  • cold, chilly, freezing, frosty

from wordhoard import Synonyms

synonym = Synonyms('mother')
synonym_results = synonym.find_synonyms()
print(synonym_results)
# ['ancestor', 'biological mother', 'birth mother', 'child-bearer', 'creator', 'dam', 'female parent',
# 'forebearer', 'foster mother', 'ma', 'mama', 'mamma', 'mammy', 'mater', 'mom', 'momma', 'mommy',
# 'mother-in-law', 'mum', 'mummy', 'old lady', 'old woman', 'origin', 'para i', 'parent', 'predecessor',
# 'primipara', 'procreator', 'progenitor', 'puerpera', 'quadripara', 'quintipara', 'source', 'supermom',
# 'surrogate mother']

Synonyms written to Python dictionary

from wordhoard import Synonyms

synonyms_results = {}
list_of_words = ['mother', 'daughter', 'father', 'son']

for word in list_of_words:
    synonym = Synonyms(word)
    results = synonym.find_synonyms()
    synonyms_results[word] = results

for key, value in synonyms_results.items():
    print(key, value)
    
# mother ['ancestor', 'biological mother', 'birth mother', 'child-bearer', 'creator', 'dam', 'female parent',
#   'forebearer', 'foster mother', 'ma', 'mama', 'mamma', 'mammy', 'mater', 'mom', 'momma', 'mommy',
#   'mother-in-law', 'mum', 'mummy', 'old lady', 'old woman', 'origin', 'para i', 'parent', 'predecessor',
#   'primipara', 'procreator', 'progenitor', 'puerpera', 'quadripara', 'quintipara', 'source', 'supermom',
#   'surrogate mother']
# daughter ['female child', 'female offspring', 'girl', 'lass', "mother's daughter", 'offspring', 'woman']
# father ['ancestor', 'begetter', 'beginner', 'biological father', 'birth father', 'church father', 'dad',
#   'dada', 'daddy', 'don', 'father of the church', 'father-god', 'father-in-law', 'fatherhood', 'forebearer',
#   'forefather', 'foster father', 'founder', 'founding father', 'governor', 'male parent', 'old boy', 'old man',
#   'origin', 'pa', 'padre', 'papa', 'pappa', 'parent', 'pater', 'paterfamilias', 'patriarch', 'pop', 'poppa',
#   'predecessor', 'procreator', 'progenitor', 'sire', 'source']
# son ['boy', 'dependent', 'descendant', 'heir', 'jnr', 'jr', 'junior', 'lad', 'logos', 'male child',
#   'male offspring', "mama's boy", "mamma's boy", "mother's boy", 'offspring', 'scion', 'son and heir']

Hypernyms Module Usage

Hypernym (semantics): a word or phrase whose referents form a set that includes, as a subset, the referents of a subordinate term. For example, "musical instrument" is a hypernym of "guitar" because a guitar is a musical instrument.

A hypernym is a word with a broad meaning that more specific words fall under. Other names for hypernym include umbrella term and blanket term.
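The set-inclusion definition above can be sketched in a few lines of Python. The referent sets are invented purely for illustration, and `is_hypernym` is a hypothetical helper; it is not part of wordhoard, which queries online sources for hypernyms instead.

```python
# Hypothetical referent sets, invented purely for illustration.
REFERENTS = {
    "musical instrument": {"guitar_1", "guitar_2", "violin_1", "drum_1"},
    "guitar": {"guitar_1", "guitar_2"},
    "violin": {"violin_1"},
}


def is_hypernym(broad, narrow, referents=REFERENTS):
    """True if every referent of `narrow` is also a referent of `broad`.

    A strict subset test is used, so a term is never its own hypernym.
    """
    return referents[narrow] < referents[broad]


print(is_hypernym("musical instrument", "guitar"))  # True
print(is_hypernym("guitar", "musical instrument"))  # False
print(is_hypernym("guitar", "violin"))              # False
```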

Hypernym examples:

  • gem is a hypernym of diamond
  • bird is a hypernym of eagle
  • color is a hypernym of red

from wordhoard import Hypernyms

hypernym = Hypernyms('red')
hypernym_results = hypernym.find_hypernyms()
print(hypernym_results)
# ['amount', 'amount of money', 'card games', 'chromatic color', 'chromatic colour', 'color',
#  'colour', 'cooking', 'geographical name', 'hair', 'hair color', 'lake', 'person', 'radical',
#  'rainbow', 'river', 'spectral color', 'spectral colour', 'sum', 'sum of money']

Hyponyms Module Usage

A hyponym is a word of more specific meaning than a general or superordinate term applicable to it.

Hyponym examples:

  • horse is a hyponym of animal
  • table is a hyponym of furniture
  • maple is a hyponym of tree

from wordhoard import Hyponyms

hyponym = Hyponyms('horse')
hyponym_results = hyponym.find_hyponyms()
print(hyponym_results)
# ['american saddlebred', 'andalusian horse', 'arabian horse', 'azteca horse', 'barb horse', 'belgian horse',
#  'belgian warmblood', 'clydesdale horse', 'coldblood trotter', 'curly horse', 'dutch warmblood', 'ethiopian horses',
#  'falabella', 'fjord horse', 'friesian horse', 'gypsy horse', 'lusitano', "przewalski's horse", 'shire horse',
#  'wild horse']

Homophones Module Usage

A homophone is a word that is pronounced the same as another word but differs in meaning.

Homophone examples:

  • one is a homophone of won
  • ate is a homophone of eight
  • meet is a homophone of meat
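One way to picture the definition above is to group words by pronunciation: words with identical phonetic transcriptions but different spellings are homophones. The transcriptions below are written by hand in the ARPAbet style of the CMU Pronouncing Dictionary and are an assumption for illustration; the helper names are hypothetical and this is not wordhoard's implementation.

```python
from collections import defaultdict

# Hand-written ARPAbet-style transcriptions, assumed for illustration only.
PRONUNCIATIONS = {
    "one": "W AH1 N",
    "won": "W AH1 N",
    "ate": "EY1 T",
    "eight": "EY1 T",
    "meet": "M IY1 T",
    "meat": "M IY1 T",
    "horse": "HH AO1 R S",
    "hoarse": "HH AO1 R S",
}


def homophone_groups(pronunciations):
    """Group words that share an identical pronunciation string."""
    groups = defaultdict(list)
    for word, phones in pronunciations.items():
        groups[phones].append(word)
    # Only groups with two or more spellings are homophone sets.
    return [sorted(words) for words in groups.values() if len(words) > 1]


def homophones_of(word, pronunciations=PRONUNCIATIONS):
    """Return the other words pronounced the same as `word` (empty if unknown)."""
    phones = pronunciations.get(word)
    return sorted(w for w, p in pronunciations.items() if p == phones and w != word)


print(homophones_of("horse"))  # ['hoarse']
```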

from wordhoard import Homophones

homophone = Homophones('horse')
homophone_results = homophone.find_homophones()
print(homophone_results)
# ['horse is a homophone of hoarse']

Definitions Module Usage

A definition is a statement of the exact meaning of a word, especially in a dictionary.

from wordhoard import Definitions

definition = Definitions('mother')
definition_results = definition.find_definitions()
print(definition_results)
# ["a person's own mother", 'a woman who has given birth to a child (also used as a term of address to your mother)',
#  'female person who has borne children']

Advanced Usage

Language Translation

The majority of the sources that wordhoard queries are in English. To find antonyms, synonyms, hypernyms, hyponyms and homophones for words in other languages, wordhoard has three translation service modules. These modules support Google Translate, DeepL Translate and MyMemory Translate.

The example below uses the Google Translate module within wordhoard to translate Spanish words into English, find their antonyms, and translate those antonyms back into Spanish.

from wordhoard import Antonyms
from wordhoard.utilities.google_translator import Translator

words = ['buena', 'contenta', 'suave']
for word in words:
    translated_word = Translator(source_language='es', str_to_translate=word).translate_word()
    antonyms = Antonyms(translated_word).find_antonyms()
    reverse_translations = []
    for antonym in antonyms:
        reverse_translated_word = Translator(source_language='es', str_to_translate=antonym).reverse_translate()
        reverse_translations.append(reverse_translated_word)
    output_dict = {word: sorted(reverse_translations)}
    print(output_dict)
# {'buena': ['Dios espantoso', 'OK', 'abominable', 'aborrecible', 'acogedor', 'agravante', 'amenazante',
# 'angustioso', 'antiestético', 'asqueroso', 'basura', 'carente', 'contaminado', 'de segunda', 'decepcionante',
# 'defectuoso', 'deficiente', 'deplorable', 'deprimente', 'desagradable', 'desaliñado', 'descorazonador',
# 'desfavorecido', 'desgarbado', 'desgarrador', 'detestable', 'doloroso', 'duro', 'débil', 'enfermo',
# 'enfureciendo', 'enloquecedor', 'espantoso', 'esperado', 'exasperante', 'falsificado', 'falso', 'falta',
# 'falto', 'feo', 'frustrante', 'grotesco', 'horrible', 'hostil', 'impactante', 'imperfecto', 'inaceptable',
# 'inadecuado', 'inadmisible', 'inaguantable', 'incensar', 'incompetente', 'incongruente', 'inconsecuente',
# 'incorrecto', 'indeseable', 'indignante', 'indigno', 'indigno de', 'infeliz', 'inferior', 'infernal', 'inflamando',
# 'inmoral', 'insalubre', 'insatisfactorio', 'insignificante', 'insoportable', 'insuficiente', 'insufrible',
# 'intimidante', 'inútil', 'irreal', 'irritante', 'lamentable', 'lúgubre', 'maldad', 'malo', 'malvado', 'malísimo',
# 'mediocre', 'menor', 'miserable', 'molesto', 'nauseabundo', 'no a la par', 'no atractivo', 'no capacitado',
# 'no es bueno', 'no es suficiente', 'no fidedigno', 'no satisfactorio', 'nocivo', 'objetable', 'odioso', 'ofensiva',
# 'ordinario', 'pacotilla', 'patético', 'pecaminoso', 'perturbador', 'pobre', 'poco agraciado', 'poco apetecible',
# 'poco hermoso', 'poco imponente', 'poco satisfactorio', 'poco virtuoso', 'podrido', 'portarse mal', 'preocupante',
# 'repelente', 'repugnante', 'repulsivo', 'sencillo', 'significar', 'sin forma', 'sin importancia', 'sin placer',
# 'sin valor', 'sombrío', 'subóptimo', 'sucio', 'terrible', 'triste', 'trágico', 'vicioso', 'vil']}
# ... (truncated)

The example below uses the DeepL Translate module within wordhoard to translate Spanish words into English, find their antonyms, and translate those antonyms back into Spanish.

from wordhoard import Antonyms
from wordhoard.utilities.deep_translator import Translator

words = ['buena', 'contenta', 'suave']
for word in words:
    translated_word = Translator(source_language='es', str_to_translate=word,
                                 api_key='your_api_key').translate_word()
    antonyms = Antonyms(translated_word).find_antonyms()
    reverse_translations = []
    for antonym in antonyms:
        reverse_translated_word = Translator(source_language='es', str_to_translate=antonym,
                                             api_key='your_api_key').reverse_translate()
        reverse_translations.append(reverse_translated_word)
    output_dict = {word: sorted(set(reverse_translations))}
    print(output_dict)
# {'buena': ['abominable', 'agravante', 'angustia', 'antiestético', 'antipático', 'asqueroso', 'basura', 'casero',
# 'contaminado', 'crummy', 'de mala calidad', 'de segunda categoría', 'decepcionante', 'defectuoso', 'deficiente',
# 'deplorable', 'deprimente', 'desagradable', 'descorazonador', 'desgarrador', 'detestable', 'dios-horrible',
# 'doloroso', 'duro', 'débil', 'en llamas', 'enfermo', 'enfureciendo a', 'enloquecedor', 'equivocada', 'espantoso',
# 'esperado', 'exasperante', 'falso', 'falta', 'feo', 'forjado', 'frumpish', 'frumpy', 'frustrante', 'grotesco',
# 'horrible', 'hostil', 'impactante', 'imperfecto', 'impermisible', 'inaceptable', 'inadecuado', 'inadmisible',
# 'incandescente', 'incompetente', 'incongruente', 'indeseable', 'indignante', 'indigno', 'infeliz', 'inferior',
# 'infernal', 'inflamando', 'inmoral', 'inquietante', 'insalubre', 'insatisfactorio', 'insignificante', 'insoportable',
# 'insostenible', 'insuficiente', 'insufrible', 'intimidante', 'intrascendente', 'irreal', 'irritante', 'lamentable',
# 'llano', 'lúgubre', 'mal', 'mal favorecido', 'malvado', 'media', 'mediocre', 'menor', 'miserable', 'molestos',
# 'nauseabundo', 'no apto', 'no cualificado', 'no es agradable', 'no es bienvenido', 'no es bueno',
# 'no es lo suficientemente bueno', 'no es sano', 'no está a la altura', 'no hay que olvidar que', 'no se puede confiar en',
# 'nocivo', 'objetable', 'odioso', 'ofensiva', 'ok', 'ordinario', 'patético', 'pecaminoso', 'perturbando', 'pobre',
# 'poco apetecible', 'poco atractivo', 'poco encantador', 'poco imponente', 'poco útil', 'podrido', 'problemático',
# 'prohibiendo', 'pésimo', 'que molesta', 'queriendo', 'rankling', 'repelente', 'repugnante', 'repulsivo', 'rilando',
# 'se comportan mal', 'sin alegría', 'sin duda', 'sin forma', 'sin importancia', 'sin placer', 'sin pretensiones',
# 'sin sentido', 'sin valor', 'sombrío', 'subestándar', 'subóptima', 'terrible', 'triste', 'trágico', 'uncute', 'unvirtuoso',
# 'vicioso', 'vil', 'yukky']}
# ... (truncated)

The example below uses the MyMemory Translate module within wordhoard to translate Spanish words into English, find their antonyms, and translate those antonyms back into Spanish.

from wordhoard import Antonyms
from wordhoard.utilities.mymemory_translator import Translator

words = ['buena', 'contenta', 'suave']
for word in words:
    translated_word = Translator(source_language='es', str_to_translate=word,
                                 email_address='your_email_address').translate_word()
    antonyms = Antonyms(translated_word).find_antonyms()
    reverse_translations = []
    for antonym in antonyms:
        reverse_translated_word = Translator(source_language='es', str_to_translate=antonym,
                                             email_address='your_email_address').reverse_translate()
        reverse_translations.append(reverse_translated_word)
    output_dict = {word: sorted(set(reverse_translations))}
    print(output_dict)
# {'buena': ['abominable', 'aborrecible', 'aceptar', 'afligido', 'agravante', 'amenazante', 'ansia nauseosa',
# 'antiestético', 'asco', 'asqueroso', 'atroz', 'basura', 'caballo que padece tiro', 'carente', 'chocante',
# 'consternador', 'de baja calidad', 'decepcionando', 'defectuoso', 'deficiente', 'deprimentes', 'desagradable',
# 'desaliñado', 'descorazonador', 'desfavorecido', 'desgarbado', 'desgarrador', 'desgraciado', 'detestable',
# 'dios espantoso', 'doloroso', 'duelo psicológico', 'débil', 'enfermas', 'enfermo', 'enfureciendo', 'enloquecedor',
# 'es lo suficientemente buena', 'espantoso', 'esperado', 'está por el suelo', 'exasperante', 'fake', 'familiar',
# 'feo', 'forjado', 'fúnebre', 'grutesco', 'horrible', 'hostil', 'impropio', 'inaceptable', 'inadecuado', 'inadmisible',
# 'inaguantable', 'incensar', 'incomible', 'incompetente', 'incongruente', 'indeseable', 'indignante', 'indigno',
# 'inexperto', 'infeliz', 'inferior', 'infernal', 'inflamando', 'inmoral', 'inquietante', 'insatisfactorio', 'insignificante',
# 'insoportable', 'insuficientes', 'insufrible', 'insípido', 'intimidante', 'intrascendente', 'irreal', 'irritante',
# 'lamentable', 'lúgubre', 'mal', 'mal acogido', 'malo', 'media', 'mezquino', 'molesto', 'nauseabundo', 'no es bueno',
# 'no satisfactorio', 'no útil', 'nocivo', 'o antipatico', 'odioso', 'ofensivo', 'parcialmente podrido', 'patético',
# 'pecador', 'penoso', 'pequeños', 'perturbador', 'piojoso', 'poco agraciado', 'poco apetecible', 'poco atractivo',
# 'poco fiable', 'poco hermoso', 'poco imponente', 'poco satisfactorio', 'poco virtuoso', 'podrido', 'portarse mal',
# 'preocupante', 'pretérito imperfecto', 'puede ser frustrante', 'querer', 'repelente', 'repugnante', 'repulsivo',
# 'residuos de lana', 'riling', 'ser agrupado con', 'simple', 'sin forma', 'sin importancia', 'sin placer', 'sin valor',
# 'sombría', 'subóptimo', 'sucio', 'tarifa segunda', 'temperatura', 'terrible', 'triste', 'trágico', 'tu bienvenida mi hermano',
# 'un error', 'vano', 'vicioso', 'vil', '¡horrible', 'áspero']}
# ... (truncated)

It is worth noting that none of the translation services are perfect, so they can make “lost in translation” mistakes. These mistakes usually occur because the translation service lacks an in-depth understanding of the language or cannot understand the context of the words being translated. In some cases the result is a nonsensical literal translation, so all translations should be reviewed for these common mistakes.

Natural Language Processing

One of the example scripts uses the Natural Language Toolkit (NLTK) to parse a block of text. The parsing process removes punctuation and numeral characters from the text, as well as common English stop words. After the text has been cleaned, the script looks up synonyms for each word (aka token).
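A minimal sketch of the cleaning steps described above follows; it is not the example script itself. To stay runnable without extra downloads, it assumes a small hand-picked stop word set in place of NLTK's English stop word list. The `synonyms_for_tokens` helper is hypothetical; it calls wordhoard's `Synonyms(...).find_synonyms()` and therefore makes network requests.

```python
import string

# Small hand-picked stop word set: an assumption standing in for
# NLTK's English list (nltk.corpus.stopwords.words('english')).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was", "were"}


def clean_tokens(text, stop_words=STOP_WORDS):
    """Lowercase, delete punctuation and digits, split on whitespace, drop stop words."""
    table = str.maketrans("", "", string.punctuation + string.digits)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in stop_words]


def synonyms_for_tokens(tokens):
    """Look up synonyms for each unique token with wordhoard (makes network calls)."""
    from wordhoard import Synonyms  # imported lazily so clean_tokens works without wordhoard
    # dict.fromkeys removes duplicate tokens while keeping their order
    return {token: Synonyms(token).find_synonyms() for token in dict.fromkeys(tokens)}


if __name__ == "__main__":
    tokens = clean_tokens("In 2021, the quick fox was happy!")
    print(tokens)  # ['quick', 'fox', 'happy']
```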

Additional Features

In-memory cache

Wordhoard uses an in-memory cache, which helps prevent redundant queries to an individual resource for the same word. The cache is currently erased after each session.
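The effect of an in-memory cache can be illustrated with Python's `functools.lru_cache`. This is a sketch of the general idea, not wordhoard's internal cache; `fetch_from_source` is a hypothetical stand-in for a network query, and the call counter shows that a repeated lookup for the same word does not hit the source again.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the (simulated) source is queried


def fetch_from_source(word):
    """Hypothetical stand-in for an HTTP query to one source."""
    CALLS["count"] += 1
    return (f"{word}-synonym",)  # a tuple, so cached results are immutable


@lru_cache(maxsize=None)
def cached_lookup(word):
    # Results are memoized per word for the lifetime of the process,
    # so repeated lookups of the same word never re-query the source.
    return fetch_from_source(word)


cached_lookup("mother")
cached_lookup("mother")
print(CALLS["count"])  # 1

# Clearing the cache is analogous to the cache being erased between sessions.
cached_lookup.cache_clear()
```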

Rate limiting

Some sources have rate limits, which can affect querying and extraction for that source. In some cases, exceeding these rate limits will trigger a Cloudflare challenge session. Errors related to these blocked sessions are written to the wordhoard_error.yaml file. Such entries can have a status code of 521, which is a Cloudflare-specific error code. The maintainers of wordhoard have added rate limits to multiple modules. These rate limits can be modified, but loosening the predefined limits can lead to querying sessions being dropped or blocked by a source.

Currently there are 2 parameters that can be set:

  • max_number_of_requests
  • rate_limit_timeout_period

By default, these parameters allow 30 requests every 60 seconds. "Requests" is a misnomer, because the Synonyms module can make up to 150 queries in that period: it queries 5 sources, and each source can be called 30 times within the 60-second timeout period.

When a rate limit is triggered, a warning message is written to both the console and the wordhoard_error.yaml file. The rate limit automatically resets after a set time period, which currently cannot be modified using a parameter passed to the class.
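The per-source arithmetic above (5 sources × 30 requests = 150 queries per 60-second window) can be modeled with one sliding-window limiter per source. This is a hypothetical, standard-library-only sketch; wordhoard lists deckar01-ratelimit as a dependency, and its actual mechanism may differ. The `SlidingWindowLimiter` class and the source names are assumptions.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls per `period` seconds (hypothetical helper)."""

    def __init__(self, max_requests=30, period=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.period = period
        self.clock = clock      # injectable so the behavior can be tested deterministically
        self.calls = deque()    # timestamps of accepted calls inside the current window

    def try_acquire(self):
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_requests:
            self.calls.append(now)
            return True
        return False  # over the limit; a real client would log a warning and wait


# One limiter per source: 5 sources x 30 requests = up to 150 queries per window.
sources = ["source_1", "source_2", "source_3", "source_4", "source_5"]
limiters = {name: SlidingWindowLimiter(30, 60.0) for name in sources}
allowed = sum(limiters[name].try_acquire() for name in sources for _ in range(40))
print(allowed)  # 150
```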

from wordhoard import Synonyms

# Allow 30 requests per source within each 60-second window (the default values)
synonym = Synonyms(search_string='mother', max_number_of_requests=30, rate_limit_timeout_period=60)
results = synonym.find_synonyms()

Proxy usage

Wordhoard supports proxies out of the box. Define your proxy configuration as a dictionary and pass it to the corresponding module, as shown below.

from wordhoard import Synonyms

# Replace the placeholder strings with your own proxy URLs
proxies_example = {
    "http": "your http proxy if available",  # example: http://149.28.94.152:8080
    "https": "your https proxy",  # example: https://128.230.60.178:3128
}

synonym = Synonyms(search_string='mother', proxies=proxies_example)
results = synonym.find_synonyms()

There is a known bug in urllib3 versions 1.26.0 through 1.26.7, which can raise a variety of errors. Wordhoard will use urllib3 version 1.25.11 until the bug is fixed in a future release.
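If you want to check whether your environment has one of the affected urllib3 releases, a small sketch using the standard library's `importlib.metadata` is shown below. The version parsing is deliberately simple (it compares only the first three numeric components) and is an assumption, not a full PEP 440 implementation; `parse_version` and `in_buggy_range` are hypothetical helpers.

```python
import re
from importlib import metadata

BUGGY_RANGE = ((1, 26, 0), (1, 26, 7))  # inclusive range named in the text above


def parse_version(version):
    """Turn '1.26.7' (or '1.26.7rc1') into (1, 26, 7); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '2.0'
    return tuple(parts)


def in_buggy_range(version):
    low, high = BUGGY_RANGE
    return low <= parse_version(version) <= high


if __name__ == "__main__":
    try:
        installed = metadata.version("urllib3")
    except metadata.PackageNotFoundError:
        print("urllib3 is not installed")
    else:
        status = "affected" if in_buggy_range(installed) else "not in the affected range"
        print(installed, status)
```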

Output Formatting

The default output of wordhoard is a Python list. The output format can be changed to a Python dictionary, as the code example below shows.

from wordhoard import Antonyms

words = ['good', 'bad', 'happy']
for word in words:
    antonym_dict = Antonyms(search_string=word, output_format='dictionary').find_antonyms()
    print(antonym_dict)
# {'good': ['detestable', 'evil', 'fake', 'forged', 'immoral', 'inadequate', 'incompetent', 'inconsequential',
#           'inconsiderable', 'mean', 'misbehaving', 'noxious', 'rotten', 'sinful', 'tainted', 'unpleasant', 'unreal',
#           'unreliable', 'unskilled', 'unsuitable', 'unvirtuous', 'vicious', 'vile', 'wicked']}
# {'bad': ['advantageous', 'beneficial', 'benevolent', 'honest', 'just', 'profitable', 'reputable', 'right', 'true',
#          'undecayed', 'upright', 'virtuous', 'worthy']}
# {'happy': ['discouraged', 'dissatisfied', 'forsaken', 'hopeless', 'morose', 'pained', 'unfortunate', 'unlucky']}

Logging

This application also uses Python logging to write to both the terminal and the logfile wordhoard_error.yaml. The maintainers of wordhoard have attempted to catch any potential exception and write these error messages to the logfile. The logfile is useful for troubleshooting any issue with this package or with the sources being queried by wordhoard.
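The following sketch shows the general pattern of logging to the terminal and a file at the same time with Python's `logging` module. It is not wordhoard's logging configuration; the `build_logger` helper, logger name, format string and the example_errors.log filename are all assumptions.

```python
import logging
import sys


def build_logger(logfile, name="wordhoard_sketch"):
    """Send WARNING and above to both the terminal (stderr) and `logfile`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # avoid duplicate output through the root logger
    if not logger.handlers:   # guard against adding handlers twice on repeated calls
        fmt = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        for handler in (logging.StreamHandler(sys.stderr), logging.FileHandler(logfile)):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger


logger = build_logger("example_errors.log")
try:
    raise ConnectionError("source unreachable")
except ConnectionError:
    # logger.exception logs at ERROR level and appends the traceback
    logger.exception("query failed")
```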

Sources

This package is designed to query these online sources for antonyms, synonyms, hypernyms, hyponyms and definitions:

  1. classicthesaurus.com
  2. collinsdictionary.com
  3. merriam-webster.com
  4. synonym.com
  5. thesaurus.com
  6. wordhippo.com
  7. wordnet.princeton.edu

Dependencies

This package has these core dependencies:

  1. backoff
  2. BeautifulSoup
  3. deckar01-ratelimit
  4. deepl
  5. lxml
  6. requests
  7. urllib3

License

The MIT License (MIT). Please see License File for more information.

Download files

Source Distribution

wordhoard-1.5.0.tar.gz (265.3 kB)

Built Distribution

wordhoard-1.5.0-py3-none-any.whl (284.8 kB)
