
Extract a domain thesaurus automatically

Project description


DomainThesaurus is a Python package for extracting a domain-specific thesaurus, a resource commonly used in Natural Language Processing. Besides the domain-specific thesaurus, the package also provides several useful modules, for example DomainTerm for extracting domain-specific terms and WordClassification for classifying words (e.g. abbreviations, synonyms).

Domain-Specific Terms

DomainTerm automatically extracts domain-specific terms from a domain corpus, for example Javascript in computer science and technology, or karush kuhn tucker in mathematics.
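As a rough illustration of the idea, and not the method DomainTerm actually implements, one common way to spot domain-specific terms is to compare how often a token appears in the domain corpus with how often it appears in a general corpus. The function name `domain_specificity` and the toy corpora below are assumptions made for this sketch only.

```python
import math
from collections import Counter


def domain_specificity(domain_tokens, general_tokens, min_count=1):
    """Hypothetical helper (not part of DomainThesaurus).

    Scores each domain token by the log ratio of its relative frequency
    in the domain corpus to its relative frequency in the general corpus,
    using add-one smoothing so unseen tokens do not divide by zero.
    """
    domain_counts = Counter(domain_tokens)
    general_counts = Counter(general_tokens)
    domain_total = sum(domain_counts.values())
    general_total = sum(general_counts.values())
    vocab_size = len(set(domain_counts) | set(general_counts))

    scores = {}
    for term, count in domain_counts.items():
        if count < min_count:
            continue  # skip tokens that are too rare to judge
        p_domain = (count + 1) / (domain_total + vocab_size)
        p_general = (general_counts[term] + 1) / (general_total + vocab_size)
        scores[term] = math.log(p_domain / p_general)
    return scores


# Toy corpora, already tokenized; real corpora would be far larger.
domain = "javascript is a language javascript runs in the browser".split()
general = "the cat is in the garden and the dog is in the house".split()

scores = domain_specificity(domain, general)
top_term = max(scores, key=scores.get)
print(top_term)
```

Tokens with a positive score are relatively more frequent in the domain corpus, and tokens with a negative score are more typical of general text.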

Abbreviations and Synonyms

The WordClassification module divides semantically related words into different types. For example, ie is an abbreviation of internet explorer, and javascripts is a synonym of javascript.
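To make the distinction concrete, here is a minimal sketch of one possible heuristic, assuming that an abbreviation matches the initials of a multi-word term and that a synonym in the sense above is a close spelling variant. This is not the WordClassification implementation; the functions `is_abbreviation`, `edit_distance`, and `classify_pair`, and the `max_distance` threshold, are all hypothetical.

```python
def is_abbreviation(short, long_form):
    """True if `short` equals the initials of a multi-word `long_form`."""
    words = long_form.lower().split()
    return len(words) > 1 and short.lower() == "".join(w[0] for w in words)


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def classify_pair(word, candidate, max_distance=2):
    """Hypothetical classifier: 'abbreviation', 'synonym', or 'other'."""
    if is_abbreviation(candidate, word) or is_abbreviation(word, candidate):
        return "abbreviation"
    if edit_distance(word.lower(), candidate.lower()) <= max_distance:
        return "synonym"  # treated here as a close spelling variant
    return "other"


print(classify_pair("internet explorer", "ie"))
print(classify_pair("javascript", "javascripts"))
print(classify_pair("javascript", "python"))
```

A string-based heuristic like this only sees surface forms; it cannot tell that two unrelated spellings mean the same thing.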


DomainThesaurus is tested to work under Python 3.x. We will try to support Python 2.x.

Dependency requirements:

  • gensim(>=3.6.0)

  • networkx(>=2.1)

DomainThesaurus is currently available on PyPI, and you can install it via pip:

pip install DomainThesaurus

If you prefer, you can clone the repository and run it from source. Use the following command to get a copy from GitHub:

git clone


A simple example::
>>> dst = DomainThesaurus(domain_specific_corpus_path="your domain specific corpus path",
...                       general_corpus_path="your general corpus path",
...                       outputDir="path of output")
>>> # extract the domain thesaurus
>>> dst.extract()

The code design is flexible: you can replace the default function classes with your own to get better performance. You can find more usages in


Source Distribution

DomainThesaurus-1.1.0.tar.gz (16.9 kB)
