DomainThesaurus

Extract a domain thesaurus automatically.

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- MacOS
- Microsoft :: Windows
- POSIX
- Unix
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

Introduction

DomainThesaurus is a Python package for extracting a domain-specific thesaurus, a resource commonly used in Natural Language Processing. Here is one entry from a generated thesaurus:

{
    "internet explorer": {
        "abbreviation": ["ie"],
        "synonym": ["internet explorers", "internet explorere", "internetexplorer"],
        "other": ["firefox", "chrome", "opera"]
    }
}
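As a minimal sketch of how an entry like this might be used downstream, the snippet below builds a reverse lookup from each abbreviation and synonym to its main term and uses it to normalize a token list. The helper names `build_variant_index` and `normalize` are hypothetical and not part of the package; only the dictionary layout is taken from the example above.

```python
# One thesaurus entry, shaped like the example entry in this README.
thesaurus = {
    "internet explorer": {
        "abbreviation": ["ie"],
        "synonym": ["internet explorers", "internet explorere", "internetexplorer"],
        "other": ["firefox", "chrome", "opera"],
    }
}


def build_variant_index(thesaurus):
    """Map each abbreviation and synonym to its main term (hypothetical helper).

    Words under "other" are related but not equivalent, so they are skipped.
    """
    index = {}
    for term, groups in thesaurus.items():
        index[term] = term
        for kind in ("abbreviation", "synonym"):
            for variant in groups.get(kind, []):
                index[variant] = term
    return index


def normalize(tokens, index):
    """Replace known variants with their main term; leave other tokens as-is."""
    return [index.get(tok, tok) for tok in tokens]


index = build_variant_index(thesaurus)
print(normalize(["ie", "crashes", "firefox"], index))
# prints ['internet explorer', 'crashes', 'firefox']
```

Words listed under "other" are deliberately left untouched, since firefox is related to internet explorer but is not another name for it.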

Besides the domain-specific thesaurus, the package provides several useful modules, for example DomainTerm for extracting domain-specific terms and WordDiscrimination for discriminating between word types (e.g. abbreviations and synonyms). Details of the implemented approaches can be found in our publication: SEthesaurus: WordNet in Software Engineering (IEEE Transactions on Software Engineering, 2019).

Domain-Specific term

DomainTerm automatically extracts domain-specific terms from a domain corpus. Examples are javascript in the domain of computer science and technology, and karush kuhn tucker in the domain of mathematics.
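To illustrate the general idea of a domain-specific term, here is a small sketch that scores words by how much more frequent they are in a domain corpus than in a general corpus. This is an assumed, simplified technique (a smoothed relative-frequency ratio), not the algorithm implemented by DomainTerm; the function name `domain_term_scores` and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def domain_term_scores(domain_tokens, general_tokens, min_count=2):
    """Score words by domain frequency relative to general frequency.

    score = P(word | domain) / P(word | general), with add-one smoothing
    so that words unseen in the general corpus do not divide by zero.
    Illustrative only; not the DomainTerm implementation.
    """
    domain_counts = Counter(domain_tokens)
    general_counts = Counter(general_tokens)
    domain_total = sum(domain_counts.values())
    general_total = sum(general_counts.values())
    vocab_size = len(set(domain_counts) | set(general_counts))

    scores = {}
    for word, count in domain_counts.items():
        if count < min_count:
            continue  # too rare in the domain corpus to judge
        p_domain = (count + 1) / (domain_total + vocab_size)
        p_general = (general_counts[word] + 1) / (general_total + vocab_size)
        scores[word] = p_domain / p_general
    return scores


domain = "the javascript engine runs javascript code in the browser".split()
general = "the cat sat on the mat in the sun and the dog ran".split()
scores = domain_term_scores(domain, general)

# Words scoring well above 1 are candidates for domain-specific terms.
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, round(score, 2))
```

In this toy input, javascript scores above 1 while the scores below 1. The sketch handles single tokens only; a multi-word term such as karush kuhn tucker would first need a phrase-detection step.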

Abbreviations and Synonyms

The WordDiscrimination module divides semantically related words into different types. By default, it labels semantically related words as abbreviations or synonyms. Note that in this module, a synonym is a word that is both semantically related and morphologically similar to the original word. For example, ie is the abbreviation of internet explorer, and javascripts is a synonym of javascript.
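The distinction described above can be sketched with two simple string heuristics, assuming the candidate words are already known to be semantically related. This is not the WordDiscrimination implementation: the function `discriminate`, its thresholds, and the subsequence rule for abbreviations are all illustrative assumptions. Only the notion that synonyms are morphologically similar comes from this README.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_subsequence(short, long):
    """True if the characters of `short` appear in `long` in order."""
    it = iter(long)
    return all(ch in it for ch in short)


def discriminate(term, candidate, max_ratio=0.25, max_abbr_len_ratio=0.5):
    """Label `candidate` relative to `term` (thresholds are illustrative)."""
    # Compare with spaces removed so "internetexplorer" matches "internet explorer".
    t, c = term.replace(" ", ""), candidate.replace(" ", "")
    if not t or not c:
        return "other"
    # Synonym: morphologically similar, i.e. small normalized edit distance.
    if levenshtein(t, c) / max(len(t), len(c)) <= max_ratio:
        return "synonym"
    # Abbreviation: much shorter, same first letter, letters occur in order.
    if (len(c) <= max_abbr_len_ratio * len(t)
            and c[0] == t[0]
            and is_subsequence(c, t)):
        return "abbreviation"
    return "other"


print(discriminate("internet explorer", "ie"))       # prints abbreviation
print(discriminate("javascript", "javascripts"))     # prints synonym
print(discriminate("internet explorer", "firefox"))  # prints other
```

Checking the synonym rule first means near-identical spellings are never mistaken for abbreviations, at the cost of treating very short terms loosely; real data would likely need tuned thresholds.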

Installation

DomainThesaurus is tested under Python 3.x, so please use Python 3.x. We will try to support Python 2.x in the future.

Dependency requirements:

gensim(>=3.6.0)
networkx(>=2.1)

DomainThesaurus is currently available on PyPI, and you can install it via pip:

pip install DomainThesaurus

If you prefer, you can clone it and run the setup.py file. Use the following command to get a copy from GitHub:

git clone https://github.com/DunZhang/DomainSpecificThesaurus.git

Usage

A simple example::

    >>> # DomainThesaurus must be imported from the installed package first.
    >>> dst = DomainThesaurus(domain_specific_corpus_path="your domain specific corpus path",
    ...                       general_vocab_path="your general vocab path",
    ...                       outputDir="path of output")
    >>> # extract the domain thesaurus
    >>> your_thesaurus = dst.extract()

If you don’t have any datasets, you can copy and run the code in this notebook: https://github.com/DunZhang/DomainSpecificThesaurus/blob/master/docs/notebooks/domain_thesaurus.ipynb . It automatically downloads datasets for you. The code design is flexible: you can replace the default function classes with your own to get better performance. More usage examples are available at https://github.com/DunZhang/DomainSpecificThesaurus/blob/master/docs/notebooks

Acknowledgements

This project uses the Levenshtein distance implementation from https://pypi.org/project/jellyfish/ and GoogleDriveDownloader from https://github.com/ndrplz/google-drive-downloader. Thanks to their authors for the code.


Release history

- 1.2.3: Jan 19, 2020 (this version)
- 1.2.2: Jan 18, 2020
- 1.2.1: Jan 18, 2020
- 1.2.0: Dec 7, 2019
- 1.1.9: Apr 14, 2019
- 1.1.8: Apr 11, 2019
- 1.1.7: Apr 9, 2019
- 1.1.6: Mar 16, 2019
- 1.1.5: Mar 16, 2019
- 1.1.4: Mar 16, 2019
- 1.1.3: Feb 11, 2019
- 1.1.2: Feb 10, 2019
- 1.1.1: Feb 6, 2019
- 1.1.0: Feb 5, 2019
- 1.0.9: Jan 28, 2019
- 1.0.8: Jan 28, 2019
- 1.0.7: Jan 28, 2019
- 1.0.6: Jan 28, 2019

Download files


Source Distribution

DomainThesaurus-1.2.3.tar.gz (18.0 kB view details)

Uploaded Jan 19, 2020 Source

File details

Details for the file DomainThesaurus-1.2.3.tar.gz.

File metadata

Download URL: DomainThesaurus-1.2.3.tar.gz
Upload date: Jan 19, 2020
Size: 18.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: Python-urllib/3.7

File hashes

Hashes for DomainThesaurus-1.2.3.tar.gz
Algorithm	Hash digest
SHA256	`89e9efacc2197640fc32ac9b4a5bfa45b2c5545d23bcd1cde2a437f0c5136049`
MD5	`e2a85de906b07a5e913b7b5e3f873613`
BLAKE2b-256	`3af6acc9c3d71cee375b7f089ff04f564969830d4a58c6922cae682c3f3238d7`

