API to the Roget thesaurus
Roget's Thesaurus
Parses Roget's Thesaurus and makes it accessible through an API.
The text of Roget's Thesaurus was downloaded from https://archive.org/details/rogetsthesauruso10681gut
Written by Michael Moser (c) 2015
For usage examples, see the test.
Running the test:
pip install RogetThesaurus
python3 tests/test_roget.py
Also see the package on PyPI.
class RogetBuilder The main entry point of this library; builds an instance of RogetThesaurus
Methods defined here:
__init__(self, verbose=0)
parse(self)
parses the Roget thesaurus
returns an instance of RogetThesaurus
Note that the file 10681-body.txt must be in the same directory as the script roget.py
load(self, file)
loads an instance of the Roget thesaurus (if possible, from pickled/serialized form)
if file does not exist
  parse the Roget thesaurus
  store pickled form to file
else
  load pickled form from file
returns an instance of RogetThesaurus
Don't use this! Surprisingly, it takes less time to parse the thesaurus from the text file
(even with this inefficient parser).
The reason seems to be that the pickled format is much larger than the text file:
pickle adds the type of the class as the first element of each s-expression,
so there is a lot of redundancy.
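The load-or-parse logic described above can be sketched as follows. This is a minimal illustration of the caching pattern using only the standard library, not the library's own code; the names load_or_build, build and demo are hypothetical, and a small dictionary stands in for the parsed thesaurus.

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    """Return the object cached at `path`; if there is no cache, build and store it.

    `build` is a hypothetical zero-argument callable standing in for the parser.
    """
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    obj = build()
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return obj


def demo():
    """Call load_or_build twice and report how often the builder actually ran."""
    calls = []

    def build():
        calls.append(1)
        return {"glad": ["cheerful", "happy"]}  # toy data, not real thesaurus content

    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "thesaurus.pickle")
        first = load_or_build(cache, build)   # cache miss: builds and stores
        second = load_or_build(cache, build)  # cache hit: loads the pickled form
    return first, second, len(calls)
```

Whether such a cache pays off depends on the size of the pickled data compared with the cost of parsing, which is the trade-off noted above.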
class RogetThesaurus Methods defined here:
__init__(self, rootNode=None, headWordIndex=None, senseIndex=None)
semanticSimilarity(self, seq1, seq2)
computes the semantic similarity between two terms,
returns the following tuple (similarity-score, common-node-in-roget-thesaurus)
the similarity score:
100 - both terms appear in the same SenseGroup node
90 - both terms have the same head word
80 - both terms appear in the same leaf category
0 - everything else
common-node-in-roget-thesaurus: is None if the score is 0;
otherwise it is the common node that the score is based on
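The scoring scheme above can be illustrated with a small self-contained sketch. It does not use the library's classes: Location, SENSE_INDEX and similarity are hypothetical stand-ins, and the category, head word and sense group labels are invented toy data. Each word maps to the places where it occurs, and the best-scoring pair of places determines the result.

```python
from collections import namedtuple

# Hypothetical stand-in for a word's position in the ontology.
Location = namedtuple("Location", ["category", "head_word", "sense_group"])

# Toy index (invented labels): word -> list of locations where it occurs.
SENSE_INDEX = {
    "glad": [Location("Pleasure", "Cheerfulness", "Adj-1")],
    "cheerful": [Location("Pleasure", "Cheerfulness", "Adj-1")],
    "laugh": [Location("Pleasure", "Cheerfulness", "V-1")],
    "delight": [Location("Pleasure", "Pleasurableness", "N-1")],
    "granite": [Location("Matter", "Hardness", "N-1")],
}


def similarity(word1, word2, index=SENSE_INDEX):
    """Return (score, common_label) following the 100/90/80/0 scheme."""
    best = (0, None)
    for a in index.get(word1, []):
        for b in index.get(word2, []):
            if a.category != b.category:
                continue  # nothing in common: contributes score 0
            if a.head_word != b.head_word:
                candidate = (80, a.category)     # same leaf category only
            elif a.sense_group != b.sense_group:
                candidate = (90, a.head_word)    # same head word
            else:
                candidate = (100, a.sense_group)  # same sense group
            if candidate[0] > best[0]:
                best = candidate
    return best
```

With the toy index, similarity("glad", "laugh") yields a score of 90 because the two words share a head word but not a sense group; a word missing from the index always scores 0.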
Data descriptors defined here:
headWordIndex
the index of head words - maps a head word to its node in the ontology
rootNode
the root node of the ontology
senseIndex
the index of word senses - maps the word sense to a list of nodes in the ontology
class RogetNode RogetNode - the base class of all nodes maintained by Roget thesaurus
Methods defined here:
__init__(self, type, description, parent=None)
toString(self)
typeToString(self)
returns the type of this node as a string
Data descriptors defined here:
child
returns the array of child nodes
description
returns an optional description (in the text this appears as [ .... ] )
internalId
each node has its own internal id
key
the meaning/key of this node
parent
returns the parent node (one up in the ontology)
type
returns the type of this node as an integer
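The parent and child descriptors make it possible to walk the ontology in either direction. The sketch below shows the idea with a hypothetical ToyNode class that only mimics the key, parent and child attributes listed above; it is not the library's RogetNode, and path_from_root is an illustrative helper.

```python
class ToyNode:
    """Hypothetical stand-in with the key, parent and child attributes."""

    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.child = []
        if parent is not None:
            parent.child.append(self)  # register with the parent node


def path_from_root(node):
    """Follow parent links upward and return the keys from the root down."""
    keys = []
    while node is not None:
        keys.append(node.key)
        node = node.parent
    keys.reverse()
    return keys


root = ToyNode("root")
section = ToyNode("section", root)
leaf = ToyNode("leaf", section)
print(" / ".join(path_from_root(leaf)))
```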
class RogetThesaurusFormatterXML
class for formatting of Roget thesaurus as xml
Methods defined here:
show(self, roget, file)
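As a rough illustration of what writing a node tree as XML involves, the sketch below serializes a toy tree with the standard xml.etree.ElementTree module. The element and attribute names, the XmlToyNode class and the show_xml function are assumptions made for this example; the output format of the library's formatter is not documented here and may differ.

```python
import io
import xml.etree.ElementTree as ET


class XmlToyNode:
    """Hypothetical node with a key and a list of children."""

    def __init__(self, key, parent=None):
        self.key = key
        self.child = []
        if parent is not None:
            parent.child.append(self)


def to_element(node):
    """Recursively convert a node and its children into an XML element."""
    elem = ET.Element("node", {"key": node.key})
    for c in node.child:
        elem.append(to_element(c))
    return elem


def show_xml(root, file):
    """Write the tree rooted at `root` to a text file object as XML."""
    ET.ElementTree(to_element(root)).write(file, encoding="unicode")


tree_root = XmlToyNode("root")
XmlToyNode("first", tree_root)
XmlToyNode("second", tree_root)

buffer = io.StringIO()
show_xml(tree_root, buffer)
print(buffer.getvalue())
```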
class Sense(RogetNode) a single sense (the leaf node of the Roget Thesaurus)
Methods defined here:
__init__(self, type, parent)
toString(self)
Data descriptors defined here:
comment
an optional comment (in the text this is the text that appears in brackets)
link
optional link to a node of type HeadWord (in the text this appears as "&c; 111", a link to the headword with id 111)
linkComment
optional comment on a link
wordType
optional word type annotation
Methods inherited from RogetNode:
typeToString(self)
returns the type of this node as a string
Data descriptors inherited from RogetNode:
child
returns the array of child nodes
description
returns an optional description (in the text this appears as [ .... ] )
internalId
each node has its own internal id
key
the meaning/key of this node
parent
returns the parent node (one up in the ontology)
type
returns the type of this node as an integer
class HeadWord(Sense) A headword
Method resolution order:
HeadWord
Sense
RogetNode
Methods defined here:
__init__(self, HeadIndex, parent)
toString(self)
Data descriptors defined here:
index
the string id that identifies the headword in the Roget thesaurus
Data descriptors inherited from Sense:
comment
an optional comment (in the text this is the text that appears in brackets)
link
optional link to a node of type HeadWord (in the text this appears as "&c; 111", a link to the headword with id 111)
linkComment
optional comment on a link
wordType
optional word type annotation
Methods inherited from RogetNode:
typeToString(self)
returns the type of this node as a string
Data descriptors inherited from RogetNode:
child
returns the array of child nodes
description
returns an optional description (in the text this appears as [ .... ] )
internalId
each node has its own internal id
key
the meaning/key of this node
parent
returns the parent node (one up in the ontology)
type
returns the type of this node as an integer
class RogetThesaususFormatterText
class for formatting of Roget thesaurus as text report
Methods defined here:
show(self, roget, file, mask=15)