API to the Roget thesaurus

Project description

Roget's Thesaurus

Parses Roget's Thesaurus and makes it accessible through an API.

The text of Roget's Thesaurus was downloaded from https://archive.org/details/rogetsthesauruso10681gut

Written by Michael Moser (c) 2015

at pypi: link

For usage examples, see the test.

Running the test:

pip install RogetThesaurus
python3 tests/test_roget.py

class RogetBuilder The main entry point of this library; builds an instance of RogetThesaurus

Methods defined here:

    __init__(self, verbose=0)

    parse(self)
    parses the Roget thesaurus
    returns an instance of RogetThesaurus

    Note that the file 10681-body.txt must be in the same directory as the script roget.py

    load(self, file)
    loads an instance of RogetThesaurus (from pickled/serialized form, if available)

    if file does not exist
        parse the Roget thesaurus
        store pickled form to file
    else
        load pickled form from file
    returns an instance of RogetThesaurus

    Don't use this! Surprisingly, it takes less time to parse the thesaurus from the text file
    (even with this inefficient parser).

    The reason seems to be that the pickled format is much larger than the text file:
    pickle adds the type of the class as the first element of each s-expression,
    so there is a lot of redundancy.
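The load-or-parse logic described above can be sketched as a generic helper. This is a minimal illustration built on the standard pickle module, not the library's own code: the name load_or_build and the stand-in fake_parse function are hypothetical, and the real load method may differ in its details.

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    """Return the object cached at `path`, or build it and cache it.

    `build` is any zero-argument callable (a hypothetical stand-in
    for the parser); it is only called when the cache file is missing.
    """
    if not os.path.exists(path):
        obj = build()
        with open(path, "wb") as fh:
            pickle.dump(obj, fh)
        return obj
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    calls = []

    def fake_parse():
        # Made-up data standing in for a parsed thesaurus.
        calls.append(1)
        return {"happy": ["glad", "cheerful"]}

    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "roget.pickle")
        first = load_or_build(cache, fake_parse)   # parses and stores
        second = load_or_build(cache, fake_parse)  # loads the pickled form
        print(first == second, "parser calls:", len(calls))
```

Whether such a cache pays off depends on how the pickled size compares with the source text, which is the trade-off the note above warns about.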

class RogetThesaurus Methods defined here:

__init__(self, rootNode=None, headWordIndex=None, senseIndex=None)


semanticSimilarity(self, seq1, seq2)
    computes the semantic similarity between two terms

    returns the tuple (similarity-score, common-node-in-roget-thesaurus)


    the similarity score:
    100 - both terms appear in the same SenseGroup node
     90 - both terms have the same head word
     80 - both terms appear in the same leaf category
      0 - everything else

    common-node-in-roget-thesaurus: is None if the score is 0;
    otherwise it is the common node that the score is based on
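The scoring scale above can be illustrated with a small self-contained sketch. It assumes a toy model in which each occurrence of a term is described by three names (leaf category, head word, sense group), and it assumes that sense groups sit below head words, which sit below leaf categories. The Placement type, the similarity function and the sample data are all hypothetical and are not the library's implementation.

```python
from typing import List, NamedTuple, Optional, Tuple


class Placement(NamedTuple):
    """One occurrence of a term in a toy ontology (stand-in for real nodes)."""
    leaf_category: str
    head_word: str
    sense_group: str


def similarity(a: List[Placement], b: List[Placement]) -> Tuple[int, Optional[str]]:
    """Return (score, name of the common node) on the 100/90/80/0 scale."""
    best: Tuple[int, Optional[str]] = (0, None)
    for pa in a:
        for pb in b:
            if pa.leaf_category != pb.leaf_category:
                continue  # nothing in common: contributes score 0
            if pa.head_word != pb.head_word:
                candidate = (80, pa.leaf_category)
            elif pa.sense_group != pb.sense_group:
                candidate = (90, pa.head_word)
            else:
                candidate = (100, pa.sense_group)
            if candidate[0] > best[0]:
                best = candidate
    return best


# Made-up placements for illustration only; not taken from the thesaurus text.
INDEX = {
    "glad": [Placement("Feelings", "Cheerfulness", "group-1")],
    "merry": [Placement("Feelings", "Cheerfulness", "group-1")],
    "jolly": [Placement("Feelings", "Cheerfulness", "group-2")],
    "sad": [Placement("Feelings", "Dejection", "group-1")],
    "stone": [Placement("Matter", "Hardness", "group-1")],
}

if __name__ == "__main__":
    for other in ("merry", "jolly", "sad", "stone"):
        print("glad", other, similarity(INDEX["glad"], INDEX[other]))
```

A term may occur in several places, so the sketch compares every pair of occurrences and keeps the highest score.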

Data descriptors defined here:

headWordIndex
    the index of head words - maps a head word to its node in the ontology

rootNode
    the root node of the ontology

senseIndex
    the index of word senses - maps the word sense to a list of nodes in the ontology
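The two indexes described above have different shapes: one node per head word, but a list of nodes per word sense, since the same word can appear under several head words. The sketch below shows that difference with plain dictionaries. The build_indexes function, the input format and the sample entries are assumptions made for illustration; the library's real nodes are RogetNode objects, not dicts.

```python
from collections import defaultdict


def build_indexes(entries):
    """Build toy versions of the two indexes.

    `entries` is an iterable of (head_word, [senses]) pairs, a made-up
    input shape used only for this sketch.
    """
    head_word_index = {}
    sense_index = defaultdict(list)
    for head_word, senses in entries:
        node = {"head_word": head_word, "senses": list(senses)}
        head_word_index[head_word] = node      # one node per head word
        for sense in senses:
            sense_index[sense].append(node)    # a sense may occur in many nodes
    return head_word_index, dict(sense_index)


if __name__ == "__main__":
    # Made-up entries for illustration only.
    heads, senses = build_indexes([
        ("Light", ["light", "glow"]),
        ("Levity", ["light", "airy"]),
    ])
    print(sorted(heads))
    print([node["head_word"] for node in senses["light"]])
```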

class RogetNode RogetNode - the base class of all nodes maintained by Roget thesaurus

Methods defined here:
__init__(self, type, description, parent=None)

toString(self)

typeToString(self)
    returns the type of this node as a string

Data descriptors defined here:


child
    returns the array of child nodes

description
    returns an optional description (in the text this appears as [ .... ] )

internalId
    each node has its own internal id

key
    the meaning/key of this node

parent
    returns the parent node (one up in the ontology)

type
    returns the type of this node as an integer
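The parent and child descriptors above are enough to walk the ontology in both directions. The following sketch uses a toy Node class that mirrors only the key, parent and child attribute names; it is a hypothetical stand-in, and the real RogetNode constructor takes different arguments and may not register children the way this one does.

```python
class Node:
    """Toy node exposing `key`, `parent` and `child` like the descriptors above."""

    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.child = []
        if parent is not None:
            parent.child.append(self)  # assumption: children are registered here


def path_from_root(node):
    """Follow `parent` links upwards and return the keys from the root down."""
    keys = []
    while node is not None:
        keys.append(node.key)
        node = node.parent
    return list(reversed(keys))


def iter_leaves(node):
    """Yield every node below `node` (or `node` itself) that has no children."""
    if not node.child:
        yield node
        return
    for sub in node.child:
        yield from iter_leaves(sub)


if __name__ == "__main__":
    # Made-up keys for illustration only.
    root = Node("root")
    feelings = Node("Feelings", root)
    cheer = Node("Cheerfulness", feelings)
    glad = Node("glad", cheer)
    merry = Node("merry", cheer)
    print(path_from_root(glad))
    print([leaf.key for leaf in iter_leaves(root)])
```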

class RogetThesaurusFormatterXML
class for formatting the Roget thesaurus as XML

Methods defined here:
    show(self, roget, file)

class Sense(RogetNode) a single sense (the leaf node of the Roget Thesaurus)

Methods defined here:
    __init__(self, type, parent)

    toString(self)

Data descriptors defined here:

    comment
    an optional comment (in the text this is the text that appears in brackets)


    link
    optional link to a node of type HeadWord (in the text this appears as "&c; 111", a link to the headword with id 111)


    linkComment
    optional comment on a link

    wordType
    optional word type annotation

Methods inherited from RogetNode:
    typeToString(self)
    returns the type of this node as a string

Data descriptors inherited from RogetNode:
    child
    returns the array of child nodes

    description
    returns an optional description (in the text this appears as [ .... ] )

    internalId
    each node has its own internal id

    key
    the meaning/key of this node

    parent
    returns the parent node (one up in the ontology)

    type
    returns the type of this node as an integer

class HeadWord(Sense) A headword

Method resolution order:
HeadWord
Sense
RogetNode


Methods defined here:
__init__(self, HeadIndex, parent)

toString(self)

Data descriptors defined here:

index
    the string id that identifies the headword in the Roget thesaurus

Data descriptors inherited from Sense:
comment
    an optional comment (in the text this is the text that appears in brackets)

link
    optional link to a node of type HeadWord (in the text this appears as "&c; 111", a link to the headword with id 111)

linkComment
    optional comment on a link

wordType
    optional word type annotation

Methods inherited from RogetNode:
typeToString(self)
    returns the type of this node as a string

Data descriptors inherited from RogetNode:
child
    returns the array of child nodes

description
    returns an optional description (in the text this appears as [ .... ] )

internalId
    each node has its own internal id

key
    the meaning/key of this node

parent
    returns the parent node (one up in the ontology)

type
    returns the type of this node as an integer

class RogetThesaususFormatterText
class for formatting of Roget thesaurus as text report

Methods defined here:
    show(self, roget, file, mask=15)
