Library for parsing a terms tree from an indented text file and searching texts for the tree's terms

Terms Tree Library

A library that builds a terms tree from an indented text file and searches texts for the terms in that tree.

It can be used for text labeling and classification tasks.

See also: termstree library

Example

Demo Script:

import termstree

TERMS_TREE_SRC = """
# comment

Asia
    Japan
        Tokyo [url="https://en.wikipedia.org/wiki/Tokyo"]
        Osaka
    China
        Beijing
        Shanghai
Europe
    England
        London

    Germany [url="https://en.wikipedia.org/wiki/Germany"]
        Berlin
        Munich
"""

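# Build the terms tree from the indented source text.
terms_tree = termstree.build(TERMS_TREE_SRC, terms_normalizer=None)

# Text to scan for terms from the tree.
text = 'During the 16th century, Munich was a centre of the German counter reformation. Europe ...'

# Print each hit (a tree term found in the text).
for hit in terms_tree.search_in(text):
    print(hit)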

Result (list of 'hits' - terms found in the text):

Hit(node=Node('Munich'), dhits=1, ihits=0)
Hit(node=Node('Europe'), dhits=1, ihits=1)
Hit(node=Node('Germany', {'url': 'https://en.wikipedia.org/wiki/Germany'}), dhits=0, ihits=1)

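Each hit corresponds to a term from the terms tree and has the following attributes:

  • node - the term that was found
  • dhits (direct hits) - number of times the term itself occurs in the text
  • ihits (indirect hits) - number of occurrences of the terms below it in the tree (in the example, the single Munich match counts toward both Germany and Europe)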
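
To make the difference between direct and indirect hits concrete, here is a minimal standalone sketch of how such counts could be computed. It does not use termstree and is not the library's implementation: the names SketchNode, parse_indented and count_hits are hypothetical, the whole-word regex matching is an assumption, and the order of the results may differ from the library's.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree term; not the library's Node class."""
    term: str
    parent: Optional["SketchNode"] = None
    dhits: int = 0  # occurrences of this term itself
    ihits: int = 0  # occurrences of terms below it in the tree


def parse_indented(src):
    """Turn indented lines into nodes linked to their parents."""
    nodes = []
    stack = []  # (indent, node) pairs along the current branch
    for line in src.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(line.lstrip())
        term = stripped.split("[", 1)[0].strip()  # drop [key="value"] attributes
        while stack and stack[-1][0] >= indent:
            stack.pop()  # climb back up to the nearest shallower line
        parent = stack[-1][1] if stack else None
        node = SketchNode(term, parent)
        nodes.append(node)
        stack.append((indent, node))
    return nodes


def count_hits(nodes, text):
    """Count whole-word matches and propagate them to every ancestor."""
    for node in nodes:
        found = len(re.findall(r"\b" + re.escape(node.term) + r"\b", text))
        node.dhits += found
        ancestor = node.parent
        while ancestor is not None:
            ancestor.ihits += found
            ancestor = ancestor.parent
    return [n for n in nodes if n.dhits or n.ihits]


SRC = """
# comment
Europe
    England
        London
    Germany [url="https://en.wikipedia.org/wiki/Germany"]
        Berlin
        Munich
"""
TEXT = "During the 16th century, Munich was a centre of the German counter reformation. Europe ..."

result = {n.term: (n.dhits, n.ihits) for n in count_hits(parse_indented(SRC), TEXT)}
print(result)
# {'Europe': (1, 1), 'Germany': (0, 1), 'Munich': (1, 0)}
```

In this sketch "German" does not count as a match for "Germany" because of the whole-word boundary, which is consistent with the dhits=0 shown for Germany in the result above.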

Source distribution: termstree-0.2.tar.gz (7.2 kB)