
Library for parsing a terms tree from an indented text file and searching texts for the terms in that tree

Terms Tree Library

A library that builds a terms tree from an indented text file and searches texts for the terms in that tree.

It may be used for text labeling and classification tasks.
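As a rough illustration of the labeling use case, the sketch below ranks terms by how often they were found. It does not call the library: `TermHit` and `rank_labels` are hypothetical names, and the record only mirrors the `dhits` and `ihits` counts described in the Example section.

```python
from typing import List, NamedTuple


class TermHit(NamedTuple):
    """Hypothetical record mirroring the fields of a search hit."""
    term: str
    dhits: int  # direct occurrences of the term
    ihits: int  # occurrences contributed by terms below it in the tree


def rank_labels(hits: List[TermHit], top: int = 3) -> List[str]:
    """Rank terms by total occurrences (direct + indirect), highest first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(hits, key=lambda h: (-(h.dhits + h.ihits), h.term))
    return [h.term for h in ranked[:top]]


hits = [
    TermHit("Munich", 1, 0),
    TermHit("Europe", 1, 1),
    TermHit("Germany", 0, 1),
]
labels = rank_labels(hits, top=2)
print(labels)
```

Summing direct and indirect counts is only one possible scoring rule; a real classifier might weight them differently.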

See also: termstree library

Example

Demo Script:

import termstree

TERMS_TREE_SRC = """
# comment

Asia
    Japan
        Tokyo [url="https://en.wikipedia.org/wiki/Tokyo"]
        Osaka
    China
        Beijing
        Shanghai
Europe
    England
        London

    Germany [url="https://en.wikipedia.org/wiki/Germany"]
        Berlin
        Munich
"""

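# Build the tree from the indented source text.
terms_tree = termstree.build(TERMS_TREE_SRC, terms_normalizer=None)

text = 'During the 16th century, Munich was a centre of the German counter reformation. Europe ...'

# Search the text for terms from the tree and print every hit.
for hit in terms_tree.search_in(text):
    print(hit)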

Result (a list of hits, i.e. terms found in the text):

Hit(node=Node('Munich'), dhits=1, ihits=0)
Hit(node=Node('Europe'), dhits=1, ihits=1)
Hit(node=Node('Germany', {'url': 'https://en.wikipedia.org/wiki/Germany'}), dhits=0, ihits=1)

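Every hit corresponds to a term from the terms tree and has the following attributes:

  • node - the term that was found
  • dhits (direct hits) - number of direct occurrences of the term in the text
  • ihits (indirect hits) - number of occurrences of the term's descendants in the text (in the result above, Europe and Germany each get ihits=1 from the single occurrence of Munich)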
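The relationship between direct and indirect hits can be sketched in plain Python. This is not the library's implementation: `SketchNode`, `parse_indented` and `count_hits` are hypothetical names, matching is done on whole words with no normalization, space indentation is assumed, and term names are assumed to be unique.

```python
import re


class SketchNode:
    """Hypothetical stand-in for a tree node; not termstree's own Node class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def parse_indented(src):
    """Parse space-indented lines into a flat list of nodes linked to parents."""
    nodes = []
    stack = []  # (indent, node) pairs forming the current chain of ancestors
    for line in src.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(line.lstrip(" "))
        # Drop a trailing [key="value"] attribute block, keeping only the term.
        name = re.sub(r"\s*\[.*\]\s*$", "", stripped)
        # Pop entries that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None
        node = SketchNode(name, parent)
        stack.append((indent, node))
        nodes.append(node)
    return nodes


def count_hits(nodes, text):
    """Return {term: (dhits, ihits)} for every term with at least one hit."""
    dhits = {node.name: 0 for node in nodes}
    ihits = {node.name: 0 for node in nodes}
    for node in nodes:
        found = len(re.findall(r"\b" + re.escape(node.name) + r"\b", text))
        dhits[node.name] += found
        # Every direct occurrence also counts as an indirect hit for each ancestor.
        ancestor = node.parent
        while ancestor is not None:
            ihits[ancestor.name] += found
            ancestor = ancestor.parent
    return {
        name: (dhits[name], ihits[name])
        for name in dhits
        if dhits[name] or ihits[name]
    }


TREE_SRC = """
# comment
Europe
    England
        London
    Germany [url="https://en.wikipedia.org/wiki/Germany"]
        Berlin
        Munich
"""

text = "During the 16th century, Munich was a centre of the German counter reformation. Europe ..."
result = count_hits(parse_indented(TREE_SRC), text)
print(result)
```

Under these assumptions the sketch yields the same counts as the result listed above: the one occurrence of Munich is a direct hit for Munich and an indirect hit for both Germany and Europe, while "German" does not match Germany because only whole words are counted.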

Files for termstree, version 0.2: termstree-0.2.tar.gz (7.2 kB), source distribution.
