Library for parsing a terms tree from an indented text file and searching texts for the terms in that tree

Terms Tree Library

A library that builds a terms tree from an indented text file and searches texts for the terms in that tree.

It may be used for text labeling and classification tasks.

See also: termstree library

Example

Demo Script:

import termstree

TERMS_TREE_SRC = """
# comment

Asia
    Japan
        Tokyo [url="https://en.wikipedia.org/wiki/Tokyo"]
        Osaka
    China
        Beijing
        Shanghai
Europe
    England
        London

    Germany [url="https://en.wikipedia.org/wiki/Germany"]
        Berlin
        Munich
"""

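# Build the terms tree from the indented source text
terms_tree = termstree.build(TERMS_TREE_SRC, terms_normalizer=None)

text = 'During the 16th century, Munich was a centre of the German counter reformation. Europe ...'

# Print every term from the tree that was found in the text
for hit in terms_tree.search_in(text):
    print(hit)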

Result (a list of "hits", that is, terms found in the text):

Hit(node=Node('Munich'), dhits=1, ihits=0)
Hit(node=Node('Europe'), dhits=1, ihits=1)
Hit(node=Node('Germany', {'url': 'https://en.wikipedia.org/wiki/Germany'}), dhits=0, ihits=1)

Every hit corresponds to a term from the terms tree and has the following attributes:

  • node - found term
  • dhits (direct hits) - number of direct term occurrences in the text
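  • ihits (indirect hits) - number of occurrences of the term's descendants in the text (in the result above, Munich counts as an indirect hit for both Germany and Europe)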
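
The relationship between direct and indirect hits can be illustrated with a small standalone sketch. It does not use termstree and is not the library's implementation: SketchNode, parse_indented and count_hits are hypothetical names, and the matching rule (whole-word, case-sensitive) and the handling of the [key="value"] attributes (simply stripped) are assumptions made for the example.

```python
import re


class SketchNode:
    """Hypothetical node type for this sketch (not termstree's Node)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.dhits = 0
        self.ihits = 0


def parse_indented(src):
    """Parse indented lines into a list of nodes linked through .parent."""
    nodes = []
    stack = []  # (indent, node) pairs for the current chain of ancestors
    for line in src.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(line.lstrip())
        # Assumption: a trailing [key="value"] block is dropped, not parsed
        name = re.sub(r"\s*\[.*\]\s*$", "", stripped)
        # Pop entries that are not ancestors of the current line
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None
        node = SketchNode(name, parent)
        nodes.append(node)
        stack.append((indent, node))
    return nodes


def count_hits(nodes, text):
    """Return {name: (dhits, ihits)} for every term with at least one hit."""
    for node in nodes:
        # Assumption: whole-word, case-sensitive matching
        pattern = r"\b" + re.escape(node.name) + r"\b"
        node.dhits = len(re.findall(pattern, text))
        node.ihits = 0
    for node in nodes:
        # Every direct hit is also an indirect hit for each ancestor
        ancestor = node.parent
        while ancestor is not None:
            ancestor.ihits += node.dhits
            ancestor = ancestor.parent
    return {n.name: (n.dhits, n.ihits) for n in nodes if n.dhits or n.ihits}


SRC = """
Europe
    Germany [url="https://en.wikipedia.org/wiki/Germany"]
        Berlin
        Munich
"""

hits = count_hits(
    parse_indented(SRC),
    "Munich was a centre of the German counter reformation. Europe ...",
)
print(hits)
# {'Europe': (1, 1), 'Germany': (0, 1), 'Munich': (1, 0)}
```

In this sketch "German" does not match the term "Germany" because matching is done on whole words, which agrees with the demo result where Germany has dhits=0.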
