Library for parsing a terms tree from an indented text file and searching texts for the tree's terms
Project description
Terms Tree Library
A library for building a terms tree from an indented text file and for searching text for the terms in that tree.
It can be used for text labeling and classification tasks.
See also: termstree library
Example
Demo Script:
import termstree

TERMS_TREE_SRC = """
# comment
Asia
    Japan
        Tokyo [url="https://en.wikipedia.org/wiki/Tokyo"]
        Osaka
    China
        Beijing
        Shanghai
Europe
    England
        London
    Germany [url="https://en.wikipedia.org/wiki/Germany"]
        Berlin
        Munich
"""

terms_tree = termstree.build(TERMS_TREE_SRC, terms_normalizer=None)

text = 'During the 16th century, Munich was a centre of the German counter reformation. Europe ...'

for hit in terms_tree.search_in(text):
    print(hit)
Result (list of 'hits' - terms found in the text):
Hit(node=Node('Munich'), dhits=1, ihits=0)
Hit(node=Node('Europe'), dhits=1, ihits=1)
Hit(node=Node('Germany', {'url': 'https://en.wikipedia.org/wiki/Germany'}), dhits=0, ihits=1)
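Each hit corresponds to a term in the terms tree and has the following attributes:
- node - the term that was found
- dhits (direct hits) - number of times the term itself occurs in the text
- ihits (indirect hits) - number of occurrences of the term's descendants in the text (in the result above, Europe gets one indirect hit from Munich, which sits two levels below it)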
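To make the difference between direct and indirect hits concrete, here is a minimal, self-contained sketch of how such counts could be computed. It does not use termstree and is not the library's implementation: `parse_tree`, `search`, the `Hit` tuple, and the whole-word matching rule are all assumptions made for illustration.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the library's hit objects (illustration only).
Hit = namedtuple("Hit", "term dhits ihits")

# Assumed syntax: an optional trailing [key="value"] block after the term.
ATTRS_RE = re.compile(r"\s*\[.*\]\s*$")


def parse_tree(src):
    """Map every term to its parent term (None for top-level terms)."""
    parents = {}
    stack = []  # (indent, term) pairs forming the current ancestor chain
    for line in src.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(line.lstrip())
        term = ATTRS_RE.sub("", line.strip())
        # Drop entries that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parents[term] = stack[-1][1] if stack else None
        stack.append((indent, term))
    return parents


def search(parents, text):
    """Count whole-word matches per term, then credit every ancestor."""
    dhits = {}
    for term in parents:
        count = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        if count:
            dhits[term] = count
    ihits = {}
    for term, count in dhits.items():
        ancestor = parents[term]
        while ancestor is not None:
            ihits[ancestor] = ihits.get(ancestor, 0) + count
            ancestor = parents[ancestor]
    found = sorted(set(dhits) | set(ihits))
    return [Hit(t, dhits.get(t, 0), ihits.get(t, 0)) for t in found]


SRC = """
# comment
Asia
    Japan
        Tokyo [url="https://en.wikipedia.org/wiki/Tokyo"]
        Osaka
Europe
    Germany [url="https://en.wikipedia.org/wiki/Germany"]
        Berlin
        Munich
"""

text = ("During the 16th century, Munich was a centre of the German "
        "counter reformation. Europe ...")

for hit in search(parse_tree(SRC), text):
    print(hit)
```

With this input the sketch reports Europe with one direct and one indirect hit, Germany with one indirect hit, and Munich with one direct hit, which agrees with the counts in the result above. For a labeling task, one simple option would be to rank terms by dhits + ihits and take the top-scoring ones as labels.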
Download files
Filename | Size | File type | Python version |
---|---|---|---|
termstree-0.2.tar.gz | 7.2 kB | Source | None |