Skip to main content

Simple, minimal, but powerful tools to handle any kind of hierarchical (tree) structures

Project description

Generic tree utilities for Python

Trees are one of the most ubiquitous data structures. It is amazing how often we as programmers tend to reimplement the same algorithms for different tree formats and stuctures.

This module defines generic tree-traverse and tree-reduce algorithms that can be used with any tree-like object such as filesystem paths, lists, nested dictionaries an expression tree or even specialized tree classes! The only thing that must be provided is a function to get child nodes from a parent node.

Also, trees are usually represented in some fields (such as bioinformatics) in the newick format, which is nontrivial to parse, so this module includes a function to do this.

Usage and examples

Install from PyPi:

pip install treet

Import the basic functions, traverse, reduce and parse_newick:

import treet

Use with any kind of structured tree!

Any kind of structured data is supported, in this case, nested dictionaries:

tree = {
    'label':'A', 'children':[
        {'label':'B', 'children':[]},
        {'label':'C', 'children': [
            {'label':'D', 'children':[]}, 
            {'label':'E', 'children':[]}
        ]}
    ]
}

def children(node):
    return node['children']

[node['label'] 
    for node in treet.traverse(tree, children, mode='inorder')]

# Output --> ['B, 'A', 'D', 'C', 'E']

def as_list(node, children):
    if not children:
        return node['label']
    else:
        return children

treet.reduce(tree, children, reduce_fn=as_list)

# Output --> ['B, ['D', 'E']]

Even with user-defined classes!

Dump a tree in a specialized class format to a string in the newick format.

class Tree:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children if children else []

    def is_leaf(self):
        return len(self.children) == 0

tree = Tree('A', [
        Tree('B'),
        Tree('C',[Tree('D'),Tree('E')])
    ]
)

def get_children(node):
    return node.children

def node_to_newick(node, children):
    if node.is_leaf():
        return node.label
    else:
        return f"({','.join(children)})"


treet.reduce(tree, get_children, node_to_newick)

# Output --> '(B,(D,E))'

Parse a newick-formatted tree structure

Assemble the Newick string to a custom data format:

def parse_node_data(data_string):
    '''
    Example: 
      'data1=xx,data2=yy' 
        -> {'data1':'xx', 'data2': 'yy'}
    '''
    items = data_string.split(',')
    key_value_pairs = (item.split('=') for item in items)
    return dict(key_value_pairs)

def parse_branch_length(length_str):
    return float(length_str) if length_str else 0.0

def tree_builder(label, children, branch_length, node_data):
    return {
        'label': label,
        'length': branch_length,
        'data': node_data,
        'children': children}

newick = "(A:0.2[dat=23,other=45], B:12.4[dat=122,other=xyz])root[x=y];"

treet.parse_newick(
    newick,
    aggregator=tree_builder,
    feature_parser=parse_node_data,
    distance_parser=parse_branch_length
)

# Output ->
{'label': 'root', 'length':0.0, 'data': {'x':'y'},
 'children': [
    {'label': 'A', 'length':0.2, 'data':{'dat':'23','other':'45'}, 
     'children': []},
    {'label': 'B', 'length':12.4, 'data':{'dat':'122','other':'xyz'},
     'children': []}, 
]}

Compose to perform complex algorithms

Get the subtree induced by a subset of the leaves:

tree = (('A',('B',('C','D'))),'E')

def is_leaf(node): 
    return isinstance(node, str)

def get_children(node):
    return node if not is_leaf(node) else []

def induced_subtree(leafs):
    def induced_subtree_generator(node, children):
        if children:
            return tuple(ch for ch in children if not ch is None)
        else:
            return node if node in leafs else None
    return induced_subtree_generator

leafs = ['B', 'D', 'E']
induced = treet.reduce(tree, get_children, induced_subtree(leafs))
print(induced)

# Output --> ((('B',('D',)),),'E')


def merge_unary_nodes(node, children):
    if is_leaf(node):
        return node

    new_children = [
        ch[0] if (len(ch) == 1) else ch
        for ch in children
    ]
    return tuple(new_children)

treet.reduce(induced, get_children, merge_unary_nodes)

# Output --> (('B','D'),'E')

Use even with filesystem paths!

Traverse the /usr directory in breadth-first order:

from pathlib import Path

def enter_folder(path):
    path = Path(path)
    return list(path.iterdir()) if path.is_dir() else []

for item in treet.traverse('/usr', enter_folder, mode='breadth_first'):
    print(item)

# Output -->
# /
# /proc
# /usr
# ...
# /usr/share
# /usr/bin
# /usr/sbin
# ...
# /usr/bin/jinfo
# /usr/bin/m2400w
# ...

Meta

Author: Ad115 - Githuba.garcia230395@gmail.com

Distributed under the MIT license. See LICENSE more information.

Contributing

To run tests: pytest treet/* --hypothesis-show-statistics --verbose

To run static type check: mypy treet/*.py

To run coverage analysis: coverage run --source=. -m pytest treet/* --hypothesis-show-statistics --verbose

  1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
  2. Fork the repository on GitHub to start making your changes to a feature branch, derived from the master branch.
  3. Write a test which shows that the bug was fixed or that the feature works as expected.
  4. Send a pull request and bug the maintainer until it gets merged and published.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

treet-1.0.5.tar.gz (15.0 kB view details)

Uploaded Source

Built Distribution

treet-1.0.5-py3-none-any.whl (15.4 kB view details)

Uploaded Python 3

File details

Details for the file treet-1.0.5.tar.gz.

File metadata

  • Download URL: treet-1.0.5.tar.gz
  • Upload date:
  • Size: 15.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.7.3

File hashes

Hashes for treet-1.0.5.tar.gz
Algorithm Hash digest
SHA256 11f72f8b66120ab57317b138f587a296d41338a50178cf6469ffb701a4a29877
MD5 9ccb185cfec6f659bc9b10c0d3abf38e
BLAKE2b-256 59e884cb731c2c0a736de73dc98d8a22cb90afaa6535f0ece54ec4cf348c0880

See more details on using hashes here.

File details

Details for the file treet-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: treet-1.0.5-py3-none-any.whl
  • Upload date:
  • Size: 15.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.7.3

File hashes

Hashes for treet-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 f6e9d7f0ae92b2b502dea1dcf400206d58c55b42e4370c382283a9bd2a95366f
MD5 92498ac2fd87bf6557b3461113a61554
BLAKE2b-256 1a043f01f958ee13e64a86b9cea611f02bca6584c59bf05e04e5ad3c9db98cc0

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page