A Python module to read and write the Newick format
python-newick
A Python package to read and write the Newick format.
Reading Newick
Since Newick specifies a format for a set of trees, all functions to read Newick return a list of newick.Node objects.
- Reading from a string:
  >>> from newick import loads
  >>> trees = loads('(A,B,(C,D)E)F;')
  >>> trees[0].name
  'F'
  >>> [n.name for n in trees[0].descendants]
  ['A', 'B', 'E']
- Reading from a file-like object:
  >>> import io
  >>> from newick import load
  >>> with io.open('fname', encoding='utf8') as fp:
  ...     trees = load(fp)
- Reading from a path:
  >>> from newick import read
  >>> trees = read('fname')
  >>> import pathlib
  >>> trees = read(pathlib.Path('fname'))
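To make the "list of trees" idea concrete without relying on the package, here is a minimal stand-alone sketch. It is not the package's parser: SketchNode and parse_trees are hypothetical names, and the sketch assumes input without quoted labels, comments, or branch lengths.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not the package's newick.Node."""
    name: Optional[str] = None
    descendants: List["SketchNode"] = field(default_factory=list)


def _parse_node(s: str, pos: int) -> Tuple[SketchNode, int]:
    """Parse one node starting at s[pos]; return (node, next position)."""
    node = SketchNode()
    if pos < len(s) and s[pos] == "(":
        pos += 1  # consume "("
        while True:
            child, pos = _parse_node(s, pos)
            node.descendants.append(child)
            if pos >= len(s):
                raise ValueError("unbalanced parentheses")
            if s[pos] == ",":
                pos += 1  # another sibling follows
            elif s[pos] == ")":
                pos += 1  # end of this node's descendants
                break
            else:
                raise ValueError(f"unexpected character {s[pos]!r} at {pos}")
    # The label follows the closing parenthesis (or stands alone for a leaf).
    start = pos
    while pos < len(s) and s[pos] not in "(),":
        pos += 1
    node.name = s[start:pos] or None
    return node, pos


def parse_trees(text: str) -> List[SketchNode]:
    """Split the text at ';' and parse each chunk into one tree."""
    trees = []
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        node, pos = _parse_node(chunk, 0)
        if pos != len(chunk):
            raise ValueError(f"unexpected text at position {pos}: {chunk[pos:]!r}")
        trees.append(node)
    return trees


trees = parse_trees("(A,B,(C,D)E)F;(x,y)z;")
print([t.name for t in trees])
print([n.name for n in trees[0].descendants])
```

Because each ';' terminates one tree, a single input can yield several root nodes, which is why a list is the natural return type.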
Supported Newick dialects
The "Newick specification" states:
    Comments are enclosed in square brackets and may appear anywhere
This has spawned a host of ad-hoc mechanisms to insert additional data into Newick trees.
The newick package offers two ways to deal with comments.
- Ignoring comments:
  >>> import newick
  >>> newick.loads('[a comment](a,b)c;', strip_comments=True)[0].newick
  '(a,b)c'
- Reading comments as node annotations: Several software packages use Newick comments to store node annotations, e.g. *BEAST, MrBayes or TreeAnnotator. Provided there are no comments in places where they cannot be interpreted as node annotations, newick supports reading and writing these annotations:
  >>> newick.loads('(a[annotation],b)c;')[0].descendants[0].name
  'a'
  >>> newick.loads('(a[annotation],b)c;')[0].descendants[0].comment
  'annotation'
  >>> newick.loads('(a[annotation],b)c;')[0].newick
  '(a[annotation],b)c'
  Annotations may come before or after the colon (:) which separates node label and length:
  >>> newick.loads('(a[annotation]:2,b)c;')[0].descendants[0].length
  2.0
  >>> newick.loads('(a:[annotation]2,b)c;')[0].descendants[0].length
  2.0
  but if they precede the colon, they must not contain a colon themselves:
  >>> newick.loads('(a[annotation:]:2,b)c;')[0].descendants[0].comment
  ...
  ValueError: Node names or branch lengths must not contain ":"
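The "ignoring comments" approach can be sketched without the package as a single pass over the text. This is an illustration only, not how the package implements strip_comments: strip_bracket_comments is a hypothetical name, and treating nested brackets as one comment and single quotes as the only label delimiter are assumptions of this sketch.

```python
def strip_bracket_comments(text: str) -> str:
    """Drop [...] comments from Newick text, leaving single-quoted labels intact."""
    out = []
    in_quote = False  # True while inside a '...' label
    depth = 0         # bracket nesting level outside quotes
    for ch in text:
        if depth == 0 and ch == "'":
            # A quote outside any comment opens or closes a quoted label.
            in_quote = not in_quote
            out.append(ch)
        elif in_quote:
            # Everything inside a quoted label is kept, brackets included.
            out.append(ch)
        elif ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        # Characters at depth > 0 belong to a comment and are skipped.
    return "".join(out)


print(strip_bracket_comments("[a comment](a,b)c;"))
print(strip_bracket_comments("('a[label]',b[note])c;"))
```

Tracking the quote state before looking at brackets is what keeps a label such as 'a[label]' from being mistaken for a comment.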
Note that square brackets inside quoted labels will not be interpreted as comments or annotations:
>>> newick.loads("('a[label]',b)c;")[0].descendants[0].name
"'a[label]'"
>>> newick.loads("('a[label]',b)c;")[0].newick
"('a[label]',b)c"
Writing Newick
In parallel to the read operations, there are three functions to serialize a single Node object or a list of Node objects to Newick format:
dumps(trees) -> str
dump(trees, fp)
write(trees, 'fname')
A tree may be assembled using the factory methods of the Node class:
Node.__init__
Node.create
Node.add_descendant
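What serializing a tree to Newick involves can be sketched with plain tuples. The example below does not use the package's Node class or its dumps function: the (name, length, children) tuple layout and the names to_newick and dumps_sketch are hypothetical, and joining several trees with newlines is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical plain-tuple tree: (name, branch_length, children)
TreeTuple = Tuple[Optional[str], Optional[float], list]


def to_newick(node: TreeTuple) -> str:
    """Serialize one node and its subtree, without the trailing ';'."""
    name, length, children = node
    text = ""
    if children:
        # Descendants go first, comma-separated inside parentheses.
        text = "(" + ",".join(to_newick(child) for child in children) + ")"
    if name:
        text += name
    if length is not None:
        text += f":{length}"
    return text


def dumps_sketch(trees: List[TreeTuple]) -> str:
    """Serialize a list of trees, terminating each one with ';'."""
    return "\n".join(to_newick(tree) + ";" for tree in trees)


tree = ("F", None, [
    ("A", None, []),
    ("B", 2.0, []),
    ("E", None, [("C", None, []), ("D", None, [])]),
])
print(dumps_sketch([tree]))
```

The recursion mirrors the format itself: a node's descendants are written before its own label and branch length.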
Manipulating trees
- Displaying tree topology in the terminal:
  >>> import newick
  >>> tree = newick.loads('(b,(c,(d,(e,(f,g))h)i)a)')[0]
  >>> print(tree.ascii_art())
      ┌─b
  ────┤
      │   ┌─c
      └─a─┤
          │   ┌─d
          └─i─┤
              │   ┌─e
              └─h─┤
                  │   ┌─f
                  └───┤
                      └─g
- Pruning trees: The example below prunes the tree such that b, c and i are the only remaining leaves.
  >>> tree.prune_by_names(['b', 'c', 'i'], inverse=True)
  >>> print(tree.ascii_art())
      ┌─b
  ────┤
      │   ┌─c
      └─a─┤
          └─i
- Running a callable on a filtered set of nodes:
  >>> tree.visit(lambda n: setattr(n, 'name', n.name.upper()), lambda n: n.name in ['a', 'b'])
  >>> print(tree.ascii_art())
      ┌─B
  ────┤
      │   ┌─c
      └─A─┤
          └─i
- Removing (topologically) redundant internal nodes:
  >>> tree.prune_by_names(['B', 'c'], inverse=True)
  >>> print(tree.ascii_art())
      ┌─B
  ────┤
      └─A ──c
  >>> tree.remove_redundant_nodes(keep_leaf_name=True)
  >>> print(tree.ascii_art())
      ┌─B
  ────┤
      └─c
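The notion of a topologically redundant node, an internal node with exactly one descendant, can be illustrated on nested dictionaries. This is a simplified sketch rather than the package's remove_redundant_nodes: collapse_single_child and the dictionary layout are hypothetical, branch lengths are ignored, and the surviving node always keeps the child's name.

```python
def collapse_single_child(node: dict) -> dict:
    """Return a copy of the tree in which every node with exactly one
    child is replaced by that child."""
    children = [collapse_single_child(child) for child in node.get("children", [])]
    if len(children) == 1:
        # The node adds no branching, so its only child takes its place.
        return children[0]
    return {"name": node.get("name"), "children": children}


# Same shape as the pruned tree above: root with leaf B and node A over leaf c.
pruned = {"name": None, "children": [
    {"name": "B", "children": []},
    {"name": "A", "children": [{"name": "c", "children": []}]},
]}
collapsed = collapse_single_child(pruned)
print([child["name"] for child in collapsed["children"]])
```

Collapsing bottom-up means a chain of several single-child nodes is reduced in one pass.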