
A Python tool for generating a Newick-formatted tree from a list of classifications


treemaker

A Python library for creating a Newick-formatted tree from a set of classification strings (e.g. a taxonomy).


treemaker is a Python library to convert a text-based classification schema into a Newick file for use in phylogenetic and bioinformatic programs.

Research in linguistics or cultural evolution often produces or uses tree taxonomies or classifications. However, these are usually not in a format that programs for reading and manipulating trees can use directly. For example, the global taxonomy of languages published by the Ethnologue classifies languages into families and subgroups using a taxonomy string: the language Kalam is classified as "Trans-New Guinea, Madang, Kalam-Kobon", while Mauwake is classified as "Trans-New Guinea, Madang, Croisilles, Pihom", and Kare is "Trans-New Guinea, Madang, Croisilles, Kare". This classification indicates that while all these languages are part of the Madang subgroup of the Trans-New Guinea language family, Kare and Mauwake are more closely related to each other than either is to Kalam (as both belong to the Croisilles subgroup).

Other publications use a tabular indented format to demarcate relationships, such as the example in Figure 1 from Stephen Wurm's classification of his proposed Yele-Solomons language phylum (Wurm 1975).

However, both the taxonomy string and the tabular format are hard to load into software packages that can analyse, compare, visualise and manipulate trees. treemaker aims to make this easy by converting taxonomic data into the Newick and Nexus (Maddison 1997) formats commonly used by phylogenetic manipulation programs.

Converting a Taxonomy to a Tree:

treemaker can convert a text file with a taxonomy to a tree. These taxonomies can easily be obtained from Ethnologue or manually entered, such as this example from Wurm's (outdated) classification of Yele-Solomons in Figure 1:

Bilua       Yele-Solomons, Central Solomon
Baniata     Yele-Solomons, Central Solomon
Lavukaleve  Yele-Solomons, Central Solomon
Savosavo    Yele-Solomons, Central Solomon
Kazukuru    Yele-Solomons, Kazukuru
Guliguli    Yele-Solomons, Kazukuru
Dororo      Yele-Solomons, Kazukuru
Yele        Yele-Solomons

treemaker can then generate a Newick representation:

((Baniata,Bilua,Lavukaleve,Savosavo),(Dororo,Guliguli,Kazukuru),Yele);

...which can then be loaded into phylogenetic programs to visualise or manipulate as in Figure 2.
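
To make the conversion concrete, here is a minimal, self-contained sketch of the general idea: nest each taxon under its classification path, then serialise the nesting as Newick. This is not treemaker's own implementation; the function names `build_tree` and `to_newick` are hypothetical, and the sorting and collapsing rules are assumptions chosen so that the sketch reproduces the output shown above.

```python
def build_tree(rows):
    """Nest each taxon under its comma-separated classification path.

    Groups become dicts and taxa map to None. This sketch assumes a taxon
    never shares its name with a sibling group at the same level.
    """
    root = {}
    for taxon, classification in rows:
        node = root
        for group in classification.split(","):
            node = node.setdefault(group.strip(), {})
        node[taxon] = None
    return root


def to_newick(node, name=""):
    """Serialise the nested dicts to Newick (without the final ';')."""
    if node is None:  # a taxon, i.e. a leaf
        return name
    # Assumption: children are ordered alphabetically by name.
    children = [to_newick(node[label], label) for label in sorted(node)]
    if len(children) == 1:  # assumption: single-member groups are collapsed
        return children[0]
    return "(" + ",".join(children) + ")"


rows = [
    ("Bilua", "Yele-Solomons, Central Solomon"),
    ("Baniata", "Yele-Solomons, Central Solomon"),
    ("Lavukaleve", "Yele-Solomons, Central Solomon"),
    ("Savosavo", "Yele-Solomons, Central Solomon"),
    ("Kazukuru", "Yele-Solomons, Kazukuru"),
    ("Guliguli", "Yele-Solomons, Kazukuru"),
    ("Dororo", "Yele-Solomons, Kazukuru"),
    ("Yele", "Yele-Solomons"),
]

newick = to_newick(build_tree(rows)) + ";"
print(newick)
# ((Baniata,Bilua,Lavukaleve,Savosavo),(Dororo,Guliguli,Kazukuru),Yele);
```

Each level of the classification string becomes one level of nesting, so taxa that share a longer prefix end up in the same parenthesised group.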

treemaker has been used to enable the analyses in Bromham et al. (2018) and in a number of forthcoming articles.

Figure 1: Example of a language taxonomy in indented format from Wurm (1975).

Figure 2: Tree visualisation of the relationships between the putative Yele-Solomons languages.

Installation:

Installation is only a pip install away:

pip install treemaker

Or from git:

git clone https://github.com/SimonGreenhill/treemaker/ treemaker
cd treemaker
python setup.py install

Usage: Command line:

Basic usage:

> treemaker

usage: treemaker [-h] [-o OUTPUT] [-m {nexus,newick}] [--labels] input

For example, given a text file:

LangA   Indo-European, Germanic
LangB   Indo-European, Germanic
LangC   Indo-European, Romance
LangD   Indo-European, Anatolian

... then you can build a taxonomy/classification tree from that as follows:

> treemaker classification.txt
(LangD,(LangA,LangB),LangC);

# with nodelabels:
> treemaker --labels classification.txt
(LangD,(LangA,LangB)Germanic,LangC)Indo-European;

> treemaker -m nexus classification.txt

#NEXUS

begin trees;
   tree root = (LangD,(LangA,LangB),LangC);
end;
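
The `--labels` and `-m nexus` outputs above differ from the plain Newick output only in how the same tree is written out. The following self-contained sketch illustrates that difference under stated assumptions: `build_tree`, `to_newick` and `to_nexus` are hypothetical helpers, not part of treemaker, and the rule that collapsed single-member groups lose their label is inferred from the example output rather than taken from the library's code.

```python
def build_tree(rows):
    """Nest taxa (mapped to None) under dicts keyed by group name."""
    root = {}
    for taxon, classification in rows:
        node = root
        for group in classification.split(","):
            node = node.setdefault(group.strip(), {})
        node[taxon] = None
    return root


def to_newick(node, name="", labels=False):
    """Serialise to Newick, optionally labelling internal nodes."""
    if node is None:  # leaf
        return name
    children = [to_newick(node[k], k, labels) for k in sorted(node)]
    if len(children) == 1:
        # Assumption: a single-member group is collapsed and its label dropped.
        return children[0]
    # Note: labels containing spaces would need quoting in real Newick files.
    return "(" + ",".join(children) + ")" + (name if labels else "")


def to_nexus(newick):
    """Wrap a Newick string in a NEXUS trees block like the one shown above."""
    return "#NEXUS\n\nbegin trees;\n   tree root = " + newick + "\nend;"


rows = [
    ("LangA", "Indo-European, Germanic"),
    ("LangB", "Indo-European, Germanic"),
    ("LangC", "Indo-European, Romance"),
    ("LangD", "Indo-European, Anatolian"),
]
tree = build_tree(rows)

plain = to_newick(tree) + ";"
labelled = to_newick(tree, labels=True) + ";"
nexus = to_nexus(plain)

print(plain)     # (LangD,(LangA,LangB),LangC);
print(labelled)  # (LangD,(LangA,LangB)Germanic,LangC)Indo-European;
print(nexus)
```

In Newick, an internal node's label follows its closing parenthesis, which is why `Germanic` and `Indo-European` appear after the groups they name.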

To write to file:

> treemaker classification.txt
(LangD,(LangA,LangB),LangC);

> treemaker classification.txt -o classification.nex

Usage: Library:

from treemaker import TreeMaker

Generate a tree manually:

from treemaker import TreeMaker

t = TreeMaker()
t.add('A1', 'family a, subgroup 1')
t.add('A2', 'family a, subgroup 2')
t.add('B1a', 'family b, subgroup 1')
t.add('B1b', 'family b, subgroup 1')
t.add('B2', 'family b, subgroup 2')

print(t.write())

Add from a list:

from treemaker import TreeMaker

taxa = [
    ('A1', 'family a, subgroup 1'),
    ('A2', 'family a, subgroup 2'),
    ('B1a', 'family b, subgroup 1'),
    ('B1b', 'family b, subgroup 1'),
    ('B2', 'family b, subgroup 2'),
]

t = TreeMaker()
t.add_from(taxa)

print(t.write())
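
If the classifications live in a text file like the command-line example, a list of `(taxon, classification)` tuples can be built with a few lines of plain Python. This is a sketch only: `parse_classification` is a hypothetical helper, not part of treemaker, and it assumes that taxon names contain no whitespace and that the first run of whitespace on each line separates the taxon from its classification.

```python
def parse_classification(lines):
    """Turn 'Taxon   Group, Subgroup' lines into (taxon, classification) tuples."""
    taxa = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        # Split on the first run of whitespace only, so the comma-separated
        # classification (which may contain spaces) stays in one piece.
        # A line with no classification raises ValueError here.
        taxon, classification = line.split(None, 1)
        taxa.append((taxon, classification.strip()))
    return taxa


lines = [
    "LangA   Indo-European, Germanic",
    "LangB   Indo-European, Germanic",
    "",
    "LangC   Indo-European, Romance",
    "LangD   Indo-European, Anatolian",
]

taxa = parse_classification(lines)
for taxon, classification in taxa:
    print(taxon, "->", classification)
```

The resulting `taxa` list has the same shape as the one passed to `t.add_from(taxa)` in the example above. To read from disk, pass an open file object instead of the list of strings.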

API Documentation:

The API is documented here.

Running treemaker's tests:

To run treemaker's tests simply run:

> make test
# or
> python setup.py test
# or
> python treemaker/test_treemaker.py

Version History:

  • v1.4: fix bug with no terminating semicolon in nexus file output.
  • v1.3: add nodelabels support, add some rudimentary input checking.

Support:

For questions on how to use or update this, feel free to open an issue. I'll get to it as soon as I can.

Acknowledgements:

Thank you to Richard Littauer, Mitsuhiro Nakamura, and Dillon Niederhut.

References:
