A tool for converting and manipulating phylogenetic data

panphylo

panphylo is a free and open-source tool for converting and manipulating phylogenetic data, especially for non-biological datasets.

Panfilo, from Boccaccio's Decameron, as pictured in Bodleian Library MS. Holkham 49, fol. 148r

Many similar tools exist for both local and remote execution. panphylo is distinguished by its focus on phylogenetic data of non-biological origin, especially in the fields of historical linguistics and stemmatology. Multistate characters are treated as the standard data type, and one of the best-supported formats is tabular text (e.g. CSV), which eases integration with the tools used in these areas. The library also offers off-the-shelf support for data manipulation, such as automatic binarization (with or without the addition of ascertainment characters), adaptation of labels to the restrictions of many programs (for example, remapping Unicode sequences to ASCII while keeping identifiers unique), removal of constant features, and more.
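
As an illustration of the label adaptation mentioned above, the following is a minimal sketch of remapping Unicode labels to ASCII while keeping them unique. It is not panphylo's own code: the function names `slug_label` and `slug_labels`, the `unnamed` placeholder, and the numeric-suffix strategy are all assumptions made for this example.

```python
import re
import unicodedata


def slug_label(label: str) -> str:
    """Reduce a label to ASCII letters, digits, and underscores."""
    # Decompose accented characters, then drop anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace runs of any other characters with a single underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", ascii_only).strip("_")
    # Hypothetical placeholder for labels with no ASCII equivalent.
    return slug or "unnamed"


def slug_labels(labels: list[str]) -> dict[str, str]:
    """Map each original label to a unique ASCII slug."""
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for label in labels:
        base = slug_label(label)
        candidate, counter = base, 2
        # Append a numeric suffix until the slug is unused.
        while candidate in used:
            candidate = f"{base}_{counter}"
            counter += 1
        used.add(candidate)
        mapping[label] = candidate
    return mapping


taxa = ["Português", "Portugues", "Old Norse", "Ελληνικά"]
slugged = slug_labels(taxa)
for original, slug in slugged.items():
    print(f"{original} -> {slug}")
```

Two labels that collapse to the same ASCII form receive different suffixes, so the output can still serve as a set of identifiers in programs that only accept ASCII names.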

The library follows a structure inspired by the well-known pandoc tool for converting between textual document formats: input "filters" convert different formats to an internal multistate representation, and output "filters" convert this internal representation into different formats and dialects, carrying out the manipulations requested by the user. Since the same data format can be given as both input and output, the tool can also be used to tidy up existing files.
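
The reader/writer organization can be sketched as two registries around a shared internal representation. This is only a schematic illustration restricted to CSV and TSV: the `Matrix` type, the `READERS` and `WRITERS` registries, the `convert` function, and the long-format layout with `Taxon`, `Character`, and `State` columns are assumptions for this example, not panphylo's actual internals.

```python
import csv
import io

# Internal representation assumed for this sketch:
# {taxon: {character: set of states}}
Matrix = dict[str, dict[str, set[str]]]


def read_tabular(text: str, delimiter: str) -> Matrix:
    """Parse long-format tabular text into the internal representation."""
    matrix: Matrix = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        states = matrix.setdefault(row["Taxon"], {}).setdefault(row["Character"], set())
        states.add(row["State"])
    return matrix


def write_tabular(matrix: Matrix, delimiter: str) -> str:
    """Serialize the internal representation as sorted long-format text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Taxon", "Character", "State"])
    for taxon in sorted(matrix):
        for character in sorted(matrix[taxon]):
            for state in sorted(matrix[taxon][character]):
                writer.writerow([taxon, character, state])
    return buffer.getvalue()


# One entry per supported format; every reader targets the same structure.
READERS = {
    "csv": lambda text: read_tabular(text, ","),
    "tsv": lambda text: read_tabular(text, "\t"),
}
WRITERS = {
    "csv": lambda matrix: write_tabular(matrix, ","),
    "tsv": lambda matrix: write_tabular(matrix, "\t"),
}


def convert(text: str, source: str, target: str) -> str:
    """Read with the source filter, write with the target filter."""
    return WRITERS[target](READERS[source](text))


source_csv = "Taxon,Character,State\nB,c1,x\nA,c1,y\nA,c1,x\n"
as_tsv = convert(source_csv, "csv", "tsv")
tidied = convert(source_csv, "csv", "csv")  # same format in and out
print(as_tsv)
print(tidied)
```

Because every reader produces the same structure, adding a format means adding one reader and one writer rather than one converter per pair of formats. Passing the same format as source and target yields a normalized (here, sorted) copy of the input.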

The library's name is an homage and reference to "pandoc". It is also a reminder of its origins in the field of stemmatology, referring to Panfilo ("the lover of all"), one of the protagonists of Boccaccio's Decameron. The picture used in this documentation is taken from a manuscript of the work, the beautiful Bodleian Library MS. Holkham 49 (fol. 148r).

Installation

In any standard Python environment, panphylo can be installed with:

pip install panphylo

Using `panphylo`

If no input file is specified, input is read from stdin. Output goes to stdout by default. For output to a file, use the -o option:

panphylo -o data.nex data.csv

The format of the input and output can be specified explicitly using command-line options. The input format can be specified using the -f/--from option, the output format using the -t/--to option. Thus, to convert data.nex from NEXUS to PHYLIP, you could type:

panphylo -f nexus -t phylip data.nex

Supported input and output formats are listed below under "Options" (see -f for input formats and -t for output formats). If the input format is not specified explicitly, panphylo will attempt to guess it from the contents of the file. If the output format is not specified, it will attempt to guess it from the extension of the filename, defaulting to CSV.
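
Guessing the output format from a filename extension, with CSV as the fallback, could look roughly like the sketch below. The extension table (`.csv`, `.tsv`, `.nex`, `.phy`) and the function name `guess_output_format` are assumptions for illustration; the extensions actually recognized by panphylo may differ.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension table; not taken from panphylo's source.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".nex": "nexus",
    ".phy": "phylip",
}


def guess_output_format(filename: Optional[str], default: str = "csv") -> str:
    """Pick an output format from the file extension, else the default."""
    # No filename, or "-", means stdout: there is no extension to inspect.
    if filename is None or filename == "-":
        return default
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, default)


for name in ["data.nex", "DATA.TSV", "out.txt", "-"]:
    print(name, "->", guess_output_format(name))
```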

As for character encoding, panphylo uses the UTF-8 character encoding for output, which in most cases will be restricted to ASCII characters. If your local character encoding is not UTF-8, you should pipe the output through tools such as iconv. The input encoding can be specified with the -e option, and will be autodetected (with the chardet library) if not provided.
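
The encoding behaviour described above (explicit input encoding if given, autodetection otherwise, UTF-8 on output) can be sketched as follows. The function names are hypothetical, and the fallback to UTF-8 when `chardet` is missing or undecided is an assumption of this example rather than documented panphylo behaviour.

```python
from typing import Optional


def decode_input(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode input bytes with an explicit or autodetected encoding."""
    if encoding is None:
        try:
            import chardet  # third-party library named in the text

            encoding = chardet.detect(raw)["encoding"]
        except ImportError:
            encoding = None
        # Assumed fallback when detection is unavailable or inconclusive.
        encoding = encoding or "utf-8"
    return raw.decode(encoding)


def encode_output(text: str) -> bytes:
    """Output is always written as UTF-8."""
    return text.encode("utf-8")


latin1_bytes = "Tübingen,c1,x\n".encode("latin-1")
text = decode_input(latin1_bytes, encoding="latin-1")
utf8_bytes = encode_output(text)
print(text, end="")
```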

The internal representation used by panphylo is exclusively multistate, even when converting from and to binary data, and output is multistate by default. To binarize the data (or to "rebinarize" it, so that the implemented manipulations can be applied), use the -b option.
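
To make the idea of binarization concrete, here is a minimal sketch that turns each state of a multistate character into its own presence/absence column. The `binarize` function, its data layout, and the ascertainment convention used (one extra column per character that is `0` wherever the character is observed and `?` where it is missing) are assumptions for this example; panphylo's own output may be organized differently.

```python
def binarize(matrix, ascertainment=False):
    """Binarize {taxon: {character: set of states}}.

    Returns (column_names, {taxon: string of 0/1/?}).
    """
    # Collect every state observed for each character.
    states = {}
    for characters in matrix.values():
        for character, observed in characters.items():
            states.setdefault(character, set()).update(observed)

    # One column per (character, state); None marks an ascertainment column.
    columns = []
    for character in sorted(states):
        if ascertainment:
            columns.append((character, None))
        columns.extend((character, state) for state in sorted(states[character]))

    rows = {}
    for taxon, characters in matrix.items():
        cells = []
        for character, state in columns:
            observed = characters.get(character)
            if not observed:
                cells.append("?")  # character missing for this taxon
            elif state is None:
                cells.append("0")  # ascertainment column is always absent
            else:
                cells.append("1" if state in observed else "0")
        rows[taxon] = "".join(cells)

    names = [f"{c}_{'ascertainment' if s is None else s}" for c, s in columns]
    return names, rows


matrix = {
    "A": {"c1": {"x"}, "c2": {"p"}},
    "B": {"c1": {"y"}, "c2": {"p", "q"}},
    "C": {"c1": {"x"}},  # no data for c2
}
names, rows = binarize(matrix)
asc_names, asc_rows = binarize(matrix, ascertainment=True)
print(names, rows)
print(asc_names, asc_rows)
```

Taxon `B` carries two states for `c2`, which binarization represents naturally as two columns set to `1`.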

Options

Option	Help
`--input` FILE	Read input from FILE. If FILE is `-`, input will come from stdin.
`-o`, `--output` FILE	Write output to FILE instead of stdout. If FILE is `-`, output will go to stdout.
`-b`, `--binarize`	Binarize the output. Whether and how ascertainment correction is added is controlled by the `--ascertainment` option.
`-f`, `--from` FORMAT	Specify the input format. Valid FORMAT choices are `auto`, `tabular`, `csv`, `tsv`, `nexus`, and `phylip`; `auto` will attempt to autodetect the format from the contents of the file, while `tabular` will attempt to detect the delimiter (comma or tab) in tabular text files. Defaults to `auto`.
`-t`, `--to` FORMAT	Specify the output format. Valid FORMAT choices are `auto`, `csv`, `tsv`, `nexus`, and `phylip`; `auto` will decide on the format based on the file extension. Defaults to `csv`.
`-e`, `--encoding` ENCODING	Specify the character encoding for the input, using the standard character encoding names in Python. Defaults to autodetection with the `chardet` library.
`--i-taxa` LABEL	Input label, column, or name for taxa. If not provided, the library will attempt to autodetect it. Does not apply to all formats.
`--i-char` LABEL	Input label, column, or name for characters. If not provided, the library will attempt to autodetect it. Does not apply to all formats.
`--i-state` LABEL	Input label, column, or name for states. If not provided, the library will attempt to autodetect it. Does not apply to all formats.
`--o-taxa` LABEL	Output label, column, or name for taxa. If not provided, defaults to `"Taxon"`. Does not apply to all formats.
`--o-char` LABEL	Output label, column, or name for characters. If not provided, defaults to `"Character"`. Does not apply to all formats.
`--o-state` LABEL	Output label, column, or name for states. If not provided, defaults to `"State"`. Does not apply to all formats.
`--slug_taxa` LEVEL	Level of "slugging" (simplification) of taxa names. Valid LEVEL options are `none`, `simple`, and `full`.
`--slug_chars` LEVEL	Level of "slugging" (simplification) of character names. Valid LEVEL options are `none`, `simple`, and `full`.
`-v`, `--verbosity` LEVEL	Set the logging level. Valid LEVEL options, following the Python `logging` library, are `"debug"`, `"info"`, `"warning"`, `"error"`, `"critical"`.

Alternatives

As mentioned, many tools for both local and remote execution overlap to some extent with panphylo. They usually support more formats and provide better support for genetic data, but don't always offer methods for data manipulation such as binarization and debinarization, or label conversion. Among the most used tools are:

The most used tool, readseq, available at a number of online interfaces such as [https://mafft.cbrc.jp/alignment/server/cgi-bin/readseq.txt] and [http://avermitilis.ls.kitasato-u.ac.jp/readseq.cgi]
The EMBOSS seqret tool, partly derived from readseq, with an online interface at [https://www.ebi.ac.uk/Tools/sfc/emboss_seqret/]
The web interface at LIRMM [http://phylogeny.lirmm.fr/phylo_cgi/data_converter.cgi]
The phyDat methods in the phangorn R library, at [https://rdrr.io/cran/phangorn/man/phyDat.html]

Changelog

Version 0.2:

Add Brython support for running locally in a browser and in the web interface
Corrections to output generation, mostly related to multistate data (note that it is not recommended to run on multistate data yet)

Version 0.1:

First public release

Community guidelines

While the author can be contacted directly for support, it is recommended that third parties use GitHub standard features, such as issues and pull requests, to contribute, report problems, or seek support.

Contributing guidelines, including a code of conduct, can be found in the CONTRIBUTING.md file.

Author and citation

The library is developed by Tiago Tresoldi (tiago.tresoldi@lingfil.uu.se) in the context of the Cultural Evolution of Texts project, with funding from Riksbankens Jubileumsfond (grant agreement ID: MXM19-1087:1).

If you use panphylo, please cite it as:

Tresoldi, T. (2022). panphylo: a tool for converting and manipulating phylogenetic data. Version 0.3. Uppsala: Uppsala Universitet.

In BibTeX:

@misc{Tresoldi2021panphylo,
  url = {https://github.com/tresoldi/panphylo},
  year = {2022},
  author = {Tiago Tresoldi},
  title = {panphylo: a tool for converting and manipulating phylogenetic data. Version 0.3.},
  address = {Uppsala},
  publisher = {Uppsala Universitet}
}

Release history

0.3 (this version): Jul 15, 2022
0.2: Dec 1, 2021
0.1: Nov 26, 2021

Download files

Source Distribution

panphylo-0.3.tar.gz (23.8 kB), uploaded Jul 15, 2022

Built Distribution

panphylo-0.3-py3-none-any.whl (23.8 kB), uploaded Jul 15, 2022, Python 3

Hashes for panphylo-0.3.tar.gz
Algorithm	Hash digest
SHA256	`4f00439c42398e156290861d20a0f77616acd9ba7ee9d583d11116fc9ae39195`
MD5	`d4454d27be298a25a4bdb292a98fed6e`
BLAKE2b-256	`7c8cc893fb4c4ecb03ac036e77fc51e2ae06eba5d86010b893bce6a601a49797`

Hashes for panphylo-0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`36bf23bb9398dcee82757fb27121a38362cf6f721ad7bf9a697f49abb611469a`
MD5	`ebf1341d1c99f0dbfc0c8c464c12fa20`
BLAKE2b-256	`43f515004e746c0647addae1ce0e345aa7b1be799fb7a1859106bbe42f0c42b6`
