A nexus (phylogenetics) file reader (.nex, .trees)

python-nexus

A generic nexus (.nex, .trees) reader/writer for Python.

Description

python-nexus provides simple nexus file-format reading/writing tools, and a small collection of nexus manipulation scripts.

Versions:

  • v2.0:
    • Refactored the CLI. The package now installs a single command, nexus, which provides several subcommands.
    • Dropped Python 2 compatibility.
  • v1.7:
    • added rudimentary tree handling to NexusWriter objects:

      nex = NexusWriter()
      nex.trees.append("tree tree1 = (a,b);")

    • added the ability to combine nexuses containing trees

  • v1.63:
    • fixed an issue where the bin directory wasn't packed on py2.7 (thanks @xrotwang)
  • v1.62:
    • cached DataHandler's characters property to speed up access.
    • cached DataHandler's symbol property to speed up access.
    • cached DataHandler's site parser to speed up parsing.
  • v1.61:
    • fixed an install issue caused by refactoring.
  • v1.6:
    • removed some over-engineered checking on the NexusReader.DataMatrix.characters property
    • major refactoring of reader.py into a handlers subpackage
    • NexusReader.read_string now returns self, such that it can be used as a factory-style method.
    • added rudimentary support for taxon annotations in taxa blocks.
  • v1.53:
    • the symbols format string in the character block generated by NexusReader.write() no longer includes missing or gap symbols.
    • fixed a parsing glitch in NexusReader.DataHandler.parse_format_line.
  • v1.51:
    • characters and data blocks now retain their character labels in NexusReader
  • v1.5:
  • v1.42: minor fix to remove a stray debugging print statement
  • v1.41: minor fix to remove a stray debugging print statement
  • v1.40: major speed enhancement in NexusReader: a two-order-of-magnitude decrease in the time taken to read most nexus data blocks.
  • v1.35: fixed nexus_nexusmanip.py utility to handle multiple arguments, and to delete arbitrary sites.
  • v1.34: fixed parsing of malformed taxa blocks.
  • v1.33: fixed bug in taxa labels parser when taxa are listed on one line.

Usage

Reading a Nexus:

>>> from nexus import NexusReader
>>> n = NexusReader.from_file('nexus/examples/example.nex')

You can also load from a string:

>>> n = NexusReader.from_string('#NEXUS\n\nbegin foo; ... end;')

NexusReader will load each of the nexus blocks it identifies using specific handlers.

>>> n.blocks
{'foo': <nexus.handlers.GenericHandler object at 0x7f55d94140f0>}
>>> n = NexusReader('nexus/examples/example.nex')
>>> n.blocks
{'data': <NexusDataBlock: 2 characters from 4 taxa>}

A dictionary mapping blocks to handlers is available at .handlers:

>>> n.handlers
{
    'trees': <class 'nexus.handlers.tree.TreeHandler'>, 
    'taxa': <class 'nexus.handlers.taxa.TaxaHandler'>, 
    'characters': <class 'nexus.handlers.data.CharacterHandler'>, 
    'data': <class 'nexus.handlers.data.DataHandler'>
}

Any blocks that aren't in this dictionary will be parsed using GenericHandler.
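
The fallback described above amounts to a dictionary lookup with a default. The following is a minimal sketch of that idea, not the package's actual code; the two stand-in classes and the pick_handler helper are hypothetical names used only for illustration.

```python
class GenericSketch:
    """Stand-in for a catch-all handler (hypothetical, not nexus.handlers)."""


class DataSketch(GenericSketch):
    """Stand-in for a specialised handler (hypothetical)."""


# Maps lower-cased block names to handler classes, like the dictionary above.
HANDLERS = {"data": DataSketch}


def pick_handler(block_name):
    """Return the registered handler class, or the generic one if unknown."""
    return HANDLERS.get(block_name.lower(), GenericSketch)


print(pick_handler("Data").__name__)  # DataSketch
print(pick_handler("foo").__name__)   # GenericSketch
```

Lower-casing the block name before the lookup is an assumption of this sketch; it simply makes 'Data' and 'data' resolve to the same entry.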

NexusReader can then write the nexus to a string using .write() or to another file using .write_to_file(filename):

>>> output = n.write()
>>> # or 
>>> n.write_to_file("mynewnexus.nex")

NOTE: if you want more fine-grained control over generating nexus files, try NexusWriter, discussed below.

Block Handlers:

There are specific "Handlers" to parse certain known nexus blocks, including the common 'data', 'trees', and 'taxa' blocks. Any blocks that are unknown will be parsed with GenericHandler.

All handlers extend the GenericHandler class and have the following methods:

  • parse(self, data): called by NexusReader to parse the contents of the block (passed in data).

  • write(self): called by NexusReader to write the contents of a block to a string (i.e. to regenerate the nexus format when saving a file to disk).

All blocks have access to the following:

  • The raw block content (as a list of lines) in n.blockname.block
  • A helper function to remove all the comments in a nexus file, in n.block.remove_comments
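
To make the parse/write contract concrete, here is a small self-contained sketch of a handler that stores block lines and regenerates them. It is an illustration only: SketchHandler is a hypothetical class, not the package's GenericHandler. Unlike the real handlers, which keep the raw content in .block, it strips comments while parsing so the helper's effect is visible. It also assumes comments are not nested.

```python
import re


class SketchHandler:
    """Hypothetical handler: keeps block lines and can write them back out."""

    def __init__(self):
        self.block = []  # block content, one entry per line

    def remove_comments(self, line):
        # Nexus comments are enclosed in square brackets.
        return re.sub(r"\[.*?\]", "", line)

    def parse(self, data):
        # Store each non-empty line with comments stripped.
        for line in data:
            line = self.remove_comments(line).strip()
            if line:
                self.block.append(line)

    def write(self):
        # Regenerate the block as a single string.
        return "\n".join(self.block)


handler = SketchHandler()
handler.parse(["begin foo;", "  [a comment]", "  x = 1; [trailing]", "end;"])
print(handler.write())
```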

To find out what file the nexus was loaded from:

n.filename
n.short_filename
'example.nex'

generic block handler

The generic block handler simply stores each line of the block in .block:

n.blockname.block
['line1', 'line2', ... ]

data block handler

Data blocks are the main blocks encountered in nexus files, and contain the data matrix.

So, given the following nexus file with a data block:

#NEXUS 

Begin data;
Dimensions ntax=4 nchar=2;
Format datatype=standard symbols="01" gap=-;
    Matrix
Harry              00
Simon              01
Betty              10
Louise             11
    ;
End;

begin trees;
    tree A = ((Harry:0.1,Simon:0.2):0.1,Betty:0.2):Louise:0.1);
    tree B = ((Simon:0.1,Harry:0.2):0.1,Betty:0.2):Louise:0.1);
end;

You can do the following:

Find out how many characters:

n.data.nchar
2

Ask about how many taxa:

n.data.ntaxa
4

Get the taxa names:

n.data.taxa
['Harry', 'Simon', 'Betty', 'Louise']

Get the format info:

n.data.format
{'datatype': 'standard', 'symbols': '01', 'gap': '-'}
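
As a rough illustration of how a Format line can be turned into a dictionary like the one above, the sketch below splits the line into key=value pairs with a regular expression. The parse_format function is a hypothetical helper written for this example; it is not the package's own parser and ignores format options that have no value.

```python
import re


def parse_format(line):
    """Turn 'Format key=value ...;' into a {key: value} dictionary."""
    line = line.strip().rstrip(";")
    # A value is either a double-quoted string or a run of non-space characters.
    pairs = re.findall(r'(\w+)\s*=\s*("[^"]*"|\S+)', line)
    return {key.lower(): value.strip('"') for key, value in pairs}


fmt = parse_format('Format datatype=standard symbols="01" gap=-;')
print(fmt)  # {'datatype': 'standard', 'symbols': '01', 'gap': '-'}
```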

The actual data matrix is a dictionary, which you can get to in .matrix:

n.data.matrix
{
    'Simon': ['0', '1'],
    'Louise': ['1', '1'],
    'Betty': ['1', '0'],
    'Harry': ['0', '0']
}

Or, you could access the data matrix via taxon:

n.data.matrix['Simon']
['0', '1']
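
The shape of this dictionary can be reproduced from the matrix rows of the example file with a few lines of plain Python. This is only a sketch of the data structure, assuming one single-character state per site and taxon names without spaces; parse_matrix is a hypothetical helper, not part of the package.

```python
def parse_matrix(lines):
    """Turn 'taxon  states' rows into {taxon: [state, ...]}."""
    matrix = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed rows in this simple sketch
        taxon, states = parts
        matrix[taxon] = list(states)  # one list entry per character
    return matrix


rows = [
    "Harry              00",
    "Simon              01",
    "Betty              10",
    "Louise             11",
]
matrix = parse_matrix(rows)
print(matrix["Simon"])  # ['0', '1']
```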

Or even loop over it like this:

for taxon, characters in n.data:
    print(taxon, characters)

You can also iterate over the sites (rather than the taxa):

for site, data in n.data.characters.items():
    print(site, data)

0 {'Simon': '0', 'Louise': '1', 'Betty': '1', 'Harry': '0'}
1 {'Simon': '1', 'Louise': '1', 'Betty': '0', 'Harry': '0'}

... or you can access the characters matrix directly:

n.data.characters[0]
{'Simon': '0', 'Louise': '1', 'Betty': '1', 'Harry': '0'}

NOTE: sites are zero-indexed!
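
The site-indexed view is a transpose of the taxon-indexed matrix. The sketch below shows one way to build it and why the indices start at zero; by_site is a hypothetical helper for illustration, not the package's implementation of .characters.

```python
def by_site(matrix):
    """Transpose {taxon: [states]} into {site_index: {taxon: state}}."""
    sites = {}
    for taxon, states in matrix.items():
        for index, state in enumerate(states):  # enumerate starts at 0
            sites.setdefault(index, {})[taxon] = state
    return sites


matrix = {
    "Harry": ["0", "0"],
    "Simon": ["0", "1"],
    "Betty": ["1", "0"],
    "Louise": ["1", "1"],
}
characters = by_site(matrix)
print(characters[0])
print(characters[1])
```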

trees block handler

If there's a trees block, then you can do the following.

You can get the number of trees:

n.trees.ntrees
2

You can access the trees via the .trees list:

n.trees.trees[0]
'tree A = ((Harry:0.1,Simon:0.2):0.1,Betty:0.2):Louise:0.1);'

Or loop over them:

for tree in n.trees:
    print(tree)
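
For a sense of what a list of tree strings looks like when pulled out of a trees block, here is a minimal sketch that collects the tree statements from a block's lines. The extract_trees and tree_name helpers are hypothetical and assume one tree statement per line; the package's handler may work differently.

```python
def extract_trees(block_lines):
    """Collect lines that declare a tree, stripped of surrounding whitespace."""
    trees = []
    for line in block_lines:
        stripped = line.strip()
        if stripped.lower().startswith("tree "):
            trees.append(stripped)
    return trees


def tree_name(tree):
    """Return the label between the 'tree' keyword and the '=' sign."""
    return tree.split("=", 1)[0].split()[1]


block = [
    "begin trees;",
    "    tree tree1 = (a,b,c);",
    "    tree tree2 = (a,b,c);",
    "end;",
]
trees = extract_trees(block)
print(len(trees))                    # 2
print([tree_name(t) for t in trees]) # ['tree1', 'tree2']
```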

taxa block handler

Taxa blocks belong to the alternate nexus file format found in programs like SplitsTree:

BEGIN Taxa;
DIMENSIONS ntax=4;
TAXLABELS
[1] 'John'
[2] 'Paul'
[3] 'George'
[4] 'Ringo'
;
END; [Taxa]

In a taxa block you can get the number of taxa and the taxa list:

n.taxa.ntaxa
4
n.taxa.taxa
['John', 'Paul', 'George', 'Ringo']

NOTE: with this alternate nexus format the Characters blocks should be parsed by DataHandler.
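
The TAXLABELS rows in the taxa block shown earlier carry a bracketed index comment and a quoted name. The sketch below shows one way such rows could be reduced to a plain list of names; parse_taxlabels is a hypothetical helper for illustration and assumes one label per line.

```python
import re


def parse_taxlabels(lines):
    """Extract taxon names from TAXLABELS rows such as "[1] 'John'"."""
    taxa = []
    for line in lines:
        line = re.sub(r"\[.*?\]", "", line)  # drop [n] index comments
        label = line.strip().strip("'\"")    # drop surrounding quotes
        if label and label != ";" and label.upper() != "TAXLABELS":
            taxa.append(label)
    return taxa


rows = ["TAXLABELS", "[1] 'John'", "[2] 'Paul'", "[3] 'George'", "[4] 'Ringo'", ";"]
taxa = parse_taxlabels(rows)
print(taxa)       # ['John', 'Paul', 'George', 'Ringo']
print(len(taxa))  # 4
```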

Writing a Nexus File using NexusWriter

NexusWriter provides more fine-grained control over writing nexus files, and is useful if you're programmatically generating a nexus file rather than loading a pre-existing one.

from nexus import NexusWriter
n = NexusWriter()
# Add a comment to appear in the header of the file
n.add_comment("I am a comment")

Data are added using the add method, which takes three arguments: a taxon, a character name, and a value.

n.add('taxon1', 'Character1', 'A')
n.data
{'Character1': {'taxon1': 'A'}}

n.add('taxon2', 'Character1', 'C')
n.add('taxon3', 'Character1', 'A')

Characters and values can be strings or integers:

n.add('taxon1', 2, 1)
n.add('taxon2', 2, 2)
n.add('taxon3', 2, 3)

NexusWriter will interpolate missing entries (i.e. taxon2 in this case)

n.add('taxon1', "Char3", '4')
n.add('taxon3', "Char3", '4')
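
Filling in missing entries can be pictured as follows: every taxon gets one value per character, and a placeholder is used where nothing was added. This is a sketch under stated assumptions, not NexusWriter's code: fill_missing is a hypothetical helper, and the '?' placeholder is assumed here because it is the conventional missing-data symbol in nexus files.

```python
def fill_missing(data, missing="?"):
    """Return {taxon: [value per character]} with gaps filled by `missing`."""
    # Collect every taxon mentioned under any character.
    taxa = sorted({taxon for values in data.values() for taxon in values})
    return {
        taxon: [str(data[char].get(taxon, missing)) for char in data]
        for taxon in taxa
    }


# Same layout as n.data above: {character: {taxon: value}}.
data = {
    "Character1": {"taxon1": "A", "taxon2": "C", "taxon3": "A"},
    2: {"taxon1": 1, "taxon2": 2, "taxon3": 3},
    "Char3": {"taxon1": "4", "taxon3": "4"},
}
rows = fill_missing(data)
print(rows["taxon2"])  # ['C', '2', '?']
```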

... when you're ready, you can generate the nexus using make_nexus or write_to_file:

data = n.make_nexus(interleave=True, charblock=True)
n.write_to_file(filename="output.nex", interleave=True, charblock=True)

... you can make an interleaved nexus by setting interleave to True, and you can include a character block in the nexus (if you have character labels for example) by setting charblock to True.

There is rudimentary support for handling trees e.g.:

n.trees.append("tree tree1 = (a,b,c);")
n.trees.append("tree tree2 = (a,b,c);")

Files for python-nexus, version 2.0.1:

  • python_nexus-2.0.1-py2.py3-none-any.whl (36.6 kB), wheel, Python version py2.py3
  • python-nexus-2.0.1.tar.gz (27.3 kB), source