Takes SeqRecordExpanded objects and creates datasets for phylogenetic software

Dataset-creator

Dataset creator for phylogenetic software

  • Free software: BSD license

Installation

pip install dataset_creator

Usage

The list of SeqRecordExpanded objects should be sorted by gene_code first, then by voucher_code, before it is passed to Dataset.
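A minimal sketch of that sort order, using a stand-in `Record` namedtuple rather than real SeqRecordExpanded objects. It assumes the records expose `gene_code` and `voucher_code` as attributes, matching the constructor keywords used in this README; `Record` itself is hypothetical and not part of either package.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for SeqRecordExpanded; only the two sort keys matter here.
Record = namedtuple("Record", ["gene_code", "voucher_code"])

unsorted_records = [
    Record("wingless", "CP100-11"),
    Record("RpS5", "CP100-11"),
    Record("wingless", "CP100-10"),
    Record("RpS5", "CP100-10"),
]

# Sort by gene_code first, then by voucher_code within each gene.
sorted_records = sorted(unsorted_records, key=attrgetter("gene_code", "voucher_code"))

for record in sorted_records:
    print(record.gene_code, record.voucher_code)
```

The sort is a plain string comparison, so gene codes that differ only in capitalization will not be grouped together.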

>>> from seqrecord_expanded import SeqRecordExpanded
>>> from dataset_creator import Dataset
>>>
>>> # `table` is the Translation Table code based on NCBI
>>> seq_record1 = SeqRecordExpanded('ACTACCTA', reading_frame=2, gene_code='RpS5',
...                                 table=1, voucher_code='CP100-10',
...                                 taxonomy={'genus': 'Aus', 'species': 'bus'})
>>>
>>> seq_record2 = SeqRecordExpanded('ACTACCTA', reading_frame=2, gene_code='RpS5',
...                                 table=1, voucher_code='CP100-11',
...                                 taxonomy={'genus': 'Aus', 'species': 'bus'})
>>>
>>> seq_record3 = SeqRecordExpanded('ACTACCTA', reading_frame=2, gene_code='wingless',
...                                 table=1, voucher_code='CP100-10',
...                                 taxonomy={'genus': 'Aus', 'species': 'bus'})
>>>
>>> seq_record4 = SeqRecordExpanded('ACTACCTA', reading_frame=2, gene_code='wingless',
...                                 table=1, voucher_code='CP100-11',
...                                 taxonomy={'genus': 'Aus', 'species': 'bus'})
>>>
>>> seq_records = [
...    seq_record1, seq_record2, seq_record3, seq_record4,
... ]
>>> # codon positions can be 1st, 2nd, 3rd, 1st-2nd, ALL (default)
>>> dataset = Dataset(seq_records, format='NEXUS', partitioning='by gene',
...                   codon_positions='1st',
...                   )
>>> print(dataset.dataset_str)
"""#NEXUS
blah blah
"""
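
The codon position options in the example above select every third base of a sequence, offset by the reading frame. The following sketch illustrates the idea only; it is not dataset_creator's implementation. It assumes that `reading_frame=n` means the first complete codon starts at the n-th base, and the helpers `codon_position` and `first_and_second` are hypothetical names.

```python
def codon_position(sequence, reading_frame, position):
    """Return the bases at one codon position (1, 2 or 3).

    Assumption: reading_frame is 1-based and marks where the first
    complete codon starts; bases before it are dropped.
    """
    if reading_frame not in (1, 2, 3) or position not in (1, 2, 3):
        raise ValueError("reading_frame and position must be 1, 2 or 3")
    start = (reading_frame - 1) + (position - 1)
    return sequence[start::3]


def first_and_second(sequence, reading_frame):
    """Return the sequence with every 3rd codon position removed."""
    if reading_frame not in (1, 2, 3):
        raise ValueError("reading_frame must be 1, 2 or 3")
    in_frame = sequence[reading_frame - 1:]
    return "".join(base for i, base in enumerate(in_frame) if i % 3 != 2)


seq = "ACTACCTA"
print(codon_position(seq, reading_frame=2, position=1))
print(codon_position(seq, reading_frame=2, position=3))
print(first_and_second(seq, reading_frame=2))
```

A trailing incomplete codon is kept here rather than trimmed; whether the package does the same is not stated in this README.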

Development

To run all the tests, run:

tox

Changelog

0.3.4 (2015-10-02)

  • Fixed a bug where DATATYPE=PROTEIN was missing from Nexus files when the user requested amino acid sequences.

0.3.3 (2015-10-02)

  • Fixed a bug that raised an exception when SeqRecordExpanded objects had no data in the taxonomy field.

0.3.2 (2015-10-01)

  • Fixed a bug that raised an exception when the user requested a dataset partitioned into 1st-2nd and 3rd codon positions of only one codon.

0.3.1 (2015-10-01)

  • Fixed a bug that raised an exception when the user requested a dataset partitioned by codon positions of only one codon.

0.3.0 (2015-10-01)

  • Accepts a voucher code as a string, which is used to generate the outgroup string needed for NEXUS and TNT files.

0.2.0 (2015-09-30)

  • Creates datasets as degenerated sequences using the method by Zwick et al.

0.1.1 (2015-09-30)

  • Raises an error for unspecified reading frames only when they are strictly necessary to build the dataset (that is, when the dataset needs to be divided by codon positions).

  • Added documentation using sphinx-doc

  • Creates datasets as amino acid sequences.

0.1.0 (2015-09-23)

  • Creates datasets in NEXUS, TNT, FASTA, PHYLIP and MEGA formats.

0.0.1 (2015-06-10)

  • First release on PyPI.
