Dataset creator for phylogenetic software

Dataset-Creator - an easy way to create phylogenetic datasets in many formats

Documentation: dataset-creator.readthedocs.org

Takes SeqRecordExpanded objects and creates datasets for phylogenetic software such as MrBayes, TNT, BEAST, RAxML, MEGA, etc.

Features

  • Creates datasets in the following formats: FASTA, GenBankFASTA, NEXUS, TNT, MEGA and Phylip.
  • Can generate datasets of DNA and amino acid sequences.
  • Can generate datasets of degenerated sequences.
  • Can partition datasets by codon positions or by gene.
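
To make partitioning by codon positions concrete, here is a minimal sketch in plain Python. It does not call dataset_creator. The helper name `split_by_codon_position` is hypothetical, and the convention that `reading_frame` gives the codon position of the first base is an assumption made for illustration, not a statement about how the library works internally.

```python
def split_by_codon_position(seq, reading_frame=1):
    """Return the 1st, 2nd and 3rd codon positions of `seq` as three strings.

    Assumption: `reading_frame` is 1, 2 or 3 and states the codon position
    occupied by the first base of `seq`.
    """
    if reading_frame not in (1, 2, 3):
        raise ValueError("reading_frame must be 1, 2 or 3")
    positions = {1: [], 2: [], 3: []}
    for index, base in enumerate(seq):
        # Codon position of this base, cycling 1 -> 2 -> 3 -> 1 ...
        codon_position = (index + reading_frame - 1) % 3 + 1
        positions[codon_position].append(base)
    return tuple("".join(positions[p]) for p in (1, 2, 3))


first, second, third = split_by_codon_position("ACTACCTA", reading_frame=1)
print(first, second, third)
```

A "1st-2nd, 3rd" partitioning would then keep the first two positions together in one partition and the third in another.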

Quick start

First:

pip install dataset_creator

Then build a list of SeqRecordExpanded objects. The list should be sorted first by gene_code and then by voucher_code.

>>> from seqrecord_expanded import SeqRecord
>>> from dataset_creator import Dataset
>>>
>>> # `table` is the Translation Table code based on NCBI
>>> seq_record1 = SeqRecord('ACTACCTA', reading_frame=2, gene_code='RpS5',
...                         table=1, voucher_code='CP100-10',
...                         taxonomy={'genus': 'Aus', 'species': 'bus'})
>>>
>>> seq_record2 = SeqRecord('ACTACCTA', reading_frame=2, gene_code='RpS5',
...                         table=1, voucher_code='CP100-10',
...                         taxonomy={'genus': 'Aus', 'species': 'bus'})
>>>
>>> seq_record3 = SeqRecord('ACTACCTA', reading_frame=2, gene_code='wingless',
...                         table=1, voucher_code='CP100-10',
...                         taxonomy={'genus': 'Aus', 'species': 'bus'})
>>>
>>> seq_record4 = SeqRecord('ACTACCTA', reading_frame=2, gene_code='wingless',
...                         table=1, voucher_code='CP100-10',
...                         taxonomy={'genus': 'Aus', 'species': 'bus'})
>>>
>>> seq_records = [
...    seq_record1, seq_record2, seq_record3, seq_record4,
... ]

>>> # codon positions can be 1st, 2nd, 3rd, 1st-2nd, ALL (default)
>>> dataset = Dataset(seq_records, format='TNT', partitioning='by codon position',
...                   codon_positions='ALL')

>>> dataset = Dataset(seq_records, format='PHYLIP', partitioning='1st-2nd, 3rd',
...                   codon_positions='ALL')

>>> dataset = Dataset(seq_records, format='NEXUS', partitioning='by gene',
...                   codon_positions='1st')

>>> dataset = Dataset(seq_records, format='NEXUS', partitioning='by gene',
...                   codon_positions='ALL', aminoacids=True)

>>> # Produce a dataset of degenerated sequences using the 'S' method:
>>> dataset = Dataset(seq_records, format='NEXUS', partitioning='by gene',
...                   codon_positions='ALL', degenerate='S')

>>> print(dataset.dataset_str)
#NEXUS
blah blah ...
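
The sort order required for the input list (gene_code first, then voucher_code) can be produced with the standard library alone. The sketch below uses a hypothetical `Record` namedtuple as a stand-in so it runs without seqrecord_expanded installed. It assumes the real objects expose `gene_code` and `voucher_code` as attributes, as the constructor keywords suggest.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in exposing only the two attributes used for sorting.
Record = namedtuple("Record", ["gene_code", "voucher_code"])

unsorted_records = [
    Record("wingless", "CP100-11"),
    Record("RpS5", "CP100-11"),
    Record("wingless", "CP100-10"),
    Record("RpS5", "CP100-10"),
]

# Sort by gene_code first, then by voucher_code within each gene.
sorted_records = sorted(
    unsorted_records, key=attrgetter("gene_code", "voucher_code")
)

for record in sorted_records:
    print(record.gene_code, record.voucher_code)
```

The comparison is a plain string comparison, so it is case-sensitive: uppercase gene codes sort before lowercase ones.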

Further documentation can be found at dataset-creator.readthedocs.org

Development

To run all the tests, run:

tox

Changelog

0.3.14 (2016-09-11)

  • Upgraded seqrecord-expanded requirement.

0.3.13 (2016-08-27)

  • Fixed bug that did not replace all whitespace with underscores in taxon names when building datasets. When a taxon name contained whitespace, the NEXUS interpreter treated part of the name as part of the sequence, which made the sequence invalid.
  • Added some dependencies to requirements.
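
As an illustration of the kind of replacement described in this fix, the following sketch collapses every run of whitespace in a taxon name into a single underscore. The function name `sanitize_taxon_name` is hypothetical, and this is not the library's actual implementation.

```python
import re


def sanitize_taxon_name(name):
    """Replace each run of whitespace in `name` with a single underscore."""
    # Strip leading and trailing whitespace first so the result does not
    # start or end with an underscore.
    return re.sub(r"\s+", "_", name.strip())


print(sanitize_taxon_name("Aus bus  CP100-10"))
```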

0.3.11 (2016-06-25)

  • Upgraded seqrecord-expanded requirement.

0.3.10 (2015-12-01)

  • Fixed bug that produced FASTA sequences with underscores. Now all voucher codes will have their dashes replaced by underscores.

0.3.9 (2015-11-06)

  • Create datasets using the GenBankFASTA format. This format has the following extra info in the description of sequences: >Aus_aus_CP100-10 [org=Aus aus] [Specimen-voucher=CP100-10] [note=ArgKin gene, partial cds.] [Lineage=]
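
The description line shown above can be assembled from a record's fields. The sketch below reproduces that one example line. The function name `genbank_fasta_header` and its parameters are hypothetical, and the library may build the line differently.

```python
def genbank_fasta_header(genus, species, voucher_code, gene_code, lineage=""):
    """Build a GenBankFASTA-style description line from individual fields."""
    organism = "{} {}".format(genus, species)
    # The sequence identifier joins the organism name and voucher code with
    # underscores, as in the example line.
    seq_id = "{}_{}".format(organism.replace(" ", "_"), voucher_code)
    template = (
        ">{seq_id} [org={org}] [Specimen-voucher={voucher}] "
        "[note={gene} gene, partial cds.] [Lineage={lineage}]"
    )
    return template.format(
        seq_id=seq_id,
        org=organism,
        voucher=voucher_code,
        gene=gene_code,
        lineage=lineage,
    )


print(genbank_fasta_header("Aus", "aus", "CP100-10", "ArgKin"))
```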

0.3.8 (2015-10-30)

  • Fixed creation of amino acid sequence datasets for the MEGA format.
  • Fixed creation of degenerated sequence datasets for the MEGA format.
  • Fixed creation of degenerated sequence datasets for the TNT format.
  • Fixed creation of amino acid sequence datasets with a specified outgroup for the TNT format.
  • Raises ValueError when asked to degenerate sequences for a dataset partitioned by codon positions.
  • Dataset creator returns warnings if translated sequences have stop codons ‘*’.
  • Cannot generate MEGA datasets with partitioning.

0.3.7 (2015-10-30)

  • Fixed 2nd, 3rd codon positions bug that returned empty FASTA datasets.

0.3.6 (2015-10-30)

  • Fixed 3rd codon positions bug that returned FASTA datasets with 3rd codon positions even if they were not needed.

0.3.5 (2015-10-29)

  • If user provides outgroup, then TNT datasets will place its sequences in first position in the dataset blocks.

0.3.4 (2015-10-02)

  • Fixed bug that did not show DATATYPE=PROTEIN in NEXUS files when the user requested amino acid sequences.

0.3.3 (2015-10-02)

  • Fixed bug that raised an exception when SeqRecordExpanded objects did not have data in the taxonomy field.

0.3.2 (2015-10-01)

  • Fixed bug that raised an exception when user wanted partitioned dataset as 1st-2nd and 3rd codon positions of only one codon.

0.3.1 (2015-10-01)

  • Fixed bug that raised an exception when user wanted partitioned dataset by codon positions of only one codon.

0.3.0 (2015-10-01)

  • Accepts voucher code as string that will be used to generate the outgroup string needed for NEXUS and TNT files.

0.2.0 (2015-09-30)

  • Creates datasets as degenerated sequences using the method by Zwick et al.

0.1.1 (2015-09-30)

  • Issues errors for missing reading frames only when they are strictly necessary to build the dataset (that is, when the dataset needs to be divided by codon positions).
  • Added documentation using sphinx-doc.
  • Creates datasets as amino acid sequences.

0.1.0 (2015-09-23)

  • Creates NEXUS, TNT, FASTA, Phylip and MEGA dataset formats.

0.0.1 (2015-06-10)

  • First release on PyPI.

Download Files

  • dataset_creator-0.3.14-py2.py3-none-any.whl (20.9 kB); version: 3.4; file type: Wheel; uploaded: Sep 11, 2016
  • dataset-creator-0.3.14.tar.gz (133.3 kB); file type: Source; uploaded: Sep 11, 2016
