
Build and visualize phylogenetic trees from FASTA files using parsimony and ML-style methods.


🌿 SimplePhylo

A phylogenetic tree builder for DNA sequence analysis using parsimony and maximum likelihood methods.

Why it matters:

  • 🔍 Parsimony finds the tree that requires the fewest evolutionary changes. It is quick on small datasets and perfect for classroom demos or sketching out relationships in agricultural genetics.

  • 🧮 Maximum Likelihood applies explicit statistical models of sequence evolution to infer the tree that best explains your data. It is essential for robust analyses in real-world forensics or crop-breeding studies.

  • ⚖️ Compare both to gauge confidence, uncover hidden rate variation, and get a fuller picture of evolutionary history.

    Whether you're teaching high school biology or delving into forensic DNA casework, SimplePhylo empowers you to explore and explain phylogenies with clarity and rigor.
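To make "fewest evolutionary changes" concrete, here is a small, dependency-free sketch of Fitch's small-parsimony count: for one fixed rooted binary tree, it tallies the minimum number of substitutions each alignment column needs, and a parsimony search then prefers whichever topology scores lowest. The nested-tuple tree format, the function names, and the toy sequences are illustrative assumptions, not SimplePhylo's own code.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, minimum changes) for one alignment column."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_cost = fitch_site(tree[0], column)
    right_states, right_cost = fitch_site(tree[1], column)
    shared = left_states & right_states
    if shared:
        # Children can agree on a state: no extra substitution at this node.
        return shared, left_cost + right_cost
    # No common state: keep both options and charge one substitution.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total Fitch cost over all columns of equal-length (aligned) sequences."""
    length = len(next(iter(seqs.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total


toy = {"A": "AAAA", "B": "AAAC", "C": "CCCA", "D": "CCCC"}
print(parsimony_score((("A", "B"), ("C", "D")), toy))  # 5
print(parsimony_score((("A", "C"), ("B", "D")), toy))  # 7
```

The tree pairing A with B needs 5 changes versus 7 for the one pairing A with C, so parsimony would favor ((A,B),(C,D)).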


📋 Table of Contents

  1. 🧐 Overview
  2. 🚀 Quick Start
  3. 📁 Project Structure
  4. 📦 Dependencies
  5. ⚙️ MUSCLE Alignment Notes
  6. 🧪 Example Workflow
  7. 📅 Future Plans
  8. 👩‍💻 Maintainer
  9. 📄 License
  10. 🧾 Citations & Attributions

🧐 Overview

SimplePhylo (a.k.a. Evolutionary Tree Analyzer) is a cute, flash-fast Python package and Dash web app for building phylogenetic trees from FASTA sequences.

It enables you to:

  • Parse DNA sequences in FASTA format.
  • Align them using MUSCLE v3.8.31.
  • Build two phylogenetic trees: a Parsimony tree (built with UPGMA, which is a distance-based clustering method) and an ML-style tree (distance-based, using sequence identity).
  • Visualize and export tree images as .png for teaching slides, lab reports, or research.
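The Overview names UPGMA and identity-based distances. As a rough illustration of that idea, the sketch below scores each pair of aligned sequences by the fraction of positions that differ, then repeatedly joins the closest clusters with UPGMA (size-weighted average linkage) and returns a Newick string. It is a hypothetical, dependency-free stand-in: the project lists Biopython for tree building, so the real build_tree.py may define distances, break ties, and format output differently.

```python
from itertools import combinations
from typing import Dict, Tuple


def identity_distance(a: str, b: str) -> float:
    """Proportion of aligned positions that differ (1 minus the identity fraction)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_newick(seqs: Dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA and return a Newick string."""
    # cluster label -> (Newick text, node height, number of leaves)
    clusters: Dict[str, Tuple[str, float, int]] = {name: (name, 0.0, 1) for name in seqs}
    dist = {
        frozenset(pair): identity_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((a, b))] / 2
        text_a, h_a, n_a = clusters.pop(a)
        text_b, h_b, n_b = clusters.pop(b)
        merged_label = f"{a}+{b}"
        # Size-weighted average distance from the new cluster to every other cluster.
        for other in clusters:
            dist[frozenset((merged_label, other))] = (
                n_a * dist[frozenset((a, other))] + n_b * dist[frozenset((b, other))]
            ) / (n_a + n_b)
        # Branch length = parent height minus child height.
        text = f"({text_a}:{height - h_a:.4f},{text_b}:{height - h_b:.4f})"
        clusters[merged_label] = (text, height, n_a + n_b)
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


aligned = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACCTAT",
}
print(upgma_newick(aligned))
# (mouse:0.1250,(human:0.0500,chimp:0.0500):0.0750);
```

UPGMA assumes a roughly constant rate of change (a molecular clock), which is why every tip in the result sits the same distance (0.125) from the root.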

Whether you're running a quick classroom demo or prototyping a research pipeline, SimplePhylo keeps everything modular and accessible.


🚀 Quick Start

1. Clone the repo

git clone https://github.com/YourUser/evolutionary-tree-analyzer.git  
cd evolutionary-tree-analyzer  

2. Install dependencies

(pick one)

# From PyPI (stable release)
pip install evolutionary-tree-analyzer  

# From GitHub (editable/dev mode)
pip install -r requirements.txt  
pip install -e .

3. Launch the Dash App

python main.py

4. (Totally optional) Run the notebook

jupyter notebook notebooks/tree_builder.ipynb

❓ Questions? Drop me a line at biology.mae@gmail.com


📁 Project Structure

evolutionary-tree-analyzer/
│
├── .github/                 # CI workflows and configuration
│   └── workflows/           # GitHub Actions for testing, packaging
│       └── ci.yml           # Continuous integration pipeline
│
├── assets/                  # Pipeline Bio logos (sunflower + tree)
│
├── bin/                     # MUSCLE binary (v3.8.31) with +x permission
│
├── data/                    # Example FASTA inputs
│   ├── vertebrate_test.fa   # Small demo FASTA with vertebrate mitochondrial sequences
│   └── example_small.fa     # Small FASTA sample for testing
│
├── notebooks/               # Jupyter workflow (tree_builder.ipynb)
│
├── output/                  # Generated alignments and tree images
│   └── tree_images/         # Parsimony & ML PNG outputs
│
├── src/                     # Core Python library modules
│   ├── fasta_parser.py      # Parse FASTA → SeqIO records
│   ├── align_sequences.py   # Run MUSCLE alignment
│   ├── build_tree.py        # Build parsimony & ML-style trees
│   └── visualize_tree.py    # Render trees to PNG via Biopython Phylo
│
├── tests/                   # pytest test suite for modules
│   ├── test_align_sequences.py
│   ├── test_fasta_parser.py
│   ├── test_build_tree.py
│   └── test_visualize_tree.py
│
├── handle_upload.py         # Utility for file ingestion
├── main.py                  # Dash web-app entrypoint
├── render.yaml              # Deployment config for Render.com
├── requirements.txt         # Runtime dependencies
├── setup.py                 # Packaging metadata for PyPI
├── CHANGELOG.md             # Release notes
├── LICENSE                  # MIT License
└── README.md                # Project overview and usage
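src/fasta_parser.py is described as turning FASTA into SeqIO records; with Biopython that step is typically `SeqIO.parse(path, "fasta")`. If you want to show students what parsing involves under the hood, here is a minimal, dependency-free reader. The name read_fasta is hypothetical, it keeps only the first word of each header as the ID, and it does far less validation than a real parser.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its upper-cased sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            words = line[1:].split()
            if not words:
                raise ValueError("FASTA header without an ID")
            name, chunks = words[0], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records  # note: a repeated ID overwrites the earlier record


demo = io.StringIO(">seq1 human mitochondrion\nACGT\nacgt\n\n>seq2\nTTGA\n")
print(read_fasta(demo))
# {'seq1': 'ACGTACGT', 'seq2': 'TTGA'}
```

On a real file, open it and pass the handle, e.g. `with open("data/example_small.fa") as fh: seqs = read_fasta(fh)`.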


📦 Dependencies

Python ≥ 3.8 and the following PyPI packages:

  • dash & dash-bootstrap-components – build the interactive web UI
  • biopython – FASTA parsing, tree building & Phylo rendering
  • matplotlib – save publication-quality tree images
  • scipy – compute distance matrices for ML-style trees
  • click – simple command-line interface (CLI)

Standard library modules:

  • subprocess, os, pathlib – invoke & locate external tools
  • importlib.util – dynamic module loading in notebooks
  • logging – configurable console output

⚙️ MUSCLE Alignment Notes

This project uses MUSCLE v3.8.31 to align DNA sequences.

  • For small files, alignment runs automatically in Python
  • For large files, use the batch script: align_manual.bat

You must have MUSCLE installed and accessible from your system's PATH for alignment.

💡 Ensure the muscle.exe binary is placed at C:\Program Files\muscle\muscle.exe, or edit the .bat file to reflect the correct path.
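For the PATH requirement above, here is a hedged sketch of how Python can locate and call MUSCLE v3.8.31: look on PATH first, then fall back to the bundled bin/ folder and the Windows location mentioned above. MUSCLE 3.8 takes `-in` and `-out`; MUSCLE 5 changed its command line, so check your version. The binary name bin/muscle, the fallback order, and the function names are assumptions, not necessarily what align_sequences.py does.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Fallback locations (assumptions): the repo's bin/ folder and the Windows path from this README.
FALLBACKS = [Path("bin") / "muscle", Path(r"C:\Program Files\muscle\muscle.exe")]


def find_muscle() -> str:
    """Return a MUSCLE executable: PATH first, then the fallback locations."""
    on_path = shutil.which("muscle")
    if on_path:
        return on_path
    for candidate in FALLBACKS:
        if candidate.is_file():
            return str(candidate)
    raise FileNotFoundError("MUSCLE not found on PATH or in the fallback locations")


def muscle_command(exe: str, in_fasta: Path, out_fasta: Path) -> List[str]:
    """Build a MUSCLE v3.8 command line (MUSCLE 5 uses different options)."""
    return [exe, "-in", str(in_fasta), "-out", str(out_fasta)]


def align_fasta(in_fasta: Path, out_fasta: Path, exe: Optional[str] = None) -> Path:
    """Run MUSCLE; raises CalledProcessError if it exits with a non-zero status."""
    cmd = muscle_command(exe or find_muscle(), in_fasta, out_fasta)
    subprocess.run(cmd, check=True, capture_output=True, text=True)
    return out_fasta
```

Usage might look like `align_fasta(Path("data/example_small.fa"), Path("output/example_small_aligned.fa"))`, where the output filename is only an example.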

MUSCLE citation: Edgar, R.C. (2004) Nucleic Acids Res 32(5):1792–1797. http://www.drive5.com/muscle

✨ Please cite MUSCLE if you use the alignment functionality in your research or publications.

🧠 Tips for Large Input Files

  • Start small: debug your pipeline on 5–10 sequences before scaling up.
  • Split & conquer: break a multi-FASTA file into chunks (seqkit split2 or bash loops).
  • Memory guardrails: MUSCLE's memory use can spike on large alignments, so keep input FASTA files under 50 MB.
  • Use the batch script (align_manual.bat or your own shell wrapper) for more than 1,000 sequences.
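If seqkit isn't available, the "split & conquer" tip can be done in plain Python. This sketch copies every per_file records into its own numbered file so each chunk can be aligned separately; the function name, the default chunk size, and the chunk_001.fa naming are hypothetical choices.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, TextIO


def split_fasta(src: Path, out_dir: Path, per_file: int = 500) -> List[Path]:
    """Copy consecutive groups of `per_file` records into chunk_001.fa, chunk_002.fa, ..."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    current: Optional[TextIO] = None
    n_records = 0
    try:
        with src.open() as fh:
            for line in fh:
                if line.startswith(">"):
                    # Every `per_file` headers, close the old chunk and open a new one.
                    if n_records % per_file == 0:
                        if current is not None:
                            current.close()
                        path = out_dir / f"chunk_{len(written) + 1:03d}.fa"
                        current = path.open("w")
                        written.append(path)
                    n_records += 1
                if current is not None:  # text before the first header is skipped
                    current.write(line)
    finally:
        if current is not None:
            current.close()
    return written


# Demo in a throwaway directory: 5 records, 2 per file -> 3 chunks.
demo_dir = Path(tempfile.mkdtemp())
demo_src = demo_dir / "demo.fa"
demo_src.write_text("".join(f">s{i}\nACGT\n" for i in range(5)))
parts = split_fasta(demo_src, demo_dir / "chunks", per_file=2)
print([p.name for p in parts])
# ['chunk_001.fa', 'chunk_002.fa', 'chunk_003.fa']
```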

🧪 Example Workflow

  1. Drop your .fasta file into the data/ folder.
  2. Launch the app or notebook.
  3. You'll get:
     • A multiple sequence alignment (FASTA)
     • Two phylogenetic trees (Parsimony & ML-style)
     • PNG files saved in output/tree_images/

Perfect for:

  • 🧬 Biology class demonstrations
  • 🧪 Research prototyping
  • 📚 Curriculum development
  • 💡 Student-led investigations


📅 Future Plans

  • Add support for bootstrap analysis
  • Export PDF/HTML reports
  • Add tree comparison metrics (e.g., Robinson–Foulds (RF) distance)

Want to see SimplePhylo improved? I welcome all feedback and can't wait to hear from you!

👩‍💻 Maintainer

Pipeline Bio – Simple Bioinformatics for Educators
🌻 Sunflower logo with a phylogenetic tree
📫 Contact: biology.mae@gmail.com
🛒 Teachers Pay Teachers Store: Pipeline Bio
📌 #teacherspayteachers


📄 License

This project is licensed under the MIT License. See the LICENSE file for more details.


🧾 Citations & Attributions

This project uses the MUSCLE alignment tool developed by Robert C. Edgar.

Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5):1792–1797.
http://www.drive5.com/muscle

Please cite this work if you use the alignment functionality in your research or publications.


🧬 Happy tree building! © 2025 Mae Warner (Pipeline Bio). All rights reserved.



Download files


Source Distribution

evolutionary_tree_analyzer-1.1.0.tar.gz (9.1 kB)

Uploaded Source

Built Distribution


evolutionary_tree_analyzer-1.1.0-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file evolutionary_tree_analyzer-1.1.0.tar.gz.


File hashes

Hashes for evolutionary_tree_analyzer-1.1.0.tar.gz
Algorithm Hash digest
SHA256 1ac145ee9197986d4485a8cada21cf35f90a9ba7174dc84cbbaa5106dc33000d
MD5 e33c8f60406b0a32085257103cceff06
BLAKE2b-256 0b0d74f85d58b5dac72c4c450a390064561c7cb3b8d5f9546778642ed20bbda5


File details

Details for the file evolutionary_tree_analyzer-1.1.0-py3-none-any.whl.


File hashes

Hashes for evolutionary_tree_analyzer-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f8c7bf7818fe69d8c83f1fef50de08d7e906d07aad6706a3c59f79921d0235c6
MD5 028ca2d31e9f9f83a4eaac2a8dbaeaf2
BLAKE2b-256 a322a81cb5a5839b0b70cad55e3f08d99a096479d13fa4cfa08863a49e20ede9

