
Build and visualize phylogenetic trees from FASTA files using parsimony and ML-style methods.

🌿 SimplePhylo

A phylogenetic tree builder for DNA sequence analysis using parsimony and maximum likelihood methods.

Why it matters:

  • 🔍 Parsimony finds the simplest tree, the one with the fewest evolutionary changes. It is lightning-fast and well suited to classroom demos or sketching out relationships in agricultural genetics.

  • 🧮 Maximum Likelihood applies explicit statistical models of sequence evolution to infer the tree that best explains your data, which is essential for robust analyses in real-world forensics or crop-breeding studies.

  • ⚖️ Compare both to gauge confidence, uncover hidden rate variation, and get a fuller picture of evolutionary history.

    Whether you're teaching high school biology or delving into forensic DNA casework, SimplePhylo empowers you to explore and explain phylogenies with clarity and rigor.
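
To make the "fewest evolutionary changes" idea concrete, here is a minimal sketch of Fitch's small-parsimony count for a single alignment column on a fixed tree. It is an illustration only, not SimplePhylo's own code; the nested-tuple tree encoding, the function name `fitch_score`, and the species labels are assumptions made for this example.

```python
def fitch_score(tree, states):
    """Return (state_set, changes) for one alignment column on a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states : dict mapping each leaf name to its nucleotide at this column
    """
    if isinstance(tree, str):
        # A leaf contributes its observed state and no changes.
        return {states[tree]}, 0
    left_set, left_changes = fitch_score(tree[0], states)
    right_set, right_changes = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # The children disagree: take the union and count one substitution.
    return left_set | right_set, left_changes + right_changes + 1


column = {"human": "A", "chimp": "A", "mouse": "G", "rat": "G"}
tree_a = (("human", "chimp"), ("mouse", "rat"))
tree_b = (("human", "mouse"), ("chimp", "rat"))

_, changes_a = fitch_score(tree_a, column)
_, changes_b = fitch_score(tree_b, column)
print(changes_a, changes_b)  # prints: 1 2
```

Under parsimony, `tree_a` is preferred for this column because it explains the data with one substitution instead of two. A full parsimony search sums this count over every column and compares many candidate trees.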


📋 Table of Contents

  1. Overview
  2. 🚀 Quick Start
  3. 📁 Project Structure
  4. 📦 Dependencies
  5. ⚙️ MUSCLE Alignment Notes
  6. 🧪 Example Workflow
  7. 📅 Future Plans
  8. 🧰 Maintainer
  9. 📄 License
  10. 📜 Citations & Attributions

🧐 Overview

SimplePhylo (a.k.a. Evolutionary Tree Analyzer) is a cute, flash-fast Python package and Dash web app for building phylogenetic trees from FASTA sequences.

It enables you to:

  • Parse DNA sequences in FASTA format.
  • Align them using MUSCLE v3.8.31.
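  • Build phylogenetic trees two ways: a tree labeled Parsimony, built with UPGMA (strictly a distance-based clustering method rather than a character-based parsimony search), and an ML-style tree (distance-based on sequence identity).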
  • Visualize and export tree images as .png for teaching slides, lab reports, or research.

Whether you're running a quick classroom demo or prototyping a research pipeline, SimplePhylo keeps everything modular and accessible.
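
The tree-building step listed above mentions UPGMA and identity-based distances. As a rough sketch of what those two ideas mean, the pure-Python example below computes the fraction of mismatched positions between aligned sequences and clusters them with UPGMA into a Newick topology. It is not the package's implementation (the dependency list suggests that relies on Biopython); the function names `identity_distance` and `upgma` and the toy sequences are hypothetical.

```python
from itertools import combinations


def identity_distance(a, b):
    """Fraction of aligned positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def upgma(aligned):
    """Cluster aligned sequences with UPGMA and return a Newick topology string.

    aligned: dict mapping sequence name -> aligned sequence.
    """
    # Each cluster is keyed by its Newick label and stores its leaf count.
    sizes = {name: 1 for name in aligned}
    dist = {
        frozenset((p, q)): identity_distance(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Merge the closest pair of clusters (ties broken by label for determinism).
        pair = min(dist, key=lambda k: (dist[k], sorted(k)))
        p, q = sorted(pair)
        merged = f"({p},{q})"
        new_size = sizes[p] + sizes[q]
        for other in sizes:
            if other in (p, q):
                continue
            # Size-weighted average keeps every original leaf equally weighted.
            d = (
                sizes[p] * dist[frozenset((p, other))]
                + sizes[q] * dist[frozenset((q, other))]
            ) / new_size
            dist[frozenset((merged, other))] = d
        # Drop every distance that involves the two clusters just merged.
        dist = {k: v for k, v in dist.items() if p not in k and q not in k}
        del sizes[p], sizes[q]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"


demo = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACCTTT",
    "rat": "ACGAACCTGT",
}
print(upgma(demo))  # prints: ((chimp,human),(mouse,rat));
```

The sketch returns topology only; branch lengths are omitted to keep it short. UPGMA also assumes a constant rate of change across lineages, which is one reason to compare its output against other methods.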


🚀 Quick Start

1. Clone the repo

git clone https://github.com/YourUser/evolutionary-tree-analyzer.git  
cd evolutionary-tree-analyzer  

2. Install dependencies

(pick one)

# From PyPI (stable release)
pip install simplephylo  

# From GitHub (editable/dev mode)
pip install -r requirements.txt  
pip install -e .

3. Launch the Dash App

python main.py

4. (Totally optional) Run the notebook

jupyter notebook notebooks/tree_builder.ipynb

โ“ Questions? Drop me a line at biology.mae@gmail.com


๐Ÿ“ Project Structure

evolutionary-tree-analyzer/
│
├── .github/                 # CI workflows and configuration
│   └── workflows/           # GitHub Actions for testing, packaging
│       └── ci.yml           # Continuous integration pipeline
│
├── assets/                  # Pipeline Bio logos (sunflower + tree)
│
├── bin/                     # MUSCLE binary (v3.8.31) with +x permission
│
├── data/                    # Example FASTA inputs
│   ├── vertebrate_test.fa   # Small demo FASTA with vertebrate mitochondrion seqs
│   └── example_small.fa     # Small FASTA sample for testing
│
├── notebooks/               # Jupyter workflow (tree_builder.ipynb)
│
├── output/                  # Generated alignments and tree images
│   └── tree_images/         # Parsimony & ML PNG outputs
│
├── src/                     # Core Python library modules
│   ├── fasta_parser.py      # Parse FASTA → SeqIO records
│   ├── align_sequences.py   # Run MUSCLE alignment
│   ├── build_tree.py        # Build parsimony & ML-style trees
│   └── visualize_tree.py    # Render trees to PNG via Biopython Phylo
│
├── tests/                   # pytest test suite for modules
│   ├── test_align_sequences.py
│   ├── test_fasta_parser.py
│   ├── test_build_tree.py
│   └── test_visualize_tree.py
│
├── handle_upload.py         # Utility for file ingestion
├── main.py                  # Dash web-app entrypoint
├── render.yaml              # Deployment config for Render.com
├── requirements.txt         # Runtime dependencies
├── setup.py                 # Packaging metadata for PyPI
├── CHANGELOG.md             # Release notes
├── LICENSE                  # MIT License
└── README.md                # Project overview and usage


📦 Dependencies

Python ≥3.8 and the following PyPI packages:

  • dash & dash-bootstrap-components – build the interactive web UI
  • biopython – FASTA parsing, tree building & Phylo rendering
  • matplotlib – save publication-quality tree images
  • scipy – compute distance matrices for ML-style trees
  • click – simple command-line interface (CLI)

Standard library:

  • subprocess, os, pathlib – invoke & locate external tools
  • importlib.util – dynamic module loading in notebooks
  • logging – configurable console output

⚙️ MUSCLE Alignment Notes

This project uses MUSCLE v3.8.31 to align DNA sequences.

  • For small files, alignment runs automatically in Python
  • For large files, use the batch script: align_manual.bat

You must have MUSCLE installed and accessible from your system's PATH for alignment.

💡 On Windows, place the muscle.exe binary at C:\Program Files\muscle\muscle.exe, or edit the .bat file to reflect the correct path.
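
For readers who want to see what invoking MUSCLE from Python can look like, here is a hedged sketch. The `-in` and `-out` flags belong to the MUSCLE 3.x command line (MUSCLE 5 changed its interface), and the helper names `muscle_command` and `run_muscle`, as well as the output path `output/aligned.fa`, are assumptions for illustration rather than SimplePhylo's actual API.

```python
import shutil
import subprocess
from pathlib import Path


def muscle_command(input_fasta, output_fasta, muscle_path="muscle"):
    """Build the argument list for a MUSCLE v3.x alignment run."""
    return [str(muscle_path), "-in", str(input_fasta), "-out", str(output_fasta)]


def run_muscle(input_fasta, output_fasta, muscle_path=None):
    """Align input_fasta with MUSCLE and return the path of the aligned FASTA."""
    # Fall back to whatever "muscle" resolves to on PATH when no path is given.
    exe = muscle_path or shutil.which("muscle")
    if exe is None:
        raise FileNotFoundError("MUSCLE not found; install it or pass muscle_path")
    Path(output_fasta).parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if MUSCLE exits with a non-zero status.
    subprocess.run(muscle_command(input_fasta, output_fasta, exe), check=True)
    return Path(output_fasta)


# Only builds and prints the command; nothing is executed here.
print(muscle_command("data/example_small.fa", "output/aligned.fa"))
```

Passing the command as a list instead of a single string avoids shell quoting problems with paths that contain spaces, such as `C:\Program Files\muscle\muscle.exe`.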

MUSCLE citation: Edgar, R.C. (2004) Nucleic Acids Res 32(5):1792–1797. http://www.drive5.com/muscle

✨ Please cite this work if you use the alignment functionality in your research or publications.

🧠 Tips for Large Input Files

  • Work smaller first: debug your pipeline on 5–10 sequences before scaling up.
  • Split & conquer: break multi-FASTA into chunks (seqkit split2 or bash loops).
  • Memory guardrails: MUSCLE can spike RAM on huge alignments, so keep input FASTA under 50 MB.
  • Use the batch script (align_manual.bat or your own shell wrapper) for more than 1000 sequences.
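
If you would rather split a multi-FASTA file in Python than with seqkit or a shell loop, the sketch below shows one way to do it using only the standard library. The helper names `read_fasta` and `split_records` and the inline demo records are hypothetical and not part of SimplePhylo.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may wrap over several lines; collect all of them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, chunk_size):
    """Group records into lists of at most chunk_size entries."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch


demo = """>seq1
ACGT
ACGA
>seq2
TTGA
>seq3
CCGA
""".splitlines()

chunks = list(split_records(read_fasta(demo), chunk_size=2))
print([len(c) for c in chunks])  # prints: [2, 1]
```

Both helpers are generators, so a real file can be streamed with `read_fasta(open(path))` without loading every sequence into memory at once. Note that chunks aligned separately are not directly comparable; splitting is mainly useful for debugging on a subset.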

🧪 Example Workflow

  1. Drop your .fasta file into the data/ folder
  2. Launch the app or notebook
  3. You'll get:
     • A multiple sequence alignment (FASTA)
     • Two phylogenetic trees (Parsimony & ML-style)
     • PNG files saved in output/tree_images/

Perfect for:

  • 🧬 Biology class demonstrations
  • 🧪 Research prototyping
  • 📚 Curriculum development
  • 💡 Student-led investigations


🧰 Future Plans

  • Add support for bootstrap analysis
  • Export PDF/HTML reports
  • Add tree comparison metrics (e.g., RF distance)

Want to see SimplePhylo improved? I welcome all feedback and can't wait to hear from you!

👩‍💻 Maintainer

Pipeline Bio – Simple Bioinformatics for Educators
🌻 Sunflower logo with a phylogenetic tree
📫 Contact: biology.mae@gmail.com
🛒 Teachers Pay Teachers Store: Pipeline Bio
📌 #teacherspayteachers


📄 License

This project is licensed under the MIT License. See the LICENSE file for more details.


🧾 Citations & Attributions

This project uses the MUSCLE alignment tool developed by Robert C. Edgar.

Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5):1792–1797.
http://www.drive5.com/muscle

Please cite this work if you use the alignment functionality in your research or publications.


🧬 Happy tree building! © 2025 Mae Warner (Pipeline Bio). All rights reserved.
