Build and visualize phylogenetic trees from FASTA files using parsimony and ML-style methods.
SimplePhylo
A phylogenetic tree builder for DNA sequence analysis using parsimony and maximum likelihood methods.
Why it matters:
- Parsimony finds the simplest tree with the fewest evolutionary changes. It is lightning-fast and perfect for classroom demos or sketching out relationships in agricultural genetics.
- Maximum Likelihood applies explicit statistical models of sequence evolution to infer the tree that best explains your data, which is essential for robust analyses in real-world forensics or crop-breeding studies.
- Compare both to gauge confidence, uncover hidden rate variation, and get a fuller picture of evolutionary history.
Whether you're teaching high school biology or delving into forensic DNA casework, SimplePhylo empowers you to explore and explain phylogenies with clarity and rigor.
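To make "fewest evolutionary changes" concrete, here is a minimal sketch of Fitch's counting procedure, a standard way to score a fixed tree under parsimony. It is illustrative only and is not taken from SimplePhylo's source: the taxa, sequences, tree shapes, and the function names `fitch_site` and `parsimony_score` are all made up for this example.

```python
# Sketch of Fitch's small-parsimony count on a fixed, rooted binary tree.
# All names and data here are hypothetical and exist only for illustration.

def fitch_site(tree, states):
    """Return (state_set, changes) for one alignment column.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping leaf name -> character at this column
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: keep the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


alignment = {
    "human": "ACGT",
    "chimp": "ACGT",
    "mouse": "ACTT",
    "fish": "GCTA",
}
tree_a = ((("human", "chimp"), "mouse"), "fish")
tree_b = ((("human", "mouse"), "chimp"), "fish")
print(parsimony_score(tree_a, alignment), parsimony_score(tree_b, alignment))
```

On this toy data the first grouping needs fewer changes than the second, so a parsimony criterion would prefer it.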
Table of Contents
- Overview
- Quick Start
- Project Structure
- Dependencies
- MUSCLE Alignment Notes
- Example Workflow
- Future Plans
- Maintainer
- License
- Citations & Attributions
Overview
SimplePhylo (a.k.a. Evolutionary Tree Analyzer) is a cute, flash-fast Python package and Dash web app for building phylogenetic trees from FASTA sequences.
It enables you to:
- Parse DNA sequences in FASTA format.
- Align them using MUSCLE v3.8.31.
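- Build two phylogenetic trees: a "Parsimony" tree (constructed with UPGMA) and an "ML-style" tree (distance-based on sequence identity). Both are distance-based approximations rather than a full parsimony search or likelihood optimization.
- Visualize and export tree images as .png for teaching slides, lab reports, or research.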
Whether you're running a quick classroom demo or prototyping a research pipeline, SimplePhylo keeps everything modular and accessible.
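For readers who want to see what a UPGMA tree built from identity distances involves, the sketch below does it in plain Python with no third-party packages. It is an assumption-laden illustration, not the code in src/build_tree.py (which is not shown here): the function names `identity_distance` and `upgma`, the toy alignment, and the choice to return only a Newick topology without branch lengths are all hypothetical.

```python
from itertools import combinations


def identity_distance(a, b):
    """Fraction of aligned columns where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(alignment):
    """Cluster aligned sequences with UPGMA; return a Newick topology string."""
    # Each cluster is keyed by its Newick label and stores its leaf count.
    sizes = {name: 1 for name in alignment}
    dist = {
        frozenset((a, b)): identity_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
    while len(sizes) > 1:
        # Join the closest pair of clusters (ties broken by label order).
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted average, so every leaf counts equally.
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / new_size
        # Drop every distance that involves the two clusters just merged.
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"


toy_alignment = {
    "human": "ACGT",
    "chimp": "ACGT",
    "mouse": "ACTT",
    "fish": "GCTA",
}
print(upgma(toy_alignment))
```

The input must already be aligned, which is why the alignment step comes before tree building in the list above.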
Quick Start
1. Clone the repo
git clone https://github.com/YourUser/evolutionary-tree-analyzer.git
cd evolutionary-tree-analyzer
2. Install dependencies
(pick one)
# From PyPI (stable release)
pip install simplephylo
# From GitHub (editable/dev mode)
pip install -r requirements.txt
pip install -e .
3. Launch the Dash App
python main.py
4. (Totally optional) Run the notebook
jupyter notebook notebooks/tree_builder.ipynb
Questions? Drop me a line at biology.mae@gmail.com
Project Structure
evolutionary-tree-analyzer/
│
├── .github/                    # CI workflows and configuration
│   └── workflows/              # GitHub Actions for testing, packaging
│       └── ci.yml              # Continuous integration pipeline
│
├── assets/                     # Pipeline Bio logos (sunflower + tree)
│
├── bin/                        # MUSCLE binary (v3.8.31) with +x permission
│
├── data/                       # Example FASTA inputs
│   ├── vertebrate_test.fa      # Small demo FASTA with vertebrate mitochondrion seqs
│   └── example_small.fa        # Small FASTA sample for testing
│
├── notebooks/                  # Jupyter workflow (tree_builder.ipynb)
│
├── output/                     # Generated alignments and tree images
│   └── tree_images/            # Parsimony & ML PNG outputs
│
├── src/                        # Core Python library modules
│   ├── fasta_parser.py         # Parse FASTA → SeqIO records
│   ├── align_sequences.py      # Run MUSCLE alignment
│   ├── build_tree.py           # Build parsimony & ML-style trees
│   └── visualize_tree.py       # Render trees to PNG via Biopython Phylo
│
├── tests/                      # pytest test suite for modules
│   ├── test_align_sequences.py
│   ├── test_fasta_parser.py
│   ├── test_build_tree.py
│   └── test_visualize_tree.py
│
├── handle_upload.py            # Utility for file ingestion
├── main.py                     # Dash web-app entrypoint
├── render.yaml                 # Deployment config for Render.com
├── requirements.txt            # Runtime dependencies
├── setup.py                    # Packaging metadata for PyPI
├── CHANGELOG.md                # Release notes
├── LICENSE                     # MIT License
└── README.md                   # Project overview and usage
Dependencies
Python ≥ 3.8 and the following PyPI packages:
- dash & dash-bootstrap-components: build the interactive web UI
- biopython: FASTA parsing, tree building & Phylo rendering
- matplotlib: save publication-quality tree images
- scipy: compute distance matrices for ML-style trees
- click: simple command-line interface (CLI)
Standard library:
- subprocess, os, pathlib: invoke & locate external tools
- importlib.util: dynamic module loading in notebooks
- logging: configurable console output
MUSCLE Alignment Notes
This project uses MUSCLE v3.8.31 to align DNA sequences.
- For small files, alignment runs automatically in Python
- For large files, use the batch script: align_manual.bat
You must have MUSCLE installed and accessible from your system's PATH for alignment.
Ensure the muscle.exe binary is placed in C:\Program Files\muscle\muscle.exe, or edit the .bat file to reflect the correct path.
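If you prefer to call MUSCLE from your own script, something along these lines should work. MUSCLE v3.8.31 takes its input and output files through the -in and -out options; everything else here is an assumption made for illustration, including the helper names `muscle_command` and `run_muscle` and the output file name. It is not the code in src/align_sequences.py.

```python
import shutil
import subprocess
from pathlib import Path


def muscle_command(in_fasta, out_fasta, muscle_path="muscle"):
    """Build the argument list for a MUSCLE v3.8.31 run (-in/-out syntax)."""
    return [str(muscle_path), "-in", str(in_fasta), "-out", str(out_fasta)]


def run_muscle(in_fasta, out_fasta, muscle_path="muscle"):
    """Align in_fasta with MUSCLE and return the path of the aligned FASTA."""
    # shutil.which accepts either a bare name on PATH or a full path.
    if shutil.which(str(muscle_path)) is None:
        raise FileNotFoundError(f"MUSCLE executable not found: {muscle_path}")
    Path(out_fasta).parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if MUSCLE exits with an error.
    subprocess.run(muscle_command(in_fasta, out_fasta, muscle_path), check=True)
    return Path(out_fasta)


# Only prints the command; nothing is executed here.
# "output/aligned.fa" is a hypothetical output name.
print(muscle_command("data/example_small.fa", "output/aligned.fa"))
```

Pass the full path as `muscle_path` when the binary is not on your PATH, for example the Windows location mentioned above.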
MUSCLE citation: Edgar, R.C. (2004) Nucleic Acids Res 32(5):1792-1797. http://www.drive5.com/muscle
Please cite this work if you use the alignment functionality in your research or publications.
Tips for Large Input Files
- Work smaller first: debug your pipeline on 5-10 sequences before scaling up.
- Split & conquer: break multi-FASTA into chunks (seqkit split2 or bash loops).
- Memory guardrails: MUSCLE can spike RAM on huge alignments, so keep input FASTA under 50 MB.
- Use the batch script (align_manual.bat or your own shell wrapper) for > 1000 sequences.
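If seqkit is not available, splitting a multi-FASTA into chunks can also be done in a few lines of Python. This is a rough sketch rather than part of SimplePhylo: `read_fasta`, `split_records`, and the sample records are hypothetical names and data, and writing each chunk back to disk is left out.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, chunk_size):
    """Group records into lists of at most chunk_size entries."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch


sample = """>seq1
ACGT
>seq2
ACGA
>seq3
ACTT
TTGA
>seq4
GCTA
>seq5
GGTA
"""
batches = list(split_records(read_fasta(sample.splitlines()), 2))
print([len(batch) for batch in batches])
```

Because both functions are generators, a large file can be streamed with `open(path)` instead of loading it into memory first.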
Example Workflow
- Drop your .fasta file into the data/ folder
- Launch the app or notebook
- You'll get:
  - A multiple sequence alignment (FASTA)
  - Two phylogenetic trees (Parsimony & ML-style)
  - PNG files saved in output/tree_images/
Perfect for:
- Biology class demonstrations
- Research prototyping
- Curriculum development
- Student-led investigations
Future Plans
- Add support for bootstrap analysis
- Export PDF/HTML reports
- Add tree comparison metrics (e.g., RF distance)
Want to see SimplePhylo improved? I welcome all feedback and I can't wait to hear from you!
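As a rough idea of what the planned RF (Robinson-Foulds) metric measures, the sketch below counts the groupings that appear in one tree but not the other. It uses a simplified rooted variant that compares clades of nested-tuple trees; the function names `clades` and `rooted_rf` and the two example trees are hypothetical, and this is not an existing SimplePhylo feature.

```python
def clades(tree, found=None):
    """Return (leaf_set, found), where found collects every internal clade."""
    if found is None:
        found = set()
    if isinstance(tree, str):
        return frozenset([tree]), found
    leaves = frozenset()
    for child in tree:
        child_leaves, _ = clades(child, found)
        leaves |= child_leaves
    # Record the set of leaves below this internal node.
    found.add(leaves)
    return leaves, found


def rooted_rf(tree_a, tree_b):
    """Count clades present in exactly one of the two rooted trees."""
    _, clades_a = clades(tree_a)
    _, clades_b = clades(tree_b)
    return len(clades_a ^ clades_b)


first = ((("human", "chimp"), "mouse"), "fish")
second = ((("human", "mouse"), "chimp"), "fish")
print(rooted_rf(first, second))
```

A score of 0 means the two trees contain exactly the same groupings; larger values mean more disagreement.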
Maintainer
Pipeline Bio: Simple Bioinformatics for Educators
Sunflower logo with a phylogenetic tree
Contact: biology.mae@gmail.com
Teachers Pay Teachers Store: Pipeline Bio
#teacherspayteachers
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Citations & Attributions
This project uses the MUSCLE alignment tool developed by Robert C. Edgar.
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5):1792-1797.
http://www.drive5.com/muscle
Please cite this work if you use the alignment functionality in your research or publications.
Happy tree building! © 2025 Mae Warner (Pipeline Bio). All rights reserved.