BSCAMPP - A Scalable Phylogenetic Placement Tool

Maintainers

c5shen

BSCAMPP - A Scalable Phylogenetic Placement Method and Framework

Table of Contents

Overview
Installation
Usage
Example Code and Data

Overview

Inputs
1. Reference tree to place sequences into.
2. Alignment of reference sequences.
3. Alignment of query sequences (can be combined with item 2 in a single alignment file).
4. Tree info file (model parameters):
  - With EPA-ng as the base method: the RAxML-ng info file, typically with the suffix .bestModel.
  - With pplacer as the base method: a RAxML-ng or FastTree log file.
Output
1. Placement results of query sequences in the reference tree in .jplace format.
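
Because the reference and query alignments can be supplied separately, a quick pre-flight check can catch mismatched alignment widths before a long run. The sketch below is a minimal, dependency-free illustration that assumes plain aligned FASTA files in which every sequence spans the same columns; `read_fasta` and `check_alignments` are hypothetical helpers, not part of BSCAMPP.

```python
from typing import Dict, List


def read_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file into a {name: sequence} dict (multi-line records allowed)."""
    seqs: Dict[str, List[str]] = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the first whitespace-delimited token as the name
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def check_alignments(ref: Dict[str, str], query: Dict[str, str]) -> int:
    """Return the shared alignment width, or raise ValueError if widths differ."""
    widths = {len(s) for s in ref.values()} | {len(s) for s in query.values()}
    if len(widths) != 1:
        raise ValueError(f"sequences have differing lengths: {sorted(widths)}")
    return widths.pop()
```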

BSCAMPP extends its predecessor, SCAMPP, into a scalable solution for phylogenetic placement, and achieves order-of-magnitude speedups compared with the SCAMPP framework. The core algorithm is described in detail at https://doi.org/10.1101/2022.10.26.513936. In short, BSCAMPP uses EPA-ng as its base placement method by default, which allows it to scale to placement trees with up to ~200,000 leaves. It does this by extracting appropriate subtrees and assigning each query to its best-fitting subtree.
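
Assigning a query to a subtree starts from how similar the query is to each reference leaf. As a minimal sketch, assuming aligned sequences and the simple Hamming distance that ignores gap sites in the query (the distance named in the `--similarityflag` help text, whose default instead maximizes sequence similarity), the closest reference leaves can be found as follows. The function names are hypothetical, and this is not BSCAMPP's internal implementation.

```python
from typing import Dict, List, Tuple

GAP_CHARS = set("-.")


def masked_hamming(query: str, ref: str) -> int:
    """Count mismatches between two aligned sequences, skipping every column
    where the query has a gap (fragmentary queries are mostly gaps)."""
    return sum(
        1
        for q, r in zip(query.upper(), ref.upper())
        if q not in GAP_CHARS and q != r
    )


def nearest_leaves(query: str, refs: Dict[str, str], k: int = 1) -> List[Tuple[int, str]]:
    """Return the k reference leaves with the smallest masked Hamming distance,
    as (distance, leaf_name) pairs sorted by distance, then by name."""
    scored = sorted((masked_hamming(query, seq), name) for name, seq in refs.items())
    return scored[:k]
```

For example, with references `A=ACGTAC`, `B=ACGAAC`, and `C=TTTTTT`, the fragment `--GTA-` is compared only on its three non-gap columns, so `A` (0 mismatches) and `B` (1 mismatch) are its two nearest leaves.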

BSCAMPP is essentially a divide-and-conquer framework and can be used with any base placement method (e.g., pplacer). Currently, BSCAMPP supports EPA-ng and pplacer as base methods.
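
The divide-and-conquer idea can be illustrated by grouping queries by their assigned subtree and invoking the base method once per group. In this sketch, `place_batch` is a hypothetical stand-in for running EPA-ng or pplacer on one subtree; a real pipeline would also have to map subtree edges back to edges of the full reference tree, which is omitted here.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def place_by_subtree(
    assignment: Dict[str, int],
    place_batch: Callable[[int, List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Group queries by assigned subtree id, call the (hypothetical) base
    placement function once per subtree, and merge the per-subtree results."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for query, subtree_id in assignment.items():
        batches[subtree_id].append(query)
    merged: Dict[str, str] = {}
    for subtree_id, queries in sorted(batches.items()):
        merged.update(place_batch(subtree_id, sorted(queries)))
    return merged
```

Batching this way means the base method runs once per subtree rather than once per query, which is where the framework's scalability comes from in this simplified picture.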

Based on current best results, we recommend running BSCAMPP with subtrees of size 2,000 and 5 votes, especially if the query sequences are fragmentary. These are the defaults for the subtree size (-b) and the number of votes (-V); see Usage for details on customizing BSCAMPP.
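
To make the role of votes concrete, the sketch below assumes that each vote is one of a query's closest reference leaves and that the query goes to the subtree containing the most of its votes. Both the rule and the function name are illustrative assumptions; the paper linked above describes BSCAMPP's actual procedure.

```python
from typing import Dict, Sequence, Set


def choose_subtree(votes: Sequence[str], subtrees: Dict[int, Set[str]]) -> int:
    """Return the id of the subtree containing the most voted leaves.
    Ties are broken in favor of the lowest subtree id."""
    counts = {
        sid: sum(1 for leaf in votes if leaf in leaves)
        for sid, leaves in subtrees.items()
    }
    best = max(counts.values())
    return min(sid for sid, c in counts.items() if c == best)
```

With five votes (`-V 5`) a query whose nearest leaves are split across overlapping subtrees is still steered toward the subtree that covers most of them, which is one plausible reason multiple votes help with fragmentary queries.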

Installation

BSCAMPP was tested on Python 3.7 to 3.12. There are two ways to install and use BSCAMPP: (1) with PyPI, or (2) from this GitHub repository. If you have any difficulties installing or running BSCAMPP, please contact Eleanor Wedell (ewedell@illinois.edu).

External requirements

EPA-ng and/or pplacer must be installed to run BSCAMPP, since BSCAMPP uses them as its base phylogenetic placement methods. By default, when run for the first time, BSCAMPP searches the user's environment for the pplacer and epa-ng binary executables. A compiled version of pplacer for Linux is also included under bscampp/tools.
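
Before the first run, it can help to confirm that the base methods are discoverable. This minimal sketch only checks whether `epa-ng` and `pplacer` resolve on `PATH` using the standard library; it does not reproduce BSCAMPP's own search logic and does not look at the bundled pplacer under bscampp/tools. `find_base_methods` is a hypothetical helper.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_base_methods(names: Sequence[str] = ("epa-ng", "pplacer")) -> Dict[str, Optional[str]]:
    """Map each binary name to its resolved path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_base_methods().items():
        print(f"{name}: {path or 'not found on PATH'}")
```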

(1) Install with `pip` (Coming soon)

The easiest way to install BSCAMPP is with `pip`, which also installs all required Python packages.

```bash
# 1. install with pip (add --user if you lack root access)
pip install bscampp [--user]

# 2. Two binary executables are installed. Running either one
#    for the first time creates a config file at
#    ~/.bscampp/main.config that resolves the links to all
#    external software (e.g., epa-ng, pplacer)
bscampp [-h]    # or
run_bscampp.py [-h]
```
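
To inspect which external binaries the generated config points to, the file can be read with Python's `configparser`. This is a sketch under the assumption that ~/.bscampp/main.config is INI-style (suggested, but not confirmed, by the ConfigParser dependency); section and key names are whatever BSCAMPP writes, and the ones used in any test data are hypothetical.

```python
import configparser
import os


def show_config(path: str = os.path.expanduser("~/.bscampp/main.config")) -> dict:
    """Return {section: {key: value}} from an INI-style file, or {} if it is absent."""
    parser = configparser.ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        return {}
    return {section: dict(parser[section]) for section in parser.sections()}
```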

(2) Install from GitHub

Alternatively, the user can clone this GitHub repository and install the required packages manually.

Requirements

python>=3.7
ConfigParser>=5.0.0
numpy>=1.21.6
treeswift>=1.1.45
taxtastic>=0.9.3

```bash
# 1. Clone the GitHub repo and enter it
git clone https://github.com/ewedell/BSCAMPP.git
cd BSCAMPP

# 2. Install all requirements
pip install -r requirements.txt

# 3. Execute the BSCAMPP executable `run_bscampp.py`
python run_bscampp.py [-h]
```

Usage

All parameter settings can be found by running

```bash
run_bscampp.py -h
```

(1) Default case (`epa-ng`)

```bash
run_bscampp.py -i [raxml best model] -t [reference tree] -a [alignment file]
```

This runs BSCAMPP in its default mode with EPA-ng. [alignment file] should contain both the sequences in the placement tree and the query sequences to be placed. The command creates an output directory, bscampp_output, and writes the placement results to bscampp_output/bscampp_result.jplace.
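
The .jplace output is JSON, so the top placement per query can be pulled out with the standard library. This sketch assumes the output lists `edge_num` and `like_weight_ratio` in its `fields` array, as EPA-ng and pplacer outputs normally do, and accepts query names given either as `n` or as `nm` (name, multiplicity) pairs. `best_placements` is a hypothetical helper, not part of BSCAMPP.

```python
import json
from typing import Dict, Tuple


def best_placements(jplace_path: str) -> Dict[str, Tuple[int, float]]:
    """Map each query name to (edge_num, like_weight_ratio) of its top placement."""
    with open(jplace_path) as handle:
        data = json.load(handle)
    fields = data["fields"]
    edge_idx = fields.index("edge_num")
    lwr_idx = fields.index("like_weight_ratio")
    best: Dict[str, Tuple[int, float]] = {}
    for pquery in data["placements"]:
        # Pick the candidate placement with the highest like_weight_ratio
        top = max(pquery["p"], key=lambda row: row[lwr_idx])
        # jplace allows plain names ("n") or [name, multiplicity] pairs ("nm")
        names = pquery.get("n") or [nm[0] for nm in pquery.get("nm", [])]
        for name in names:
            best[name] = (int(top[edge_idx]), float(top[lwr_idx]))
    return best
```

Usage: `best_placements("bscampp_output/bscampp_result.jplace")`.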

(2) Providing the query alignment separately, with finer control of outputs

```bash
run_bscampp.py -i [raxml best model] -t [reference tree] -a [reference alignment] \
    -q [query sequence alignment] -d [output directory] -o [output name] \
    --threads [num cpus]
```
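
When BSCAMPP is driven from a larger Python pipeline, the command line can be assembled from the flags listed in the help text below (`--placement-method`, `-i`, `-t`, `-a`, `-q`, `-d`, `-o`, `--threads`). The wrapper functions are hypothetical conveniences, and the sketch assumes `run_bscampp.py` is on `PATH` (as after a pip install); defaults mirror the documented ones.

```python
import subprocess
from typing import List, Optional


def bscampp_command(
    info: str,
    tree: str,
    ref_aln: str,
    query_aln: Optional[str] = None,
    outdir: str = "bscampp_output",
    outname: str = "bscampp_result.jplace",
    threads: int = -1,
    method: str = "epa-ng",
) -> List[str]:
    """Build a run_bscampp.py argument list from the documented flags."""
    if method not in ("epa-ng", "pplacer"):
        raise ValueError("method must be 'epa-ng' or 'pplacer'")
    cmd = ["run_bscampp.py", "--placement-method", method,
           "-i", info, "-t", tree, "-a", ref_aln,
           "-d", outdir, "-o", outname, "--threads", str(threads)]
    if query_aln is not None:
        cmd += ["-q", query_aln]
    return cmd


def run_bscampp(**kwargs) -> None:
    """Run BSCAMPP; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(bscampp_command(**kwargs), check=True)
```

Passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces.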

(3) Using `pplacer` as the base placement method

```bash
run_bscampp.py --placement-method pplacer -i [logfile from either RAxML/FastTree] \
    -t [reference tree] -a [reference alignment] -q [query sequence alignment]
```

More comprehensive usage

> usage: run_bscampp.py [-h] [-v] [--placement-method {epa-ng,pplacer}] -i
>                       INFO_PATH -t TREE_PATH -a ALN_PATH [-q QALN_PATH]
>                       [-d OUTDIR] [-o OUTNAME] [--threads NUM_CPUS] [-m MODEL]
>                       [-b SUBTREESIZE] [-V VOTES]
>                       [--similarityflag SIMILARITYFLAG] [-n TMPFILENBR]
>                       [--fragmentflag FRAGMENTFLAG] [--keeptemp KEEPTEMP]
> 
> This program runs BSCAMPP, a scalable phylogenetic placement framework that scales EPA-ng/pplacer to very large tree placement.
> 
> options:
>   -h, --help            show this help message and exit
>   -v, --version         show program's version number and exit
> 
> BASIC PARAMETERS:
>   These are the basic parameters for BSCAMPP.
> 
>   --placement-method {epa-ng,pplacer}
>                         The base placement method to use. Default: epa-ng
>   -i INFO_PATH, --info INFO_PATH, --info-path INFO_PATH
>                         Path to model parameters. E.g., .bestModel from
>                         RAxML/RAxML-ng
>   -t TREE_PATH, --tree TREE_PATH, --tree-path TREE_PATH
>                         Path to reference tree with estimated branch lengths
>   -a ALN_PATH, --alignment ALN_PATH, --aln-path ALN_PATH
>                         Path for reference sequence alignment in FASTA format.
>                         Optionally with query sequences. Query alignment can
>                         be specified with --qaln-path
>   -q QALN_PATH, --qalignment QALN_PATH, --qaln-path QALN_PATH
>                         Optionally provide path to query sequence alignment in
>                         FASTA format. Default: None
>   -d OUTDIR, --outdir OUTDIR
>                         Directory path for output. Default: bscampp_output/
>   -o OUTNAME, --output OUTNAME
>                         Output file name. Default: bscampp_result.jplace
>   --threads NUM_CPUS, --num-cpus NUM_CPUS
>                         Number of cores for parallelization, default: -1 (all)
> 
> ADVANCE PARAMETERS:
>   These parameters control how BSCAMPP is run. The default values are set based on experiments.
> 
>   -m MODEL, --model MODEL
>                         Model used for edge distances. Default: GTR
>   -b SUBTREESIZE, --subtreesize SUBTREESIZE
>                         Integer size of the subtree. Default: 2000
>   -V VOTES, --votes VOTES
>                         Number of votes per query sequence. Default: 5
>   --similarityflag SIMILARITYFLAG
>                         Boolean, True if maximizing sequence similarity
>                         instead of simple Hamming distance (ignoring gap sites
>                         in the query). Default: True
> 
> MISCELLANEOUS PARAMETERS:
>   -n TMPFILENBR, --tmpfilenbr TMPFILENBR
>                         Temporary file indexing. Default: 0
>   --fragmentflag FRAGMENTFLAG
>                         If queries contains fragments. Default: True
>   --keeptemp KEEPTEMP   Boolean, True to keep all temporary files. Default:
>                         False

Example Code and Data

Example script and data are provided in this GitHub repository in examples/. The data is originally from the RNAsim-VS datasets.

examples/run.sh contains a simple script to test BSCAMPP with epa-ng or pplacer by placing 200 query sequences into a 10,000-leaf placement tree. The info file comes from RAxML-ng when running epa-ng and from FastTree-2 when running pplacer.
- `run.sh` invokes BSCAMPP with epa-ng.
- `run.sh pplacer` invokes BSCAMPP with pplacer.

Project details

Release history

- 1.0.8 (Jul 7, 2025)
- 1.0.7 (Apr 16, 2025)
- 1.0.6 (Mar 10, 2025)
- 1.0.5 (Mar 9, 2025)
- 1.0.3 (Feb 12, 2025)
- 1.0.2 (Feb 10, 2025)
- 1.0.2b0 pre-release (Feb 12, 2025)
- 1.0.1 (Feb 9, 2025)
- 1.0.1b0 pre-release (Feb 9, 2025)
- 1.0.1a0 pre-release (Feb 9, 2025), this version

Download files

Source Distribution

bscampp-1.0.1a0.tar.gz (4.6 MB)

Uploaded Feb 9, 2025 Source

Built Distribution

bscampp-1.0.1a0-py3-none-any.whl (4.7 MB)

Uploaded Feb 9, 2025 Python 3

File details

Details for the file bscampp-1.0.1a0.tar.gz.

File metadata

Download URL: bscampp-1.0.1a0.tar.gz
Upload date: Feb 9, 2025
Size: 4.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for bscampp-1.0.1a0.tar.gz
Algorithm	Hash digest
SHA256	`dd19b66819227e4e99c698d966b13ceca2b8df4b21fdb8719fd7c5693a7eaf70`
MD5	`46111012fdecedb00e636194f863d56b`
BLAKE2b-256	`dee64a9cb93f57cb164cc21f26d0789295dde87766b41c0e00cf39a4082cc653`
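
A downloaded file can be checked against the SHA256 digest above using only the standard library. This is a minimal sketch: `sha256_of` is a hypothetical helper, and the file is streamed in chunks so large archives need not fit in memory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for bscampp-1.0.1a0.tar.gz
EXPECTED = "dd19b66819227e4e99c698d966b13ceca2b8df4b21fdb8719fd7c5693a7eaf70"

# Usage, assuming the sdist was downloaded to the current directory:
# assert sha256_of("bscampp-1.0.1a0.tar.gz") == EXPECTED
```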

Provenance

The following attestation bundles were made for bscampp-1.0.1a0.tar.gz:

Publisher: python-publish.yml on ewedell/BSCAMPP

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bscampp-1.0.1a0.tar.gz
- Subject digest: dd19b66819227e4e99c698d966b13ceca2b8df4b21fdb8719fd7c5693a7eaf70
- Sigstore transparency entry: 169899185
- Sigstore integration time: Feb 9, 2025
Source repository:
- Permalink: ewedell/BSCAMPP@d6841b026936a3b9f75c523de3432cab89e6cc20
- Branch / Tag: refs/tags/v1.0.1a
- Owner: https://github.com/ewedell
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@d6841b026936a3b9f75c523de3432cab89e6cc20
- Trigger Event: release

File details

Details for the file bscampp-1.0.1a0-py3-none-any.whl.

File metadata

Download URL: bscampp-1.0.1a0-py3-none-any.whl
Upload date: Feb 9, 2025
Size: 4.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for bscampp-1.0.1a0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`76981b442e46f338d2a2b358ae01707120ee637e944efc5fb74fb985353be7c9`
MD5	`85e485c114a59e1fdb5325858dbe66b8`
BLAKE2b-256	`58d851caac471eb75cba5b40fd092aeb333b6953592191ccc1273652d2cbd924`

Provenance

The following attestation bundles were made for bscampp-1.0.1a0-py3-none-any.whl:

Publisher: python-publish.yml on ewedell/BSCAMPP

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bscampp-1.0.1a0-py3-none-any.whl
- Subject digest: 76981b442e46f338d2a2b358ae01707120ee637e944efc5fb74fb985353be7c9
- Sigstore transparency entry: 169899186
- Sigstore integration time: Feb 9, 2025
Source repository:
- Permalink: ewedell/BSCAMPP@d6841b026936a3b9f75c523de3432cab89e6cc20
- Branch / Tag: refs/tags/v1.0.1a
- Owner: https://github.com/ewedell
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@d6841b026936a3b9f75c523de3432cab89e6cc20
- Trigger Event: release
