Phyfum
Visit our GitBook for a detailed tutorial of Phyfum
Phyfum is a tool for inferring phylogenetic trees in methylation-based studies. We harness fluctuating CpG (fCpG) sites of methylation arrays to study the clonal evolution of samples. You can read more about fCpGs in the original paper.
We have implemented a phylogenetic model within BEAST v.1.8.4, based on the original model described in the paper above. We have also designed a Snakemake-based pipeline that covers IDAT preprocessing, fCpG calling, automatic XML generation and BEAST inference. Additionally, if both tumor and reference samples are available, CNVs are called to curate non-fluctuating CpGs.
Quick start
Phyfum allows two different workflows. If you are working with raw data (IDAT files), you can run phyfum in complete mode. In this mode, phyfum will preprocess the files with minfi. If needed and if both tumor and normal samples are available, it will also run a copy number analysis with rascal to blacklist fCpGs located within copy-number-altered regions, which do not behave as the model expects.
An example run for this workflow would look like this:
input_dir="epic_array_dir" # path to the input directory with IDAT files and the proper folder structure
output_dir="experiment1" # path to the output directory
patient_info="${input_dir}/sample_sheet.csv" # path to the CSV file with the experiment design
phyfum run --input "${input_dir}" \
    --output "${output_dir}" \
    --workdir "${output_dir}" \
    --patientinfo "${patient_info}" \
    --patient-col patient \
    --age-col age \
    --sample-col sample \
    --sample-type-col group \
    --stemcells 3-10-3
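The copy-number filtering step mentioned for the complete mode can be pictured as a simple interval filter: any fCpG whose genomic position falls inside a copy-number-altered segment is dropped. The sketch below only illustrates that idea. It is not Phyfum's implementation and does not use the rascal API; the function name `blacklist_fcpgs`, the probe names, the coordinates and the segment are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a copy-number-altered segment:
# (chromosome, start, end), with both ends inclusive.
Segment = Tuple[str, int, int]


def blacklist_fcpgs(
    positions: Dict[str, Tuple[str, int]], altered: List[Segment]
) -> List[str]:
    """Return the probes that do NOT overlap any altered segment."""
    kept = []
    for probe, (chrom, pos) in positions.items():
        in_altered = any(
            seg_chrom == chrom and start <= pos <= end
            for seg_chrom, start, end in altered
        )
        if not in_altered:
            kept.append(probe)
    return kept


# Made-up probes and a single made-up altered segment on chr1.
positions = {
    "probe_a": ("chr1", 1500),
    "probe_b": ("chr1", 9000),
    "probe_c": ("chr2", 1500),
}
altered = [("chr1", 1000, 2000)]
print(blacklist_fcpgs(positions, altered))  # ['probe_b', 'probe_c']
```

Only `probe_a` is removed here, because it is the only probe that sits on the altered chromosome and inside the altered interval.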
If you have already pre-processed the data and have the beta values, you can run phyfum in trees mode. The pipeline will simply deploy the XMLcreator tool to format the input data as expected by our modified version of BEAST and run the inference.
input_dir="beta_dir"
input="${input_dir}/exampleBeta.csv" # path to the input file with beta values
output_dir="onlybetas" # path to the output directory
patient_info="${input_dir}/meta.csv" # path to the CSV file with metadata
phyfum run --input "${input}" \
    --output "${output_dir}" \
    --workdir "${output_dir}" \
    --patientinfo "${patient_info}" \
    --patient-col patient \
    --age-col age \
    --sample-type-col group \
    --stemcells 3-10-3
Phyfum auto-detects the kind of input provided and automatically selects the matching workflow.
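To make the distinction between the two inputs concrete, here is a minimal sketch of how such a choice could be made from the `--input` path alone. It rests on an assumption taken from the examples above: a directory containing IDAT files maps to the complete workflow, and a single CSV file of beta values maps to the trees workflow. The function `guess_workflow` is hypothetical and is not part of Phyfum; the real detection logic may differ.

```python
from pathlib import Path


def guess_workflow(input_path: str) -> str:
    """Guess the workflow an input would map to (illustrative only)."""
    path = Path(input_path)
    if path.is_dir():
        # Assumption: a directory input is expected to hold raw IDAT files.
        has_idat = any(
            p.is_file() and p.suffix.lower() == ".idat" for p in path.rglob("*")
        )
        if not has_idat:
            raise ValueError(f"no IDAT files found under {input_path}")
        return "complete"
    if path.is_file() and path.suffix.lower() == ".csv":
        # Assumption: a single CSV file holds pre-processed beta values.
        return "trees"
    raise ValueError(f"cannot infer a workflow from {input_path}")
```

With the example paths used above, `guess_workflow("epic_array_dir")` would return `"complete"` and `guess_workflow("beta_dir/exampleBeta.csv")` would return `"trees"`, provided those paths exist.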
Installation
A docker image of Phyfum is available, and is our recommended way to use the tool:
docker pull pbousquets/phyfum
The commands above can be run with Docker as follows. Docker bind mounts need absolute paths, so the directories are given relative to the current working directory ($PWD). For the complete workflow:
input_dir="$PWD/epic_array_dir" # absolute path to the input directory with IDAT files and the proper folder structure
output_dir="$PWD/experiment1" # absolute path to the output directory
patient_info="${input_dir}/sample_sheet.csv" # path to the CSV file with the experiment design
docker run --rm -it -v "${input_dir}:${input_dir}" -v "${output_dir}:${output_dir}" \
    pbousquets/phyfum --input "${input_dir}" \
    --output "${output_dir}" \
    --workdir "${output_dir}" \
    --patientinfo "${patient_info}" \
    --patient-col patient \
    --age-col age \
    --sample-col sample \
    --sample-type-col group \
    --stemcells 3-10-3
And for the trees workflow:
input_dir="$PWD/beta_dir"
input="${input_dir}/exampleBeta.csv" # path to the input file with beta values
output_dir="$PWD/onlybetas" # absolute path to the output directory
patient_info="${input_dir}/meta.csv" # path to the CSV file with metadata
docker run --rm -it -v "${input_dir}:${input_dir}" -v "${output_dir}:${output_dir}" \
    pbousquets/phyfum --input "${input}" \
    --output "${output_dir}" \
    --workdir "${output_dir}" \
    --patientinfo "${patient_info}" \
    --patient-col patient \
    --age-col age \
    --sample-type-col group \
    --stemcells 3-10-3
Manual installation
Prior to installing phyfum, you'll need to install our modified version of BEAST to enable fCpGs to be analyzed under the framework. Then, make sure you have installed python3 and R (>4.0.0) and simply run:
pip install phyfum
To preprocess IDAT files, we use minfi, conumee and rascal, as well as some tidyverse packages. Missing dependencies are installed automatically during the first run of Phyfum, so that run may take longer than usual. You can also install them yourself with:
if (!require("pacman")) install.packages("pacman")
pacman::p_load(optparse, pacman, data.table, tibble, dplyr, tidyr, ggplot2, lubridate, BiocManager, gifski, gtools, ggrepel, cowplot, parallel, treeio, ggtree, svglite, ggbeeswarm, rstan, LaplacesDemon, HDInterval)
pacman::p_load_gh('crukci-bioinformatics/rascal', 'adamallo/rwty')
BiocManager::install('conumee'); BiocManager::install('minfi')
Preparing the sample sheet / metadata
Phyfum relies on the array sample sheet for the complete workflow and on a custom metadata file for the trees workflow. In either case, the file must be comma-separated (.csv).
- Sample sheet. When running the complete workflow, we recommend passing the array sample sheet. Custom columns can be added to specify parameters required by the pipeline (sample age, age_at_diagnosis, etc.). To exclude a sample from the analysis, remove its row from the sample sheet. The pipeline looks for "normal" or "control" samples to use as controls for the CNV pipeline; you can provide the name of the column that holds this information with the argument --sample-type-col. If no normals are found, this part of the pipeline is skipped.
- Custom metadata. A sample-wise file providing information such as the sample age, patient, age_at_diagnosis, etc. It has no special requirements beyond being in CSV format. If the column names in your file differ from the defaults, use the arguments --patient-col, --sample-col and --age-col to identify them.
Both the custom metadata and the sample sheet are passed through --patientinfo.
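Before launching a run, it can help to check that the file passed through --patientinfo has the expected columns and to see how many control samples it contains. The sketch below is one way to do that with Python's standard library. It is not part of Phyfum: the function `check_metadata` is hypothetical, the column names (`patient`, `age`, `group`) are those used in the example commands above, and the `tumor` label in the sample data is an assumption.

```python
import csv
import io


def check_metadata(handle, patient_col="patient", age_col="age",
                   sample_type_col="group"):
    """Check required columns and count samples and controls (illustrative)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in (patient_col, age_col, sample_type_col)
               if c not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    rows = list(reader)
    # Rows labelled "normal" or "control" are counted as controls.
    controls = [r for r in rows
                if r[sample_type_col].strip().lower() in {"normal", "control"}]
    return len(rows), len(controls)


# Made-up metadata for a single patient with two tumor samples and one normal.
example = io.StringIO(
    "sample,patient,age,group\n"
    "S1,P1,63,tumor\n"
    "S2,P1,63,tumor\n"
    "S3,P1,63,normal\n"
)
n_samples, n_controls = check_metadata(example)
print(n_samples, n_controls)  # 3 1
```

For a real file, pass an open file handle instead of the in-memory example, for instance `check_metadata(open("meta.csv", newline=""))`.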
License
phyfum is distributed under the terms of the CC-BY-NC-SA license.