Phyfum

Visit our GitBook for a detailed tutorial of Phyfum


Phyfum is a tool for inferring phylogenetic trees in methylation-based studies. We harness fluctuating CpG (fCpG) sites from methylation arrays to study the clonal evolution of samples. You can read more about fCpGs in the original paper.

We have implemented a phylogenetic model within BEAST v.1.8.4, based on the original model described in the paper above. We have also designed a Snakemake-based pipeline that covers IDAT preprocessing, fCpG calling, automatic XML generation and BEAST inference. Additionally, if both tumor and reference samples are available, CNVs are called to filter out CpGs that do not fluctuate as the model expects.

Quick start

Phyfum supports two workflows. If you are working with raw data (IDAT files), you can run phyfum in complete mode. In this mode, phyfum preprocesses the files with minfi. If both tumor and normal samples are available, it can also run a copy number analysis with rascal to blacklist fCpGs located within copy-number-altered regions, which do not behave as the model expects.

An example run for this workflow would look like this:

input_dir="epic_array_dir" # path to input directory with IDAT files and proper folder structure
output_dir="experiment1" # path to output directory
patient_info="${input_dir}/sample_sheet.csv" # path to CSV file with experiment design

phyfum run --input "${input_dir}" \
 --output "${output_dir}" \
 --workdir "${output_dir}" \
 --patientinfo "${patient_info}" \
 --patient-col patient \
 --age-col age \
 --sample-col sample \
 --sample-type-col group \
 --stemcells 3-10-3

If you have already preprocessed the data and have the beta values, you can run phyfum in trees mode. The pipeline then only runs the XMLcreator tool to format the input data as expected by our modified version of BEAST, and runs the inference.

input_dir="beta_dir"
input="${input_dir}/exampleBeta.csv" # path to input file with beta values
output_dir="onlybetas" # path to output directory
patient_info="${input_dir}/meta.csv" # path to CSV file with metadata

phyfum run --input "${input}" \
 --output "${output_dir}" \
 --workdir "${output_dir}" \
 --patientinfo "${patient_info}" \
 --patient-col patient \
 --age-col age \
 --sample-type-col group \
 --stemcells 3-10-3

Phyfum detects which kind of input is provided and automatically selects the matching workflow.
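
Before launching a run, it can help to check which workflow a given input path is likely to correspond to. The sketch below is a hypothetical pre-flight helper, not Phyfum's own detection logic: the function name guess_workflow and the rule it applies (a directory containing .idat files suggests the complete workflow, a .csv file suggests the trees workflow) are assumptions made for illustration.

```shell
# Hypothetical helper (not part of Phyfum): guess the workflow from the input path.
# Assumption: a directory with *.idat files -> "complete"; a *.csv file -> "trees".
guess_workflow() {
  path="$1"
  if [ -d "$path" ]; then
    # Look for at least one IDAT file anywhere below the directory.
    if find "$path" -type f -iname '*.idat' | grep -q .; then
      echo "complete"
    else
      echo "unknown"
    fi
  elif [ -f "$path" ]; then
    case "$path" in
      *.csv) echo "trees" ;;
      *) echo "unknown" ;;
    esac
  else
    echo "missing"
  fi
}
```

Calling guess_workflow "epic_array_dir" or guess_workflow "beta_dir/exampleBeta.csv" prints the guessed mode, which makes a mistyped path visible before a long run starts.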

Installation

A docker image of Phyfum is available, and is our recommended way to use the tool:

docker pull pbousquets/phyfum

The commands above can be run with Docker as follows:

# Docker bind mounts (-v) require absolute paths
input_dir="$(pwd)/epic_array_dir" # path to input directory with IDAT files and proper folder structure
output_dir="$(pwd)/experiment1" # path to output directory
patient_info="${input_dir}/sample_sheet.csv" # path to CSV file with experiment design

docker run --rm -it -v "${input_dir}:${input_dir}" -v "${output_dir}:${output_dir}" \
 pbousquets/phyfum --input "${input_dir}" \
 --output "${output_dir}" \
 --workdir "${output_dir}" \
 --patientinfo "${patient_info}" \
 --patient-col patient \
 --age-col age \
 --sample-col sample \
 --sample-type-col group \
 --stemcells 3-10-3

For pre-computed beta values:

input_dir="$(pwd)/beta_dir"
input="${input_dir}/exampleBeta.csv" # path to input file with beta values
output_dir="$(pwd)/onlybetas" # path to output directory
patient_info="${input_dir}/meta.csv" # path to CSV file with metadata

docker run --rm -it -v "${input_dir}:${input_dir}" -v "${output_dir}:${output_dir}" \
 pbousquets/phyfum --input "${input}" \
 --output "${output_dir}" \
 --workdir "${output_dir}" \
 --patientinfo "${patient_info}" \
 --patient-col patient \
 --age-col age \
 --sample-type-col group \
 --stemcells 3-10-3

Manual installation

Before installing phyfum, you'll need to install our modified version of BEAST, which enables fCpGs to be analyzed within that framework. Then, make sure you have Python 3 and R (> 4.0.0) installed and simply run:

pip install phyfum
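
To confirm that the interpreters mentioned above are on your PATH before installing, a small check like the following may be useful. This is a minimal sketch: check_tools is a hypothetical helper name, and it only tests whether a command exists, not whether the R version is above 4.0.0 or whether the modified BEAST is installed.

```shell
# Hypothetical helper: report every command in the argument list that is not on PATH.
# Returns 0 when all commands are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Check the interpreters named in this section.
check_tools python3 Rscript || echo "Install the missing tools before running pip install phyfum."
```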

To preprocess IDAT files, we use minfi, conumee and rascal, as well as some tidyverse packages. Missing dependencies are installed automatically during the first Phyfum run, so that run may take longer than usual. You can also install them yourself with:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(optparse, pacman, data.table, tibble, dplyr, tidyr, ggplot2, lubridate, BiocManager, gifski, gtools, ggrepel, cowplot, parallel, treeio, ggtree, svglite, ggbeeswarm, rstan, LaplacesDemon, HDInterval)
pacman::p_load_gh('crukci-bioinformatics/rascal', 'adamallo/rwty')
BiocManager::install('conumee'); BiocManager::install('minfi')

Preparing the sample sheet / metadata

Phyfum relies on the array sample sheet for the complete workflow and on a custom metadata file for the trees workflow. In both cases, the file must be comma-separated (.csv).

  • Sample sheet. When running the complete workflow, we recommend passing the array sample sheet. Custom columns can be added to specify parameters required by the pipeline (sample age, age_at_diagnosis, etc.). To exclude a sample from the analysis, remove the corresponding row from the sample sheet.

    The pipeline will try to find how many "normal" or "control" samples exist to use them as controls for the CNV pipeline. You can provide the column name with the argument --sample-type-col. If no normals are found, this part of the pipeline will be skipped.

  • Custom metadata. A sample-wise file providing information such as sample age, patient, age_at_diagnosis, etc. It has no special requirements as long as it is in CSV format. If the column names in your file differ from the defaults, use the arguments --patient-col, --sample-col and --age-col to identify them.
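
To see in advance whether the CNV step is likely to find controls, you can count the rows whose sample-type column reads "normal" or "control". The helper below is a sketch under stated assumptions: count_controls is a hypothetical name, the first line of the file is assumed to be the column header, fields are assumed not to contain quoted commas, and the case-insensitive exact match is an illustration rather than Phyfum's actual matching rule.

```shell
# Hypothetical helper: count rows whose sample-type column is "normal" or "control".
# Usage: count_controls FILE COLUMN_NAME
# Assumptions: header on line 1, plain comma-separated fields (no quoted commas).
count_controls() {
  awk -F',' -v col="$2" '
    { sub(/\r$/, "") }                                  # tolerate Windows line endings
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx && tolower($idx) ~ /^(normal|control)$/ { n++ }
    END { print n + 0 }                                 # prints 0 if none or column absent
  ' "$1"
}
```

For example, count_controls sample_sheet.csv group prints the number of candidate control samples when the sample type is stored in a column named group, as in the commands above.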

Both the custom metadata and the sample sheet are passed through --patientinfo.
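
As an illustration of the custom metadata format, the sketch below writes a small CSV and checks that its header contains the columns referenced on the command line. The column names (patient, sample, age, group) follow the example commands in this document; the rows are made-up placeholder values, and has_columns is a hypothetical helper, not a Phyfum command.

```shell
# Hypothetical helper: return 0 if every named column appears in the CSV header.
# Usage: has_columns FILE COLUMN...
has_columns() {
  file="$1"; shift
  header="$(head -n 1 "$file" | tr -d '\r')"
  for col in "$@"; do
    case ",$header," in
      *",$col,"*) ;;                                   # column present
      *) echo "missing column: $col"; return 1 ;;
    esac
  done
  return 0
}

# Placeholder metadata file; all values are invented for illustration.
meta="$(mktemp)"
cat > "$meta" <<'EOF'
patient,sample,age,group
P1,P1_T1,63,tumor
P1,P1_T2,63,tumor
P1,P1_N,63,normal
EOF

has_columns "$meta" patient sample age group && echo "metadata header OK"
```

If your file uses different column names, pass those names to the helper and to the corresponding --patient-col, --sample-col, --age-col and --sample-type-col arguments.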

License

phyfum is distributed under the terms of the CC-BY-NC-SA license.
