Phyfum

Tip: visit our GitBook for a detailed tutorial of Phyfum.


Quick start

Phyfum allows two different workflows. If you are working with raw data (IDAT files), you can run phyfum in complete mode. In this mode, phyfum will preprocess the files with minfi. If needed and if both tumor and normal samples are available, it will also run a copy number analysis with rascal to blacklist fCpGs located within copy-number-altered regions, which do not behave as the model expects.

An example run for this workflow would look like this:

input_dir="epic_array_dir" #path to input directory with idat files and proper folder structure
output_dir="experiment1" #path to output directory
patient_info="${input_dir}/sample_sheet.csv" #path to csv file with experiment design

phyfum run --input "${input_dir}" \
 --output "${output_dir}" \
 --workdir "${output_dir}" \
 --patientinfo "${patient_info}" \
 --patient-col patient \
 --age-col age \
 --sample-col sample \
 --sample-type-col group \
 --stemcells 3-10-3
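
Before launching a complete-mode run, it can help to confirm that the raw data are intact. Illumina arrays store each sample as a pair of files, `<prefix>_Grn.idat` and `<prefix>_Red.idat`. The sketch below only checks that generic pairing; the helper name `find_unpaired_idats` is hypothetical, it is not part of Phyfum, and it does not validate the folder structure Phyfum expects.

```shell
# Hypothetical helper (not part of Phyfum): print every green-channel IDAT
# file under a directory that has no matching red-channel file next to it.
find_unpaired_idats() {
    find "$1" -type f -name '*_Grn.idat' | sort | while IFS= read -r grn; do
        # Derive the expected red-channel path from the green-channel path
        red="${grn%_Grn.idat}_Red.idat"
        [ -f "$red" ] || printf '%s\n' "$grn"
    done
}

# Usage: empty output means every green file has a red counterpart
# find_unpaired_idats "epic_array_dir"
```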

If you have already pre-processed the data and have the beta values, you can run phyfum in trees mode. The pipeline will simply deploy the XMLcreator tool to format the input data as expected by our modified version of BEAST and run the inference.

input_dir="beta_dir"
input=${input_dir}/"exampleBeta.csv" #path to input file with beta values
output_dir="onlybetas" #path to output directory
patient_info="${input_dir}/meta.csv" #path to csv file with metadata

phyfum run --input "${input}" \
 --output "${output_dir}" \
 --workdir "${output_dir}" \
 --patientinfo "${patient_info}" \
 --patient-col patient \
 --age-col age \
 --sample-type-col group \
 --stemcells 3-10-3

Phyfum auto-detects the kind of input provided and automatically selects the matching workflow.
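
To predict which workflow a given `--input` value will trigger, a rough rule of thumb can be scripted. This is only a sketch under a stated assumption: a directory is treated as raw IDAT input (complete mode) and a regular file as a beta-value table (trees mode), as in the two examples above. Phyfum's actual detection logic may differ, and `guess_workflow` is a hypothetical name.

```shell
# Hypothetical sketch: guess the workflow from the type of the --input path.
# Assumption: directory => complete mode, regular file => trees mode.
guess_workflow() {
    if [ -d "$1" ]; then
        echo "complete"
    elif [ -f "$1" ]; then
        echo "trees"
    else
        echo "unknown"
        return 1
    fi
}

# Usage:
# guess_workflow "epic_array_dir"
# guess_workflow "beta_dir/exampleBeta.csv"
```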

Installation

A docker image of Phyfum is available, and is our recommended way to use the tool:

docker pull pbousquets/phyfum

The commands above can be run as:

#complete workflow (IDAT input)
input_dir="$(pwd)/epic_array_dir" #absolute path to input directory with idat files and proper folder structure (Docker bind mounts require absolute paths)
output_dir="$(pwd)/experiment1" #absolute path to output directory
patient_info="${input_dir}/sample_sheet.csv" #path to csv file with experiment design

docker run --rm -it -v "${input_dir}:${input_dir}" -v "${output_dir}:${output_dir}" \
 pbousquets/phyfum --input "${input_dir}" \
 --output "${output_dir}" \
 --workdir "${output_dir}" \
 --patientinfo "${patient_info}" \
 --patient-col patient \
 --age-col age \
 --sample-col sample \
 --sample-type-col group \
 --stemcells 3-10-3

#trees workflow (beta values)
input_dir="$(pwd)/beta_dir" #absolute path to input directory
input="${input_dir}/exampleBeta.csv" #path to input file with beta values
output_dir="$(pwd)/onlybetas" #absolute path to output directory
patient_info="${input_dir}/meta.csv" #path to csv file with metadata

docker run --rm -it -v "${input_dir}:${input_dir}" -v "${output_dir}:${output_dir}" \
 pbousquets/phyfum --input "${input}" \
 --output "${output_dir}" \
 --workdir "${output_dir}" \
 --patientinfo "${patient_info}" \
 --patient-col patient \
 --age-col age \
 --sample-type-col group \
 --stemcells 3-10-3

Manual installation

Prior to installing phyfum, you'll need to install our modified version of BEAST to enable fCpGs to be analyzed under the framework. Then, make sure you have installed python3 and R (>4.0.0) and simply run:

pip install phyfum
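
The Python and R prerequisites mentioned above can be checked with a small preflight script before installing. The following is a minimal sketch, not something shipped with Phyfum: `version_gt` is a hypothetical helper that compares dotted version strings, and the commented usage assumes `Rscript` and `python3` are on the `PATH`.

```shell
# Hypothetical helper: succeed when version $1 is strictly greater than $2.
# Versions are compared numerically, component by component (e.g. 4.10 > 4.9).
version_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing components count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 1   # equal versions are not strictly greater
    }'
}

# Usage (requires R and Python on PATH):
# r_version="$(Rscript -e 'cat(as.character(getRversion()))')"
# version_gt "$r_version" "4.0.0" || echo "R > 4.0.0 is required" >&2
# command -v python3 >/dev/null || echo "python3 not found" >&2
```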

In order to preprocess IDAT files, we use minfi, conumee and rascal, as well as some tidyverse packages. Missing dependencies will automatically be installed during the first run with Phyfum, so it may take longer than usual to run. You can also install them yourself with:

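# Install pacman if it is missing; the pacman:: prefix works even when the package was just installed and is not attached
if (!require("pacman")) install.packages("pacman")
pacman::p_load(optparse, cli, conumee, minfi, parallel, tibble, tidyr, dplyr, data.table, gtools)
pacman::p_load_gh("crukci-bioinformatics/rascal")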

Preparing the sample sheet / metadata

Phyfum relies on the array sample sheet for the complete workflow and on a custom metadata file for the trees workflow. In either case, the file must be comma-separated (.csv).

  • Sample sheet. When running the complete workflow, we recommend passing the array sample sheet. Custom columns can be added to supply parameters that the pipeline requires (sample age, age_at_diagnosis, etc.). To exclude a sample from the analysis, remove its row from the sample sheet.

    The pipeline will try to find how many "normal" or "control" samples exist to use them as controls for the CNV pipeline. You can provide the column name with the argument --sample-type-col. If no normals are found, this part of the pipeline will be skipped.

  • Custom metadata. A sample-wise file providing information such as sample age, patient, age_at_diagnosis, etc. It has no special requirements beyond being in CSV format. If the column names in your file differ from the defaults, use the arguments --patient-col, --sample-col and --age-col to identify them.

Both the custom metadata and the sample sheet are passed through --patientinfo.
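
A quick sanity check of the file passed through --patientinfo can catch missing columns before a long run. The sketch below is illustrative only: `has_columns` and `count_controls` are hypothetical helpers, the column names (patient, sample, age, group) are taken from the example commands above, and the exact-match rule for "normal"/"control" is an assumption about how such samples might be labeled, not Phyfum's documented behavior. It also assumes a plain CSV with no quoted fields containing commas.

```shell
# Hypothetical helper: succeed only if every named column is in the CSV header.
has_columns() {
    file="$1"; shift
    header="$(head -n 1 "$file" | tr -d '\r')"
    for col in "$@"; do
        case ",$header," in
            *",$col,"*) ;;   # column found, keep checking
            *) echo "missing column: $col" >&2; return 1 ;;
        esac
    done
}

# Hypothetical helper: count rows whose value in the given column is
# "normal" or "control" (case-insensitive). Prints 0 if the column is absent.
count_controls() {
    awk -F',' -v col="$2" '
        { sub(/\r$/, "") }   # tolerate Windows line endings
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        idx && tolower($idx) ~ /^(normal|control)$/ { n++ }
        END { print n + 0 }
    ' "$1"
}

# Usage:
# has_columns "beta_dir/meta.csv" patient age group && echo "columns OK"
# count_controls "epic_array_dir/sample_sheet.csv" group
```

If `count_controls` prints 0 for the column passed to --sample-type-col, the copy-number step described above would be expected to be skipped.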

License

phyfum is distributed under the terms of the CC-BY-NC-SA license.
