This package is a federated learning (FL) implementation of PyDESeq2, built with SubstraFL.
FedPyDESeq2
This repository contains the FedPyDESeq2 package, a Python package for federated differential expression analysis based on PyDESeq2 [2], which is itself a Python implementation of DESeq2 [1].
Setup
Package installation through PyPI
You can install the package from PyPI using the following command:
pip install fedpydeseq2
Package installation for developers
0 - Clone the repository
Start by cloning this repository:
git clone git@github.com:owkin/fedpydeseq2.git
1 - Create a conda environment with Python 3.10+
conda create -n fedpydeseq2 python=3.11 # or any compatible version (3.10+)
conda activate fedpydeseq2
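To double-check that the activated environment meets the Python 3.10+ requirement, a small standard-library script is enough. This is a minimal sketch; the helper name python_version_ok is hypothetical and not part of fedpydeseq2.

```python
import sys

# Minimum version stated in the setup instructions (Python 3.10+).
MIN_VERSION = (3, 10)


def python_version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_version_ok():
        print(f"Python {sys.version.split()[0]} is compatible.")
    else:
        sys.exit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required.")
```

Run it with the python executable of the fedpydeseq2 environment; it exits with an error message if the interpreter is too old.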
2 - Install poetry
Run:
conda install pip
pip install poetry==1.8.2
Then check the installation with poetry --version.
3 - Install the package and its dependencies using poetry
cd to the root of the repository and run
poetry install --with linting,testing
4 - Download the data to run the tests on
To download the data, cd to the root of the repository and run this command:
fedpydeseq2-download-data --raw_data_output_path data/raw
This creates a data/raw subdirectory containing all the necessary data. To store the raw data in a
different location, run this command instead:
fedpydeseq2-download-data --raw_data_output_path MY_RAW_PATH
Then create a file named paths.json in the tests directory containing:
- A raw_data field with the path to the raw data (MY_RAW_PATH).
- An optional assets_tcga field with the path to the directory containing the opener.py file and its description. By default, these are present in the fedpydeseq2_datasets module, so there is no need to specify this field unless you need to modify the opener.
- An optional processed_data field with the path to the directory where you want to save processed data. This is useful to run tests locally without reprocessing the data during each test session. Otherwise, the processed data is saved in a temporary file during each test session.
- An optional default_logging_config field with the path to the logging configuration used by default in tests. For more details on how logging works, please refer to the README in the logging folder.
- An optional workflow_logging_config field with the path to the logging configuration used in the logging tests.
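The paths.json file can also be generated with a short script run from the root of the repository. This is a minimal sketch, assuming each field simply maps to a path string; the helper write_paths_json is hypothetical, and only the field names come from the list above.

```python
import json
from pathlib import Path

# Optional fields listed above; raw_data is the only required one.
OPTIONAL_FIELDS = {
    "assets_tcga",
    "processed_data",
    "default_logging_config",
    "workflow_logging_config",
}


def write_paths_json(tests_dir, raw_data, **optional_fields):
    """Write <tests_dir>/paths.json and return its path.

    Assumption: every field is a plain path string. Optional fields set
    to None are left out so that the default behaviour applies.
    """
    unknown = set(optional_fields) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown paths.json fields: {sorted(unknown)}")

    config = {"raw_data": str(raw_data)}
    config.update(
        {key: str(value) for key, value in optional_fields.items() if value is not None}
    )

    path = Path(tests_dir) / "paths.json"
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path
```

For example, write_paths_json("tests", "MY_RAW_PATH", processed_data="MY_PROCESSED_PATH") writes the required field plus a processed data directory, where MY_PROCESSED_PATH is a placeholder for a directory of your choice. Rejecting unknown keys catches typos in field names early instead of letting the tests silently fall back to defaults.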
5 - Install pre-commit hooks
Still in the root of the repository, run
pre-commit install
You are now ready to contribute.
CI on a self-hosted runner
Tests are run using a self-hosted runner. To add a self-hosted runner, instantiate the machine
you want to use as a runner, go to the repository settings, then to the Actions tab, and click on
Add runner. Follow the instructions to install the runner on the machine you want
to use as a self-hosted runner.
Make sure to label the self-hosted runner with the label "fedpydeseq2-self-hosted" so that the CI workflow can find it.
Docker CI
The docker mode is only tested manually. To test it, first run poetry build
to create a wheel in the dist folder. Then run the following inside a tmux
session:
pytest -m "docker" -s
The -s option disables pytest's output capturing, so all logs and outputs are printed as they
are produced. Otherwise, these outputs appear only once the test is done. As the test takes a
long time, it is better to print them continuously.
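The two manual steps above can be chained so that the docker tests never start without a freshly built wheel. This is a minimal sketch, assuming poetry and pytest are on the PATH and the script is run from the root of the repository; run_docker_ci is a hypothetical helper, and its runner parameter only exists so the sequence can be checked without launching anything.

```python
import subprocess

# The two manual steps described above: build the wheel into dist/,
# then run only the tests marked "docker" with output capture disabled (-s).
DOCKER_CI_STEPS = [
    ["poetry", "build"],
    ["pytest", "-m", "docker", "-s"],
]


def run_docker_ci(steps=DOCKER_CI_STEPS, runner=subprocess.run):
    """Run each step in order, stopping at the first failure."""
    for cmd in steps:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so pytest never starts if the build fails.
        runner(cmd, check=True)
```

Calling run_docker_ci() inside the tmux session replaces typing the two commands by hand; since pytest inherits the terminal, the -s output still streams continuously.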
Running on a real Substra environment
Running the compute plan
To run a compute plan on an environment with the Substra front-end, you first need to generate a token in each of the
Substra nodes. Then duplicate credentials-template.yaml into a new file named credentials.yaml and fill in the
tokens. You should not need to rebuild the wheel manually by running poetry build, as the script will try to do it
for you, but watch out for related error messages when running the script.
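Duplicating the template can be scripted as well. This is a minimal sketch, assuming credentials.yaml should sit next to credentials-template.yaml (this section does not say where either file lives); create_credentials_file is a hypothetical helper. It only copies the file: the tokens still have to be filled in by hand.

```python
import shutil
from pathlib import Path


def create_credentials_file(template_path, target_name="credentials.yaml"):
    """Copy credentials-template.yaml to credentials.yaml in the same directory.

    Refuses to overwrite an existing credentials file so that tokens
    already filled in are not lost.
    """
    template = Path(template_path)
    # Assumption: the credentials file lives next to its template.
    target = template.with_name(target_name)
    if target.exists():
        raise FileExistsError(f"{target} already exists; edit it instead.")
    shutil.copyfile(template, target)
    return target
```

Refusing to overwrite is a deliberate choice: rerunning the helper by mistake should not wipe tokens that were already generated on each Substra node.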
Citing this work
@article{muzellec2024fedpydeseq2,
title={FedPyDESeq2: a federated framework for bulk RNA-seq differential expression analysis},
author={Muzellec, Boris and Marteau-Ferey, Ulysse and Marchand, Tanguy},
journal={bioRxiv},
pages={2024--12},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
References
[1] Love, M. I., Huber, W., & Anders, S. (2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome biology, 15(12), 1-21. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
[2] Muzellec, B., Teleńczuk, M., Cabeli, V., & Andreux, M. (2023). "PyDESeq2: a python package for bulk RNA-seq differential expression analysis." Bioinformatics, 39(9), btad547. https://academic.oup.com/bioinformatics/article/39/9/btad547/7260507
License
FedPyDESeq2 is released under an MIT license.