
This package is a SubstraFL federated learning (FL) implementation of PyDESeq2.


FedPyDESeq2

This repository contains the FedPyDESeq2 package, a Python package for federated differential expression analysis based on PyDESeq2, which is itself a Python implementation of DESeq2.

Setup

Package installation through PyPI

You can install the package from PyPI using the following command:

pip install fedpydeseq2
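
To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. The sketch below uses only the standard library. The helper name installed_version is hypothetical and is not part of FedPyDESeq2.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "fedpydeseq2") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not a FedPyDESeq2 function.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "fedpydeseq2 is not installed")
```

If the helper returns None, the package is not visible to the interpreter you are running, which usually means the wrong environment is active.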

Package installation for developers

0 - Clone the repository

Start by cloning this repository

git clone git@github.com:owkin/fedpydeseq2.git

1 - Create a conda environment with Python 3.10+

conda create -n fedpydeseq2 python=3.11 # or any compatible version
conda activate fedpydeseq2

2 - Install poetry

Run

conda install pip
pip install poetry==1.8.2

and test the installation with poetry --version.

3 - Install the package and its dependencies using poetry

cd to the root of the repository and run

poetry install --with linting,testing

4 - Download the data to run the tests on

To download the data, cd to the root of the repository and run this command:

fedpydeseq2-download-data --raw_data_output_path data/raw

This creates a data/raw subdirectory at the root of the repository containing all the necessary data. To store the raw data elsewhere, run this command instead:

fedpydeseq2-download-data --raw_data_output_path MY_RAW_PATH

Then create a file named paths.json in the tests directory containing:

  • A raw_data field with the path to the raw data MY_RAW_PATH
  • An optional assets_tcga field with the path to the directory containing the opener.py file and its description (by default present in the fedpydeseq2_datasets module, so no need to specify this unless you need to modify the opener).
  • An optional processed_data field with the path to the directory where you want to save processed data. This is used if you want to run tests locally without reprocessing the data during each test session. Otherwise, the processed data will be saved in a temporary file during each test session.
  • An optional default_logging_config field with the path to the logging configuration used by default in tests. For more details on how logging works, please refer to the README in the logging folder.
  • An optional workflow_logging_config field with the path to the logging configuration used in the logging tests.
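
The paths.json file described above can also be generated from Python. The sketch below is a minimal illustration: the field names raw_data and processed_data come from the list above, while the helper name write_paths_json and the example paths are assumptions. The other optional fields can be added to the dictionary in the same way.

```python
import json
from pathlib import Path


def write_paths_json(tests_dir, raw_data, processed_data=None):
    """Write a paths.json file into tests_dir and return its path.

    `write_paths_json` is an illustrative helper, not part of FedPyDESeq2.
    Only the required raw_data field is always written; processed_data is
    added when a value is given.
    """
    config = {"raw_data": str(raw_data)}
    if processed_data is not None:
        config["processed_data"] = str(processed_data)
    target = Path(tests_dir) / "paths.json"
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target
```

For instance, calling write_paths_json("tests", "MY_RAW_PATH") from the root of the repository would write a paths.json with a single raw_data field.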

5 - Install pre-commit hooks

Still in the root of the repository, run

pre-commit install

You are now ready to contribute.

CI on a self-hosted runner

Tests are run using a self-hosted runner. To add a self-hosted runner, instantiate the machine you want to use as a runner, go to the repository settings, then to the Actions tab, and click on Add runner. Follow the instructions to install the runner on the machine you want to use as a self-hosted runner.

Make sure to label the self-hosted runner with the label "fedpydeseq2-self-hosted" so that the CI workflow can find it.

Docker CI

The docker mode is only tested manually. To test it, first run poetry build to create a wheel in the dist folder. Then launch the following in a tmux session:

pytest -m "docker" -s

The -s option prints all logs and outputs continuously. Without it, these outputs appear only once the test is done. As the test takes time, it is better to print them continuously.
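
Because the docker test takes time, it can help to check that poetry build actually produced a wheel before launching it. The sketch below lists the wheels in the dist folder. The helper name find_wheels is hypothetical, and the assumption that the test needs at least one wheel in dist is inferred from the instructions above rather than stated by the project.

```python
from pathlib import Path


def find_wheels(dist_dir="dist"):
    """Return the sorted list of .whl files found directly in dist_dir.

    `find_wheels` is an illustrative helper, not part of FedPyDESeq2.
    """
    return sorted(Path(dist_dir).glob("*.whl"))


if __name__ == "__main__":
    wheels = find_wheels()
    if wheels:
        print("Found wheel(s):", ", ".join(w.name for w in wheels))
    else:
        print("No wheel found in dist; run `poetry build` first.")
```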

Running on a real Substra environment

Running the compute plan

To run a compute plan on an environment with the Substra front-end, first generate a token in each of the Substra nodes. Then duplicate credentials-template.yaml into a new file named credentials.yaml and fill in the tokens. You should not need to rebuild the wheel manually by running poetry build, as the script will try to do it for you, but watch out for related error messages when executing the file.
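
The duplication step can be scripted so that an existing credentials.yaml is never overwritten. The sketch below is a minimal illustration: the two file names come from the paragraph above, while the helper name init_credentials and the directory argument are assumptions, since the location of the template is not specified here. The tokens still have to be filled in by hand afterwards.

```python
import shutil
from pathlib import Path


def init_credentials(directory,
                     template="credentials-template.yaml",
                     target="credentials.yaml"):
    """Copy the template to the target file unless the target already exists.

    `init_credentials` is an illustrative helper, not part of FedPyDESeq2.
    `directory` is the folder holding the template (an assumption: adjust it
    to wherever the template lives in your checkout).
    """
    src = Path(directory) / template
    dst = Path(directory) / target
    if dst.exists():
        # Keep tokens that were already filled in.
        return dst
    shutil.copyfile(src, dst)
    return dst
```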

Citing this work

@article{muzellec2024fedpydeseq2,
  title={FedPyDESeq2: a federated framework for bulk RNA-seq differential expression analysis},
  author={Muzellec, Boris and Marteau-Ferey, Ulysse and Marchand, Tanguy},
  journal={bioRxiv},
  pages={2024--12},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

References

[1] Love, M. I., Huber, W., & Anders, S. (2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome biology, 15(12), 1-21. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8

[2] Muzellec, B., Teleńczuk, M., Cabeli, V., & Andreux, M. (2023). "PyDESeq2: a python package for bulk RNA-seq differential expression analysis." Bioinformatics, 39(9), btad547. https://academic.oup.com/bioinformatics/article/39/9/btad547/7260507

License

FedPyDESeq2 is released under an MIT license.
