A package for single-cell genomics analysis with scGen

FedscGen: privacy-preserving federated batch effect correction of single-cell RNA sequencing data

FedscGen is a federated learning framework for privacy-aware batch effect correction in single-cell RNA sequencing (scRNA-seq) data. It enables multiple institutions to collaboratively train a shared variational autoencoder (VAE) model without exchanging raw data. Each site trains a local model and securely shares only model parameters with a central coordinator, which aggregates them to update the global model. After training, the shared model is used to extract latent representations of cells across sites. For each shared cell type, dominant batches are identified, and corresponding mean latent features are calculated and aggregated in a privacy-preserving manner. These latent shifts are then used to locally correct batch effects, allowing new or existing clients to harmonize their datasets while maintaining full control over their data.
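The two core steps described above, parameter aggregation and latent-shift correction, can be illustrated with a minimal NumPy sketch. This is not the FedscGen implementation: the function names (`federated_average`, `dominant_means`, `correct_latent`) are hypothetical, each client is assumed to hold exactly one batch, the "dominant" batch is assumed to be the one with the most cells of a given type, and the aggregation is done in the clear rather than with a privacy-preserving protocol.

```python
import numpy as np


def federated_average(client_params, client_sizes):
    """Weighted average of per-client parameter dicts (FedAvg-style).

    Assumption: each client reports a dict of NumPy arrays with identical
    keys and shapes, and is weighted by its number of local samples.
    """
    total = float(sum(client_sizes))
    return {
        name: sum(p[name] * (n / total) for p, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }


def dominant_means(client_stats):
    """Pick, per cell type, the mean latent vector of the largest batch.

    client_stats: one dict per client, {cell_type: (n_cells, mean_latent)}.
    Assumption: "dominant" means "most cells of that type"; ties keep the
    first client seen. A real deployment would aggregate these securely.
    """
    best = {}
    for stats in client_stats:
        for cell_type, (n_cells, mean) in stats.items():
            if cell_type not in best or n_cells > best[cell_type][0]:
                best[cell_type] = (n_cells, mean)
    return {cell_type: mean for cell_type, (_, mean) in best.items()}


def correct_latent(latent, cell_types, targets):
    """Shift each local cell type so its latent mean matches the target."""
    corrected = latent.copy()
    for cell_type, target in targets.items():
        mask = cell_types == cell_type
        if mask.any():
            # Latent shift: dominant-batch mean minus local mean.
            corrected[mask] += target - latent[mask].mean(axis=0)
    return corrected
```

In this sketch only summary statistics (parameter arrays, per-cell-type counts, and mean vectors) leave a client; the per-cell latent matrix passed to `correct_latent` stays local.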

[Figure: FedscGen scheme]

🔧 Setup Environment

To reproduce the results of the paper, please follow the instructions to create two Conda environments:

  1. fedscgen: Python environment for FedscGen.
  2. r_eval: R + Python environment for benchmarking.

Set up a Python environment for FedscGen:

conda env create -f environment.yml
conda activate fedscgen
export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True
pip install crypten==0.4.1

Set up an R environment for running kBET and LISI for evaluation:

conda env create -f r_eval.yml
conda activate r_eval
Rscript install_libraries.R

Dataset and Models

For reproducibility, please ensure the preprocessed datasets are downloaded and extracted to the data/datasets directory. Optionally, the initial models can also be downloaded to the models/ directory.

  • The initial PyTorch models are available at DOI

  • All preprocessed datasets used in the paper are available at DOI

📊 Reproduce Results

All models are initialized using a fixed seed for reproducibility.
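For readers unfamiliar with seeding, the following sketch shows one common way to fix the random state in a Python and PyTorch project. It is an illustration only: the helper name `set_seed` and the default value `42` are assumptions, not the seed or code used by FedscGen, and the PyTorch calls are skipped if `torch` is not installed.

```python
import random

import numpy as np


def set_seed(seed: int = 42) -> None:
    """Seed the Python, NumPy, and (if available) PyTorch generators.

    The default of 42 is a placeholder, not the value used in the paper.
    """
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; nothing more to seed.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # Safe to call without a GPU.


set_seed(0)
first = (random.random(), float(np.random.rand()))
set_seed(0)
second = (random.random(), float(np.random.rand()))
# Re-seeding reproduces the same draws.
assert first == second
```

Seeding alone does not guarantee bit-identical results on GPUs, since some CUDA kernels are non-deterministic.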

Navigate to the experiments/ directory and run experiment.sh, passing a comma-separated list of the GPU indices to use for training. For example, to use GPUs 0 and 1:

conda activate fedscgen
cd experiments
chmod +x experiment.sh
./experiment.sh 0,1
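How a comma-separated GPU argument such as `0,1` might be consumed can be sketched as follows. This is a guess at the general pattern, not a description of what experiment.sh does: the helpers `parse_gpu_list` and `assign_jobs` and the job names are hypothetical, and round-robin assignment is only one possible scheduling choice.

```python
from itertools import cycle


def parse_gpu_list(arg: str) -> list[int]:
    """Turn a string like "0,1" into [0, 1], ignoring empty tokens."""
    return [int(token) for token in arg.split(",") if token.strip()]


def assign_jobs(jobs, gpus):
    """Pair each job with a GPU index in round-robin order."""
    if not gpus:
        raise ValueError("at least one GPU index is required")
    return list(zip(jobs, cycle(gpus)))


gpus = parse_gpu_list("0,1")
# Hypothetical job names, used only for illustration.
plan = assign_jobs(["dataset_a", "dataset_b", "dataset_c"], gpus)
for job, gpu in plan:
    print(f"{job} -> GPU {gpu}")
```

A launcher built this way would typically expose the chosen index to each training process, for example through the `CUDA_VISIBLE_DEVICES` environment variable.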

Once the experiments are complete, compute the evaluation metrics by navigating to the metrics/ directory and running evaluation.sh:

conda activate r_eval
cd metrics
chmod +x evaluation.sh
./evaluation.sh

All results are saved in the results/ directory. Both scripts automatically use all available system resources.

FeatureCloud app

FedscGen is implemented for real-world federated collaboration as a FeatureCloud app with automated deployment.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
