wenda_gpu: fast domain adaptation for genomic data

Overview

Domain adaptation makes it possible to build predictive models when the data of interest (target data) has few or no labels, by training predictors on labeled data from a similar distribution (source data). This repo is a fast implementation of one domain adaptation method: weighted elastic net domain adaptation, or wenda. Wenda leverages the complex interactions between biological features (such as genes) to optimize a model's predictive power on both source and target datasets.

Installation

This package can be installed using pip:

pip install wenda_gpu

Alternatively, you can install the latest development version directly from this GitHub repository:

pip install git+https://github.com/greenelab/wenda_gpu

Usage

The most basic usage of wenda_gpu is:

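from wenda_gpu import wenda_gpu as wg

# Load source and target feature data for the "sample" dataset
source_data, target_data = wg.load_data(prefix="sample")
# Normalize the source and target data
source_data_normed, target_data_normed = wg.normalize_data(source_data, target_data)
# Train the feature models (written under feature_models/ and confidences/)
wg.train_feature_models(source_data_normed, target_data_normed, prefix="sample")
# Load the labels for the source data
source_y = wg.load_labels(prefix="sample")
# Train the elastic net (target predictions are written under output/)
wg.train_elastic_net(source_data_normed, source_y, target_data_normed, prefix="sample")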

For a step-by-step tutorial on running wenda_gpu, consult wenda_gpu_quick_usage.ipynb in the example folder.
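
The calls above read their inputs from files on disk, so a first test run needs something in data/sample/. As a minimal sketch, the following writes a tiny random dataset using the three file names listed in the directory structure in this README. The layout inside each file (tab-separated, one sample per row, no header row, 0/1 labels) is an assumption made for illustration and is not confirmed here, and `write_toy_inputs` is a hypothetical helper, not part of wenda_gpu. Check wenda_gpu_quick_usage.ipynb for the format the package actually expects.

```python
import csv
import random
from pathlib import Path


def write_toy_inputs(prefix, n_source=20, n_target=10, n_features=5,
                     root=".", seed=0):
    """Write toy source/target files under <root>/data/<prefix>/.

    Hypothetical helper: the file layout (tab-separated, one sample per
    row, no header row, 0/1 labels) is an assumption, not the documented
    wenda_gpu format.
    """
    rng = random.Random(seed)
    data_dir = Path(root) / "data" / prefix
    data_dir.mkdir(parents=True, exist_ok=True)

    def random_rows(n_rows):
        # Each row is one sample with n_features Gaussian values.
        return [[round(rng.gauss(0, 1), 4) for _ in range(n_features)]
                for _ in range(n_rows)]

    def write_tsv(path, rows):
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerows(rows)

    write_tsv(data_dir / "source_data.tsv", random_rows(n_source))
    write_tsv(data_dir / "target_data.tsv", random_rows(n_target))
    # One label per source sample; the target data stays unlabeled.
    write_tsv(data_dir / "source_y.tsv",
              [[rng.randint(0, 1)] for _ in range(n_source)])
    return data_dir
```

Calling `write_toy_inputs("sample")` from the working directory would create data/sample/ with the three input files, after which the basic usage above can be tried with prefix="sample".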

Directory structure

By default, wenda_gpu implements the following structure in your working directory:

working_directory
    ├── data
    │   └── prefix
    │       ├── source_data.tsv
    │       ├── source_y.tsv
    │       └── target_data.tsv
    ├── feature_models
    │   └── prefix
    │       ├── model_0.pth
    │       ├── model_1.pth
    │       └── ...
    ├── confidences
    │   └── prefix
    │       ├── confidences.tsv
    │       ├── model_0_confidence.txt
    │       ├── model_1_confidence.txt
    │       └── ...   
    └── output
        └── prefix
            ├── k_00
            │   ├── target_predictions.txt
            │   └── target_probabilities.txt
            ├── k_01
            │   ├── target_predictions.txt
            │   └── target_probabilities.txt
            └── ...

"prefix" is intended to be a unique identifier for your dataset, which allows you to run wenda_gpu on multiple datasets and have them nested within the same directory structure.
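
Because every run is nested under its prefix, the results for one dataset can be gathered by walking output/prefix. The sketch below collects the contents of target_predictions.txt from each k_XX subdirectory. It assumes one prediction per line, which this README does not specify, and `collect_predictions` is a hypothetical helper rather than part of wenda_gpu.

```python
from pathlib import Path


def collect_predictions(prefix, output_root="output"):
    """Map each k_XX directory name to its list of prediction strings.

    Hypothetical helper; assumes target_predictions.txt holds one
    prediction per line (an assumption, not documented here).
    """
    prefix_dir = Path(output_root) / prefix
    predictions = {}
    for k_dir in sorted(prefix_dir.glob("k_*")):
        pred_file = k_dir / "target_predictions.txt"
        if pred_file.is_file():
            # Keep values as strings: the label type is not specified.
            predictions[k_dir.name] = [
                line.strip()
                for line in pred_file.read_text().splitlines()
                if line.strip()
            ]
    return predictions
```

For example, `collect_predictions("sample")` would return a dictionary keyed by "k_00", "k_01", and so on, skipping any k_XX directory that has no predictions file yet.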

You need to create the files under the data directory yourself. They contain the feature information for the source and target datasets and the labels for the source data. Data can also be loaded from a different source; for an example, consult wenda_gpu_quick_usage.ipynb. The files under the feature_models, confidences, and output directories are created automatically by wenda_gpu. To store intermediate files and output somewhere other than your working directory, specify your own paths using the path arguments of the relevant functions, e.g.

wg.train_feature_models(source_data_normed, target_data_normed, prefix="sample",
                        feature_model_path="~/wenda_gpu_run/feature_models",
                        confidence_path="~/wenda_gpu_run/confidences")
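
One caveat with paths like the ones above: Python's built-in file functions do not expand "~" on their own, and this README does not say whether wenda_gpu does so internally. A cautious sketch is to resolve the paths before passing them in. `resolve_run_dirs` is a hypothetical helper; only the `feature_model_path` and `confidence_path` argument names come from the example above.

```python
from pathlib import Path


def resolve_run_dirs(base="~/wenda_gpu_run"):
    """Return feature-model and confidence paths under base, with "~" expanded.

    Hypothetical helper, not part of wenda_gpu. It only builds the path
    strings and does not create any directories.
    """
    base_dir = Path(base).expanduser()
    return {
        "feature_model_path": str(base_dir / "feature_models"),
        "confidence_path": str(base_dir / "confidences"),
    }


# The dictionary keys match the keyword arguments shown above, so the
# result can be unpacked into the call (commented out here because it
# needs wenda_gpu and loaded data):
# paths = resolve_run_dirs()
# wg.train_feature_models(source_data_normed, target_data_normed,
#                         prefix="sample", **paths)
```

Returning a dictionary keeps the two locations together, so a single base directory controls where both kinds of intermediate files go.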

Helpful links

Example usage of this software, along with results, is available at https://github.com/greenelab/wenda_gpu_paper. The original paper on wenda is available at https://academic.oup.com/bioinformatics/article/35/14/i154/5529259.

Citation

If you use this method, please cite the following:

Ariel A. Hippen, Jake Crawford, Jacob R. Gardner, Casey S. Greene. wenda_gpu: fast domain adaptation for genomic data. bioRxiv 2022.04.09.487671; doi: https://doi.org/10.1101/2022.04.09.487671
