selene-sdk

framework for developing sequence-level deep learning networks

Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

Please see our release notes for the latest updates to Selene.

Installation

We recommend using Selene with Python 3.9 or above. Installation typically takes 2-3 minutes, and under 10 minutes in any case, with any of the methods below (conda, pip, or source).

First, install PyTorch. If you have an NVIDIA GPU, install a version of PyTorch that supports it, since Selene will run much faster with a discrete GPU. The library is currently compatible with PyTorch versions between 1.0.0 and 2.3.1. We will continue to update Selene to be compatible with the latest version of PyTorch.
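To confirm that an installed PyTorch falls inside the range stated above, a small check like the following may help. It is a minimal sketch: the helper names `parse_version` and `is_supported_torch` are ours and not part of Selene, and we assume both bounds are inclusive.

```python
import re

# Bounds copied from the compatibility statement above.
# Treating both ends as inclusive is our assumption.
MIN_TORCH = (1, 0, 0)
MAX_TORCH = (2, 3, 1)


def parse_version(version):
    """Turn '2.1.0+cu118' into (2, 1, 0); local build tags are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_supported_torch(version):
    """Hypothetical helper: True if `version` lies in the documented range."""
    return MIN_TORCH <= parse_version(version) <= MAX_TORCH


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        version = str(torch.__version__)
        print(version, "supported:", is_supported_torch(version))
```

Running the script prints the installed version and whether it is in range; if PyTorch is missing, it says so instead of failing.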

Installing selene with Anaconda (for Linux):

conda install -c bioconda selene-sdk

Installing selene with pip:

pip install selene-sdk

Note that we do not recommend pip-installing older versions of Selene (below 0.4.0), as these releases were less stable.

We currently only have a source distribution available for pip-installation.

Installing selene from source:

First, download the latest commits from the source repository (or download the latest tagged version of Selene for a stable release):

git clone https://github.com/FunctionLab/selene.git

The setup.py script requires NumPy, Cython, and setuptools. Please make sure you have these already installed.
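One way to check these prerequisites before running setup.py is to ask Python whether each module can be found. This is a minimal sketch using only the standard library; the function name `missing_modules` is hypothetical, and the module names are the import names of the three packages listed above.

```python
import importlib.util

# Import names of the build-time requirements named in the paragraph above.
BUILD_REQUIREMENTS = ("numpy", "Cython", "setuptools")


def missing_modules(names=BUILD_REQUIREMENTS):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Install before running setup.py:", ", ".join(missing))
    else:
        print("All build requirements are importable.")
```

The check only locates the modules; it does not verify that their versions match the ones pinned in the environment YAML files.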

If you plan on working in the selene repository directly, we recommend setting up a conda environment using selene-cpu.yml or selene-gpu.yml (if CUDA is enabled on your machine) and activating it. These environment YAML files list specific versions of package dependencies that we have used in the past to test Selene.

Selene contains some Cython files. You can build these by running

python setup.py build_ext --inplace

If you would like to locally install Selene, you can run

python setup.py install

About Selene

Selene is composed of a command-line interface and an API (the selene-sdk Python package). Users supply their data, model architecture, and configuration parameters, and Selene runs the user-specified operations (training, evaluation, prediction) for that sequence-based model.

For a more detailed overview of the components in the Selene software development kit (SDK), please consult the page here.

Help

Please post bugs or feature requests to our GitHub issues.

Join our Google group if you have questions about the package, case studies, or model development.

Documentation

The documentation for Selene is available here. If you are interested in running Selene through the command-line interface (CLI), this document describes how the configuration file format (used by the CLI) works and details all the possible configuration parameters you may need to build your own configuration file.

Important: The tutorials and manuscript examples were originally run on Selene version 0.1.3, and later on Selene 0.2.0 (PyTorch version 0.4.1). Selene has since been updated substantially, and files such as selene-gpu.yml specify PyTorch version 1.0.0. Please note that models created with an older version of PyTorch (such as those downloadable with the manuscript case studies) are NOT compatible with newer versions of PyTorch. If you run into errors loading trained model weights files, the likely cause is a difference in PyTorch or CUDA toolkit versions.

For now, the easiest starting point is the API documentation linked above, together with more recent uses of Selene in related papers (e.g. the Sei framework code).

Examples

We provide two sets of examples: Jupyter notebook tutorials and the case studies described in our manuscript. The Jupyter notebooks are more accessible in that they can be easily perused and run on a laptop. We also use them to show how Selene can be used through the CLI (via configuration files) as well as through the API. Finally, the notebooks are particularly useful for demonstrating the visualization components that Selene contains. The API and the visualization functions are much less emphasized in the manuscript's case studies.

In the case studies, we demonstrate more complex use cases (e.g. training on much larger datasets) that we could not present in a Jupyter notebook. Further, we show how you can use the outputs of variant effect prediction in a subsequent statistical analysis (case 3). These examples reflect how we most often use Selene in our own projects, whereas the Jupyter notebooks survey the many different ways and contexts in which we can use Selene.

We recommend that the examples be run on a machine with a CUDA-enabled GPU. All examples take significantly longer when run on a CPU machine. (See the following sections for time estimates.)
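Before starting a long-running example, it may be worth checking whether PyTorch can actually see a CUDA device. The sketch below assumes nothing about Selene itself; `preferred_device` is a hypothetical helper, and `torch.cuda.is_available()` is the standard PyTorch call for this check.

```python
def preferred_device():
    """Return 'cuda' when PyTorch sees a CUDA GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch nothing can run on a GPU anyway.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = preferred_device()
    print("device:", device)
    if device == "cpu":
        print("Expect the examples to take significantly longer.")
```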

Tutorials

Tutorials for Selene are available here.

It is possible to run the tutorials (Jupyter notebook examples) on a standard CPU machine. The training examples need roughly 2-3 days to run to completion on CPU, so do not expect them to finish unless you can leave them running that long. You can also change the training parameters (e.g. total number of steps) so that they complete much sooner.
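As an illustration of shortening a training run, the sketch below rewrites the step count in the text of a configuration file. It assumes the configuration contains a `max_steps:` entry, which is the parameter name we believe the training configuration uses; verify it against your own file. The helper `with_max_steps` and the sample fragment are hypothetical.

```python
import re


def with_max_steps(config_text, new_steps):
    """Return `config_text` with every `max_steps: <int>` value replaced."""
    pattern = re.compile(r"^(\s*max_steps\s*:\s*)\d+", flags=re.MULTILINE)
    updated, count = pattern.subn(rf"\g<1>{new_steps}", config_text)
    if count == 0:
        raise ValueError("no max_steps entry found in the configuration")
    return updated


# Hypothetical fragment, for illustration only; real configs have more keys.
example = "train_model:\n    batch_size: 64\n    max_steps: 80000\n"
print(with_max_steps(example, 2000))
```

Editing the text rather than parsing the YAML keeps any custom tags in the file untouched. The trade-off is that the pattern only matches a `max_steps` key at the start of a line.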

The non-training examples (variant effect prediction, in silico mutagenesis) run fairly quickly: variant effect prediction might take 20-30 minutes, and in silico mutagenesis 10-15 minutes.
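For readers new to the term, in silico mutagenesis starts by enumerating every single-base substitution of an input sequence, after which a trained model scores each mutant. The sketch below shows only the enumeration step in plain Python. It is a conceptual illustration and not Selene's implementation, and `single_base_mutants` is a hypothetical name.

```python
BASES = "ACGT"


def single_base_mutants(sequence):
    """Yield (position, ref, alt, mutated_sequence) for each substitution."""
    sequence = sequence.upper()
    for pos, ref in enumerate(sequence):
        for alt in BASES:
            if alt != ref:
                # Replace exactly one base and leave the rest unchanged.
                yield pos, ref, alt, sequence[:pos] + alt + sequence[pos + 1:]


for pos, ref, alt, mutated in single_base_mutants("ACG"):
    print(pos, f"{ref}>{alt}", mutated)
```

A sequence of length L made up of A, C, G, and T yields 3 x L mutants, which is one reason this analysis is much cheaper than training.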

Please see the README in the tutorials directory for links to and descriptions of the specific tutorials.

Manuscript case studies

The code to reproduce case studies in the manuscript is available here.

Each case has its own directory and README describing how to run these cases. We recommend consulting the step-by-step breakdown of each case study that we provide in the methods section of the manuscript as well.

The manuscript examples were only tested on GPU. Our GPU (NVIDIA Tesla V100) time estimates:

Case study 1 finishes in about 1.5 days on a GPU node.
Case study 2 takes 6-7 days to run training (with the work distributed across 4 V100s) and evaluation.
Case study 3 (variant effect prediction) takes about 1 day to run.

The case studies in the manuscript focus on developing deep learning models for classification tasks. Selene does support training and evaluating sequence-based regression models, and we have provided a tutorial to demonstrate this.

Release history

0.6.0 (this version): Dec 16, 2024
0.5.3: Jul 9, 2024
0.5.2: Jul 9, 2024
0.5.1: Nov 22, 2021
0.5.0: Jul 30, 2021
0.4.8: May 10, 2020
0.4.7: Apr 28, 2020
0.4.6: Apr 24, 2020
0.4.5: Feb 25, 2020
0.4.4: Nov 19, 2019
0.4.3: Nov 11, 2019
0.4.2: Sep 23, 2019
0.4.1: Jul 30, 2019
0.4.0: Jul 15, 2019
0.1.3: Oct 4, 2018
0.1.2: Sep 25, 2018
0.1.1: Sep 24, 2018
0.1.0: Sep 7, 2018
0.0.1: Aug 6, 2018
0.0.0: Jul 24, 2018

Download files

Source Distribution

selene_sdk-0.6.0.tar.gz (1.5 MB)

Uploaded Dec 16, 2024 Source

File details

Details for the file selene_sdk-0.6.0.tar.gz.

File metadata

Download URL: selene_sdk-0.6.0.tar.gz
Upload date: Dec 16, 2024
Size: 1.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.6

File hashes

Hashes for selene_sdk-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`d758ac7adaeb6147388aeb7f15a618b261400c56890a074ab60421c2536d9e80`
MD5	`129c9536d3a74b03d854eef1be416189`
BLAKE2b-256	`8215b83d3c925c15d5b3801062a4d717f91360c7969e65f4ffa80d84c9832d16`
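A downloaded archive can be checked against the SHA256 digest listed above with the standard library alone. This is a minimal sketch; the helper names `sha256_of` and `matches_expected` are ours, and the archive path passed to them is whatever location you saved the file to.

```python
import hashlib

# SHA256 digest listed above for selene_sdk-0.6.0.tar.gz.
EXPECTED_SHA256 = "d758ac7adaeb6147388aeb7f15a618b261400c56890a074ab60421c2536d9e80"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("selene_sdk-0.6.0.tar.gz")` should return True for an intact download saved in the current directory.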
