llm2geneset
This project combines LLMs, gene set generation, and overrepresentation analysis to power analysis of RNA-seq, scRNA-seq, and proteomics data sets.
llm2geneset is similar to popular tools such as Enrichr, clusterProfiler, or DAVID, but it uses LLMs to propose gene set descriptions and the gene sets themselves. The generated gene sets can also be used in tools such as fGSEA and GSEApy.
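As a rough sketch of how a generated gene set might be handed to tools such as fGSEA or GSEApy, the snippet below writes gene sets in the tab-delimited GMT layout (set name, description, then one gene per column). The helper names `to_gmt_line` and `write_gmt` and the example description text are hypothetical and are not part of llm2geneset.

```python
def to_gmt_line(name, description, genes):
    """Format one gene set as a GMT line: name, description, then the genes."""
    return "\t".join([name, description] + list(genes))


def write_gmt(path, gene_sets):
    """Write a dict of {name: (description, genes)} to a GMT file."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, (description, genes) in gene_sets.items():
            handle.write(to_gmt_line(name, description, genes) + "\n")


if __name__ == "__main__":
    # Hypothetical example set; the gene symbols are for illustration only.
    sets = {"Antigen presentation": ("LLM-generated", ["HLA-A", "HLA-B", "HLA-C"])}
    write_gmt("llm_gene_sets.gmt", sets)
```

The resulting file can then be passed to whichever GMT reader the downstream tool provides.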
If you have an OpenAI API key, you can try out the web application at https://llm2geneset.streamlit.app.
Pre-print
A bioRxiv pre-print describing this work is available here:
Enhancing Gene Set Overrepresentation Analysis with Large Language Models
Jiqing Zhu, Rebecca Y. Wang, Xiaoting Wang, Ricardo Azevedo, Alex Moreno, Julia A. Kuhn, Zia Khan. doi: https://doi.org/10.1101/2024.11.11.621189
OpenAI API Key Setup
Please read OpenAI's best practices for API key safety and make sure your OpenAI API key is set up in environment variables, e.g.:
export OPENAI_API_ORG="org-XXXX"
export OPENAI_API_KEY="XXXXX"
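Before making any API calls, it can help to confirm that both variables are visible to Python. The following is a minimal sketch; the variable names are the ones exported above, and the helper name `missing_openai_env` is hypothetical.

```python
import os


def missing_openai_env(environ=os.environ):
    """Return the names of the expected OpenAI variables that are unset or empty."""
    required = ("OPENAI_API_KEY", "OPENAI_API_ORG")
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_openai_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("OpenAI environment variables are set.")
```

The check only reports variable names and never prints the key itself.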
Installation using pixi
An environment to run llm2geneset can be configured using pixi. Once pixi is installed, run pixi shell in the llm2geneset directory:
cd llm2geneset
pixi shell
Installing as a PyPI package
You can also install the package using pip, as it is available on PyPI: https://pypi.org/project/llm2geneset
pip install llm2geneset
Usage
You can use the package in a script as follows after running pixi shell.
import asyncio

import openai
import llm2geneset


async def main():
    aclient = openai.AsyncClient()
    # Ask the LLM for genes matching a natural-language description.
    genes = await llm2geneset.get_genes(aclient, "Antigen presentation")
    print(','.join(genes['parsed_genes']))
    # Propose gene sets for the parsed gene list and run overrepresentation analysis.
    res = await llm2geneset.gs_proposal(
        aclient, genes['parsed_genes'], n_pathways=5,
        n_background=19846)
    print(res['ora_results'])


if __name__ == "__main__":
    asyncio.run(main())
# Output:
# HLA-A,HLA-B,HLA-C,HLA-DRA,HLA-DRB1,HLA-DRB3,HLA-DRB4,...
# set_descr generatio bgratio richFactor foldEnrich
# 1 Antigen processing and presentation via MHC cl... 0.500000 0.001209 0.625000 413.458333
# ...
See also notebooks/simple_example.ipynb for an example of how to use the package within a Jupyter notebook.
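To make the columns in the sample output easier to read, here is a small sketch of how such overrepresentation statistics are conventionally computed. The definitions follow common ORA usage and are an assumption about how llm2geneset fills these columns. The counts (15 overlapping genes, 30 query genes, a proposed set of 24 genes) are illustrative values chosen to be consistent with the sample row, not numbers taken from the package. The function name `ora_metrics` is hypothetical.

```python
from math import comb


def ora_metrics(overlap, n_query, set_size, n_background):
    """Compute conventional ORA ratios and a hypergeometric upper-tail p-value."""
    generatio = overlap / n_query        # fraction of query genes in the set
    bgratio = set_size / n_background    # fraction of background genes in the set
    rich_factor = overlap / set_size     # fraction of the set recovered by the query
    fold_enrich = generatio / bgratio    # observed over expected overlap
    # P(X >= overlap) when drawing n_query genes without replacement.
    total = comb(n_background, n_query)
    tail = sum(
        comb(set_size, k) * comb(n_background - set_size, n_query - k)
        for k in range(overlap, min(set_size, n_query) + 1)
    )
    return {
        "generatio": generatio,
        "bgratio": bgratio,
        "richFactor": rich_factor,
        "foldEnrich": fold_enrich,
        "p_value": tail / total,
    }


if __name__ == "__main__":
    # Assumed counts; 19846 is the n_background value used in the script above.
    for key, value in ora_metrics(15, 30, 24, 19846).items():
        print(key, value)
```

With these assumed counts the four ratios match the sample row: 0.5, about 0.001209, 0.625, and about 413.458.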
Webapp Interface
Streamlit is included in the llm2geneset environment. You can run the webapp interface as follows.
pixi shell
streamlit run webapp/app.py
PyPI deployment
Bump the version in setup.cfg and run the following, according to the instructions for pyscaffold-markdown:
pixi shell
tox -e docs # to build documentation
tox -e build -- --wheel # to build the package distribution w/o source
tox -e publish # to test project uploads correctly in test.pypi.org
tox -e publish -- --repository pypi # release package to PyPI
Note
This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.