llm2geneset

This project combines LLMs, gene set generation, and overrepresentation analysis to power analysis of RNA-seq, scRNA-seq, and proteomics data sets.

llm2geneset is similar to popular tools such as Enrichr, clusterProfiler, or DAVID, but it uses LLMs to propose both the gene set descriptions and the gene sets themselves. The generated gene sets can also be used in tools such as fGSEA and GSEApy.
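
One way to hand generated gene sets to fGSEA or GSEApy is through a GMT file, the tab-separated format both tools can read (one gene set per line: name, description, then the genes). The sketch below is only an illustration: write_gmt is a hypothetical helper, not part of llm2geneset, and the description column is filled with a placeholder string.

```python
import io


def write_gmt(gene_sets, handle, source="llm2geneset"):
    """Write {set name: [genes]} as GMT lines: name<TAB>description<TAB>genes...

    Hypothetical helper, not part of the llm2geneset API.
    """
    for name, genes in gene_sets.items():
        # GMT has no quoting, so collapse tabs/newlines inside the name.
        clean_name = " ".join(name.split())
        # The second column is a free-text description; a placeholder is used here.
        handle.write("\t".join([clean_name, source, *genes]) + "\n")


buf = io.StringIO()
write_gmt({"Antigen presentation": ["HLA-A", "HLA-B", "HLA-C"]}, buf)
print(buf.getvalue(), end="")
```

To write to disk instead, pass an open file handle, for example `with open("sets.gmt", "w") as fh: write_gmt(sets, fh)`.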

If you have an OpenAI API key, you can try out the web application at https://llm2geneset.streamlit.app.

Pre-print

A bioRxiv pre-print describing this work is available here:

Enhancing Gene Set Overrepresentation Analysis with Large Language Models

Jiqing Zhu, Rebecca Y. Wang, Xiaoting Wang, Ricardo Azevedo, Alex Moreno, Julia A. Kuhn, Zia Khan doi: https://doi.org/10.1101/2024.11.11.621189

OpenAI API Key Setup

Please read OpenAI's best practices for API key safety and make sure your OpenAI API key is set up as environment variables, e.g.:

export OPENAI_API_ORG="org-XXXX"
export OPENAI_API_KEY="XXXXX"
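
Before running anything that calls the API, it can help to confirm that these variables are visible to Python. The following is a minimal sketch, assuming the two variable names from the export lines above; missing_openai_vars is a hypothetical helper, not part of llm2geneset.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("OPENAI_API_ORG", "OPENAI_API_KEY")


def missing_openai_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper; checks os.environ unless a mapping is passed in.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping where only the key is set.
print(missing_openai_vars({"OPENAI_API_KEY": "XXXXX"}))
# -> ['OPENAI_API_ORG']
```

Calling missing_openai_vars() with no argument checks the real environment and returns an empty list when both variables are set.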

Installation using pixi

An environment to run llm2geneset can be configured using pixi.

Once pixi is installed, run pixi shell in the llm2geneset directory:

cd llm2geneset
pixi shell

Installing as a PyPI package

The package is also available on PyPI (https://pypi.org/project/llm2geneset) and can be installed with pip:

pip install llm2geneset

Usage

You can use the package in a script as follows after running pixi shell.

import openai
import llm2geneset
import asyncio

async def main():
    # Async OpenAI client; reads the API key from the environment.
    aclient = openai.AsyncClient()
    # Ask the LLM for genes matching a natural-language description.
    genes = await llm2geneset.get_genes(aclient, "Antigen presentation")
    print(','.join(genes['parsed_genes']))
    # Propose gene sets for the parsed gene list and run overrepresentation analysis.
    res = await llm2geneset.gs_proposal(
        aclient, genes['parsed_genes'], n_pathways=5,
        n_background=19846)
    print(res['ora_results'])

if __name__ == "__main__":
    asyncio.run(main())

# Output:
# HLA-A,HLA-B,HLA-C,HLA-DRA,HLA-DRB1,HLA-DRB3,HLA-DRB4,...
# set_descr  generatio   bgratio  richFactor  foldEnrich
# 1  Antigen processing and presentation via MHC cl...   0.500000  0.001209    0.625000  413.458333
# ...

See also notebooks/simple_example.ipynb for an example of how to use the package within a Jupyter notebook.
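
The ratio columns in the output above can be read as the usual overrepresentation quantities. The sketch below uses clusterProfiler-style definitions and a one-sided hypergeometric test, which is the standard choice for overrepresentation analysis; whether llm2geneset computes them exactly this way is an assumption. The counts (15 overlapping genes, 30 query genes, a proposed set of 24 genes) are not given in the output; they are inferred because they reproduce the displayed row with a background of 19846 genes. ora_stats is a hypothetical helper.

```python
from math import comb


def ora_stats(k, n, M, N):
    """Overrepresentation statistics (hypothetical helper).

    k: genes shared by query and gene set, n: query size,
    M: gene set size, N: background size.
    """
    generatio = k / n            # fraction of the query that falls in the set
    bgratio = M / N              # fraction of the background that is in the set
    rich_factor = k / M          # fraction of the set recovered by the query
    fold_enrich = generatio / bgratio
    # Hypergeometric upper tail: P(overlap >= k) when drawing n genes from N.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, min(n, M) + 1))
    p_value = tail / comb(N, n)
    return {
        "generatio": generatio,
        "bgratio": bgratio,
        "richFactor": rich_factor,
        "foldEnrich": fold_enrich,
        "p_val": p_value,
    }


# Counts inferred from the displayed row, not taken from the package.
stats = ora_stats(k=15, n=30, M=24, N=19846)
for col in ("generatio", "bgratio", "richFactor", "foldEnrich"):
    print(f"{col}: {stats[col]:.6f}")
# generatio: 0.500000
# bgratio: 0.001209
# richFactor: 0.625000
# foldEnrich: 413.458333
```

A fold enrichment above 1 means the query contains more of the gene set than expected by chance given the background size.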

Webapp Interface

Streamlit is included in the llm2geneset environment. You can run the webapp interface as follows.

pixi shell
streamlit run webapp/app.py

PyPI deployment

Bump the version in setup.cfg, then run the following, per the instructions for pyscaffold-markdown:

pixi shell
tox -e docs  # to build documentation
tox -e build -- --wheel # to build the package distribution w/o source
tox -e publish  # to test project uploads correctly in test.pypi.org
tox -e publish -- --repository pypi  # release package to PyPI

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.


