llm2geneset

This project combines LLMs, gene set generation, and overrepresentation analysis to support the analysis of RNA-seq, scRNA-seq, and proteomics data sets.

If you have an OpenAI API key, you can try out the web application at https://llm2geneset.streamlit.app.

Pre-print

A bioRxiv pre-print describing this work is available here:

Enhancing Gene Set Overrepresentation Analysis with Large Language Models

Jiqing Zhu, Rebecca Y. Wang, Xiaoting Wang, Ricardo Azevedo, Alex Moreno, Julia A. Kuhn, Zia Khan. doi: https://doi.org/10.1101/2024.11.11.621189

OpenAI API Key Setup

Please read OpenAI's best practices for API key safety and make sure your OpenAI API key is set up as environment variables, e.g.:

export OPENAI_API_ORG="org-XXXX"
export OPENAI_API_KEY="XXXXX"
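
Before starting a longer run, it can help to confirm that both variables are visible to Python. The following is a minimal sketch, not part of llm2geneset: the helper name `missing_openai_vars` is hypothetical, and it only checks the two variable names shown above.

```python
import os

# The two variable names from the export commands above.
REQUIRED_VARS = ("OPENAI_API_ORG", "OPENAI_API_KEY")


def missing_openai_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping in which only the key is set.
print(missing_openai_vars({"OPENAI_API_KEY": "XXXXX"}))  # prints ['OPENAI_API_ORG']
```

Calling `missing_openai_vars()` with no argument checks the current process environment; an empty list means both variables are set.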

Installation using pixi

An environment to run llm2geneset can be configured using pixi.

Once pixi is installed, run pixi shell in the llm2geneset directory:

cd llm2geneset
pixi shell

Installing as a PyPI package

The package is also available on PyPI (https://pypi.org/project/llm2geneset) and can be installed using pip:

pip install llm2geneset
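
To confirm that the install succeeded, the installed version can be read from the package metadata. This is a small sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of llm2geneset.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if llm2geneset is installed in this environment, else None.
print(installed_version("llm2geneset"))
```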

Usage

You can use the package in a script as follows after running pixi shell.

import openai
import llm2geneset
import asyncio

async def main():
    aclient = openai.AsyncClient()
    genes = await llm2geneset.get_genes(aclient, "Antigen presentation")
    print(','.join(genes['parsed_genes']))
    # gs_proposal takes the list of gene symbols, not the whole result dict.
    res = await llm2geneset.gs_proposal(
        aclient, genes['parsed_genes'], n_pathways=5,
        n_background=19846)
    print(res['ora_results'])

if __name__ == "__main__":
    asyncio.run(main())

# Output:
# HLA-A,HLA-B,HLA-C,HLA-DRA,HLA-DRB1,HLA-DRB3,HLA-DRB4,...
# set_descr  generatio   bgratio  richFactor  foldEnrich
# 1  Antigen processing and presentation via MHC cl...   0.500000  0.001209    0.625000  413.458333
# ...

See also notebooks/simple_example.ipynb for an example of how to use the package within a Jupyter notebook.
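
The columns in the example output follow common overrepresentation analysis conventions. As a rough sketch of how such ratios relate to one another, the code below computes them from four counts and adds a one-sided hypergeometric p-value. This is an illustration under stated assumptions, not llm2geneset's implementation: the function `ora_stats` is hypothetical, the column definitions are the usual ones (generatio = overlap / query size, bgratio = set size / background size, richFactor = overlap / set size, foldEnrich = generatio / bgratio), and the counts 15, 30, and 24 are hypothetical values chosen because, together with n_background=19846, they reproduce the ratios in the first output row.

```python
from math import comb


def ora_stats(overlap, query_size, set_size, n_background):
    """Compute common ORA ratios and a hypergeometric upper-tail p-value.

    overlap:      genes shared by the query list and the gene set
    query_size:   number of genes in the query list
    set_size:     number of genes in the gene set
    n_background: number of genes in the background
    """
    generatio = overlap / query_size
    bgratio = set_size / n_background
    rich_factor = overlap / set_size
    fold_enrich = generatio / bgratio

    # P(X >= overlap) when drawing query_size genes without replacement
    # from a background that contains set_size genes of the gene set.
    upper = min(query_size, set_size)
    favorable = sum(
        comb(set_size, i) * comb(n_background - set_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    p_value = favorable / comb(n_background, query_size)

    return {
        "generatio": generatio,
        "bgratio": bgratio,
        "richFactor": rich_factor,
        "foldEnrich": fold_enrich,
        "p_value": p_value,
    }


# Hypothetical counts consistent with the first row of the example output.
stats = ora_stats(overlap=15, query_size=30, set_size=24, n_background=19846)
print(round(stats["generatio"], 6))   # prints 0.5
print(round(stats["bgratio"], 6))     # prints 0.001209
print(round(stats["richFactor"], 6))  # prints 0.625
print(round(stats["foldEnrich"], 6))  # prints 413.458333
```

A large fold enrichment with a very small p-value indicates that the query genes overlap the gene set far more than expected by chance given the background size.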

Webapp Interface

Streamlit is included in the llm2geneset environment. You can run the webapp interface as follows.

pixi shell
streamlit run webapp/app.py

PyPI deployment

Bump the version in setup.cfg and run the following, according to the instructions for pyscaffold-markdown:

pixi shell
tox -e docs  # to build documentation
tox -e build -- --wheel # to build the package distribution w/o source
tox -e publish  # to test project uploads correctly in test.pypi.org
tox -e publish -- --repository pypi  # release package to PyPI

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.
