Create summaries of groups of genes.
👓 GeneGist
Researchers often struggle to decipher how systems of genes interact and function. The difficulty stems from the sheer number of gene interactions, the regulatory mechanisms that govern them, and the many roles a single gene can play in biological processes.
GeneGist addresses this problem by generating detailed summaries of gene behaviors and interactions, and of their roles in biological pathways and systems.
GeneGist first scrapes and analyzes academic articles. It then processes this information with a Large Language Model (LLM), selectable through the --llm option, and uses the distilled knowledge to produce biological process summaries.
GeneGist can also create Gene References Into Function (GeneRIFs) directly from scientific literature. GeneRIFs are concise, sentence-like annotations, typically written by a human, that describe the function of a gene. GeneGist constructs them with generative AI based on LLMs.
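To make the GeneRIF format concrete, here is a minimal sketch that parses one record, assuming the five-column tab-delimited layout used by NCBI's GeneRIF dumps (tax ID, gene ID, PubMed ID list, timestamp, text). The `GeneRIF` class and `parse_generif_line` function are hypothetical names for illustration and are not part of GeneGist's API; the sample record is invented.

```python
from dataclasses import dataclass


@dataclass
class GeneRIF:
    """One GeneRIF record (hypothetical container, not GeneGist's API)."""
    tax_id: int
    gene_id: int
    pmids: list[int]
    updated: str
    text: str


def parse_generif_line(line: str) -> GeneRIF:
    """Parse one tab-delimited line with the five assumed columns."""
    tax_id, gene_id, pmids, updated, text = line.rstrip("\n").split("\t", 4)
    return GeneRIF(
        tax_id=int(tax_id),
        gene_id=int(gene_id),
        # The PubMed ID column may hold several comma-separated IDs.
        pmids=[int(p) for p in pmids.split(",") if p],
        updated=updated,
        text=text,
    )


# Invented sample record, for illustration only.
sample = "9606\t7157\t11111111,22222222\t2024-01-01 00:00\tPlaceholder annotation text."
record = parse_generif_line(sample)
print(record.gene_id, record.pmids, record.text)
```

The annotation text is the last column, so the split is capped at four separators to keep any tabs inside the text intact.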
License
Apache License
Installation
To install GeneGist, ensure you have Python 3.10 or higher. It can be installed via pip:
pip install genegist
Development
Installing Poetry
Poetry is required to handle dependencies and package management. To install Poetry, run:
curl -sSL https://install.python-poetry.org | python3 -
Setting Up genegist
- Clone the repository:
  git clone [repository URL]
  cd genegist
- Install the dependencies using Poetry:
  poetry install
Usage
To use genegist, run the following command:
poetry run genegist [options]
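If you want to script the invocation above, a small helper can assemble the argument list. This is only a sketch: `build_genegist_command` is a hypothetical helper, not something GeneGist ships, and it uses only the `--gene`, `--llm`, and `--abstracts` flags documented under Options. The gene symbol is an arbitrary example.

```python
import shlex
from typing import Optional


def build_genegist_command(gene: str, llm: Optional[str] = None,
                           abstracts: bool = False) -> list[str]:
    """Assemble the argument list for a single-gene GeneRIF lookup."""
    cmd = ["poetry", "run", "genegist", "--gene", gene]
    if llm is not None:
        # Only pass --llm when the caller chose a model explicitly.
        cmd += ["--llm", llm]
    if abstracts:
        cmd.append("--abstracts")
    return cmd


cmd = build_genegist_command("TP53", abstracts=True)
print(shlex.join(cmd))
# To execute it from the cloned repository (after `poetry install`):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when gene names or file paths contain unusual characters.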
Options
- -g GENE, --gene GENE: Look up GeneRIFs for a given gene.
- -s GENESET, --geneset GENESET: Look up GeneRIFs for a given gene set.
- -f GENESET_FILE, --geneset-file GENESET_FILE: Look up GeneRIFs for a file containing a list of genes.
- -p PROCESS, --process PROCESS: Find a biological process for the input gene set.
- -d CREATE_DRY_RUN, --create-dry-run CREATE_DRY_RUN: Don't run the biological process finder; save the gene summaries to a file instead.
- -a, --abstracts: Also look up abstracts.
- -r LOAD_DRY_RUN, --load-dry-run LOAD_DRY_RUN: Load the gene summaries from a file instead of running the LLM on them.
- --llm {gpt-3.5-turbo-1106,gpt-4-1106-preview}: Specify the LLM to use.
- -m ARTICLE, --article ARTICLE: Get the summary for a given PMID.
- -t, --tasks: Run a given custom task. Currently only E3 ligase analysis is supported.
- -y, --synthetic-generifs: Create synthetic GeneRIFs and save them to a tab-delimited file.
- -i, --build-index: Build an embedding index for all the GeneRIFs.
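For readers unfamiliar with the idea behind the --build-index option, the sketch below shows what an embedding index does in general: it stores one vector per text and retrieves the texts whose vectors are closest to a query. It does not reflect how GeneGist builds or stores its index. `ToyIndex` is a hypothetical class, and the hand-written three-dimensional vectors and annotation strings stand in for real embeddings and real GeneRIFs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory index mapping text to a vector (illustrative only)."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.entries.append((text, vector))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Return the k texts whose vectors are most similar to the query."""
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Hand-written vectors stand in for real embeddings.
index = ToyIndex()
index.add("annotation about DNA repair", [0.9, 0.1, 0.0])
index.add("annotation about ubiquitination", [0.1, 0.9, 0.2])
print(index.search([0.2, 0.8, 0.1]))
```

A linear scan like this is fine for a handful of entries; an index over a large annotation collection would normally use a dedicated vector search library instead.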
Development
Running Tests
To run tests, use:
poetry run pytest
Download files
File details
Details for the file genegist-0.1.5.tar.gz.
File metadata
- Download URL: genegist-0.1.5.tar.gz
- Upload date:
- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.12.1 Darwin/23.2.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | d6e75b4d2add7e12e02637aaf01dc50f0d290c301e603517c8ae94b359c1f814
MD5 | b5bfc90d0a0f60d4b4fc73bfb3d95411
BLAKE2b-256 | ca57c77a00f6cf2c3375a81da273ba87a05a1c6da900a4cef5e42721845b8f42
File details
Details for the file genegist-0.1.5-py3-none-any.whl.
File metadata
- Download URL: genegist-0.1.5-py3-none-any.whl
- Upload date:
- Size: 15.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.12.1 Darwin/23.2.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8f119c2febcb72997af04bc02fae940f9d681fb974576cf9b833cb4f27797e34
MD5 | fcf3a648ea989dad3e3908b2908f43a5
BLAKE2b-256 | 6ecbaac9a2f630845f194ce30c88b9bb50e2ec92e611f3106388f60e54d5474a
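The published digests can be used to check a downloaded file. The sketch below computes a file's SHA256 with Python's standard `hashlib` module and compares it with the value listed for the wheel. The helper names `sha256_of` and `matches_published_digest` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 with a published hex digest, ignoring case."""
    return sha256_of(path) == expected.strip().lower()


# SHA256 listed in the File hashes table for genegist-0.1.5-py3-none-any.whl.
EXPECTED = "8f119c2febcb72997af04bc02fae940f9d681fb974576cf9b833cb4f27797e34"
wheel = Path("genegist-0.1.5-py3-none-any.whl")  # assumed local download location
if wheel.exists():
    print("digest OK" if matches_published_digest(wheel, EXPECTED) else "digest MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 15 kB wheel but makes the helper reusable for larger artifacts.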