
OntoGPT


Introduction

OntoGPT is a Python package for extracting structured information from text with large language models (LLMs), instruction prompts, and ontology-based grounding.

For more details, please see the full documentation.

Quick Start

OntoGPT runs on the command line, though there's also a minimal web app interface (see Web Application section below).

  1. Ensure you have Python 3.9 or greater installed.

  2. Install with pip:

    pip install ontogpt
    
  3. Set your OpenAI API key:

    runoak set-apikey -e openai <your openai api key>
    
  4. See the list of all OntoGPT commands:

    ontogpt --help
    
  5. Try a simple example of information extraction:

    echo "One treatment for high blood pressure is carvedilol." > example.txt
    ontogpt extract -i example.txt -t drug
    

    OntoGPT will retrieve the necessary ontologies and write the results to the command line. All extracted objects appear in the output under the heading extracted_object.
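For illustration only, the output may resemble the sketch below. The field names follow the drug extraction template and the values come from ontology grounding, so both the fields and the placeholder values shown here are hypothetical:

    extracted_object:
      disease: ...        # grounded ontology CURIE for "high blood pressure", if a match is found
      drug: ...           # grounded ontology CURIE for "carvedilol", e.g., a ChEBI term
    named_entities:
      - id: ...
        label: carvedilol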

Web Application

There is a bare-bones web application for running OntoGPT and viewing results.

First, install the required dependencies with pip by running the following command:

pip install ontogpt[web]

Then run this command to start the web application:

web-ontogpt

NOTE: We do not recommend hosting this webapp publicly without authentication.

Model APIs

OntoGPT uses the litellm package (https://litellm.vercel.app/) to interface with LLMs.

This means most APIs are supported, including OpenAI, Azure, Anthropic, Mistral, Replicate, and beyond.

The model name to use may be found with the command ontogpt list-models; use the name in the first column with the --model option.

In most cases, this will require setting the API key for a particular service as above:

runoak set-apikey -e anthropic-key <your anthropic api key>
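
For example, once the Anthropic key is set, an extraction can be run by passing a Claude model name to --model (the model name below is only illustrative; ontogpt list-models shows the names currently recognized):

ontogpt extract -i example.txt -t drug --model claude-3-opus-20240229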

Some endpoints, such as OpenAI models through Azure, require setting additional details. These may be set similarly:

runoak set-apikey -e azure-key <your azure api key>
runoak set-apikey -e azure-base <your azure endpoint url>
runoak set-apikey -e azure-version <your azure api version, e.g. "2023-05-15">

These details may also be set as environment variables as follows:

export AZURE_API_KEY="my-azure-api-key"
export AZURE_API_BASE="https://example-endpoint.openai.azure.com"
export AZURE_API_VERSION="2023-05-15"
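
With the Azure credentials set, a deployment is then selected using the litellm convention of an azure/ prefix on the deployment name (the deployment name below is hypothetical):

ontogpt extract -i example.txt -t drug --model azure/my-gpt-4o-deployment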

Open Models

Open LLMs may be retrieved and run through the ollama package (https://ollama.com/).

You will need to install ollama (see the GitHub repo), and you may need to start it as a service with a command like ollama serve or sudo systemctl start ollama.

Then retrieve a model with ollama pull <modelname>, e.g., ollama pull llama3.

The model may then be used in OntoGPT by passing its name, prefixed with ollama/ (e.g., ollama/llama3), to the --model option.

Some ollama models may not be listed in ontogpt list-models, but the full list of downloaded LLMs can be seen with the ollama list command.
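
Putting these steps together, a minimal run against a local model might look like the following, assuming the ollama service is running and the llama3 model has been pulled:

ollama pull llama3
ontogpt extract -i example.txt -t drug --model ollama/llama3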

Evaluations

OntoGPT's functions have been evaluated on test data. Please see the full documentation for details on these evaluations and how to reproduce them.

Related Projects

  • TALISMAN, a tool for generating summaries of functions enriched within a gene set. TALISMAN uses OntoGPT to work with LLMs.

Tutorials and Presentations

  • Presentation: "Staying grounded: assembling structured biological knowledge with help from large language models" - presented by Harry Caufield as part of the AgBioData Consortium webinar series (September 2023)
  • Presentation: "Transforming unstructured biomedical texts with large language models" - presented by Harry Caufield as part of the BOSC track at ISMB/ECCB 2023 (July 2023)
  • Presentation: "OntoGPT: A framework for working with ontologies and large language models" - talk by Chris Mungall at Joint Food Ontology Workgroup (May 2023)

Citation

The information extraction approach used in OntoGPT, SPIRES, is described further in: Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. Bioinformatics, Volume 40, Issue 3, March 2024, btae104, https://doi.org/10.1093/bioinformatics/btae104.

Acknowledgements

This project is part of the Monarch Initiative. We also gratefully acknowledge Bosch Research for their support of this research project.



Download files

Download the file for your platform.

Source Distribution

ontogpt-1.0.19.tar.gz (300.2 kB)

Uploaded Source

Built Distribution


ontogpt-1.0.19-py3-none-any.whl (455.3 kB)

Uploaded Python 3

File details

Details for the file ontogpt-1.0.19.tar.gz.

File metadata

  • Download URL: ontogpt-1.0.19.tar.gz
  • Upload date:
  • Size: 300.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ontogpt-1.0.19.tar.gz
Algorithm Hash digest
SHA256 67429256b85d30c42a083760a5c06eb334d9f04f7d6db52add6a14b4ce610987
MD5 b6f602cc33c9f23bd7754d2111efde52
BLAKE2b-256 d9a53e3f40a3850ec43780cb36b13f6c8e2ac7e75a76f6564fc159ca6255a745


Provenance

The following attestation bundles were made for ontogpt-1.0.19.tar.gz:

Publisher: pypi-publish.yml on monarch-initiative/ontogpt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ontogpt-1.0.19-py3-none-any.whl.

File metadata

  • Download URL: ontogpt-1.0.19-py3-none-any.whl
  • Upload date:
  • Size: 455.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ontogpt-1.0.19-py3-none-any.whl
Algorithm Hash digest
SHA256 6cd2b2026b6a1e1b287723af1d0ea157b74e0525499cfced2881562dd6c6cbe0
MD5 5f4af2cf04a1a30cffce9e4a153d7fe1
BLAKE2b-256 1cba77f5081b504fe97a89fa7750fc7c1e88b53c56889991c2b7d8bc1738d843


Provenance

The following attestation bundles were made for ontogpt-1.0.19-py3-none-any.whl:

Publisher: pypi-publish.yml on monarch-initiative/ontogpt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
