CurateGPT

CurateGPT is a prototype web application and framework for performing general purpose AI-guided curation and curation-related operations over collections of objects.

See also the app at curategpt.io (note: this is sometimes down, and may offer only a subset of the functionality of the local app).

Getting started

User installation

CurateGPT is available on PyPI and may be installed with pip:

pip install curategpt

Developer installation

You will first need to install Poetry.

Then clone this repo.

git clone https://github.com/monarch-initiative/curategpt.git
cd curategpt

and install the dependencies:

poetry install

API keys

In order to get the best performance from CurateGPT, we recommend getting an OpenAI API key, and setting it:

export OPENAI_API_KEY=<your key>

(for members of Monarch: ask on Slack if you would like to use the group key)

CurateGPT will also work with other large language models - see "Selecting models" below.

Loading example data and running the app

You initially start with an empty database. You can load whatever you like into this database! Any JSON, YAML, or CSV is accepted. CurateGPT comes with wrappers for some existing local and remote sources, including ontologies. The Makefile contains some examples of how to load these. You can load any ontology using the ont-<name> target, e.g.:

make ont-cl

This loads CL (via OAK) into a collection called ont_cl.

Note that by default this loads into a collection set stored at stagedb, whereas the app works off of db. You can copy the collection set to the db with:

cp -r stagedb/* db/
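
The same copy step can be scripted. The following is a minimal sketch, assuming the stagedb and db directory names used above; the function name promote_collections is hypothetical and is not part of CurateGPT.

```python
import shutil
from pathlib import Path


def promote_collections(stage_dir: str = "stagedb", app_dir: str = "db") -> list[str]:
    """Copy everything under stage_dir into app_dir, overwriting same-named files.

    Returns the sorted names of the top-level entries now present in app_dir.
    """
    src = Path(stage_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"staging directory not found: {src}")
    # dirs_exist_ok=True merges into an existing target directory (Python 3.8+).
    shutil.copytree(src, app_dir, dirs_exist_ok=True)
    return sorted(p.name for p in Path(app_dir).iterdir())
```

Calling promote_collections() from the repository root should have roughly the same effect as the cp command, except that it also copies hidden files and creates db if it does not exist.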

You can then run the streamlit app with:

make app

Building Indexes

CurateGPT depends on vector database indexes of the databases/ontologies you want to curate.

The flagship application is ontology curation, so to build an index for an OBO ontology like CL:

make ont-cl

This requires an OpenAI key.

(You can also build indexes with an open embedding model by modifying the command to leave off the -m option, but this is not recommended, as OpenAI embeddings currently seem to work best.)

To load the default ontologies:

make all

(this may take some time)

To load different databases:

make load-db-hpoa
make load-db-reactome

You can load an arbitrary JSON, YAML, or CSV file:

curategpt view index -c my_foo foo.json

(you will need to do this in the poetry shell)
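
As an illustration of an input file for the command above, here is a sketch that writes a small JSON file. It assumes the file holds a list of objects; the records, field names, and the write_records helper are invented placeholders, not a schema or API that CurateGPT requires.

```python
import json
from pathlib import Path

# Hypothetical records; any keys could be used here.
records = [
    {"id": "EX:1", "label": "example one", "definition": "A first placeholder object."},
    {"id": "EX:2", "label": "example two", "definition": "A second placeholder object."},
]


def write_records(path: str, objects: list[dict]) -> int:
    """Write objects as a JSON array and return how many were written."""
    Path(path).write_text(json.dumps(objects, indent=2), encoding="utf-8")
    return len(objects)
```

After write_records("foo.json", records), the resulting file can be passed to the curategpt view index command shown above.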

To load a GitHub repo of issues:

curategpt -v view index -c gh_uberon -m openai:  --view github --init-with "{repo: obophenotype/uberon}"

The following are also supported:

Google Drives
Google Sheets
Markdown files
LinkML Schemas
HPOA files
GOCAMs
MAXOA files
Many more

Notebooks

See notebooks for examples.

Selecting models

Currently this tool works best with the OpenAI gpt-4 model (for instruction tasks) and OpenAI text-embedding-ada-002 for embedding.

CurateGPT is layered on top of simonw/llm which has a plugin architecture for using alternative models. In theory you can use any of these plugins.

Additionally, you can set up an openai-emulating proxy using litellm.

The litellm proxy may be installed with pip as pip install litellm[proxy].

Let's say you want to run mixtral locally using ollama. You start up ollama (you may have to run ollama serve first):

ollama run mixtral

Then start up litellm:

litellm -m ollama/mixtral

Next edit your extra-openai-models.yaml as detailed in the llm docs:

- model_name: ollama/mixtral
  model_id: litellm-mixtral
  api_base: "http://0.0.0.0:8000"

You can now use this:

curategpt ask -m litellm-mixtral -c ont_cl "What neurotransmitter is released by the hippocampus?"

But be warned that many of the prompts in CurateGPT were engineered against OpenAI models, and they may give suboptimal results or fail entirely on other models. As an example, ask seems to work quite well with mixtral, but complete works horribly. We haven't yet investigated whether the issue is the model, our prompts, or the overall approach.

Welcome to the world of AI engineering!

Using the command line

curategpt --help

You will see various commands for working with indexes, searching, extracting, generating, etc.

These functions are generally available through the UI, and the current priority is documenting these.

Chatting with a knowledge base

curategpt ask -c ont_cl "What neurotransmitter is released by the hippocampus?"

may yield something like:

The hippocampus releases gamma-aminobutyric acid (GABA) as a neurotransmitter [1](#ref-1).

...

## 1

id: GammaAminobutyricAcidSecretion_neurotransmission
label: gamma-aminobutyric acid secretion, neurotransmission
definition: The regulated release of gamma-aminobutyric acid by a cell, in which the
  gamma-aminobutyric acid acts as a neurotransmitter.
...
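
If you post-process answers in a script, the citation links can be pulled out with a regular expression. This is a sketch that assumes answers use the [n](#ref-n) link form seen in the sample above; that format is not guaranteed, and the cited_references helper is hypothetical.

```python
import re

# Matches citation links of the form [1](#ref-1), as in the sample answer above.
CITATION = re.compile(r"\[(\d+)\]\(#ref-(\d+)\)")


def cited_references(answer: str) -> list[int]:
    """Return the distinct reference numbers cited in an answer, in order of first appearance."""
    seen: list[int] = []
    for match in CITATION.finditer(answer):
        number = int(match.group(2))
        if number not in seen:
            seen.append(number)
    return seen


sample = "The hippocampus releases gamma-aminobutyric acid (GABA) as a neurotransmitter [1](#ref-1)."
print(cited_references(sample))  # prints [1]
```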

Chatting with PubMed

curategpt view ask -V pubmed "what neurons express VIP?"

Chatting with a GitHub issue tracker

curategpt ask -c gh_obi "what are some new term requests for electrophysiology terms?"
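
The ask invocations above can also be assembled from a script. The sketch below uses only the -c and -m flags that appear in this README; the build_ask_command helper itself is hypothetical, and it only builds the command rather than running it.

```python
import shlex
from typing import Optional


def build_ask_command(collection: str, question: str, model: Optional[str] = None) -> list[str]:
    """Assemble the argument list for `curategpt ask` using flags shown in this README."""
    argv = ["curategpt", "ask"]
    if model is not None:
        argv += ["-m", model]
    argv += ["-c", collection, question]
    return argv


argv = build_ask_command("ont_cl", "What neurotransmitter is released by the hippocampus?")
# shlex.join quotes the question so the line can be pasted into a shell.
print(shlex.join(argv))
```

Where curategpt is installed, the list can be handed to subprocess.run(argv, check=True). Passing a list avoids shell quoting problems in questions that contain quotes or question marks.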

Term Autocompletion (DRAGON-AI)

curategpt complete -c ont_cl  "mesenchymal stem cell of the apical papilla"

yields

id: MesenchymalStemCellOfTheApicalPapilla
definition: A mesenchymal cell that is part of the apical papilla of a tooth and has
  the ability to self-renew and differentiate into various cell types such as odontoblasts,
  fibroblasts, and osteoblasts.
relationships:
- predicate: PartOf
  target: ApicalPapilla
- predicate: subClassOf
  target: MesenchymalCell
- predicate: subClassOf
  target: StemCell
original_id: CL:0007045
label: mesenchymal stem cell of the apical papilla

All-by-all comparisons

You can compare all objects in one collection against all objects in another:

curategpt all-by-all --threshold 0.80 -c ont_hp -X ont_mp --ids-only -t csv > ~/tmp/allxall.mp.hp.csv

This takes 1-2s, as it involves comparison over pre-computed vectors. It reports top hits above a threshold.

Results may vary. You may want to try different texts for embeddings (the default is the entire JSON object; for ontologies it is a concatenation of labels, definition, and aliases).
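
To make the ontology default concrete, here is a sketch of building one embedding text from a term's label, definition, and aliases. The function name, the field names, and the separator are assumptions for illustration; CurateGPT's own implementation may differ.

```python
def embedding_text(term: dict) -> str:
    """Join label, definition, and aliases into one string, skipping missing fields."""
    parts = [term.get("label"), term.get("definition"), *term.get("aliases", [])]
    return "; ".join(p for p in parts if p)


# Label and definition taken from the sample record earlier in this README.
term = {
    "label": "gamma-aminobutyric acid secretion, neurotransmission",
    "definition": (
        "The regulated release of gamma-aminobutyric acid by a cell, in which the "
        "gamma-aminobutyric acid acts as a neurotransmitter."
    ),
    "aliases": [],
}
print(embedding_text(term))
```

Changing which fields go into this text changes the vectors, and therefore which pairs score above the threshold.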

Sample all-by-all output:

HP:5200068,Socially innappropriate questioning,MP:0001361,social withdrawal,0.844015132437909
HP:5200069,Spinning,MP:0001411,spinning,0.9077306606290237
HP:5200071,Delayed Echolalia,MP:0013140,excessive vocalization,0.8153252835818089
HP:5200072,Immediate Echolalia,MP:0001410,head bobbing,0.8348177036912526
HP:5200073,Excessive cleaning,MP:0001412,excessive scratching,0.8699103725005582
HP:5200104,Abnormal play,MP:0020437,abnormal social play behavior,0.8984862078522344
HP:5200105,Reduced imaginative play skills,MP:0001402,decreased locomotor activity,0.85571629684631
HP:5200108,Nonfunctional or atypical use of objects in play,MP:0003908,decreased stereotypic behavior,0.8586700411012859
HP:5200129,Abnormal rituals,MP:0010698,abnormal impulsive behavior control,0.8727804272023427
HP:5200134,Jumping,MP:0001401,jumpy,0.9011393233129765
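
The idea behind an all-by-all comparison over pre-computed vectors can be sketched in a few lines. This is an illustration under stated assumptions, not CurateGPT's code: it assumes cosine similarity as the score and keeps only the single best match per object, and the IDs and two-dimensional vectors are made up.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_hits(left: dict[str, list[float]], right: dict[str, list[float]], threshold: float):
    """For each left ID, return its best-scoring right ID if the score meets the threshold."""
    hits = []
    for left_id, left_vec in left.items():
        best_id, best_score = max(
            ((right_id, cosine(left_vec, right_vec)) for right_id, right_vec in right.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            hits.append((left_id, best_id, best_score))
    return hits


# Toy vectors with made-up IDs; real embeddings have many more dimensions.
left = {"A:1": [1.0, 0.0], "A:2": [0.0, 1.0]}
right = {"B:1": [0.9, 0.1], "B:2": [-1.0, 0.0]}
for row in top_hits(left, right, threshold=0.80):
    print(row)
```

Because the vectors are already stored, the work is arithmetic only, with no model calls. Note that top_hits expects a non-empty right-hand collection.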

Note that CurateGPT has a separate component for using an LLM to evaluate candidate matches (see also https://arxiv.org/abs/2310.03666); this is not enabled by default, as it would be expensive to run for a whole ontology.

Project details

Release history

0.2.4 (this version): Jun 25, 2025
0.2.3: Jan 16, 2025
0.2.2: Nov 15, 2024
0.2.1: Oct 18, 2024

Download files

Source Distribution

curategpt-0.2.4.tar.gz (130.1 kB view details)

Uploaded Jun 25, 2025 Source

Built Distribution

curategpt-0.2.4-py3-none-any.whl (183.4 kB view details)

Uploaded Jun 25, 2025 Python 3

File details

Details for the file curategpt-0.2.4.tar.gz.

File metadata

Download URL: curategpt-0.2.4.tar.gz
Upload date: Jun 25, 2025
Size: 130.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for curategpt-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`fb7556adc1a0ca804d63e7bc8e86e0164b410e899a239ac2476862e4d4d1548f`
MD5	`52ae3905c466eb79c195ad03be9dcfa6`
BLAKE2b-256	`7d6754649e1b98d6b740e1fbcbe6db3d6618549a332545a6379a9b4c146a9399`

Provenance

The following attestation bundles were made for curategpt-0.2.4.tar.gz:

Publisher: pypi-publish.yml on monarch-initiative/curategpt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: curategpt-0.2.4.tar.gz
- Subject digest: fb7556adc1a0ca804d63e7bc8e86e0164b410e899a239ac2476862e4d4d1548f
- Sigstore transparency entry: 250700705
- Sigstore integration time: Jun 25, 2025
Source repository:
- Permalink: monarch-initiative/curategpt@214800066d45194305cab6c9cc4e8d3e64bda974
- Branch / Tag: refs/tags/v0.2.4
- Owner: https://github.com/monarch-initiative
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@214800066d45194305cab6c9cc4e8d3e64bda974
- Trigger Event: release

File details

Details for the file curategpt-0.2.4-py3-none-any.whl.

File metadata

Download URL: curategpt-0.2.4-py3-none-any.whl
Upload date: Jun 25, 2025
Size: 183.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for curategpt-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4e8bb4bbb412d07a0e602ffc4b9a0c10898ddbbf3a6ba0a8523f08bb7e29fb04`
MD5	`6224f489b7ab1e824afbb2fed4807c9e`
BLAKE2b-256	`f40c1cc1a360193418e3936753b09b9d05d4454f05952d07beaaa73705f1f84d`

Provenance

The following attestation bundles were made for curategpt-0.2.4-py3-none-any.whl:

Publisher: pypi-publish.yml on monarch-initiative/curategpt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: curategpt-0.2.4-py3-none-any.whl
- Subject digest: 4e8bb4bbb412d07a0e602ffc4b9a0c10898ddbbf3a6ba0a8523f08bb7e29fb04
- Sigstore transparency entry: 250700722
- Sigstore integration time: Jun 25, 2025
Source repository:
- Permalink: monarch-initiative/curategpt@214800066d45194305cab6c9cc4e8d3e64bda974
- Branch / Tag: refs/tags/v0.2.4
- Owner: https://github.com/monarch-initiative
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@214800066d45194305cab6c9cc4e8d3e64bda974
- Trigger Event: release
