
LLM-retrieval based knowledge grounding

ragnosis contains tools for extracting hypotheses from scientific paper PDFs, extracting entities according to a user model, and grounding entities to ontology terms.

Installation

This project relies on langchain-rdf, developed by Vincent Emonet.

To install langchain-rdf, run:

pip install git+https://github.com/vemonet/langchain-rdf.git

ragnosis can be installed using pip:

pip install ragnosis
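To confirm that both dependencies are present before running any command, you can query installed distributions with the standard library's importlib.metadata. This is a minimal sketch and not part of ragnosis. The helper name installed_versions is hypothetical, and the distribution name langchain-rdf for the Git install is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # "langchain-rdf" is an ASSUMED distribution name for the Git install.
    for name, ver in installed_versions(["ragnosis", "langchain-rdf"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```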

Usage

Hypothesis extraction

Hypotheses can be extracted from PDF files by running:

ragnosis extract_hypothesis path/to/paper.pdf [--model MODEL] [--temperature TEMP] [--out_file OUTPUT.txt]
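If you want to call the extractor from a Python script, one option is to wrap the CLI with subprocess. The sketch below uses only the flags shown in the usage line. Omitted options are left off so the CLI applies its own defaults. The helper names extract_hypothesis_cmd and run_extract_hypothesis are hypothetical and are not a ragnosis Python API.

```python
import subprocess


def extract_hypothesis_cmd(pdf_path, model=None, temperature=None, out_file=None):
    """Build the argument list for `ragnosis extract_hypothesis`.

    Options left as None are omitted so the CLI falls back to its defaults.
    """
    cmd = ["ragnosis", "extract_hypothesis", str(pdf_path)]
    if model is not None:
        cmd += ["--model", model]
    if temperature is not None:  # `is not None` so that 0 is still passed
        cmd += ["--temperature", str(temperature)]
    if out_file is not None:
        cmd += ["--out_file", str(out_file)]
    return cmd


def run_extract_hypothesis(pdf_path, **options):
    """Run the CLI and return whatever it prints to the console."""
    completed = subprocess.run(
        extract_hypothesis_cmd(pdf_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

For example, run_extract_hypothesis("path/to/paper.pdf", model="ollama/llama3", out_file="OUTPUT.txt") runs the extraction and returns the console output. Passing arguments as a list avoids shell quoting problems with paths that contain spaces.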

Creating ontology indices

Before grounding entities, you must create vector store indices from your ontology files. One or more OWL files can be combined into a single index. The index is saved in index_directory under the name merged_index unless --index_name is given, and --force_create overwrites an existing index:

ragnosis create_index index_directory path/to/ontology1.owl path/to/ontology2.owl [--force_create] [--index_name NAME]
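The grounding section notes that an index path such as path/to/go_index corresponds to the files path/to/go_index.faiss and path/to/go_index.pkl. Based on that, you can predict the prefix that create_index produces and check for it before deciding whether to pass --force_create. This sketch assumes the saved files follow the same `<name>.faiss` / `<name>.pkl` pattern inside index_directory. All helper names are hypothetical.

```python
from pathlib import Path


def create_index_cmd(index_directory, owl_files, force_create=False, index_name=None):
    """Build the argument list for `ragnosis create_index`."""
    if not owl_files:
        raise ValueError("at least one OWL file is required")
    cmd = ["ragnosis", "create_index", str(index_directory), *map(str, owl_files)]
    if force_create:
        cmd.append("--force_create")
    if index_name is not None:
        cmd += ["--index_name", index_name]
    return cmd


def expected_index_prefix(index_directory, index_name=None):
    """Prefix to use in the YAML map (ASSUMED layout: <dir>/<name>.faiss and .pkl)."""
    return Path(index_directory) / (index_name or "merged_index")


def index_exists(index_directory, index_name=None):
    """True if both the .faiss and .pkl files exist for the expected prefix."""
    prefix = expected_index_prefix(index_directory, index_name)
    # Append the extension instead of using with_suffix(), so names with dots survive.
    return all((prefix.parent / f"{prefix.name}{ext}").is_file() for ext in (".faiss", ".pkl"))
```

The path returned by expected_index_prefix (without an extension) is the value you would then place in the YAML map described under Hypothesis grounding.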

Hypothesis grounding

To ground entities in an input text to ontology terms:

ragnosis ground_hypothesis "your hypothesis text" path/to/yaml_map.yaml [--model MODEL] [--temperature TEMP] [--out_md OUTPUT.md]

The YAML file should map entity extraction categories to pre-built vector store indices, for example:

bio_components: path/to/go_index
genes_proteins: path/to/protein_index
taxa: path/to/taxonomy_index
small_molecules: path/to/chebi_index

Here, path/to/go_index refers to the pre-built vector store files path/to/go_index.faiss and path/to/go_index.pkl, so give the path without either extension. A sample YAML file is available in the ragnosis repository.
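A missing .faiss or .pkl file is an easy mistake to make. The sketch below checks every entry of a map like the example above before grounding starts. To keep it dependency-free, it includes a deliberately tiny parser for flat `key: value` lines. This parser is a stand-in, not a YAML implementation, and with PyYAML installed yaml.safe_load would return the same dictionary for a flat file. Both helper names are hypothetical.

```python
from pathlib import Path


def parse_flat_yaml_map(text):
    """Parse a flat `category: path/to/index` mapping.

    Tiny stand-in for a YAML parser: skips blank lines and `#` comments,
    does not handle nesting, quoting, or `#` inside paths.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            raise ValueError(f"not a 'key: value' line: {raw!r}")
        mapping[key.strip()] = value.strip()
    return mapping


def missing_index_files(mapping):
    """Return {category: [missing files]} for indices lacking .faiss or .pkl."""
    missing = {}
    for category, prefix in mapping.items():
        base = Path(prefix)
        candidates = [base.parent / f"{base.name}{ext}" for ext in (".faiss", ".pkl")]
        absent = [str(p) for p in candidates if not p.is_file()]
        if absent:
            missing[category] = absent
    return missing
```

An empty result from missing_index_files means every category points to a complete pair of index files.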

LLM Model Selection

For all commands that accept a --model parameter, you can specify:

  • OpenAI models with prefix openai/ (e.g., openai/gpt-4o)
  • Ollama models with prefix ollama/ (e.g., ollama/llama3)

The default model is openai/gpt-4o. When using an OpenAI model, set the OPENAI_API_KEY environment variable before running any command. When using Ollama, make sure Ollama is installed and running.
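The prefix convention and its prerequisites can be checked before launching a long run. The sketch below splits a model string into provider and name, then reports obvious setup problems. For Ollama it only checks that an ollama executable is on PATH and does not verify that the server is running. Only the two documented prefixes are accepted. The helper names split_model and preflight are hypothetical.

```python
import os
import shutil

SUPPORTED_PREFIXES = ("openai", "ollama")  # the two prefixes documented above


def split_model(spec):
    """Split 'provider/name' into (provider, name), e.g. 'openai/gpt-4o'."""
    provider, sep, name = spec.partition("/")
    if not sep or not name or provider not in SUPPORTED_PREFIXES:
        raise ValueError(f"expected 'openai/<model>' or 'ollama/<model>', got {spec!r}")
    return provider, name


def preflight(spec="openai/gpt-4o", env=None):
    """Return a list of problems to fix before running a ragnosis command."""
    env = os.environ if env is None else env
    provider, _ = split_model(spec)
    problems = []
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if provider == "ollama" and shutil.which("ollama") is None:
        # Only detects the executable; a stopped Ollama server is not caught here.
        problems.append("the ollama executable was not found on PATH")
    return problems
```

An empty list from preflight() means the basic prerequisites for the chosen provider appear to be in place.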

Output Files

Most commands can save their output in Markdown format via the --out_md parameter. Hypothesis extraction uses --out_file instead, which saves the extracted hypothesis as plain text. If no output-file parameter is given, no file is written. In all cases, the output is also printed to the console.
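ground_hypothesis takes one hypothesis text per call, as shown in its usage line. Grounding several hypotheses therefore means one invocation each. Each run also needs its own --out_md file so that the reports do not overwrite one another. The sketch below does this with subprocess. The helper names and the hypothesis_NNN.md naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path


def ground_hypothesis_cmd(text, yaml_map, model=None, temperature=None, out_md=None):
    """Build the argument list for `ragnosis ground_hypothesis`."""
    cmd = ["ragnosis", "ground_hypothesis", text, str(yaml_map)]
    if model is not None:
        cmd += ["--model", model]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if out_md is not None:
        cmd += ["--out_md", str(out_md)]
    return cmd


def ground_many(hypotheses, yaml_map, out_dir, **options):
    """Ground each hypothesis separately, writing one Markdown report per input."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reports = []
    for i, text in enumerate(hypotheses, start=1):
        out_md = out_dir / f"hypothesis_{i:03d}.md"  # hypothetical naming scheme
        subprocess.run(ground_hypothesis_cmd(text, yaml_map, out_md=out_md, **options), check=True)
        reports.append(out_md)
    return reports
```

Because the hypothesis text is passed as a single list element rather than through a shell, quotes and other special characters in the text need no escaping.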

Download files

Source Distribution

ragnosis-0.1.4.tar.gz (12.0 kB)

Built Distribution

ragnosis-0.1.4-py3-none-any.whl (11.7 kB)

File details

Details for the file ragnosis-0.1.4.tar.gz.

File metadata

  • Download URL: ragnosis-0.1.4.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.12

File hashes

Hashes for ragnosis-0.1.4.tar.gz
Algorithm Hash digest
SHA256 e7347295b57694212b53336ef0d567fed71ca451c3f8b25df413b7dfb8f8a5b4
MD5 3a01e45fb69c5aa10d5a6dcb833e78ba
BLAKE2b-256 ced0f1b466353d31ea3a49b78be1d49aacf3820da1cefab1d8ef60d036ed204d

File details

Details for the file ragnosis-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: ragnosis-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.12

File hashes

Hashes for ragnosis-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 48bc6061d89900a5cec705ccf8b1f06e794e535968bcdffc30d29444a3308ee8
MD5 387eac68a478bff3429e566317f3f61e
BLAKE2b-256 0d21585ac1ea8df8400751dfae237861eef36e520450692a58b75ad75f64fdb2
