LLM-retrieval based knowledge grounding

ragnosis contains tools for extracting hypotheses from scientific paper PDFs, extracting entities according to a user model, and grounding entities to ontology terms.

Installation

This project relies on langchain-rdf, which must be installed separately.

To install langchain-rdf run:

pip install git+https://github.com/vemonet/langchain-rdf.git

ragnosis can be installed using pip:

pip install ragnosis

Usage

Hypothesis extraction

Hypotheses can be extracted from PDF files by running:

ragnosis extract_hypothesis path/to/paper.pdf [--model MODEL] [--temperature TEMP] [--out_file OUTPUT.txt]
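To process many papers, the command above can be generated once per PDF. The following is a minimal sketch, not part of ragnosis: the helper name build_extract_commands and the choice to write each hypothesis next to its PDF as a .txt file are assumptions made for illustration. It only builds argument lists from the options shown above and does not run anything.

```python
from pathlib import Path
from typing import List, Optional


def build_extract_commands(
    pdf_dir: str,
    model: Optional[str] = None,
    temperature: Optional[float] = None,
) -> List[List[str]]:
    """Return one `ragnosis extract_hypothesis` argument list per PDF in pdf_dir.

    Hypothetical helper: it only assembles the documented CLI arguments.
    """
    commands = []
    # Sort so the order of commands is stable between runs.
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        cmd = ["ragnosis", "extract_hypothesis", str(pdf)]
        if model is not None:
            cmd += ["--model", model]
        if temperature is not None:
            cmd += ["--temperature", str(temperature)]
        # Assumption: save each hypothesis beside its PDF as <name>.txt.
        cmd += ["--out_file", str(pdf.with_suffix(".txt"))]
        commands.append(cmd)
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) to execute the extraction.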

Creating ontology indices

Before grounding entities, vector store indices must be created from your ontology files. One or more OWL files can be provided to create a single index. Pass --force_create to overwrite an existing index. The index is saved in index_directory under the name merged_index unless --index_name is specified:

ragnosis create_index index_directory path/to/ontology1.owl path/to/ontology2.owl [--force_create] [--index_name NAME]
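When index creation is scripted, it can help to assemble the arguments in one place. The sketch below is an illustration only: build_create_index_command is a hypothetical helper, not a ragnosis function, and it uses only the positional arguments and flags shown in the command above.

```python
from typing import List, Optional, Sequence


def build_create_index_command(
    index_directory: str,
    owl_files: Sequence[str],
    force_create: bool = False,
    index_name: Optional[str] = None,
) -> List[str]:
    """Assemble a `ragnosis create_index` argument list (hypothetical helper)."""
    # The CLI expects one or more OWL files after the index directory.
    if not owl_files:
        raise ValueError("at least one OWL file is required")
    cmd = ["ragnosis", "create_index", index_directory, *owl_files]
    if force_create:
        cmd.append("--force_create")
    if index_name is not None:
        cmd += ["--index_name", index_name]
    return cmd
```

The result can be passed to subprocess.run(cmd, check=True). Omitting index_name leaves the default name merged_index in effect.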

Hypothesis grounding

To ground entities in an input text to ontology terms:

ragnosis ground_hypothesis "your hypothesis text" path/to/yaml_map.yaml [--model MODEL] [--temperature TEMP] [--out_md OUTPUT.md]

The YAML file should map entity extraction categories to pre-built vector store indices, for example:

bio_components: path/to/go_index
genes_proteins: path/to/protein_index
taxa: path/to/taxonomy_index
small_molecules: path/to/chebi_index

Here, each value is an index path without a file extension: path/to/go_index refers to the pre-built vector store files path/to/go_index.faiss and path/to/go_index.pkl. A sample YAML file can be found in the ragnosis repository.
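Because each entry in the map stands for a .faiss and .pkl file pair, a quick check before grounding can catch a mistyped path. This is a minimal sketch under stated assumptions: missing_index_files is a hypothetical helper, not part of ragnosis, and it takes the mapping as a plain dict (for example, one loaded from the YAML file with PyYAML's yaml.safe_load).

```python
from pathlib import Path
from typing import Dict, List

# File extensions that make up one pre-built index, per the description above.
INDEX_SUFFIXES = (".faiss", ".pkl")


def missing_index_files(category_to_index: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, per category, the index files that do not exist on disk.

    Hypothetical helper: categories whose files are all present are omitted.
    """
    missing: Dict[str, List[str]] = {}
    for category, index_path in category_to_index.items():
        absent = [
            index_path + suffix
            for suffix in INDEX_SUFFIXES
            if not Path(index_path + suffix).is_file()
        ]
        if absent:
            missing[category] = absent
    return missing
```

An empty result means every category points at a complete index file pair.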

LLM Model Selection

For all commands that accept a --model parameter, you can specify:

  • OpenAI models with prefix openai/ (e.g., openai/gpt-4o)
  • Ollama models with prefix ollama/ (e.g., ollama/llama3)

The default model is openai/gpt-4o. When using OpenAI models, set the OPENAI_API_KEY environment variable before running the commands. When using Ollama models, make sure Ollama is installed and running.
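The prefix convention and the API key requirement can be checked before a long run starts. The sketch below is an assumption-laden illustration and does not reflect how ragnosis parses --model internally: split_model and preflight_problems are hypothetical names, and the check does not verify that an Ollama server is reachable.

```python
import os
from typing import List, Mapping, Optional, Tuple

# Provider prefixes and default model as described above.
KNOWN_PROVIDERS = ("openai", "ollama")
DEFAULT_MODEL = "openai/gpt-4o"


def split_model(model: str = DEFAULT_MODEL) -> Tuple[str, str]:
    """Split 'provider/name' into its two parts (hypothetical helper)."""
    provider, sep, name = model.partition("/")
    if not sep or provider not in KNOWN_PROVIDERS or not name:
        raise ValueError(
            f"expected a model like 'openai/<name>' or 'ollama/<name>', got {model!r}"
        )
    return provider, name


def preflight_problems(
    model: str = DEFAULT_MODEL,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """List configuration problems that can be detected without calling a model."""
    env = os.environ if env is None else env
    provider, _ = split_model(model)
    problems = []
    # OpenAI models need an API key in the environment.
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Passing env explicitly keeps the check easy to test; by default it reads the process environment.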

Output Files

Most commands support saving output in Markdown format using the --out_md parameter. For hypothesis extraction, use --out_file instead to save the extracted hypothesis as plain text. If no output file parameter is provided, no file is written. In all cases, the output is printed to the console.
