LLM-retrieval based knowledge grounding
Project description
ragnosis contains tools for extracting hypotheses from scientific paper PDFs, extracting entities according to a user model, and grounding entities to ontology terms.
Installation
This project relies on langchain-rdf, which must be installed separately. To install langchain-rdf, run:
pip install git+https://github.com/vemonet/langchain-rdf.git
ragnosis can be installed using pip:
pip install ragnosis
Usage
Hypothesis extraction
Hypotheses can be extracted from PDF files by running:
ragnosis extract_hypothesis path/to/paper.pdf [--model MODEL] [--temperature TEMP] [--out_file OUTPUT.txt]
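For example, to extract the hypothesis from a paper and save it to a text file (the PDF path and output file name below are placeholders):
ragnosis extract_hypothesis papers/example_paper.pdf --model openai/gpt-4o --out_file hypothesis.txt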
Creating ontology indices
Before grounding entities, vector store indices must be created from your ontology files. One or more OWL files can be provided to create a single index. The --force_create flag will overwrite an existing index. The index is saved in index_directory under the name merged_index unless --index_name is specified:
ragnosis create_index index_directory path/to/ontology1.owl path/to/ontology2.owl [--force_create] [--index_name NAME]
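For example, to build a single index from two hypothetical ontology files and give it a custom name:
ragnosis create_index indices ontologies/go.owl ontologies/chebi.owl --index_name go_chebi_index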
Hypothesis grounding
To ground entities in an input text to ontology terms:
ragnosis ground_hypothesis "your hypothesis text" path/to/yaml_map.yaml [--model MODEL] [--temperature TEMP] [--out_md OUTPUT.md]
The YAML file should map entity extraction categories to pre-built vector store indices, for example:
bio_components: path/to/go_index
genes_proteins: path/to/protein_index
taxa: path/to/taxonomy_index
small_molecules: path/to/chebi_index
where path/to/go_index refers to the pre-built vector store files path/to/go_index.faiss and path/to/go_index.pkl. A sample YAML file can be found in the ragnosis repository.
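Putting this together, a grounding run might look like the following; the hypothesis text, YAML path, and output name are purely illustrative:
ragnosis ground_hypothesis "Deletion of GCN4 reduces amino acid biosynthesis in Saccharomyces cerevisiae" entity_map.yaml --model openai/gpt-4o --out_md grounding.md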
LLM Model Selection
For all commands that accept a --model parameter, you can specify:
- OpenAI models with the prefix openai/ (e.g., openai/gpt-4o)
- Ollama models with the prefix ollama/ (e.g., ollama/llama3)
The default model is openai/gpt-4o. When using OpenAI models, make sure to set the OPENAI_API_KEY environment variable before running the commands. For Ollama, make sure Ollama is installed and running.
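For example, the same grounding command can target either backend; the API key value below is a placeholder:
export OPENAI_API_KEY="your-key-here"
ragnosis ground_hypothesis "your hypothesis text" path/to/yaml_map.yaml --model openai/gpt-4o
# or, with a locally running Ollama server:
ragnosis ground_hypothesis "your hypothesis text" path/to/yaml_map.yaml --model ollama/llama3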
Output Files
Most commands support saving output in Markdown format using the --out_md parameter. For hypothesis extraction, use --out_file to save the extracted hypothesis as plain text. If no output file parameter is provided, no output file is saved; in all cases the output is also printed to the console.
File details
Details for the file ragnosis-0.1.5.tar.gz.
File metadata
- Download URL: ragnosis-0.1.5.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 61f20aab5ea46f153326a9ffd03183709902c1466cba725422332fb3b10f5f6a
MD5 | 49aa4e01cd94da3b01943c2e4f419403
BLAKE2b-256 | c70b04589caa2d45738ca3cf13c70d679c2f9c5c09e1df4c5df0ef5d2c881295
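If you want to verify a downloaded archive against the SHA256 digest above, a standard checksum tool such as sha256sum can be used (assuming the file is in the current directory):
sha256sum ragnosis-0.1.5.tar.gz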
File details
Details for the file ragnosis-0.1.5-py3-none-any.whl.
File metadata
- Download URL: ragnosis-0.1.5-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | e12703d4bdb66d52d4fc9973d343b77816be022ad1e6f2a900a76038853772e2
MD5 | a569c26d7000b8e7faf5bac1c3b8e21b
BLAKE2b-256 | d498c96b3307a88d1a38daaae3eee9f2e64f3051be0b84e07501537f165b9c84