Conformal retrieval-augmented generation with LLMs
CONFLARE: CONFormal LArge language model REtrieval
This is the repo for the CONFLARE paper and the related Python package conflare.
Installation
pip install conflare
The package helps with three main tasks:
- Loading the source documents (plus cleaning and chunking them)
- Creating (or loading) a calibration set
- Running retrieval-augmented generation with conformal prediction
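To make the first task concrete, here is a minimal sketch of cleaning and chunking plain text. It is an illustration only and does not show how conflare does it internally: the helper names `clean_text` and `chunk_words`, the word-based windows, and the default sizes are all assumptions made for this example.

```python
import re
from typing import List


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split cleaned text into word windows that share `overlap` words.

    Hypothetical helper for illustration; not part of the conflare API.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    cleaned = clean_text(text)
    words = cleaned.split(" ") if cleaned else []
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping windows are a common choice because a sentence that straddles a chunk boundary still appears whole in at least one chunk.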
Usage
Example:
# 1. Load, clean, and chunk the source documents
import os
os.environ['OPENAI_API_KEY'] = 'your openai secret key'
# To use Hugging Face models without an OpenAI key, see the model argument below.
import conflare
from conflare import initialize_pipeline
from conflare.conformal.calibration import create_calibration_records
from conflare.augmented_retrieval.rag import ConformalRetrievalQA
document_dir = './data/documents'
docs, qa_pipeline, vector_db = initialize_pipeline(document_dir)

# 2. Create the calibration set
calibration_records = create_calibration_records(
    docs,
    qa_pipeline=qa_pipeline,
    vector_db=vector_db,
    size=100,
    topic_of_interest="Deep Learning"
)

# 3. Run retrieval-augmented generation with conformal prediction
conformal_rag = ConformalRetrievalQA(
    qa_pipeline=qa_pipeline,
    vector_db=vector_db,
    calibration_records=calibration_records,
    error_rate=0.10,
    verbose=True
)
response, retrieved_docs = conformal_rag(
    "How can a transformer model be used in detection of COVID?"
)
print(response)
>>>
Input Error Rate: 10.00%
Selected cosine distance thereshold: 0.456
Number of retrieved documents: 2
A transformer model can be used in the detection of COVID-19 by analyzing medical images ...
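The output above reports a cosine distance threshold selected for the requested error rate. As a rough illustration of how such a threshold can be derived, the sketch below applies the standard split conformal quantile rule to a list of calibration distances. This is an assumption about the general technique, not a copy of the package's implementation; the function name `conformal_threshold` and the interpretation of each calibration value as the distance between a question and the chunk that answers it are hypothetical.

```python
import math
from typing import Sequence


def conformal_threshold(calibration_distances: Sequence[float], error_rate: float) -> float:
    """Return a distance cutoff using the split conformal quantile rule.

    Hypothetical helper for illustration; not part of the conflare API.
    With n calibration distances, the cutoff is the k-th smallest value,
    where k = ceil((n + 1) * (1 - error_rate)).
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    n = len(calibration_distances)
    if n == 0:
        raise ValueError("at least one calibration distance is required")
    rank = math.ceil((n + 1) * (1.0 - error_rate))
    if rank > n:
        # Too few calibration points for this error rate: no finite cutoff.
        return math.inf
    return sorted(calibration_distances)[rank - 1]
```

Under the usual exchangeability assumption, retrieving every chunk whose distance to the question is at or below this cutoff keeps the relevant chunk with probability of at least 1 - error_rate. A smaller error rate therefore yields a larger cutoff and more retrieved documents.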
If you have already run this script and saved the calibration records to disk, you can load them instead of creating them again:
from conflare.conformal.calibration import QuestionEvaluation
# path_to_pickle: path to the pickle file saved by the earlier run
q_evaluation = QuestionEvaluation.from_pickle(path_to_pickle)
calibration_records = q_evaluation.get_calibration_records()
Arguments
Here are some of the more important arguments used by the functions and classes in this package.
Most of them appear in the definition of the initialize_pipeline function. That definition also shows the sequence of functions it calls, which you can use in your own custom way if necessary.
model: the model name used for QA and retrieval. If set to one of the gpt-* models, the OpenAI models are used and an OpenAI API key is required. It can also be set to a model name on Hugging Face, such as mistralai/Mistral-7B-Instruct-v0.1, to use HF models without needing a key.
embedding_model: the model from the sentence-transformers library used to create embeddings for text chunks and user questions.
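Since both text chunks and user questions are embedded, retrieval comes down to comparing vectors. The sketch below shows, with small hand-written vectors standing in for real embeddings, how chunks could be kept when their cosine distance to the question falls within a threshold. The helper names `cosine_distance` and `retrieve_within_threshold` are hypothetical and are not part of the conflare API; the package's own retrieval goes through its vector database.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def retrieve_within_threshold(
    question_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (chunk index, distance) pairs within the threshold, closest first.

    Hypothetical helper for illustration; not part of the conflare API.
    """
    scored = [(i, cosine_distance(question_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    kept = [(i, d) for i, d in scored if d <= threshold]
    return sorted(kept, key=lambda pair: pair[1])
```

Unlike a fixed top-k lookup, a distance cutoff returns a variable number of chunks per question, which matches the "Number of retrieved documents" line in the example output.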