Conformal retrieval-augmented generation with LLMs
CONFLARE: CONFormal LArge language model REtrieval
This is the repo for the CONFLARE paper and the related Python package, conflare.
Installation
pip install conflare
Here are the 3 main tasks this package helps you with:
- Loading the source documents (+ cleaning and chunking them)
- Creating (or loading) a Calibration set
- Retrieval Augmented Generation by applying conformal prediction
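As a rough illustration of the first task, the sketch below cleans a document by collapsing whitespace and splits it into overlapping character chunks. The function name `chunk_text` and its `chunk_size` and `overlap` parameters are hypothetical and are not part of conflare; the package's own loader may clean and chunk differently.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Collapse whitespace, then split the text into overlapping character chunks.

    Hypothetical helper for illustration only; not a conflare function.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    cleaned = " ".join(text.split())  # collapse newlines, tabs, repeated spaces
    step = chunk_size - overlap       # how far the window advances each time

    chunks = []
    for start in range(0, len(cleaned), step):
        chunks.append(cleaned[start:start + chunk_size])
        if start + chunk_size >= len(cleaned):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.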
Usage
Example:
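# 1. Load the source documents (cleaning and chunking them)
import os
# Required only for OpenAI (gpt-*) models. To use HuggingFace models
# without an OpenAI key, see the `model` argument below.
os.environ['OPENAI_API_KEY'] = 'your openai secret key'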
import conflare
from conflare import initialize_pipeline
from conflare.conformal.calibration import create_calibration_records
from conflare.augmented_retrieval.rag import ConformalRetrievalQA
document_dir = './data/documents'
docs, qa_pipeline, vector_db = initialize_pipeline(document_dir)
# 2. Create a calibration set
calibration_records = create_calibration_records(
    docs,
    qa_pipeline=qa_pipeline,
    vector_db=vector_db,
    size=100,
    topic_of_interest="Deep Learning"
)
# 3. Run retrieval augmented generation with conformal prediction
conformal_rag = ConformalRetrievalQA(
    qa_pipeline=qa_pipeline,
    vector_db=vector_db,
    calibration_records=calibration_records,
    error_rate=0.10,
    verbose=True
)
response, retrieved_docs = conformal_rag(
    "How can a transformer model be used in detection of COVID?"
)
print(response)
>>>
Input Error Rate: 10.00%
Selected cosine distance thereshold: 0.456
Number of retrieved documents: 2
A transformer model can be used in the detection of COVID-19 by analyzing medical images ...
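The distance threshold reported in the output above is the kind of value that split conformal prediction produces. The following is a minimal sketch, assuming each calibration record reduces to the cosine distance between a question and the chunk that actually answers it. The names `conformal_threshold` and `retrieve` are illustrative and are not conflare functions; the package's internals may differ.

```python
import math


def conformal_threshold(calibration_distances, error_rate):
    """Return the split-conformal quantile of the calibration distances.

    Illustrative only: assumes one distance per calibration question,
    measured to the chunk that truly contains the answer.
    """
    n = len(calibration_distances)
    if n == 0:
        raise ValueError("calibration set is empty")

    # Finite-sample corrected rank: ceil((n + 1) * (1 - error_rate)).
    rank = math.ceil((n + 1) * (1 - error_rate))
    if rank > n:
        # Too few calibration records for this error rate:
        # the only safe threshold accepts every chunk.
        return math.inf
    return sorted(calibration_distances)[rank - 1]


def retrieve(chunk_distances, threshold):
    """Keep every chunk whose distance to the question is within the threshold."""
    return [chunk for chunk, dist in chunk_distances.items() if dist <= threshold]
```

With 100 calibration records and an error rate of 0.10, the rank is ceil(101 × 0.9) = 91, so the threshold would be the 91st smallest calibration distance. Under the usual exchangeability assumption, the chunk that answers a new question then falls within the threshold with probability at least 90%, which is why the number of retrieved documents varies from question to question instead of being a fixed top-k.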
If you have already run this script and saved the calibration records to disk, you can load them instead of recreating them:
from conflare.conformal.calibration import QuestionEvaluation
# path_to_pickle: path to the pickle file saved during a previous run
q_evaluation = QuestionEvaluation.from_pickle(path_to_pickle)
calibration_records = q_evaluation.get_calibration_records()
Arguments
Here are some of the more important arguments used by the functions and classes in this package.
Most of them appear in the definition of the initialize_pipeline function.
That definition also shows the sequence of functions called inside it, which you can reuse in your own custom pipeline if necessary.
model: the name of the model used for QA and retrieval. If set to a gpt-* model, the OpenAI models are used and an OpenAI API key is required. It can also be set to a model name on HuggingFace, such as mistralai/Mistral-7B-Instruct-v0.1, to use HF models without needing a key.
embedding_model: the model from the sentence-transformers library used to create embeddings for text chunks and user questions.
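To make the roles of these two arguments concrete, the sketch below shows the naming rule described for `model` and the cosine distance that embeddings from `embedding_model` would be compared with. Both helpers are hypothetical and use toy vectors in place of real sentence-transformers embeddings; they are not conflare's actual code.

```python
import math


def uses_openai(model: str) -> bool:
    """Hypothetical check mirroring the rule above: gpt-* names select OpenAI."""
    return model.startswith("gpt-")


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Toy 2-d "embeddings" standing in for a question and two text chunks.
question = [1.0, 0.0]
chunks = {"same direction": [2.0, 0.0], "orthogonal": [0.0, 3.0]}
distances = {name: cosine_distance(question, vec) for name, vec in chunks.items()}
```

A smaller distance means the chunk points in nearly the same direction as the question in embedding space, so chunks are ranked (or thresholded) by ascending distance.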
File details
Details for the file conflare-0.1.3.tar.gz.
File metadata
- Download URL: conflare-0.1.3.tar.gz
- Upload date:
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3bd4ad3366015ce70441033efb182146dbf31b42ca7f53d591348f03444abf7e |
| MD5 | 6c9d4236800e036f091f57afe946fd18 |
| BLAKE2b-256 | bfcec546b73fad45ad448b27914111beca84cb4412acbfd592b7c1a8da89009c |
File details
Details for the file conflare-0.1.3-py3-none-any.whl.
File metadata
- Download URL: conflare-0.1.3-py3-none-any.whl
- Upload date:
- Size: 20.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a736f45257fd6fe13b22264ea59062e7fb45726d5c6610ce06bf29fa8846673 |
| MD5 | 5cb1bf9c74c03819011abc1b330621c5 |
| BLAKE2b-256 | e18f2e09564a50454254b70644e8203bb8fda28c8b54b8e1943e18b65424df56 |