MidrasAI
MidrasAI provides a simple API for using the ColPali model, a multi-modal model for text and image retrieval. It supports both local and remote access to the model and integrates a vector database for efficient storage and semantic search.
Getting started
Note: This is an alpha version of MidrasAI. All feedback and suggestions are welcome!
Local Dependencies
- ColPali access: ColPali is based on PaliGemma, so you will need to request access to the model here. Then you must authenticate through the huggingface-cli to download the model.
- Poppler: Midras uses pdf2image to convert PDFs to images. This library requires poppler to be installed on your system. Check out the installation instructions here.
- Hardware: ColPali is a 3B parameter model, so I recommend using a GPU with at least 8GB of VRAM.
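If you're not sure whether poppler is set up, a quick check is to look for its command-line tools on your PATH. This is a minimal sketch, not part of MidrasAI: the helper name missing_poppler_tools is made up for illustration, and it assumes the pdftoppm and pdfinfo binaries that pdf2image calls are what you need to find.

```python
import shutil


def missing_poppler_tools(tools=("pdftoppm", "pdfinfo")):
    """Return the Poppler command-line tools that are not found on PATH.

    Hypothetical helper for illustration; MidrasAI does not ship it.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Poppler tools not found on PATH: " + ", ".join(missing))
    else:
        print("Poppler looks installed.")
```

An empty list means both tools were found. If poppler is installed somewhere that isn't on your PATH, this check will still report it as missing.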
API Dependencies
- API Key: You will need an API key to use MidrasAI. You can get one by logging in to the MidrasAI website with your GitHub account.
Installation
If running locally, you can install MidrasAI and its dependencies with pip:
pip install 'midrasai[local]'
If using the API, you can install MidrasAI by itself without dependencies with pip:
pip install midrasai
Usage
Starting the ColPali model
To load the ColPali model locally, you just need to use the LocalMidras class:
from midrasai.local import LocalMidras
midras = LocalMidras()  # Make sure you're logged in to HuggingFace so you can download the model
If you're using the API, you can import the Midras class instead, which will not load the model locally:
from midrasai import Midras
import os
midras = Midras(os.getenv("MIDRAS_API_KEY")) # Using this class requires an API key
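Note that os.getenv returns None when the variable is not set, so a missing key would only show up later. One way to fail early is a small wrapper like the one below. This is only a sketch: require_api_key is a hypothetical helper, not something MidrasAI provides.

```python
import os


def require_api_key(env_var="MIDRAS_API_KEY"):
    """Read the API key from the environment, raising if it is missing or empty.

    Hypothetical helper for illustration; not part of the midrasai package.
    """
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating a Midras client"
        )
    return key


# Usage (requires the midrasai package):
#   from midrasai import Midras
#   midras = Midras(require_api_key())
```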
After this point, both the local and API versions of Midras work exactly the same.
Creating an index
To create an index, you can use the create_index method with the name of the index you want to create:
midras.create_index("my_index")
Using the model to embed data
The Midras class provides a couple of convenience methods for embedding data.
You can use the embed_pdf method to embed a single PDF, or the embed_pil_images method to embed a list of images. Here's how to use them:
from PIL import Image

# Embed a single pdf
path_to_pdf = "path/to/pdf.pdf"
pdf_response = midras.embed_pdf(path_to_pdf, include_images=True)

# Embed a list of images
images = [Image.open("path/to/image.png"), Image.open("path/to/another_image.png")]
image_response = midras.embed_pil_images(images)
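If you have a lot of images and limited VRAM, you may want to embed them a few at a time. The sketch below is one way to do that. It assumes the response object exposes an embeddings list (as used in the insertion example in this README) with one entry per image, and embed_in_batches is a hypothetical helper, not part of MidrasAI.

```python
def embed_in_batches(midras, images, batch_size=4):
    """Embed images in fixed-size batches and return all embeddings in order.

    Hypothetical helper. Assumes midras.embed_pil_images(batch) returns an
    object with an `embeddings` list, one entry per input image.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        response = midras.embed_pil_images(batch)
        embeddings.extend(response.embeddings)
    return embeddings
```

You would call it as embed_in_batches(midras, images, batch_size=2), picking a batch size that fits your GPU.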
Inserting data into an index
Once you have your data embeddings, you can insert a data point into your index with the add_point method:
midras.add_point(
    index="my_index",  # name of the index you want to add to
    id=1,  # id of this data point, can be any integer or string
    embedding=pdf_response.embeddings[0],  # the embedding you created in the previous step
    data={  # any additional data you want to store with this point, can be any dictionary
        "something": "hi",
        "something_else": 123,
    },
)
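The example above inserts a single embedding. To insert every embedding from a response, you can loop over them and call add_point once per entry. The following is a sketch under assumptions: it takes the embeddings list to hold one entry per PDF page, in page order, and add_pages plus the "source" and "page" keys are made up for illustration.

```python
def add_pages(midras, index, embeddings, source):
    """Insert one point per embedding and return the ids that were used.

    Hypothetical helper. Assumes `embeddings` has one entry per page,
    in page order; ids are 1-based page numbers.
    """
    ids = []
    for page_number, embedding in enumerate(embeddings, start=1):
        midras.add_point(
            index=index,
            id=page_number,
            embedding=embedding,
            # Store enough metadata to trace a search hit back to its page
            data={"source": source, "page": page_number},
        )
        ids.append(page_number)
    return ids
```

For example, add_pages(midras, "my_index", pdf_response.embeddings, "path/to/pdf.pdf"). If you index more than one document in the same index, you'll need ids that don't collide across documents, for instance a string combining the file name and page number.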
Searching an index
After you've added data to your index, you can start searching for relevant data. You can use the query_text method to do this:
query = "What is the meaning of life?"
results = midras.query_text("my_index", text=query)

# Top 3 relevant data points
for result in results[:3]:
    # Each result will have a score, which is a measure of how relevant the data is to the query
    print(f"score: {result.score}")
    # Each result will also have any additional data you stored with it
    print(f"data: {result.data}")
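If you'd rather drop weak matches than always take the first three, you can filter on the score. This sketch only relies on the score and data attributes shown above. It assumes a higher score means a better match, and top_hits and the min_score threshold are hypothetical, so pick a cutoff that makes sense for your own data.

```python
def top_hits(results, k=3, min_score=0.0):
    """Return (score, data) pairs for the k best results scoring at least min_score.

    Hypothetical helper. Assumes each result has `score` and `data`
    attributes and that higher scores are more relevant.
    """
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    return [(r.score, r.data) for r in ranked if r.score >= min_score][:k]
```

For example, top_hits(results, k=3, min_score=10.0) returns at most three pairs and may return an empty list if nothing clears the threshold.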
If you want a more detailed example including RAG, check out the example vector search notebook.