A simple framework for multimodal RAG

MidrasAI

MidrasAI provides a simple API for ColPali, a multimodal model for text and image retrieval. It runs the model locally and integrates a vector database for efficient storage and semantic search.

Setting up the model as a server for remote access is a WIP.

Getting started

Note: This is an alpha version of MidrasAI. All feedback and suggestions are welcome!

Local Dependencies

  • ColPali access: ColPali is based on PaliGemma, so you will need to request access to the model here. You must then authenticate through the huggingface-cli to download the model.
  • Poppler: Midras uses pdf2image to convert PDFs to images. This library requires Poppler to be installed on your system. Check out the installation instructions here.
  • Hardware: ColPali is a 3B-parameter model, so I recommend using a GPU with at least 8GB of VRAM.
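
Before loading the model, it can help to confirm that the dependencies above are in place. The sketch below is not part of MidrasAI: check_local_dependencies is a hypothetical helper that uses only the standard library. It assumes Poppler exposes the pdftoppm binary on your PATH (pdf2image calls it) and that a Hugging Face token is either set in the HF_TOKEN environment variable or saved by the CLI login at ~/.cache/huggingface/token. Your setup may store the token elsewhere.

```python
import os
import shutil
from pathlib import Path


def check_local_dependencies() -> dict[str, bool]:
    """Report which local prerequisites appear to be satisfied."""
    # pdf2image shells out to Poppler's pdftoppm, so it must be on PATH.
    has_poppler = shutil.which("pdftoppm") is not None

    # Assumption: default token locations used by the Hugging Face tooling.
    token_file = Path.home() / ".cache" / "huggingface" / "token"
    has_hf_token = bool(os.environ.get("HF_TOKEN")) or token_file.is_file()

    return {"poppler": has_poppler, "huggingface_token": has_hf_token}


if __name__ == "__main__":
    for name, ok in check_local_dependencies().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

This does not check GPU memory or whether your account has been granted access to the model, only that the local tools look ready.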

Installation

If running locally, you can install MidrasAI and its dependencies with pip, poetry, or uv:

# pip
pip install 'midrasai[local]'

# poetry
poetry add 'midrasai[local]'

# uv
uv add 'midrasai[local]'

Usage

Starting the ColPali model

To load the ColPali model locally, you just need to use the LocalMidras class:

from midrasai.local import LocalMidras

midras = LocalMidras() # Make sure you're logged in to HuggingFace so you can download the model

Creating an index

To create an index, you can use the create_index method with the name of the index you want to create:

midras.create_index("my_index")

Using the model to embed data

The Midras class provides a couple of convenience methods for embedding data. You can use the embed_pdf method to embed a single PDF, or the embed_pil_images method to embed a list of images. Here's how to use them:

from PIL import Image

# Embed a single pdf
path_to_pdf = "path/to/pdf.pdf"

pdf_response = midras.embed_pdf(path_to_pdf, include_images=True)

# Embed a list of images
images = [Image.open("path/to/image.png"), Image.open("path/to/another_image.png")]

image_response = midras.embed_pil_images(images)
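
For large image collections, embedding everything in one call may use more GPU memory than you have. One option is to split the images into smaller batches and call embed_pil_images once per batch. In the sketch below, batched and embed_in_batches are hypothetical helpers, not part of MidrasAI. It assumes each response exposes an embeddings list, and the default batch size of 4 is arbitrary.

```python
from typing import Any, Callable, Iterator, Sequence


def batched(items: Sequence[Any], batch_size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    embed: Callable[[Sequence[Any]], Any],
    images: Sequence[Any],
    batch_size: int = 4,
) -> list:
    """Call embed once per batch and collect all embeddings in input order."""
    embeddings = []
    for batch in batched(images, batch_size):
        response = embed(batch)
        # Assumption: the response has an `embeddings` list, one per image.
        embeddings.extend(response.embeddings)
    return embeddings
```

With the objects from the example above, this would be called as embed_in_batches(midras.embed_pil_images, images).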

Inserting data into an index

Once you have your data embeddings, you can insert a data point into your index with the add_point method:

midras.add_point(
    index="my_index", # name of the index you want to add to
    id=1, # id of this data point, can be any integer or string
    embedding=pdf_response.embeddings[0], # the embedding you created in the previous step
    data={ # any additional data you want to store with this point, can be any dictionary
        "something": "hi",
        "something_else": 123,
    }
)
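
A response can hold more than one embedding, so you will often want to add them all. The sketch below loops over a response and calls add_point once per embedding, using the position in the list to build a string ID. index_response is a hypothetical helper, not part of MidrasAI, and it assumes embeddings holds one entry per page in page order, which you should verify for your version.

```python
def index_response(client, index: str, response, source: str) -> int:
    """Add every embedding in response to index and return how many were added."""
    count = 0
    for page_number, embedding in enumerate(response.embeddings, start=1):
        client.add_point(
            index=index,
            # IDs may be integers or strings; a string keeps the source visible.
            id=f"{source}-{page_number}",
            embedding=embedding,
            # Assumption: list position corresponds to the page number.
            data={"source": source, "page": page_number},
        )
        count += 1
    return count
```

For example, index_response(midras, "my_index", pdf_response, source="pdf.pdf") would store each embedding alongside its source name and page number, which makes search results easier to trace back.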

Searching an index

After you've added data to your index, you can start searching for relevant data. You can use the query method to do this:

query = "What is the meaning of life?"

results = midras.query("my_index", query=query)

# Top 3 relevant data points
for result in results[:3]:
    # Each result will have a score, which is a measure of how relevant the data is to the query
    print(f"score: {result.score}")
    # Each result will also have any additional data you stored with it
    print(f"data: {result.data}")

If you want a more detailed example including RAG, check out the example vector search notebook.
