
A Python package implementing a TriModal Retrieval-Augmented Generation (RAG) system.

Project description


Content | Installation | Quickstart | Acknowledgements | References | Hugging Face | Portfolio

TriModal Retrieval-Augmented Generation - TriModalRAG

Triple models + LangChain: helping users find solutions from weather data

End-to-end Retrieval-Augmented Generation (RAG) pipelines integrating state-of-the-art (SOTA) LLMs

:book: Contents

🧊 Model Overview

Introduction

The TriModal Retrieval-Augmented Generation (T-RAG) project is an advanced AI system that combines the power of text, image, and audio data for multi-modal retrieval and generation tasks. It leverages state-of-the-art deep learning models and supporting frameworks such as LangChain, DVC, and ZenML. As a result, a shared embedding space can be built efficiently, in which data from all three modalities can be processed, retrieved, and used in a generative pipeline.

The primary goal of this system is to enhance traditional information retrieval by integrating cross-modal knowledge through a fusion mechanism, enabling the model to retrieve and generate accurate, context-aware responses that span multiple data types. Whether the task involves answering questions from text, recognizing patterns in images, or interpreting sounds, the TriModal RAG framework is designed to fuse these distinct types of data into a unified response.
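To make the shared-space idea concrete, here is a minimal sketch of one possible fusion mechanism: each modality's embedding is projected into a common dimension and averaged. This is illustrative only, not the project's actual implementation; the embedding dimensions (768 for BERT/Wav2Vec 2.0, 512 for CLIP image features) and the averaging strategy are assumptions.

    # Illustrative sketch of a shared embedding space (not the project's actual code).
    import torch
    import torch.nn as nn

    class SharedEmbeddingFusion(nn.Module):
        def __init__(self, dim_text=768, dim_image=512, dim_audio=768, dim_shared=512):
            super().__init__()
            self.text_proj = nn.Linear(dim_text, dim_shared)
            self.image_proj = nn.Linear(dim_image, dim_shared)
            self.audio_proj = nn.Linear(dim_audio, dim_shared)

        def forward(self, text_emb, image_emb, audio_emb):
            # Project each modality into the shared space, then fuse by averaging.
            fused = (self.text_proj(text_emb)
                     + self.image_proj(image_emb)
                     + self.audio_proj(audio_emb)) / 3
            # Normalize so the fused vectors suit cosine-similarity retrieval.
            return nn.functional.normalize(fused, dim=-1)

    fusion = SharedEmbeddingFusion()
    vec = fusion(torch.randn(1, 768), torch.randn(1, 512), torch.randn(1, 768))  # shape (1, 512)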

Architecture

🪸 Getting Started

:shield: Installation

From release:

pip install trimodal-rag

Alternatively, from source:

git clone https://github.com/tph-kds/TriModalRAG_System.git

Or, using a Docker container with our image, you can run:

    docker run -p 8000:8000 trimrag/trimrag

:fire: Quickstart

This is a small example program you can run to see trim_rag in action!

# Set up the inputs yourself, for example:

from pathlib import Path

# ROOT_PROJECT_DIR is the root folder of this project on your local
# machine after cloning it from GitHub; adjust the path as needed.
ROOT_PROJECT_DIR = Path(".")

# Provide a query for the chatbot to answer, as in the example below.
query = "Did Typhoon Yagi cause damage in Vietnam, and what were the consequences?"

# Create a folder containing the data used to run this model,
# e.g. a folder named ``data``:
text = ROOT_PROJECT_DIR / "data/file.pdf"
image = ROOT_PROJECT_DIR / "data/image.jpg"
audio = ROOT_PROJECT_DIR / "data/audio.mp3"

Run the quick start from a terminal, either with make:

    make qt

or by adjusting tests/integration/quick_start.py and running it directly:

    python tests/integration/quick_start.py --query=query --text=text --image=image --audio=audio

Ultimately, you will receive the chatbot's response. Good luck, and thank you for your interest!

[!NOTE] You can also step through this project's workflow (Data Ingestion, Data Processing, and more) in the tests/integration folder.

Install Required Packages

(It is recommended that the dependencies be installed within a Conda environment.)

pip install -r requirements.txt

or run the init_setup.sh file in the project folder:

Run this command to give the script execution rights:

chmod +x init_setup.sh

Now you can execute the script by typing:

bash init_setup.sh

Detailed requirements are listed on the PyPI website.

The recommended environment uses a hardware accelerator (GPU), such as Colab's T4 or an A100.
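A quick way to confirm that an accelerator is visible to your environment (a minimal check, assuming PyTorch is installed):

    # Check whether a CUDA GPU (e.g., Colab's T4 or an A100) is available.
    import torch

    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
    else:
        print("No GPU found; the pipeline will run (slowly) on CPU.")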

Prepare the Training Data

| Name | #Text (PDF) | #Image | #Audio |
|------|-------------|--------|--------|
| Quantity | 100 | 100 | 100 |
| Topic | "Machine Learning Weather Prediction" | "Weather" | "Weather" |
| Type | API | API | API |
| Supportive Website | Arxiv | Unsplash | FreeSound |
| Feature | Text in research papers | Natural objects (non-human) | Natural sounds (e.g., rain, lightning) |
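As an illustration of how such data can be collected, the sketch below queries the public arXiv API for weather-prediction papers. This is illustrative only, not the project's actual ingestion code; the Unsplash and FreeSound APIs follow a similar request pattern but require API keys.

    # Illustrative sketch: fetch paper metadata from the public arXiv API.
    import urllib.parse
    import urllib.request

    search = urllib.parse.quote("machine learning weather prediction")
    url = f"http://export.arxiv.org/api/query?search_query=all:{search}&max_results=5"

    with urllib.request.urlopen(url) as resp:
        feed = resp.read().decode("utf-8")  # Atom XML listing the matching papers

    print(feed[:500])  # inspect the first few entries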

Models

  • The BERT (Bidirectional Encoder Representations from Transformers) model is used in the TriModal Retrieval-Augmented Generation (RAG) Project to generate high-quality text embeddings. BERT is a transformer-based model pre-trained on vast amounts of text data, which allows it to capture contextual information from both directions (left-to-right and right-to-left) of a sentence. This makes BERT highly effective at understanding the semantic meaning of text, even in complex multi-sentence inputs. Available on this link

  • The CLIP (Contrastive Language–Image Pretraining) model, specifically the openai/clip-vit-base-patch32 variant, is utilized in the TriModal Retrieval-Augmented Generation (RAG) Project. CLIP is a powerful model trained on both images and their textual descriptions, allowing it to learn shared representations between visual and textual modalities. This capability is crucial for multi-modal tasks where text and image data need to be compared and fused effectively. Available on this link

  • The Wav2Vec 2.0 model (facebook/wav2vec2-base-960h) is a state-of-the-art speech representation learning framework developed by Facebook AI Research. Pre-trained on a vast corpus of speech data, it processes raw audio signals into rich, context-aware embeddings. Its ability to integrate seamlessly with the text and image modalities enhances the project's overall functionality and versatility in handling diverse data types. Available on this link. (A combined embedding sketch for all three models follows this list.)
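The following is a minimal sketch of how each modality can be embedded with the Hugging Face transformers library. The CLIP and Wav2Vec 2.0 checkpoints match those named above; the BERT variant (bert-base-uncased), the file paths, and the mean-pooling choice are assumptions for illustration, not the project's actual code.

    # Illustrative sketch: per-modality embeddings with Hugging Face transformers.
    # Assumes: pip install torch transformers pillow librosa
    import torch
    from PIL import Image
    import librosa
    from transformers import (AutoTokenizer, AutoModel,
                              CLIPProcessor, CLIPModel,
                              Wav2Vec2Processor, Wav2Vec2Model)

    # --- Text: BERT embeddings (mean-pooled last hidden state) ---
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    enc = tok("Typhoon Yagi caused heavy rain in Vietnam.", return_tensors="pt")
    with torch.no_grad():
        text_emb = bert(**enc).last_hidden_state.mean(dim=1)   # shape (1, 768)

    # --- Image: CLIP image features ---
    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    pixels = proc(images=Image.open("data/image.jpg"), return_tensors="pt")
    with torch.no_grad():
        image_emb = clip.get_image_features(**pixels)          # shape (1, 512)

    # --- Audio: Wav2Vec 2.0 embeddings (mean-pooled) ---
    wproc = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    w2v = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
    wave, sr = librosa.load("data/audio.mp3", sr=16000)        # Wav2Vec 2.0 expects 16 kHz
    audio_in = wproc(wave, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        audio_emb = w2v(**audio_in).last_hidden_state.mean(dim=1)  # shape (1, 768)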

:v: Acknowledgements

:star: Future Plans

  • Conduct a comprehensive assessment of overall project performance.
  • Regularly upgrade the system and integrate promising new models and technologies.
  • Improve response quality beyond what is currently available.
  • Experiment with larger and more diverse dataset inputs.

Stay tuned for future releases as we are continuously working on improving the model, expanding the dataset, and adding new features.

Thank you for your interest in this project. I hope you find it useful. If you have any questions, please don't hesitate to contact me at tranphihung8383@gmail.com

References

  • Chen, Wenhu, et al. "MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text." arXiv preprint arXiv:2210.02928 (2022). Available on this link.

  • Vidivelli, S., Manikandan Ramachandran, and A. Dharunbalaji. "Efficiency-Driven Custom Chatbot Development: Unleashing LangChain, RAG, and Performance-Optimized LLM Fusion." Computers, Materials & Continua 80.2 (2024). Available on this link.

  • De Stefano, Gianluca, Giancarlo Pellegrino, and Lea Schönherr. "Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks." arXiv preprint arXiv:2408.05025 (2024). Available on this link.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trimodal_rag-0.0.1.tar.gz (65.8 kB)

Uploaded Source

Built Distribution

trimodal_rag-0.0.1-py3-none-any.whl (94.3 kB)

Uploaded Python 3

File details

Details for the file trimodal_rag-0.0.1.tar.gz.

File metadata

  • Download URL: trimodal_rag-0.0.1.tar.gz
  • Upload date:
  • Size: 65.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for trimodal_rag-0.0.1.tar.gz

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 8dfb88a9675e96e98e4b43a2d644b065b44e6111e10f67867959048c359f0e93 |
| MD5 | c43ff319e9f6e499dd02598392eafa3c |
| BLAKE2b-256 | d11988cb0bed51eb77369d5c4ea15727d4396e23b5f2c4bca101a4591ff1f101 |

See more details on using hashes here.

File details

Details for the file trimodal_rag-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: trimodal_rag-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 94.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for trimodal_rag-0.0.1-py3-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 3c6cc702bf272d2aa5b289d98f143e11327c34fb4daed9986b294490daa3a1ad |
| MD5 | bb8c6f9ddfdf9adc45722c6dac4082cf |
| BLAKE2b-256 | 075b29c75156153aeed740b46c1dce1cbfcb0d3c69bc8a91244884a048ec5cfe |

See more details on using hashes here.
