Multimodal RAG pipeline for low-compute, local, real-world deployment
RAG-LLM-API-Pipeline
A fully local, multimodal Retrieval-Augmented Generation (RAG) system for low-compute ("GPU-poor") hardware, powered by open-source local LLMs. This pipeline is designed for operational technology (OT) environments, providing AI-assisted access to technical knowledge, manuals, and historical data securely, offline, and at minimal cost.
✅ Key Features
- 🔍 Retrieval-Augmented Generation (RAG) using FAISS + SentenceTransformers
- 🧠 Query handling via a local, open-source Large Language Model (LLM)
- 📄 Supports multiple input formats:
- PDFs
- Plain text files
- Images (OCR via Tesseract)
- Audio files (.wav, .flac, .aiff)
- Videos (.mp4 with audio extraction)
- 💻 Interfaces:
- Command Line Interface (CLI)
- Local REST API (FastAPI)
- 🛠️ Asset definition via YAML configuration
- 🔐 Works in fully local environments after setup
✅ Works locally, GPU/CPU-friendly with configurable precision
✅ CLI, API and simple web UI included
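As an illustration of how the supported input formats could be told apart, here is a minimal sketch that maps a file extension to a processing route. The names `MODALITY_BY_EXTENSION` and `detect_modality` are hypothetical, and the image extensions are assumptions (the feature list names only the audio and video extensions); the package's own loader may be organised differently.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the processing route described
# in the feature list. Image extensions are assumed for illustration.
MODALITY_BY_EXTENSION = {
    ".pdf": "pdf",
    ".txt": "text",
    ".png": "image",   # would go through OCR (Tesseract)
    ".jpg": "image",
    ".jpeg": "image",
    ".wav": "audio",
    ".flac": "audio",
    ".aiff": "audio",
    ".mp4": "video",   # audio track would be extracted first
}


def detect_modality(filename: str) -> str:
    """Return the modality label for a file, or raise for unsupported types."""
    suffix = Path(filename).suffix.lower()
    try:
        return MODALITY_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or filename!r}") from None


for name in ["pump_manual.pdf", "safety_guide.mp4", "panel_photo.PNG"]:
    print(name, "->", detect_modality(name))
```

Matching on the lower-cased suffix keeps the check case-insensitive, so `panel_photo.PNG` is routed the same way as `panel_photo.png`.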
📦 Installation
pip install rag-llm-api-pipeline
🛠️ Setup Instructions (Windows + Anaconda)
1. Create Python Environment
conda create -n rag_env python=3.10
conda activate rag_env
2. Install Dependencies
Via Conda (system-level tools):
conda install -c conda-forge ffmpeg pytesseract pyaudio
Via Pip (Python packages):
pip install -r requirements.txt
Ensure Tesseract is installed and in your system PATH. You can get it from https://github.com/tesseract-ocr/tesseract.
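To confirm that the external tools are reachable before running the pipeline, a quick check along these lines can help. This is a sketch using only the Python standard library; the helper name `find_missing_tools` is hypothetical, and the executable names `tesseract` and `ffmpeg` are assumed to match your installation.

```python
import shutil

# Executables the setup steps above rely on (names assumed to be the defaults).
REQUIRED_TOOLS = ("tesseract", "ffmpeg")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the system PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools are on PATH.")
```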
🚀 Usage
CLI Example
python cli/main.py --system Pump_A --question "What is the pressure threshold for operation?"
API Server
Start the server:
uvicorn api.server:app --reload
Query with curl or Postman:
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"system": "Pump_A", "question": "Explain the restart procedure"}'
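The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_query_request` and `ask` are hypothetical, and the assumption that the server replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.request


def build_query_request(system, question, base_url="http://localhost:8000"):
    """Build a POST request for the /query endpoint shown in the curl example."""
    payload = json.dumps({"system": system, "question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(system, question, base_url="http://localhost:8000", timeout=60):
    """Send the query and decode the reply (assumed here to be JSON)."""
    request = build_query_request(system, question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the API server running:
# print(ask("Pump_A", "Explain the restart procedure"))
```

Local inference on CPU can be slow, so a generous `timeout` is likely to be needed.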
🧱 Configuration
Edit config/system.yaml to define your assets and associated documents:
assets:
  - name: Pump_A
    docs:
      - pump_manual.pdf
      - safety_guide.mp4
models:
  embedding_model: sentence-transformers/all-MiniLM-L6-v2
  llm_model: tiiuae/falcon-7b-instruct
retriever:
  top_k: 5
  index_dir: data/indexes
llm:
  max_new_tokens: 256
  prompt_template: |
    Use the following context to answer the question:
    {context}
    Question: {question}
    Answer:
settings:
  data_dir: data/manuals
  force_rebuild_index: false
  use_cpu: true
Documents can be PDFs, plain text, images, or audio/video files.
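To show what the `prompt_template` setting implies, here is a minimal sketch of filling the `{context}` and `{question}` placeholders with Python's `str.format`. The helper `build_prompt` and the choice to join chunks with a blank line are assumptions for illustration; the pipeline's actual substitution logic may differ.

```python
# Same template text as in the sample configuration.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question:\n"
    "{context}\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(chunks, question, template=PROMPT_TEMPLATE):
    """Join retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)  # blank line between chunks (assumed)
    return template.format(context=context, question=question)


print(build_prompt(
    ["Restart requires the inlet valve to be closed.", "Wait 30 seconds before re-energising."],
    "Explain the restart procedure",
))
```

If you customise the template, keep both placeholders; a template with other unescaped braces would make `str.format` raise an error in this sketch.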
🐧 Setup Instructions (Linux)
1. Create Python Environment
python3 -m venv rag_env
source rag_env/bin/activate
Or with conda:
conda create -n rag_env python=3.10
conda activate rag_env
2. Install System Dependencies
sudo apt update
sudo apt install -y ffmpeg tesseract-ocr libpulse-dev portaudio19-dev
Optional: install language packs for OCR (e.g., tesseract-ocr-eng).
3. Install Python Packages
pip install -r requirements.txt
🔁 Running the Application on Linux
CLI
python cli/main.py --system Pump_A --question "What is the restart sequence for this machine?"
API Server
uvicorn api.server:app --host 0.0.0.0 --port 8000
cURL Query
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"system": "Pump_A", "question": "What does error E204 indicate?"}'
📚 How it Works
- Index Building:
  - Files are parsed using loader.py.
  - Text chunks are embedded with MiniLM.
  - FAISS index stores embeddings for fast similarity search.
- Query Execution:
  - User provides a natural language question.
  - Relevant text chunks are retrieved from the index.
  - LLM generates an answer based on retrieved context.
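The index-then-retrieve flow can be sketched in a few lines of dependency-free Python. This is a toy stand-in, not the package's implementation: a bag-of-words count vector replaces the MiniLM embedding, and a brute-force cosine ranking replaces the FAISS index. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Index building: store each chunk alongside its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, top_k=5):
    """Query execution: rank chunks by similarity and keep the top_k."""
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    "Restart the pump only after the pressure drops below the threshold.",
    "Wear gloves when replacing the filter cartridge.",
    "The control panel shows error codes on the display.",
]
index = build_index(chunks)
print(retrieve(index, "How do I restart the pump?", top_k=1))
```

The retrieved chunks would then be passed to the LLM as context. A real embedding model captures meaning beyond shared words, which this word-overlap toy does not.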
🧠 Model Info
- Default LLM: tiiuae/falcon-rw-1b (run locally via transformers). Note that the sample configuration above sets llm_model to tiiuae/falcon-7b-instruct.
- Embedding model: sentence-transformers/all-MiniLM-L6-v2
- All models are open-source and run offline.
You can replace these with any local-compatible Hugging Face model.
🔐 Security & Offline Use
- No cloud or external dependencies required after initial setup.
- Ideal for OT environments.
- All processing is local: embeddings, LLM inference, and data storage.
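Once the models have been downloaded, the underlying Hugging Face libraries can be told not to contact the network. The sketch below sets the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables, which are read by `huggingface_hub` and `transformers`; the helper name `enable_offline_mode` is hypothetical, and whether this package sets these variables itself is not stated here.

```python
import os


def enable_offline_mode(env=os.environ):
    """Ask the Hugging Face libraries to use only locally cached files."""
    env["HF_HUB_OFFLINE"] = "1"
    env["TRANSFORMERS_OFFLINE"] = "1"
    return env


# Call this before importing transformers or sentence_transformers,
# since the variables are typically read at import time.
enable_offline_mode()
print(os.environ["HF_HUB_OFFLINE"], os.environ["TRANSFORMERS_OFFLINE"])
```

With these set, a missing model should fail fast with an error instead of triggering a download attempt, which makes gaps in the local cache easier to spot.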
📜 License
MIT License
📧 Contact
For issues, improvements, or contributions, please open an issue or PR.
File details
Details for the file rag_llm_api_pipeline-0.2.1.tar.gz.
File metadata
- Download URL: rag_llm_api_pipeline-0.2.1.tar.gz
- Upload date:
- Size: 11.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e06e40161c9476bd7ae8bca371e2c625aa08075de5ae20d0c82724b1f82e06a |
| MD5 | 8c4109b0766fe3ea426366c3e8b3cc4c |
| BLAKE2b-256 | 44fb1f883e51a20679dd1fd9a9b8d4149031220e0ec0f27894b7e14a7ceec808 |
File details
Details for the file rag_llm_api_pipeline-0.2.1-py3-none-any.whl.
File metadata
- Download URL: rag_llm_api_pipeline-0.2.1-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08dc6115206b1ee42169494e76c01848f1de86ab0fcc922aec4afed419f65a07 |
| MD5 | 4075802ecf93dc441f7f2df7f071ecb4 |
| BLAKE2b-256 | 984f71eebb3af8bf3803c000f5edc82e5e6e94754625c85725f3e424e647b5f4 |