Multimodal RAG pipeline for low-compute, local, real-world deployment

RAG-LLM-API-Pipeline

A fully local, low-compute ("GPU-poor"), multimodal Retrieval-Augmented Generation (RAG) system powered by open-source local LLMs. This pipeline is designed for operational technology (OT) environments, providing AI-assisted access to technical knowledge, manuals, and historical data, securely, offline, and at minimal cost.


✅ Key Features

  • ๐Ÿ” Retrieval-Augmented Generation (RAG) using FAISS + SentenceTransformers
  • ๐Ÿง  Query handling via a local, open-source Large Language Model (LLM)
  • ๐Ÿ“„ Supports multiple input formats:
    • PDFs
    • Plain text files
    • Images (OCR via Tesseract)
    • Audio files (.wav, .flac, .aiff)
    • Videos (.mp4 with audio extraction)
  • ๐Ÿ’ป Interfaces:
    • Command Line Interface (CLI)
    • Local REST API (FastAPI)
  • ๐Ÿ› ๏ธ Asset definition via YAML configuration
  • ๐Ÿ” Works in fully local environments after setup

📂 Project Structure

rag_llm_api_pipeline/
├── api/                # FastAPI application
│   └── server.py
├── cli/                # Command-line interface
│   └── main.py
├── config/
│   └── system.yaml     # Asset and document config
├── data/
│   └── manuals/        # PDF, image, audio, etc.
├── loader.py           # Multimodal file loader
├── retriever.py        # Embedding, FAISS search
├── llm_wrapper.py      # Local LLM text generation
├── requirements.txt    # Python dependencies
└── README.md

๐Ÿ› ๏ธ Setup Instructions (Windows + Anaconda)

1. Create Python Environment

conda create -n rag_env python=3.10
conda activate rag_env

2. Install Dependencies

Via Conda (system-level tools):

conda install -c conda-forge ffmpeg pytesseract pyaudio

Via Pip (Python packages):

pip install -r requirements.txt

Ensure Tesseract is installed and in your system PATH. You can get it from https://github.com/tesseract-ocr/tesseract.
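If OCR fails later, a quick way to confirm Tesseract is actually reachable is a short standard-library check (pytesseract locates the binary the same way):

```python
import shutil


def tesseract_available() -> bool:
    """Return True if the tesseract binary is on the system PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found: image inputs will be OCR'd.")
    else:
        print("Tesseract NOT found: add its install directory to PATH.")
```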


🚀 Usage

CLI Example

python cli/main.py --system Pump_A --question "What is the pressure threshold for operation?"

API Server

Start the server:

uvicorn api.server:app --reload

Query with curl or Postman:

curl -X POST http://localhost:8000/query \
     -H "Content-Type: application/json" \
     -d '{"system": "Pump_A", "question": "Explain the restart procedure"}'

🧱 Configuration

Edit config/system.yaml to define your assets and associated documents:

assets:
  - name: Pump_A
    docs:
      - manuals/pump_manual.pdf
      - manuals/startup_guide.png
      - manuals/technician_note.wav

Documents can be PDFs, plain text, images, or audio/video files.
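Internally, loader.py routes each document to the right extractor by file extension. A minimal sketch of that dispatch follows; the strategy names and mapping here are illustrative, not the actual loader.py API:

```python
from pathlib import Path

# Illustrative mapping from file extension to extraction strategy.
EXTRACTORS = {
    ".pdf": "pdf_text",                   # e.g. a PDF text extractor
    ".txt": "plain_text",
    ".png": "ocr", ".jpg": "ocr",         # Tesseract via pytesseract
    ".wav": "speech_to_text",
    ".flac": "speech_to_text",
    ".aiff": "speech_to_text",
    ".mp4": "extract_audio_then_stt",     # ffmpeg audio extraction first
}


def pick_extractor(path: str) -> str:
    """Return the extraction strategy name for a document path."""
    ext = Path(path).suffix.lower()
    try:
        return EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"Unsupported document type: {ext}")
```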


๐Ÿง Setup Instructions (Linux)

1. Create Python Environment

python3 -m venv rag_env
source rag_env/bin/activate

Or with conda:

conda create -n rag_env python=3.10
conda activate rag_env

2. Install System Dependencies

sudo apt update
sudo apt install -y ffmpeg tesseract-ocr libpulse-dev portaudio19-dev

Optional: install language packs for OCR (e.g., tesseract-ocr-eng).

3. Install Python Packages

pip install -r requirements.txt

๐Ÿ” Running the Application on Linux

CLI

python cli/main.py --system Pump_A --question "What is the restart sequence for this machine?"

API Server

uvicorn api.server:app --host 0.0.0.0 --port 8000

cURL Query

curl -X POST http://localhost:8000/query \
     -H "Content-Type: application/json" \
     -d '{"system": "Pump_A", "question": "What does error E204 indicate?"}'

📚 How it Works

  1. Index Building:
     • Files are parsed using loader.py.
     • Text chunks are embedded with MiniLM.
     • A FAISS index stores the embeddings for fast similarity search.
  2. Query Execution:
     • The user provides a natural-language question.
     • The most relevant text chunks are retrieved from the index.
     • The LLM generates an answer grounded in the retrieved context.
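The two phases above can be sketched in a few lines. This toy version stands in random vectors for MiniLM embeddings and brute-force NumPy search for FAISS so the flow stays visible; the real pipeline swaps in SentenceTransformer encoding and a FAISS index without changing the shape of the logic:

```python
import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize rows so the inner product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


# 1. Index building: embed every chunk once and store the matrix.
rng = np.random.default_rng(0)
chunks = ["Max pressure is 4.5 bar.", "Restart: hold reset 5 s.", "Grease bearings monthly."]
index = normalize(rng.standard_normal((len(chunks), 384)))  # stand-in for MiniLM (384-dim)

# 2. Query execution: embed the question, take the top-k most similar chunks.
query_vec = normalize(rng.standard_normal(384))
scores = index @ query_vec                     # cosine similarity per chunk
top_k = np.argsort(scores)[::-1][:2]           # indices of the 2 best chunks
context = [chunks[i] for i in top_k]           # passed to the LLM as context
```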

🧠 Model Info

  • Default LLM: tiiuae/falcon-rw-1b (run locally via transformers)
  • Embedding model: sentence-transformers/all-MiniLM-L6-v2
  • All models are open-source and run offline.

You can replace these with any Hugging Face model that runs locally.
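Whichever model you choose, generation ultimately comes down to packing the retrieved chunks into a prompt. A minimal, model-agnostic sketch follows; this prompt template is an assumption, not llm_wrapper.py's actual format, and the commented lines show how it would plug into a transformers text-generation pipeline:

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a RAG prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# With a local Hugging Face model (requires `transformers` and downloaded weights):
# from transformers import pipeline
# generator = pipeline("text-generation", model="tiiuae/falcon-rw-1b")
# prompt = build_prompt("What is the pressure threshold?", ["Max pressure is 4.5 bar."])
# print(generator(prompt, max_new_tokens=100)[0]["generated_text"])
```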


๐Ÿ” Security & Offline Use

  • No cloud or external dependencies required after initial setup.
  • Ideal for OT environments.
  • All processing is local: embeddings, LLM inference, and data storage.

📜 License

MIT License


📧 Contact

For issues, improvements, or contributions, please open an issue or PR.
