
Multimodal RAG pipeline for low-compute, local, real-world deployment


RAG-LLM-API-Pipeline

A fully local, multimodal Retrieval-Augmented Generation (RAG) system for GPU-poor hardware, powered by open-source LLMs. The pipeline is designed for operational technology (OT) environments, providing AI-assisted access to technical knowledge, manuals, and historical data securely, offline, and at minimal cost.


✅ Key Features

  • 🔍 Retrieval-Augmented Generation (RAG) using FAISS + SentenceTransformers
  • 🧠 Query handling via a local, open-source Large Language Model (LLM)
  • 📄 Supports multiple input formats (see the ingestion sketch after this list):
    • PDFs
    • Plain text files
    • Images (OCR via Tesseract)
    • Audio files (.wav, .flac, .aiff)
    • Videos (.mp4 with audio extraction)
  • 💻 Interfaces:
    • Command Line Interface (CLI)
    • Local REST API (FastAPI)
  • 🛠️ Asset definition via YAML configuration
  • 🔐 Works in fully local environments after setup

✅ Runs locally on GPU or CPU, with configurable precision
✅ CLI, API, and a simple web UI included
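
As a rough illustration of the multimodal ingestion above, the sketch below shows how image OCR and video audio extraction can work with Tesseract and ffmpeg. File names are placeholders, and this is not the pipeline's actual loader code:

# Hedged sketch of multimodal text extraction; not the pipeline's loader.py.
import subprocess

import pytesseract
from PIL import Image

# Image -> text via Tesseract OCR
ocr_text = pytesseract.image_to_string(Image.open("gauge_photo.png"))

# Video -> audio track via ffmpeg (transcribe the resulting .wav separately)
subprocess.run(
    ["ffmpeg", "-y", "-i", "safety_guide.mp4", "-vn", "safety_guide.wav"],
    check=True,
)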


📦 Installation

pip install rag-llm-api-pipeline

🛠️ Setup Instructions (Windows + Anaconda)

1. Create Python Environment

conda create -n rag_env python=3.10
conda activate rag_env

2. Install Dependencies

Via Conda (system-level tools):

conda install -c conda-forge ffmpeg pytesseract pyaudio

Via Pip (Python packages):

pip install -r requirements.txt

Ensure Tesseract is installed and in your system PATH. You can get it from https://github.com/tesseract-ocr/tesseract.
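
If Tesseract is installed but not on PATH, pytesseract can be pointed at the binary directly. A minimal sketch, assuming the default Windows install location (adjust the path for your machine):

# Only needed when tesseract.exe is not on the system PATH.
import pytesseract

# Default Windows install location; adjust if you installed elsewhere.
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"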


🚀 Usage

CLI Example

python cli/main.py --system Pump_A --question "What is the pressure threshold for operation?"

API Server

Start the server:

uvicorn api.server:app --reload

Query with curl or Postman:

curl -X POST http://localhost:8000/query \
     -H "Content-Type: application/json" \
     -d '{"system": "Pump_A", "question": "Explain the restart procedure"}'

🧱 Configuration

Edit config/system.yaml to define your assets and associated documents:

assets:
  - name: Pump_A
    docs:
      - pump_manual.pdf
      - safety_guide.mp4

models:
  embedding_model: sentence-transformers/all-MiniLM-L6-v2
  llm_model: tiiuae/falcon-7b-instruct

retriever:
  top_k: 5
  index_dir: data/indexes

llm:
  max_new_tokens: 256
  prompt_template: |
    Use the following context to answer the question:
    {context}

    Question: {question}
    Answer:

settings:
  data_dir: data/manuals
  force_rebuild_index: false
  use_cpu: true

Documents can be PDFs, plain text, images, or audio/video files.
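
For illustration, a minimal sketch of reading this configuration with PyYAML, assuming the key names shown above (the pipeline's own loader may differ):

# Hedged sketch: reading config/system.yaml with PyYAML (pip install pyyaml).
import yaml

with open("config/system.yaml") as f:
    cfg = yaml.safe_load(f)

asset = next(a for a in cfg["assets"] if a["name"] == "Pump_A")
print(asset["docs"])                     # ['pump_manual.pdf', 'safety_guide.mp4']
print(cfg["models"]["embedding_model"])  # sentence-transformers/all-MiniLM-L6-v2
print(cfg["retriever"]["top_k"])         # 5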


🐧 Setup Instructions (Linux)

1. Create Python Environment

python3 -m venv rag_env
source rag_env/bin/activate

Or with conda:

conda create -n rag_env python=3.10
conda activate rag_env

2. Install System Dependencies

sudo apt update
sudo apt install -y ffmpeg tesseract-ocr libpulse-dev portaudio19-dev

Optional: install language packs for OCR (e.g., tesseract-ocr-eng).
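
With a pack installed, the language can be selected per OCR call. A one-line sketch (lang="deu" and the file name are just examples):

# Requires the matching pack, e.g. `sudo apt install tesseract-ocr-deu`.
import pytesseract
from PIL import Image

text = pytesseract.image_to_string(Image.open("label.png"), lang="deu")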

3. Install Python Packages

pip install -r requirements.txt

🔁 Running the Application on Linux

CLI

python cli/main.py --system Pump_A --question "What is the restart sequence for this machine?"

API Server

uvicorn api.server:app --host 0.0.0.0 --port 8000

cURL Query

curl -X POST http://localhost:8000/query \
     -H "Content-Type: application/json" \
     -d '{"system": "Pump_A", "question": "What does error E204 indicate?"}'

📚 How it Works

  1. Index Building:
    • Files are parsed using loader.py.
    • Text chunks are embedded with MiniLM.
    • FAISS index stores embeddings for fast similarity search.
  2. Query Execution:
    • User provides a natural language question.
    • Relevant text chunks are retrieved from the index.
    • LLM generates an answer based on retrieved context.
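
The core of both steps can be pictured in a few lines of Python. This is a minimal sketch using sentence-transformers and faiss-cpu, with hard-coded chunks standing in for parsed documents; chunking, persistence, and the LLM call are simplified away:

# Hedged sketch of the embed -> index -> retrieve loop; not the pipeline's code.
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

chunks = [
    "Pump_A operates at a maximum pressure of 6 bar.",
    "Restart procedure: close valve V2, reset the controller, reopen V2.",
]
embeddings = embedder.encode(chunks, convert_to_numpy=True)

index = faiss.IndexFlatL2(embeddings.shape[1])  # exact L2 similarity search
index.add(embeddings)

question = "What is the pressure threshold for operation?"
q_vec = embedder.encode([question], convert_to_numpy=True)
_, ids = index.search(q_vec, 2)  # top_k = 2
context = "\n".join(chunks[i] for i in ids[0])
# context is then filled into prompt_template and passed to the local LLM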

🧠 Model Info

  • Default LLM: tiiuae/falcon-rw-1b, run locally via transformers (the configuration example above swaps in tiiuae/falcon-7b-instruct)
  • Embedding model: sentence-transformers/all-MiniLM-L6-v2
  • All models are open-source and run offline.

You can replace these with any Hugging Face model that runs locally.
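
For example, loading the default LLM with the transformers pipeline API (a sketch; older transformers releases may additionally need trust_remote_code=True for Falcon models):

# Hedged sketch: running the default LLM locally with transformers.
from transformers import pipeline

llm = pipeline(
    "text-generation",
    model="tiiuae/falcon-rw-1b",  # swap in any model that runs locally
    device=-1,                    # CPU, matching use_cpu: true in system.yaml
)
out = llm("Question: What does error E204 indicate?\nAnswer:", max_new_tokens=256)
print(out[0]["generated_text"])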


🔐 Security & Offline Use

  • No cloud or external dependencies required after initial setup.
  • Ideal for OT environments.
  • All processing is local: embeddings, LLM inference, and data storage.

📜 License

MIT License


📧 Contact

For issues, improvements, or contributions, please open an issue or PR.

