Universal LLM Training & RAG Agent for HuggingFace

license: apache-2.0

KerdosAI: Universal LLM Training & RAG Agent

Enterprise-grade LLM Training + Retrieval-Augmented Generation (RAG) toolkit.
Fine-tune any HuggingFace model on your data, then deploy it with a full document Q&A chat interface, locally or in the cloud.


What's New in v0.2.0

  • 🆕 kerdosai.rag submodule: full RAG pipeline (document loading → FAISS indexing → LLM answering)
  • 🆕 KnowledgeBase: high-level API to index PDF/DOCX/TXT/MD/CSV files
  • 🆕 RAGAgent: streaming and blocking chat with conversation history
  • 🆕 deployment_type="gradio-rag": launch the HuggingFace Space UI locally with one line
  • 🆕 kerdosai rag-chat CLI command: on-premise RAG UI in one command
  • Updated dependencies: faiss-cpu, sentence-transformers, PyMuPDF, python-docx, gradio, huggingface-hub, tenacity

Installation

# Standard install
pip install kerdosai

# With all optional extras
pip install "kerdosai[all]"

Requirements: Python 3.8+ · PyTorch 2.0+ · CUDA-compatible GPU (recommended for training)


Quick Start

1. RAG: Document Q&A (no GPU needed)

from kerdosai.rag import KnowledgeBase, RAGAgent

# Index your documents (PDF, DOCX, TXT, MD, CSV)
kb = KnowledgeBase()
kb.index_documents(["handbook.pdf", "faq.docx", "policy.txt"])
print(f"Indexed {kb.chunk_count} chunks from {kb.indexed_sources}")

# Chat with your documents via the HF Inference API (no GPU required)
agent = RAGAgent(hf_token="hf_...", knowledge_base=kb)

# Blocking answer
print(agent.chat("What is the leave policy?"))

# Streaming answer
for partial in agent.chat_stream("Summarise the refund section."):
    print(partial, end="\r")

2. Launch the Enterprise Chat UI Locally

from kerdosai.deployer import Deployer
from kerdosai.rag import KnowledgeBase

kb = KnowledgeBase()
kb.index_documents(["report.pdf"])

deployer = Deployer(model=None, tokenizer=None)
deployer.deploy(
    deployment_type="gradio-rag",
    host="0.0.0.0",
    port=7860,
    hf_token="hf_...",       # or set HF_TOKEN env var
    knowledge_base=kb,
)
# → Open http://localhost:7860

3. CLI โ€” One-Command RAG Chat Server

# Pre-index files and open the Gradio UI
kerdosai rag-chat \
  --data handbook.pdf faq.docx policy.txt \
  --hf-token hf_... \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 7860

# Use HF_TOKEN env var instead of --hf-token
export HF_TOKEN=hf_...
kerdosai rag-chat --data ./company_docs/ --port 7860

4. Fine-Tune a Model (Training Pipeline)

from kerdosai import KerdosAgent

agent = KerdosAgent(
    base_model="meta-llama/Llama-3.1-8B",
    training_data="data/training.csv",
)

metrics = agent.train(epochs=3, batch_size=4, learning_rate=2e-5)
agent.save("./my-finetuned-model")
print(metrics)

5. Deploy Fine-Tuned Model as REST API

from kerdosai import KerdosAgent

agent = KerdosAgent.load("./my-finetuned-model")
agent.deploy(deployment_type="rest", host="0.0.0.0", port=8000)
# POST /generate → {"text": "...", "max_length": 200}
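
A client for this endpoint might look like the sketch below. It assumes the JSON in the comment above is the request body and that the server replies with JSON. The response schema is not documented here, so the code decodes the reply without assuming any field names. `build_generate_request` and `generate` are illustrative helper names, not part of kerdosai.

```python
import json
import urllib.request


def build_generate_request(base_url, text, max_length=200):
    """Build (but do not send) a POST request for the /generate endpoint."""
    payload = json.dumps({"text": text, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, text, max_length=200, timeout=60):
    """Send the request and return the decoded JSON reply (schema is an assumption)."""
    request = build_generate_request(base_url, text, max_length)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the snippet above running, `generate("http://localhost:8000", "Hello")` would send one request and return whatever JSON the API responds with.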

RAG Module Reference

KnowledgeBase

from kerdosai.rag import KnowledgeBase

kb = KnowledgeBase(
    embedding_model="BAAI/bge-small-en-v1.5",  # SentenceTransformer model
    chunk_size=512,                              # Max chars per chunk
    chunk_overlap=64,                            # Overlap between chunks
    min_score=0.30,                              # Min cosine similarity threshold
)

kb.index_documents(["doc1.pdf", "doc2.docx"])   # Add documents (duplicate-safe)
kb.index_documents(["doc3.txt"])                 # Incrementally add more

results = kb.search("What is the refund policy?", top_k=5)
# → [{"source": "doc1.pdf", "text": "...", "score": 0.87}, ...]

print(kb.chunk_count)       # Total indexed chunks
print(kb.indexed_sources)   # Set of indexed filenames
kb.clear()                  # Reset index
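
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of character-based chunking with overlap. It is an illustration of the idea only; the library's actual splitter may differ (for example, it could respect sentence or paragraph boundaries), and `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=64):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


pieces = chunk_text("x" * 1000)
print([len(p) for p in pieces])  # [512, 512, 104]
```

A larger overlap reduces the chance that an answer is split across two chunks, at the cost of indexing more text.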

RAGAgent

from kerdosai.rag import RAGAgent

agent = RAGAgent(
    hf_token="hf_...",                              # HF API token
    model="meta-llama/Llama-3.1-8B-Instruct",       # LLM model ID
    top_k=5,                                         # Chunks retrieved per query
    embedding_model="BAAI/bge-small-en-v1.5",        # Embedding model
)

agent.index(["report.pdf", "handbook.docx"])         # Index documents

# Blocking
reply = agent.chat("What are the payment terms?")

# Streaming
for partial in agent.chat_stream("Summarise section 3."):
    print(partial, end="\r")

agent.reset_history()                                # Clear conversation
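
The `end="\r"` in the streaming loop suggests that each partial is the full answer so far, so every print redraws the same terminal line. The sketch below shows that consumption pattern against a stand-in generator; `fake_chat_stream` and `render_stream` are hypothetical names, the sample sentence is made up, and the cumulative behaviour of `chat_stream` is an assumption rather than something this README states.

```python
def fake_chat_stream(answer):
    """Stand-in for a streaming call: yields the answer so far, word by word."""
    words = answer.split()
    for end in range(1, len(words) + 1):
        yield " ".join(words[:end])


def render_stream(partials):
    """Redraw one terminal line per partial and return the final text."""
    final = ""
    for partial in partials:
        final = partial
        print(partial, end="\r", flush=True)
    print()  # leave the redrawn line so later output starts on a fresh line
    return final


reply = render_stream(fake_chat_stream("Refunds are issued within 14 days."))
```

Returning the last partial gives the caller the same string a blocking call would produce, under the assumption above.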

Low-Level RAG API

from kerdosai.rag import (
    load_documents,    # Parse files → [{"source", "text"}]
    build_index,       # List[dict] → VectorIndex
    add_to_index,      # Incrementally extend a VectorIndex
    retrieve,          # VectorIndex + query → top-K chunks
    answer_stream,     # Chunks + HF token → streaming generator
    answer,            # Chunks + HF token → full string
    VectorIndex,       # Dataclass: chunks + faiss index + embedder
)

docs     = load_documents(["report.pdf", "policy.docx"])
index    = build_index(docs, embedding_model="BAAI/bge-small-en-v1.5")
chunks   = retrieve("What is the refund policy?", index, top_k=5)
response = answer("What is the refund policy?", chunks, hf_token="hf_...")

Supported file types: .pdf (PyMuPDF) · .docx (python-docx, incl. tables) · .txt · .md · .csv
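
The retrieval step (top-K by cosine similarity, filtered by a minimum score) can be sketched in plain Python as follows. This is a toy illustration with hand-made 2-D vectors, not the package's FAISS-backed code; `cosine` and `top_k_chunks` are hypothetical helper names, and the `min_score` default simply mirrors the `KnowledgeBase` example above.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, chunks, top_k=5, min_score=0.30):
    """Rank chunks by cosine similarity, drop those below min_score, keep top_k."""
    scored = [
        {**chunk, "score": cosine(query_vec, vec)}
        for chunk, vec in zip(chunks, chunk_vecs)
    ]
    kept = [item for item in scored if item["score"] >= min_score]
    kept.sort(key=lambda item: item["score"], reverse=True)
    return kept[:top_k]


# Toy 2-D "embeddings"; a real index would hold SentenceTransformer vectors.
chunks = [
    {"source": "a.txt", "text": "refund rules"},
    {"source": "b.txt", "text": "office hours"},
    {"source": "c.txt", "text": "returns"},
]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k_chunks([1.0, 0.0], vectors, chunks, top_k=2)
print([(h["source"], round(h["score"], 2)) for h in hits])
# [('a.txt', 1.0), ('c.txt', 0.71)]
```

The orthogonal chunk scores 0.0 and is removed by the threshold before the top-K cut is applied.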


Training API Reference

KerdosAgent

| Method | Description |
|---|---|
| KerdosAgent(base_model, training_data, device=None) | Initialize with a HuggingFace model ID and path to training data |
| .train(epochs, batch_size, learning_rate, **kwargs) | Fine-tune the model; returns metrics dict |
| .deploy(deployment_type, host, port, **kwargs) | Deploy as REST / Docker / Kubernetes / Gradio-RAG |
| .save(output_dir) | Save model + tokenizer to disk |
| .load(model_dir) | Class method: load a saved model |

Trainer

| Method | Description |
|---|---|
| Trainer(model, tokenizer, device, use_wandb=True) | Initialize trainer |
| .train(dataset, epochs, batch_size, learning_rate, ...) | Run HuggingFace training loop |
| .evaluate(dataset, batch_size) | Evaluate and return eval_loss |

DataProcessor

| Method | Description |
|---|---|
| DataProcessor(data_path, max_length=512, text_column="text") | Initialize |
| .prepare_dataset(tokenizer=None) | Load, clean, tokenize → HuggingFace Dataset |
| .validate_data() | Check data quality and print warnings |

Supported training data formats: .csv (with text column) · .json (list of objects with text key)
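
As a sketch of what those two formats look like on disk, the snippet below writes the same two made-up examples as a CSV with a `text` column and as a JSON list of objects with a `text` key, then checks both. It uses only the standard library; `has_text_field` is a hypothetical helper, not the package's `validate_data()`.

```python
import csv
import json
import os
import tempfile

rows = [{"text": "First training example."}, {"text": "Second training example."}]
out_dir = tempfile.mkdtemp()

# CSV with a "text" column
csv_path = os.path.join(out_dir, "training.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text"])
    writer.writeheader()
    writer.writerows(rows)

# JSON: a list of objects, each with a "text" key
json_path = os.path.join(out_dir, "training.json")
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(rows, f, ensure_ascii=False, indent=2)


def has_text_field(path):
    """Check that every record in a CSV or JSON file has a non-empty "text" value."""
    if path.endswith(".csv"):
        with open(path, newline="", encoding="utf-8") as f:
            records = list(csv.DictReader(f))
    else:
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
    return bool(records) and all(r.get("text") for r in records)


print(has_text_field(csv_path), has_text_field(json_path))  # True True
```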


CLI Reference

kerdosai <command> [options]

Commands:
  train        Fine-tune a model on custom data
  deploy       Deploy a trained model
  rag-chat     Launch a RAG document Q&A UI (no local GPU needed)

kerdosai train

kerdosai train \
  --model meta-llama/Llama-3.1-8B \
  --data ./data/training.csv \
  --output ./my-model \
  --epochs 3 \
  --batch-size 4 \
  --learning-rate 2e-5

kerdosai deploy

# REST API
kerdosai deploy --model-dir ./my-model --type rest --port 8000

# Gradio RAG UI (with a fine-tuned local model)
kerdosai deploy --model-dir ./my-model --type gradio-rag --hf-token hf_... --port 7860

# Docker container
kerdosai deploy --model-dir ./my-model --type docker

kerdosai rag-chat

kerdosai rag-chat \
  --data company.pdf handbook.docx faq.txt \
  --hf-token hf_... \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --embedding-model BAAI/bge-small-en-v1.5 \
  --host 0.0.0.0 \
  --port 7860
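
If the server needs to be started from a Python script, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown above; `rag_chat_command` is a hypothetical helper. Leaving `hf_token` unset omits `--hf-token`, which relies on the `HF_TOKEN` environment variable being set as described earlier.

```python
import shlex


def rag_chat_command(data, hf_token=None, model=None, embedding_model=None,
                     host=None, port=None):
    """Assemble the argv list for `kerdosai rag-chat` from optional settings."""
    argv = ["kerdosai", "rag-chat", "--data", *data]
    options = [
        ("--hf-token", hf_token),
        ("--model", model),
        ("--embedding-model", embedding_model),
        ("--host", host),
        ("--port", port),
    ]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv


argv = rag_chat_command(["company.pdf", "faq.txt"], host="0.0.0.0", port=7860)
print(shlex.join(argv))
# To launch it: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.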

Deployment Options

| Type | Command | Description |
|---|---|---|
| rest | deploy(deployment_type="rest") | FastAPI REST API on POST /generate |
| gradio-rag | deploy(deployment_type="gradio-rag") | Full Kerdos RAG Chat UI (Gradio) |
| docker | deploy(deployment_type="docker") | Build & run Docker container |
| kubernetes | deploy(deployment_type="kubernetes") | Generate K8s YAML manifests |

Architecture

kerdosai/
├── __init__.py          # KerdosAgent, Trainer, Deployer, DataProcessor, KnowledgeBase, RAGAgent
├── agent.py             # KerdosAgent: orchestrates training + deployment
├── trainer.py           # Trainer: HuggingFace training loop + W&B logging
├── deployer.py          # Deployer: REST / Gradio-RAG / Docker / Kubernetes
├── data_processor.py    # DataProcessor: CSV/JSON loading, tokenization
├── cli.py               # CLI: train / deploy / rag-chat commands
└── rag/
    ├── __init__.py          # Public RAG API surface
    ├── document_loader.py   # PDF / DOCX / TXT / MD / CSV parser
    ├── embedder.py          # FAISS + SentenceTransformer index builder
    ├── retriever.py         # Top-K cosine-similarity retrieval
    ├── chain.py             # HF Inference API streaming + blocking answer
    ├── knowledge_base.py    # KnowledgeBase high-level class
    └── rag_agent.py         # RAGAgent: chat + history management

HuggingFace Space Demo

Try the live RAG demo at 👉 huggingface.co/spaces/kerdosdotio/Custom-LLM-Chat

Upload any PDF, DOCX, or TXT file and ask questions. The AI answers only from your documents, never from outside knowledge.

The full kerdosai package lets you run this same UI privately on your own servers with kerdosai rag-chat.


Environment Variables

| Variable | Description | Default |
|---|---|---|
| HF_TOKEN | HuggingFace API token for inference | (none) |
| LLM_MODEL | LLM model ID for generation | meta-llama/Llama-3.1-8B-Instruct |
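
A script that wraps the toolkit can resolve these settings in one place. The sketch below assumes a precedence of explicit argument, then environment variable, then the documented default; that ordering and the `resolve_settings` helper are assumptions for illustration, not confirmed package behaviour.

```python
import os

DEFAULT_LLM_MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def resolve_settings(hf_token=None, model=None, env=None):
    """Pick explicit arguments first, then environment variables, then defaults."""
    env = os.environ if env is None else env
    token = hf_token or env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set HF_TOKEN or pass hf_token explicitly")
    chosen_model = model or env.get("LLM_MODEL") or DEFAULT_LLM_MODEL
    return {"hf_token": token, "model": chosen_model}


# The env mapping is injectable so the function can be exercised without
# touching the real process environment.
settings = resolve_settings(env={"HF_TOKEN": "hf_example"})
print(settings["model"])  # meta-llama/Llama-3.1-8B-Instruct
```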

Real-World Applications

Healthcare

  • Clinical documentation automation
  • Patient Q&A from medical records
  • HIPAA-compliant private deployment

Financial Services

  • Policy & compliance Q&A
  • Risk report summarisation
  • Private on-premise data security

Legal

  • Contract review and clause extraction
  • Case research from uploaded precedents
  • Confidential document handling

Enterprise Internal Tools

  • HR handbook chatbot
  • IT knowledge base
  • Onboarding document Q&A

Requirements

Python >= 3.8
torch >= 2.0.0
transformers >= 4.30.0
faiss-cpu >= 1.7.4
sentence-transformers >= 2.2.2
PyMuPDF >= 1.22.5
python-docx >= 0.8.11
gradio >= 4.0.0
huggingface-hub >= 0.28.0
tenacity >= 8.2.0

Full list in requirements.txt.


Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.


License

MIT License. See LICENSE for details.


About Kerdos Infrasoft

Kerdos Infrasoft Private Limited · CIN: U62099KA2023PTC182869
🌐 kerdos.in · 📬 partnership@kerdos.in

We are actively seeking investment & partnerships to build the fully customisable enterprise edition, including private LLM hosting, custom model fine-tuning, data privacy guarantees, and white-label deployments.


Citation

@software{kerdosai2024,
  title  = {KerdosAI: Universal LLM Training \& RAG Agent},
  author = {Kerdos Infrasoft Private Limited},
  year   = {2024},
  url    = {https://github.com/bhaskarvilles/kerdosai},
  note   = {v0.2.1}
}
