Unified LLM framework — 8 providers, reasoning models (thinking_budget), VLM-OCR, GraphRAG, ORPO fine-tuning, 80% test coverage

beanllm

Unified LLM framework supporting reasoning models, VLM-OCR, GraphRAG, and agentic workflows — 8 providers, 80% test coverage

PyPI version · Python 3.10+ · License: MIT · Tests · Coverage 80%


Why beanllm?

| | LangChain | LlamaIndex | beanllm |
|---|---|---|---|
| Architecture | Flat chain | Index-centric | Clean Architecture (Facade → Handler → Service → Domain) |
| Reasoning Models | Manual config | Manual config | thinking_budget native support |
| VLM-OCR | Not supported | Not supported | 11 engines + Qwen3-VL / GLM-OCR / DeepSeek-VL2 |
| GraphRAG | Plugin | Plugin | KnowledgeGraph Facade built-in |
| Test Coverage | | | 80% (6,340 tests) |
| ORPO Fine-tuning | Not supported | Not supported | Native support (50% less memory than DPO) |
| Providers | OpenAI-heavy | OpenAI-heavy | 8 providers including Grok/xAI |

Features Overview

| Module | Highlights |
|---|---|
| LLM Providers | 8 providers (including OpenAI, Claude, Gemini, Grok, DeepSeek, Perplexity, Ollama) with smart parameter adaptation |
| Reasoning Models | thinking_budget for Claude/OpenAI o-series, <thinking> token filtering |
| RAG Pipeline | Document loaders, vector stores, hybrid search, rerankers, HyDE, MultiQuery |
| Embeddings | 11 providers, Matryoshka, Code embeddings, CLIP/SigLIP |
| Retrieval | ColBERT, ColPali, 5 rerankers, semantic chunking, Agentic Retrieval |
| Evaluation | RAGAS, DeepEval, TruLens, Human-in-the-loop |
| Vision | SAM3, YOLOv12, Florence-2, Qwen3-VL |
| Audio | 8 STT engines (Whisper, SenseVoice, Granite) |
| OCR | 11 engines (PaddleOCR, Qwen3-VL, GLM-OCR, DeepSeek-VL2) |
| Fine-tuning | LoRA/QLoRA, DPO, ORPO (2026 standard), KTO |
| Optimizer | Parameter search, benchmarking, A/B testing |
| Multi-Agent | Sequential, parallel, hierarchical, debate patterns |
| Orchestrator | 10 node types, DAG workflow graph, visual builder |
| Knowledge Graph | Multi NER engines, relation extraction, GraphRAG (Gartner Critical Enabler 2026), Neo4j |
| MCP Server | Model Context Protocol server for tool integration |

Key Capabilities

  • Unified Interface - Single API for 8 LLM providers including Grok/xAI
  • Reasoning-First - Native thinking_budget for step-by-step reasoning
  • VLM-OCR Paradigm - Document understanding beyond character recognition
  • GraphRAG Built-in - Relationship-aware retrieval, with up to 99% accuracy reported on multi-hop questions
  • Smart Parameter Adaptation - Auto-convert between providers
  • Advanced PDF Processing - 3-layer architecture (Fast/Accurate/ML)
  • 8 Vector Stores - Chroma, FAISS, Pinecone, Qdrant, Weaviate, Milvus, LanceDB, pgvector
  • Graph Workflows - LangGraph-style DAG execution
  • Production Ready - Retry, circuit breaker, rate limiting, tracing
  • Interactive TUI - OpenCode-style terminal UI with autocomplete
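
The "Production Ready" item names retry and circuit-breaker behavior without showing what they mean in practice. The following is a minimal, dependency-free sketch of the two patterns, not beanllm's implementation; the names `CircuitBreaker`, `CircuitOpenError`, and `retry`, and all default values, are hypothetical.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    """Reject calls for `reset_after` seconds after repeated failures."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on failure with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying an open circuit would only add load
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `clock` and `sleep` as parameters keeps the sketch testable without real waiting.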

Quick Start

Installation

# Basic
pip install beanllm

# With specific providers
pip install beanllm[openai,anthropic,gemini]

# Full installation (all providers + CLI + MCP)
pip install beanllm[all]

# Development
pip install -e ".[dev,all]"

Environment Setup

# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
XAI_API_KEY=xai-...          # Grok/xAI
OLLAMA_HOST=http://localhost:11434
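
Before the first call it can help to confirm which of these variables are actually visible to the process. The sketch below assumes the variables have already been exported (for example by your shell or a dotenv loader); the helper name `configured_providers` and the provider labels are illustrative and not part of beanllm.

```python
import os

# Variable names are taken from the .env example above;
# the provider labels on the left are illustrative.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "grok": "XAI_API_KEY",
    "ollama": "OLLAMA_HOST",
}


def configured_providers(env=None):
    """Return the labels whose variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        label
        for label, variable in PROVIDER_ENV_VARS.items()
        if env.get(variable, "").strip()
    ]
```

Calling `configured_providers()` with no argument inspects the current process environment; passing a dict makes the check easy to unit-test.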

Basic Chat

import asyncio
from beanllm import Client

async def main():
    client = Client(model="gpt-4o")
    response = await client.chat(
        messages=[{"role": "user", "content": "Explain quantum computing"}]
    )
    print(response.content)

    # Streaming
    async for chunk in client.stream_chat(
        messages=[{"role": "user", "content": "Tell me a story"}]
    ):
        print(chunk, end="", flush=True)

asyncio.run(main())

RAG in One Line

import asyncio
from beanllm import RAGChain

async def main():
    rag = RAGChain.from_documents("docs/")
    result = await rag.query("What is this about?", include_sources=True)
    print(result.answer)

asyncio.run(main())

Tools & Agents

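import asyncio
from beanllm import Agent, Tool

@Tool.from_function
def calculator(expression: str) -> str:
    """Evaluate a math expression"""
    # Caution: eval() executes arbitrary code; use it only with trusted input.
    return str(eval(expression))

async def main():
    agent = Agent(model="gpt-4o-mini", tools=[calculator])
    result = await agent.run("What is 25 * 17?")
    print(result)

asyncio.run(main())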
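
Because a tool receives text chosen by the model, passing it to `eval()` is risky outside a demo. One possible alternative is to walk the expression's syntax tree and allow only arithmetic. This is a sketch using the standard `ast` module; `safe_eval` is a hypothetical helper, not a beanllm API.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate a purely arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))
```

The body of `calculator` could then return `str(safe_eval(expression))`. Exponentiation is deliberately left out so that inputs such as `9**9**9` cannot exhaust CPU or memory.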

Graph Workflows

import asyncio
from beanllm import StateGraph

# analyze_fn, improve_fn, and decide are user-defined callables (not shown here).
async def main():
    graph = StateGraph()
    graph.add_node("analyze", analyze_fn)
    graph.add_node("improve", improve_fn)
    graph.add_conditional_edges("analyze", decide, {"good": "END", "bad": "improve"})
    graph.set_entry_point("analyze")

    result = await graph.invoke({"input": "Draft proposal"})
    print(result)

asyncio.run(main())
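
To make the conditional-edge idea concrete, here is a small synchronous sketch of how such a graph could be executed. It is not beanllm's `StateGraph`; `MiniGraph`, the dict-based state, and the extra `improve` → `analyze` edge are assumptions chosen for illustration.

```python
END = "END"


class MiniGraph:
    """Toy state graph: each node maps a state dict to a new state dict."""

    def __init__(self):
        self.nodes = {}
        self.routers = {}  # node name -> (decide function, label-to-target map)
        self.entry = None

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def add_conditional_edges(self, source, decide, mapping):
        self.routers[source] = (decide, mapping)

    def add_edge(self, source, target):
        # An unconditional edge is a router with a single outcome.
        self.routers[source] = (lambda state: "next", {"next": target})

    def set_entry_point(self, name):
        self.entry = name

    def invoke(self, state, max_steps=10):
        current = self.entry
        for _ in range(max_steps):
            state = self.nodes[current](state)
            if current not in self.routers:
                return state  # no outgoing edge: stop here
            decide, mapping = self.routers[current]
            current = mapping[decide(state)]
            if current == END:
                return state
        raise RuntimeError("max_steps exceeded; possible cycle")


def analyze(state):
    verdict = "good" if len(state["input"]) > 20 else "bad"
    return {**state, "verdict": verdict}


def improve(state):
    return {**state, "input": state["input"] + " (expanded with details)"}


graph = MiniGraph()
graph.add_node("analyze", analyze)
graph.add_node("improve", improve)
graph.add_conditional_edges("analyze", lambda s: s["verdict"], {"good": END, "bad": "improve"})
graph.add_edge("improve", "analyze")
graph.set_entry_point("analyze")

result = graph.invoke({"input": "Draft proposal"})
```

The `max_steps` guard matters as soon as a graph contains a loop such as analyze → improve → analyze.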

Reasoning Models

As of June 2026, reasoning models (GPT-5.5, Claude Opus 4.8, Grok 4.3) have become the standard for complex problem-solving. beanllm supports native thinking_budget to control the depth of chain-of-thought reasoning.

import asyncio
from beanllm import Client

async def main():
    # Claude: thinking_budget controls tokens allocated for <thinking>
    client = Client(model="claude-opus-4-8", thinking_budget=8000)
    response = await client.chat(
        messages=[{"role": "user", "content": "Prove P ≠ NP or explain the best current approaches"}],
        stream_thinking=False,   # filter out <thinking> tokens from output
    )
    print(response.content)

    # OpenAI o-series: maps to reasoning_effort
    o3_client = Client(model="o3", thinking_budget=16000)
    response = await o3_client.chat(
        messages=[{"role": "user", "content": "Design a distributed consensus algorithm"}]
    )
    print(response.content)

    # Grok 4.3: xAI's reasoning model
    grok_client = Client(model="grok-4.3", thinking_budget=4000)
    response = await grok_client.chat(
        messages=[{"role": "user", "content": "Analyze market trends"}]
    )
    print(response.content)

asyncio.run(main())

| Model | Provider | Thinking Budget | Best For |
|---|---|---|---|
| claude-opus-4-8 | Anthropic | Up to 32K tokens | Math, coding, analysis |
| gpt-5.5 | OpenAI | Auto-scaled | General reasoning |
| o3 | OpenAI | Up to 32K tokens | Competition math, science |
| grok-4.3 | xAI | Up to 8K tokens | Real-time + reasoning |
| gemini-3.0-pro | Google | Up to 16K tokens | Multimodal reasoning |
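
The example above notes that `thinking_budget` maps to `reasoning_effort` for OpenAI o-series models. The sketch below shows one way such a translation could look. It is not beanllm's adapter: the function name and the token thresholds are invented for illustration, and the output field shapes follow my understanding of Anthropic's extended-thinking and OpenAI's reasoning-effort request parameters.

```python
def adapt_thinking_budget(provider: str, thinking_budget: int) -> dict:
    """Translate a generic token budget into provider-style request fields."""
    if thinking_budget <= 0:
        return {}  # reasoning disabled: send no extra fields
    if provider == "anthropic":
        # Anthropic-style: an explicit token budget for the thinking phase.
        return {"thinking": {"type": "enabled", "budget_tokens": thinking_budget}}
    if provider == "openai":
        # OpenAI-style: a coarse effort level instead of a token count.
        # The cut-off values below are hypothetical.
        if thinking_budget < 4000:
            effort = "low"
        elif thinking_budget < 12000:
            effort = "medium"
        else:
            effort = "high"
        return {"reasoning_effort": effort}
    raise ValueError(f"no thinking-budget mapping for provider {provider!r}")
```

Under these assumed thresholds, the `o3` call above (budget 16000) would be sent with high effort, while the Claude call would carry its 8000-token budget unchanged.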

GraphRAG — Gartner Critical Enabler 2026

GraphRAG was designated a Critical Enabler by Gartner in 2026. Unlike vector similarity search, GraphRAG traverses entity relationships and achieves up to 99% retrieval accuracy on multi-hop questions.

import asyncio
from beanllm import KnowledgeGraph

async def main():
    kg = KnowledgeGraph()

    # Build graph from documents (entities + relationships extracted automatically).
    # `docs` is your collection of already-loaded documents (not shown here).
    await kg.build_graph(documents=docs, graph_id="tech_companies")

    # Multi-hop relationship queries that vector search cannot answer
    result = await kg.graph_rag(
        query="Who founded Apple and what companies did they later invest in?",
        graph_id="tech_companies",
        max_hops=3,
    )
    print(result.answer)
    print(result.reasoning_path)  # e.g., Jobs → Pixar → Disney

asyncio.run(main())

GraphRAG vs Standard RAG

| Dimension | Standard RAG (Vector) | GraphRAG |
|---|---|---|
| Retrieval method | Cosine similarity | Graph traversal |
| Multi-hop questions | Poor | Excellent |
| Relationship reasoning | None | Native |
| Accuracy (multi-hop Q&A) | ~60-70% | ~99% |
| Best use case | Semantic search | Entity & relationship queries |
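
The "graph traversal" row can be illustrated with a toy example. The sketch below stores facts as (head, relation, tail) triples and answers a multi-hop question with a breadth-first search bounded by `max_hops`. It is a simplified stand-in for what a GraphRAG retriever does, not beanllm's implementation; the triples and function names are illustrative.

```python
from collections import deque

# Hand-written triples; a real pipeline would extract these from documents.
TRIPLES = [
    ("Steve Jobs", "co_founded", "Apple"),
    ("Steve Jobs", "funded", "Pixar"),
    ("Pixar", "acquired_by", "Disney"),
]


def build_adjacency(triples):
    """Index directed edges by their head entity."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def find_path(adjacency, start, goal, max_hops=3):
    """Return the shortest list of (head, relation, tail) hops, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # hop budget used up on this branch
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


adjacency = build_adjacency(TRIPLES)
path = find_path(adjacency, "Steve Jobs", "Disney")
```

A pure vector search would need a single chunk that mentions both Jobs and Disney; the traversal instead chains two separate facts, which is where the multi-hop advantage comes from.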

VLM-Based OCR

Paradigm shift in 2026: Traditional OCR recognizes characters. VLM-based OCR understands documents — layout, tables, formulas, and context — making it the standard for production document processing.

Traditional OCR:  Character Recognition  →  Text string
VLM-based OCR:    Document Understanding →  Structured knowledge
                  ┌──────────────────────────────────────────────┐
                  │  Layout  │  Tables  │  Formulas  │  Context  │
                  └──────────────────────────────────────────────┘

from beanllm.domain.ocr import beanOCR
from beanllm.domain.ocr.models import OCRConfig

# Traditional PaddleOCR — fast, character-level
ocr_fast = beanOCR(OCRConfig(engine="paddleocr", language="en"))

# VLM-based — document understanding (2026 standard)
ocr_vlm = beanOCR(OCRConfig(engine="qwen3vl", language="auto"))

result = ocr_vlm.recognize("invoice.pdf")
print(result.text)          # Full extracted text
print(result.tables)        # Structured table data
print(result.confidence)    # Per-region confidence

OCR Engine Comparison (June 2026)

| Engine | Type | Accuracy | Speed | Use Case |
|---|---|---|---|---|
| paddleocr | Traditional | 95% | ⚡⚡⚡ | Fast text extraction |
| easyocr | Traditional | 92% | ⚡⚡ | 80+ languages |
| qwen3vl | VLM | 98% | | Document understanding |
| glm-ocr | VLM | 97% | | Complex layouts |
| deepseek-vl2 | VLM | 97% | | Formulas & tables |
| tesseract | Traditional | 88% | ⚡⚡⚡ | Open source, offline |

Fine-tuning

ORPO — 2026 Standard (replaces DPO)

ORPO (Odds Ratio Preference Optimization) eliminates the frozen reference model that DPO keeps in memory alongside the policy. With one model loaded instead of two, GPU memory drops by roughly 50% compared to DPO, with equal or better alignment quality.

import asyncio
from beanllm import FineTuningFacade

async def main():
    facade = FineTuningFacade()

    # ORPO: no reference model required (50% less memory than DPO)
    result = await facade.train(
        base_model="meta-llama/Llama-3.1-8B",
        dataset_path="data/preference_pairs.jsonl",
        training_method="orpo",          # "dpo" | "orpo" | "kto" | "lora"
        output_dir="./orpo-llama-8b",
        num_epochs=3,
        learning_rate=8e-6,
    )

    # DPO: reference model required
    result = await facade.train(
        base_model="meta-llama/Llama-3.1-8B",
        dataset_path="data/preference_pairs.jsonl",
        training_method="dpo",
        output_dir="./dpo-llama-8b",
    )

asyncio.run(main())

Fine-tuning Method Comparison

| Method | Reference Model | GPU Memory | Alignment Quality | Notes |
|---|---|---|---|---|
| SFT | No | Low | Baseline | Simple supervised |
| LoRA | No | Low | Moderate | Parameter-efficient |
| DPO | Yes | High | Good | 2023-2025 standard |
| ORPO | No | Medium | Equal to DPO | 2026 standard |
| KTO | No | Medium | Good | Binary feedback |
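
Why ORPO needs no reference model is easiest to see from its objective: a standard SFT loss on the chosen answer plus a weighted odds-ratio penalty, both computed from the policy alone. The sketch below evaluates that objective for a single preference pair as formulated in the ORPO paper. It is not beanllm's training code, and the default weight `lam` is an arbitrary illustrative value.

```python
import math


def log_odds(p: float) -> float:
    """log(p / (1 - p)) for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def orpo_loss(p_chosen: float, p_rejected: float, lam: float = 0.1) -> float:
    """ORPO objective for one pair: L_SFT + lam * L_OR.

    p_chosen and p_rejected are the policy's (length-normalized) sequence
    probabilities for the preferred and dispreferred answers.
    """
    sft_term = -math.log(p_chosen)
    log_odds_ratio = log_odds(p_chosen) - log_odds(p_rejected)
    # -log(sigmoid(z)) written as log(1 + exp(-z)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term
```

Both terms depend only on the model being trained, whereas the DPO loss compares the policy against a second, frozen copy.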

Installation Extras

beanllm uses optional extras to keep the base installation lightweight.

| Extra | Description | Install |
|---|---|---|
| openai | OpenAI provider | pip install beanllm[openai] |
| anthropic | Anthropic Claude provider | pip install beanllm[anthropic] |
| gemini | Google Gemini provider | pip install beanllm[gemini] |
| grok | Grok/xAI provider | pip install beanllm[grok] |
| ollama | Ollama local models | pip install beanllm[ollama] |
| audio | Whisper STT | pip install beanllm[audio] |
| ml | ML-based PDF (marker-pdf, torch) | pip install beanllm[ml] |
| cli | CLI (typer) | pip install beanllm[cli] |
| mcp | MCP server (fastmcp) | pip install beanllm[mcp] |
| all | All providers + CLI + MCP | pip install beanllm[all] |
| vector | ChromaDB vector store | pip install beanllm[vector] |
| semantic | Semantic chunking (sentence-transformers) | pip install beanllm[semantic] |
| colbert | ColBERT multi-vector search | pip install beanllm[colbert] |
| colpali | ColPali vision document search | pip install beanllm[colpali] |
| ragpro | Enterprise RAG (semantic + colbert + db) | pip install beanllm[ragpro] |
| distributed | Redis + Kafka | pip install beanllm[distributed] |
| monitoring | Streamlit dashboard + Plotly | pip install beanllm[monitoring] |
| advanced | UMAP, HDBSCAN, NetworkX, Bayesian opt | pip install beanllm[advanced] |
| neo4j | Neo4j graph database | pip install beanllm[neo4j] |
| db | PostgreSQL + MongoDB drivers | pip install beanllm[db] |
| web | FastAPI playground backend | pip install beanllm[web] |
| dev | Development tools (pytest, ruff, mypy, bandit) | pip install beanllm[dev] |

Docker

The project ships a Docker Compose file with profile-based service management.

# Infrastructure only (MongoDB, Redis, Kafka, Ollama)
docker compose up -d

# Full stack (+ FastAPI backend + Next.js frontend)
docker compose --profile app up -d

# Full stack + admin UIs (Kafka UI, Mongo Express, Redis Commander)
docker compose --profile app --profile ui up -d

# With Neo4j knowledge graph
docker compose --profile neo4j up -d

# With monitoring dashboard
docker compose --profile monitoring up -d

# Stop and remove volumes
docker compose down -v

Services

| Service | Port | Profile |
|---|---|---|
| MongoDB | 27017 | default |
| Redis | 6379 | default |
| Kafka | 9092 | default |
| Ollama | 11434 | default |
| Backend (FastAPI) | 8000 | app |
| Frontend (Next.js) | 3000 | app |
| Neo4j | 7474 / 7687 | neo4j |
| Kafka UI | 8080 | ui |
| Mongo Express | 8081 | ui |
| Redis Commander | 8082 | ui |
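
After `docker compose up -d`, a quick way to see whether the default-profile services are reachable is to try a TCP connection to each port. This is a small standard-library sketch; the ports come from the table above, while the helper names and the `localhost` default are assumptions about your setup.

```python
import socket

# Default-profile services and ports, copied from the table above.
SERVICE_PORTS = {"MongoDB": 27017, "Redis": 6379, "Kafka": 9092, "Ollama": 11434}


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(host: str = "localhost", ports=None) -> dict:
    """Map each service name to True (reachable) or False (not reachable)."""
    ports = SERVICE_PORTS if ports is None else ports
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

An open port only shows that something is listening; it does not confirm the service inside the container is healthy.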

CLI

# Interactive TUI (OpenCode-style, no arguments)
beanllm

# Model management
beanllm list              # List available models
beanllm show gpt-4o       # Show model details
beanllm providers          # Check provider status
beanllm summary            # Quick summary statistics
beanllm export             # Export models as JSON

# Advanced
beanllm scan               # Scan APIs for new models
beanllm analyze gpt-4o     # Model analysis with pattern inference

# Admin (Google Workspace)
beanllm admin stats        # Google service statistics
beanllm admin analyze      # Usage analysis with Gemini
beanllm admin optimize     # Cost optimization suggestions
beanllm admin security     # Security event audit
beanllm admin dashboard    # Launch Streamlit dashboard

Playground

Full-stack Chat UI built with FastAPI (backend) and Next.js 15 + React 19 (frontend).

Backend (playground/backend/)

  • 17 API routers: chat, agent, multi-agent, RAG, chain, knowledge graph, vision, audio, evaluation, fine-tuning, optimizer, OCR, web search, monitoring, config, models, history
  • Agentic Chat: automatic intent classification with tool routing
  • Session-based RAG: per-session document upload and retrieval
  • Redis caching and MongoDB persistence
  • WebSocket real-time communication with heartbeat
  • SSE streaming with proper [DONE] termination
  • Connection pooling: httpx, MongoDB, Redis
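
The `[DONE]` termination mentioned above is what lets a client stop reading without waiting for the connection to close. The sketch below shows how a consumer might parse such a stream. The JSON payload shape (`{"delta": ...}`) is hypothetical, since the playground's actual event schema is not documented here.

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON payloads from 'data:' lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue  # blank separator or SSE comment (e.g. keep-alive)
        if not line.startswith("data:"):
            continue  # ignore other fields such as event:, id:, retry:
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # explicit end of stream
        yield json.loads(payload)


# Example stream; the event contents are made up for illustration.
stream = [
    'data: {"delta": "Hel"}',
    "",
    ": keep-alive",
    'data: {"delta": "lo"}',
    "",
    "data: [DONE]",
    'data: {"delta": "ignored"}',
]
text = "".join(event["delta"] for event in iter_sse_events(stream))
```

Anything after the sentinel is ignored, so a server that forgets to send `[DONE]` would leave this consumer waiting for more lines.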

Frontend (playground/frontend/)

  • Next.js 15 with React 19 and Tailwind CSS
  • Pages: Chat, Monitoring Dashboard, Settings
  • Features: streaming responses, session management, API key modal, Google OAuth, model selector

Setup

See the detailed guides in playground/backend/:

  • LOCAL_SETUP.md - Local development setup
  • START_GUIDE.md - Getting started guide
  • TROUBLESHOOTING.md - Common issues and solutions

Model Support

LLM Providers

| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-5, GPT-5.5, GPT-4o, o3, o4-mini | Best general-purpose |
| Anthropic | Claude Opus 4.8, Claude Sonnet 4.6, Claude Haiku 4.5 | Best reasoning |
| Google | Gemini 3.0 Pro, Gemini 3.0 Flash | Best multimodal |
| Grok/xAI | Grok 4.3, Grok 4.3 Vision | Real-time + reasoning |
| DeepSeek | DeepSeek-V3, DeepSeek-R1 | Open-weight frontier |
| Perplexity | Sonar, Sonar Pro | Real-time web search |
| Ollama | Any local model | Offline / private |

Vision Models

  • SAM 3 (zero-shot segmentation)
  • YOLOv12 (object detection)
  • Qwen3-VL, Florence-2, GLM-OCR (document understanding)

Audio (8 STT Engines)

  • SenseVoice-Small (15x faster, emotion recognition)
  • Granite Speech 8B (WER 5.85%)
  • Whisper V3 Turbo, Distil-Whisper, Parakeet, Canary, Moonshine

Embeddings

  • Qwen3-Embedding-8B (multilingual SOTA)
  • OpenAI text-embedding-3-large / text-embedding-3-small
  • Code embeddings (CodeBERT, UniXcoder), CLIP/SigLIP

2026 Benchmarks

RAG Accuracy: GraphRAG vs Standard

| Retrieval Method | Simple Q&A | Multi-hop Q&A | Relationship Q&A |
|---|---|---|---|
| Standard (vector) | 85% | 62% | 45% |
| GraphRAG | 87% | 99% | 98% |

OCR Accuracy: Traditional vs VLM-based

| Engine Type | Printed Text | Tables | Formulas | Complex Layout |
|---|---|---|---|---|
| Traditional OCR | 95% | 70% | 45% | 60% |
| VLM-based OCR | 98% | 96% | 94% | 95% |

Fine-tuning: Memory & Performance

| Method | GPU Memory (7B model) | Alignment Score | Training Time |
|---|---|---|---|
| RLHF | ~80 GB | Baseline | 24h |
| DPO | ~40 GB | +5% | 8h |
| ORPO | ~20 GB | +5% | 6h |
| KTO | ~25 GB | +3% | 7h |

Reasoning Models: Thinking Budget vs Accuracy

| Model | Thinking Budget | MATH Score | HumanEval |
|---|---|---|---|
| GPT-4o (no thinking) | 0 | 72% | 87% |
| Claude Opus 4.8 (4K) | 4,000 | 88% | 94% |
| Claude Opus 4.8 (8K) | 8,000 | 95% | 97% |
| o3 (max) | 32,000 | 97% | 98% |

Architecture

Built with Clean Architecture: dependencies point inward, and each layer knows only about the layer directly below it.

                       ┌──────────────────────────────┐
                       │        Facade Layer          │
                       │  Client, RAGChain, Agent     │
                       └──────────────┬───────────────┘
                                      │
                       ┌──────────────▼───────────────┐
                       │       Handler Layer          │
                       │  Validation, decorators      │
                       └──────────────┬───────────────┘
                                      │ interfaces only
                       ┌──────────────▼───────────────┐
                       │       Service Layer          │
                       │  Business logic (I + Impl)   │
                       └──────────────┬───────────────┘
                                      │
                       ┌──────────────▼───────────────┐
                       │       Domain Layer           │
                       │  Core entities and rules     │
                       └──────────────┬───────────────┘
                                      │
                       ┌──────────────▼───────────────┐
                       │    Infrastructure Layer      │
                       │  Providers, vector stores    │
                       └──────────────────────────────┘

Project Structure

src/beanllm/
├── facade/           # Public API (Client, RAG, Agent, Chain, etc.)
├── handler/          # Request handling (core, advanced, ml)
├── service/          # Business logic interfaces + impl/
├── domain/           # Core models (40+ modules)
├── dto/              # Data transfer objects
├── infrastructure/   # External integrations (60+ files)
├── providers/        # LLM provider implementations (8 providers)
├── decorators/       # Error handling, validation, logging
├── ui/               # Interactive TUI
└── utils/            # CLI, config, streaming, tracer

src/beantui/          # Standalone reusable TUI engine
mcp_server/           # Model Context Protocol server
playground/           # Full-stack Chat UI (FastAPI + Next.js)

Development

Setup

# Clone and install
git clone https://github.com/leebeanbin/beanllm.git
cd beanllm
pip install -e ".[dev,all]"

# Setup pre-commit hooks (auto quality checks on commit)
make pre-commit

Code Quality

make quality       # Full: ruff format + lint + mypy + bandit + pytest
make quick-fix     # Auto-fix: ruff lint + format + import sort
make type-check    # MyPy type checking
make lint          # Ruff linting only
make test          # Run pytest
make test-cov      # pytest with HTML coverage report
make clean         # Remove caches and build artifacts

Branch Workflow

# 1. Create a branch from main
make new-feat NAME=rag-hyde         # feat/rag-hyde
make new-fix NAME=chat-rate-limit   # fix/chat-rate-limit
make new-refactor NAME=service      # refactor/service

# 2. Develop and commit (reference issue numbers)
git commit -m "feat(rag): Add HyDE query expansion

Closes #42"

# 3. Quality check + push + create PR
make pr

# 4. Keep in sync with main
make sync

# 5. After PR is merged, clean up
make done

Pre-commit Hooks

These hooks run automatically on every git commit:

| Tool | Purpose |
|---|---|
| Ruff | Code formatting, linting, import sorting |
| Bandit | Security scanning |

Contributing

  1. Create an Issue using one of the templates (Feature, Bug, Refactor)
  2. Create a branch: make new-feat NAME=your-feature
  3. Develop with commits referencing the issue (Closes #issue_number)
  4. Run quality checks: make quality
  5. Submit a PR: make pr (auto-fills the PR template)
  6. After merge: delete the branch on GitHub, then make done locally

Templates

  • Issue templates: Feature Request, Bug Report, Refactoring
  • PR template: Summary, Related Issues (Closes #N), Changes, Test Plan

Testing

# Run all tests
pytest

# With coverage report
pytest --cov=src/beanllm --cov-report=html

# Full quality pipeline
make quality

Current coverage: 80% (6,340 passing tests)


License

MIT License - see LICENSE file.


Built with care for the LLM community
