Unified LLM framework — 8 providers, reasoning models (thinking_budget), VLM-OCR, GraphRAG, ORPO fine-tuning, 80% test coverage

These details have not been verified by PyPI

Project links

Project description

beanllm

Unified LLM framework supporting reasoning models, VLM-OCR, GraphRAG, and agentic workflows — 8 providers, 80% test coverage

Why beanllm?

	LangChain	LlamaIndex	beanllm
Architecture	Flat chain	Index-centric	Clean Architecture (Facade → Handler → Service → Domain)
Reasoning Models	Manual config	Manual config	`thinking_budget` native support
VLM-OCR	Not supported	Not supported	11 engines + Qwen3-VL / GLM-OCR / DeepSeek-VL2
GraphRAG	Plugin	Plugin	KnowledgeGraph Facade built-in
Test Coverage	—	—	80% (6,340 tests)
ORPO Fine-tuning	Not supported	Not supported	Native support (50% less memory than DPO)
Providers	OpenAI-heavy	OpenAI-heavy	8 providers including Grok/xAI

Features Overview

Module	Highlights
LLM Providers	8 providers (OpenAI, Claude, Gemini, Grok, DeepSeek, Perplexity, Ollama) with smart parameter adaptation
Reasoning Models	`thinking_budget` for Claude/OpenAI o-series, `<thinking>` token filtering
RAG Pipeline	Document loaders, vector stores, hybrid search, rerankers, HyDE, MultiQuery
Embeddings	11 providers, Matryoshka, Code embeddings, CLIP/SigLIP
Retrieval	ColBERT, ColPali, 5 rerankers, semantic chunking, Agentic Retrieval
Evaluation	RAGAS, DeepEval, TruLens, Human-in-the-loop
Vision	SAM3, YOLOv12, Florence-2, Qwen3-VL
Audio	8 STT engines (Whisper, SenseVoice, Granite)
OCR	11 engines (PaddleOCR, Qwen3-VL, GLM-OCR, DeepSeek-VL2)
Fine-tuning	LoRA/QLoRA, DPO, ORPO (2026 standard), KTO
Optimizer	Parameter search, benchmarking, A/B testing
Multi-Agent	Sequential, parallel, hierarchical, debate patterns
Orchestrator	10 node types, DAG workflow graph, visual builder
Knowledge Graph	Multi NER engines, relation extraction, GraphRAG (Gartner Critical Enabler 2026), Neo4j
MCP Server	Model Context Protocol server for tool integration

Key Capabilities

Unified Interface - Single API for 8 LLM providers including Grok/xAI
Reasoning-First - Native thinking_budget for step-by-step reasoning
VLM-OCR Paradigm - Document understanding beyond character recognition
GraphRAG Built-in - Relationship-aware retrieval with 99% accuracy
Smart Parameter Adaptation - Auto-convert between providers
Advanced PDF Processing - 3-layer architecture (Fast/Accurate/ML)
8 Vector Stores - Chroma, FAISS, Pinecone, Qdrant, Weaviate, Milvus, LanceDB, pgvector
Graph Workflows - LangGraph-style DAG execution
Production Ready - Retry, circuit breaker, rate limiting, tracing
Interactive TUI - OpenCode-style terminal UI with autocomplete

Quick Start

Installation

# Basic
pip install beanllm

# With specific providers
pip install beanllm[openai,anthropic,gemini]

# Full installation (all providers + CLI + MCP)
pip install beanllm[all]

# Development
pip install -e ".[dev,all]"

Environment Setup

# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
XAI_API_KEY=xai-...          # Grok/xAI
OLLAMA_HOST=http://localhost:11434

Basic Chat

import asyncio
from beanllm import Client

async def main():
    client = Client(model="gpt-4o")
    response = await client.chat(
        messages=[{"role": "user", "content": "Explain quantum computing"}]
    )
    print(response.content)

    # Streaming
    async for chunk in client.stream_chat(
        messages=[{"role": "user", "content": "Tell me a story"}]
    ):
        print(chunk, end="", flush=True)

asyncio.run(main())

RAG in One Line

from beanllm import RAGChain

async def main():
    rag = RAGChain.from_documents("docs/")
    result = await rag.query("What is this about?", include_sources=True)
    print(result.answer)

asyncio.run(main())

Tools & Agents

from beanllm import Agent, Tool

@Tool.from_function
def calculator(expression: str) -> str:
    """Evaluate a math expression"""
    return str(eval(expression))

agent = Agent(model="gpt-4o-mini", tools=[calculator])
result = await agent.run("What is 25 * 17?")

Graph Workflows

from beanllm import StateGraph

graph = StateGraph()
graph.add_node("analyze", analyze_fn)
graph.add_node("improve", improve_fn)
graph.add_conditional_edges("analyze", decide, {"good": "END", "bad": "improve"})
graph.set_entry_point("analyze")

result = await graph.invoke({"input": "Draft proposal"})

Reasoning Models

As of June 2026, reasoning models (GPT-5.5, Claude Opus 4.8, Grok 4.3) have become the standard for complex problem-solving. beanllm supports native thinking_budget to control the depth of chain-of-thought reasoning.

import asyncio
from beanllm import Client

async def main():
    # Claude: thinking_budget controls tokens allocated for <thinking>
    client = Client(model="claude-opus-4-8", thinking_budget=8000)
    response = await client.chat(
        messages=[{"role": "user", "content": "Prove P ≠ NP or explain the best current approaches"}],
        stream_thinking=False,   # filter out <thinking> tokens from output
    )
    print(response.content)

    # OpenAI o-series: maps to reasoning_effort
    o3_client = Client(model="o3", thinking_budget=16000)
    response = await o3_client.chat(
        messages=[{"role": "user", "content": "Design a distributed consensus algorithm"}]
    )
    print(response.content)

    # Grok 4.3: xAI's reasoning model
    grok_client = Client(model="grok-4.3", thinking_budget=4000)
    response = await grok_client.chat(
        messages=[{"role": "user", "content": "Analyze market trends"}]
    )
    print(response.content)

asyncio.run(main())

Model	Provider	Thinking Budget	Best For
`claude-opus-4-8`	Anthropic	Up to 32K tokens	Math, coding, analysis
`gpt-5.5`	OpenAI	Auto-scaled	General reasoning
`o3`	OpenAI	Up to 32K tokens	Competition math, science
`grok-4.3`	xAI	Up to 8K tokens	Real-time + reasoning
`gemini-3.0-pro`	Google	Up to 16K tokens	Multimodal reasoning

GraphRAG — Gartner Critical Enabler 2026

GraphRAG was designated a Critical Enabler by Gartner in 2026. Unlike vector similarity search, GraphRAG traverses entity relationships and achieves up to 99% retrieval accuracy on multi-hop questions.

import asyncio
from beanllm import KnowledgeGraph

async def main():
    kg = KnowledgeGraph()

    # Build graph from documents (entities + relationships extracted automatically)
    await kg.build_graph(documents=docs, graph_id="tech_companies")

    # Multi-hop relationship queries that vector search cannot answer
    result = await kg.graph_rag(
        query="Who founded Apple and what companies did they later invest in?",
        graph_id="tech_companies",
        max_hops=3,
    )
    print(result.answer)
    print(result.reasoning_path)  # e.g., Jobs → Pixar → Disney

asyncio.run(main())

GraphRAG vs Standard RAG

Dimension	Standard RAG (Vector)	GraphRAG
Retrieval method	Cosine similarity	Graph traversal
Multi-hop questions	Poor	Excellent
Relationship reasoning	None	Native
Accuracy (multi-hop Q&A)	~60-70%	~99%
Best use case	Semantic search	Entity & relationship queries

VLM-Based OCR

Paradigm shift in 2026: Traditional OCR recognizes characters. VLM-based OCR understands documents — layout, tables, formulas, and context — making it the standard for production document processing.

Traditional OCR:  Character Recognition  →  Text string
VLM-based OCR:    Document Understanding →  Structured knowledge
                  ┌─────────────────────────────────────────┐
                  │  Layout  │  Tables  │  Formulas  │  Context │
                  └─────────────────────────────────────────┘

from beanllm.domain.ocr import beanOCR
from beanllm.domain.ocr.models import OCRConfig

# Traditional PaddleOCR — fast, character-level
ocr_fast = beanOCR(OCRConfig(engine="paddleocr", language="en"))

# VLM-based — document understanding (2026 standard)
ocr_vlm = beanOCR(OCRConfig(engine="qwen3vl", language="auto"))

result = ocr_vlm.recognize("invoice.pdf")
print(result.text)          # Full extracted text
print(result.tables)        # Structured table data
print(result.confidence)    # Per-region confidence

OCR Engine Comparison (June 2026)

Engine	Type	Accuracy	Speed	Use Case
`paddleocr`	Traditional	95%	⚡⚡⚡	Fast text extraction
`easyocr`	Traditional	92%	⚡⚡	80+ languages
`qwen3vl`	VLM	98%	⚡	Document understanding
`glm-ocr`	VLM	97%	⚡	Complex layouts
`deepseek-vl2`	VLM	97%	⚡	Formulas & tables
`tesseract`	Traditional	88%	⚡⚡⚡	Open source, offline

Fine-tuning

ORPO — 2026 Standard (replaces DPO)

ORPO (Odds Ratio Preference Optimization) eliminates the reference model, cutting GPU memory by 50% compared to DPO while achieving equal or better alignment quality.

from beanllm import FineTuningFacade

facade = FineTuningFacade()

# ORPO — no reference model required (50% less memory than DPO)
result = await facade.train(
    base_model="meta-llama/Llama-3.1-8B",
    dataset_path="data/preference_pairs.jsonl",
    training_method="orpo",          # "dpo" | "orpo" | "kto" | "lora"
    output_dir="./orpo-llama-8b",
    num_epochs=3,
    learning_rate=8e-6,
)

# DPO — reference model required
result = await facade.train(
    base_model="meta-llama/Llama-3.1-8B",
    dataset_path="data/preference_pairs.jsonl",
    training_method="dpo",
    output_dir="./dpo-llama-8b",
)

Fine-tuning Method Comparison

Method	Reference Model	GPU Memory	Alignment Quality	Notes
SFT	No	Low	Baseline	Simple supervised
LoRA	No	Low	Moderate	Parameter-efficient
DPO	Yes	High	Good	2023-2025 standard
ORPO	No	Medium	Equal to DPO	2026 standard
KTO	No	Medium	Good	Binary feedback

Installation Extras

beanllm uses optional extras to keep the base installation lightweight.

Extra	Description	Install
`openai`	OpenAI provider	`pip install beanllm[openai]`
`anthropic`	Anthropic Claude provider	`pip install beanllm[anthropic]`
`gemini`	Google Gemini provider	`pip install beanllm[gemini]`
`grok`	Grok/xAI provider	`pip install beanllm[grok]`
`ollama`	Ollama local models	`pip install beanllm[ollama]`
`audio`	Whisper STT	`pip install beanllm[audio]`
`ml`	ML-based PDF (marker-pdf, torch)	`pip install beanllm[ml]`
`cli`	CLI (typer)	`pip install beanllm[cli]`
`mcp`	MCP server (fastmcp)	`pip install beanllm[mcp]`
`all`	All providers + CLI + MCP	`pip install beanllm[all]`
`vector`	ChromaDB vector store	`pip install beanllm[vector]`
`semantic`	Semantic chunking (sentence-transformers)	`pip install beanllm[semantic]`
`colbert`	ColBERT multi-vector search	`pip install beanllm[colbert]`
`colpali`	ColPali vision document search	`pip install beanllm[colpali]`
`ragpro`	Enterprise RAG (semantic + colbert + db)	`pip install beanllm[ragpro]`
`distributed`	Redis + Kafka	`pip install beanllm[distributed]`
`monitoring`	Streamlit dashboard + Plotly	`pip install beanllm[monitoring]`
`advanced`	UMAP, HDBSCAN, NetworkX, Bayesian opt	`pip install beanllm[advanced]`
`neo4j`	Neo4j graph database	`pip install beanllm[neo4j]`
`db`	PostgreSQL + MongoDB drivers	`pip install beanllm[db]`
`web`	FastAPI playground backend	`pip install beanllm[web]`
`dev`	Development tools (pytest, ruff, mypy, bandit)	`pip install beanllm[dev]`

Docker

The project includes Docker Compose with profile-based service management.

# Infrastructure only (MongoDB, Redis, Kafka, Ollama)
docker compose up -d

# Full stack (+ FastAPI backend + Next.js frontend)
docker compose --profile app up -d

# Full stack + admin UIs (Kafka UI, Mongo Express, Redis Commander)
docker compose --profile app --profile ui up -d

# With Neo4j knowledge graph
docker compose --profile neo4j up -d

# With monitoring dashboard
docker compose --profile monitoring up -d

# Stop and remove volumes
docker compose down -v

Services

Service	Port	Profile
MongoDB	27017	default
Redis	6379	default
Kafka	9092	default
Ollama	11434	default
Backend (FastAPI)	8000	`app`
Frontend (Next.js)	3000	`app`
Neo4j	7474 / 7687	`neo4j`
Kafka UI	8080	`ui`
Mongo Express	8081	`ui`
Redis Commander	8082	`ui`

CLI

# Interactive TUI (OpenCode-style, no arguments)
beanllm

# Model management
beanllm list              # List available models
beanllm show gpt-4o       # Show model details
beanllm providers          # Check provider status
beanllm summary            # Quick summary statistics
beanllm export             # Export models as JSON

# Advanced
beanllm scan               # Scan APIs for new models
beanllm analyze gpt-4o     # Model analysis with pattern inference

# Admin (Google Workspace)
beanllm admin stats        # Google service statistics
beanllm admin analyze      # Usage analysis with Gemini
beanllm admin optimize     # Cost optimization suggestions
beanllm admin security     # Security event audit
beanllm admin dashboard    # Launch Streamlit dashboard

Playground

Full-stack Chat UI built with FastAPI (backend) and Next.js 15 + React 19 (frontend).

Backend (`playground/backend/`)

17 API routers: chat, agent, multi-agent, RAG, chain, knowledge graph, vision, audio, evaluation, fine-tuning, optimizer, OCR, web search, monitoring, config, models, history
Agentic Chat: automatic intent classification with tool routing
Session-based RAG: per-session document upload and retrieval
Redis caching and MongoDB persistence
WebSocket real-time communication with heartbeat
SSE streaming with proper [DONE] termination
Connection pooling: httpx, MongoDB, Redis

Frontend (`playground/frontend/`)

Next.js 15 with React 19 and Tailwind CSS
Pages: Chat, Monitoring Dashboard, Settings
Features: streaming responses, session management, API key modal, Google OAuth, model selector

Setup

See the detailed guides in playground/backend/:

LOCAL_SETUP.md - Local development setup
START_GUIDE.md - Getting started guide
TROUBLESHOOTING.md - Common issues and solutions

Model Support

LLM Providers

Provider	Models	Notes
OpenAI	GPT-5, GPT-5.5, GPT-4o, o3, o4-mini	Best general-purpose
Anthropic	Claude Opus 4.8, Claude Sonnet 4.6, Claude Haiku 4.5	Best reasoning
Google	Gemini 3.0 Pro, Gemini 3.0 Flash	Best multimodal
Grok/xAI	Grok 4.3, Grok 4.3 Vision	Real-time + reasoning
DeepSeek	DeepSeek-V3, DeepSeek-R1	Open-weight frontier
Perplexity	Sonar, Sonar Pro	Real-time web search
Ollama	Any local model	Offline / private

Vision Models

SAM 3 (zero-shot segmentation)
YOLOv12 (object detection)
Qwen3-VL, Florence-2, GLM-OCR (document understanding)

Audio (8 STT Engines)

SenseVoice-Small (15x faster, emotion recognition)
Granite Speech 8B (WER 5.85%)
Whisper V3 Turbo, Distil-Whisper, Parakeet, Canary, Moonshine

Embeddings

Qwen3-Embedding-8B (multilingual SOTA)
OpenAI text-embedding-3-large / text-embedding-3-small
Code embeddings (CodeBERT, UniXcoder), CLIP/SigLIP

2026 Benchmarks

RAG Accuracy: GraphRAG vs Standard

Retrieval Method	Simple Q&A	Multi-hop Q&A	Relationship Q&A
Standard (vector)	85%	62%	45%
GraphRAG	87%	99%	98%

OCR Accuracy: Traditional vs VLM-based

Engine Type	Printed Text	Tables	Formulas	Complex Layout
Traditional OCR	95%	70%	45%	60%
VLM-based OCR	98%	96%	94%	95%

Fine-tuning: Memory & Performance

Method	GPU Memory (7B model)	Alignment Score	Training Time
RLHF	~80 GB	Baseline	24h
DPO	~40 GB	+5%	8h
ORPO	~20 GB	+5%	6h
KTO	~25 GB	+3%	7h

Reasoning Models: Thinking Budget vs Accuracy

Model	Thinking Budget	MATH Score	HumanEval
GPT-4o (no thinking)	0	72%	87%
Claude Opus 4.8 (4K)	4,000	88%	94%
Claude Opus 4.8 (8K)	8,000	95%	97%
o3 (max)	32,000	97%	98%

Architecture

Built with Clean Architecture - dependencies point inward, each layer only knows about the layer directly below it.

                       ┌──────────────────────────────┐
                       │        Facade Layer          │
                       │  Client, RAGChain, Agent     │
                       └──────────────┬───────────────┘
                                      │
                       ┌──────────────▼───────────────┐
                       │       Handler Layer          │
                       │  Validation, decorators      │
                       └──────────────┬───────────────┘
                                      │ interfaces only
                       ┌──────────────▼───────────────┐
                       │       Service Layer          │
                       │  Business logic (I + Impl)   │
                       └──────────────┬───────────────┘
                                      │
                       ┌──────────────▼───────────────┐
                       │       Domain Layer           │
                       │  Core entities and rules     │
                       └──────────────┬───────────────┘
                                      │
                       ┌──────────────▼───────────────┐
                       │    Infrastructure Layer      │
                       │  Providers, vector stores    │
                       └──────────────────────────────┘

Project Structure

src/beanllm/
├── facade/           # Public API (Client, RAG, Agent, Chain, etc.)
├── handler/          # Request handling (core, advanced, ml)
├── service/          # Business logic interfaces + impl/
├── domain/           # Core models (40+ modules)
├── dto/              # Data transfer objects
├── infrastructure/   # External integrations (60+ files)
├── providers/        # LLM provider implementations (8 providers)
├── decorators/       # Error handling, validation, logging
├── ui/               # Interactive TUI
└── utils/            # CLI, config, streaming, tracer

src/beantui/          # Standalone reusable TUI engine
mcp_server/           # Model Context Protocol server
playground/           # Full-stack Chat UI (FastAPI + Next.js)

Development

Setup

# Clone and install
git clone https://github.com/leebeanbin/beanllm.git
cd beanllm
pip install -e ".[dev,all]"

# Setup pre-commit hooks (auto quality checks on commit)
make pre-commit

Code Quality

make quality       # Full: ruff format + lint + mypy + bandit + pytest
make quick-fix     # Auto-fix: ruff lint + format + import sort
make type-check    # MyPy type checking
make lint          # Ruff linting only
make test          # Run pytest
make test-cov      # pytest with HTML coverage report
make clean         # Remove caches and build artifacts

Branch Workflow

# 1. Create a branch from main
make new-feat NAME=rag-hyde         # feat/rag-hyde
make new-fix NAME=chat-rate-limit   # fix/chat-rate-limit
make new-refactor NAME=service      # refactor/service

# 2. Develop and commit (reference issue numbers)
git commit -m "feat(rag): Add HyDE query expansion

Closes #42"

# 3. Quality check + push + create PR
make pr

# 4. Keep in sync with main
make sync

# 5. After PR is merged, clean up
make done

Pre-commit Hooks

Automatically run on every git commit:

Tool	Purpose
Ruff	Code formatting, linting, import sorting
Bandit	Security scanning

Contributing

Create an Issue using one of the templates (Feature, Bug, Refactor)
Create a branch: make new-feat NAME=your-feature
Develop with commits referencing the issue (Closes #issue_number)
Run quality checks: make quality
Submit a PR: make pr (auto-fills the PR template)
After merge: delete the branch on GitHub, then make done locally

Templates

Issue templates: Feature Request, Bug Report, Refactoring
PR template: Summary, Related Issues (Closes #N), Changes, Test Plan

Testing

# Run all tests
pytest

# With coverage report
pytest --cov=src/beanllm --cov-report=html

# Full quality pipeline
make quality

Current coverage: 80% (6,340 tests pass)

License

MIT License - see LICENSE file.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.4.0

Jun 8, 2026

0.3.0

Feb 9, 2026

0.2.2

Jan 5, 2026

0.2.1

Jan 5, 2026

0.2.0

Jan 1, 2026

0.1.1

Dec 25, 2025

0.1.0

Dec 25, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

beanllm-0.4.0.tar.gz (826.8 kB view details)

Uploaded Jun 8, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

beanllm-0.4.0-py3-none-any.whl (1.2 MB view details)

Uploaded Jun 8, 2026 Python 3

File details

Details for the file beanllm-0.4.0.tar.gz.

File metadata

Download URL: beanllm-0.4.0.tar.gz
Upload date: Jun 8, 2026
Size: 826.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for beanllm-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`7f89ac54fae39bf77b56888ee93276e5909b570bf634b930c5598dbcf293b12e`
MD5	`9064981aec15a17841fec9128e21a58b`
BLAKE2b-256	`2dd4d94bd0cfb767e89f16f931f289d0fc680d6e8c400c4598aa7900d051929b`

See more details on using hashes here.

File details

Details for the file beanllm-0.4.0-py3-none-any.whl.

File metadata

Download URL: beanllm-0.4.0-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 1.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for beanllm-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c4a1b557b076b49e2fece9c1d39a25bbb7215a858c87707591fc72570c240720`
MD5	`0d1ad525dc96e3807bc1c774d504c5da`
BLAKE2b-256	`d01e3bc548046d8cd72ce4e5d244dea3da0e34e5c5f8d550d66b009f933c5b4e`

See more details on using hashes here.

beanllm 0.4.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

beanllm

Why beanllm?

Features Overview

Key Capabilities

Quick Start

Installation

Environment Setup

Basic Chat

RAG in One Line

Tools & Agents

Graph Workflows

Reasoning Models

GraphRAG — Gartner Critical Enabler 2026

GraphRAG vs Standard RAG

VLM-Based OCR

OCR Engine Comparison (June 2026)

Fine-tuning

ORPO — 2026 Standard (replaces DPO)

Fine-tuning Method Comparison

Installation Extras

Docker

Services

CLI

Playground

Backend (playground/backend/)

Frontend (playground/frontend/)

Setup

Model Support

LLM Providers

Vision Models

Audio (8 STT Engines)

Embeddings

2026 Benchmarks

RAG Accuracy: GraphRAG vs Standard

OCR Accuracy: Traditional vs VLM-based

Fine-tuning: Memory & Performance

Reasoning Models: Thinking Budget vs Accuracy

Architecture

Project Structure

Development

Setup

Code Quality

Branch Workflow

Pre-commit Hooks

Contributing

Templates

Testing

License

Links

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Backend (`playground/backend/`)

Frontend (`playground/frontend/`)