Powerful LLM framework — 3 lines of code to do what LangChain takes 30. Chat, Tools, RAG, Agents, Streaming.
Project description
nb_llm
A powerful LLM framework — 3 lines of code to do what LangChain takes 30 lines.
Why nb_llm?
| Comparison | nb_llm | LangChain + LangGraph |
|---|---|---|
| Core abstraction | 1 (Chat) | 10+ (LLM, Chain, Agent, Tool, Prompt, Parser, Memory...) |
| Learning curve | 1 hour | 1 week+ |
| Install | pip install nb_llm |
langchain + langchain-core + langchain-openai + langgraph + ... |
| Workflow | Native Python if/for/while | StateGraph DSL + conditional edges + state machines |
| Tool calls | @chat.tool one line |
@tool + Prompt + Agent + AgentExecutor |
Installation
pip install nb_llm
Get Started in 30 Seconds
from nb_llm import Chat, ChatConfig
chat = Chat(ChatConfig("deepseek"))
print(chat.ask("Explain Python in one sentence"))
That's it. Auto-detects model provider, auto-discovers API Key.
Core Features
Multi-turn Conversation
chat = Chat(ChatConfig("deepseek"))
chat.send("My name is Alice")
chat.send("I'm 25 years old")
print(chat.send("What's my name? How old am I?"))
# "Your name is Alice, and you are 25 years old"
send() remembers context, ask() is stateless — they don't interfere with each other.
Streaming Output
for chunk in chat.stream("Write a poem about spring"):
print(chunk, end="", flush=True)
Tool Calls (Agent)
chat = Chat(ChatConfig("deepseek"))
@chat.tool
def get_weather(city: str) -> str:
"""Get weather for a city"""
return f"{city}: Sunny, 25°C"
answer = chat.send("What's the weather in Beijing?")
# Automatically calls get_weather("Beijing"), generates answer based on result
Compare with LangChain's @tool + AgentExecutor + Prompt template — nb_llm only needs @chat.tool.
Structured Output (Pydantic)
from pydantic import BaseModel, Field
class Movie(BaseModel):
title: str = Field(description="Movie title")
year: int = Field(description="Release year")
rating: float = Field(description="Rating, 1-10")
result = chat.ask("Recommend a sci-fi movie", SendOptions(response_type=Movie))
movie = result.parsed # Movie object with full IDE completion
print(movie.title, movie.year, movie.rating)
Pipeline Composition
translator = Chat(ChatConfig("deepseek", system="Translate to English"))
summarizer = Chat(ChatConfig("deepseek", system="Summarize in one sentence"))
pipeline = translator >> summarizer
result = pipeline("AI is changing the world")
Batch Processing
answers = chat.batch(
["What is Python?", "What is Java?", "What is Go?"],
concurrency=5,
)
Multi-Agent Collaboration
from nb_llm import Team
pm = Chat(ChatConfig("deepseek", system="You are a product manager", name="PM"))
dev = Chat(ChatConfig("deepseek", system="You are a developer", name="Dev"))
qa = Chat(ChatConfig("deepseek", system="You are a QA engineer", name="QA"))
result = Team(pm, dev, qa).discuss("Design login feature", rounds=3)
print(result.conclusion)
Vision / Multimodal
# Analyze an image (URL or local file)
result = chat.ask(
"What's in this image?",
SendOptions(image="https://example.com/photo.jpg")
)
# Multiple images + local file
result = chat.ask(
"Compare these two diagrams",
SendOptions(images=["diagram1.png", "diagram2.png"])
)
Local files are automatically converted to base64 data URLs.
Router (Intelligent Routing)
from nb_llm import Router
math_chat = Chat(ChatConfig("deepseek", system="You are a math expert"))
code_chat = Chat(ChatConfig("deepseek", system="You are a coding expert"))
classifier = Chat(ChatConfig("deepseek"))
router = Router(
experts={"math": math_chat, "code": code_chat},
classifier=classifier,
)
# Automatically routes to the appropriate expert
answer = router.send("Solve x^2 + 3x - 4 = 0") # → math expert
answer = router.send("Write a Python sort function") # → code expert
RAG (Retrieval-Augmented Generation)
from nb_llm import RAG, RAGConfig
# Basic usage
rag = RAG(RAGConfig(model="deepseek"))
rag.add("./docs/") # Load a directory
rag.add("manual.pdf") # Load a single file
answer = rag.chat("How to install this product?")
print(answer)
for src in answer.sources:
print(f"{src.file}: {src.chunk[:50]}...")
RAG with ChromaDB Persistence + Custom Embedding
rag = RAG(RAGConfig(
model="deepseek",
embedding_model="BAAI/bge-m3",
embedding_api_key="your-key",
embedding_base_url="https://api.siliconflow.cn/v1",
chunk_size=5000,
chunk_overlap=500,
top_k=15,
vectorstore="chromadb",
vectorstore_path="./my_vectordb",
))
# First run: vectorizes and persists to disk
# Second run: loads from disk, skips re-vectorization
if len(rag.vectorstore) == 0:
rag.add("large_document.txt")
answer = rag.chat("How does it work?")
Standalone Embedding
from nb_llm import Embedding
emb = Embedding(model="BAAI/bge-m3", api_key="...", base_url="...")
vector = emb.embed("Hello world") # Single text → List[float]
vectors = emb.embed(["Hello", "World"]) # Batch → List[List[float]]
Advanced Features
ChatConfig — Reusable Configuration
from dataclasses import dataclass
@dataclass
class ProductionConfig(ChatConfig):
model: str = "deepseek"
retry: int = 3
cache: bool = True
temperature: float = 0.3
chat = Chat(ProductionConfig())
SendOptions — Per-call Options
from nb_llm import SendOptions
# IDE shows all available fields after typing SendOptions(
result = chat.ask("Give a precise answer", SendOptions(
temperature=0,
json=True,
max_tokens=100,
))
Async Support
import asyncio
async def main():
answer = await chat.aio_send("Hello")
async for chunk in chat.aio_stream("Write a poem"):
print(chunk, end="")
answers = await chat.aio_batch(["Q1", "Q2"], concurrency=5)
asyncio.run(main())
History Persistence
# File / SQLite / Redis backends
chat = Chat(ChatConfig("deepseek",
history_backend="sqlite",
history_url="./history.db",
))
Fault Tolerance & Fallback
chat = Chat(ChatConfig("deepseek",
retry=3,
fallback="qwen",
cache=True,
cache_ttl=3600,
))
Conversation Save / Load
# Save conversation history to file
chat.save("conversation.json")
# Load conversation from file
chat.load("conversation.json")
# View current history
print(chat.history)
# Use as context manager (auto-clears history on exit)
with Chat(ChatConfig("deepseek")) as chat:
chat.send("Hello")
chat.send("How are you?")
# history is cleared here
Clone
# Create an independent copy with same config and tools
chat2 = chat.clone()
Token Counting
token_count = chat.count_tokens("Some long text...")
is_ok = chat.check_tokens("Some text", max_tokens=4000)
Multi-model Strategy
# Race mode: use whichever responds first
chat = Chat(ChatConfig(
model=["gpt-4o", "deepseek", "qwen"],
strategy="fastest",
))
Session Management
session_a = chat.session("user_001")
session_b = chat.session("user_002")
session_a.send("My name is Alice")
session_b.send("My name is Bob")
Cost Tracking
with chat.track_cost() as tracker:
chat.send("Question 1")
chat.send("Question 2")
print(f"Total tokens: {tracker.total_tokens}")
Hook System
Three ways to register hooks — choose the best fit for your scenario:
Method 1: Decorators (most concise)
chat = Chat(ChatConfig("deepseek"))
@chat.on_before
def log_request(messages, options):
print(f"[Sending] {messages[-1]['content'][:50]}...")
@chat.on_after
def log_response(response, usage):
print(f"[Received] tokens: {usage.total_tokens}")
@chat.on_error
def handle_error(error):
print(f"[Error] {error}")
@chat.on_tool_call
def log_tool(name, args):
print(f"[Tool] {name}({args})")
Method 2: Manual Registration (dynamic)
The decorator syntax @chat.on_before is sugar for chat.on_before(func) — you can call it directly:
def my_logger(messages, options):
print(f"[LOG] {messages[-1]['content'][:50]}...")
chat.on_before(my_logger) # same as @chat.on_before
chat.on_after(my_callback) # same as @chat.on_after
chat.on_error(my_handler) # same as @chat.on_error
chat.on_tool_call(my_tracer) # same as @chat.on_tool_call
Useful for runtime dynamic hook registration, or batch-registering the same hooks across multiple Chat instances.
Method 3: Inherit Chat (unified behavior)
Subclass Chat to define hooks once — all instances get them automatically:
class LoggedChat(Chat):
def __init__(self, config):
super().__init__(config)
self.on_before(self._log_before)
self.on_after(self._log_after)
def _log_before(self, messages, options):
print(f"[Sending] {messages[-1]['content'][:50]}...")
def _log_after(self, response, usage):
print(f"[Received] tokens: {usage.total_tokens}")
chat = LoggedChat(ChatConfig("deepseek"))
# No need to repeat registration for each instance
| Hook | Trigger | Callback Signature |
|---|---|---|
on_before |
Before sending request | (messages, options) |
on_after |
After receiving response | (response, usage) |
on_error |
On exception | (error) |
on_tool_call |
After each tool call | (name, args) |
Tool Management
Beyond @chat.tool, you can manage tools programmatically:
def search(query: str) -> str:
"""Search the web"""
return f"Results for {query}"
chat.add_tool(search) # Register a tool
chat.add_tools([func1, func2]) # Register multiple tools
chat.remove_tool("search") # Remove by name
chat.remove_tool(search) # Remove by function
chat.clear_tools() # Remove all tools
print(chat.tools) # List all tool schemas
@step Observability
from nb_llm import step
@step("translate")
def translate(text):
return chat.ask(f"Translate to English: {text}")
@step("summarize")
def summarize(text):
return chat.ask(f"Summarize: {text}")
# Each step emits events (start/end/error) with timing info
result = translate("你好世界")
Register listeners to capture step events:
from nb_llm.workflow.step import on_step_event
@on_step_event
def log_steps(event_type, data):
print(f"[{event_type}] {data['name']} ({data.get('elapsed', 0):.2f}s)")
Custom Provider Registration
from nb_llm import register_provider
register_provider(
name="my-llm",
model="my-model-v1",
base_url="https://my-api.com/v1",
api_key_env="MY_API_KEY",
)
# Now use it like any built-in model
chat = Chat(ChatConfig("my-llm"))
CLI Tools
# Version
python -m nb_llm version
# List models
python -m nb_llm models
# Single query (requires provider's API Key)
python -m nb_llm ask "Hello" --model deepseek --api-key "$DEEPSEEK_API_KEY"
# Interactive chat
python -m nb_llm chat --model deepseek --stream --api-key "$DEEPSEEK_API_KEY"
# Real-world example using SiliconFlow
python -m nb_llm ask "Hello" --model "tencent/Hunyuan-MT-7B" --base-url "https://api.siliconflow.cn/v1" --api-key "$YOUR_SILICONFLOW_API_KEY"
Built-in Model Support
| Alias | Actual Model | Provider |
|---|---|---|
deepseek |
deepseek-chat | DeepSeek |
gpt-4o |
gpt-4o | OpenAI |
claude |
claude-sonnet-4 | Anthropic |
qwen |
qwen-plus | Qwen (Alibaba) |
glm |
glm-4 | Zhipu GLM |
siliconflow |
DeepSeek-V3 | SiliconFlow |
ollama |
llama3 | Ollama (local) |
Also supports any OpenAI-compatible API:
chat = Chat(ChatConfig(
model="any-model-name",
base_url="https://your-api.com/v1",
api_key="sk-xxx",
))
Response Objects
All response objects support .attribute access (IDE completion) + .to_dict() conversion.
result = chat.send("Hello")
print(result) # Use as string
print(result.text) # Explicit text access
print(result.usage.total_tokens) # Token usage
print(result.model) # Model used
print(result.to_dict()) # Convert to dict
print(result.to_json_str()) # Formatted JSON string (indent=4)
| Object | Purpose | Key Attributes |
|---|---|---|
ChatResponse(str) |
Response value | .text .usage .tool_calls_made .parsed .model |
UsageInfo |
Token usage | .prompt_tokens .completion_tokens .total_tokens |
ToolCallRecord |
Tool call record | .name .args .result .elapsed |
StreamResponse |
Streaming response | .text .usage .finish_reason |
TeamResult |
Multi-agent result | .conclusion .transcript .rounds |
Project Structure
nb_llm/
├── core/ # Chat core, config, response, history backends
├── providers/ # OpenAI / Anthropic adapters, model registry
├── tools/ # Tool Schema generation, execution engine
├── agents/ # Router, Team multi-agent
├── rag/ # RAG retrieval-augmented generation
├── embedding/ # Embedding vectorization
├── middleware/ # Cache middleware
├── workflow/ # @step observability
└── __main__.py # CLI entry point
Documentation
- API Documentation (CN) — Full API usage + LangChain comparison
- Design Documentation (CN) — Architecture + module details
- API Documentation (EN) — Full API usage (English)
- Design Documentation (EN) — Architecture design (English)
- Changelog — All major changes
Compatibility
- Python 3.7+
- Core dependency:
openai - Optional:
httpx,python-dotenv,tiktoken,pydantic,anthropic,redis
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file nb_llm-0.2.0.tar.gz.
File metadata
- Download URL: nb_llm-0.2.0.tar.gz
- Upload date:
- Size: 48.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f451cb1bac6b11e4a39be8153662ebc1c38b2a59e2396f98db30cc4833777ef4
|
|
| MD5 |
11dfc78e9121e4bf3018f657b99b949b
|
|
| BLAKE2b-256 |
0de64b010818b797ce233c38df85eab8bb3a8a225575938e9cdb0bbc0ef38600
|
File details
Details for the file nb_llm-0.2.0-py3-none-any.whl.
File metadata
- Download URL: nb_llm-0.2.0-py3-none-any.whl
- Upload date:
- Size: 51.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f70d14fce6fef56fc41942fa72719e7d91cabd148bd7758980442bcf6a6a7b19
|
|
| MD5 |
1ccb9c5982d5faafb49838ce960a586a
|
|
| BLAKE2b-256 |
865aca0d946e3cba66062942aa31d261561b680995802dda9da4108bcbd8ea37
|