Skip to main content

Powerful LLM framework — 3 lines of code to do what LangChain takes 30. Chat, Tools, RAG, Agents, Streaming.

Project description

nb_llm

A powerful LLM framework — 3 lines of code to do what LangChain takes 30 lines.

Python 3.7+ License: MIT

中文文档 (Chinese)


Why nb_llm?

Comparison nb_llm LangChain + LangGraph
Core abstraction 1 (Chat) 10+ (LLM, Chain, Agent, Tool, Prompt, Parser, Memory...)
Learning curve 1 hour 1 week+
Install pip install nb_llm langchain + langchain-core + langchain-openai + langgraph + ...
Workflow Native Python if/for/while StateGraph DSL + conditional edges + state machines
Tool calls @chat.tool one line @tool + Prompt + Agent + AgentExecutor

Installation

pip install nb_llm

Get Started in 30 Seconds

from nb_llm import Chat, ChatConfig

chat = Chat(ChatConfig("deepseek"))
print(chat.ask("Explain Python in one sentence"))

That's it. Auto-detects model provider, auto-discovers API Key.


Core Features

Multi-turn Conversation

chat = Chat(ChatConfig("deepseek"))
chat.send("My name is Alice")
chat.send("I'm 25 years old")
print(chat.send("What's my name? How old am I?"))
# "Your name is Alice, and you are 25 years old"

send() remembers context, ask() is stateless — they don't interfere with each other.

Streaming Output

for chunk in chat.stream("Write a poem about spring"):
    print(chunk, end="", flush=True)

Tool Calls (Agent)

chat = Chat(ChatConfig("deepseek"))

@chat.tool
def get_weather(city: str) -> str:
    """Get weather for a city"""
    return f"{city}: Sunny, 25°C"

answer = chat.send("What's the weather in Beijing?")
# Automatically calls get_weather("Beijing"), generates answer based on result

Compare with LangChain's @tool + AgentExecutor + Prompt template — nb_llm only needs @chat.tool.

Structured Output (Pydantic)

from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str = Field(description="Movie title")
    year: int = Field(description="Release year")
    rating: float = Field(description="Rating, 1-10")

result = chat.ask("Recommend a sci-fi movie", SendOptions(response_type=Movie))
movie = result.parsed  # Movie object with full IDE completion
print(movie.title, movie.year, movie.rating)

Pipeline Composition

translator = Chat(ChatConfig("deepseek", system="Translate to English"))
summarizer = Chat(ChatConfig("deepseek", system="Summarize in one sentence"))

pipeline = translator >> summarizer
result = pipeline("AI is changing the world")

Batch Processing

answers = chat.batch(
    ["What is Python?", "What is Java?", "What is Go?"],
    concurrency=5,
)

Multi-Agent Collaboration

from nb_llm import Team

pm = Chat(ChatConfig("deepseek", system="You are a product manager", name="PM"))
dev = Chat(ChatConfig("deepseek", system="You are a developer", name="Dev"))
qa = Chat(ChatConfig("deepseek", system="You are a QA engineer", name="QA"))

result = Team(pm, dev, qa).discuss("Design login feature", rounds=3)
print(result.conclusion)

Vision / Multimodal

# Analyze an image (URL or local file)
result = chat.ask(
    "What's in this image?",
    SendOptions(image="https://example.com/photo.jpg")
)

# Multiple images + local file
result = chat.ask(
    "Compare these two diagrams",
    SendOptions(images=["diagram1.png", "diagram2.png"])
)

Local files are automatically converted to base64 data URLs.

Router (Intelligent Routing)

from nb_llm import Router

math_chat = Chat(ChatConfig("deepseek", system="You are a math expert"))
code_chat = Chat(ChatConfig("deepseek", system="You are a coding expert"))
classifier = Chat(ChatConfig("deepseek"))

router = Router(
    experts={"math": math_chat, "code": code_chat},
    classifier=classifier,
)

# Automatically routes to the appropriate expert
answer = router.send("Solve x^2 + 3x - 4 = 0")  # → math expert
answer = router.send("Write a Python sort function")  # → code expert

RAG (Retrieval-Augmented Generation)

from nb_llm import RAG, RAGConfig

# Basic usage
rag = RAG(RAGConfig(model="deepseek"))
rag.add("./docs/")  # Load a directory
rag.add("manual.pdf")  # Load a single file
answer = rag.chat("How to install this product?")
print(answer)
for src in answer.sources:
    print(f"{src.file}: {src.chunk[:50]}...")

RAG with ChromaDB Persistence + Custom Embedding

rag = RAG(RAGConfig(
    model="deepseek",
    embedding_model="BAAI/bge-m3",
    embedding_api_key="your-key",
    embedding_base_url="https://api.siliconflow.cn/v1",
    chunk_size=5000,
    chunk_overlap=500,
    top_k=15,
    vectorstore="chromadb",
    vectorstore_path="./my_vectordb",
))

# First run: vectorizes and persists to disk
# Second run: loads from disk, skips re-vectorization
if len(rag.vectorstore) == 0:
    rag.add("large_document.txt")

answer = rag.chat("How does it work?")

Standalone Embedding

from nb_llm import Embedding

emb = Embedding(model="BAAI/bge-m3", api_key="...", base_url="...")
vector = emb.embed("Hello world")  # Single text → List[float]
vectors = emb.embed(["Hello", "World"])  # Batch → List[List[float]]

Advanced Features

ChatConfig — Reusable Configuration

from dataclasses import dataclass

@dataclass
class ProductionConfig(ChatConfig):
    model: str = "deepseek"
    retry: int = 3
    cache: bool = True
    temperature: float = 0.3

chat = Chat(ProductionConfig())

SendOptions — Per-call Options

from nb_llm import SendOptions

# IDE shows all available fields after typing SendOptions(
result = chat.ask("Give a precise answer", SendOptions(
    temperature=0,
    json=True,
    max_tokens=100,
))

Async Support

import asyncio

async def main():
    answer = await chat.aio_send("Hello")
    async for chunk in chat.aio_stream("Write a poem"):
        print(chunk, end="")
    answers = await chat.aio_batch(["Q1", "Q2"], concurrency=5)

asyncio.run(main())

History Persistence

# File / SQLite / Redis backends
chat = Chat(ChatConfig("deepseek",
    history_backend="sqlite",
    history_url="./history.db",
))

Fault Tolerance & Fallback

chat = Chat(ChatConfig("deepseek",
    retry=3,
    fallback="qwen",
    cache=True,
    cache_ttl=3600,
))

Conversation Save / Load

# Save conversation history to file
chat.save("conversation.json")

# Load conversation from file
chat.load("conversation.json")

# View current history
print(chat.history)

# Use as context manager (auto-clears history on exit)
with Chat(ChatConfig("deepseek")) as chat:
    chat.send("Hello")
    chat.send("How are you?")
# history is cleared here

Clone

# Create an independent copy with same config and tools
chat2 = chat.clone()

Token Counting

token_count = chat.count_tokens("Some long text...")
is_ok = chat.check_tokens("Some text", max_tokens=4000)

Multi-model Strategy

# Race mode: use whichever responds first
chat = Chat(ChatConfig(
    model=["gpt-4o", "deepseek", "qwen"],
    strategy="fastest",
))

Session Management

session_a = chat.session("user_001")
session_b = chat.session("user_002")
session_a.send("My name is Alice")
session_b.send("My name is Bob")

Cost Tracking

with chat.track_cost() as tracker:
    chat.send("Question 1")
    chat.send("Question 2")
print(f"Total tokens: {tracker.total_tokens}")

Hook System

Three ways to register hooks — choose the best fit for your scenario:

Method 1: Decorators (most concise)

chat = Chat(ChatConfig("deepseek"))

@chat.on_before
def log_request(messages, options):
    print(f"[Sending] {messages[-1]['content'][:50]}...")

@chat.on_after
def log_response(response, usage):
    print(f"[Received] tokens: {usage.total_tokens}")

@chat.on_error
def handle_error(error):
    print(f"[Error] {error}")

@chat.on_tool_call
def log_tool(name, args):
    print(f"[Tool] {name}({args})")

Method 2: Manual Registration (dynamic)

The decorator syntax @chat.on_before is sugar for chat.on_before(func) — you can call it directly:

def my_logger(messages, options):
    print(f"[LOG] {messages[-1]['content'][:50]}...")

chat.on_before(my_logger)     # same as @chat.on_before
chat.on_after(my_callback)    # same as @chat.on_after
chat.on_error(my_handler)     # same as @chat.on_error
chat.on_tool_call(my_tracer)  # same as @chat.on_tool_call

Useful for runtime dynamic hook registration, or batch-registering the same hooks across multiple Chat instances.

Method 3: Inherit Chat (unified behavior)

Subclass Chat to define hooks once — all instances get them automatically:

class LoggedChat(Chat):
    def __init__(self, config):
        super().__init__(config)
        self.on_before(self._log_before)
        self.on_after(self._log_after)

    def _log_before(self, messages, options):
        print(f"[Sending] {messages[-1]['content'][:50]}...")

    def _log_after(self, response, usage):
        print(f"[Received] tokens: {usage.total_tokens}")

chat = LoggedChat(ChatConfig("deepseek"))
# No need to repeat registration for each instance
Hook Trigger Callback Signature
on_before Before sending request (messages, options)
on_after After receiving response (response, usage)
on_error On exception (error)
on_tool_call After each tool call (name, args)

Tool Management

Beyond @chat.tool, you can manage tools programmatically:

def search(query: str) -> str:
    """Search the web"""
    return f"Results for {query}"

chat.add_tool(search)            # Register a tool
chat.add_tools([func1, func2])   # Register multiple tools
chat.remove_tool("search")       # Remove by name
chat.remove_tool(search)         # Remove by function
chat.clear_tools()               # Remove all tools
print(chat.tools)                # List all tool schemas

@step Observability

from nb_llm import step

@step("translate")
def translate(text):
    return chat.ask(f"Translate to English: {text}")

@step("summarize")
def summarize(text):
    return chat.ask(f"Summarize: {text}")

# Each step emits events (start/end/error) with timing info
result = translate("你好世界")

Register listeners to capture step events:

from nb_llm.workflow.step import on_step_event

@on_step_event
def log_steps(event_type, data):
    print(f"[{event_type}] {data['name']} ({data.get('elapsed', 0):.2f}s)")

Custom Provider Registration

from nb_llm import register_provider

register_provider(
    name="my-llm",
    model="my-model-v1",
    base_url="https://my-api.com/v1",
    api_key_env="MY_API_KEY",
)

# Now use it like any built-in model
chat = Chat(ChatConfig("my-llm"))

CLI Tools

# Version
python -m nb_llm version

# List models
python -m nb_llm models

# Single query (requires provider's API Key)
python -m nb_llm ask "Hello" --model deepseek --api-key "$DEEPSEEK_API_KEY"

# Interactive chat
python -m nb_llm chat --model deepseek --stream --api-key "$DEEPSEEK_API_KEY"

# Real-world example using SiliconFlow
python -m nb_llm ask "Hello"  --model "tencent/Hunyuan-MT-7B" --base-url "https://api.siliconflow.cn/v1" --api-key "$YOUR_SILICONFLOW_API_KEY"

Built-in Model Support

Alias Actual Model Provider
deepseek deepseek-chat DeepSeek
gpt-4o gpt-4o OpenAI
claude claude-sonnet-4 Anthropic
qwen qwen-plus Qwen (Alibaba)
glm glm-4 Zhipu GLM
siliconflow DeepSeek-V3 SiliconFlow
ollama llama3 Ollama (local)

Also supports any OpenAI-compatible API:

chat = Chat(ChatConfig(
    model="any-model-name",
    base_url="https://your-api.com/v1",
    api_key="sk-xxx",
))

Response Objects

All response objects support .attribute access (IDE completion) + .to_dict() conversion.

result = chat.send("Hello")
print(result)                    # Use as string
print(result.text)               # Explicit text access
print(result.usage.total_tokens) # Token usage
print(result.model)              # Model used
print(result.to_dict())          # Convert to dict
print(result.to_json_str())      # Formatted JSON string (indent=4)
Object Purpose Key Attributes
ChatResponse(str) Response value .text .usage .tool_calls_made .parsed .model
UsageInfo Token usage .prompt_tokens .completion_tokens .total_tokens
ToolCallRecord Tool call record .name .args .result .elapsed
StreamResponse Streaming response .text .usage .finish_reason
TeamResult Multi-agent result .conclusion .transcript .rounds

Project Structure

nb_llm/
├── core/           # Chat core, config, response, history backends
├── providers/      # OpenAI / Anthropic adapters, model registry
├── tools/          # Tool Schema generation, execution engine
├── agents/         # Router, Team multi-agent
├── rag/            # RAG retrieval-augmented generation
├── embedding/      # Embedding vectorization
├── middleware/      # Cache middleware
├── workflow/       # @step observability
└── __main__.py     # CLI entry point

Documentation


Compatibility

  • Python 3.7+
  • Core dependency: openai
  • Optional: httpx, python-dotenv, tiktoken, pydantic, anthropic, redis

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nb_llm-0.1.0.tar.gz (48.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nb_llm-0.1.0-py3-none-any.whl (51.2 kB view details)

Uploaded Python 3

File details

Details for the file nb_llm-0.1.0.tar.gz.

File metadata

  • Download URL: nb_llm-0.1.0.tar.gz
  • Upload date:
  • Size: 48.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for nb_llm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e455b1b5b26ec4db8ebffbcb0bb3400062b6ec4590129bd52831e7a752522965
MD5 755eea0257493944d46c9ac3611b8e90
BLAKE2b-256 af828b681a0949fa72cb78cecc1ae6b3378774e872c082a27b6137642edbfb23

See more details on using hashes here.

File details

Details for the file nb_llm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nb_llm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 51.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for nb_llm-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c5f10f8b9e395232b6be33d62e03ca86cfa89f2b80f03cf3c5112867f74fe627
MD5 f503bbdf887baac308368c09c10f4700
BLAKE2b-256 e6620f1d09525574497f829bfd28c3dbc12fbd26f213edaa9ea06a752a0a0dd0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page