Powerful LLM framework — 3 lines of code to do what LangChain takes 30. Chat, Tools, RAG, Agents, Streaming.

These details have not been verified by PyPI

Project links

Project description

nb_llm

A powerful LLM framework — 3 lines of code to do what LangChain takes 30 lines.

Why nb_llm?

Comparison	nb_llm	LangChain + LangGraph
Core abstraction	1 (Chat)	10+ (LLM, Chain, Agent, Tool, Prompt, Parser, Memory...)
Learning curve	1 hour	1 week+
Install	`pip install nb_llm`	langchain + langchain-core + langchain-openai + langgraph + ...
Workflow	Native Python if/for/while	StateGraph DSL + conditional edges + state machines
Tool calls	`@chat.tool` one line	@tool + Prompt + Agent + AgentExecutor

Installation

pip install nb_llm

Get Started in 30 Seconds

from nb_llm import Chat, ChatConfig

chat = Chat(ChatConfig("deepseek"))
print(chat.ask("Explain Python in one sentence"))

That's it. Auto-detects model provider, auto-discovers API Key.

Core Features

Multi-turn Conversation

chat = Chat(ChatConfig("deepseek"))
chat.send("My name is Alice")
chat.send("I'm 25 years old")
print(chat.send("What's my name? How old am I?"))
# "Your name is Alice, and you are 25 years old"

send() remembers context, ask() is stateless — they don't interfere with each other.

Streaming Output

for chunk in chat.stream("Write a poem about spring"):
    print(chunk, end="", flush=True)

Tool Calls (Agent)

chat = Chat(ChatConfig("deepseek"))

@chat.tool
def get_weather(city: str) -> str:
    """Get weather for a city"""
    return f"{city}: Sunny, 25°C"

answer = chat.send("What's the weather in Beijing?")
# Automatically calls get_weather("Beijing"), generates answer based on result

Compare with LangChain's @tool + AgentExecutor + Prompt template — nb_llm only needs @chat.tool.

Structured Output (Pydantic)

from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str = Field(description="Movie title")
    year: int = Field(description="Release year")
    rating: float = Field(description="Rating, 1-10")

result = chat.ask("Recommend a sci-fi movie", SendOptions(response_type=Movie))
movie = result.parsed  # Movie object with full IDE completion
print(movie.title, movie.year, movie.rating)

Pipeline Composition

translator = Chat(ChatConfig("deepseek", system="Translate to English"))
summarizer = Chat(ChatConfig("deepseek", system="Summarize in one sentence"))

pipeline = translator >> summarizer
result = pipeline("AI is changing the world")

Batch Processing

answers = chat.batch(
    ["What is Python?", "What is Java?", "What is Go?"],
    concurrency=5,
)

Multi-Agent Collaboration

from nb_llm import Team

pm = Chat(ChatConfig("deepseek", system="You are a product manager", name="PM"))
dev = Chat(ChatConfig("deepseek", system="You are a developer", name="Dev"))
qa = Chat(ChatConfig("deepseek", system="You are a QA engineer", name="QA"))

result = Team(pm, dev, qa).discuss("Design login feature", rounds=3)
print(result.conclusion)

Vision / Multimodal

# Analyze an image (URL or local file)
result = chat.ask(
    "What's in this image?",
    SendOptions(image="https://example.com/photo.jpg")
)

# Multiple images + local file
result = chat.ask(
    "Compare these two diagrams",
    SendOptions(images=["diagram1.png", "diagram2.png"])
)

Local files are automatically converted to base64 data URLs.

Router (Intelligent Routing)

from nb_llm import Router

math_chat = Chat(ChatConfig("deepseek", system="You are a math expert"))
code_chat = Chat(ChatConfig("deepseek", system="You are a coding expert"))
classifier = Chat(ChatConfig("deepseek"))

router = Router(
    experts={"math": math_chat, "code": code_chat},
    classifier=classifier,
)

# Automatically routes to the appropriate expert
answer = router.send("Solve x^2 + 3x - 4 = 0")  # → math expert
answer = router.send("Write a Python sort function")  # → code expert

RAG (Retrieval-Augmented Generation)

from nb_llm import RAG, RAGConfig

# Basic usage
rag = RAG(RAGConfig(model="deepseek"))
rag.add("./docs/")  # Load a directory
rag.add("manual.pdf")  # Load a single file
answer = rag.chat("How to install this product?")
print(answer)
for src in answer.sources:
    print(f"{src.file}: {src.chunk[:50]}...")

RAG with ChromaDB Persistence + Custom Embedding

rag = RAG(RAGConfig(
    model="deepseek",
    embedding_model="BAAI/bge-m3",
    embedding_api_key="your-key",
    embedding_base_url="https://api.siliconflow.cn/v1",
    chunk_size=5000,
    chunk_overlap=500,
    top_k=15,
    vectorstore="chromadb",
    vectorstore_path="./my_vectordb",
))

# First run: vectorizes and persists to disk
# Second run: loads from disk, skips re-vectorization
if len(rag.vectorstore) == 0:
    rag.add("large_document.txt")

answer = rag.chat("How does it work?")

Standalone Embedding

from nb_llm import Embedding

emb = Embedding(model="BAAI/bge-m3", api_key="...", base_url="...")
vector = emb.embed("Hello world")  # Single text → List[float]
vectors = emb.embed(["Hello", "World"])  # Batch → List[List[float]]

Advanced Features

ChatConfig — Reusable Configuration

from dataclasses import dataclass

@dataclass
class ProductionConfig(ChatConfig):
    model: str = "deepseek"
    retry: int = 3
    cache: bool = True
    temperature: float = 0.3

chat = Chat(ProductionConfig())

SendOptions — Per-call Options

from nb_llm import SendOptions

# IDE shows all available fields after typing SendOptions(
result = chat.ask("Give a precise answer", SendOptions(
    temperature=0,
    json=True,
    max_tokens=100,
))

Async Support

import asyncio

async def main():
    answer = await chat.aio_send("Hello")
    async for chunk in chat.aio_stream("Write a poem"):
        print(chunk, end="")
    answers = await chat.aio_batch(["Q1", "Q2"], concurrency=5)

asyncio.run(main())

History Persistence

# File / SQLite / Redis backends
chat = Chat(ChatConfig("deepseek",
    history_backend="sqlite",
    history_url="./history.db",
))

Fault Tolerance & Fallback

chat = Chat(ChatConfig("deepseek",
    retry=3,
    fallback="qwen",
    cache=True,
    cache_ttl=3600,
))

Conversation Save / Load

# Save conversation history to file
chat.save("conversation.json")

# Load conversation from file
chat.load("conversation.json")

# View current history
print(chat.history)

# Use as context manager (auto-clears history on exit)
with Chat(ChatConfig("deepseek")) as chat:
    chat.send("Hello")
    chat.send("How are you?")
# history is cleared here

Clone

# Create an independent copy with same config and tools
chat2 = chat.clone()

Token Counting

token_count = chat.count_tokens("Some long text...")
is_ok = chat.check_tokens("Some text", max_tokens=4000)

Multi-model Strategy

# Race mode: use whichever responds first
chat = Chat(ChatConfig(
    model=["gpt-4o", "deepseek", "qwen"],
    strategy="fastest",
))

Session Management

session_a = chat.session("user_001")
session_b = chat.session("user_002")
session_a.send("My name is Alice")
session_b.send("My name is Bob")

Cost Tracking

with chat.track_cost() as tracker:
    chat.send("Question 1")
    chat.send("Question 2")
print(f"Total tokens: {tracker.total_tokens}")

Hook System

Three ways to register hooks — choose the best fit for your scenario:

Method 1: Decorators (most concise)

chat = Chat(ChatConfig("deepseek"))

@chat.on_before
def log_request(messages, options):
    print(f"[Sending] {messages[-1]['content'][:50]}...")

@chat.on_after
def log_response(response, usage):
    print(f"[Received] tokens: {usage.total_tokens}")

@chat.on_error
def handle_error(error):
    print(f"[Error] {error}")

@chat.on_tool_call
def log_tool(name, args):
    print(f"[Tool] {name}({args})")

Method 2: Manual Registration (dynamic)

The decorator syntax @chat.on_before is sugar for chat.on_before(func) — you can call it directly:

def my_logger(messages, options):
    print(f"[LOG] {messages[-1]['content'][:50]}...")

chat.on_before(my_logger)     # same as @chat.on_before
chat.on_after(my_callback)    # same as @chat.on_after
chat.on_error(my_handler)     # same as @chat.on_error
chat.on_tool_call(my_tracer)  # same as @chat.on_tool_call

Useful for runtime dynamic hook registration, or batch-registering the same hooks across multiple Chat instances.

Method 3: Inherit Chat (unified behavior)

Subclass Chat to define hooks once — all instances get them automatically:

class LoggedChat(Chat):
    def __init__(self, config):
        super().__init__(config)
        self.on_before(self._log_before)
        self.on_after(self._log_after)

    def _log_before(self, messages, options):
        print(f"[Sending] {messages[-1]['content'][:50]}...")

    def _log_after(self, response, usage):
        print(f"[Received] tokens: {usage.total_tokens}")

chat = LoggedChat(ChatConfig("deepseek"))
# No need to repeat registration for each instance

Hook	Trigger	Callback Signature
`on_before`	Before sending request	`(messages, options)`
`on_after`	After receiving response	`(response, usage)`
`on_error`	On exception	`(error)`
`on_tool_call`	After each tool call	`(name, args)`

Tool Management

Beyond @chat.tool, you can manage tools programmatically:

def search(query: str) -> str:
    """Search the web"""
    return f"Results for {query}"

chat.add_tool(search)            # Register a tool
chat.add_tools([func1, func2])   # Register multiple tools
chat.remove_tool("search")       # Remove by name
chat.remove_tool(search)         # Remove by function
chat.clear_tools()               # Remove all tools
print(chat.tools)                # List all tool schemas

@step Observability

from nb_llm import step

@step("translate")
def translate(text):
    return chat.ask(f"Translate to English: {text}")

@step("summarize")
def summarize(text):
    return chat.ask(f"Summarize: {text}")

# Each step emits events (start/end/error) with timing info
result = translate("你好世界")

from nb_llm.workflow.step import on_step_event

@on_step_event
def log_steps(event_type, data):
    print(f"[{event_type}] {data['name']} ({data.get('elapsed', 0):.2f}s)")

Custom Provider Registration

from nb_llm import register_provider

register_provider(
    name="my-llm",
    model="my-model-v1",
    base_url="https://my-api.com/v1",
    api_key_env="MY_API_KEY",
)

# Now use it like any built-in model
chat = Chat(ChatConfig("my-llm"))

CLI Tools

# Version
python -m nb_llm version

# List models
python -m nb_llm models

# Single query (requires provider's API Key)
python -m nb_llm ask "Hello" --model deepseek --api-key "$DEEPSEEK_API_KEY"

# Interactive chat
python -m nb_llm chat --model deepseek --stream --api-key "$DEEPSEEK_API_KEY"

# Real-world example using SiliconFlow
python -m nb_llm ask "Hello"  --model "tencent/Hunyuan-MT-7B" --base-url "https://api.siliconflow.cn/v1" --api-key "$YOUR_SILICONFLOW_API_KEY"

Built-in Model Support

Alias	Actual Model	Provider
`deepseek`	deepseek-chat	DeepSeek
`gpt-4o`	gpt-4o	OpenAI
`claude`	claude-sonnet-4	Anthropic
`qwen`	qwen-plus	Qwen (Alibaba)
`glm`	glm-4	Zhipu GLM
`siliconflow`	DeepSeek-V3	SiliconFlow
`ollama`	llama3	Ollama (local)

Also supports any OpenAI-compatible API:

chat = Chat(ChatConfig(
    model="any-model-name",
    base_url="https://your-api.com/v1",
    api_key="sk-xxx",
))

Response Objects

All response objects support .attribute access (IDE completion) + .to_dict() conversion.

result = chat.send("Hello")
print(result)                    # Use as string
print(result.text)               # Explicit text access
print(result.usage.total_tokens) # Token usage
print(result.model)              # Model used
print(result.to_dict())          # Convert to dict
print(result.to_json_str())      # Formatted JSON string (indent=4)

Object	Purpose	Key Attributes
`ChatResponse(str)`	Response value	`.text` `.usage` `.tool_calls_made` `.parsed` `.model`
`UsageInfo`	Token usage	`.prompt_tokens` `.completion_tokens` `.total_tokens`
`ToolCallRecord`	Tool call record	`.name` `.args` `.result` `.elapsed`
`StreamResponse`	Streaming response	`.text` `.usage` `.finish_reason`
`TeamResult`	Multi-agent result	`.conclusion` `.transcript` `.rounds`

Project Structure

nb_llm/
├── core/           # Chat core, config, response, history backends
├── providers/      # OpenAI / Anthropic adapters, model registry
├── tools/          # Tool Schema generation, execution engine
├── agents/         # Router, Team multi-agent
├── rag/            # RAG retrieval-augmented generation
├── embedding/      # Embedding vectorization
├── middleware/      # Cache middleware
├── workflow/       # @step observability
└── __main__.py     # CLI entry point

Documentation

API Documentation (CN) — Full API usage + LangChain comparison
Design Documentation (CN) — Architecture + module details
API Documentation (EN) — Full API usage (English)
Design Documentation (EN) — Architecture design (English)
Changelog — All major changes

Compatibility

Python 3.7+
Core dependency: openai
Optional: httpx, python-dotenv, tiktoken, pydantic, anthropic, redis

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.0

Apr 21, 2026

0.1.0

Apr 21, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nb_llm-0.2.0.tar.gz (48.7 kB view details)

Uploaded Apr 21, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

nb_llm-0.2.0-py3-none-any.whl (51.2 kB view details)

Uploaded Apr 21, 2026 Python 3

File details

Details for the file nb_llm-0.2.0.tar.gz.

File metadata

Download URL: nb_llm-0.2.0.tar.gz
Upload date: Apr 21, 2026
Size: 48.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for nb_llm-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`f451cb1bac6b11e4a39be8153662ebc1c38b2a59e2396f98db30cc4833777ef4`
MD5	`11dfc78e9121e4bf3018f657b99b949b`
BLAKE2b-256	`0de64b010818b797ce233c38df85eab8bb3a8a225575938e9cdb0bbc0ef38600`

See more details on using hashes here.

File details

Details for the file nb_llm-0.2.0-py3-none-any.whl.

File metadata

Download URL: nb_llm-0.2.0-py3-none-any.whl
Upload date: Apr 21, 2026
Size: 51.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for nb_llm-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f70d14fce6fef56fc41942fa72719e7d91cabd148bd7758980442bcf6a6a7b19`
MD5	`1ccb9c5982d5faafb49838ce960a586a`
BLAKE2b-256	`865aca0d946e3cba66062942aa31d261561b680995802dda9da4108bcbd8ea37`

See more details on using hashes here.

nb-llm 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

nb_llm

Why nb_llm?

Installation

Get Started in 30 Seconds

Core Features

Multi-turn Conversation

Streaming Output

Tool Calls (Agent)

Structured Output (Pydantic)

Pipeline Composition

Batch Processing

Multi-Agent Collaboration

Vision / Multimodal

Router (Intelligent Routing)

RAG (Retrieval-Augmented Generation)

RAG with ChromaDB Persistence + Custom Embedding

Standalone Embedding

Advanced Features

ChatConfig — Reusable Configuration

SendOptions — Per-call Options

Async Support

History Persistence

Fault Tolerance & Fallback

Conversation Save / Load

Clone

Token Counting

Multi-model Strategy

Session Management

Cost Tracking

Hook System

Method 1: Decorators (most concise)

Method 2: Manual Registration (dynamic)

Method 3: Inherit Chat (unified behavior)

Tool Management

@step Observability

Custom Provider Registration

CLI Tools

Built-in Model Support

Response Objects

Project Structure

Documentation

Compatibility

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes