Skip to main content

Powerful LLM framework — 3 lines of code to do what LangChain takes 30. Chat, Tools, RAG, Agents, Streaming.

Project description

nb_llm

A powerful LLM framework — 3 lines of code to do what LangChain takes 30 lines.

Python 3.7+ License: MIT

中文文档 (Chinese)


Why nb_llm?

Comparison nb_llm LangChain + LangGraph
Core abstraction 1 (Chat) 10+ (LLM, Chain, Agent, Tool, Prompt, Parser, Memory...)
Learning curve 1 hour 1 week+
Install pip install nb_llm langchain + langchain-core + langchain-openai + langgraph + ...
Workflow Native Python if/for/while StateGraph DSL + conditional edges + state machines
Tool calls @chat.tool one line @tool + Prompt + Agent + AgentExecutor

Installation

pip install nb_llm

Get Started in 30 Seconds

from nb_llm import Chat, ChatConfig

chat = Chat(ChatConfig("deepseek"))
print(chat.ask("Explain Python in one sentence"))

That's it. Auto-detects model provider, auto-discovers API Key.


Core Features

Multi-turn Conversation

chat = Chat(ChatConfig("deepseek"))
chat.send("My name is Alice")
chat.send("I'm 25 years old")
print(chat.send("What's my name? How old am I?"))
# "Your name is Alice, and you are 25 years old"

send() remembers context, ask() is stateless — they don't interfere with each other.

Streaming Output

for chunk in chat.stream("Write a poem about spring"):
    print(chunk, end="", flush=True)

Tool Calls (Agent)

chat = Chat(ChatConfig("deepseek"))

@chat.tool
def get_weather(city: str) -> str:
    """Get weather for a city"""
    return f"{city}: Sunny, 25°C"

answer = chat.send("What's the weather in Beijing?")
# Automatically calls get_weather("Beijing"), generates answer based on result

Compare with LangChain's @tool + AgentExecutor + Prompt template — nb_llm only needs @chat.tool.

Structured Output (Pydantic)

from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str = Field(description="Movie title")
    year: int = Field(description="Release year")
    rating: float = Field(description="Rating, 1-10")

result = chat.ask("Recommend a sci-fi movie", SendOptions(response_type=Movie))
movie = result.parsed  # Movie object with full IDE completion
print(movie.title, movie.year, movie.rating)

Pipeline Composition

translator = Chat(ChatConfig("deepseek", system="Translate to English"))
summarizer = Chat(ChatConfig("deepseek", system="Summarize in one sentence"))

pipeline = translator >> summarizer
result = pipeline("AI is changing the world")

Batch Processing

answers = chat.batch(
    ["What is Python?", "What is Java?", "What is Go?"],
    concurrency=5,
)

Multi-Agent Collaboration

from nb_llm import Team

pm = Chat(ChatConfig("deepseek", system="You are a product manager", name="PM"))
dev = Chat(ChatConfig("deepseek", system="You are a developer", name="Dev"))
qa = Chat(ChatConfig("deepseek", system="You are a QA engineer", name="QA"))

result = Team(pm, dev, qa).discuss("Design login feature", rounds=3)
print(result.conclusion)

Vision / Multimodal

# Analyze an image (URL or local file)
result = chat.ask(
    "What's in this image?",
    SendOptions(image="https://example.com/photo.jpg")
)

# Multiple images + local file
result = chat.ask(
    "Compare these two diagrams",
    SendOptions(images=["diagram1.png", "diagram2.png"])
)

Local files are automatically converted to base64 data URLs.

Router (Intelligent Routing)

from nb_llm import Router

math_chat = Chat(ChatConfig("deepseek", system="You are a math expert"))
code_chat = Chat(ChatConfig("deepseek", system="You are a coding expert"))
classifier = Chat(ChatConfig("deepseek"))

router = Router(
    experts={"math": math_chat, "code": code_chat},
    classifier=classifier,
)

# Automatically routes to the appropriate expert
answer = router.send("Solve x^2 + 3x - 4 = 0")  # → math expert
answer = router.send("Write a Python sort function")  # → code expert

RAG (Retrieval-Augmented Generation)

from nb_llm import RAG, RAGConfig

# Basic usage
rag = RAG(RAGConfig(model="deepseek"))
rag.add("./docs/")  # Load a directory
rag.add("manual.pdf")  # Load a single file
answer = rag.chat("How to install this product?")
print(answer)
for src in answer.sources:
    print(f"{src.file}: {src.chunk[:50]}...")

RAG with ChromaDB Persistence + Custom Embedding

rag = RAG(RAGConfig(
    model="deepseek",
    embedding_model="BAAI/bge-m3",
    embedding_api_key="your-key",
    embedding_base_url="https://api.siliconflow.cn/v1",
    chunk_size=5000,
    chunk_overlap=500,
    top_k=15,
    vectorstore="chromadb",
    vectorstore_path="./my_vectordb",
))

# First run: vectorizes and persists to disk
# Second run: loads from disk, skips re-vectorization
if len(rag.vectorstore) == 0:
    rag.add("large_document.txt")

answer = rag.chat("How does it work?")

Standalone Embedding

from nb_llm import Embedding

emb = Embedding(model="BAAI/bge-m3", api_key="...", base_url="...")
vector = emb.embed("Hello world")  # Single text → List[float]
vectors = emb.embed(["Hello", "World"])  # Batch → List[List[float]]

Advanced Features

ChatConfig — Reusable Configuration

from dataclasses import dataclass

@dataclass
class ProductionConfig(ChatConfig):
    model: str = "deepseek"
    retry: int = 3
    cache: bool = True
    temperature: float = 0.3

chat = Chat(ProductionConfig())

SendOptions — Per-call Options

from nb_llm import SendOptions

# IDE shows all available fields after typing SendOptions(
result = chat.ask("Give a precise answer", SendOptions(
    temperature=0,
    json=True,
    max_tokens=100,
))

Async Support

import asyncio

async def main():
    answer = await chat.aio_send("Hello")
    async for chunk in chat.aio_stream("Write a poem"):
        print(chunk, end="")
    answers = await chat.aio_batch(["Q1", "Q2"], concurrency=5)

asyncio.run(main())

History Persistence

# File / SQLite / Redis backends
chat = Chat(ChatConfig("deepseek",
    history_backend="sqlite",
    history_url="./history.db",
))

Fault Tolerance & Fallback

chat = Chat(ChatConfig("deepseek",
    retry=3,
    fallback="qwen",
    cache=True,
    cache_ttl=3600,
))

Conversation Save / Load

# Save conversation history to file
chat.save("conversation.json")

# Load conversation from file
chat.load("conversation.json")

# View current history
print(chat.history)

# Use as context manager (auto-clears history on exit)
with Chat(ChatConfig("deepseek")) as chat:
    chat.send("Hello")
    chat.send("How are you?")
# history is cleared here

Clone

# Create an independent copy with same config and tools
chat2 = chat.clone()

Token Counting

token_count = chat.count_tokens("Some long text...")
is_ok = chat.check_tokens("Some text", max_tokens=4000)

Multi-model Strategy

# Race mode: use whichever responds first
chat = Chat(ChatConfig(
    model=["gpt-4o", "deepseek", "qwen"],
    strategy="fastest",
))

Session Management

session_a = chat.session("user_001")
session_b = chat.session("user_002")
session_a.send("My name is Alice")
session_b.send("My name is Bob")

Cost Tracking

with chat.track_cost() as tracker:
    chat.send("Question 1")
    chat.send("Question 2")
print(f"Total tokens: {tracker.total_tokens}")

Hook System

Three ways to register hooks — choose the best fit for your scenario:

Method 1: Decorators (most concise)

chat = Chat(ChatConfig("deepseek"))

@chat.on_before
def log_request(messages, options):
    print(f"[Sending] {messages[-1]['content'][:50]}...")

@chat.on_after
def log_response(response, usage):
    print(f"[Received] tokens: {usage.total_tokens}")

@chat.on_error
def handle_error(error):
    print(f"[Error] {error}")

@chat.on_tool_call
def log_tool(name, args):
    print(f"[Tool] {name}({args})")

Method 2: Manual Registration (dynamic)

The decorator syntax @chat.on_before is sugar for chat.on_before(func) — you can call it directly:

def my_logger(messages, options):
    print(f"[LOG] {messages[-1]['content'][:50]}...")

chat.on_before(my_logger)     # same as @chat.on_before
chat.on_after(my_callback)    # same as @chat.on_after
chat.on_error(my_handler)     # same as @chat.on_error
chat.on_tool_call(my_tracer)  # same as @chat.on_tool_call

Useful for runtime dynamic hook registration, or batch-registering the same hooks across multiple Chat instances.

Method 3: Inherit Chat (unified behavior)

Subclass Chat to define hooks once — all instances get them automatically:

class LoggedChat(Chat):
    def __init__(self, config):
        super().__init__(config)
        self.on_before(self._log_before)
        self.on_after(self._log_after)

    def _log_before(self, messages, options):
        print(f"[Sending] {messages[-1]['content'][:50]}...")

    def _log_after(self, response, usage):
        print(f"[Received] tokens: {usage.total_tokens}")

chat = LoggedChat(ChatConfig("deepseek"))
# No need to repeat registration for each instance
Hook Trigger Callback Signature
on_before Before sending request (messages, options)
on_after After receiving response (response, usage)
on_error On exception (error)
on_tool_call After each tool call (name, args)

Tool Management

Beyond @chat.tool, you can manage tools programmatically:

def search(query: str) -> str:
    """Search the web"""
    return f"Results for {query}"

chat.add_tool(search)            # Register a tool
chat.add_tools([func1, func2])   # Register multiple tools
chat.remove_tool("search")       # Remove by name
chat.remove_tool(search)         # Remove by function
chat.clear_tools()               # Remove all tools
print(chat.tools)                # List all tool schemas

@step Observability

from nb_llm import step

@step("translate")
def translate(text):
    return chat.ask(f"Translate to English: {text}")

@step("summarize")
def summarize(text):
    return chat.ask(f"Summarize: {text}")

# Each step emits events (start/end/error) with timing info
result = translate("你好世界")

Register listeners to capture step events:

from nb_llm.workflow.step import on_step_event

@on_step_event
def log_steps(event_type, data):
    print(f"[{event_type}] {data['name']} ({data.get('elapsed', 0):.2f}s)")

Custom Provider Registration

from nb_llm import register_provider

register_provider(
    name="my-llm",
    model="my-model-v1",
    base_url="https://my-api.com/v1",
    api_key_env="MY_API_KEY",
)

# Now use it like any built-in model
chat = Chat(ChatConfig("my-llm"))

CLI Tools

# Version
python -m nb_llm version

# List models
python -m nb_llm models

# Single query (requires provider's API Key)
python -m nb_llm ask "Hello" --model deepseek --api-key "$DEEPSEEK_API_KEY"

# Interactive chat
python -m nb_llm chat --model deepseek --stream --api-key "$DEEPSEEK_API_KEY"

# Real-world example using SiliconFlow
python -m nb_llm ask "Hello"  --model "tencent/Hunyuan-MT-7B" --base-url "https://api.siliconflow.cn/v1" --api-key "$YOUR_SILICONFLOW_API_KEY"

Built-in Model Support

Alias Actual Model Provider
deepseek deepseek-chat DeepSeek
gpt-4o gpt-4o OpenAI
claude claude-sonnet-4 Anthropic
qwen qwen-plus Qwen (Alibaba)
glm glm-4 Zhipu GLM
siliconflow DeepSeek-V3 SiliconFlow
ollama llama3 Ollama (local)

Also supports any OpenAI-compatible API:

chat = Chat(ChatConfig(
    model="any-model-name",
    base_url="https://your-api.com/v1",
    api_key="sk-xxx",
))

Response Objects

All response objects support .attribute access (IDE completion) + .to_dict() conversion.

result = chat.send("Hello")
print(result)                    # Use as string
print(result.text)               # Explicit text access
print(result.usage.total_tokens) # Token usage
print(result.model)              # Model used
print(result.to_dict())          # Convert to dict
print(result.to_json_str())      # Formatted JSON string (indent=4)
Object Purpose Key Attributes
ChatResponse(str) Response value .text .usage .tool_calls_made .parsed .model
UsageInfo Token usage .prompt_tokens .completion_tokens .total_tokens
ToolCallRecord Tool call record .name .args .result .elapsed
StreamResponse Streaming response .text .usage .finish_reason
TeamResult Multi-agent result .conclusion .transcript .rounds

Project Structure

nb_llm/
├── core/           # Chat core, config, response, history backends
├── providers/      # OpenAI / Anthropic adapters, model registry
├── tools/          # Tool Schema generation, execution engine
├── agents/         # Router, Team multi-agent
├── rag/            # RAG retrieval-augmented generation
├── embedding/      # Embedding vectorization
├── middleware/      # Cache middleware
├── workflow/       # @step observability
└── __main__.py     # CLI entry point

Documentation


Compatibility

  • Python 3.7+
  • Core dependency: openai
  • Optional: httpx, python-dotenv, tiktoken, pydantic, anthropic, redis

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nb_llm-0.2.0.tar.gz (48.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nb_llm-0.2.0-py3-none-any.whl (51.2 kB view details)

Uploaded Python 3

File details

Details for the file nb_llm-0.2.0.tar.gz.

File metadata

  • Download URL: nb_llm-0.2.0.tar.gz
  • Upload date:
  • Size: 48.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for nb_llm-0.2.0.tar.gz
Algorithm Hash digest
SHA256 f451cb1bac6b11e4a39be8153662ebc1c38b2a59e2396f98db30cc4833777ef4
MD5 11dfc78e9121e4bf3018f657b99b949b
BLAKE2b-256 0de64b010818b797ce233c38df85eab8bb3a8a225575938e9cdb0bbc0ef38600

See more details on using hashes here.

File details

Details for the file nb_llm-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: nb_llm-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 51.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for nb_llm-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f70d14fce6fef56fc41942fa72719e7d91cabd148bd7758980442bcf6a6a7b19
MD5 1ccb9c5982d5faafb49838ce960a586a
BLAKE2b-256 865aca0d946e3cba66062942aa31d261561b680995802dda9da4108bcbd8ea37

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page