A comprehensive Python framework for building AI agents with support for both cloud-based LLM APIs and local models

NeuralNode Documentation

Introduction
Features
Architecture Overview
Installation Guide
Quick Start
Core Concepts
API Reference
Model System
Training Guide
Inference System
Configuration & Settings
Plugins / Extensions System
Performance Optimization
Use Cases
Comparison
Troubleshooting
Roadmap
Contributing Guide
License
Credits & Author

1. Introduction

NeuralNode is a comprehensive Python framework for building AI agents with support for both cloud-based LLM APIs and local models. It provides a unified interface for creating intelligent applications with advanced features like multi-modal processing, autonomous agent capabilities, and enterprise-grade security.

Key Differentiators

Dual Mode Operation: Works with both cloud APIs (OpenAI, Anthropic, Google) and local LLMs (Llama, Mistral, etc.)
Security First: Built-in sandboxing, human-in-the-loop approval, and privacy mode
Production Ready: Observability, monitoring, and distributed inference capabilities
Extensible: Plugin system for custom providers, tools, and integrations
Performance Optimized: FAISS vector search, model quantization, and caching layers

Who Should Use NeuralNode?

Developers building AI-powered applications
Data scientists creating LLM pipelines
Enterprises requiring on-premise AI solutions
Researchers experimenting with agent architectures
Startups needing scalable AI infrastructure

2. Features

2.1 Core Features

Unified LLM Interface

import neuralnode as nn

# Works with any provider
ai = nn.NeuralNode(provider="openai", model="gpt-4")
ai = nn.NeuralNode(provider="anthropic", model="claude-3-sonnet")
ai = nn.NeuralNode(provider="ollama", model="llama3")

# Same interface for all
response = ai.chat("Hello, world!")

Advanced Memory System

from neuralnode.memory import AdvancedMemorySystem

memory = AdvancedMemorySystem(
    short_term_limit=10,
    long_term_db_path="./memory.db",
    enable_semantic=True
)

# Store information
memory.add_long_term("User prefers Python over JavaScript")

# Semantic search
results = memory.search("What programming languages does the user like?", k=3)

Intelligent Agents

from neuralnode import Agent
from neuralnode.tools import WebSearch, FileManager

agent = Agent(
    llm=ai,
    tools=[WebSearch(), FileManager()],
    system_prompt="You are a helpful assistant."
)

# Agent automatically decides which tools to use
result = agent.run("Find the latest AI news and save to file")

Multi-Modal Processing

from neuralnode.chains import MultiModalChain

mm_chain = MultiModalChain(llm=ai)

# Process text + image + audio
result = mm_chain.chat_with_multimodal(
    text="What's in this image?",
    image_path="photo.jpg",
    audio_path="question.mp3"
)

2.2 Security Features

Human-in-the-Loop

from neuralnode.security import HumanInTheLoop

hitl = HumanInTheLoop(require_confirmation=["HIGH", "CRITICAL"])

# Will prompt user before executing dangerous operations
approved, reason = hitl.check_operation(
    "delete_file",
    {"path": "/important/data.txt"}
)
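
The approval gate above can be pictured as a risk-level check. The following is a minimal sketch, not NeuralNode's implementation: the `RISK_LEVELS` table, the standalone `check_operation` function, and the `approver` callback are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical mapping from operation name to risk level (an assumption for this sketch).
RISK_LEVELS: Dict[str, str] = {
    "read_file": "LOW",
    "write_file": "MEDIUM",
    "delete_file": "HIGH",
    "format_disk": "CRITICAL",
}


def check_operation(
    operation: str,
    params: dict,
    require_confirmation: Tuple[str, ...] = ("HIGH", "CRITICAL"),
    approver: Callable[[str, dict], bool] = lambda op, p: False,
) -> Tuple[bool, str]:
    """Return (approved, reason); consult the approver only for risky operations."""
    # Unknown operations are treated as CRITICAL so they are never auto-approved.
    level = RISK_LEVELS.get(operation, "CRITICAL")
    if level not in require_confirmation:
        return True, f"{level} risk: auto-approved"
    if approver(operation, params):
        return True, f"{level} risk: approved by human"
    return False, f"{level} risk: rejected by human"


# The default approver rejects everything, a safe choice for unattended runs.
approved, reason = check_operation("delete_file", {"path": "/important/data.txt"})
print(approved, reason)
```

In an interactive program the `approver` callback would typically prompt the user; passing it in as a function keeps the gate testable.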

Sandboxed Code Execution

from neuralnode.tools.secure_code_interpreter import safe_execute

# Runs in isolated environment
result = safe_execute("""
def fibonacci(n):
    if n <= 1: return n
    return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(10))
""")

Privacy Mode

from neuralnode.security import PrivacyMode

privacy = PrivacyMode()
privacy.enable(password="secure_key")

# All data encrypted automatically
encrypted = privacy.encrypt_message("sensitive data")

2.3 Performance Features

FAISS Vector Search

from neuralnode.rag import FAISSVectorStore

store = FAISSVectorStore(embedding_dim=384, index_type="flat")
store.add_batch(documents)

# Search millions of documents in milliseconds
results = store.search(query_embedding, k=5)

Distributed Inference

from neuralnode.distributed import DistributedInferenceEngine

engine = DistributedInferenceEngine()
engine.add_node("gpu-1", "192.168.1.10", 8000)
engine.add_node("gpu-2", "192.168.1.11", 8000)

# Automatically distributes across GPUs
results = engine.parallel_generate(prompts)

Model Quantization

from neuralnode.local import ModelQuantizer

quantizer = ModelQuantizer()
quantizer.quantize(
    "meta-llama/Llama-2-7b",
    output_path="./llama-7b-q4.gguf",
    method="Q4_K_M"
)

2.4 Training Features

RLHF Pipeline

from neuralnode.training import CompleteRLHFPipeline

rlhf = CompleteRLHFPipeline(model, tokenizer)
rlhf.run_full_pipeline(
    prompts=training_prompts,
    test_prompts=eval_prompts,
    output_dir="./rlhf_output"
)

Fine-tuning with LoRA

from neuralnode.training import FineTuner

tuner = FineTuner(model="meta-llama/Llama-2-7b")
tuner.finetune(
    dataset="./training_data.json",
    method="lora",
    epochs=3,
    batch_size=4
)

Federated Learning

from neuralnode.training import FederatedLearningServer

server = FederatedLearningServer(model, config)
server.register_client("client_1", data_size=1000)
server.register_client("client_2", data_size=1500)

# Train without sharing raw data
final_model = server.train(num_rounds=10)
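
One common way for a server to combine client updates without seeing raw data is federated averaging (FedAvg), in which each client's parameters are weighted by its data size. Whether `FederatedLearningServer` uses FedAvg is an assumption; the `federated_average` function and the toy weight vectors below are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict, List


def federated_average(
    client_weights: Dict[str, List[float]],
    data_sizes: Dict[str, int],
) -> List[float]:
    """Average client parameter vectors, weighted by each client's data size."""
    total = sum(data_sizes[name] for name in client_weights)
    dim = len(next(iter(client_weights.values())))
    averaged = [0.0] * dim
    for name, weights in client_weights.items():
        share = data_sizes[name] / total  # fraction of all training examples
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Data sizes match the registration calls above; the weight vectors are made up.
sizes = {"client_1": 1000, "client_2": 1500}
updates = {"client_1": [1.0, 0.0], "client_2": [0.0, 1.0]}
global_weights = federated_average(updates, sizes)
print(global_weights)
```

With these sizes, client_2 holds 60% of the data and therefore contributes 60% of the averaged parameters.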

3. Architecture Overview

3.1 System Architecture

┌────────────────────────────────────────────────────────────┐
│                    NeuralNode Framework                    │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   LLM Layer  │  │  Agent Layer │  │  Tool Layer  │      │
│  │              │  │              │  │              │      │
│  │ - OpenAI     │  │ - ReAct      │  │ - Web Search │      │
│  │ - Anthropic  │  │ - Auto-Agent │  │ - File Sys   │      │
│  │ - Google     │  │ - Planning   │  │ - Browser    │      │
│  │ - Local      │  │ - Multi-Agent│  │ - Code Exec  │      │
│  │ - Ollama     │  │              │  │              │      │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘      │
│         │                 │                 │              │
│         └─────────────────┼─────────────────┘              │
│                           │                                │
│  ┌────────────────────────┴───────────────────┐            │
│  │           Memory & Context Layer           │            │
│  │                                            │            │
│  │   ┌──────────┐ ┌──────────┐ ┌──────────┐   │            │
│  │   │ Short    │ │ Long     │ │ Semantic │   │            │
│  │   │ Term     │ │ Term     │ │ Memory   │   │            │
│  │   └──────────┘ └──────────┘ └──────────┘   │            │
│  └────────────────────────┬───────────────────┘            │
│                           │                                │
│  ┌────────────────────────┴───────────────────┐            │
│  │            Infrastructure Layer            │            │
│  │                                            │            │
│  │   ┌──────────┐ ┌──────────┐ ┌──────────┐   │            │
│  │   │ Vector   │ │ Cache    │ │ Security │   │            │
│  │   │ Store    │ │ Layer    │ │ Layer    │   │            │
│  │   │ (FAISS)  │ │ (Redis)  │ │ (HITL)   │   │            │
│  │   └──────────┘ └──────────┘ └──────────┘   │            │
│  └────────────────────────────────────────────┘            │
│                                                            │
└────────────────────────────────────────────────────────────┘

3.2 Data Flow

User Input
    │
    ▼
┌─────────────┐
│  Input      │ ──► Preprocessing ──► Safety Check
│  Processing │
└─────────────┘
    │
    ▼
┌─────────────┐
│   Memory    │ ──► Retrieve Context ──► Add to Prompt
│   Lookup    │
└─────────────┘
    │
    ▼
┌─────────────┐
│   LLM       │ ──► Generate Response
│   Core      │
└─────────────┘
    │
    ▼
┌─────────────┐
│  Tool       │ ──► Parse Tool Calls ──► Execute
│  Parser     │
└─────────────┘
    │
    ▼
┌─────────────┐
│  Response   │ ──► Postprocessing ──► Store in Memory
│  Builder    │
└─────────────┘
    │
    ▼
User Output
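
The stages in the data flow can be sketched as ordinary functions called in sequence. This is an illustrative outline only: `preprocess`, `build_prompt`, `handle`, `echo_llm`, and the `BLOCKED_TERMS` rule are hypothetical names, the model is a stub, and the tool-parsing stage is omitted for brevity.

```python
from typing import Callable, List

BLOCKED_TERMS = {"rm -rf"}  # hypothetical safety rule for the sketch


def preprocess(user_input: str) -> str:
    """Input processing: normalize whitespace, then run a trivial safety check."""
    text = " ".join(user_input.split())
    if any(term in text for term in BLOCKED_TERMS):
        raise ValueError("input rejected by safety check")
    return text


def build_prompt(text: str, memory: List[str]) -> str:
    """Memory lookup: prepend stored context to the prompt when there is any."""
    if not memory:
        return f"User: {text}"
    context = "\n".join(memory)
    return f"Context:\n{context}\n\nUser: {text}"


def handle(user_input: str, llm: Callable[[str], str], memory: List[str]) -> str:
    """Run one request through the stages shown in the diagram."""
    text = preprocess(user_input)
    prompt = build_prompt(text, memory)
    response = llm(prompt)                   # LLM core
    memory.append(f"User: {text}")           # response builder: store the exchange
    memory.append(f"Assistant: {response}")
    return response


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"({len(prompt)} chars received)"


history: List[str] = []
print(handle("  Hello   there ", echo_llm, history))
print(history)
```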

3.3 Component Diagram

                    NeuralNode Core
                         │
        ┌────────────────┼────────────────┐
        │                │                │
        ▼                ▼                ▼
   ┌──────────┐     ┌─────────┐      ┌─────────┐
   │ Providers│     │  Tools  │      │ Memory  │
   │          │     │         │      │         │
   │- OpenAI  │     │- Search │      │- SQLite │
   │- Claude  │     │- Files  │      │- FAISS  │
   │- Local   │     │- Browser│      │- Cache  │
   │- Ollama  │     │- Code   │      │         │
   └──────────┘     └─────────┘      └─────────┘
        │                │                │
        └────────────────┼────────────────┘
                         │
                         ▼
              ┌─────────────────┐
              │  Agent System   │
              │                 │
              │ - ReAct Pattern │
              │ - Planning      │
              │ - Multi-Agent   │
              └─────────────────┘
                         │
                         ▼
              ┌─────────────────┐
              │   Extensions    │
              │                 │
              │ - Plugins       │
              │ - Custom Tools  │
              │ - Integrations  │
              └─────────────────┘

4. Installation Guide

4.1 Prerequisites

Python 3.8 or higher
pip or conda package manager
For local models: 8GB+ RAM (16GB+ recommended)
For GPU acceleration: CUDA-capable GPU (optional)

4.2 Basic Installation

pip install neuralnode

4.3 Development Installation

git clone https://github.com/yourusername/neuralnode.git
cd neuralnode
pip install -e ".[dev]"

4.4 Optional Dependencies

# Vector search
pip install "neuralnode[vectors]"  # FAISS + ChromaDB

# Local models
pip install "neuralnode[local]"    # Ollama + llama.cpp

# Training
pip install "neuralnode[training]" # PyTorch + Transformers

# All features (quotes keep shells such as zsh from expanding the brackets)
pip install "neuralnode[all]"

4.5 Docker Installation

FROM python:3.11-slim

WORKDIR /app
RUN pip install neuralnode[all]

COPY . .
CMD ["python", "app.py"]

4.6 Verification

import neuralnode as nn
print(nn.__version__)

# Check available features
from neuralnode.utils.graceful_degradation import FeatureAvailability
FeatureAvailability.print_feature_matrix()

5. Quick Start

5.1 First Steps

import neuralnode as nn

# Initialize with OpenAI
ai = nn.NeuralNode(
    provider="openai",
    model="gpt-4",
    api_key="your-api-key"  # Or set OPENAI_API_KEY env var
)

# Simple chat
response = ai.chat("What is machine learning?")
print(response.text)

5.2 Building Your First Agent

from neuralnode import Agent
from neuralnode.tools import WebSearch, Calculator

agent = Agent(
    llm=ai,
    tools=[WebSearch(), Calculator()],
    system_prompt="You are a helpful research assistant."
)

# Agent automatically searches and calculates
result = agent.run("""
What is the population of Tokyo?
Calculate what percentage it is of Japan's total population.
""")
print(result)

5.3 Adding Memory

from neuralnode.memory import AdvancedMemorySystem

memory = AdvancedMemorySystem()
memory.add_long_term("User is a Python developer")

agent = Agent(llm=ai, memory=memory)

# Agent remembers previous context
response = agent.run("What programming language should I recommend?")
# Will consider that user knows Python

5.4 Using Local Models

# With Ollama (must be installed separately)
ai = nn.NeuralNode(provider="ollama", model="llama3")

# Or with local GGUF file
from neuralnode.local import LocalLLM

ai = LocalLLM(model_path="./models/llama-3-8b.gguf")

6. Core Concepts

6.1 NeuralNode

The core class that provides a unified interface to LLMs.

from neuralnode import NeuralNode

# Configuration
ai = NeuralNode(
    provider="openai",           # Provider name
    model="gpt-4",               # Model identifier
    temperature=0.7,             # Sampling temperature
    max_tokens=1000,             # Maximum response length
    timeout=30,                  # Request timeout
    retries=3,                   # Retry attempts
    cache=True,                  # Enable caching
)

6.2 Agent

An intelligent entity that can use tools and make decisions.

from neuralnode import Agent

agent = Agent(
    llm=ai,                        # LLM instance
    tools=[tool1, tool2],          # Available tools
    memory=memory,                 # Memory system
    max_steps=10,                  # Maximum reasoning steps
    system_prompt="...",           # System instructions
)

Agent types:

SimpleAgent: Basic question-answering
ToolAgent: Can use tools
ReActAgent: Reasoning and acting with self-correction
MultiAgent: Coordinates multiple specialized agents
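
To make the ReAct idea concrete, here is a minimal reason-act loop with a scripted stand-in for the model. It is a sketch under stated assumptions, not the `ReActAgent` implementation: the `Action: name | input` and `Final: answer` text formats, `react_loop`, `scripted_llm`, and the `add` tool are all hypothetical.

```python
from typing import Callable, Dict, List


def react_loop(
    task: str,
    llm: Callable[[List[str]], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> str:
    """Alternate between model output and tool calls until a final answer appears."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript.append(reply)
        if reply.startswith("Final:"):
            return reply[len("Final:"):].strip()
        if reply.startswith("Action:"):
            # Expected form in this sketch: "Action: tool_name | tool input"
            name, _, arg = reply[len("Action:"):].partition("|")
            tool = tools.get(name.strip())
            observation = tool(arg.strip()) if tool else f"unknown tool {name.strip()!r}"
            transcript.append(f"Observation: {observation}")
    return "Stopped: max_steps reached"


def scripted_llm(transcript: List[str]) -> str:
    """Stand-in model: call the adder once, then answer with its result."""
    if transcript[-1].startswith("Observation:"):
        return "Final: " + transcript[-1][len("Observation:"):].strip()
    return "Action: add | 2 3"


def add(arg: str) -> str:
    """Toy tool: sum whitespace-separated integers."""
    return str(sum(int(part) for part in arg.split()))


answer = react_loop("What is 2 + 3?", scripted_llm, {"add": add})
print(answer)  # prints 5
```

The `max_steps` bound plays the same role as the `max_steps` argument of `Agent`: it stops a model that never produces a final answer.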

6.3 Tools

Functions that agents can use to interact with the world.

from neuralnode.tools import Tool

# Creating a custom tool
class WeatherTool(Tool):
    def __init__(self):
        super().__init__(
            name="get_weather",
            description="Get current weather for a location",
            parameters={
                "location": {"type": "string", "description": "City name"}
            }
        )
    
    def execute(self, location: str) -> str:
        # Implementation
        return f"Weather in {location}: 25C, Sunny"

6.4 Memory Systems

Different types of memory for different time horizons.

Short-term Memory: Recent conversation context

from neuralnode.memory import ConversationMemory

memory = ConversationMemory(max_messages=10)

Long-term Memory: Persistent storage

from neuralnode.memory import SQLiteMemory

memory = SQLiteMemory(db_path="./memory.db")

Semantic Memory: Vector-based retrieval

from neuralnode.memory import SemanticMemory

memory = SemanticMemory(embedding_model="sentence-transformers/all-MiniLM-L6-v2")
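
Vector-based retrieval ranks stored items by similarity between their embeddings and the query embedding. The sketch below uses word counts as a toy embedding so that it runs with the standard library alone; a real setup would use a neural encoder such as the sentence-transformers model named above. The `embed`, `cosine`, and `search` functions are hypothetical helpers, not part of NeuralNode.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use a neural encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, memories: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


memories = [
    "user prefers python over javascript",
    "user lives in berlin",
    "meeting scheduled for friday",
]
results = search("python or javascript for the user", memories, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```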

6.5 RAG (Retrieval-Augmented Generation)

Combining LLMs with document retrieval.

from neuralnode.rag import RAG, Document

# Load documents
docs = [
    Document(content="NeuralNode is a framework...", metadata={"source": "docs"}),
    Document(content="Installation requires Python 3.8...", metadata={"source": "docs"})
]

# Create RAG system
rag = RAG(llm=ai, documents=docs)

# Query with context
answer = rag.query("What are the system requirements?")

6.6 Chains

Sequential processing pipelines.

from neuralnode.chains import Chain

# Define a chain
chain = Chain()
chain.add_step(lambda x: x.upper())
chain.add_step(lambda x: x + "!!!")
chain.add_step(lambda x: ai.chat(x))

# Execute
result = chain.run("hello")
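
Conceptually, a chain is a list of callables applied in order, each receiving the previous step's output. The `MiniChain` class below is a hypothetical stand-in written for illustration; it mirrors the `add_step`/`run` usage above but makes no claim about how `Chain` is implemented.

```python
from typing import Any, Callable, List


class MiniChain:
    """Hypothetical sequential pipeline: each step receives the previous result."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Any], Any]] = []

    def add_step(self, step: Callable[[Any], Any]) -> "MiniChain":
        self.steps.append(step)
        return self  # returning self allows calls to be chained

    def run(self, value: Any) -> Any:
        for step in self.steps:
            value = step(value)
        return value


mini = MiniChain()
mini.add_step(str.upper).add_step(lambda x: x + "!!!")
print(mini.run("hello"))  # prints HELLO!!!
```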

7. API Reference

7.1 NeuralNode Class

class NeuralNode:
    """
    Unified interface for LLM providers.
    
    Args:
        provider: LLM provider name ("openai", "anthropic", "google", "ollama", etc.)
        model: Model identifier
        api_key: API key (or set via environment variable)
        temperature: Sampling temperature (0.0 to 2.0)
        max_tokens: Maximum tokens to generate
        timeout: Request timeout in seconds
        retries: Number of retry attempts
        cache: Enable response caching
        **kwargs: Provider-specific options
    """
    
    def __init__(
        self,
        provider: str,
        model: Optional[str] = None,
        api_key: Optional[str] = None,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        timeout: int = 30,
        retries: int = 3,
        cache: bool = True,
        **kwargs
    ):
        ...
    
    def chat(
        self,
        message: str,
        system: Optional[str] = None,
        history: Optional[List[Dict]] = None,
        **kwargs
    ) -> ChatResponse:
        """Send a chat message."""
        ...
    
    def stream(
        self,
        message: str,
        **kwargs
    ) -> Iterator[str]:
        """Stream response tokens."""
        ...
    
    def embed(
        self,
        text: Union[str, List[str]]
    ) -> Union[List[float], List[List[float]]]:
        """Generate embeddings."""
        ...

7.2 Agent Class

class Agent:
    """
    Intelligent agent with tool use capabilities.
    
    Args:
        llm: NeuralNode instance
        tools: List of available tools
        memory: Memory system
        max_steps: Maximum reasoning steps
        system_prompt: System instructions
    """
    
    def __init__(
        self,
        llm: NeuralNode,
        tools: Optional[List[Tool]] = None,
        memory: Optional[Any] = None,
        max_steps: int = 10,
        system_prompt: Optional[str] = None
    ):
        ...
    
    def run(
        self,
        task: str,
        context: Optional[Dict] = None
    ) -> str:
        """Execute a task."""
        ...
    
    def add_tool(self, tool: Tool) -> None:
        """Add a tool to the agent."""
        ...

7.3 Tool Class

class Tool:
    """
    Base class for tools.
    
    Args:
        name: Tool identifier
        description: What the tool does
        parameters: JSON Schema for parameters
        func: Function to execute
    """
    
    def __init__(
        self,
        name: str,
        description: str,
        parameters: Dict[str, Any],
        func: Optional[Callable] = None
    ):
        ...
    
    def execute(self, **kwargs) -> Any:
        """Execute the tool."""
        ...
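
Because tool arguments are produced by a model, it is usually worth checking them against the declared `parameters` before calling the function. The sketch below assumes the flat `{"name": {"type": ..., "description": ...}}` layout used in the `WeatherTool` example and treats every declared parameter as required; `validate_arguments` and `call_tool` are hypothetical helpers.

```python
from typing import Any, Callable, Dict

# JSON Schema type names mapped to Python types (only the ones this sketch needs).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_arguments(parameters: Dict[str, Dict[str, Any]], kwargs: Dict[str, Any]) -> None:
    """Raise TypeError if kwargs do not match the declared parameters."""
    for name, value in kwargs.items():
        if name not in parameters:
            raise TypeError(f"unexpected argument: {name}")
        expected = JSON_TYPES[parameters[name]["type"]]
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {parameters[name]['type']}")
    missing = set(parameters) - set(kwargs)
    if missing:
        raise TypeError(f"missing arguments: {sorted(missing)}")


def call_tool(func: Callable[..., Any], parameters: Dict[str, Dict[str, Any]], **kwargs: Any) -> Any:
    """Validate model-supplied arguments, then run the tool function."""
    validate_arguments(parameters, kwargs)
    return func(**kwargs)


weather_params = {"location": {"type": "string", "description": "City name"}}


def get_weather(location: str) -> str:
    return f"Weather in {location}: 25C, Sunny"


print(call_tool(get_weather, weather_params, location="Paris"))
```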

7.4 Memory Classes

class ConversationMemory:
    """Short-term conversation memory."""
    
    def __init__(self, max_messages: int = 10):
        ...
    
    def add(self, role: str, content: str) -> None:
        ...
    
    def get_messages(self) -> List[Dict]:
        ...
    
    def clear(self) -> None:
        ...


class AdvancedMemorySystem:
    """Multi-tier memory system."""
    
    def __init__(
        self,
        short_term_limit: int = 10,
        long_term_db_path: Optional[str] = None,
        enable_semantic: bool = False,
        embedding_model: Optional[str] = None
    ):
        ...
    
    def add_short_term(self, content: str) -> None:
        ...
    
    def add_long_term(self, content: str, metadata: Optional[Dict] = None) -> None:
        ...
    
    def search(
        self,
        query: str,
        k: int = 5,
        search_type: str = "semantic"
    ) -> List[Dict]:
        ...
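
A bounded short-term memory like the one described by `ConversationMemory` can be built on `collections.deque`. The `RollingMemory` class here is a hypothetical sketch of that idea, following the same method names as the interface above; it is not the library's code.

```python
from collections import deque
from typing import Deque, Dict, List


class RollingMemory:
    """Hypothetical short-term memory that keeps only the newest max_messages entries."""

    def __init__(self, max_messages: int = 10) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def get_messages(self) -> List[Dict[str, str]]:
        return list(self._messages)

    def clear(self) -> None:
        self._messages.clear()


rolling = RollingMemory(max_messages=2)
rolling.add("user", "Hi")
rolling.add("assistant", "Hello!")
rolling.add("user", "What is RAG?")
print(rolling.get_messages())
```

After the third `add`, the first message has been evicted, which is the behaviour a `max_messages` limit implies.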

8. Model System

8.1 Supported Providers

Provider	Cloud/Local	Models	Authentication
OpenAI	Cloud	GPT-4, GPT-3.5	API Key
Anthropic	Cloud	Claude 3, Claude 2	API Key
Google	Cloud	Gemini Pro	API Key
Azure	Cloud	GPT-4, GPT-3.5	Azure Credentials
Ollama	Local	Llama, Mistral, etc.	None
HuggingFace	Local	Any HF model	Token (optional)
llama.cpp	Local	GGUF models	None

8.2 Provider Configuration

# OpenAI
ai = nn.NeuralNode(
    provider="openai",
    model="gpt-4",
    api_key="sk-...",
    organization="org-..."  # Optional
)

# Anthropic
ai = nn.NeuralNode(
    provider="anthropic",
    model="claude-3-sonnet-20240229",
    api_key="sk-ant-..."
)

# Google
ai = nn.NeuralNode(
    provider="google",
    model="gemini-pro",
    api_key="..."
)

# Azure OpenAI
ai = nn.NeuralNode(
    provider="azure",
    model="gpt-4",
    api_key="...",
    api_base="https://your-resource.openai.azure.com/",
    api_version="2024-02-01"
)

# Ollama
ai = nn.NeuralNode(
    provider="ollama",
    model="llama3",
    base_url="http://localhost:11434"
)

8.3 Local Model Management

from neuralnode.local import LocalLLMHub

# Create model hub
hub = LocalLLMHub()

# Scan for models
models = hub.scan_directory("./models")

# Add model from HuggingFace
hub.import_from_hf("meta-llama/Llama-2-7b-chat-hf", quantization="Q4_K_M")

# Launch model server
hub.launch("llama-2-7b", port=8000)

# Use in NeuralNode
ai = nn.NeuralNode(provider="local", base_url="http://localhost:8000")

8.4 Model Quantization

from neuralnode.local import ModelQuantizer

quantizer = ModelQuantizer()

# Convert to GGUF
quantizer.convert(
    model_path="meta-llama/Llama-2-7b-hf",
    output_path="./llama-7b-f16.gguf",
    format="f16"
)

# Quantize
quantizer.quantize(
    model_path="./llama-7b-f16.gguf",
    output_path="./llama-7b-q4.gguf",
    method="Q4_K_M"
)

9. Training Guide

9.1 Fine-tuning with LoRA

from neuralnode.training import FineTuner

# Initialize
tuner = FineTuner(model="meta-llama/Llama-2-7b")

# Prepare dataset
dataset = tuner.prepare_dataset(
    data_path="./training_data.jsonl",
    format="instruction",  # or "conversation"
    max_length=2048
)

# Configure LoRA
tuner.configure_lora(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05
)

# Train
tuner.finetune(
    dataset=dataset,
    output_dir="./lora_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    warmup_steps=100,
    logging_steps=10,
    save_steps=500,
    fp16=True
)
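
To see why LoRA is cheap, it helps to count the parameters the settings above would train. The arithmetic below assumes the standard LoRA formulation (two low-rank factors per adapted matrix, update scaled by `lora_alpha / r`) and the Llama-2-7B shapes of 32 layers with 4096 x 4096 `q_proj` and `v_proj` matrices; `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int, num_matrices: int) -> int:
    """Each adapted d_out x d_in matrix gains two factors: A (r x d_in) and B (d_out x r)."""
    return num_matrices * r * (d_in + d_out)


# Llama-2-7B shapes: 32 transformer layers, 4096 x 4096 q_proj and v_proj.
hidden, layers, targets = 4096, 32, 2
r, lora_alpha = 16, 32

trainable = lora_trainable_params(hidden, hidden, r, layers * targets)
scaling = lora_alpha / r  # the low-rank update is multiplied by alpha / r

print(f"trainable LoRA parameters: {trainable:,}")
print(f"update scaling factor: {scaling}")
```

Under these assumptions roughly 8.4 million parameters are trained, a small fraction of the model's roughly 7 billion weights.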

9.2 RLHF Training

from neuralnode.training import RLHFTrainer

# Initialize trainer
trainer = RLHFTrainer(
    model="meta-llama/Llama-2-7b",
    reward_model="your-reward-model"
)

# Stage 1: Collect preferences
preferences = trainer.collect_preferences(
    prompts=eval_prompts,
    num_responses_per_prompt=4
)

# Stage 2: Train reward model
trainer.train_reward_model(preferences, epochs=3)

# Stage 3: Train policy with PPO
trainer.train_policy(
    prompts=training_prompts,
    num_epochs=4,
    batch_size=8
)

# Full pipeline
from neuralnode.training import CompleteRLHFPipeline

pipeline = CompleteRLHFPipeline(model, tokenizer)
results = pipeline.run_full_pipeline(
    prompts=training_prompts,
    test_prompts=eval_prompts,
    output_dir="./rlhf_output"
)

9.3 Model Compression

from neuralnode.training import ModelCompressor

compressor = ModelCompressor()

# Pruning
pruned_model = compressor.prune(
    model=model,
    method="structured",  # or "unstructured", "iterative"
    amount=0.3  # Remove 30% of weights
)

# Knowledge Distillation
distilled_model = compressor.distill(
    teacher_model=large_model,
    student_model=small_model,
    train_data=training_data,
    temperature=4.0,
    alpha=0.5
)

# Full compression pipeline
compressed_model, stats = compressor.compress(
    model=model,
    methods=["pruning", "quantization", "distillation"],
    target_size_mb=500
)

10. Inference System

10.1 Basic Inference

# Single request
response = ai.chat("Hello, how are you?")

# Streaming
for token in ai.stream("Tell me a story"):
    print(token, end="")

# Batch processing
responses = ai.batch_chat([
    "Question 1",
    "Question 2",
    "Question 3"
])

10.2 Distributed Inference

from neuralnode.distributed import DistributedInferenceEngine

# Create engine
engine = DistributedInferenceEngine()

# Add compute nodes
engine.add_node("node-1", "192.168.1.10", 8000, devices=["cuda:0"])
engine.add_node("node-2", "192.168.1.11", 8000, devices=["cuda:0", "cuda:1"])

# Shard model across nodes
engine.shard_model("meta-llama/Llama-2-70b", shards=4)

# Parallel generation
results = engine.parallel_generate(
    prompts=["Prompt 1", "Prompt 2", "Prompt 3"],
    max_tokens=100
)

# Pipeline parallelism (node-3 and node-4 must also be registered with add_node first)
pipeline = engine.create_pipeline([
    "node-1",  # Layers 0-15
    "node-2",  # Layers 16-31
    "node-3",  # Layers 32-47
    "node-4",  # Layers 48-79
])

10.3 Quantized Inference

from neuralnode.local import QuantizedLLM

# Load quantized model
model = QuantizedLLM(
    model_path="./llama-7b-q4.gguf",
    n_ctx=4096,
    n_threads=8
)

# Inference
response = model.generate(
    prompt="What is AI?",
    max_tokens=100,
    temperature=0.7
)

10.4 Caching

from neuralnode.utils import SmartCache

# Initialize cache
cache = SmartCache(
    backend="redis",  # or "disk", "memory"
    ttl=3600,
    max_size=10000
)

# Use with NeuralNode
ai = nn.NeuralNode(
    provider="openai",
    cache=cache,
    cache_similarity_threshold=0.95
)

# Cache hit for similar queries
response1 = ai.chat("What is Python?")
response2 = ai.chat("Tell me about Python programming")  # May use cache
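
A similarity threshold means the cache may return a stored answer when a new prompt is close enough to an earlier one. The sketch below illustrates the idea with character-level similarity from `difflib`; a production cache would more likely compare embeddings. `SimilarityCache` and its methods are hypothetical and are not the `SmartCache` API.

```python
import time
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


class SimilarityCache:
    """Hypothetical cache that reuses an answer when a new prompt is close to an old one."""

    def __init__(self, threshold: float = 0.95, ttl: float = 3600.0) -> None:
        self.threshold = threshold
        self.ttl = ttl
        self._entries: Dict[str, Tuple[float, str]] = {}  # prompt -> (stored_at, response)

    def get(self, prompt: str) -> Optional[str]:
        now = time.monotonic()
        best_score, best_response = 0.0, None
        for cached_prompt, (stored_at, response) in list(self._entries.items()):
            if now - stored_at > self.ttl:
                del self._entries[cached_prompt]  # entry has expired
                continue
            # Character-level similarity; a real system would compare embeddings.
            score = SequenceMatcher(None, prompt.lower(), cached_prompt.lower()).ratio()
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self._entries[prompt] = (time.monotonic(), response)


sim_cache = SimilarityCache(threshold=0.9)
sim_cache.put("What is Python?", "Python is a programming language.")
print(sim_cache.get("what is python?"))       # same text apart from case: hit
print(sim_cache.get("How do I bake bread?"))  # unrelated prompt: miss (None)
```

A high threshold keeps false hits rare at the cost of fewer cache hits; the right value depends on how the similarity is measured.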

11. Configuration & Settings

11.1 Configuration Files

YAML Configuration

# config.yaml
llm:
  provider: openai
  model: gpt-4
  temperature: 0.7
  max_tokens: 1000

agent:
  max_steps: 10
  system_prompt: "You are a helpful assistant."
  
tools:
  - name: web_search
    enabled: true
  - name: file_manager
    enabled: true
    
memory:
  type: advanced
  short_term_limit: 10
  long_term_db_path: ./memory.db
  enable_semantic: true

security:
  human_in_the_loop: true
  require_confirmation:
    - HIGH
    - CRITICAL

JSON Configuration

{
  "llm": {
    "provider": "openai",
    "model": "gpt-4"
  },
  "cache": {
    "enabled": true,
    "backend": "redis",
    "ttl": 3600
  }
}

Loading Configuration

import neuralnode as nn

# From file
ai = nn.NeuralNode.from_config("config.yaml")

# From dict
config = {
    "provider": "openai",
    "model": "gpt-4"
}
ai = nn.NeuralNode.from_config(config)
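
A loader that accepts either a file path or a dict usually normalizes both into one settings dict and fills in defaults. The sketch below shows that pattern with JSON only, since YAML needs a third-party parser. `load_llm_config` is a hypothetical helper; the default values are taken from the `NeuralNode` constructor signature in the API reference, and the handling of the nested `llm` key is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

# Defaults copied from the NeuralNode constructor signature.
DEFAULTS: Dict[str, Any] = {"temperature": 0.7, "timeout": 30, "retries": 3, "cache": True}


def load_llm_config(source: Union[str, Path, Dict[str, Any]]) -> Dict[str, Any]:
    """Accept a dict or a JSON file path and return LLM settings merged over defaults."""
    if isinstance(source, dict):
        raw = source
    else:
        raw = json.loads(Path(source).read_text(encoding="utf-8"))
    # File configs nest LLM settings under "llm"; plain dicts may pass them directly.
    settings = raw.get("llm", raw)
    if "provider" not in settings:
        raise ValueError("config must name a provider")
    return {**DEFAULTS, **settings}


flat = load_llm_config({"provider": "openai", "model": "gpt-4"})
nested = load_llm_config({"llm": {"provider": "openai", "temperature": 0.2}})
print(flat)
print(nested)
```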

11.2 Environment Variables

# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

# Azure
export AZURE_OPENAI_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://..."

# Local Models
export OLLAMA_BASE_URL="http://localhost:11434"

# Cache
export NEURALNODE_CACHE_BACKEND="redis"
export REDIS_URL="redis://localhost:6379"

# Security
export NEURALNODE_REQUIRE_CONFIRMATION="true"
export NEURALNODE_SAFE_MODE="true"

# Logging
export NEURALNODE_LOG_LEVEL="INFO"
export NEURALNODE_LOG_FILE="./neuralnode.log"

11.3 Performance Tuning

# Memory optimization
ai = nn.NeuralNode(
    provider="local",
    context_length=4096,
    batch_size=1,  # Reduce for low memory
    gpu_layers=35  # Offload to GPU
)

# CPU optimization
import os
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["OPENBLAS_NUM_THREADS"] = "8"

# Async for high throughput
import asyncio

async def batch_process(prompts):
    tasks = [ai.achat(p) for p in prompts]
    return await asyncio.gather(*tasks)

12. Plugins / Extensions System

12.1 Creating a Custom Provider

from neuralnode.providers import BaseProvider, register_provider

@register_provider("my_provider")
class MyProvider(BaseProvider):
    """Custom LLM provider."""
    
    def __init__(self, config):
        self.api_key = config.get("api_key")
        self.base_url = config.get("base_url")
    
    def chat(self, message, **kwargs):
        # Implementation
        return ChatResponse(text="...")
    
    def embed(self, text):
        # Implementation
        return [0.1, 0.2, 0.3]
    
    def is_available(self):
        return True

# Usage
ai = nn.NeuralNode(provider="my_provider", api_key="...")

12.2 Creating Custom Tools

import sqlite3

from neuralnode.tools import Tool

class DatabaseTool(Tool):
    """Custom database tool."""

    def __init__(self, connection_string):
        super().__init__(
            name="query_database",
            description="Query SQL database",
            parameters={
                "query": {
                    "type": "string",
                    "description": "SQL query"
                }
            }
        )
        # SQLite is used here as an example backend; swap in your own driver's connect()
        self.db = sqlite3.connect(connection_string)

    def execute(self, query: str) -> str:
        result = self.db.execute(query)
        return str(result.fetchall())

# Register
def register_plugin():
    return {
        "tools": [DatabaseTool],
        "providers": [],
        "hooks": {}
    }

12.3 Plugin API

# neuralnode/plugins/my_plugin/__init__.py

from neuralnode.plugins import Plugin

class MyPlugin(Plugin):
    name = "my_plugin"
    version = "1.0.0"
    
    def setup(self, app):
        """Called when plugin is loaded."""
        app.add_tool(MyTool())
        app.add_hook("pre_chat", self.preprocess)
    
    def teardown(self):
        """Called when plugin is unloaded."""
        pass

# Loading plugins
import neuralnode as nn
nn.load_plugin("my_plugin")
nn.load_plugins_from_dir("./plugins")

12.4 Integration Examples

Slack Integration

from neuralnode.integrations import SlackBot

bot = SlackBot(agent=agent, token="xoxb-...")
bot.start()

Discord Integration

from neuralnode.integrations import DiscordBot

bot = DiscordBot(agent=agent, token="...")
bot.start()

13. Performance Optimization

13.1 Quantization

# 4-bit quantization (roughly 75% smaller than FP16)
from neuralnode.local import ModelQuantizer

quantizer = ModelQuantizer()
quantizer.quantize(
    "meta-llama/Llama-2-7b",
    method="Q4_K_M",  # 4-bit with medium quality
    output_path="./llama-7b-q4.gguf"
)

# 8-bit quantization (roughly 50% smaller than FP16, better quality)
quantizer.quantize(
    "meta-llama/Llama-2-7b",
    method="Q8_0",
    output_path="./llama-7b-q8.gguf"
)
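
The size reductions quoted above follow from the number of bits stored per weight, measured against a 16-bit baseline. The short calculation below uses nominal bit widths for a 7B-parameter model; `model_size_gb` is a hypothetical helper.

```python
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params = 7e9  # a 7B-parameter model
baseline = model_size_gb(params, 16)  # FP16 reference

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = model_size_gb(params, bits)
    reduction = 100 * (1 - size / baseline)
    print(f"{label:>5}: {size:4.1f} GB ({reduction:.0f}% smaller than FP16)")
```

Block-quantized formats such as Q8_0 and Q4_K_M also store per-block scale factors, so real files are somewhat larger than these nominal figures.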

13.2 Model Caching

from neuralnode.utils import DiskCache, RedisCache

# Disk cache
cache = DiskCache(
    cache_dir="./cache",
    max_size_gb=10,
    ttl=86400  # 24 hours
)

# Redis cache (for distributed systems)
cache = RedisCache(
    host="localhost",
    port=6379,
    db=0
)

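import hashlib

# Use with LLM
ai = nn.NeuralNode(
    provider="openai",
    cache=cache,
    # Built-in hash() is salted per process for strings, so use a stable digest
    cache_key_generator=lambda msg: hashlib.sha256(msg.encode("utf-8")).hexdigest()
)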

13.3 Memory Optimization

# Gradient checkpointing for training
from neuralnode.training import FineTuner

tuner = FineTuner(
    model="meta-llama/Llama-2-7b",
    gradient_checkpointing=True,
    fp16=True
)

# Model sharding for inference
from neuralnode.distributed import ModelSharding

sharder = ModelSharding()
sharder.distribute(
    model="meta-llama/Llama-2-70b",
    devices=["cuda:0", "cuda:1", "cuda:2", "cuda:3"]
)

13.4 Multi-threading

import concurrent.futures

# Parallel tool execution
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(tool.execute, query)
        for tool, query in tasks
    ]
    results = [f.result() for f in futures]

# Async I/O
import asyncio

async def batch_inference(prompts):
    semaphore = asyncio.Semaphore(10)  # Limit concurrent requests
    
    async def bounded_chat(prompt):
        async with semaphore:
            return await ai.achat(prompt)
    
    return await asyncio.gather(*[
        bounded_chat(p) for p in prompts
    ])
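
The semaphore pattern above can be tried without any API access by swapping the model call for a stub. In this runnable sketch, `fake_chat` is a hypothetical stand-in for `ai.achat`, and the `peak` counter exists only to show that the concurrency limit holds.

```python
import asyncio
from typing import List

peak = 0      # highest number of simultaneous calls observed
active = 0    # calls currently in flight


async def fake_chat(prompt: str) -> str:
    """Stand-in for an async model call; tracks how many run at once."""
    global peak, active
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # simulate network latency
    active -= 1
    return prompt.upper()


async def bounded_batch(prompts: List[str], limit: int = 3) -> List[str]:
    semaphore = asyncio.Semaphore(limit)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await fake_chat(prompt)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(p) for p in prompts))


outputs = asyncio.run(bounded_batch([f"prompt {i}" for i in range(10)]))
print(outputs[:2], "peak concurrency:", peak)
```

All ten coroutines are created up front, but at most `limit` of them are inside the model call at any moment, which is what keeps a provider's rate limits from being exceeded.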

13.5 GPU Acceleration

# GPU layers for local models
ai = nn.NeuralNode(
    provider="local",
    model_path="./llama-7b.gguf",
    gpu_layers=35,  # Offload 35 layers to GPU
    main_gpu=0
)

# Multi-GPU training
from neuralnode.training import FineTuner

tuner = FineTuner(
    model="meta-llama/Llama-2-7b",
    device_map="auto",  # Automatically distribute across GPUs
    torch_dtype="float16"
)

14. Use Cases

14.1 Chatbots

# Customer support chatbot
from neuralnode import Agent
from neuralnode.rag import RAG

# Load knowledge base
rag = RAG.from_directory("./knowledge_base")

# Create chatbot
chatbot = Agent(
    llm=ai,
    tools=[rag.as_tool()],
    system_prompt="You are a helpful customer support agent."
)

# Deploy
from neuralnode.integrations import TelegramAgent

telegram_bot = TelegramAgent(agent=chatbot, token="...")
telegram_bot.start()

14.2 Voice Assistants

from neuralnode.tools import SpeechToText, TextToSpeech
from neuralnode.chains import MultiModalChain

# Voice pipeline
stt = SpeechToText()
tts = TextToSpeech()

# Process voice command
audio_input = "command.wav"
text = stt.transcribe(audio_input)

# Get response
response = agent.run(text)

# Speak response
tts.speak(response, output="response.wav")

14.3 Computer Vision

from neuralnode.tools import VisionProcessor

vision = VisionProcessor(llm=ai)

# Analyze image
description = vision.describe_image("photo.jpg")
objects = vision.detect_objects("photo.jpg")
text = vision.extract_text("document.jpg")

# Multi-modal query
result = agent.run("""
Look at this image and tell me:
1. What objects do you see?
2. Is there any text?
3. What is the mood/atmosphere?
""", context={"image": "photo.jpg"})

14.4 NLP Systems

# Text classification
from neuralnode.nlp import Classifier

classifier = Classifier(llm=ai, labels=["positive", "negative", "neutral"])
sentiment = classifier.predict("This product is amazing!")

# Named Entity Recognition
from neuralnode.nlp import NER

ner = NER(llm=ai)
entities = ner.extract("Apple Inc. is located in Cupertino, California.")
# Result: [{"text": "Apple Inc.", "type": "ORG"}, ...]

# Text Summarization
from neuralnode.nlp import Summarizer

summarizer = Summarizer(llm=ai)
summary = summarizer.summarize(long_document, max_length=100)

14.5 Local AI Agents

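# Fully offline agent
import neuralnode as nn
from neuralnode import Agent
from neuralnode.memory import AdvancedMemorySystem
# FileManager, ProcessManager, and CodeRunner must also be imported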
ai = nn.NeuralNode(
    provider="ollama",
    model="llama3:70b"
)

agent = Agent(
    llm=ai,
    tools=[
        FileManager(),
        ProcessManager(),
        CodeRunner()
    ],
    memory=AdvancedMemorySystem()
)

# Agent can control your computer completely offline
agent.run("Find all Python files in my project and list their dependencies")

14.6 Enterprise Solutions

# Multi-agent system for enterprise
from neuralnode.agents import AgentOrchestrator

orchestrator = AgentOrchestrator()

# Create specialized agents
researcher = orchestrator.create_agent(
    name="Researcher",
    role="research",
    tools=[WebSearch(), DocumentLoader()]
)

analyst = orchestrator.create_agent(
    name="Analyst",
    role="analysis",
    tools=[Calculator(), DataProcessor()]
)

writer = orchestrator.create_agent(
    name="Writer",
    role="content",
    tools=[]
)

# Execute workflow
result = orchestrator.execute_workflow(
    [researcher, analyst, writer],
    task="Create a market analysis report"
)
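
How `execute_workflow` passes work between agents is not spelled out here. One plausible reading is a sequential hand-off, where each agent receives the task plus the previous agent's output. The sketch below illustrates that idea with plain Python callables; `run_sequential` and the three stand-in agents are hypothetical and are not part of NeuralNode.

```python
from typing import Callable, List

# Hypothetical stand-in: an "agent" is any callable taking (task, context) -> str.
AgentFn = Callable[[str, str], str]


def run_sequential(agents: List[AgentFn], task: str) -> str:
    """Run agents in order, feeding each one the previous agent's output."""
    context = ""
    for agent in agents:
        context = agent(task, context)
    return context


def researcher(task: str, context: str) -> str:
    return f"notes on '{task}'"


def analyst(task: str, context: str) -> str:
    return f"analysis of [{context}]"


def writer(task: str, context: str) -> str:
    return f"report based on [{context}]"


result = run_sequential([researcher, analyst, writer], "market analysis")
print(result)
```

A real orchestrator may run agents in parallel or let them exchange messages; the sequential form is only the simplest case.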

15. Comparison

15.1 NeuralNode vs TensorFlow

Feature	NeuralNode	TensorFlow
Type	LLM Framework	Deep Learning Framework
Focus	AI Agents & LLMs	General ML & Neural Networks
Ease of Use	High-level API	Low to mid-level
Local LLMs	Native support	Via conversion
Agent System	Built-in	Not available
Tool Integration	Native	Manual implementation
Use Case	Conversational AI, Agents	Research, Custom models

15.2 NeuralNode vs PyTorch

Feature	NeuralNode	PyTorch
Abstraction	High-level	Low-level
LLM Focus	Yes	No (general purpose)
Training	Simplified RLHF, LoRA	Full control
Deployment	Built-in tools	Manual setup
Learning Curve	Gentle	Steep
Flexibility	Moderate	Very High

15.3 NeuralNode vs ONNX Runtime

Feature	NeuralNode	ONNX Runtime
Purpose	AI Agent Framework	Model Inference Engine
Scope	End-to-end solutions	Inference optimization
LLM Support	Native	Via conversion
Tools	Built-in ecosystem	None
Memory	Advanced systems	Basic management
Use Case	Production agents	Optimized inference

15.4 NeuralNode vs LangChain

Feature	NeuralNode	LangChain
Design	Monolithic, integrated	Modular, composable
Complexity	Lower	Higher
Local LLMs	First-class	Via extensions
Security	Built-in sandbox	Manual implementation
Performance	Optimized defaults	Requires tuning
Documentation	Comprehensive	Extensive
Community	Growing	Larger

16. Troubleshooting

16.1 Installation Errors

Problem: ImportError: No module named 'neuralnode'

Solution:

pip install --upgrade neuralnode
# Or for development
pip install -e .

Problem: Cannot install package

Solution:

# Check Python version (requires 3.8+)
python --version

# Use virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

pip install neuralnode
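
The same version check can be done from inside Python, which helps when several interpreters are installed and it is unclear which one `pip` targets. This is a small sketch; `python_is_supported` is a hypothetical helper, and the 3.8 minimum is taken from the comment above.

```python
import sys

MIN_VERSION = (3, 8)  # minimum stated in this guide


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Compare the (major, minor) pair against the minimum version."""
    return tuple(version_info[:2]) >= minimum


print(sys.executable)          # the interpreter actually in use
print(python_is_supported())   # True if this interpreter is new enough
```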

16.2 GPU Not Detected

Problem: CUDA not available

Solution:

import torch
print(torch.cuda.is_available())
print(torch.version.cuda)

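# Install CUDA-enabled PyTorch (shell command; cu118 targets CUDA 11.8,
# so pick the index URL that matches your installed CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118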

Problem: Out of memory on GPU

Solution:

# Reduce GPU layers
ai = nn.NeuralNode(
    provider="local",
    gpu_layers=20,  # Reduce from 35
    n_batch=512     # Smaller batch size
)

# Or use CPU
ai = nn.NeuralNode(provider="local", gpu_layers=0)
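
Rather than guessing a value for `gpu_layers`, a rough starting point can be computed from the model file size and the free VRAM. The helper below is a back-of-the-envelope sketch, not a NeuralNode function: `estimate_gpu_layers` is a hypothetical name, it assumes all layers are the same size, and the `reserve_gb` headroom for the KV cache and scratch buffers is an arbitrary default.

```python
def estimate_gpu_layers(model_size_gb: float, total_layers: int,
                        free_vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Rough estimate of how many layers fit in free VRAM.

    Assumes equally sized layers and keeps `reserve_gb` free for the
    KV cache and scratch buffers. Both are simplifications.
    """
    if total_layers <= 0 or model_size_gb <= 0:
        raise ValueError("model_size_gb and total_layers must be positive")
    per_layer_gb = model_size_gb / total_layers
    usable_gb = max(free_vram_gb - reserve_gb, 0.0)
    return min(total_layers, int(usable_gb / per_layer_gb))


# Example: a 4 GB quantized model with 32 layers on a GPU with 3 GB free
print(estimate_gpu_layers(4.0, 32, 3.0))  # 16
```

Treat the result as an upper bound to try first, then lower it if the out-of-memory error persists.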

16.3 Model Loading Issues

Problem: Model not found

Solution:

# Check model path
import os
print(os.path.exists("./model.gguf"))

# Use absolute path
ai = nn.NeuralNode(
    provider="local",
    model_path=os.path.abspath("./model.gguf")
)

Problem: GGUF format not recognized

Solution:

# Update llama-cpp-python
pip install --upgrade llama-cpp-python

# Or reinstall with specific version
pip install llama-cpp-python==0.2.20
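
Before reinstalling anything, it can help to confirm that the file really is GGUF. GGUF files begin with the four ASCII bytes `GGUF` followed by a little-endian 32-bit version number. The sketch below reads that header; `read_gguf_version` is a hypothetical helper, and the temporary file only stands in for a real model.

```python
import os
import struct
import tempfile


def read_gguf_version(path: str) -> int:
    """Return the GGUF version from the file header, or raise ValueError."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    # Bytes 4-7 hold the format version as a little-endian uint32.
    (version,) = struct.unpack("<I", header[4:8])
    return version


# Demo with a fake 8-byte header; point this at your own model file instead.
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<I", 3))
print(read_gguf_version(tmp.name))  # 3
os.remove(tmp.name)
```

A file that fails this check is not a GGUF file (for example, an older model format or a truncated download), so upgrading the loader will not help.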

16.4 Memory Crashes

Problem: Segmentation fault or Killed

Solution:

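# Limit memory usage (Unix only; the resource module is not available on Windows)
import resource
_, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (8 * 1024 ** 3, hard))  # 8 GB soft limit, hard limit unchanged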

# Use quantized model
ai = nn.NeuralNode(
    provider="local",
    model_path="./model-q4.gguf",  # 4-bit instead of 16-bit
    n_ctx=2048  # Reduce context
)

Problem: Context length exceeded

Solution:

# Truncate input
from neuralnode.utils import truncate_text

truncated = truncate_text(long_text, max_tokens=3000)
response = ai.chat(truncated)

# Or use context compression
from neuralnode.memory import SlidingWindowMemory

memory = SlidingWindowMemory(max_messages=10)
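
The idea behind a sliding window is that only the most recent messages are kept, so the prompt stops growing once the window is full. The internals of `SlidingWindowMemory` are not shown in this guide; the sketch below illustrates the technique with a `collections.deque`, and `WindowedHistory` is a hypothetical stand-in.

```python
from collections import deque
from typing import Deque, Dict, List


class WindowedHistory:
    """Hypothetical stand-in for a sliding-window memory."""

    def __init__(self, max_messages: int = 10) -> None:
        # A deque with maxlen drops the oldest item when a new one is appended.
        self._messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def messages(self) -> List[Dict[str, str]]:
        return list(self._messages)


history = WindowedHistory(max_messages=3)
for i in range(5):
    history.add("user", f"message {i}")
print([m["content"] for m in history.messages()])
# ['message 2', 'message 3', 'message 4']
```

Counting messages is only a proxy for token count; one very long message can still exceed the context window, which is where truncation comes in.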

16.5 API Errors

Problem: Rate limit exceeded

Solution:

# Enable caching
ai = nn.NeuralNode(
    provider="openai",
    cache=True,
    cache_similarity_threshold=0.95
)

# Add retry with exponential backoff
from neuralnode.utils import with_retry

@with_retry(max_attempts=5, backoff=2)
def chat_with_retry(prompt):
    return ai.chat(prompt)
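
For readers who want to see what exponential backoff does, or who need it outside NeuralNode, here is a minimal stdlib sketch. It is not the implementation of `with_retry`; the `retry` decorator, its `base_delay` parameter, and the injectable `sleep` are assumptions made for illustration.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry(max_attempts: int = 5, backoff: float = 2.0, base_delay: float = 1.0,
          exceptions: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep):
    """Retry a function, multiplying the wait by `backoff` after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * backoff ** (attempt - 1))
        return wrapper
    return decorator


# Demo: fail twice, then succeed. Delays are recorded instead of slept.
delays = []
calls = {"count": 0}


@retry(max_attempts=5, backoff=2, base_delay=0.5, sleep=delays.append)
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


print(flaky(), delays)  # ok [0.5, 1.0]
```

In practice it is usually better to retry only on the provider's rate-limit exception rather than on every `Exception`.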

Problem: API key invalid

Solution:

# Set environment variable (run in your shell)
export OPENAI_API_KEY="sk-..."

# Or pass directly (in Python)
ai = nn.NeuralNode(
    provider="openai",
    api_key="sk-..."
)
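
An "invalid key" error is often caused by an unset variable or stray whitespace picked up when the key was copied. A small check before constructing the client makes that failure explicit. This is a sketch; `load_api_key` is a hypothetical helper, and `DEMO_API_KEY` is used only so the example does not touch a real key.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment and strip surrounding whitespace."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set or is empty")
    return key


# Demo with a throwaway variable that has accidental whitespace around it.
os.environ["DEMO_API_KEY"] = "  sk-demo  "
print(load_api_key("DEMO_API_KEY"))  # sk-demo
```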

16.6 Debug Mode

# Enable detailed logging
import logging
logging.basicConfig(level=logging.DEBUG)

# Check tool execution
agent = Agent(llm=ai, tools=[...], debug=True)

# Monitor memory
from neuralnode.utils import MemoryMonitor

monitor = MemoryMonitor()
monitor.start()
result = agent.run("task")
monitor.stop()
print(monitor.report())
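
If `MemoryMonitor` is not available, the standard library's `tracemalloc` gives a comparable view of Python-level allocations. The sketch below is an alternative, not a description of how `MemoryMonitor` works; `measure_peak_memory` is a hypothetical helper. Note that `tracemalloc` does not see memory allocated by native code such as a local model runtime.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) for Python-level allocations."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Demo: allocating a list of one million items.
result, peak = measure_peak_memory(lambda: [0] * 1_000_000)
print(f"peak: {peak / 1024 / 1024:.1f} MiB")
```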

17. Roadmap

17.1 Version 1.0 (Current)

Core Features:

Unified LLM interface
Basic agent system
Tool integration
Memory systems
RAG support
Local model support

Security:

Basic sandboxing
Human-in-the-loop
Privacy mode

17.2 Version 1.1 (Q2 2024)

Advanced ReAct pattern
Multi-agent orchestration
Workflow builder
Telegram/Discord integrations
Enhanced caching

17.3 Version 1.2 (Q3 2024)

Distributed inference
Federated learning
RLHF pipeline
Model compression
Quantization GUI

17.4 Version 1.3 (Q4 2024)

Mobile app builder
Desktop GUI
Cloud deployment tools
Auto-agent capabilities
Self-healing system

17.5 Version 2.0 (2025)

Multi-modal chain improvements
Local LLM hub enhancements
Advanced observability
Plugin marketplace
Enterprise features

17.6 Future Plans

Research Directions:

Constitutional AI integration
Advanced reasoning architectures
Multi-modal foundation models
Neuro-symbolic AI

Infrastructure:

Kubernetes operator
Serverless deployment
Edge computing support
Real-time streaming

Integrations:

Slack, Teams, Discord
Notion, Confluence
Jira, Trello
AWS, GCP, Azure services

18. Contributing Guide

18.1 Getting Started

Fork the repository
Clone your fork:

git clone https://github.com/yourusername/neuralnode.git
cd neuralnode

Create virtual environment:

python -m venv venv
source venv/bin/activate

Install dependencies:

pip install -e ".[dev]"

18.2 Project Structure

neuralnode/
├── src/
│   └── neuralnode/
│       ├── __init__.py
│       ├── core.py              # Core NeuralNode class
│       ├── agent/
│       │   ├── __init__.py
│       │   ├── base.py          # Base agent
│       │   ├── react.py         # ReAct agent
│       │   └── orchestrator.py  # Multi-agent
│       ├── providers/           # LLM providers
│       ├── tools/               # Built-in tools
│       ├── memory/              # Memory systems
│       ├── rag/                 # RAG components
│       ├── training/            # Training modules
│       ├── distributed/         # Distributed inference
│       └── utils/               # Utilities
├── tests/
├── docs/
├── examples/
└── scripts/

18.3 Coding Standards

Style Guide:

Follow PEP 8
Use type hints
Document with docstrings
Maximum line length: 100

Example:

from typing import Any, Dict, List


def process_data(
    input_data: List[Dict[str, Any]],
    threshold: float = 0.5
) -> List[Dict[str, Any]]:
    """
    Process input data with threshold filtering.
    
    Args:
        input_data: List of data dictionaries
        threshold: Minimum confidence threshold
    
    Returns:
        Filtered list of data
    
    Raises:
        ValueError: If threshold is not between 0 and 1
    """
    if not 0 <= threshold <= 1:
        raise ValueError("Threshold must be between 0 and 1")
    
    return [item for item in input_data if item["confidence"] >= threshold]

18.4 Testing

# Run all tests
pytest

# Run specific test
pytest tests/test_agents.py

# With coverage
pytest --cov=neuralnode --cov-report=html

# Run linting
ruff check src/
black --check src/
mypy src/
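
A test for the `process_data` example from 18.3 might look like the sketch below. The function is repeated so the snippet runs on its own, and the test names are hypothetical. The tests use plain `assert` statements so they need no imports; with pytest installed, `pytest.raises(ValueError)` is the more usual way to write the second one.

```python
from typing import Any, Dict, List


def process_data(
    input_data: List[Dict[str, Any]],
    threshold: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep only items whose confidence meets the threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("Threshold must be between 0 and 1")
    return [item for item in input_data if item["confidence"] >= threshold]


def test_filters_below_threshold() -> None:
    data = [{"confidence": 0.9}, {"confidence": 0.2}]
    assert process_data(data, threshold=0.5) == [{"confidence": 0.9}]


def test_rejects_invalid_threshold() -> None:
    try:
        process_data([], threshold=1.5)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```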

18.5 Pull Request Process

Create feature branch:

git checkout -b feature/my-feature

Make changes and commit:

git add .
git commit -m "feat(scope): description"

Push to fork:

git push origin feature/my-feature

Create Pull Request:

Fill PR template
Link related issues
Request review
Ensure CI passes

18.6 Commit Message Format

type(scope): description

[optional body]

[optional footer]

Types:

feat: New feature
fix: Bug fix
docs: Documentation
test: Tests
refactor: Code refactoring
perf: Performance
chore: Maintenance

Example:

feat(agent): add ReAct pattern support

Implements reasoning and acting cycle with
self-correction capabilities.

Closes #123
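
The header line of this format can be checked mechanically, for example in a commit hook. The sketch below is one way to do it; `is_valid_header` is a hypothetical helper, and treating the `(scope)` part as optional is an assumption, since the format above does not say whether it is required.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "chore")

# type, optional (scope), then ": " and a non-empty description
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[^()]+)\))?"
    r": (?P<description>\S.*)$"
)


def is_valid_header(message: str) -> bool:
    """Check only the first line of a commit message."""
    first_line = message.splitlines()[0] if message else ""
    return HEADER_RE.match(first_line) is not None


print(is_valid_header("feat(agent): add ReAct pattern support"))  # True
print(is_valid_header("Add feature: description"))                # False
```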

19. License

NeuralNode is released under the MIT License.

MIT License

Copyright (c) 2024 NeuralNode Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

20. Credits & Author

20.1 Author

Assem Sabry

Creator and Lead Developer
GitHub: @assemsabry

20.2 Contributors

Thank you to all contributors who have helped make NeuralNode better:

[List of contributors will be maintained here]

20.3 Acknowledgments

Special thanks to:

OpenAI, Anthropic, Google - For their groundbreaking LLM research
Meta AI - For open-sourcing Llama models
Ollama team - For making local LLMs accessible
HuggingFace - For the transformers ecosystem
LangChain - For inspiring agent architectures
FAISS team - For efficient vector search
The Python community - For excellent libraries and tools

20.4 References

Papers:

ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)
RLHF: Training language models to follow instructions with human feedback (Ouyang et al., 2022)

Libraries:

llama.cpp (ggerganov)
sentence-transformers (UKPLab)
chromadb (Chroma)
faiss (Facebook AI)

Appendix A: Missing Integrations

The following integrations are planned for future releases:

Communication:

Slack (in progress)
Discord (in progress)
WhatsApp Business API
Microsoft Teams
Twilio (SMS/Voice)

Storage:

AWS S3
Google Cloud Storage
Azure Blob Storage
MongoDB
PostgreSQL
Redis

Productivity:

Notion API
Confluence
Google Workspace
Microsoft 365

Project Management:

Jira
Trello
Asana
Monday.com

DevOps:

GitHub Actions
GitLab CI
Docker
Kubernetes

Monitoring:

Prometheus
Grafana
Datadog
New Relic
Sentry

Appendix B: Changelog

See CHANGELOG.md for detailed version history.

Documentation Version: 1.0.0

Last Updated: 2024

For support: Open an issue on GitHub or contact the maintainers.
