
Nimble LLM Caller

A robust, multi-model LLM calling package with intelligent context management, file processing, and advanced prompt handling capabilities.

🚀 Key Features

Core Capabilities

  • Multi-Model Support: Call multiple LLM providers (OpenAI, Anthropic, Google, etc.) through LiteLLM
  • Intelligent Context Management: Automatic context-size-aware request handling with model upshifting
  • File Processing: Support for 29+ file types (PDF, Word, images, JSON, CSV, XML, YAML, etc.)
  • Batch Processing: Submit multiple prompts to multiple models efficiently
  • Robust JSON Parsing: Multiple fallback strategies for parsing LLM responses
  • Retry Logic: Exponential backoff with jitter for handling rate limits and transient errors
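The retry behavior described above, exponential backoff with jitter, can be pictured with a short sketch (illustrative only; the function names here are not the package's API):

```python
import random
import time

def backoff_delays(base=1.0, cap=60.0, retries=5):
    """Yield capped exponential delays with full jitter (random in [0, delay])."""
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        yield random.uniform(0, delay)

def call_with_retry(fn, retries=5, base=1.0, cap=60.0):
    """Call fn, sleeping a jittered exponential delay after each failure."""
    last_exc = None
    for delay in backoff_delays(base=base, cap=cap, retries=retries):
        try:
            return fn()
        except Exception as exc:  # a real client would catch rate-limit errors only
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```

Full jitter spreads concurrent retries apart, which helps when many requests hit the same provider rate limit at once.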

Advanced Features

  • Context-Size-Aware Safe Submit: Automatic overflow handling with model upshifting and content chunking
  • File Attachment Support: Process and include files directly in LLM requests
  • Comprehensive Interaction Logging: Detailed request/response tracking with metadata
  • Prompt Management: JSON-based prompt templates with variable substitution
  • Document Assembly: Built-in formatters for text, markdown, and LaTeX output
  • Graceful Degradation: Fallback strategies for reliability
  • Full Backward Compatibility: Existing code continues to work unchanged
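Content chunking with overlap, mentioned above, works roughly like this character-based sketch (the package chunks by tokens; characters are used here only to keep the example self-contained):

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size units, each sharing
    `overlap` units with its predecessor so context carries across chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap lets each chunk begin with the tail of the previous one, so a summary or extraction prompt does not lose sentences that straddle a chunk boundary.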

📦 Installation

Basic Installation

pip install nimble-llm-caller

Enhanced Installation (Recommended)

# Install with enhanced file processing capabilities
pip install nimble-llm-caller[enhanced]

All Features Installation

# Install with all optional dependencies
pip install nimble-llm-caller[all]

Development Installation

# Clone the repository
git clone https://github.com/fredzannarbor/nimble-llm-caller.git
cd nimble-llm-caller

# Install in development mode with all features
pip install -e .[dev,enhanced]

# Run setup script
python setup_dev.py setup

Installation Options Summary

| Installation | Command                                 | Features                                                     |
|--------------|-----------------------------------------|--------------------------------------------------------------|
| Basic        | pip install nimble-llm-caller           | Core LLM calling, basic context management                   |
| Enhanced     | pip install nimble-llm-caller[enhanced] | + File processing (PDF, Word, images), advanced tokenization |
| All          | pip install nimble-llm-caller[all]      | + All optional features and dependencies                     |
| Development  | pip install -e .[dev,enhanced]          | + Testing, linting, documentation tools                      |

⚙️ Configuration

1. API Keys Setup

Set your API keys in environment variables:

# Required: At least one LLM provider
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"

# Optional: For enhanced features
export LITELLM_LOG="INFO"  # Enable LiteLLM logging

2. Environment File (.env)

Create a .env file in your project root:

# LLM Provider API Keys
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
GOOGLE_API_KEY=your-google-key

# Optional Configuration
LITELLM_LOG=INFO
NIMBLE_LOG_LEVEL=INFO
NIMBLE_DEFAULT_MODEL=gpt-4o
NIMBLE_MAX_RETRIES=3
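Whether the package reads the .env file automatically is not guaranteed; if it does not, the python-dotenv package, or a minimal stdlib loader like this sketch, can populate the environment before the caller is created:

```python
import os

def load_env_file(path=".env"):
    """Minimal .env reader: KEY=VALUE lines, '#' comments; variables
    already present in the environment win over values from the file."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Calling load_env_file() at startup makes the keys above visible to LiteLLM exactly as if they had been exported in the shell.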

3. Configuration File

Create a configuration file for advanced settings:

# config.py
from nimble_llm_caller.models.context_config import ContextConfig, ContextStrategy

# Custom context configuration
context_config = ContextConfig(
    default_strategy=ContextStrategy.UPSHIFT,
    enable_chunking=True,
    chunk_overlap_tokens=100,
    max_cost_multiplier=3.0,
    enable_model_fallback=True
)

🚀 Quick Start

Basic Usage (Backward Compatible)

from nimble_llm_caller import LLMCaller, LLMRequest

# Traditional usage - still works!
caller = LLMCaller()
request = LLMRequest(
    prompt_key="summarize_text",
    model="gpt-4",
    substitutions={"text": "Your text here"}
)
response = caller.call(request)
print(f"Result: {response.content}")

Enhanced Usage with Intelligent Context Management

from nimble_llm_caller import EnhancedLLMCaller, LLMRequest, FileAttachment

# Enhanced caller with all intelligent features
caller = EnhancedLLMCaller(
    enable_context_management=True,
    enable_file_processing=True,
    enable_interaction_logging=True
)

# Request with file attachments and automatic context management
request = LLMRequest(
    prompt_key="analyze_document",
    model="gpt-4",
    file_attachments=[
        FileAttachment(file_path="document.pdf", content_type="application/pdf"),
        FileAttachment(file_path="data.csv", content_type="text/csv")
    ],
    substitutions={"analysis_type": "comprehensive"}
)

# Automatic context management, file processing, and logging
response = caller.call(request)
print(f"Analysis: {response.content}")
print(f"Files processed: {response.files_processed}")
print(f"Model used: {response.model} (original: {response.original_model})")

Content Generation with File Processing

from nimble_llm_caller import LLMContentGenerator

# Initialize with prompts and enhanced features
generator = LLMContentGenerator(
    prompt_file_path="prompts.json",
    enable_context_management=True,
    enable_file_processing=True
)

# Process multiple files with intelligent context handling
results = generator.call_batch(
    prompt_keys=["summarize_document", "extract_key_points"],
    models=["gpt-4o", "claude-3-sonnet"],
    shared_substitutions={
        "files": ["report.pdf", "data.xlsx", "presentation.pptx"]
    }
)

print(f"Success rate: {results.success_rate:.1f}%")
print(f"Total files processed: {sum(r.files_processed for r in results.responses)}")

📋 Usage Examples

1. Context-Size-Aware Processing

from nimble_llm_caller import EnhancedLLMCaller, LLMRequest

caller = EnhancedLLMCaller(enable_context_management=True)

# Large content that might exceed context limits
large_content = "..." * 50000  # Very large text

request = LLMRequest(
    prompt_key="analyze_content",
    model="gpt-5-mini",  # Will automatically upshift if needed
    substitutions={"content": large_content}
)

# Automatic handling: upshift to a larger-context model or chunk the content
response = caller.call(request)

if response.upshift_reason:
    print(f"Upshifted from {response.original_model} to {response.model}")
    print(f"Reason: {response.upshift_reason}")

if response.was_chunked:
    print(f"Content was chunked: {response.chunk_info}")

2. File Processing with Multiple Formats

from nimble_llm_caller import EnhancedLLMCaller, LLMRequest, FileAttachment

caller = EnhancedLLMCaller(
    enable_file_processing=True,
    enable_context_management=True
)

# Process multiple file types
files = [
    FileAttachment("report.pdf", content_type="application/pdf"),
    FileAttachment("data.xlsx", content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
    FileAttachment("image.png", content_type="image/png"),
    FileAttachment("config.yaml", content_type="application/x-yaml")
]

request = LLMRequest(
    prompt_key="comprehensive_analysis",
    model="gpt-4o",  # Vision-capable model for images
    file_attachments=files
)

response = caller.call(request)
print(f"Processed {response.files_processed} files")
print(f"Analysis: {response.content}")

3. Interaction Logging and Monitoring

from nimble_llm_caller import EnhancedLLMCaller

# Enable comprehensive logging
caller = EnhancedLLMCaller(
    enable_interaction_logging=True,
    log_file_path="llm_interactions.log",
    log_content=True,
    log_metadata=True
)

# Make requests (request built as in the earlier examples) - all interactions are logged
response = caller.call(request)

# Access recent interactions
recent = caller.interaction_logger.get_recent_interactions(count=5)
for interaction in recent:
    print(f"Request: {interaction.prompt_key} -> {interaction.model}")
    print(f"Duration: {interaction.duration_ms}ms")
    print(f"Tokens: {interaction.token_usage}")

# Get statistics
stats = caller.interaction_logger.get_statistics()
print(f"Total requests: {stats['total_requests']}")
print(f"Success rate: {stats['success_rate']:.1f}%")
print(f"Average duration: {stats['avg_duration_ms']:.1f}ms")

4. Custom Context Strategies

from nimble_llm_caller import EnhancedLLMCaller, ContextConfig, ContextStrategy

# Custom context configuration
config = ContextConfig(
    default_strategy=ContextStrategy.CHUNK,  # Prefer chunking over upshifting
    enable_chunking=True,
    chunk_overlap_tokens=200,
    max_cost_multiplier=2.0,  # Limit cost increases
    enable_model_fallback=True
)

caller = EnhancedLLMCaller(
    enable_context_management=True,
    context_config=config
)

# Requests will use the chunking strategy when context limits are exceeded
response = caller.call(large_request)  # a request carrying very large content

5. Batch Processing with Context Management

from nimble_llm_caller import LLMContentGenerator

generator = LLMContentGenerator(
    prompt_file_path="prompts.json",
    enable_context_management=True,
    enable_file_processing=True
)

# Batch process with automatic context handling
results = generator.call_batch(
    prompt_keys=["analyze_document", "extract_insights", "generate_summary"],
    models=["gpt-4o", "claude-3-sonnet", "gemini-1.5-pro"],
    shared_substitutions={
        "documents": ["doc1.pdf", "doc2.docx", "doc3.txt"]
    },
    parallel=True,
    max_concurrent=3
)

# Results include context management information
for response in results.responses:
    print(f"Prompt: {response.prompt_key}")
    print(f"Model: {response.model} (original: {response.original_model})")
    print(f"Strategy: {response.context_strategy_used}")
    print(f"Files: {response.files_processed}")
    print("---")

📝 Prompt Format

Basic Prompt Structure

{
  "prompt_keys": ["summarize_text", "analyze_document"],
  "summarize_text": {
    "messages": [
      {
        "role": "system",
        "content": "You are a professional summarizer."
      },
      {
        "role": "user", 
        "content": "Summarize this text: {text}"
      }
    ],
    "params": {
      "temperature": 0.3,
      "max_tokens": 1000
    }
  }
}
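Substitution fills the {placeholders} in each message's content from the request's substitutions dict. Conceptually it amounts to the following sketch (the package's internal helper names may differ):

```python
def render_messages(prompt, substitutions):
    """Return the prompt's messages with {placeholders} filled in."""
    return [
        {"role": msg["role"], "content": msg["content"].format(**substitutions)}
        for msg in prompt["messages"]
    ]
```

With the summarize_text prompt above, passing substitutions={"text": "Your text here"} yields the final user message sent to the model.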

Enhanced Prompt with File Processing

{
  "analyze_document": {
    "messages": [
      {
        "role": "system",
        "content": "You are a document analyst. Analyze the provided files and give insights."
      },
      {
        "role": "user",
        "content": "Please analyze the attached files and provide {analysis_type} analysis. Focus on: {focus_areas}"
      }
    ],
    "params": {
      "temperature": 0.2,
      "max_tokens": 2000
    },
    "supports_files": true,
    "supports_vision": true
  }
}

🔧 Advanced Configuration

Context Management Settings

from nimble_llm_caller.models.context_config import ContextConfig, ContextStrategy

# Fine-tune context management
config = ContextConfig(
    # Strategy when context limit is exceeded
    default_strategy=ContextStrategy.UPSHIFT,  # or CHUNK, TRUNCATE, ERROR
    
    # Chunking settings
    enable_chunking=True,
    chunk_overlap_tokens=100,
    max_chunks=10,
    
    # Model upshifting settings
    enable_model_upshifting=True,
    max_cost_multiplier=3.0,
    enable_model_fallback=True,
    
    # Safety margins
    context_buffer_tokens=500,
    enable_token_estimation=True
)
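The buffer and estimation settings feed a decision of roughly this shape (an illustrative sketch under assumed semantics, not the package's actual API):

```python
def choose_strategy(estimated_tokens, context_limit,
                    buffer_tokens=500, allow_upshift=True):
    """Decide how to handle a request relative to the model's context limit."""
    if estimated_tokens + buffer_tokens <= context_limit:
        return "send"     # fits, with a safety margin
    if allow_upshift:
        return "upshift"  # retry on a larger-context model
    return "chunk"        # otherwise split the content
```

The context_buffer_tokens margin leaves room for the model's response and for estimation error, which is why a request can be upshifted even when its raw token count is still just under the limit.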

File Processing Configuration

from nimble_llm_caller.core.file_processor import FileProcessor

# Custom file processor
processor = FileProcessor(
    max_file_size_mb=50,
    supported_formats=[
        "pdf", "docx", "txt", "md", "json", "csv", 
        "xlsx", "png", "jpg", "yaml", "xml"
    ],
    extract_metadata=True,
    preserve_formatting=True
)

Logging Configuration

from nimble_llm_caller.core.interaction_logger import InteractionLogger

# Custom interaction logger
logger = InteractionLogger(
    log_file_path="interactions.jsonl",
    log_content=True,
    log_metadata=True,
    async_logging=True,
    max_log_size_mb=100,
    max_files=10
)

🔍 Monitoring and Debugging

Access Interaction Logs

# Get recent interactions
recent = caller.interaction_logger.get_recent_interactions(count=10)

# Filter by model
gpt4_interactions = caller.interaction_logger.get_interactions_by_model("gpt-4o")

# Filter by time range
from datetime import datetime, timedelta
since = datetime.now() - timedelta(hours=1)
recent_hour = caller.interaction_logger.get_interactions_since(since)

Performance Statistics

stats = caller.interaction_logger.get_statistics()
print(f"""
Performance Statistics:
- Total Requests: {stats['total_requests']}
- Success Rate: {stats['success_rate']:.1f}%
- Average Duration: {stats['avg_duration_ms']:.1f}ms
- Total Tokens: {stats['total_tokens']}
- Average Cost: ${stats['avg_cost']:.4f}
""")

Error Analysis

# Get failed requests
failed = caller.interaction_logger.get_failed_interactions()
for failure in failed:
    print(f"Failed: {failure.prompt_key} -> {failure.error}")
    print(f"Model: {failure.model}, Duration: {failure.duration_ms}ms")

🔄 Migration Guide

From v0.1.x to v0.2.x

Your existing code continues to work unchanged! New features are opt-in:

# Old code (still works)
from nimble_llm_caller import LLMCaller, LLMRequest
caller = LLMCaller()
response = caller.call(request)

# New enhanced features (optional)
from nimble_llm_caller import EnhancedLLMCaller
caller = EnhancedLLMCaller(
    enable_context_management=True,
    enable_file_processing=True
)

See MIGRATION.md for detailed migration instructions.

📚 Documentation

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

🏷️ Version

Current version: 0.2.2 - Intelligent Context Management Release

Recent Updates

  • 📖 v0.2.2: Improved README
  • 🐛 v0.2.1: Bug fixes for InteractionLogger
  • 🚀 v0.2.0: Intelligent context management, file processing, enhanced logging
  • 📦 v0.1.0: Initial release with basic LLM calling capabilities
