
Python client and tools for KoboldCPP API


Basic functions

KoboldCpp API Interface

Provides easy access to the core KoboldCpp API endpoints, including streaming generation, image input, and sampler settings.

Instruct Template Wrapping

Finds the appropriate instruct template for the running model and wraps it around content to create a prompt.

Chunking

Reads most document types and splits them into chunks of any size up to the maximum context length, stopping at natural break points. Returns the chunks as a list.
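To illustrate the idea of stopping at natural break points, here is a minimal sketch (not the library's internal algorithm) that greedily packs paragraphs into chunks under a size budget, so splits always land on paragraph boundaries:

```python
# Illustrative sketch of boundary-aware chunking: paragraphs are packed
# into chunks that stay under a character budget. A real implementation
# would budget by tokens against the model's context length.
def chunk_text(text, max_chars=2000):
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para  # paragraph starts a fresh chunk
    if current:
        chunks.append(current)
    return chunks

doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
parts = chunk_text(doc, max_chars=40)
```

With a 40-character budget the first two paragraphs share a chunk and the third starts a new one.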

Guide to Using the KoboldCPP API with Python

Introduction

KoboldCPP is a powerful and portable solution for running Large Language Models (LLMs). Its standout features include:

  • Zero-installation deployment with a single executable
  • Support for any GGUF model compatible with llama.cpp
  • Cross-platform support (Linux, Windows, macOS)
  • Hardware acceleration via CUDA and Vulkan
  • Built-in GUI with extensive features
  • Multimodal capabilities (image generation, speech, etc.)
  • API compatibility with OpenAI and Ollama
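Because of the OpenAI-compatible API, any OpenAI-style client can talk to a running KoboldCpp instance. A sketch of the request body for the chat completions endpoint (assuming the default port 5001; the model name is a placeholder, since the server uses whatever GGUF model it has loaded):

```python
import json

# Build an OpenAI-style chat request body for KoboldCpp's compatible
# endpoint (by default http://localhost:5001/v1/chat/completions).
payload = {
    "model": "koboldcpp",  # placeholder; the loaded model is used
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about programming."},
    ],
    "max_tokens": 50,
    "temperature": 0.7,
}
body = json.dumps(payload)
# Send with e.g.:
# requests.post("http://localhost:5001/v1/chat/completions",
#               data=body, headers={"Content-Type": "application/json"})
```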

Quick Start

Basic Setup

  1. Download the KoboldCPP executable for your platform
  2. Place your GGUF model file in the same directory
  3. Install the Python client:
git clone https://github.com/jabberjabberjabber/koboldapi-python
cd koboldapi-python
pip install .

First Steps

Here's a minimal example to get started:

from koboldapi import KoboldAPI

# Initialize the client
api = KoboldAPI("http://localhost:5001")

# Basic text generation
response = api.generate(
    prompt="Write a haiku about programming:",
    max_length=50,
    temperature=0.7
)
print(response)

Core Concepts

Configuration Management

The KoboldAPIConfig class manages configuration settings for the API client. You can either create a config programmatically or load it from a JSON file:

from koboldapi import KoboldAPIConfig

# Create config programmatically
config = KoboldAPIConfig(
    api_url="http://localhost:5001",
    api_password="",
    templates_directory="./templates",
    translation_language="English",
    temp=0.7,
    top_k=40,
    top_p=0.9,
    rep_pen=1.1
)

# Or load from JSON file
config = KoboldAPIConfig.from_json("config.json")

# Save config to file
config.to_json("new_config.json")

Example config.json:

{
    "api_url": "http://localhost:5001",
    "api_password": "",
    "templates_directory": "./templates",
    "translation_language": "English",
    "temp": 0.7,
    "top_k": 40,
    "top_p": 0.9,
    "rep_pen": 1.1
}

Template Management

KoboldAPI supports various instruction formats through templates. The InstructTemplate class handles this automatically:

from koboldapi.templates import InstructTemplate

template = InstructTemplate("./templates", "http://localhost:5001")

# Wrap a prompt with the appropriate template
wrapped_prompt = template.wrap_prompt(
    instruction="Explain quantum computing",
    content="Focus on qubits and superposition",
    system_instruction="You are a quantum physics expert"
)

Example Applications

Text Processing

The library includes example scripts for various text processing tasks:

from koboldapi import KoboldAPICore
from koboldapi.chunking.processor import ChunkingProcessor

# Initialize core with config
config = {
    "api_url": "http://localhost:5001",
    "templates_directory": "./templates"
}
core = KoboldAPICore(config)

# Process a text file
processor = ChunkingProcessor(core.api_client, max_chunk_length=2048)
chunks, metadata = processor.chunk_file("document.txt")

# Generate summary for each chunk
for chunk, _ in chunks:
    summary = core.api_client.generate(
        prompt=core.template_wrapper.wrap_prompt(
            instruction="Summarize this text",
            content=chunk
        )[0],
        max_length=200
    )
    print(summary)

Image Processing

Process images:

from koboldapi import KoboldAPICore
from pathlib import Path
import base64

# Initialize core
config = {
    "api_url": "http://localhost:5001",
    "templates_directory": "./templates"
}
core = KoboldAPICore(config)

# Process image
image_path = Path("image.png")
with open(image_path, "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

result = core.api_client.generate(
    prompt=core.template_wrapper.wrap_prompt(
        instruction="Extract text from this image",
        system_instruction="You are an OCR system"
    )[0],
    images=[image_data],
    temperature=0.1
)
print(result)

Advanced Features

Custom Template Creation

Create custom instruction templates for different models:

{
    "name": ["vicuna-7b", "vicuna-13b"],
    "system_start": "### System:\n",
    "system_end": "\n\n",
    "user_start": "### Human: ",
    "user_end": "\n\n",
    "assistant_start": "### Assistant: ",
    "assistant_end": "\n\n"
}
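A template like the one above is just a set of delimiters wrapped around each turn. The following hypothetical helper (not the library's `InstructTemplate` internals) shows how those fields combine into a prompt, leaving the assistant turn open for the model to complete:

```python
def apply_template(tpl, user_text, system_text=None):
    # Concatenate the template's delimiters around the system and user
    # turns; the open assistant_start cues the model to respond.
    parts = []
    if system_text:
        parts.append(tpl["system_start"] + system_text + tpl["system_end"])
    parts.append(tpl["user_start"] + user_text + tpl["user_end"])
    parts.append(tpl["assistant_start"])
    return "".join(parts)

vicuna = {
    "system_start": "### System:\n", "system_end": "\n\n",
    "user_start": "### Human: ", "user_end": "\n\n",
    "assistant_start": "### Assistant: ", "assistant_end": "\n\n",
}
prompt = apply_template(vicuna, "Explain qubits", "You are a physicist")
```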

Generation Parameters

Fine-tune generation settings:

response = api.generate(
    prompt="Write a story:",
    max_length=500,
    temperature=0.8,      # Higher = more creative
    top_p=0.9,           # Nucleus sampling threshold
    top_k=40,            # Top-k sampling threshold
    rep_pen=1.1,         # Repetition penalty
    rep_pen_range=256,   # How far back to apply rep penalty
    min_p=0.05          # Minimum probability threshold
)
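As a conceptual aside, min-p filtering keeps only tokens whose probability is at least `min_p` times the top token's probability. A toy sketch of that rule (not KoboldCpp's internal sampler code):

```python
# Conceptual min-p filter: drop tokens whose probability falls below
# min_p times the most likely token's probability.
def min_p_filter(probs, min_p):
    threshold = min_p * max(probs)
    return [p for p in probs if p >= threshold]

# With min_p=0.05 and a top probability of 0.5, the cutoff is 0.025,
# so the 0.02 tail token is removed.
kept = min_p_filter([0.5, 0.3, 0.1, 0.02], min_p=0.05)
```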

Error Handling

Implement robust error handling:

from koboldapi import KoboldAPIError

try:
    response = api.generate(prompt="Test prompt")
except KoboldAPIError as e:
    print(f"API Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
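For transient failures (timeouts, a busy server) a retry with exponential backoff often suffices. A hypothetical helper, not part of koboldapi, shown here with a stub in place of a real API call:

```python
import time

# Retry a flaky callable with exponential backoff, re-raising the
# exception after the final attempt.
def with_retries(fn, attempts=3, base_delay=0.01, retry_on=(Exception,)):
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Stub standing in for api.generate(...): fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retries(flaky)
```

In real use you would pass `lambda: api.generate(prompt=...)` and restrict `retry_on` to `KoboldAPIError`.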

Performance Optimization

Context Management

Optimize token usage:

# Get max context length
max_length = api.get_max_context_length()

prompt = "Summarize the following report:"
desired_length = 300

# Count tokens in prompt
token_count = api.count_tokens(prompt)["count"]

# Ensure we stay within limits
available_tokens = max_length - token_count
response_length = min(desired_length, available_tokens)
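The budgeting logic above can be tested offline by substituting a naive whitespace "tokenizer" for `api.count_tokens` (an assumption for illustration; real counts come from the server's tokenizer):

```python
# Offline sketch of response-length budgeting: the response gets
# whatever is left of the context window after the prompt, capped at
# the desired length and floored at zero.
def plan_response_length(prompt, max_context, desired_length):
    prompt_tokens = len(prompt.split())  # stand-in for count_tokens
    available = max_context - prompt_tokens
    return max(0, min(desired_length, available))

length = plan_response_length("Summarize this text for me", 2048, 500)
```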

Batch Processing

Handle multiple inputs efficiently:

async def process_batch(prompts):
    results = []
    for prompt in prompts:
        # Collect the streamed tokens for each prompt separately,
        # then join them into one response string.
        tokens = []
        async for token in api.stream_generate(prompt):
            tokens.append(token)
        results.append("".join(tokens))
    return results

Troubleshooting

Common Issues

  1. Connection Errors
# Test connection
if not api.validate_connection():
    print("Cannot connect to API")
  2. Template Errors
# Check if template exists
if not template.get_template():
    print("No matching template found for model")
  3. Generation Errors
# Monitor generation status
status = api.check_generation()
if status is None:
    print("Generation failed or was interrupted")

Contributing

Contributions to improve these tools are welcome. Please submit issues and pull requests on GitHub.

Development Setup

  1. Clone the repository
  2. Install development dependencies:
pip install -e ".[dev]"
  3. Run tests:
pytest tests/

License

This project is licensed under the GPLv3 license.
