
Structured prompt orchestration with cache, safety, and analyzer layers

Prompt Orchestrator

Python module for structured prompt orchestration with:

  • static/semi-stable/dynamic prompt layout
  • configurable summary LLM with provider selection
  • TTL cache backends
  • optional RAG providers
  • safety checks (config-driven grouped threats, weighted groups, bilingual patterns, contradiction pairs)
  • prompt efficiency analyzer
  • token counting with tiktoken
  • centralized mutable config (Pydantic)
  • one-call orchestrator bootstrap from config store

Install

pip install -e .

For development and tests:

pip install -e .[dev]

Install with optional OpenTelemetry support:

pip install -e .[otel]

Optional OpenTelemetry + SigNoz

OpenTelemetry is optional. If it is not installed or not enabled, PromptOrchestrator runs normally without exporting telemetry.

SigNoz is expected to run separately (for example, the official SigNoz Docker deployment on http://localhost:8080).

Enable OTel (host runtime):

ENABLE_OTEL=true
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317
OTEL_SERVICE_NAME=prompt-orchestrator
OTEL_SERVICE_NAMESPACE=prompt-stack
OTEL_DEPLOYMENT_ENVIRONMENT=dev

Required/optional flags summary:

  • Start telemetry export: set ENABLE_OTEL=true (required to enable)
  • Stop telemetry export: set ENABLE_OTEL=false (required to disable)
  • OTLP destination (optional, used when enabled): OTEL_EXPORTER_OTLP_ENDPOINT
  • Resource labels (optional): OTEL_SERVICE_NAME, OTEL_SERVICE_NAMESPACE, OTEL_DEPLOYMENT_ENVIRONMENT, OTEL_SERVICE_VERSION
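
The flags above can be read with a few lines of standard-library code. The following is a minimal sketch, not the module's actual implementation: the helper names `otel_enabled` and `otel_resource_labels` are hypothetical, and the set of accepted truthy strings is an assumption.

```python
import os
from typing import Mapping, Optional

# Assumption: these strings (case-insensitive) count as "enabled".
TRUTHY = {"true", "1", "yes", "on"}

# Optional resource label variables listed in this README.
LABEL_VARS = (
    "OTEL_SERVICE_NAME",
    "OTEL_SERVICE_NAMESPACE",
    "OTEL_DEPLOYMENT_ENVIRONMENT",
    "OTEL_SERVICE_VERSION",
)


def otel_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when ENABLE_OTEL is set to a truthy string."""
    source = os.environ if env is None else env
    return source.get("ENABLE_OTEL", "false").strip().lower() in TRUTHY


def otel_resource_labels(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect only the optional resource labels that are actually set."""
    source = os.environ if env is None else env
    return {name: source[name] for name in LABEL_VARS if name in source}
```

Passing a plain dict instead of `os.environ` makes the helpers easy to test without touching the real environment.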

Run a local OTel Collector (one additional container):

docker compose -f docker-compose.otel.yml up -d

Disable OTel (host runtime):

ENABLE_OTEL=false

Stop local OTel Collector:

docker compose -f docker-compose.otel.yml down

Files used:

  • docker-compose.otel.yml
  • observability/otel-collector-config.yaml

Default endpoints:

  • SigNoz UI (external): http://localhost:8080
  • OTLP gRPC ingest (local collector): http://localhost:4317
  • OTLP HTTP ingest (local collector): http://localhost:4318

Exposed telemetry (when enabled):

| Telemetry signal name | Description |
| --- | --- |
| prompt_orchestrator.build_for_request | Trace span for one prompt build request. Includes attribute session.id. |
| prompt_build_requests_total | Counter of prompt build attempts. Attributes include operation=build_for_request and status (ok/error). |
| prompt_errors_total | Counter of errors by operation and error type. Attributes include operation and error.type. |
| prompt_build_latency_ms | Histogram of prompt build latency in milliseconds. |
| prompt_total_tokens | Histogram of total token count in the built prompt payload. |
| prompt_total_chars | Histogram of total character count in the built prompt payload. |
| prompt_rag_chunks_count | Histogram of retrieved RAG chunks used in the prompt. |
| prompt_warnings_count | Histogram of analyzer warnings count per build. |
| prompt_safety_events_total | Counter of safety events. Attributes include severity and status. |
| prompt_summary_calls_total | Counter of summary calls. Attributes include operation=summary, provider, and status. |
| prompt_summary_latency_ms | Histogram of summary call latency in milliseconds. |
| prompt.error operation={operation} error_type={error_type} | OTLP log message emitted on errors (for example in build_for_request or summary). |

Dashboard template blueprint:

  • observability/signoz-dashboard-prompt-orchestrator.yaml

Use it as a panel/query blueprint in SigNoz to create a dashboard for prompt build latency, token pressure, RAG payload size, safety events, summary latency, logs, and traces.

Configuration Models

  • PromptConfig: static prompt structure
  • OrchestratorSettings: runtime limits and behavior
  • SummaryLLMConfig: summary provider and model settings
  • ModuleConfig: full module config in one object
  • ConfigStore: mutable config holder (get, set_config, as_dict)
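
To illustrate what a mutable config holder with get, set_config, and as_dict might look like, here is a minimal sketch built on plain dictionaries. The class name `DictConfigStore` is hypothetical, and the dotted-path lookup is an assumption modeled on the `store.get("summary_llm.model")` call shown later in this README. The real ConfigStore is Pydantic-based and may behave differently.

```python
import copy
from typing import Any


class DictConfigStore:
    """Hypothetical dict-backed stand-in for a mutable config holder."""

    def __init__(self, config: dict) -> None:
        self._config = config

    def get(self, path: str, default: Any = None) -> Any:
        """Walk a dotted path such as 'summary_llm.model' through nested dicts."""
        node: Any = self._config
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node

    def set_config(self, config: dict) -> None:
        """Replace the whole configuration at once."""
        self._config = config

    def as_dict(self) -> dict:
        """Return a deep copy so callers cannot mutate the stored config."""
        return copy.deepcopy(self._config)
```

Returning a copy from `as_dict` is a design choice in this sketch: all mutation goes through `set_config`, which keeps changes explicit.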

Safety Engine

The safety layer is configured from prompt_orchestrator/safety/threats.json. The catalog is grouped by threat family, and each family has its own weight. The final severity is derived from the maximum weighted score among all matched threats.

What changed:

  • threat families are defined in threats.json and loaded at runtime
  • regular lexical rules live under patterns
  • contradiction rules live under contradictions and are matched as pairs
  • each family can include English and Russian analogs for the same threat family
  • duplicate patterns were removed from the catalog
  • each matched rule keeps its threat code in the report

SafetyReport now includes:

  • issues: flat list of matched safety issues
  • threat_groups: grouped report by threat family
  • severity: overall severity (none, low, medium, high)
  • threat_score: weighted maximum score used for the final severity
  • sanitized_prompt: optional rewritten prompt when auto rewrite is enabled

Each grouped report includes the threat family name, the number of matches, the matched codes, and the family weight. Use result.safety.grouped_summary or result.safety.model_dump() to inspect the grouped output.
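
The weighted-maximum idea can be sketched as follows. Everything in this example is an assumption made for illustration: the catalog layout, family names, rule codes, per-rule scores, and severity thresholds are hypothetical and are not taken from the real threats.json. Contradiction pairs are left out to keep the sketch short.

```python
import re

# Hypothetical catalog shaped like a grouped threats file.
# Family names, codes, scores and weights are illustrative only.
CATALOG = {
    "prompt_injection": {
        "weight": 1.0,
        "patterns": [
            {"code": "PI-001", "regex": r"ignore (all )?previous instructions", "score": 0.9},
        ],
    },
    "prompt_leak": {
        "weight": 0.6,
        "patterns": [
            {"code": "PL-001", "regex": r"reveal (the )?system prompt", "score": 0.8},
        ],
    },
}


def severity_for(score: float) -> str:
    """Map a weighted score to a severity label (thresholds are assumptions)."""
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "medium"
    if score > 0:
        return "low"
    return "none"


def scan(text: str) -> dict:
    """Match every rule, group hits by family, and keep the maximum weighted score."""
    issues = []
    groups = {}
    for family, spec in CATALOG.items():
        for rule in spec["patterns"]:
            if re.search(rule["regex"], text, flags=re.IGNORECASE):
                weighted = rule["score"] * spec["weight"]
                issues.append({"family": family, "code": rule["code"], "score": weighted})
                group = groups.setdefault(
                    family, {"matches": 0, "codes": [], "weight": spec["weight"]}
                )
                group["matches"] += 1
                group["codes"].append(rule["code"])
    threat_score = max((issue["score"] for issue in issues), default=0.0)
    return {
        "issues": issues,
        "threat_groups": groups,
        "threat_score": threat_score,
        "severity": severity_for(threat_score),
    }
```

Using the maximum rather than a sum means many low-weight matches cannot add up to a high severity on their own. A single strong match decides the outcome.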

OrchestratorSettings.debug_mode

By default, section headers (=== STATIC PART (CACHE-FRIENDLY) ===, etc.) are excluded from the final prompt sent to LLMs to save tokens.

Enable debug_mode=True to include section headers for:

  • Debugging and development
  • Understanding prompt structure during testing
  • Console/log output inspection

from prompt_orchestrator import OrchestratorSettings

settings = OrchestratorSettings(
    debug_mode=True,  # Enables section headers in output
)

In simulations, pass the --debug flag:

python simulations/console_pipeline_test.py  # Prompts for debug mode
python simulations/conversation_simulation_test.py --debug  # Enable debug headers
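
The effect of debug_mode on the assembled prompt can be pictured with a small sketch. The function `render_prompt` is hypothetical. Only the first header text appears in this README; the other two header strings are assumptions chosen to match the static/semi-stable/dynamic layout.

```python
from typing import Sequence, Tuple

# The first header comes from this README; the other two are assumed names.
SECTIONS = [
    ("STATIC PART (CACHE-FRIENDLY)", "You are a helpful assistant."),
    ("SEMI-STABLE PART", "Summary: the user asked about caching."),
    ("DYNAMIC PART", "User: Explain how TTL helps prompt caching"),
]


def render_prompt(sections: Sequence[Tuple[str, str]], debug_mode: bool = False) -> str:
    """Join section bodies, adding '=== HEADER ===' lines only in debug mode."""
    parts = []
    for header, body in sections:
        if not body:
            continue  # skip empty sections entirely
        if debug_mode:
            parts.append(f"=== {header} ===")
        parts.append(body)
    return "\n\n".join(parts)
```

With `debug_mode=False` the output contains only the section bodies, which is what saves tokens in the prompt sent to the LLM.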

Supported Summary Providers

  • none: deterministic local fallback summarization
  • openai: OpenAI via openai SDK
  • ollama: local Ollama endpoint via /api/generate
  • custom: bring your own client implementing generate(prompt, model, max_tokens, temperature)
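
For the custom provider, a client only needs a generate method with the four parameters listed above. The sketch below is a deterministic stand-in that is useful in tests. The class name `TruncatingSummaryClient` is hypothetical, and the `str` return type is an assumption, since the README does not state what generate should return or how the client is registered.

```python
class TruncatingSummaryClient:
    """Hypothetical custom client: 'summarizes' by keeping the first words."""

    def generate(self, prompt: str, model: str, max_tokens: int, temperature: float) -> str:
        # model and temperature are accepted for interface compatibility
        # but ignored by this deterministic stand-in.
        words = prompt.split()
        # Approximation: treat one whitespace-separated word as one token.
        return " ".join(words[:max_tokens])
```

A real client would forward these arguments to its own LLM backend and return the generated text.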

Integration with RagflowOrchestrator

PromptOrchestrator can work directly with RagflowOrchestrator as a retrieval backend.

Why this pairing works well:

  • PromptOrchestrator controls prompt layout, context compaction, safety checks, and token budgets.
  • RagflowOrchestrator handles indexing, embedding, and retrieval from vector storage.
  • Both projects use a compatible DocChunk shape (id, content, score, metadata).
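
The shared chunk shape and the retrieve(query, limit) call used in this README can be sketched without either library. The names `Chunk`, `Retriever`, and `KeywordRetriever` are hypothetical stand-ins, and the keyword-overlap scoring is a toy substitute for real vector retrieval.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Chunk:
    """Stand-in with the same four fields as the shared DocChunk shape."""
    id: str
    content: str
    score: float
    metadata: Dict[str, str] = field(default_factory=dict)


class Retriever(Protocol):
    """Anything with this method can act as a retrieval backend in this sketch."""
    def retrieve(self, query: str, limit: int) -> List[Chunk]: ...


class KeywordRetriever:
    """Toy backend: scores documents by the fraction of query words they contain."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self._docs = docs

    def retrieve(self, query: str, limit: int) -> List[Chunk]:
        terms = set(query.lower().split())
        hits = []
        for doc_id, text in self._docs.items():
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                hits.append(Chunk(id=doc_id, content=text, score=overlap / len(terms)))
        hits.sort(key=lambda chunk: chunk.score, reverse=True)
        return hits[:limit]
```

Because both sides agree on the field names, a prompt builder can consume chunks from any backend that satisfies the protocol.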

Option 1: Use RagflowOrchestrator compatibility adapter (recommended)

RagflowOrchestrator includes PromptStyleRAGProviderAdapter, which exposes the exact interface PromptOrchestrator expects (retrieve(query, limit)).

from prompt_orchestrator import (
    LocalTTLCacheBackend,
    OrchestratorSettings,
    PromptConfig,
    PromptContextManager,
    PromptOrchestrator,
    SummaryLLM,
)

from ragflow_orchestrator import HashEmbedder, create_provider
from ragflow_orchestrator.rag import PromptStyleRAGProviderAdapter

# RagflowOrchestrator side: provider + embedder
provider = create_provider(kind="sqlite", db_path="rag.db", table="chunks")
embedder = HashEmbedder(dimensions=256)

# Adapter gives PromptOrchestrator-compatible retrieve(query, limit)
rag_provider = PromptStyleRAGProviderAdapter(provider=provider, embedder=embedder)

config = PromptConfig(
    system_prompt="You are a grounded assistant.",
    role="Engineer",
    task="Answer using retrieved context.",
    constraints=["Cite retrieved facts", "Avoid unsupported claims"],
    output_format="Markdown",
    examples=[],
)

settings = OrchestratorSettings(use_rag_default=True, rag_limit=4)
cache = LocalTTLCacheBackend(default_ttl_seconds=settings.cache_ttl_seconds)
context_manager = PromptContextManager(cache, settings, SummaryLLM())

orchestrator = PromptOrchestrator(
    config=config,
    context_manager=context_manager,
    rag_provider=rag_provider,
    settings=settings,
)

result = orchestrator.build_for_request(
    session_id="rag-integration-demo",
    user_message="How does deduplication work in our retrieval pipeline?",
    use_rag=True,
)

print(result.prompt)

Option 2: Wrap RAGOrchestrator.search(...) in a thin adapter

If you already use a full RAGOrchestrator pipeline (ingest + search), expose it as a RAGProvider for PromptOrchestrator:

from prompt_orchestrator.rag.base import RAGProvider
from prompt_orchestrator.context.state import DocChunk

from rag_orchestrator import RAGOrchestrator


class RagOrchestratorProvider(RAGProvider):
    def __init__(self, orchestrator: RAGOrchestrator) -> None:
        self._orchestrator = orchestrator

    def retrieve(self, query: str, limit: int) -> list[DocChunk]:
        rows = self._orchestrator.search(query_text=query, top_k=limit)
        return [
            DocChunk(
                id=row.chunk.id,
                content=row.chunk.text,
                score=row.score,
                metadata={str(k): str(v) for k, v in row.chunk.metadata.items()},
            )
            for row in rows
        ]

Use this adapter as rag_provider in PromptOrchestrator(...) and set use_rag=True when building requests.

Simulations Folder

Simulation assets are located in the simulations folder.

How to work with simulations:

# Interactive pipeline (manual typing)
python simulations/console_pipeline_test.py

# Scripted simulation from JSON turns
python simulations/conversation_simulation_test.py

# Include unsafe/injection scenarios
python simulations/conversation_simulation_test.py --include-safety

# Run without RAG and cap turns
python simulations/conversation_simulation_test.py --no-rag --max-turns 5

Example 1: Manual Wiring (Local, No RAG)

from prompt_orchestrator import (
    LocalTTLCacheBackend,
    NoRAGProvider,
    OrchestratorSettings,
    PromptConfig,
    PromptContextManager,
    PromptOrchestrator,
    SummaryLLM,
)

config = PromptConfig(
    system_prompt="You are a helpful assistant.",
    role="Senior Analyst",
    task="Answer user questions precisely.",
    constraints=["Do not hallucinate", "Use concise style"],
    output_format="Markdown",
    examples=["Q: 2+2? A: 4"],
)

settings = OrchestratorSettings(
    max_prompt_chars=12000,
    max_prompt_tokens=3000,
    recent_messages_limit=10,
    cache_ttl_seconds=900,
    rag_limit=3,
)

cache = LocalTTLCacheBackend(default_ttl_seconds=settings.cache_ttl_seconds)
summary_llm = SummaryLLM()
context_manager = PromptContextManager(cache, settings, summary_llm)

orchestrator = PromptOrchestrator(
    config=config,
    context_manager=context_manager,
    rag_provider=NoRAGProvider(),
    settings=settings,
)

result = orchestrator.build_for_request(
    session_id="demo-session",
    user_message="Explain how TTL helps prompt caching",
    use_rag=False,
)

print(result.prompt)
print(result.stats.model_dump())
print(result.safety.model_dump())

Example 2: Centralized Config + Factory (One-Call Bootstrap)

from prompt_orchestrator import (
    ConfigStore,
    ModuleConfig,
    OrchestratorSettings,
    PromptConfig,
    SummaryLLMConfig,
    PromptOrchestratorFactory,
)

full_config = ModuleConfig(
    prompt=PromptConfig(
        system_prompt="You are a helpful assistant.",
        role="Engineer",
        task="Answer clearly",
        constraints=["No hallucinations"],
        output_format="Markdown",
        examples=[],
    ),
    settings=OrchestratorSettings(max_prompt_tokens=3000),
    summary_llm=SummaryLLMConfig(provider="openai", model="gpt-4o-mini"),
)

store = ConfigStore(full_config)
model_name = store.get("summary_llm.model")  # dotted path into the nested config

orchestrator = PromptOrchestratorFactory.from_config_store(store)
result = orchestrator.build_for_request(
    session_id="factory-demo",
    user_message="What is TTL cache?",
    use_rag=False,
)

Example 3: OpenAI Summary Provider

from prompt_orchestrator import (
    ConfigStore,
    ModuleConfig,
    OpenAIConfig,
    OrchestratorSettings,
    PromptConfig,
    PromptOrchestratorFactory,
    SummaryLLMConfig,
)

cfg = ModuleConfig(
    prompt=PromptConfig(
        system_prompt="You are a concise assistant.",
        role="Tech Writer",
        task="Summarize conversation state and answer user request.",
        constraints=["No speculative claims"],
        output_format="Markdown",
        examples=[],
    ),
    settings=OrchestratorSettings(
        max_prompt_tokens=2500,
        token_model="gpt-4o-mini",
    ),
    summary_llm=SummaryLLMConfig(
        provider="openai",
        model="gpt-4o-mini",
        openai=OpenAIConfig(
            api_key="YOUR_OPENAI_API_KEY",
            base_url=None,
            organization=None,
        ),
    ),
)

store = ConfigStore(cfg)
orchestrator = PromptOrchestratorFactory.from_config_store(store)
response = orchestrator.build_for_request(
    session_id="openai-summary",
    user_message="Please summarize previous decisions and next actions",
    use_rag=False,
)
print(response.stats.total_tokens)

Token Counting (tiktoken)

  • Prompt length checks use tiktoken-based counting
  • Configure tokenizer via OrchestratorSettings.token_model and OrchestratorSettings.token_encoding
  • Limit fitting in PromptContextManager.ensure_fits_limit trims sections to satisfy both char and token budgets
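
Fitting a prompt to both budgets can be sketched as a loop that drops the oldest messages until neither limit is exceeded. This is an illustration under stated assumptions, not the logic of PromptContextManager.ensure_fits_limit: the function name `fit_to_limits` is hypothetical, the trimming order is assumed, and a whitespace word count stands in for a tiktoken-based counter.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Stand-in counter: one whitespace-separated word per token.

    A tiktoken encoding would replace this in a real setup.
    """
    return len(text.split())


def fit_to_limits(fixed: str, messages: List[str], max_chars: int, max_tokens: int) -> List[str]:
    """Drop the oldest messages until both the char and token budgets hold."""
    kept = list(messages)

    def over_budget() -> bool:
        text = "\n".join([fixed, *kept])
        return len(text) > max_chars or count_tokens(text) > max_tokens

    while kept and over_budget():
        kept.pop(0)  # the oldest message goes first
    return kept
```

The fixed part is never trimmed in this sketch, so a caller should still check the result when the fixed text alone exceeds a budget.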

Running Tests

pytest -q

