Structured prompt orchestration with cache, safety, and analyzer layers
Project description
prompt_orchestrator
Python module for structured prompt orchestration with:
- static/semi-stable/dynamic prompt layout
- configurable summary LLM with provider selection
- TTL cache backends
- optional RAG providers
- safety checks (injection + contradiction heuristics)
- prompt efficiency analyzer
- token counting with tiktoken
- centralized mutable config (Pydantic)
- one-call orchestrator bootstrap from config store
Install
pip install -e .
For development and tests:
pip install -e .[dev]
Configuration Models
PromptConfig: static prompt structureOrchestratorSettings: runtime limits and behaviorSummaryLLMConfig: summary provider and model settingsModuleConfig: full module config in one objectConfigStore: mutable config holder (get,set_config,as_dict)
OrchestratorSettings.debug_mode
By default, section headers (=== STATIC PART (CACHE-FRIENDLY) ===, etc.) are excluded from the final prompt sent to LLMs to save tokens.
Enable debug_mode=True to include section headers for:
- Debugging and development
- Understanding prompt structure during testing
- Console/log output inspection
settings = OrchestratorSettings(
debug_mode=True, # Enables section headers in output
)
In simulations, use --debug flag:
python simulations/console_pipeline_test.py # Prompts for debug mode
python simulations/conversation_simulation_test.py --debug # Enable debug headers
Supported Summary Providers
none: deterministic local fallback summarizationopenai: OpenAI viaopenaiSDKollama: local Ollama endpoint via/api/generatecustom: bring your own client implementinggenerate(prompt, model, max_tokens, temperature)
Integration with RagOrchestrator
PromptOrchestrator can work directly with RagOrchestrator as a retrieval backend.
Why this pairing works well:
- PromptOrchestrator controls prompt layout, context compaction, safety checks, and token budgets.
- RagOrchestrator handles indexing, embedding, and retrieval from vector storage.
- Both projects use a compatible
DocChunkshape (id,content,score,metadata).
Option 1: Use RagOrchestrator compatibility adapter (recommended)
RagOrchestrator includes PromptStyleRAGProviderAdapter, which exposes the exact interface PromptOrchestrator expects (retrieve(query, limit)).
from prompt_orchestrator import (
LocalTTLCacheBackend,
OrchestratorSettings,
PromptConfig,
PromptContextManager,
PromptOrchestrator,
SummaryLLM,
)
from rag_orchestrator import HashEmbedder, create_provider
from rag_orchestrator.rag import PromptStyleRAGProviderAdapter
# RagOrchestrator side: provider + embedder
provider = create_provider(kind="sqlite", db_path="rag.db", table="chunks")
embedder = HashEmbedder(dimensions=256)
# Adapter gives PromptOrchestrator-compatible retrieve(query, limit)
rag_provider = PromptStyleRAGProviderAdapter(provider=provider, embedder=embedder)
config = PromptConfig(
system_prompt="You are a grounded assistant.",
role="Engineer",
task="Answer using retrieved context.",
constraints=["Cite retrieved facts", "Avoid unsupported claims"],
output_format="Markdown",
examples=[],
)
settings = OrchestratorSettings(use_rag_default=True, rag_limit=4)
cache = LocalTTLCacheBackend(default_ttl_seconds=settings.cache_ttl_seconds)
context_manager = PromptContextManager(cache, settings, SummaryLLM())
orchestrator = PromptOrchestrator(
config=config,
context_manager=context_manager,
rag_provider=rag_provider,
settings=settings,
)
result = orchestrator.build_for_request(
session_id="rag-integration-demo",
user_message="How does deduplication work in our retrieval pipeline?",
use_rag=True,
)
print(result.prompt)
Option 2: Wrap RAGOrchestrator.search(...) in a thin adapter
If you already use a full RAGOrchestrator pipeline (ingest + search), expose it as a RAGProvider for PromptOrchestrator:
from prompt_orchestrator.rag.base import RAGProvider
from prompt_orchestrator.context.state import DocChunk
from rag_orchestrator import RAGOrchestrator
class RagOrchestratorProvider(RAGProvider):
def __init__(self, orchestrator: RAGOrchestrator) -> None:
self._orchestrator = orchestrator
def retrieve(self, query: str, limit: int) -> list[DocChunk]:
rows = self._orchestrator.search(query_text=query, top_k=limit)
return [
DocChunk(
id=row.chunk.id,
content=row.chunk.text,
score=row.score,
metadata={str(k): str(v) for k, v in row.chunk.metadata.items()},
)
for row in rows
]
Use this adapter as rag_provider in PromptOrchestrator(...) and set use_rag=True when building requests.
Simulations Folder
Simulation assets are located in simulations:
- simulations/console_pipeline_test.py: interactive console runner for manual checks
- simulations/conversation_simulation_test.py: scripted multi-turn simulation with prompt/STATS/SAFETY output
- simulations/test_turns.json: regular conversation turns for context-window and compaction checks
- simulations/safety_injection_turns.json: unsafe/injection turns for SAFETY trigger checks
- simulations/conversation_simulation_test.log: output log from last simulation run (overwritten on each run)
How to work with simulations:
# Interactive pipeline (manual typing)
python simulations/console_pipeline_test.py
# Scripted simulation from JSON turns
python simulations/conversation_simulation_test.py
# Include unsafe/injection scenarios
python simulations/conversation_simulation_test.py --include-safety
# Run without RAG and cap turns
python simulations/conversation_simulation_test.py --no-rag --max-turns 5
Example 1: Manual Wiring (Local, No RAG)
from prompt_orchestrator import (
LocalTTLCacheBackend,
NoRAGProvider,
OrchestratorSettings,
PromptConfig,
PromptContextManager,
PromptOrchestrator,
SummaryLLM,
)
config = PromptConfig(
system_prompt="You are a helpful assistant.",
role="Senior Analyst",
task="Answer user questions precisely.",
constraints=["Do not hallucinate", "Use concise style"],
output_format="Markdown",
examples=["Q: 2+2? A: 4"],
)
settings = OrchestratorSettings(
max_prompt_chars=12000,
max_prompt_tokens=3000,
recent_messages_limit=10,
cache_ttl_seconds=900,
rag_limit=3,
)
cache = LocalTTLCacheBackend(default_ttl_seconds=settings.cache_ttl_seconds)
summary_llm = SummaryLLM()
context_manager = PromptContextManager(cache, settings, summary_llm)
orchestrator = PromptOrchestrator(
config=config,
context_manager=context_manager,
rag_provider=NoRAGProvider(),
settings=settings,
)
result = orchestrator.build_for_request(
session_id="demo-session",
user_message="Explain how TTL helps prompt caching",
use_rag=False,
)
print(result.prompt)
print(result.stats.model_dump())
print(result.safety.model_dump())
Example 2: Centralized Config + Factory (One-Call Bootstrap)
from prompt_orchestrator import (
ConfigStore,
ModuleConfig,
OrchestratorSettings,
PromptConfig,
SummaryLLMConfig,
PromptOrchestratorFactory,
)
full_config = ModuleConfig(
prompt=PromptConfig(
system_prompt="You are a helpful assistant.",
role="Engineer",
task="Answer clearly",
constraints=["No hallucinations"],
output_format="Markdown",
examples=[],
),
settings=OrchestratorSettings(max_prompt_tokens=3000),
summary_llm=SummaryLLMConfig(provider="openai", model="gpt-4o-mini"),
)
store = ConfigStore(full_config)
model_name = store.get("summary_llm.model")
orchestrator = PromptOrchestratorFactory.from_config_store(store)
result = orchestrator.build_for_request(
session_id="factory-demo",
user_message="What is TTL cache?",
use_rag=False,
)
Example 3: OpenAI Summary Provider
from prompt_orchestrator import (
ConfigStore,
ModuleConfig,
OpenAIConfig,
OrchestratorSettings,
PromptConfig,
PromptOrchestratorFactory,
SummaryLLMConfig,
)
cfg = ModuleConfig(
prompt=PromptConfig(
system_prompt="You are a concise assistant.",
role="Tech Writer",
task="Summarize conversation state and answer user request.",
constraints=["No speculative claims"],
output_format="Markdown",
examples=[],
),
settings=OrchestratorSettings(
max_prompt_tokens=2500,
token_model="gpt-4o-mini",
),
summary_llm=SummaryLLMConfig(
provider="openai",
model="gpt-4o-mini",
openai=OpenAIConfig(
api_key="YOUR_OPENAI_API_KEY",
base_url=None,
organization=None,
),
),
)
store = ConfigStore(cfg)
orchestrator = PromptOrchestratorFactory.from_config_store(store)
response = orchestrator.build_for_request(
session_id="openai-summary",
user_message="Please summarize previous decisions and next actions",
use_rag=False,
)
print(response.stats.total_tokens)
Token Counting (tiktoken)
- Prompt length checks use tiktoken-based counting
- Configure tokenizer via
OrchestratorSettings.token_modelandOrchestratorSettings.token_encoding - Limit fitting in
PromptContextManager.ensure_fits_limittrims sections to satisfy both char and token budgets
Running Tests
pytest -q
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file prompt_orchestrator-0.1.3.tar.gz.
File metadata
- Download URL: prompt_orchestrator-0.1.3.tar.gz
- Upload date:
- Size: 21.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6f6009c972517a58346ccb0e8ffd3b5b919da3edb3c41f62f0a4656d1a4114eb
|
|
| MD5 |
1966ab1a7848d0e472b91a6bb353e0c7
|
|
| BLAKE2b-256 |
eb6396e3363a3fc9a2cd62b270d52fcef6f2af619e19a97d2e2d15cbd0fa146a
|
Provenance
The following attestation bundles were made for prompt_orchestrator-0.1.3.tar.gz:
Publisher:
publish.yml on VeryComplexAndLongName/PromptOrchestrator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
prompt_orchestrator-0.1.3.tar.gz -
Subject digest:
6f6009c972517a58346ccb0e8ffd3b5b919da3edb3c41f62f0a4656d1a4114eb - Sigstore transparency entry: 1662394151
- Sigstore integration time:
-
Permalink:
VeryComplexAndLongName/PromptOrchestrator@56e85c9477523ebf22972f13d97741cba70f580e -
Branch / Tag:
refs/tags/0.1.3 - Owner: https://github.com/VeryComplexAndLongName
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@56e85c9477523ebf22972f13d97741cba70f580e -
Trigger Event:
release
-
Statement type:
File details
Details for the file prompt_orchestrator-0.1.3-py3-none-any.whl.
File metadata
- Download URL: prompt_orchestrator-0.1.3-py3-none-any.whl
- Upload date:
- Size: 26.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e9f3431fa5dde03db33f47f32ebe81cf2ba9cc30e41680466fcfea62b9e697f3
|
|
| MD5 |
98a2b9037a9abfcd3226c40112f0795e
|
|
| BLAKE2b-256 |
8c50a6609aa90f006ed71c5e2aa8e388187157ce48c36e1df4a4c1e57e5ddd49
|
Provenance
The following attestation bundles were made for prompt_orchestrator-0.1.3-py3-none-any.whl:
Publisher:
publish.yml on VeryComplexAndLongName/PromptOrchestrator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
prompt_orchestrator-0.1.3-py3-none-any.whl -
Subject digest:
e9f3431fa5dde03db33f47f32ebe81cf2ba9cc30e41680466fcfea62b9e697f3 - Sigstore transparency entry: 1662394303
- Sigstore integration time:
-
Permalink:
VeryComplexAndLongName/PromptOrchestrator@56e85c9477523ebf22972f13d97741cba70f580e -
Branch / Tag:
refs/tags/0.1.3 - Owner: https://github.com/VeryComplexAndLongName
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@56e85c9477523ebf22972f13d97741cba70f580e -
Trigger Event:
release
-
Statement type: