# vv-agent

A lightweight, VectorVein-inspired agent framework extracted from VectorVein's production runtime: cycle-based execution with pluggable LLM backends, tool dispatch, memory compression, and distributed scheduling.
## Architecture

```text
AgentRuntime
├── CycleRunner        # single LLM turn: context -> completion -> tool calls
├── ToolCallRunner     # tool dispatch, directive convergence (finish/wait_user/continue)
├── RuntimeHookManager # before/after hooks for LLM, tool calls, memory compaction
├── MemoryManager      # automatic history compression when context exceeds threshold
└── ExecutionBackend   # cycle loop scheduling
    ├── InlineBackend  # synchronous (default)
    ├── ThreadBackend  # thread pool with futures
    └── CeleryBackend  # distributed, per-cycle Celery task dispatch
```
Core types live in `vv_agent.types`: `AgentTask`, `AgentResult`, `Message`, `CycleRecord`, `ToolCall`.

Task completion is tool-driven: the agent calls `task_finish` or `ask_user` to signal terminal states. No implicit "last message = answer" heuristics.
## Setup

```bash
cp local_settings.example.py local_settings.py
# Fill in your API keys and endpoints in local_settings.py
uv sync --dev
uv run pytest
```
## Quick Start

### CLI

```bash
uv run vv-agent --prompt "Summarize this framework" --backend moonshot --model kimi-k2.5

# With per-cycle logging
uv run vv-agent --prompt "Summarize this framework" --backend moonshot --model kimi-k2.5 --verbose
```

CLI flags: `--settings-file`, `--backend`, `--model`, `--verbose`.
### Programmatic

```python
from vv_agent.config import build_openai_llm_from_local_settings
from vv_agent.runtime import AgentRuntime
from vv_agent.tools import build_default_registry
from vv_agent.types import AgentTask

llm, resolved = build_openai_llm_from_local_settings(
    "local_settings.py", backend="moonshot", model="kimi-k2.5"
)
runtime = AgentRuntime(llm_client=llm, tool_registry=build_default_registry())
result = runtime.run(AgentTask(
    task_id="demo",
    model=resolved.model_id,
    system_prompt="You are a helpful assistant.",
    user_prompt="What is 1+1?",
))
print(result.status, result.final_answer)
```
### SDK

```python
from vv_agent.sdk import AgentSDKClient, AgentSDKOptions

client = AgentSDKClient(options=AgentSDKOptions(
    settings_file="local_settings.py",
    default_backend="moonshot",
    default_model="kimi-k2.5",
))
result = client.run("Explain Python's GIL in one sentence.")
print(result.final_answer)
```
## SDK Workspace Override (Session/Task)

`AgentSDKOptions.workspace` is the SDK default workspace. You can override it per one-shot run, or bind a fixed workspace to a session.

Priority for workspace resolution is:

1. Explicit `workspace` passed to `run(...)` / `query(...)` / `create_session(...)`
2. `AgentSDKOptions.workspace`
```python
from vv_agent.sdk import AgentSDKClient, AgentSDKOptions

client = AgentSDKClient(options=AgentSDKOptions(
    settings_file="local_settings.py",
    default_backend="moonshot",
    default_model="kimi-k2.5",
    workspace="./workspace/default",
))

# One-shot override: this run uses ./workspace/task-a
run = client.run(prompt="Create notes.md", workspace="./workspace/task-a")

# Session override: all turns in this session stay in ./workspace/session-b
session = client.create_session(workspace="./workspace/session-b")
session.prompt("Create todo.md")
session.follow_up("Append one more todo item")
session.continue_run()
```
Notes:

- `AgentSession.workspace` is fixed at session creation time. `prompt()` / `continue_run()` / `follow_up()` all execute in that same session workspace.
- `session.cancel()` requests cancellation for the currently running prompt in that session.
- Top-level SDK helpers `vv_agent.sdk.run(...)` and `vv_agent.sdk.query(...)` also accept `workspace=...`.
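For quick one-shot calls without constructing a client, the top-level helper can take the workspace directly. A minimal sketch, assuming `vv_agent.sdk.run(...)` mirrors the `prompt=` keyword used by `AgentSDKClient.run` and resolves settings from its own defaults:

```python
import vv_agent.sdk as sdk

# workspace= is documented on the top-level helpers; the prompt= keyword
# is assumed by analogy with AgentSDKClient.run(...).
result = sdk.run(prompt="Create notes.md", workspace="./workspace/one-shot")
print(result.final_answer)
```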
## Shell Runtime Configuration (Windows)

`bash` runtime defaults are a startup/session configuration, not tool-call arguments.

- Global defaults: `AgentSDKOptions.bash_shell`, `AgentSDKOptions.windows_shell_priority`, `AgentSDKOptions.bash_env`
- Per-agent override: `AgentDefinition.bash_shell`, `AgentDefinition.windows_shell_priority`, `AgentDefinition.bash_env`
- Recommended Windows priority: `["git-bash", "powershell", "cmd"]`
- On Windows, bash-tool child processes default to `PYTHONUTF8=1` and `PYTHONIOENCODING=utf-8` unless already overridden via the parent environment or `bash_env`.
- On Windows, bash-tool child processes are launched with hidden-console flags so GUI hosts can run `bash`/`powershell` commands without flashing a terminal window.
- `run(...)` and `create_session(...)` both inherit startup shell defaults.
- The `bash` tool schema description includes a runtime shell hint (resolved shell kind + invocation prefix), so the model sees which shell command style is expected before calling the tool.
- The runtime shell hint is frozen per task/session-run to keep tool schemas stable across cycles and preserve LLM prompt cache efficiency.
- SDK/CLI-generated tasks now also attach structured `system_prompt_sections` metadata to the system message, so Anthropic prompt-cache breakpoints can keep the stable prompt prefix hot while treating current time and session-memory blocks as volatile.
```python
from vv_agent.sdk import AgentDefinition, AgentSDKClient, AgentSDKOptions

client = AgentSDKClient(
    options=AgentSDKOptions(
        settings_file="local_settings.py",
        default_backend="moonshot",
        windows_shell_priority=["git-bash", "powershell", "cmd"],
        bash_env={"PIP_INDEX_URL": "https://pypi.tuna.tsinghua.edu.cn/simple"},
    ),
    agents={
        "desktop": AgentDefinition(
            description="Desktop helper",
            model="kimi-k2.5",
            # Optional hard override for this agent only:
            bash_shell=None,
            bash_env={"HTTP_PROXY": "http://127.0.0.1:7890"},
        )
    },
)
```
## Execution Backends

The cycle loop is delegated to a pluggable `ExecutionBackend`.

| Backend | Use case |
|---|---|
| `InlineBackend` | Default. Synchronous, single-process. |
| `ThreadBackend` | Thread pool. Non-blocking `submit()` returns a `Future`. |
| `CeleryBackend` | Distributed. Each cycle dispatched as an independent Celery task. |
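A hedged sketch of opting into the thread backend. The import path is an assumption by analogy with the documented Celery path below, and the constructor arguments may differ:

```python
# Import path assumed to mirror vv_agent.runtime.backends.celery.
from vv_agent.runtime.backends.thread import ThreadBackend

backend = ThreadBackend()  # constructor arguments are an assumption
runtime = AgentRuntime(llm_client=llm, tool_registry=registry, execution_backend=backend)
```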
### CeleryBackend

Two modes:

- Inline fallback (no `RuntimeRecipe`): cycles run in-process, same as `InlineBackend`.
- Distributed (with `RuntimeRecipe`): each cycle is a Celery task. Workers rebuild the `AgentRuntime` from the recipe and load state from a shared `StateStore` (SQLite or Redis).
```python
from vv_agent.runtime.backends.celery import CeleryBackend, RuntimeRecipe, register_cycle_task

# celery_app is your existing Celery application; store is a shared
# StateStore (SQLite or Redis) reachable by every worker.
register_cycle_task(celery_app)

recipe = RuntimeRecipe(
    settings_file="local_settings.py",
    backend="moonshot",
    model="kimi-k2.5",
    workspace="./workspace",
)
backend = CeleryBackend(celery_app=celery_app, state_store=store, runtime_recipe=recipe)
runtime = AgentRuntime(llm_client=llm, tool_registry=registry, execution_backend=backend)
```

Install the Celery extras: `uv sync --extra celery`.
## Cancellation and Streaming

```python
from vv_agent.runtime import CancellationToken, ExecutionContext

# Cancel from another thread
token = CancellationToken()
ctx = ExecutionContext(cancellation_token=token)
result = runtime.run(task, ctx=ctx)

# Stream LLM output events, including assistant deltas and tool progress
def on_stream_event(event: dict) -> None:
    if event.get("event") == "assistant_delta":
        print(event.get("content_delta", ""), end="")

ctx = ExecutionContext(stream_callback=on_stream_event)
result = runtime.run(task, ctx=ctx)
```
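To actually trigger the cancellation from another thread, something like the following sketch should work. The `cancel()` method name on `CancellationToken` is an assumption; check the runtime module for the real API:

```python
import threading

# token.cancel() is an assumed method name on CancellationToken.
threading.Timer(30.0, token.cancel).start()  # request cancellation after 30 s
```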
## Runtime Log Payloads

`tool_result` runtime events carry the full tool output in `content` and any structured tool payload in `metadata` (no implicit truncation of `content`). `content_preview` and `assistant_preview` are still emitted for UI convenience.

If you need shorter previews for logs/transport, configure an explicit preview limit:

```python
from vv_agent.sdk import AgentSDKOptions

options = AgentSDKOptions(
    settings_file="local_settings.py",
    default_backend="moonshot",
    log_preview_chars=220,  # optional: enable preview truncation explicitly
)
```
## Workspace Backends

Workspace file I/O is delegated to a pluggable `WorkspaceBackend` protocol. All built-in file tools (`read_file`, `write_file`, `list_files`, etc.) go through this abstraction.

`list_files` includes built-in safety defaults for large workspaces:

- Returns at most `500` paths per call by default (`max_results` can tune this, with a hard cap).
- Uses `ripgrep` (`rg`) for fast local traversal when available, with automatic fallback to a Python walk.
- `workspace_grep` also uses `rg` for local workspaces (with Python fallback), defaults to smart-case matching (lowercase patterns are case-insensitive; patterns with uppercase stay case-sensitive), and skips hidden/common dependency roots unless explicitly included.
- `workspace_grep` returns model-facing grep text in `ToolExecutionResult.content`, while structured matches/counts live in `ToolExecutionResult.metadata`.
- When listing from the workspace root, common dependency/cache roots (for example `node_modules`, `.venv`, `.git`) are summarized instead of expanded.
- You can still inspect those paths explicitly by setting `path` to that directory (or by setting `include_ignored=true`).
- Supports `scan_limit` to stop early on very large trees; when triggered, the response sets `count_is_estimate=true`.
| Backend | Use case |
|---|---|
| `LocalWorkspaceBackend` | Default. Reads/writes to a local directory with path-escape protection. |
| `MemoryWorkspaceBackend` | Pure in-memory dict storage. Great for testing and sandboxed runs. |
| `S3WorkspaceBackend` | S3-compatible object storage (AWS S3, Aliyun OSS, MinIO, Cloudflare R2). |
```python
from pathlib import Path

from vv_agent.workspace import LocalWorkspaceBackend, MemoryWorkspaceBackend

# Explicit local backend
runtime = AgentRuntime(
    llm_client=llm,
    tool_registry=registry,
    workspace_backend=LocalWorkspaceBackend(Path("./workspace")),
)

# In-memory backend for testing
runtime = AgentRuntime(
    llm_client=llm,
    tool_registry=registry,
    workspace_backend=MemoryWorkspaceBackend(),
)
```
### S3WorkspaceBackend

Install the optional S3 dependency: `uv pip install 'vv-agent[s3]'`.

```python
from vv_agent.workspace import S3WorkspaceBackend

backend = S3WorkspaceBackend(
    bucket="my-bucket",
    prefix="agent-workspace",
    endpoint_url="https://oss-cn-hangzhou.aliyuncs.com",  # or None for AWS
    aws_access_key_id="...",
    aws_secret_access_key="...",
    addressing_style="virtual",  # "path" for MinIO
)
```
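The configured backend is then passed to the runtime via the same `workspace_backend` parameter shown above:

```python
runtime = AgentRuntime(llm_client=llm, tool_registry=registry, workspace_backend=backend)
```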
### Custom Backend

Implement the `WorkspaceBackend` protocol (8 methods) to plug in any storage:

```python
# FileInfo is assumed importable from vv_agent.workspace alongside the protocol.
from vv_agent.workspace import FileInfo, WorkspaceBackend

class MyBackend:
    def list_files(self, base: str, glob: str) -> list[str]: ...
    def read_text(self, path: str) -> str: ...
    def read_bytes(self, path: str) -> bytes: ...
    def write_text(self, path: str, content: str, *, append: bool = False) -> int: ...
    def file_info(self, path: str) -> FileInfo | None: ...
    def exists(self, path: str) -> bool: ...
    def is_file(self, path: str) -> bool: ...
    def mkdir(self, path: str) -> None: ...
```
## Modules

| Module | Description |
|---|---|
| `vv_agent.runtime.AgentRuntime` | Top-level state machine (completed / wait_user / max_cycles / failed) |
| `vv_agent.runtime.CycleRunner` | Single LLM turn and cycle record construction |
| `vv_agent.runtime.ToolCallRunner` | Tool execution with directive convergence |
| `vv_agent.runtime.RuntimeHookManager` | Hook dispatch (before/after LLM, tool call, memory compact) |
| `vv_agent.runtime.StateStore` | Checkpoint persistence protocol (`InMemoryStateStore` / `SqliteStateStore` / `RedisStateStore`) |
| `vv_agent.memory.MemoryManager` | Context compression when history exceeds threshold |
| `vv_agent.workspace` | Pluggable file storage: `LocalWorkspaceBackend`, `MemoryWorkspaceBackend`, `S3WorkspaceBackend` |
| `vv_agent.tools` | Built-in tools: workspace I/O, todo, bash, image, sub-agents, skills |
| `vv_agent.sdk` | High-level SDK: `AgentSDKClient`, `AgentSession`, `AgentResourceLoader` |
| `vv_agent.skills` | Agent Skills support (SKILL.md parsing, validation, unified normalization, prompt rendering with budget management, `activate_skill` tool) |
| `vv_agent.llm.VVLlmClient` | Unified LLM interface via vv-llm (endpoint rotation, retry, streaming) |
| `vv_agent.config` | Model/endpoint/key resolution from `local_settings.py` |
## Memory Compaction

`MemoryManager` now measures context size in tokens and compacts history when a model-derived auto-compaction threshold is exceeded.

- Task-level knobs:
  - `memory_compact_threshold` (default `128000`; legacy fallback used only when token counting is unavailable)
  - `memory_threshold_percentage` (warning threshold percentage, default `90`)
- SDK mapping:
  - `AgentDefinition.memory_compact_threshold`
  - `AgentDefinition.memory_threshold_percentage`
  - `AgentSDKClient.prepare_task(...)` forwards both values to `AgentTask`
- Token budget model (see the sketch after this list):
  - `effective_context_window = model_context_window - reserved_output_tokens`
  - `autocompact_threshold = effective_context_window - autocompact_buffer_tokens`
  - Defaults come from `vv-llm` model metadata when available, otherwise fall back to `200000 / 16000 / 13000`
- Effective-length strategy (backend-aligned):
  - If previous cycle token usage exists: `effective_length = previous_prompt_tokens + token_count(recent_tool_messages)`
  - Otherwise fall back to `vv_llm.chat_clients.utils.get_message_token_counts(...)`
  - If tokenizer resolution fails, use a local CJK-aware estimate
- Compaction pipeline:
  1. Preemptive microcompact: clear old large tool results when usage crosses `microcompact_trigger_ratio`
  2. Session Memory extraction: persist key facts before full summarization so they survive later compactions
  3. Structural cleanup (stale tool calls, orphan tool messages, assistant-no-tool collapse, old tool-result artifactization)
  4. If still over threshold, generate a compressed memory summary that preserves original user messages, file operations, current work state, and resolved errors
  5. If the provider still returns prompt-too-long, retry with forced compaction once, then apply progressively stronger emergency tail-dropping
  6. After full compaction, re-inject relevant workspace files into `<Post-Compaction File Context>` under a bounded token budget
- Session Memory behavior:
  - Stored in `workspace/.memory/session/<session-or-task-scope>/session_memory.json` by default
  - Scoped to the current session when `metadata.session_id` is present; otherwise scoped to the current `task_id`
  - New sessions/tasks start without inherited Session Memory from previous sessions/tasks
  - Injected into the first system message on every cycle as `<Session Memory>`
  - Extraction reuses the configured memory summary backend/model
  - Full compaction resets transcript tracking but preserves persisted memory entries
  - Sub-tasks disable Session Memory by default to avoid parent/child memory-file contamination
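The token budget model above reduces to simple subtraction. A minimal sketch using the documented fallback defaults (`200000 / 16000 / 13000`); the helper name is hypothetical:

```python
# Hypothetical helper mirroring the documented budget formulas.
def autocompact_threshold(
    model_context_window: int = 200_000,      # fallback default
    reserved_output_tokens: int = 16_000,     # fallback default
    autocompact_buffer_tokens: int = 13_000,  # fallback default
) -> int:
    effective_context_window = model_context_window - reserved_output_tokens
    return effective_context_window - autocompact_buffer_tokens

print(autocompact_threshold())  # 171000 with the fallback defaults
```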
### Runtime metadata keys

Pass these via `AgentTask.metadata`:

- `memory_keep_recent_messages`
- `model_context_window`
- `reserved_output_tokens`
- `autocompact_buffer_tokens`
- `microcompact_trigger_ratio`
- `microcompact_keep_recent_cycles`
- `microcompact_min_result_length`
- `microcompact_compactable_tools`
- `include_memory_warning`
- `session_memory_enabled` / `enable_session_memory`
- `session_memory_min_tokens`
- `session_memory_max_tokens`
- `session_memory_min_text_messages`
- `session_memory_storage_dir`
- `tool_result_compact_threshold`
- `tool_result_keep_last`
- `tool_result_excerpt_head`
- `tool_result_excerpt_tail`
- `tool_calls_keep_last`
- `assistant_no_tool_keep_last`
- `tool_result_artifact_dir`
- `summary_event_limit`
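A minimal sketch of passing a few of these knobs through `AgentTask.metadata`. The key names are documented above; the specific values and their types are illustrative assumptions, not recommendations:

```python
from vv_agent.types import AgentTask

task = AgentTask(
    task_id="long-run",
    model="kimi-k2.5",
    system_prompt="You are a helpful assistant.",
    user_prompt="Refactor the project step by step.",
    metadata={
        "memory_keep_recent_messages": 8,   # illustrative value
        "microcompact_trigger_ratio": 0.8,  # illustrative value
        "session_memory_enabled": True,
    },
)
```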
### Memory summary model selection priority

Priority is strict:

1. `AgentTask.metadata`
   - `memory_summary_backend` / `memory_summary_model`
   - aliases: `compress_memory_summary_backend` / `compress_memory_summary_model`
   - aliases: `memory_compress_backend` / `memory_compress_model`
2. `local_settings.py` constants
   - `DEFAULT_USER_MEMORY_SUMMARIZE_BACKEND` / `DEFAULT_USER_MEMORY_SUMMARIZE_MODEL`
   - aliases: `DEFAULT_MEMORY_SUMMARIZE_BACKEND` / `DEFAULT_MEMORY_SUMMARIZE_MODEL`
   - aliases: `VV_AGENT_MEMORY_SUMMARY_BACKEND` / `VV_AGENT_MEMORY_SUMMARY_MODEL`
3. Fallback: runtime `default_backend` + current task `model`
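A minimal `local_settings.py` excerpt wiring the constant-level fallbacks (the backend/model names here are placeholders):

```python
# local_settings.py (excerpt)
DEFAULT_USER_MEMORY_SUMMARIZE_BACKEND = "moonshot"  # placeholder backend name
DEFAULT_USER_MEMORY_SUMMARIZE_MODEL = "kimi-k2.5"   # placeholder model name
```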
## Built-in Tools

`list_files`, `file_info`, `read_file`, `write_file`, `file_str_replace`, `workspace_grep`, `compress_memory`, `todo_write`, `task_finish`, `ask_user`, `bash`, `read_image`, `create_sub_task`, `sub_task_status`.

Custom tools can be registered via `ToolRegistry.register()`, as sketched below.
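A hedged registration sketch. The `register()` call itself is documented, but the signature used here (name, handler, description) is an assumption; check `vv_agent.tools` for the real interface:

```python
from vv_agent.tools import build_default_registry

registry = build_default_registry()

def word_count(text: str) -> str:
    """Toy custom tool: count the words in a text."""
    return str(len(text.split()))

# Signature assumed: register(name, handler, description=...). Verify against ToolRegistry.
registry.register("word_count", word_count, description="Count words in a text.")
```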
The `bash` tool supports two background paths:

- Explicit background: pass `run_in_background=true`, receive a `session_id` immediately, then poll with `check_background_command`.
- Timeout handoff: if a foreground command reaches `timeout`, it is moved into a background session instead of failing immediately. The tool returns a `session_id`, and the session emits terminal background-command events when that process completes, fails, or times out.
## Sub-agents

Configure named sub-agents on `AgentTask.sub_agents`. The parent agent delegates work via `create_sub_task`: use `agent_id` to select the target sub-agent, `task_description` for one task, `tasks` for batch mode, and `wait_for_completion=false` to start background sub-tasks. Each sub-agent gets its own runtime, model, and tool set. The default system prompt automatically injects the callable sub-agent list, including each `agent_id` and description, so the model can choose directly.

Each delegated sub-task now runs in a real `AgentSession` (the session id defaults to the sub-task id). Tool payloads include `session_id`, and runtime events include stable identifiers (`task_id` / `session_id`) so host apps can subscribe, persist, and stream sub-task progress independently, including `sub_agent_assistant_delta` and `sub_agent_tool_call_progress` events.

Batch mode in `create_sub_task` dispatches valid sub-task items through the runtime execution backend's `parallel_map`, so synchronous batches run concurrently when the backend supports parallel execution.

Use `sub_task_status` to query sub-task states, inspect lightweight progress snapshots (`detail_level=snapshot`), or send follow-up messages to running/completed sub-tasks. When you run agents through `AgentSDKClient.create_session()`, the sub-task registry stays attached to that session, so later turns can still query background sub-tasks created earlier in the same session.

Before a completed sub-task is resumed, the runtime now sanitizes the saved session transcript: empty assistant turns, thinking-only turns, orphaned tool results, and unresolved tail tool calls are removed so the next follow-up prompt resumes from a coherent history.

Sub-task runtime metadata now includes `task_id`, `session_id`, and `browser_scope_key` for each sub-agent run, so session-scoped tools (for example, browser controllers) stay isolated across parallel sub-tasks.

Host apps can interrupt a currently running sub-agent by calling `vv_agent.runtime.engine.steer_sub_agent_session(session_id=..., prompt=...)`.

When a sub-agent uses a different model from the parent, the runtime needs `settings_file` and `default_backend` to resolve the LLM client.
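A hedged configuration sketch. It assumes `AgentTask.sub_agents` accepts an `agent_id -> AgentDefinition` mapping, mirroring the `agents=` option shown for `AgentSDKClient`; check `vv_agent.types` for the exact shape:

```python
from vv_agent.sdk import AgentDefinition
from vv_agent.types import AgentTask

task = AgentTask(
    task_id="parent",
    model="kimi-k2.5",
    system_prompt="You are a coordinator. Delegate research to sub-agents.",
    user_prompt="Research this framework and summarize it.",
    # Shape assumed: agent_id -> AgentDefinition, by analogy with the SDK agents= mapping.
    sub_agents={
        "researcher": AgentDefinition(description="Research helper", model="kimi-k2.5"),
    },
)
```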
## Examples

24 numbered examples in `examples/`. See `examples/README.md` for the full list.

```bash
uv run python examples/01_quick_start.py
uv run python examples/24_workspace_backends.py
```
## Testing

```bash
uv run pytest                                   # unit tests (no network)
uv run ruff check .                             # lint
uv run ty check                                 # type check
V_AGENT_RUN_LIVE_TESTS=1 uv run pytest -m live  # integration tests (needs real LLM)
```

Environment variables for live tests:

| Variable | Default | Description |
|---|---|---|
| `V_AGENT_LOCAL_SETTINGS` | `local_settings.py` | Settings file path |
| `V_AGENT_LIVE_BACKEND` | `moonshot` | LLM backend |
| `V_AGENT_LIVE_MODEL` | `kimi-k2.5` | Model name |
| `V_AGENT_ENABLE_BASE64_KEY_DECODE` | - | Set `1` to enable base64 API key decoding |