One More Dimension — lightweight LLM agent framework with tool-calling, skills, and file tools

OMD — One More Dimension

A lightweight Python framework for building LLM agents with tool-calling support. OMD provides a unified OpenAI-compatible client, a ReAct-style agentic loop, and automatic JSON schema generation from plain Python functions.

Features

  • Unified LLM client: BaseClient abstraction for any OpenAI-compatible HTTP API; swap providers by implementing a single method.
  • ReAct agent: BaseAgent runs an iterative tool-calling loop with configurable iteration limits and automatic final-answer forcing.
  • Tool system: Tool.from_callable() converts a typed Python function (with a Sphinx-style docstring) into an OpenAI tool definition automatically.
  • Skills: drop a SKILL.md into a directory and the agent can pick it up automatically (LLM router) or load it on demand via the load_skill tool.
  • File tools: line-precise read_fragment / read_lines / insert_lines / delete_lines with an optional workspace-root sandbox.
  • Machinery integrations (optional): ready-made functions for Stability AI image generation, Gemini reference-guided images, and Veo video generation.
  • Pydantic models: Model and Tool are Pydantic v2 models with strict validation and clean serialization.

Installation

From PyPI:

pip install omd
# or
uv add omd

From source:

git clone https://github.com/mrYush/omd.git
cd omd
uv pip install -e .

Quick Start

1. Call an LLM

from omd.clients import ApiBarClient, Model

client = ApiBarClient(
    model=Model(
        name="gpt-4",
        url="https://api.openai.com/v1/chat/completions",
        api_token="sk-...",
    )
)

# or, if MODEL_NAME / MODEL_URL / MODEL_API_TOKEN are set in the environment:
client = ApiBarClient(model=Model.from_env())

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

response = client.call_model(messages=messages)

2. Define Tools from Functions

Tool.from_callable() extracts the function name, docstring, and type hints to build an OpenAI-compatible JSON schema:

from omd.tools import Tool


def search(query: str, top_k: int = 5) -> list[str]:
    """Search the web for relevant information.

    :param query: Search query.
    :param top_k: Number of results to return.
    :returns: List of search results.
    """
    return [f"result for {query}"]


tool = Tool.from_callable(search)
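To make the conversion concrete, here is a minimal sketch of how a typed function with a Sphinx-style docstring could be turned into an OpenAI-style tool definition. The helper build_tool_schema and the type mapping are assumptions made for illustration; this is not OMD's actual implementation, and the real Tool.from_callable() may handle more types and edge cases.

```python
import inspect
import re
from typing import Any, Callable, get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def build_tool_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Hypothetical helper: build an OpenAI-style tool definition from a function."""
    doc = inspect.getdoc(func) or ""
    # The first docstring line serves as the tool description.
    description = doc.splitlines()[0] if doc else ""
    # Collect ":param name: text" entries from the Sphinx-style docstring.
    param_docs = dict(re.findall(r":param (\w+):\s*(.+)", doc))
    hints = get_type_hints(func)

    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown annotations fall back to "string" in this sketch.
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if name in param_docs:
            prop["description"] = param_docs[name].strip()
        properties[name] = prop
        # Parameters without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def search(query: str, top_k: int = 5) -> list[str]:
    """Search the web for relevant information.

    :param query: Search query.
    :param top_k: Number of results to return.
    :returns: List of search results.
    """
    return [f"result for {query}"]


schema = build_tool_schema(search)
```

In this sketch only query ends up in required, because top_k has a default value.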

3. Run an Agent

BaseAgent implements a ReAct-style agentic loop — it sends messages to the LLM, executes any requested tool calls, appends the results, and repeats until the model produces a final text answer or the iteration limit is reached.

from omd.agents import BaseAgent
from omd.clients import ApiBarClient, Model
from omd.tools import Tool


def search(query: str) -> str:
    """Search the web for information.

    :param query: Search query.
    :returns: Search results.
    """
    return f"Results for: {query}"


def create_image(prompt: str) -> str:
    """Generate an image from a text description.

    :param prompt: Image description.
    :returns: Path to generated image.
    """
    return f"/images/{prompt.replace(' ', '_')}.png"


client = ApiBarClient(
    model=Model(
        name="gpt-4",
        url="https://api.openai.com/v1/chat/completions",
        api_token="sk-...",
    )
)

agent = BaseAgent(
    client=client,
    tools=[
        Tool.from_callable(search),
        Tool.from_callable(create_image),
    ],
    tool_choice="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Search for 'Python async' and create an image of a snake"},
]

new_messages = agent.run(messages=messages, attempts_limit=10)

4. Manual Tool-Calling

If you need lower-level control, pass raw tool definitions directly to the client:

tools = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Search the web.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]

response = client.call_model(
    messages=messages,
    tools=tools,
    tool_choice="auto",
    parallel_tool_calls=True,
)
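With manual tool-calling you also execute the requested tools yourself. The sketch below assumes the assistant message follows the OpenAI chat-completions shape (a tool_calls list with JSON-encoded arguments); the return type of call_model is not documented here, so treat the message layout and the helper dispatch_tool_calls as assumptions.

```python
import json
from typing import Any, Callable


def dispatch_tool_calls(
    assistant_message: dict[str, Any],
    registry: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Hypothetical helper: run each requested tool and build "tool" role messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"] or "{}")
        try:
            output = registry[name](**args)
        except Exception as exc:  # report failures to the model instead of crashing
            output = f"Tool error: {exc!r}"
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": str(output)}
        )
    return results


# Example assistant message in the OpenAI chat-completions shape (made-up data).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "search", "arguments": '{"query": "Python async"}'},
        }
    ],
}

tool_messages = dispatch_tool_calls(
    assistant_message, {"search": lambda query: f"Results for: {query}"}
)
```

The resulting tool messages would be appended to the conversation, after the assistant message, before the next call to the model.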

Agent Architecture

BaseAgent implements the ReAct (Reasoning + Acting) pattern:

┌─────────────────────────────────────────────────────────────┐
│                       BaseAgent.run()                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  1. Send messages + tools → LLM                             │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  LLM response   │
                    └─────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
    ┌──────────────────┐           ┌──────────────────┐
    │  Has tool_calls  │           │  Final answer    │
    └──────────────────┘           └──────────────────┘
              │                               │
              ▼                               ▼
┌──────────────────────────┐       ┌──────────────────┐
│ 2. Execute each tool     │       │ Return new       │
│    tool.invoke(args)     │       │ messages         │
└──────────────────────────┘       └──────────────────┘
              │
              ▼
┌──────────────────────────┐
│ 3. Append results        │
│    to messages           │
└──────────────────────────┘
              │
              ▼
┌──────────────────────────┐
│ 4. Loop back to step 1   │
└──────────────────────────┘

| Component | Role |
|---|---|
| BaseClient | Abstract HTTP call to an LLM API |
| Tool | Python function wrapper with JSON schema |
| BaseAgent | Agentic loop orchestrator |
| messages | Conversation history (mutated in-place) |

The attempts_limit parameter controls the maximum number of iterations. On the final iteration tools are disabled (tools=None) to force the LLM to produce a text answer and prevent infinite tool-call loops.
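The loop and the final-answer forcing can be sketched in a few lines. This is a simplified illustration under stated assumptions, not BaseAgent's code: run_loop, the stub model stubborn_llm, and the OpenAI-style message layout are all hypothetical.

```python
import json
from typing import Any, Callable, Optional

Message = dict[str, Any]
LLM = Callable[[list[Message], Optional[list[str]]], Message]


def run_loop(
    call_llm: LLM,
    tools: dict[str, Callable[..., Any]],
    messages: list[Message],
    attempts_limit: int,
) -> list[Message]:
    """Hypothetical ReAct-style loop; returns only the messages it appended."""
    new_messages: list[Message] = []
    for attempt in range(1, attempts_limit + 1):
        # On the last iteration no tools are offered, which forces a text answer.
        offered = list(tools) if attempt < attempts_limit else None
        reply = call_llm(messages, offered)
        messages.append(reply)
        new_messages.append(reply)
        if not reply.get("tool_calls"):
            break  # the model produced a final answer
        for call in reply["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            result = tools[call["function"]["name"]](**args)
            tool_msg = {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
            messages.append(tool_msg)
            new_messages.append(tool_msg)
    return new_messages


def stubborn_llm(messages: list[Message], offered: Optional[list[str]]) -> Message:
    """Stub model that always asks for a tool while tools are available."""
    if offered is None:
        return {"role": "assistant", "content": "Final answer."}
    call = {
        "id": f"call_{len(messages)}",
        "type": "function",
        "function": {"name": "search", "arguments": '{"query": "x"}'},
    }
    return {"role": "assistant", "content": None, "tool_calls": [call]}


history = [{"role": "user", "content": "Find x"}]
added = run_loop(
    stubborn_llm,
    {"search": lambda query: f"Results for: {query}"},
    history,
    attempts_limit=3,
)
```

Even though the stub model would request tools forever, the third iteration offers none, so the loop ends with a text answer.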

Skills

A skill is a folder containing a SKILL.md file with YAML-like front-matter and a free-form Markdown body. The body explains how a particular role uses the available tools — it is appended to the system prompt only when the skill is selected.

SKILL.md format

---
name: file-editor
description: Edit text files by line range. Use when the user asks to read fragments or modify specific lines.
recommended_tools: [read_fragment, read_lines, insert_lines, delete_lines]
version: 0.1.0
---
# File Editor

Workflow:
1. Read the target range first.
2. Insert / delete with line-precise tools.
3. Re-read to verify.
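A file in this format can be split into metadata and body with a few lines of Python. The parser below is a hypothetical sketch that handles only flat key: value pairs and bracketed lists, which is all the example above needs; the parser in omd/skills/parser.py may behave differently.

```python
from typing import Any


def parse_skill_md(text: str) -> tuple[dict[str, Any], str]:
    """Hypothetical parser: split SKILL.md text into (metadata, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' front-matter line")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("front-matter is not closed with '---'") from None

    meta: dict[str, Any] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        value = value.strip()
        # "[a, b, c]" is treated as a flat list of strings.
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


sample = """---
name: file-editor
description: Edit text files by line range.
recommended_tools: [read_fragment, read_lines]
version: 0.1.0
---
# File Editor

Workflow:
1. Read the target range first.
"""

meta, body = parse_skill_md(sample)
```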

Discovery

SkillRegistry discovers skills from three sources:

  1. Built-in skills shipped with the package (omd/skills/builtin/): file-editor, code-reviewer, test-writer. Disable via include_builtin=False.
  2. Directories listed in the OMD_SKILLS_PATH environment variable (os.pathsep-separated). Disable via read_env=False.
  3. Directories passed to the dirs= constructor argument (highest precedence).
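The precedence order can be illustrated by scanning the sources from lowest to highest priority and letting later entries override earlier ones. The function discover_skills, its parameters, and the use of the folder name as the skill name are assumptions made for this sketch; SkillRegistry itself may resolve names and conflicts differently.

```python
import os
import tempfile
from pathlib import Path


def discover_skills(dirs=(), builtin_dir=None, read_env=True, environ=None):
    """Hypothetical discovery: map skill name -> SKILL.md path.

    Sources are scanned from lowest to highest precedence, so a later
    source overrides an earlier one that defines the same skill name.
    """
    environ = os.environ if environ is None else environ
    roots = []
    if builtin_dir is not None:
        roots.append(Path(builtin_dir))
    if read_env:
        raw = environ.get("OMD_SKILLS_PATH", "")
        roots.extend(Path(p) for p in raw.split(os.pathsep) if p)
    roots.extend(Path(d) for d in dirs)

    found = {}
    for root in roots:
        if not root.is_dir():
            continue
        for skill_file in sorted(root.glob("*/SKILL.md")):
            # Assumption: the folder name doubles as the skill name.
            found[skill_file.parent.name] = skill_file
    return found


# Demo with two temporary directories that both define "file-editor".
with tempfile.TemporaryDirectory() as tmp:
    env_dir, user_dir = Path(tmp, "env"), Path(tmp, "user")
    for root in (env_dir, user_dir):
        (root / "file-editor").mkdir(parents=True)
        (root / "file-editor" / "SKILL.md").write_text("---\nname: file-editor\n---\n")
    skills = discover_skills(dirs=[user_dir], environ={"OMD_SKILLS_PATH": str(env_dir)})
    winner = skills["file-editor"].parent.parent.name
```

Here the copy under the explicitly passed directory wins over the one found through the environment variable.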

Hybrid selection

The agent can use skills in two complementary ways:

  • Auto-router: a small LLM call before the main loop picks the most relevant skills based on the user's request and each skill's name and description metadata; their bodies are added to the system prompt.
  • On-demand tools: the list_skills and load_skill tools let the agent itself fetch a skill body mid-conversation when the router missed it.

from omd.agents import BaseAgent
from omd.clients import ApiBarClient, Model
from omd.skills import SkillRegistry, SkillRouter
from omd.tools import make_file_tools

client = ApiBarClient(model=Model.from_env())
registry = SkillRegistry(dirs=["./my-skills"])

agent = BaseAgent(
    client=client,
    tools=make_file_tools(workspace_root="./project"),
    skills=registry,
    skill_router=SkillRouter(client, max_skills=2),
    auto_skill_routing=True,    # default — pre-select skills via router
    expose_skill_tools=True,    # default — also expose list_skills/load_skill
)

agent.run(
    messages=[{"role": "user", "content": "Refactor utils.py and add tests"}],
    attempts_limit=10,
)

File tools

omd.tools.make_file_tools() returns a ready-to-use set of line-precise file manipulation tools:

| Tool | Purpose |
|---|---|
| read_fragment(path, offset=1, limit=200) | Read up to limit lines starting at offset |
| read_lines(path, start, end) | Read the inclusive 1-indexed line range |
| insert_lines(path, line, content) | Insert content before the given line |
| delete_lines(path, start, end) | Delete the inclusive 1-indexed range |

All line numbers are 1-indexed, and the responses include LINE|content prefixes so the model never has to count lines manually. Pass workspace_root=Path("./project") to refuse any path that resolves outside that directory; omit it for an unrestricted toolset.
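Both ideas, the workspace sandbox and the numbered output, fit in a short sketch. The helpers resolve_in_workspace and read_numbered are hypothetical, and the exact prefix layout OMD emits may differ; the sketch only assumes the line number, a pipe, then the content.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_in_workspace(path: str, workspace_root: Optional[str] = None) -> Path:
    """Hypothetical check: reject paths that resolve outside workspace_root."""
    if workspace_root is None:
        return Path(path).resolve()  # unrestricted mode
    root = Path(workspace_root).resolve()
    # Resolving first collapses ".." segments and symlinks before the check.
    candidate = (root / path).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"{path!r} is outside the workspace")
    return candidate


def read_numbered(path: Path, start: int, end: int) -> str:
    """Return the inclusive 1-indexed line range with LINE|content prefixes."""
    lines = path.read_text().splitlines()
    selected = lines[start - 1:end]
    return "\n".join(f"{n}|{text}" for n, text in enumerate(selected, start=start))


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("alpha\nbeta\ngamma\ndelta\n")
    target = resolve_in_workspace("notes.txt", workspace_root=tmp)
    fragment = read_numbered(target, 2, 3)
    try:
        resolve_in_workspace("../outside.txt", workspace_root=tmp)
        escaped = True
    except PermissionError:
        escaped = False
```

Resolving the path before comparing it with the root is what stops ../ traversal from slipping past the check.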

from omd.tools import make_file_tools

tools = make_file_tools(workspace_root="./project")

The standalone callables (omd.tools.read_fragment, read_lines, insert_lines, delete_lines) are also exported for direct use without an agent.

Machinery (Optional Integrations)

The omd.machinery subpackage provides ready-made functions for external media generation services. These are optional — the core agent/client/tool system works without them.

Stability AI — Image Generation

from omd.machinery import generate_image

url = generate_image(
    prompt="A serene mountain landscape at sunrise",
    aspect_ratio="16:9",
    output_format="png",
)

Image-to-image with strength control:

url = generate_image(
    prompt="Transform into a watercolor painting",
    image_path="/path/to/input.png",
    strength=0.5,
)

Stability AI — Background Replacement

from omd.machinery import replace_background

url = replace_background(
    subject_image_path="/path/to/person.png",
    background_prompt="modern office with large windows and city view",
    light_source_direction="right",
)

url = replace_background(
    subject_image_path="/path/to/person.png",
    background_reference_path="/path/to/beach_bg.jpg",
    preserve_original_subject=0.9,
)

Gemini — Reference-Guided Image Generation

from omd.machinery import generate_image_gemini

url = generate_image_gemini(
    prompt="A portrait in the style of the reference",
    style_reference_image="/path/to/reference.png",
)

Veo — Video Generation

generate_video_veo produces short videos (4 to 8 seconds) with native audio via the Google Veo API. It supports both text-to-video and image-to-video generation.

Under the hood the function:

  1. Submits a predictLongRunning request to the Gemini API (via api-bar)
  2. Polls the operation every 10 seconds until completion
  3. Downloads the generated MP4 video
  4. Uploads it to MinIO and returns a presigned URL

from omd.machinery import generate_video_veo

# Text-to-video
url = generate_video_veo(
    prompt="A drone shot of a sunset over the ocean, cinematic, warm tones",
)

# Portrait video, Full HD
url = generate_video_veo(
    prompt='Close-up of a barista pouring latte art. She says "Almost perfect".',
    aspect_ratio="9:16",
    resolution="1080p",
    duration_seconds="8",
)

# Image-to-video (first frame from an image)
url = generate_video_veo(
    prompt="The cat slowly opens its eyes and stretches",
    reference_image="https://example.com/sleeping_cat.png",
)
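The polling step described under "Under the hood" is a plain wait-with-timeout loop. The sketch below is hypothetical: poll_until_done is not part of OMD, and the done / error fields follow the usual shape of Google long-running operations, which is assumed rather than confirmed for this gateway.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch_operation: Callable[[], dict[str, Any]],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Hypothetical poller: call fetch_operation until it reports done=True."""
    deadline = time.monotonic() + timeout_s
    while True:
        operation = fetch_operation()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"generation failed: {operation['error']}")
            return operation
        # Give up if another full interval would pass the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"operation not finished after {timeout_s} s")
        sleep(interval_s)


# Simulated operation that completes on the third status check.
states = iter([{"done": False}, {"done": False}, {"done": True, "response": {}}])
waits: list[float] = []
# Recording the waits instead of sleeping keeps the demo instant.
result = poll_until_done(lambda: next(states), sleep=waits.append)
```

Injecting the sleep function keeps the loop testable without waiting for real time to pass.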

Veo Model Selection

| Model | When to use |
|---|---|
| veo-3.0-fast-generate-001 | Default. Veo 3.0, cost-effective |
| veo-3.0-generate-001 | Veo 3.0, baseline quality |
| veo-3.1-generate-preview | Preview 3.1: highest quality |
| veo-3.1-fast-generate-preview | Preview 3.1: faster than full 3.1 |
| veo-3.1-lite-generate-preview | Preview 3.1: lightweight/cheapest in the 3.1 line |

Override per call with model="..." or change the default via the _VEO_DEFAULT_MODEL constant in omd.machinery.veo.

Veo Parameters

| Parameter | Default | Description |
|---|---|---|
| prompt | (required) | Text description. Use quotes for dialogue, explicit words for sounds |
| reference_image | None | Starting frame: file path, URL, or data URL |
| aspect_ratio | "16:9" | "16:9" (landscape) or "9:16" (portrait) |
| resolution | "720p" | "720p", "1080p", or "4k" |
| duration_seconds | "8" | "4", "6", or "8" (1080p/4k require "8") |
| model | "veo-3.0-fast-generate-001" | See model table above |
| timeout_s | 600.0 | Max wait time for generation (seconds) |
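Because a generation request can take minutes, it may be worth checking the parameter combination before submitting it. The function check_veo_params below is a hypothetical pre-flight check that encodes only the constraints listed in the table above; it is not part of OMD, and the service may enforce further rules.

```python
VALID_ASPECT_RATIOS = {"16:9", "9:16"}
VALID_RESOLUTIONS = {"720p", "1080p", "4k"}
VALID_DURATIONS = {"4", "6", "8"}


def check_veo_params(aspect_ratio="16:9", resolution="720p", duration_seconds="8"):
    """Hypothetical pre-flight check; returns a list of problems (empty if valid)."""
    problems = []
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        problems.append(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in VALID_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution!r}")
    if duration_seconds not in VALID_DURATIONS:
        problems.append(f"unsupported duration_seconds: {duration_seconds!r}")
    # Per the parameter table, 1080p and 4k are only available for 8-second clips.
    if resolution in {"1080p", "4k"} and duration_seconds != "8":
        problems.append(f'{resolution} requires duration_seconds="8"')
    return problems


default_problems = check_veo_params()
hd_problems = check_veo_params(resolution="1080p", duration_seconds="4")
```

Note that duration_seconds is compared as a string, matching the quoted values in the table.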

Prompt Tips

  • Composition: subject, action, style, camera motion, ambiance
  • Dialogue: A man says "Let's go!" and grabs his coat
  • Sound: thunder rumbling, rain hitting the window
  • Style: cinematic, anime, stop-motion, film noir
  • Camera: dolly shot, aerial view, close-up, POV shot

Configuration

Environment Variables

| Variable | Description | Required |
|---|---|---|
| MODEL_URL | Chat completions endpoint URL | Yes (for client) |
| MODEL_NAME | Model name (e.g. gpt-4, llama-3) | Yes (for client) |
| MODEL_API_TOKEN | API key for authorization | Yes (for client) |
| STABILITY_API_KEY | Stability AI API key | For omd.machinery.sd |
| STABILITY_BASE_URL | Stability AI base URL | For omd.machinery.sd |
| API_BAR_TOKEN | Api-bar gateway token | For omd.machinery.gemini / veo |
| MINIO_* | MinIO connection settings | For media upload (image_utils, veo) |
| MINIO_PRESIGNED_URL_EXPIRES_IN | Presigned URL TTL in seconds (default: 3600) | No |

Project Structure

src/omd/
├── __init__.py
├── agents/
│   ├── __init__.py
│   └── base.py             # BaseAgent — ReAct-style agentic loop with skills
├── clients/
│   ├── __init__.py
│   ├── base.py             # BaseClient — abstract interface
│   ├── apibar.py           # ApiBarClient — OpenAI-compatible implementation
│   └── data_models.py      # Model — Pydantic client config (with from_env)
├── tools/
│   ├── __init__.py
│   ├── data_models.py      # Tool — auto-generated JSON schemas from functions
│   └── file_tools.py       # read_fragment / read_lines / insert / delete
├── skills/
│   ├── __init__.py
│   ├── data_models.py      # Skill — Pydantic skill record
│   ├── parser.py           # SKILL.md front-matter parser
│   ├── registry.py         # SkillRegistry — discovery and lookup
│   ├── router.py           # SkillRouter — LLM-driven selection
│   ├── tools.py            # list_skills / load_skill agent tools
│   └── builtin/            # file-editor, code-reviewer, test-writer
├── machinery/              # Optional: Stability AI / Gemini / Veo
├── prompts/
│   └── base.py             # PromptBase, ListPrompt — system prompt builders
└── utils/
    └── logging_utils.py    # truncate_text, serialize_for_log

Model.from_env() reads MODEL_NAME, MODEL_URL, and MODEL_API_TOKEN lazily, at call time, so importing omd does not require any environment variables to be set.
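The lazy pattern looks roughly like the sketch below. ModelConfig is a hypothetical dataclass stand-in, not OMD's Pydantic Model, and the error handling for missing variables is an assumption; the point is only that the environment is consulted inside the classmethod rather than at import time.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ModelConfig:
    """Hypothetical stand-in for omd's Model; not the real class."""

    name: str
    url: str
    api_token: str

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None) -> "ModelConfig":
        # The environment is read here, at call time, not at import time.
        environ = os.environ if environ is None else environ
        required = ("MODEL_NAME", "MODEL_URL", "MODEL_API_TOKEN")
        missing = [key for key in required if not environ.get(key)]
        if missing:
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            name=environ["MODEL_NAME"],
            url=environ["MODEL_URL"],
            api_token=environ["MODEL_API_TOKEN"],
        )


# Passing a mapping explicitly keeps the example independent of the real environment.
config = ModelConfig.from_env(
    {
        "MODEL_NAME": "gpt-4",
        "MODEL_URL": "https://api.openai.com/v1/chat/completions",
        "MODEL_API_TOKEN": "sk-...",
    }
)
```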

Development

Setup

git clone https://github.com/mrYush/omd.git
cd omd
uv sync --all-extras

Running Tests

uv run pytest

Linting

uv run ruff check src/ tests/
uv run ruff format --check src/ tests/

Roadmap

  • Payload normalization across providers (max_tokens vs max_completion_tokens, etc.)
  • Legacy tool-calling schema support (functions / function_call)
  • Retry logic with exponential backoff
  • Streaming responses
  • Additional client implementations (Anthropic, Google AI, etc.)
  • Richer skill metadata (capabilities, dependencies, examples) and a CLI to scaffold/validate skill packages

License

MIT — Copyright (c) 2026 mrYush
