Skip to main content

Lightweight AI Agents SDK for building intelligent automation systems

Project description

MoonLight

Minimal async AI agent framework with zero bloat

Python 3.14+ License: MIT PyPI version PyPI Downloads

[!IMPORTANT] Status: sunset as of 0.3.0. Moonlight isn't actively maintained anymore, and 0.3.0 is the last planned release for a while.

When I started this, a tiny provider-agnostic agent layer was genuinely useful. It's less so now. The big providers (Anthropic, OpenAI, and the rest) ship their own agent SDKs that are more capable and far better supported than anything I'd keep up with on my own.

So Moonlight is a personal research bed now: a small, readable codebase I use to prototype ideas about how agents get built. It's still MIT-licensed and still works, so fork it, learn from it, or build on it.

Moonlight is a lightweight SDK for building AI agents with full control. You get async stateful agents, multimodal input/output (text, images, vision), image generation, structured responses via Pydantic or dataclasses, automatic model validation, and built-in retries. It works with any OpenAI-compatible provider and with Anthropic. No vendor SDKs, no hidden abstractions, no framework bloat.

Installation

pip install moonlight-ai

Works with uv too: uv pip install moonlight-ai.

Quick Start

import asyncio
from moonlight import Provider, Agent, Content

# Configure provider
provider = Provider(
    source="openrouter",  # or "openai", "deepseek", "anthropic", a custom URL
    api="your-api-key"
)

# Create agent
analysis_agent = Agent(
    provider=provider,
    model="qwen/qwen3-4b:free",
    system_role="You are a data analyst"
)

# Analyze some data
data = """
Q1 Sales: $125k, Q2 Sales: $157k, Q3 Sales: $198k, Q4 Sales: $223k
Top product: Widget A (45% revenue), Customer satisfaction: 4.2/5
"""

prompt = Content(f"Analyze this business data and provide key insights:\n{data}")

# Run async
response = asyncio.run(analysis_agent.run(prompt))
print(response.content)
# Output: The business shows strong growth momentum with 78% increase from Q1 to Q4...

Core Features

Structured Output

Return type-safe responses using Pydantic models or dataclasses:

from pydantic import BaseModel
from typing import List
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Entity(BaseModel):
    name: str
    type: str  # person, organization, location, etc.
    mentions: int

class Analysis(BaseModel):
    sentiment: Sentiment
    confidence: float
    key_topics: List[str]
    entities: List[Entity]
    summary: str

sentiment_agent = Agent(
    provider=provider,
    model="qwen/qwen3-4b:free",
    output_schema=Analysis  # Automatic JSON mode + validation
)

text = """
Apple Inc. announced record quarterly earnings today, with CEO Tim Cook 
praising the team's innovation. The iPhone 15 sales exceeded expectations 
in Asian markets, particularly China and India.
"""

result: Analysis = asyncio.run(sentiment_agent.run(Content(f"Analyze this text:\n{text}")))
print(result.sentiment)
# Output: Sentiment.POSITIVE

print(result.confidence)
# Output: 0.92

print(result.entities[0].name)
# Output: Apple Inc.

print(result.summary)
# Output: Apple reports strong earnings driven by iPhone 15 success in Asia

The SDK automatically:

  • Enables JSON mode on the provider
  • Injects schema into system prompt
  • Validates and parses response into your model
  • Handles nested structures and optional fields
  • Self-corrects on validation failure: shows the model its error and retries (schema_retries, default 2) before falling back to the raw response

Tool Calling (Explicit, Schema-Driven)

Moonlight does not let the model execute tools. Instead, the model produces a structured action or parameter object, and your code executes the function. This keeps all control flow in your program and works with any provider or model.

Philosophy

Moonlight follows a simple workflow:

  1. The agent produces structured parameters
  2. Your code executes the function
  3. Your code decides the next step

Basic Example

from pydantic import BaseModel
from typing import Literal
import asyncio

# Define your tool's parameter schema
class SearchParams(BaseModel):
    query: str
    max_results: int = 5
    date_filter: str = "any"

# Create agent with structured output
tool_agent = Agent(
    provider=provider,
    model="qwen/qwen3-coder:free",
    output_schema=SearchParams,
    system_role="You help users search. Output only the search parameters needed."
)

# Get parameters from agent
params = asyncio.run(tool_agent.run(
    Content("Find recent papers about quantum computing, limit to 3 results")
))

print(params.query)
# Output: "quantum computing papers 2024"

print(params.max_results)
# Output: 3

# Execute the actual tool with the parameters
results = search_papers(query=params.query, max_results=params.max_results)
print(results)
# Output: [{'title': 'Advances in Quantum...', 'authors': [...], ...}, ...]

Multi-Step Workflow with Explicit Control

For complex workflows requiring multiple steps, compose agents explicitly:

from enum import Enum
from textwrap import dedent
from typing import Optional

class ActionType(str, Enum):
    SEARCH = "search"
    FETCH = "fetch"
    SUMMARIZE = "summarize"
    DONE = "done"

class ActionPlan(BaseModel):
    action: ActionType
    reasoning: str
    query: Optional[str] = None
    url: Optional[str] = None
    content: Optional[str] = None

system_role = dedent("""
You are a research assistant that plans actions step by step.

Fields explanation:
- action: The type of action to take (SEARCH, FETCH, SUMMARIZE, or DONE)
- reasoning: Brief explanation of why you chose this action
- query: Search query string (only fill when action=SEARCH)
- url: Web URL to fetch (only fill when action=FETCH)
- content: Text content to summarize (only fill when action=SUMMARIZE)

For each response, choose ONE action and fill ONLY the relevant field:
- If action=SEARCH: Fill 'query' with search terms. Leave url and content as None.
- If action=FETCH: Fill 'url' with the web address. Leave query and content as None.
- If action=SUMMARIZE: Fill 'content' with text to summarize. Leave query and url as None.
- If action=DONE: Leave query, url, and content as None.

Always fill 'reasoning' and 'action'. Only fill the one optional field relevant to your chosen action.
""")

planner = Agent(
    provider=provider,
    model="anthropic/claude-opus-4.5",
    output_schema=ActionPlan,
    system_role=system_role
)

# Explicit multi-step loop with full visibility and control
question = "What are the latest breakthroughs in fusion energy?"
context = {"question": question, "findings": []}

for step in range(5):  # Maximum 5 steps
    # Get next action from planner
    plan = asyncio.run(planner.run(
        Content(f"Question: {question}\n\nContext so far: {context}\n\nWhat should we do next?")
    ))
  
    print(f"Step {step + 1}: {plan.action} - {plan.reasoning}")
  
    # Execute action based on plan
    if plan.action == ActionType.SEARCH:
        results = web_search(plan.query)
        context["findings"].append({"type": "search", "query": plan.query, "results": results})
  
    elif plan.action == ActionType.FETCH:
        content = fetch_url(plan.url)
        context["findings"].append({"type": "fetch", "url": plan.url, "content": content})
  
    elif plan.action == ActionType.SUMMARIZE:
        summary = summarize_text(plan.content)
        context["summary"] = summary
  
    elif plan.action == ActionType.DONE:
        print("Research complete")
        break
  
    # Add custom guards and budgets
    if len(context["findings"]) > 10:
        print("Maximum findings reached, stopping")
        break

print(f"Final answer: {context.get('summary', 'No conclusion reached')}")

Why Not Implicit Tool Calling?

Implicit systems hide control flow inside the LLM. Moonlight keeps it explicit:

  • Works with any model (no provider lock-in)
  • Debuggable (inspect every step)
  • Composable (chain agents and tools freely)
  • Controllable (add retries, guards, budgets)
  • Testable (mock tool outputs)

Comparison:

# Implicit: Hidden loop
agent.run(tools=[search, fetch])

# Explicit: Visible control
params = agent.run(...)
result = search(**params)

Explicit control trades minor convenience for production-grade reliability, debuggability, and flexibility.

Multimodal Input

Send images alongside text (URLs, local files, or base64):

response = asyncio.run(agent.run(
    Content(
        text="What's in these images?",
        images=[
            "https://example.com/image.jpg",   # URL
            "/path/to/local/image.png",        # Local file
            "data:image/jpeg;base64,..."       # Base64
        ]
    )
))
print(response.content)
# Output: The first image shows a sunset over mountains...

Images are automatically:

  • Downloaded from URLs (async)
  • Read from disk with proper MIME types
  • Converted to base64 data URIs
  • Validated and filtered

Image Generation

Generate images directly from text prompts using multimodal models:

import asyncio
import base64

image_agent = Agent(
    provider=provider,
    model="google/gemini-3-pro-preview",  # or other image-capable models
    image_gen=True  # Enable image generation mode
)

# Run the image generation prompt
response = asyncio.run(image_agent.run(
    Content("Create a serene mountain landscape at sunset with a lake reflection")
))

# This prints the text description or caption returned by the model (if any)
print(response.content)
# Example output: "A calm mountain lake at sunset with orange and purple skies..."

# If the model returned images, they will be in response.images as base64 data URLs
if response.images:
    for i, img_url in enumerate(response.images):

        # This prints the raw base64 data URL (useful for debugging or logging)
        print(f"Generated image (base64 data URL): {img_url}")

        # Image is in "data:image/...;base64,XXXX" format.
        # strip the header and keep only the base64 part
        img_base64 = img_url.split(",", 1)[1]

        # Decode base64 into raw image bytes
        img_bytes = base64.b64decode(img_base64)

        # Save the decoded image bytes to a PNG file on disk
        out_path = f"generated_{i}.png"
        with open(out_path, "wb") as f:
            f.write(img_bytes)

        # This prints where the image was saved on disk
        print(f"Saved image to {out_path}")

The SDK automatically:

  • Validates model supports image output
  • Enables multimodal mode (["text", "image"])
  • Returns base64-encoded images in response
  • Prevents using image_gen with output_schema (incompatible)

Note: Image generation and structured output schemas are mutually exclusive.

Conversation History

Agents maintain stateful conversation history:

# DeepSeek provider example
deepseek_provider = Provider(source="deepseek", api="your-deepseek-key")
agent = Agent(provider=deepseek_provider, model="deepseek-chat")

# First turn
asyncio.run(agent.run(Content("My name is Alice")))

# Second turn (agent remembers context)
response = asyncio.run(agent.run(Content("What's my name?")))
print(response.content)
# Output: Your name is Alice

# Clear history
agent.clear()

# Update system role mid-conversation
agent.update_system_role("You are now a pirate")

Context Management (Auto-Summarization)

Long conversations are kept within the model's context window automatically. When usage crosses summarize_threshold of the context length (default 0.85), the oldest turns are folded into a running summary that lives in the system role and then dropped, while the most recent keep_recent messages stay verbatim.

agent = Agent(
    provider=provider,
    model="anthropic/claude-opus-4.5",
    summarize_threshold=0.85,  # compact at 85% of the context window (0 disables)
    keep_recent=2,             # keep the last N messages verbatim
)

# Just keep talking. Compaction happens on its own when the window fills up.
for message in conversation:
    await agent.run(Content(message))

It is reactive: the check runs when you send a new message, using the previous turn's token count, and the summary itself is a plain side-call to the same model. It only engages when the provider reports a context length (surfaced as ModelInfo.compactable). For providers that don't report one (such as OpenAI or DeepSeek), it fails open and leaves history untouched.

Web Search & Grounding

Agents can search the web and ground their answers in real sources. With web_search=True, run() first does a short research loop: the model decides whether it actually needs to search and, if so, proposes queries (it searches only when needed and reuses anything already gathered). Moonlight runs each query (DuckDuckGo via ddgs) and fetches the pages (Scrapy), up to max_search_iterations searches. The gathered results are folded into the prompt and answered through the normal flow, so structured output and everything else still apply.

agent = Agent(
    provider=provider,
    model="anthropic/claude-opus-4.5",
    web_search=True,
    max_search_iterations=3,   # cap on searches per run
    max_verify_iterations=2,   # cap on fact-check passes (0 disables)
)

resp = await agent.run(Content("What changed in the latest Python release?"))
print(resp.content)   # answer grounded in the fetched pages

It composes with structured output: set output_schema and the grounded answer comes back as a validated instance.

class Summary(BaseModel):
    headline: str
    points: list[str]

agent = Agent(provider=provider, model="...", web_search=True, output_schema=Summary)
result = await agent.run(Content("Summarize the latest Python release."))
print(result.headline, result.points)

How it works:

  • Searches only when needed: a small JSON decision format lets the model decide whether to search, what to search, or to stop. It works the same on OpenAI-compatible providers and Anthropic.
  • Grounded then answered: the fetched page text is folded into the prompt and the answer runs through the normal flow, so output_schema, persistence, and token tracking all apply.
  • No proof, no claim: the answer is held to the sources. A fact-check pass (up to max_verify_iterations) re-reads the results and drops anything they don't support; if a claim is plausible but unproven it can run one more search to try to confirm it before dropping it. A loose match (a similar name, a shared username) is not treated as proof, and when nothing answers the request the agent says so rather than guessing.
  • Context-light: once the answer is produced, the bulky search results are dropped from history (only the question, a short marker, and the answer are kept), so grounding doesn't bloat later turns.
  • Bounded: at most max_search_iterations searches per run, plus at most max_verify_iterations fact-check passes.

Notes:

  • This is text grounding. Pages are fetched as static HTML, so JavaScript-rendered content can come back thin, and DuckDuckGo can rate-limit. Per-page text is capped to keep token use bounded, so very long pages are clipped.
  • Because grounding is strict, fields that the fetched pages don't actually cover may come back empty rather than filled from the model's own memory. That is intended.
  • web_search cannot be combined with image_gen.

Provider Support

Moonlight speaks two wire formats: the OpenAI-compatible API (/chat/completions) and Anthropic's Messages API (/messages). It picks the right one per provider, and Anthropic is auto-detected from the source, so there's nothing extra to set up.

# Built-in shortcuts
Provider(source="openai",     api="sk-...")
Provider(source="deepseek",   api="sk-...")
Provider(source="openrouter", api="sk-...")
Provider(source="together",   api="...")
Provider(source="groq",       api="gsk-...")
Provider(source="anthropic",  api="sk-ant-...")   # auto-selects the Anthropic format

# Any other OpenAI-compatible endpoint via full URL
Provider(source="http://localhost:11434/v1", api="ollama")  # local Ollama / vLLM
Provider(source="https://generativelanguage.googleapis.com/v1beta/openai/", api="...")  # Google AI
Provider(source="https://api.custom.com/v1", api="key")

Built-in shortcuts: OpenAI, DeepSeek, Together, Groq, OpenRouter, Anthropic. Any other OpenAI-compatible endpoint (Google AI, Hugging Face, local servers, gateways) works via its full URL.

The structured-output and image-generation helpers currently assume OpenAI-compatible providers. Plain chat completions work on Anthropic today.

Model Validation

Agents automatically validate model capabilities on initialization:

agent = Agent(
    provider=provider,
    model="qwen/qwen3-4b:free",  # Checks if model exists in given provider
    max_completion_tokens=8192,  # Validates against model limits
    image_gen=True               # Validates against model limits
)

# Automatically checks:
# - Endpoint compatibility
# - Model exists in provider
# - Context length and max_completion_tokens
# - Input modalities (text, image, audio, video)
# - Output modalities (text, image)
# - Reasoning capability support

Validation prevents runtime errors by checking:

  • Endpoint Compatibility: Checks whether the Provider source has the necessary routes to be compatible
  • Model existence: Ensures the model is available from the provider
  • Token limits: Validates max_completion_tokens doesn't exceed model capacity
  • Modality support: Verifies model supports requested input/output types (images, video, etc.)
  • Image generation: Confirms model can generate images when image_gen=True

Capabilities are read from each provider's /models endpoint and normalized into one shape. Providers report very different metadata (or none at all), so when a capability can't be determined Moonlight fails open and won't block a request on something it couldn't verify. See structures.txt for the per-provider response shapes.

Errors are raised immediately during agent initialization with clear messages:

try:
    agent = Agent(
        provider=provider,
        model="qwen-4b",
        image_gen=True
    )
except AgentError as e:
    print(e)
    # Output: This model does not support image generation

Reliability

Every provider call retries transient failures automatically. That covers network errors and the retryable status codes (408, 429, 500, 502, 503, 504), using exponential backoff with jitter and honoring a server's Retry-After header when present. Permanent errors (400 / 401 / 403 / 404) fail fast instead of burning attempts. Agents get this out of the box (two retries by default), and GetCompletion exposes max_retries and retry_backoff if you call it directly.

Token Tracking

Agents track token usage automatically:

agent = Agent(provider=provider, model="anthropic/claude-opus-4.5")
asyncio.run(agent.run(Content("Hello")))

print(agent.get_total_tokens())
# Output: 156 (total tokens used)

Error Handling

Detailed error messages from providers with intelligent parsing:

response = asyncio.run(agent.run(Content("...")))

if response.error:
    print(f"Error: {response.error}")
    # Output: Error: Rate limited - too many requests
else:
    print(response.content)
    # Output: [normal response content]

Handles:

  • Invalid credentials (401): Expired OAuth tokens, invalid API keys
  • Rate limits (429): Too many requests, retry guidance
  • Content moderation (403): Specific reasons for flagged content
  • Parameter errors (400): Detailed validation messages (e.g., "max_tokens exceeds limit")
  • Insufficient credits (402): Clear payment/billing errors
  • Timeout errors (408): Request took too long
  • Provider errors (502): Model down or invalid response
  • Routing errors (503): No provider meets requirements
  • Provider-specific errors: Parses nested JSON error structures

Error messages include context from provider metadata when available, making debugging easier.

Design Philosophy

Moonlight is intentionally minimal:

  • No framework lock-in: Standard Python async, bring your own orchestration
  • No hidden magic: Direct API calls, explicit control flow
  • No bloat: Zero dependencies on vendor SDKs or heavy frameworks
  • Full control: Access raw responses, customize at any level
  • Provider agnostic: Any OpenAI-compatible API, plus Anthropic

What Moonlight Doesn't Do

To stay lightweight, Moonlight does not include:

  • Multi-agent orchestration and runner engines (sequential, parallel, and data sharing are just await and asyncio.gather over run(), so a dedicated engine would only add a single-use wrapper)
  • Audio and video output (there is no provider-agnostic standard for these over chat completions, unlike text and image, so supporting them would mean per-provider special cases). Image output is supported where the provider returns it.
  • RAG systems or vector databases
  • Streaming responses
  • Observability or logging (in development)
  • MCP (Model Context Protocol) integration (in consideration)

These are left to you or future extensions to keep the core minimal.

Advanced Configuration

agent = Agent(
    provider=provider,
    model="mistralai/devstral-2512:free",
    system_role="You are an expert analyst",
    output_schema=MyModel,    # Optional structured output
    image_gen=False,          # Enable image generation (conflicts with output_schema)
    schema_retries=2,         # Self-correction attempts on schema-validation failure
    summarize_threshold=0.85, # Auto-summarize history near the context limit (0 disables)
    keep_recent=2,            # Recent messages kept verbatim when summarizing
    web_search=False,         # Ground answers in web search (conflicts with image_gen)
    max_search_iterations=3,  # Cap on searches per grounded run
    max_verify_iterations=2,  # Cap on fact-check passes for a grounded answer (0 disables)
    temperature=0.7,
    top_p=0.9,
    top_k=40,
    max_completion_tokens=2048,
    frequency_penalty=0.5,
    presence_penalty=0.5,
    repetition_penalty=1.1
)

# Access history
messages = agent.get_history()

# Token usage
tokens = agent.get_total_tokens()

Supported Parameters:

  • temperature, top_p, top_k: Sampling parameters
  • max_completion_tokens, max_output_tokens: Token limits
  • frequency_penalty, presence_penalty, repetition_penalty: Repetition control
  • tools, tool_choice: Tool calling configuration (planned)
  • plugins: Provider-specific plugins
  • reasoning, verbosity: Control reasoning traces

Agent-level params (not forwarded to the provider): schema_retries, summarize_threshold, keep_recent, web_search, max_search_iterations, max_verify_iterations. See the sections above.

Building From Source

# Clone repo
git clone https://github.com/ecstra/moonlight.git
cd moonlight

# Build distribution
pip install build twine
python -m build

# Install locally
pip install dist/moonlight_ai-*.whl

# Test
python -c "from moonlight import Agent; print('OK')"

Local Setup (No Build)

Moonlight is pure Python, so you can vendor it into a project instead of installing from PyPI. Copy the moonlight/ folder next to your script, install the runtime deps, and import it. It's the same vendored layout test.py in this repo uses.

# from your project root, with the moonlight/ folder copied in
pip install -r requirements.txt   # httpx, pydantic, requests, ddgs, scrapy
# your_script.py  (sits next to the moonlight/ folder)
import asyncio
from moonlight import Provider, Agent, Content

provider = Provider(source="deepseek", api="your-deepseek-key")
agent = Agent(provider=provider, model="deepseek-chat")

response = asyncio.run(agent.run(Content("Hello!")))
print(response.content)

Layout:

your-project/
├── moonlight/        # the copied folder
└── your_script.py

License

MIT License - use freely in personal and commercial projects.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

moonlight_ai-0.3.0.tar.gz (57.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

moonlight_ai-0.3.0-py3-none-any.whl (50.4 kB view details)

Uploaded Python 3

File details

Details for the file moonlight_ai-0.3.0.tar.gz.

File metadata

  • Download URL: moonlight_ai-0.3.0.tar.gz
  • Upload date:
  • Size: 57.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for moonlight_ai-0.3.0.tar.gz
Algorithm Hash digest
SHA256 b4543c614a81c0fa69f6d901bb3b170f118700882e1968e7dc5607faff9d754b
MD5 f23e97bdda4c6b82112d8008d79b9d73
BLAKE2b-256 c29c9a13f49d5c172c1bae39b09a47ec51be3f23e41cef84bc051b71a8a80778

See more details on using hashes here.

File details

Details for the file moonlight_ai-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: moonlight_ai-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 50.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for moonlight_ai-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 765143dc9833c544bb58d229a86d0692c5833a2bd9a4a848542f9bf4873b9f2d
MD5 a0bcf81ee5fc2ac819702f0e8a51f40f
BLAKE2b-256 7a63e0a158987c0a79e817b4930c9f911a2ed21c385a47ef1c8066c2d51f66c3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page