Production-grade AI agent framework — cost governance, memory, caching, multi-agent teams, and built-in eval

These details have not been verified by PyPI

Project links

Project description

Helix

A Python framework for building production AI agents.

Helix gives you agents that actually behave in production: hard budget limits, semantic caching that cuts API costs by 40-70%, persistent memory, multi-agent teams, YAML-based task pipelines, and a 5-scorer eval suite. It works out of the box with OpenAI, Anthropic, Gemini, Groq, Mistral, and 8 other providers.

The import helix API is intentionally close to what you already know from AutoGen and CrewAI, but with the production layer those frameworks leave to you: cost governance, caching, memory, observability, and safety controls.

Installation
Quickstart
Agents
Tools
Tasks and Pipelines
YAML Configuration
Multi-Agent Teams
Group Chat
Workflows
Sessions
Budget Enforcement
Evaluation
Framework Adapters
CLI
Architecture
Supported Providers
Contributing

Installation

pip install helix-framework                        # core only (pydantic required)
pip install "helix-framework[gemini]"              # + Google Gemini (free tier available)
pip install "helix-framework[openai,anthropic]"    # + OpenAI and Anthropic
pip install "helix-framework[all]"                 # all providers

From source:

git clone https://github.com/sarcasticdhruv/helix-agent
cd helix-agent
pip install -e ".[all]"

API key setup

The easiest way is the persistent config store:

helix config set GOOGLE_API_KEY    "AIza..."    # Gemini, free tier works fine
helix config set OPENAI_API_KEY    "sk-..."
helix config set ANTHROPIC_API_KEY "sk-ant-..."

Keys are saved to ~/.helix/config.json. Helix picks the best available model automatically when multiple keys are set.

Or use environment variables directly:

# Linux / macOS
export GOOGLE_API_KEY="AIza..."

# Windows PowerShell
$env:GOOGLE_API_KEY = "AIza..."

Quickstart

import helix

agent = helix.Agent(
    name="Researcher",
    role="Research analyst",
    goal="Find accurate, cited answers.",
)

result = helix.run(agent, "What is quantum entanglement?")
print(result.output)
print(f"Cost:  ${result.cost_usd:.4f}")
print(f"Steps: {result.steps}")

Inside an async function, call run_async or agent.run directly:

import asyncio
import helix

async def main():
    agent = helix.Agent(
        name="Researcher",
        role="Research analyst",
        goal="Find accurate answers.",
    )
    result = await agent.run("What is quantum entanglement?")
    print(result.output)

asyncio.run(main())

Agents

import helix

agent = helix.Agent(
    name="Analyst",
    role="Senior data analyst",
    goal="Analyze datasets and produce concise summaries.",

    # Optional: rich background context that shapes agent behaviour
    backstory=(
        "You have 8 years of experience in financial data analysis. "
        "You prefer bullet-point summaries over long prose."
    ),

    # Model selection with automatic fallback
    model=helix.ModelConfig(
        primary="gpt-4o",
        fallback_chain=["gpt-4o-mini", "gemini-2.0-flash"],
        temperature=0.3,
    ),

    # Hard cost limit
    budget=helix.BudgetConfig(budget_usd=1.00),
    mode=helix.AgentMode.PRODUCTION,

    # Memory
    memory=helix.MemoryConfig(short_term_limit=20),

    # Semantic caching (40-70% cost reduction on repeated queries)
    cache=helix.CacheConfig(enabled=True, semantic_threshold=0.92),
)

result = helix.run(agent, "Summarize last quarter's sales trends.")

AgentResult fields: output, cost_usd, steps, model_used, cache_hits, cache_savings_usd, tool_calls, run_id, duration_s, trace.

Tools

import helix

@helix.tool(
    description="Search the web for current information.",
    timeout=15.0,
    retries=2,
)
async def web_search(query: str, max_results: int = 5) -> list:
    # your implementation here
    return [{"title": "...", "url": "...", "snippet": "..."}]


@helix.tool(description="Read a file from disk.")
async def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()


agent = helix.Agent(
    name="Researcher",
    role="Research analyst",
    goal="Find answers using web search.",
    tools=[web_search, read_file],
)

result = helix.run(agent, "What are the latest AI headlines?")

Built-in tools (12 included):

import helix.tools.builtin  # registers tools globally

# web_search, fetch_url, read_file, write_file, list_directory,
# calculator, json_query, get_datetime, get_env,
# text_stats, extract_urls, sleep

Tasks and Pipelines

Tasks are first-class declarative units of work. They chain outputs together, support output validation with guardrails, and can write results to files. This is the Helix equivalent of CrewAI's Task + crew.kickoff().

import helix

researcher = helix.Agent(
    name="Researcher",
    role="Research analyst",
    goal="Find accurate information on {topic}.",
    backstory="You specialize in academic and technical research.",
)
writer = helix.Agent(
    name="Writer",
    role="Technical writer",
    goal="Write clear articles on {topic}.",
)

research = helix.Task(
    description="Research the latest advances in {topic}.",
    expected_output="A list of 5 key findings with sources.",
    agent=researcher,
)
article = helix.Task(
    description="Write a 3-paragraph article based on the research.",
    expected_output="A well-structured article, no jargon.",
    agent=writer,
    context=[research],        # automatically receives research output
    output_file="article.md",  # saved to disk when done
)

pipeline = helix.Pipeline(tasks=[research, article])
result = pipeline.kickoff(inputs={"topic": "quantum computing"})
print(result.final_output)
print(f"Total cost: ${result.total_cost_usd:.4f}")

Task options:

Parameter	Description
`context`	List of Tasks whose outputs are passed as context
`output_schema`	Pydantic model for structured output
`guardrail`	Validation function or string description
`guardrails`	List of validation functions (chained)
`guardrail_max_retries`	How many times to retry on validation failure (default 3)
`output_file`	Path to write the task output
`async_execution`	Run this task concurrently with others
`callback`	Called with `TaskOutput` after completion
`markdown`	Instruct the agent to format output as Markdown

Validation with guardrails:

from helix import Task, TaskOutput

def must_be_under_300_words(result: TaskOutput):
    words = len(result.raw.split())
    if words > 300:
        return False, f"Too long: {words} words (max 300)"
    return True, result.raw

task = helix.Task(
    description="Write a short summary of {topic}.",
    expected_output="A summary under 300 words.",
    agent=writer,
    guardrail=must_be_under_300_words,
    guardrail_max_retries=2,
)

You can also pass a plain string and Helix uses the agent's own LLM to validate:

task = helix.Task(
    description="Write a product description for {product}.",
    expected_output="A concise, professional product description.",
    agent=writer,
    guardrail="Must be professional, under 100 words, and avoid superlatives.",
)

Accessing task output:

result = pipeline.kickoff(inputs={"topic": "AI safety"})

for task_output in result.task_outputs:
    print(f"Task:  {task_output.summary}")
    print(f"Raw:   {task_output.raw}")
    if task_output.pydantic:
        print(f"Model: {task_output.pydantic}")

YAML Configuration

Define agents and tasks in YAML files for cleaner project structure:

# agents.yaml
researcher:
  role: Senior Research Analyst
  goal: Find cutting-edge developments in {topic}.
  backstory: You work at a leading tech think tank with access to academic databases.

writer:
  role: Content Strategist
  goal: Write engaging, accurate articles about {topic}.
  backstory: You have 5 years of experience writing technical content for developers.

# tasks.yaml
research_task:
  description: Research the latest developments in {topic}.
  expected_output: A structured report with at least 5 key findings.
  agent: researcher

write_task:
  description: Write a concise article based on the research.
  expected_output: A 3-paragraph article written for a developer audience.
  agent: writer
  context: [research_task]
  output_file: output/article.md

import helix

pipeline = helix.from_yaml(
    "agents.yaml",
    "tasks.yaml",
    inputs={"topic": "large language models"},
)
result = pipeline.kickoff()
print(result.final_output)

Or use the lower-level helpers:

from helix.core.yaml_config import load_agents, load_tasks, load_pipeline

agents   = load_agents("agents.yaml", inputs={"topic": "LLMs"})
tasks    = load_tasks("tasks.yaml", agents, inputs={"topic": "LLMs"})
pipeline = load_pipeline(tasks)
result   = pipeline.kickoff()

Multi-Agent Teams

Teams coordinate multiple agents with three execution strategies.

import helix

searcher = helix.Agent(name="Searcher", role="Web researcher",   goal="Find sources.")
analyst  = helix.Agent(name="Analyst",  role="Data analyst",     goal="Analyze data.")
writer   = helix.Agent(name="Writer",   role="Technical writer", goal="Write reports.")

# sequential: searcher output feeds into analyst, then into writer
team = helix.Team(
    name="research-team",
    agents=[searcher, analyst, writer],
    strategy="sequential",
    budget_usd=5.00,
)

result = team.run_sync("Write a report on renewable energy trends.")
print(result.final_output)
print(f"Total cost: ${result.total_cost_usd:.4f}")

Strategies:

sequential - each agent receives the previous agent's output as its input
parallel - all agents run on the same input concurrently, outputs returned as a list
hierarchical - a lead agent decomposes the task and delegates subtasks to specialists

lead = helix.Agent(name="Lead", role="Project lead", goal="Decompose and delegate tasks.")

team = helix.Team(
    name="product-team",
    agents=[searcher, analyst, writer],
    strategy="hierarchical",
    lead=lead,
)

Group Chat

Group chat puts multiple agents in a shared multi-turn conversation. This is Helix's equivalent of AutoGen's GroupChat.

import asyncio
import helix

ceo    = helix.ConversableAgent(name="CEO",    role="CEO",    goal="Make strategic decisions.")
cto    = helix.ConversableAgent(name="CTO",    role="CTO",    goal="Assess technical risk.")
lawyer = helix.ConversableAgent(name="Lawyer", role="Lawyer", goal="Flag compliance issues.")

chat = helix.GroupChat(
    agents=[ceo, cto, lawyer],
    max_rounds=6,
    speaker_selection="round_robin",  # or "auto", "random", or a callable
    termination_keyword="AGREED",
)

async def main():
    result = await chat.run("Should we migrate our core product to microservices?")
    print(result.transcript())
    print(f"Rounds: {result.rounds}, Cost: ${result.total_cost_usd:.4f}")

asyncio.run(main())

Speaker selection:

Value	Behavior
`round_robin`	Agents speak in order (default)
`auto`	A coordinator LLM picks the most relevant next speaker
`random`	Random selection each round
`callable`	`fn(agents, history) -> Agent`

Termination:

chat = helix.GroupChat(
    agents=[...],
    max_rounds=10,
    termination_keyword="FINAL ANSWER",
    termination_fn=lambda msgs: len(msgs) > 8,
)

Human in the loop:

human = helix.HumanAgent(name="You")   # prompts the terminal each turn

chat = helix.GroupChat(
    agents=[agent1, agent2, human],
    max_rounds=5,
)

Workflows

Workflows are step-based directed pipelines with retry, timeout, fallback, and branching.

import helix

@helix.step(name="search", retry=2, timeout_s=10.0)
async def search_step(query: str) -> list:
    return []  # your search implementation

@helix.step(name="summarise")
async def summarise_step(results: list) -> str:
    return "\n".join(str(r) for r in results)

pipeline = (
    helix.Workflow("research-pipeline")
    .then(search_step)
    .then(summarise_step)
    .with_budget(2.00)
)

result = pipeline.run_sync("quantum computing trends 2025")
print(result.final_output)

Sessions

Sessions give an agent persistent memory across multiple turns.

import asyncio
import helix

async def main():
    agent = helix.Agent(name="Bot", role="Assistant", goal="Help users.")
    session = helix.Session(agent=agent)
    await session.start()

    r1 = await session.send("My name is Alice.")
    r2 = await session.send("What is my name?")   # remembers: Alice
    print(r2.output)

    await session.end()

asyncio.run(main())

Budget Enforcement

import helix

agent = helix.Agent(
    name="Bot",
    role="Assistant",
    goal="Help users.",
    budget=helix.BudgetConfig(
        budget_usd=0.50,
        warn_at_pct=0.8,
        strategy=helix.BudgetStrategy.DEGRADE,  # step down to cheaper model instead of stopping
    ),
    mode=helix.AgentMode.PRODUCTION,
)

try:
    result = helix.run(agent, "Write a 10,000 word essay on climate change...")
except helix.BudgetExceededError as e:
    print(f"Budget hit: ${e.spent_usd:.4f} of ${e.budget_usd:.4f}")

With BudgetStrategy.DEGRADE, Helix steps down through the fallback chain as the budget depletes rather than stopping outright.

Evaluation

import asyncio
import helix
from helix.eval.suite import EvalSuite
from helix.config import EvalCase

suite = EvalSuite("qa-suite")
suite.add_cases([
    EvalCase(
        name="capital_cities",
        input="What is the capital of France?",
        expected_facts=["Paris"],
        max_cost_usd=0.05,
    ),
    EvalCase(
        name="math",
        input="What is 15% of 240?",
        expected_facts=["36"],
        max_cost_usd=0.05,
    ),
])

async def main():
    agent = helix.Agent(name="Bot", role="Assistant", goal="Answer questions accurately.")
    results = await suite.run(agent, verbose=True)
    print(f"Pass rate:  {results.pass_rate:.0%}")
    print(f"Total cost: ${results.total_cost_usd:.4f}")
    suite.assert_pass_rate(0.90)   # raises AssertionError if below 90%

asyncio.run(main())

The eval suite runs 5 scorers per case: factual accuracy, tool usage, trajectory adherence, cost efficiency, and output format.

Framework Adapters

Wrap existing LangChain, CrewAI, or AutoGen code with Helix cost governance:

from langchain_openai import ChatOpenAI
import helix

llm = helix.wrap_llm(ChatOpenAI(model="gpt-4o"), budget_usd=2.00)
# adds budget gate, cost tracking, tracing, and audit log to any LangChain LLM

from crewai import Crew
import helix

crew = Crew(agents=[...], tasks=[...])
wrapped = helix.from_crewai(crew, budget_usd=5.00)
result = await wrapped.run(inputs={"topic": "AI trends"})
print(f"Cost: ${wrapped.cost_usd:.4f}")

CLI

helix doctor                          # check environment and provider keys
helix models                          # list available models with pricing
helix cost --all                      # cost report across all runs
helix trace <run-id>                  # view a run trace
helix trace <run-id> --diff <run-id>  # compare two runs for divergence
helix replay <run-id>                 # interactive failure replay
helix config set KEY value            # set a provider API key

Architecture

helix/
├── core/            Agent, ConversableAgent, GroupChat, Task, Pipeline,
│                    Workflow, Team, Session, Tool
├── memory/          Short-term buffer, WAL-backed long-term store, episodic recall
├── cache/           Semantic cache (tier 1), plan cache (tier 2), prefix cache (tier 3)
├── models/          Router, complexity estimator, 12 provider backends
├── safety/          Cost governor, permission model, guardrails, HITL, audit log
├── context_engine/  Multi-factor token decay, context compactor, preflight estimator
├── eval/            EvalSuite, 5 scorers, trajectory eval, regression gate, monitor
├── observability/   Tracer, ghost debug resolver, failure replay
├── adapters/        LangChain, CrewAI, AutoGen + universal LLM wrapper
├── runtime/         Event loop, worker pool, health checks
└── cli/             doctor, models, cost, trace, replay, config, ...

Supported Providers

Environment variable	Provider	Models	Free tier
`GOOGLE_API_KEY`	Google Gemini	Gemini 2.5 Flash/Pro, 2.0 Flash	Yes
`OPENAI_API_KEY`	OpenAI	GPT-4o, GPT-4o-mini, o1, o3	No
`ANTHROPIC_API_KEY`	Anthropic	Claude Opus/Sonnet/Haiku	No
`GROQ_API_KEY`	Groq	Llama 3, Mixtral, Gemma	Yes
`MISTRAL_API_KEY`	Mistral AI	Mistral Large/Small, Codestral	Partial
`COHERE_API_KEY`	Cohere	Command R+	Partial
`TOGETHER_API_KEY`	Together AI	200+ open-source models	No
`OPENROUTER_API_KEY`	OpenRouter	100+ models	Partial
`DEEPSEEK_API_KEY`	DeepSeek	DeepSeek V3, R1	No
`XAI_API_KEY`	xAI	Grok	No
`PERPLEXITY_API_KEY`	Perplexity	Online search models	No
`FIREWORKS_API_KEY`	Fireworks	Fast open-source inference	No

Set multiple keys and Helix automatically falls back to the next available provider on failure.

Contributing

Read CONTRIBUTING.md before opening a PR.

git clone https://github.com/YOUR_USERNAME/helix-agent
cd helix-agent
pip install -e ".[dev,gemini]"
pytest tests/

Contributors

Name	Role
Dhruv Choudhary	Author and maintainer

License

See CHANGELOG.md for release history.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.3.4

Feb 26, 2026

0.3.3

Feb 24, 2026

This version

0.3.2

Feb 24, 2026

0.3.1

Feb 24, 2026

0.3.0

Feb 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

helix_framework-0.3.2.tar.gz (139.8 kB view details)

Uploaded Feb 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

helix_framework-0.3.2-py3-none-any.whl (161.9 kB view details)

Uploaded Feb 24, 2026 Python 3

File details

Details for the file helix_framework-0.3.2.tar.gz.

File metadata

Download URL: helix_framework-0.3.2.tar.gz
Upload date: Feb 24, 2026
Size: 139.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for helix_framework-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`8e6a06f8bc13c5f8b19f3e8b55bff79b8ae26062dfd5ba23404bcf4f8442a67b`
MD5	`3614106cb789eaf17c773a8219c0159e`
BLAKE2b-256	`eaa1ff211f75c4961272cc885df6aa48dbd97925d044372412fa52873142190f`

See more details on using hashes here.

Provenance

The following attestation bundles were made for helix_framework-0.3.2.tar.gz:

Publisher: publish.yml on sarcasticdhruv/helix-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: helix_framework-0.3.2.tar.gz
- Subject digest: 8e6a06f8bc13c5f8b19f3e8b55bff79b8ae26062dfd5ba23404bcf4f8442a67b
- Sigstore transparency entry: 987272976
- Sigstore integration time: Feb 24, 2026
Source repository:
- Permalink: sarcasticdhruv/helix-agent@3dad7ceafce09fe9d7cd17d1340be784b98fb3da
- Branch / Tag: refs/tags/v0.3.2
- Owner: https://github.com/sarcasticdhruv
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3dad7ceafce09fe9d7cd17d1340be784b98fb3da
- Trigger Event: push

File details

Details for the file helix_framework-0.3.2-py3-none-any.whl.

File metadata

Download URL: helix_framework-0.3.2-py3-none-any.whl
Upload date: Feb 24, 2026
Size: 161.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for helix_framework-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d5a4dcc0d61a9f1015acaa65fc52d6faefe28e0d2e01b2975ab4d2fce9f90faa`
MD5	`6464f5219b9f5fa01424432e90113d19`
BLAKE2b-256	`ae99d6afefe1bbd35e02367018d8123c7ca14abda4db330ea6f55760794a5883`

See more details on using hashes here.

Provenance

The following attestation bundles were made for helix_framework-0.3.2-py3-none-any.whl:

Publisher: publish.yml on sarcasticdhruv/helix-agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: helix_framework-0.3.2-py3-none-any.whl
- Subject digest: d5a4dcc0d61a9f1015acaa65fc52d6faefe28e0d2e01b2975ab4d2fce9f90faa
- Sigstore transparency entry: 987273104
- Sigstore integration time: Feb 24, 2026
Source repository:
- Permalink: sarcasticdhruv/helix-agent@3dad7ceafce09fe9d7cd17d1340be784b98fb3da
- Branch / Tag: refs/tags/v0.3.2
- Owner: https://github.com/sarcasticdhruv
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3dad7ceafce09fe9d7cd17d1340be784b98fb3da
- Trigger Event: push

helix-framework 0.3.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Helix

Table of Contents

Installation

API key setup

Quickstart

Agents

Tools

Tasks and Pipelines

YAML Configuration

Multi-Agent Teams

Group Chat

Workflows

Sessions

Budget Enforcement

Evaluation

Framework Adapters

CLI

Architecture

Supported Providers

Contributing

Contributors

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance