Production-grade AI agent framework — cost governance, memory, caching, multi-agent teams, and built-in eval
Helix
A Python framework for building production AI agents.
Helix gives you agents that actually behave in production: hard budget limits, semantic caching that cuts API costs by 40-70%, persistent memory, multi-agent teams, YAML-based task pipelines, and a 5-scorer eval suite. It works out of the box with OpenAI, Anthropic, Gemini, Groq, Mistral, and 7 other providers.
The import helix API is intentionally close to what you already know from AutoGen and CrewAI, but with the production layer those frameworks leave to you: cost governance, caching, memory, observability, and safety controls.
Table of Contents
- Installation
- Quickstart
- Agents
- Tools
- Tasks and Pipelines
- YAML Configuration
- Multi-Agent Teams
- Group Chat
- Workflows
- Sessions
- Budget Enforcement
- Evaluation
- Framework Adapters
- CLI
- Architecture
- Supported Providers
- Contributing
Installation
pip install helix-framework # core only (pydantic required)
pip install "helix-framework[gemini]" # + Google Gemini (free tier available)
pip install "helix-framework[openai,anthropic]" # + OpenAI and Anthropic
pip install "helix-framework[all]" # all providers
From source:
git clone https://github.com/sarcasticdhruv/helix-agent
cd helix-agent
pip install -e ".[all]"
API key setup
The easiest way is the persistent config store:
helix config set GOOGLE_API_KEY "AIza..." # Gemini, free tier works fine
helix config set OPENAI_API_KEY "sk-..."
helix config set ANTHROPIC_API_KEY "sk-ant-..."
Keys are saved to ~/.helix/config.json. Helix picks the best available model automatically when multiple keys are set.
Or use environment variables directly:
# Linux / macOS
export GOOGLE_API_KEY="AIza..."
# Windows PowerShell
$env:GOOGLE_API_KEY = "AIza..."
Quickstart
import helix
agent = helix.Agent(
name="Researcher",
role="Research analyst",
goal="Find accurate, cited answers.",
)
result = helix.run(agent, "What is quantum entanglement?")
print(result.output)
print(f"Cost: ${result.cost_usd:.4f}")
print(f"Steps: {result.steps}")
Inside an async function, call run_async or agent.run directly:
import asyncio
import helix
async def main():
    agent = helix.Agent(
        name="Researcher",
        role="Research analyst",
        goal="Find accurate answers.",
    )
    result = await agent.run("What is quantum entanglement?")
    print(result.output)
asyncio.run(main())
Agents
import helix
agent = helix.Agent(
name="Analyst",
role="Senior data analyst",
goal="Analyze datasets and produce concise summaries.",
# Optional: rich background context that shapes agent behaviour
backstory=(
"You have 8 years of experience in financial data analysis. "
"You prefer bullet-point summaries over long prose."
),
# Model selection with automatic fallback
model=helix.ModelConfig(
primary="gpt-4o",
fallback_chain=["gpt-4o-mini", "gemini-2.0-flash"],
temperature=0.3,
),
# Hard cost limit
budget=helix.BudgetConfig(budget_usd=1.00),
mode=helix.AgentMode.PRODUCTION,
# Memory
memory=helix.MemoryConfig(short_term_limit=20),
# Semantic caching (40-70% cost reduction on repeated queries)
cache=helix.CacheConfig(enabled=True, semantic_threshold=0.92),
)
result = helix.run(agent, "Summarize last quarter's sales trends.")
AgentResult fields: output, cost_usd, steps, model_used, cache_hits, cache_savings_usd, tool_calls, run_id, duration_s, trace.
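The `semantic_threshold=0.92` setting suggests that a cache hit is decided by how similar a new query is to a cached one. The sketch below illustrates that idea with cosine similarity over toy embedding vectors. `SemanticCache` and `cosine_similarity` are hypothetical names for illustration, not Helix internals, and a real cache would obtain embeddings from a model rather than hand-written lists.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical illustration only; not Helix's actual cache."""

    def __init__(self, threshold: float = 0.92) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def put(self, embedding: list[float], response: str) -> None:
        self._entries.append((embedding, response))

    def get(self, embedding: list[float]) -> Optional[str]:
        """Return the most similar cached response at or above the threshold."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self._entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score >= self.threshold and score > best_score:
                best_score, best_response = score, response
        return best_response


cache = SemanticCache(threshold=0.92)
cache.put([1.0, 0.0, 0.0], "cached answer")
near_hit = cache.get([0.99, 0.05, 0.0])  # nearly parallel vector: served from cache
miss = cache.get([0.0, 1.0, 0.0])        # orthogonal vector: falls through to the model
```

A higher threshold trades fewer cache hits for a lower risk of returning an answer to a question that only looks similar.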
Tools
import helix
@helix.tool(
    description="Search the web for current information.",
    timeout=15.0,
    retries=2,
)
async def web_search(query: str, max_results: int = 5) -> list:
    # your implementation here
    return [{"title": "...", "url": "...", "snippet": "..."}]
@helix.tool(description="Read a file from disk.")
async def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()
agent = helix.Agent(
name="Researcher",
role="Research analyst",
goal="Find answers using web search.",
tools=[web_search, read_file],
)
result = helix.run(agent, "What are the latest AI headlines?")
Built-in tools (12 included):
import helix.tools.builtin # registers tools globally
# web_search, fetch_url, read_file, write_file, list_directory,
# calculator, json_query, get_datetime, get_env,
# text_stats, extract_urls, sleep
Tasks and Pipelines
Tasks are first-class declarative units of work. They chain outputs together, support output validation with guardrails, and can write results to files. This is the Helix equivalent of CrewAI's Task + crew.kickoff().
import helix
researcher = helix.Agent(
name="Researcher",
role="Research analyst",
goal="Find accurate information on {topic}.",
backstory="You specialize in academic and technical research.",
)
writer = helix.Agent(
name="Writer",
role="Technical writer",
goal="Write clear articles on {topic}.",
)
research = helix.Task(
description="Research the latest advances in {topic}.",
expected_output="A list of 5 key findings with sources.",
agent=researcher,
)
article = helix.Task(
description="Write a 3-paragraph article based on the research.",
expected_output="A well-structured article, no jargon.",
agent=writer,
context=[research], # automatically receives research output
output_file="article.md", # saved to disk when done
)
pipeline = helix.Pipeline(tasks=[research, article])
result = pipeline.kickoff(inputs={"topic": "quantum computing"})
print(result.final_output)
print(f"Total cost: ${result.total_cost_usd:.4f}")
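To make the data flow concrete, here is a rough sketch of how `{topic}` placeholders and `context=[...]` could combine into the prompt a task's agent receives. The helpers `render` and `build_prompt` and the prompt layout are assumptions for illustration; the source does not describe how Helix assembles prompts internally.

```python
def render(template: str, inputs: dict[str, str]) -> str:
    """Fill {placeholders} in a task description from the kickoff inputs."""
    return template.format(**inputs)


def build_prompt(description: str, inputs: dict[str, str], context_outputs: list[str]) -> str:
    """Prepend the outputs of upstream tasks to the rendered description."""
    rendered = render(description, inputs)
    if not context_outputs:
        return rendered
    context = "\n".join(f"- {output}" for output in context_outputs)
    return f"Context from earlier tasks:\n{context}\n\nTask: {rendered}"


inputs = {"topic": "quantum computing"}

# First task: no upstream context, only placeholder substitution.
research_prompt = build_prompt("Research the latest advances in {topic}.", inputs, [])

# Second task: receives the first task's output as context.
article_prompt = build_prompt(
    "Write a 3-paragraph article based on the research.",
    inputs,
    ["Finding 1 (placeholder research output)"],
)
```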
Task options:
| Parameter | Description |
|---|---|
| context | List of Tasks whose outputs are passed as context |
| output_schema | Pydantic model for structured output |
| guardrail | Validation function or string description |
| guardrails | List of validation functions (chained) |
| guardrail_max_retries | How many times to retry on validation failure (default 3) |
| output_file | Path to write the task output |
| async_execution | Run this task concurrently with others |
| callback | Called with TaskOutput after completion |
| markdown | Instruct the agent to format output as Markdown |
Validation with guardrails:
from helix import Task, TaskOutput
def must_be_under_300_words(result: TaskOutput):
    words = len(result.raw.split())
    if words > 300:
        return False, f"Too long: {words} words (max 300)"
    return True, result.raw
task = helix.Task(
description="Write a short summary of {topic}.",
expected_output="A summary under 300 words.",
agent=writer,
guardrail=must_be_under_300_words,
guardrail_max_retries=2,
)
You can also pass a plain string, in which case Helix uses the agent's own LLM to validate the output:
task = helix.Task(
description="Write a product description for {product}.",
expected_output="A concise, professional product description.",
agent=writer,
guardrail="Must be professional, under 100 words, and avoid superlatives.",
)
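The function-style guardrail returns a `(passed, value_or_message)` pair, and `guardrail_max_retries` bounds how often the task is retried. A minimal sketch of such a loop follows, assuming the failure message is fed back to the next attempt and that the first attempt does not count as a retry. Both are assumptions; `run_with_guardrail` and `fake_agent` are hypothetical stand-ins, not Helix functions.

```python
from typing import Callable, Tuple

Guardrail = Callable[[str], Tuple[bool, str]]


def run_with_guardrail(
    generate: Callable[[str], str],
    guardrail: Guardrail,
    max_retries: int = 3,
) -> str:
    """Call generate(feedback) until the guardrail passes or retries run out."""
    feedback = ""
    for _attempt in range(max_retries + 1):  # first try plus max_retries retries
        draft = generate(feedback)
        ok, payload = guardrail(draft)
        if ok:
            return payload
        feedback = payload  # the failure message guides the next attempt
    raise ValueError(f"Guardrail still failing after {max_retries} retries: {feedback}")


def under_five_words(text: str) -> Tuple[bool, str]:
    """Same contract as the guardrail above, with a smaller limit for the demo."""
    words = len(text.split())
    if words > 5:
        return False, f"Too long: {words} words (max 5)"
    return True, text


def fake_agent(feedback: str) -> str:
    """Stand-in for an LLM call: shortens its answer once it receives feedback."""
    return "Short summary." if feedback else "This first draft is far too long to pass."


final = run_with_guardrail(fake_agent, under_five_words, max_retries=2)
```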
Accessing task output:
result = pipeline.kickoff(inputs={"topic": "AI safety"})
for task_output in result.task_outputs:
    print(f"Task: {task_output.summary}")
    print(f"Raw: {task_output.raw}")
    if task_output.pydantic:
        print(f"Model: {task_output.pydantic}")
YAML Configuration
Define agents and tasks in YAML files for cleaner project structure:
# agents.yaml
researcher:
  role: Senior Research Analyst
  goal: Find cutting-edge developments in {topic}.
  backstory: You work at a leading tech think tank with access to academic databases.
writer:
  role: Content Strategist
  goal: Write engaging, accurate articles about {topic}.
  backstory: You have 5 years of experience writing technical content for developers.
# tasks.yaml
research_task:
  description: Research the latest developments in {topic}.
  expected_output: A structured report with at least 5 key findings.
  agent: researcher
write_task:
  description: Write a concise article based on the research.
  expected_output: A 3-paragraph article written for a developer audience.
  agent: writer
  context: [research_task]
  output_file: output/article.md
import helix
pipeline = helix.from_yaml(
"agents.yaml",
"tasks.yaml",
inputs={"topic": "large language models"},
)
result = pipeline.kickoff()
print(result.final_output)
Or use the lower-level helpers:
from helix.core.yaml_config import load_agents, load_tasks, load_pipeline
agents = load_agents("agents.yaml", inputs={"topic": "LLMs"})
tasks = load_tasks("tasks.yaml", agents, inputs={"topic": "LLMs"})
pipeline = load_pipeline(tasks)
result = pipeline.kickoff()
Multi-Agent Teams
Teams coordinate multiple agents with three execution strategies.
import helix
searcher = helix.Agent(name="Searcher", role="Web researcher", goal="Find sources.")
analyst = helix.Agent(name="Analyst", role="Data analyst", goal="Analyze data.")
writer = helix.Agent(name="Writer", role="Technical writer", goal="Write reports.")
# sequential: searcher output feeds into analyst, then into writer
team = helix.Team(
name="research-team",
agents=[searcher, analyst, writer],
strategy="sequential",
budget_usd=5.00,
)
result = team.run_sync("Write a report on renewable energy trends.")
print(result.final_output)
print(f"Total cost: ${result.total_cost_usd:.4f}")
Strategies:
- sequential: each agent receives the previous agent's output as its input
- parallel: all agents run on the same input concurrently, outputs returned as a list
- hierarchical: a lead agent decomposes the task and delegates subtasks to specialists
lead = helix.Agent(name="Lead", role="Project lead", goal="Decompose and delegate tasks.")
team = helix.Team(
name="product-team",
agents=[searcher, analyst, writer],
strategy="hierarchical",
lead=lead,
)
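The difference between the sequential and parallel strategies can be sketched with plain `asyncio`. The stub agents below only tag their input instead of calling an LLM, and `make_agent`, `run_sequential`, and `run_parallel` are hypothetical names for illustration, not part of the Helix API.

```python
import asyncio
from typing import Awaitable, Callable

AgentFn = Callable[[str], Awaitable[str]]


def make_agent(name: str) -> AgentFn:
    """Build a stub agent that wraps its input in its own name."""
    async def agent(text: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"{name}({text})"
    return agent


async def run_sequential(agents: list[AgentFn], task: str) -> str:
    """Each agent receives the previous agent's output as its input."""
    output = task
    for agent in agents:
        output = await agent(output)
    return output


async def run_parallel(agents: list[AgentFn], task: str) -> list[str]:
    """All agents receive the same input and run concurrently."""
    return list(await asyncio.gather(*(agent(task) for agent in agents)))


agents = [make_agent("searcher"), make_agent("analyst"), make_agent("writer")]
sequential_result = asyncio.run(run_sequential(agents, "task"))
parallel_result = asyncio.run(run_parallel(agents, "task"))
```

Sequential execution yields one nested result, while parallel execution yields one output per agent in the order the agents were listed.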
Group Chat
Group chat puts multiple agents in a shared multi-turn conversation. This is Helix's equivalent of AutoGen's GroupChat.
import asyncio
import helix
ceo = helix.ConversableAgent(name="CEO", role="CEO", goal="Make strategic decisions.")
cto = helix.ConversableAgent(name="CTO", role="CTO", goal="Assess technical risk.")
lawyer = helix.ConversableAgent(name="Lawyer", role="Lawyer", goal="Flag compliance issues.")
chat = helix.GroupChat(
agents=[ceo, cto, lawyer],
max_rounds=6,
speaker_selection="round_robin", # or "auto", "random", or a callable
termination_keyword="AGREED",
)
async def main():
    result = await chat.run("Should we migrate our core product to microservices?")
    print(result.transcript())
    print(f"Rounds: {result.rounds}, Cost: ${result.total_cost_usd:.4f}")
asyncio.run(main())
Speaker selection:
| Value | Behavior |
|---|---|
| round_robin | Agents speak in order (default) |
| auto | A coordinator LLM picks the most relevant next speaker |
| random | Random selection each round |
| callable | fn(agents, history) -> Agent |
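As an example of the callable form, the sketch below picks whichever agent spoke longest ago. It follows the `fn(agents, history) -> Agent` shape from the table, but the history format (a list of dicts with a `"speaker"` key) and the `StubAgent` class are assumptions made so the example runs on its own. Check the actual message type before adapting it.

```python
from dataclasses import dataclass


@dataclass
class StubAgent:
    """Stand-in for a conversable agent; only the name matters here."""
    name: str


def least_recent_speaker(agents: list[StubAgent], history: list[dict]) -> StubAgent:
    """Pick the agent who spoke longest ago; agents who never spoke go first."""
    last_turn = {agent.name: -1 for agent in agents}
    for turn, message in enumerate(history):
        if message["speaker"] in last_turn:
            last_turn[message["speaker"]] = turn
    return min(agents, key=lambda agent: last_turn[agent.name])


agents = [StubAgent("CEO"), StubAgent("CTO"), StubAgent("Lawyer")]
history = [
    {"speaker": "CEO", "content": "Should we migrate?"},
    {"speaker": "CTO", "content": "It is risky."},
    {"speaker": "CEO", "content": "How risky?"},
]
next_speaker = least_recent_speaker(agents, history)  # the Lawyer has not spoken yet
```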
Termination:
chat = helix.GroupChat(
agents=[...],
max_rounds=10,
termination_keyword="FINAL ANSWER",
termination_fn=lambda msgs: len(msgs) > 8,
)
Human in the loop:
human = helix.HumanAgent(name="You") # prompts the terminal each turn
chat = helix.GroupChat(
agents=[agent1, agent2, human],
max_rounds=5,
)
Workflows
Workflows are step-based directed pipelines with retry, timeout, fallback, and branching.
import helix
@helix.step(name="search", retry=2, timeout_s=10.0)
async def search_step(query: str) -> list:
    return []  # your search implementation
@helix.step(name="summarise")
async def summarise_step(results: list) -> str:
    return "\n".join(str(r) for r in results)
pipeline = (
helix.Workflow("research-pipeline")
.then(search_step)
.then(summarise_step)
.with_budget(2.00)
)
result = pipeline.run_sync("quantum computing trends 2025")
print(result.final_output)
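The `retry` and `timeout_s` options on a step can be pictured as a small wrapper around the step function. This is a sketch of the general technique using `asyncio.wait_for`, not Helix's implementation. `run_step` and `flaky_search` are hypothetical, and retrying on every exception type is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def run_step(
    fn: Callable[..., Awaitable[Any]],
    *args: Any,
    retry: int = 0,
    timeout_s: Optional[float] = None,
) -> Any:
    """Run fn(*args), retrying up to `retry` extra times on error or timeout."""
    last_error: Optional[Exception] = None
    for _attempt in range(retry + 1):
        try:
            # wait_for cancels the call and raises a timeout error after timeout_s seconds
            return await asyncio.wait_for(fn(*args), timeout=timeout_s)
        except Exception as error:  # includes the timeout raised by wait_for
            last_error = error
    raise RuntimeError(f"step failed after {retry + 1} attempts") from last_error


calls = {"count": 0}


async def flaky_search(query: str) -> list:
    """Fails on the first call and succeeds on the second."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("transient failure")
    return [f"result for {query}"]


results = asyncio.run(run_step(flaky_search, "quantum", retry=2, timeout_s=1.0))
```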
Sessions
Sessions give an agent persistent memory across multiple turns.
import asyncio
import helix
async def main():
    agent = helix.Agent(name="Bot", role="Assistant", goal="Help users.")
    session = helix.Session(agent=agent)
    await session.start()
    r1 = await session.send("My name is Alice.")
    r2 = await session.send("What is my name?")  # remembers: Alice
    print(r2.output)
    await session.end()
asyncio.run(main())
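One way to picture the short-term side of session memory, and the `short_term_limit` setting shown in the Agents section, is a bounded buffer that drops the oldest messages first. The sketch below is an assumption about the general technique. `ShortTermMemory` is a hypothetical class, and the source does not say how Helix trims or persists its buffer.

```python
from collections import deque


class ShortTermMemory:
    """Hypothetical bounded buffer; the oldest messages are dropped first."""

    def __init__(self, limit: int = 20) -> None:
        self._messages: deque = deque(maxlen=limit)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def as_prompt(self) -> str:
        """Render the retained messages as context for the next model call."""
        return "\n".join(f"{m['role']}: {m['content']}" for m in self._messages)


memory = ShortTermMemory(limit=2)
memory.add("user", "My name is Alice.")
memory.add("assistant", "Nice to meet you, Alice.")
memory.add("user", "What is my name?")  # pushes the first message out of the buffer
prompt = memory.as_prompt()
```

With a limit this small the user's original message is already gone, which is why facts that must survive long conversations belong in a longer-term store rather than the short-term buffer.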
Budget Enforcement
import helix
agent = helix.Agent(
name="Bot",
role="Assistant",
goal="Help users.",
budget=helix.BudgetConfig(
budget_usd=0.50,
warn_at_pct=0.8,
strategy=helix.BudgetStrategy.DEGRADE, # step down to cheaper model instead of stopping
),
mode=helix.AgentMode.PRODUCTION,
)
try:
    result = helix.run(agent, "Write a 10,000 word essay on climate change...")
except helix.BudgetExceededError as e:
    print(f"Budget hit: ${e.spent_usd:.4f} of ${e.budget_usd:.4f}")
With BudgetStrategy.DEGRADE, Helix steps down through the fallback chain as the budget depletes rather than stopping outright.
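A degrade strategy of this kind can be sketched as a function from the fraction of budget spent to a position in the model chain. The even split of the budget across the chain below is purely an assumption for illustration, since the source does not state the thresholds Helix uses, and `pick_model` is a hypothetical helper.

```python
def pick_model(chain: list[str], spent_usd: float, budget_usd: float) -> str:
    """Map the fraction of budget spent onto a position in the model chain."""
    if spent_usd >= budget_usd:
        raise RuntimeError(f"Budget hit: ${spent_usd:.4f} of ${budget_usd:.4f}")
    fraction_spent = spent_usd / budget_usd
    # Assumed policy: each model gets an equal slice of the budget.
    index = min(int(fraction_spent * len(chain)), len(chain) - 1)
    return chain[index]


chain = ["gpt-4o", "gpt-4o-mini", "gemini-2.0-flash"]  # primary first, then fallbacks
early = pick_model(chain, spent_usd=0.05, budget_usd=0.50)   # 10% spent
middle = pick_model(chain, spent_usd=0.20, budget_usd=0.50)  # 40% spent
late = pick_model(chain, spent_usd=0.45, budget_usd=0.50)    # 90% spent
```

Under this policy the agent starts on the primary model, moves to the first fallback after a third of the budget is gone, and only fails once the budget is fully spent.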
Evaluation
import asyncio
import helix
from helix.eval.suite import EvalSuite
from helix.config import EvalCase
suite = EvalSuite("qa-suite")
suite.add_cases([
EvalCase(
name="capital_cities",
input="What is the capital of France?",
expected_facts=["Paris"],
max_cost_usd=0.05,
),
EvalCase(
name="math",
input="What is 15% of 240?",
expected_facts=["36"],
max_cost_usd=0.05,
),
])
async def main():
    agent = helix.Agent(name="Bot", role="Assistant", goal="Answer questions accurately.")
    results = await suite.run(agent, verbose=True)
    print(f"Pass rate: {results.pass_rate:.0%}")
    print(f"Total cost: ${results.total_cost_usd:.4f}")
    suite.assert_pass_rate(0.90)  # raises AssertionError if below 90%
asyncio.run(main())
The eval suite runs 5 scorers per case: factual accuracy, tool usage, trajectory adherence, cost efficiency, and output format.
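To give a feel for what a factual accuracy scorer might do with `expected_facts`, here is a deliberately simple sketch based on case-insensitive substring matching. The `Case` dataclass, `factual_score`, and `pass_rate` are hypothetical; Helix's actual scorers are not described in the source and are likely more sophisticated.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """Hypothetical stand-in mirroring the EvalCase fields used above."""
    name: str
    input: str
    expected_facts: list[str] = field(default_factory=list)


def factual_score(output: str, expected_facts: list[str]) -> float:
    """Fraction of expected facts found in the output (case-insensitive)."""
    if not expected_facts:
        return 1.0
    lowered = output.lower()
    found = sum(1 for fact in expected_facts if fact.lower() in lowered)
    return found / len(expected_facts)


def pass_rate(cases: list[Case], outputs: dict[str, str], threshold: float = 1.0) -> float:
    """Share of cases whose factual score reaches the threshold."""
    passed = sum(
        1 for case in cases
        if factual_score(outputs.get(case.name, ""), case.expected_facts) >= threshold
    )
    return passed / len(cases)


cases = [
    Case("capital_cities", "What is the capital of France?", ["Paris"]),
    Case("math", "What is 15% of 240?", ["36"]),
]
outputs = {
    "capital_cities": "The capital of France is Paris.",
    "math": "15% of 240 is 35.",  # deliberately wrong so one case fails
}
rate = pass_rate(cases, outputs)  # one of two cases passes
```

Substring matching is easy to fool (an output of "360" would match the fact "36"), which is one reason to combine several scorers rather than rely on a single check.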
Framework Adapters
Wrap existing LangChain, CrewAI, or AutoGen code with Helix cost governance:
from langchain_openai import ChatOpenAI
import helix
llm = helix.wrap_llm(ChatOpenAI(model="gpt-4o"), budget_usd=2.00)
# adds budget gate, cost tracking, tracing, and audit log to any LangChain LLM
from crewai import Crew
import helix
crew = Crew(agents=[...], tasks=[...])
wrapped = helix.from_crewai(crew, budget_usd=5.00)
# await needs an async context: call this from inside an async function
result = await wrapped.run(inputs={"topic": "AI trends"})
print(f"Cost: ${wrapped.cost_usd:.4f}")
CLI
helix doctor # check environment and provider keys
helix models # list available models with pricing
helix cost --all # cost report across all runs
helix trace <run-id> # view a run trace
helix trace <run-id> --diff <run-id> # compare two runs for divergence
helix replay <run-id> # interactive failure replay
helix config set KEY value # set a provider API key
Architecture
helix/
├── core/ Agent, ConversableAgent, GroupChat, Task, Pipeline,
│ Workflow, Team, Session, Tool
├── memory/ Short-term buffer, WAL-backed long-term store, episodic recall
├── cache/ Semantic cache (tier 1), plan cache (tier 2), prefix cache (tier 3)
├── models/ Router, complexity estimator, 12 provider backends
├── safety/ Cost governor, permission model, guardrails, HITL, audit log
├── context_engine/ Multi-factor token decay, context compactor, preflight estimator
├── eval/ EvalSuite, 5 scorers, trajectory eval, regression gate, monitor
├── observability/ Tracer, ghost debug resolver, failure replay
├── adapters/ LangChain, CrewAI, AutoGen + universal LLM wrapper
├── runtime/ Event loop, worker pool, health checks
└── cli/ doctor, models, cost, trace, replay, config, ...
Supported Providers
| Environment variable | Provider | Models | Free tier |
|---|---|---|---|
| GOOGLE_API_KEY | Google Gemini | Gemini 2.5 Flash/Pro, 2.0 Flash | Yes |
| OPENAI_API_KEY | OpenAI | GPT-4o, GPT-4o-mini, o1, o3 | No |
| ANTHROPIC_API_KEY | Anthropic | Claude Opus/Sonnet/Haiku | No |
| GROQ_API_KEY | Groq | Llama 3, Mixtral, Gemma | Yes |
| MISTRAL_API_KEY | Mistral AI | Mistral Large/Small, Codestral | Partial |
| COHERE_API_KEY | Cohere | Command R+ | Partial |
| TOGETHER_API_KEY | Together AI | 200+ open-source models | No |
| OPENROUTER_API_KEY | OpenRouter | 100+ models | Partial |
| DEEPSEEK_API_KEY | DeepSeek | DeepSeek V3, R1 | No |
| XAI_API_KEY | xAI | Grok | No |
| PERPLEXITY_API_KEY | Perplexity | Online search models | No |
| FIREWORKS_API_KEY | Fireworks | Fast open-source inference | No |
Set multiple keys and Helix automatically falls back to the next available provider on failure.
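The fallback behaviour can be sketched as two steps: find which providers have a key set, then try them in order until one succeeds. The environment variable names come from the table above, but the provider ordering and the helpers `available_providers`, `call_with_fallback`, and `fake_call` are assumptions for illustration rather than Helix's routing logic.

```python
import os
from typing import Callable, Mapping, Optional

# Ordering is an assumption; the variable names come from the provider table.
PROVIDER_KEYS = {
    "GOOGLE_API_KEY": "gemini",
    "OPENAI_API_KEY": "openai",
    "ANTHROPIC_API_KEY": "anthropic",
    "GROQ_API_KEY": "groq",
}


def available_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List providers whose API key is present and non-empty."""
    env = os.environ if env is None else env
    return [name for key, name in PROVIDER_KEYS.items() if env.get(key)]


def call_with_fallback(providers: list[str], call: Callable[[str], str]) -> str:
    """Try each provider in turn and return the first successful response."""
    errors = []
    for provider in providers:
        try:
            return call(provider)
        except Exception as error:  # a real client would catch narrower errors
            errors.append(f"{provider}: {error}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def fake_call(provider: str) -> str:
    """Stand-in for an API request; the first provider is simulated as down."""
    if provider == "gemini":
        raise ConnectionError("simulated outage")
    return f"answer from {provider}"


env = {"GOOGLE_API_KEY": "example-key", "ANTHROPIC_API_KEY": "example-key"}
providers = available_providers(env)
answer = call_with_fallback(providers, fake_call)
```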
Contributing
Read CONTRIBUTING.md before opening a PR.
git clone https://github.com/YOUR_USERNAME/helix-agent
cd helix-agent
pip install -e ".[dev,gemini]"
pytest tests/
Contributors
| Name | Role |
|---|---|
| Dhruv Choudhary | Author and maintainer |
License
Apache License 2.0. Copyright 2026 Dhruv Choudhary.
See CHANGELOG.md for release history.