Framework for reproducible autonomous research loops
Inspired by karpathy/autoresearch, helix generalizes the idea of autonomous AI research loops beyond LLM training. Give an agent a codebase, a metric, and a fixed time budget. It experiments overnight. You wake up to results.
The git history is the research trail. experiments.tsv is the proof. Anyone can clone a helix,
run it on their hardware, and independently verify every result.
Concepts
| Term | Meaning |
|---|---|
| helix | A git repo containing helix.yaml + program.md + a codebase the agent can modify |
| helix.yaml | Machine-readable spec: what to optimize, how to measure it, which files are editable |
| program.md | Human-written instructions for the agent: domain knowledge, constraints, techniques to try |
| experiments.tsv | Append-only ledger of every experiment: commit, metric, status, description |
| helix run | CLI command that launches an autonomous session on your hardware |
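Because experiments.tsv is a plain tab-separated ledger, it can be inspected with standard tooling. The following is a minimal sketch, not helix's own code: it assumes the file has a header row named after the four fields listed above, and that kept experiments carry a status of "keep". Both are assumptions, and the sample rows are invented for illustration.

```python
import csv
import io

# Made-up sample ledger. The header names follow the fields listed in the
# Concepts table; the header row itself and the "keep"/"discard" status
# values are assumptions, not confirmed helix behavior.
SAMPLE_LEDGER = (
    "commit\tmetric\tstatus\tdescription\n"
    "a1b2c3d\t0.812\tkeep\tbaseline\n"
    "d4e5f6a\t0.790\tdiscard\tlarger batch size\n"
    "b7c8d9e\t0.845\tkeep\tadd caching\n"
)


def best_experiment(tsv_text, optimize="maximize", kept_status="keep"):
    """Return the kept row with the best metric, or None if no row qualifies."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in reader if row["status"] == kept_status]
    if not kept:
        return None
    # Pick the highest or lowest metric depending on the optimization direction.
    pick = max if optimize == "maximize" else min
    return pick(kept, key=lambda row: float(row["metric"]))


best = best_experiment(SAMPLE_LEDGER)
print(best["commit"], best["metric"])
```

To check a real ledger, pass the contents of experiments.tsv instead of the sample string, after confirming the actual column names and status values in your helix.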
Quick start
helix is agent-agnostic. Pick a backend or bring your own.
| Backend | Install | Requires |
|---|---|---|
| ClaudeBackend (default) | pip install 'helices[claude]' | Claude Code CLI |
| GeminiBackend | pip install helices | Gemini CLI: npm install -g @google/gemini-cli |
| Custom | pip install helices | Implement the AgentBackend protocol |
Start from a template
helix init my-project --template generic --domain "AI/ML" --description "Optimize X for task Y."
cd my-project
git init
helix run
Run an existing helix
# from within a helix directory (one that has helix.yaml)
helix run # start a session tagged with today's date
helix run --tag exp1 # custom tag
helix status # show current best and recent experiments
Templates
| Template | Description |
|---|---|
| generic | Blank slate: solver.py + evaluate.py. Print score: <value> at the end. |
| ai-inference | LLM inference throughput on WikiText-2. Metrics: tokens_per_sec + bpb. |
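For the generic template, the only contract stated above is that the evaluation prints score: <value> at the end. A rough sketch of what such an evaluate.py might look like follows. The solve function and the test cases are hypothetical stand-ins defined inline so the example runs on its own; in a real helix, the solver would live in solver.py, and the files the template actually generates may differ.

```python
# evaluate.py (sketch): score a candidate solution and report it on the last line.


def solve(xs):
    """Hypothetical stand-in for the function the agent would edit in solver.py."""
    return sorted(xs)


def evaluate():
    """Return the fraction of made-up test cases that solve() gets right."""
    cases = [
        ([3, 1, 2], [1, 2, 3]),
        ([], []),
        ([5, 5, 1], [1, 5, 5]),
    ]
    passed = sum(1 for inputs, expected in cases if solve(inputs) == expected)
    return passed / len(cases)


def format_score(value):
    """Format the metric as a 'score: <value>' line."""
    return f"score: {value:.4f}"


if __name__ == "__main__":
    # The score line is printed last so it is easy to find in the output.
    print(format_score(evaluate()))
```

Keeping the evaluation deterministic and printing a single score line makes runs easier to compare and to re-verify on other hardware.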
Examples
helix-examples is a curated gallery of standalone helices, each in its own repo and included as a git submodule.
git clone --recurse-submodules git@github.com:VectorInstitute/helix-examples.git
cd helix-examples/inference-opt
uv run prepare.py # one-time: download model + dataset
helix run
The first example, helix-inference-opt,
optimizes inference throughput for a causal language model on WikiText-2. The agent modifies
infer.py (batching, quantization, torch.compile, etc.) and automatically merges improvements
back to main.
Writing your own helix
- Create a new git repo.
- Add helix.yaml describing your metric, evaluation command, and editable scope.
- Add program.md with domain-specific instructions for the agent.
- Add your codebase.
- Run helix run.
Minimal helix.yaml:
name: my-helix
domain: AI/ML
description: Optimize X for task Y.
scope:
  editable: [solver.py]
  readonly: [evaluate.py, program.md, helix.yaml]
metrics:
  primary:
    name: accuracy
    optimize: maximize
evaluate:
  command: python evaluate.py
  timeout_seconds: 120
  output_format: pattern
  patterns:
    primary: '^accuracy:\s+([\d.]+)'
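To make the patterns entry concrete, here is a sketch of how a regular expression like the one above could pull the primary metric out of an evaluation command's output, with the timeout applied to the command. This is an illustration under stated assumptions, not helix's implementation: the function names parse_metric and run_evaluation are hypothetical, and it assumes the pattern is applied per line (re.MULTILINE) to standard output.

```python
import re
import shlex
import subprocess

# Same expression as the primary pattern in the minimal helix.yaml.
PATTERN = r"^accuracy:\s+([\d.]+)"


def parse_metric(output, pattern):
    """Return the first captured group as a float, or None if nothing matches."""
    # Assumption: '^' should match at the start of any line, hence MULTILINE.
    match = re.search(pattern, output, flags=re.MULTILINE)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except ValueError:
        # [\d.]+ also matches strings such as "1.2.3", which are not numbers.
        return None


def run_evaluation(command, timeout_seconds, pattern):
    """Run the evaluation command and parse its stdout; None on timeout."""
    try:
        result = subprocess.run(
            shlex.split(command),
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_metric(result.stdout, pattern)


sample_output = "loading data\naccuracy: 0.9132\n"
print(parse_metric(sample_output, PATTERN))  # prints 0.9132
```

Note that this pattern requires the metric name at the start of a line and at least one whitespace character after the colon, so a line such as "val accuracy: 0.5" or "accuracy:0.5" would not match.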