# hitloop

Human-in-the-Loop (HITL) control library for AI agent workflows, with LangGraph integration.
hitloop provides explicit control nodes for human oversight in AI agent workflows, with strong instrumentation for research experiments. Unlike passive monitoring, human approval is a first-class control signal and event in the execution trace.
## Core Concept

```
LLM proposes action → HITL policy decides → Human approves/rejects → Tool executes → Telemetry logs all
```
Human approval is not a UI gimmick. It is:
- A control signal that gates execution
- A first-class event in the trace
- A research artifact for measuring oversight effectiveness
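The flow above can be sketched in a few lines. This is only an illustration of the control-signal idea, not hitloop's actual implementation; every name in it is made up:

```python
# Minimal sketch of a HITL gate (illustration only, not hitloop's code).
def hitl_gate(action, should_request_approval, ask_human, log):
    """Return True if the action may execute."""
    needs_human, reason = should_request_approval(action)
    log({"event": "gate", "tool": action["tool_name"], "reason": reason})
    if not needs_human:
        return True                   # auto-approved: no human in the loop
    approved = ask_human(action)      # blocking human decision
    log({"event": "decision", "approved": approved})
    return approved

events = []
ok = hitl_gate(
    {"tool_name": "send_email"},
    should_request_approval=lambda a: (True, "medium risk"),
    ask_human=lambda a: False,        # the human rejects
    log=events.append,
)
print(ok)  # False -> the tool never executes, and both events are in the trace
```

The point is that the human decision is an ordinary return value and an ordinary trace event, so it can gate execution and be analyzed afterwards.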
## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/ebaenamar/hitloop.git
cd hitloop

# Install with uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .

# For development
pip install -e ".[dev]"
```
### Run an Example

```bash
# Basic workflow (with CLI approval prompts)
python examples/basic_workflow.py

# Auto-approve mode (no prompts)
python examples/basic_workflow.py --auto

# Run a full experiment
python examples/run_experiment.py --n-trials 20
```
## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     LangGraph Workflow                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────┐    ┌──────────┐    ┌──────────────────────┐   │
│  │   LLM    │───►│   HITL   │───►│    Tool Executor     │   │
│  │   Node   │    │   Gate   │    │                      │   │
│  └──────────┘    └────┬─────┘    └──────────────────────┘   │
│                       │                                     │
│                       ▼                                     │
│              ┌────────────────┐                             │
│              │  HITL Policy   │                             │
│              │  ┌──────────┐  │                             │
│              │  │ Approval │  │◄──► Human (CLI/Web/etc)     │
│              │  │ Backend  │  │                             │
│              │  └──────────┘  │                             │
│              └────────────────┘                             │
│                       │                                     │
│                       ▼                                     │
│              ┌────────────────┐                             │
│              │   Telemetry    │───► SQLite / Analysis       │
│              │     Logger     │                             │
│              └────────────────┘                             │
└─────────────────────────────────────────────────────────────┘
```
## API Overview

### Core Models

```python
from hitloop import Action, Decision, RiskClass

# Define an action
action = Action(
    tool_name="send_email",
    tool_args={"recipient": "alice@example.com", "subject": "Hello"},
    risk_class=RiskClass.MEDIUM,
    side_effects=["email_sent"],
    rationale="Sending follow-up email to client",
)

# Decisions from human review
decision = Decision(
    action_id=action.id,
    approved=True,
    reason="Verified recipient is correct",
    decided_by="human:operator",
    latency_ms=1500.0,
)
```
### Policies

Three built-in policies cover different oversight tiers:

```python
from hitloop import AlwaysApprovePolicy, RiskBasedPolicy, AuditPlusEscalatePolicy

# Tier 4: No human oversight (baseline)
policy = AlwaysApprovePolicy()

# Risk-based: require human approval for high-risk actions only
policy = RiskBasedPolicy(
    require_approval_for_high=True,
    require_approval_for_medium=False,
    high_risk_tools=["send_email", "delete_record"],
)

# Audit + Escalate: random sampling plus anomaly detection
policy = AuditPlusEscalatePolicy(
    audit_sample_rate=0.1,  # 10% random audit
    escalate_on_high_risk=True,
    anomaly_signals=["unusual_recipient", "large_amount"],
)
```
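For intuition, the decision logic of an audit-plus-escalate policy can be approximated as below. This is a hand-rolled sketch that mirrors the constructor parameters above, not the library's actual code:

```python
import random

# Sketch of audit-plus-escalate logic (illustrative, not hitloop's implementation).
def should_request_approval(action, audit_sample_rate=0.1,
                            escalate_on_high_risk=True,
                            anomaly_signals=("unusual_recipient", "large_amount"),
                            rng=random.random):
    """Return (needs_human, reason) for a dict-shaped action."""
    if escalate_on_high_risk and action.get("risk_class") == "high":
        return True, "escalated: high risk"
    if any(sig in action.get("signals", []) for sig in anomaly_signals):
        return True, "escalated: anomaly signal"
    if rng() < audit_sample_rate:          # random audit at the sample rate
        return True, "random audit"
    return False, "auto-approved"

# High-risk actions always escalate, regardless of the audit draw:
print(should_request_approval({"risk_class": "high"}, rng=lambda: 0.99))
# (True, 'escalated: high risk')
```

Note the ordering: escalation rules fire before the random audit, so the audit rate only applies to otherwise-unremarkable actions.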
### Adding a New Policy

Create a single file in `src/hitloop/policies/`:

```python
# src/hitloop/policies/my_policy.py
from hitloop.core.interfaces import HITLPolicy
from hitloop.core.models import Action, Decision


class MyCustomPolicy(HITLPolicy):
    def __init__(self, critical_tools: list[str] | None = None):
        self.critical_tools = set(critical_tools or [])

    @property
    def name(self) -> str:
        return "my_custom"

    def should_request_approval(
        self, action: Action, state: dict
    ) -> tuple[bool, str]:
        # Your logic here
        if action.tool_name in self.critical_tools:
            return True, "Critical tool requires approval"
        return False, "Auto-approved"
```
## LangGraph Integration

```python
from langgraph.graph import StateGraph
from hitloop import (
    hitl_gate_node,
    execute_tool_node,
    RiskBasedPolicy,
    CLIBackend,
    TelemetryLogger,
)

# Create nodes
policy = RiskBasedPolicy()
backend = CLIBackend()
logger = TelemetryLogger("traces.db")

gate = hitl_gate_node(policy, backend, logger)
executor = execute_tool_node(tools, logger)  # `tools` is your tool registry

# Build graph (HITLState, llm_node, and should_execute_condition
# are defined by your application)
graph = StateGraph(HITLState)
graph.add_node("llm", llm_node)
graph.add_node("hitl_gate", gate)
graph.add_node("execute", executor)

graph.add_edge("llm", "hitl_gate")
graph.add_conditional_edges(
    "hitl_gate",
    should_execute_condition,
    {"execute": "execute", "skip": "end"},
)
graph.add_edge("execute", "end")
```
## Running Experiments

```python
import asyncio

from hitloop import TelemetryLogger
from hitloop.eval import ExperimentRunner, ExperimentCondition
from hitloop.eval.runner import create_standard_conditions
from hitloop.scenarios import EmailDraftScenario

# Setup
logger = TelemetryLogger("experiment.db")
scenario = EmailDraftScenario()

# Create standard conditions (4 policies × scenarios)
conditions = create_standard_conditions(
    scenario=scenario,
    n_trials=20,
    injection_rate=0.2,  # 20% error injection
)

# Run
runner = ExperimentRunner(logger)
for c in conditions:
    runner.add_condition(c)
asyncio.run(runner.run_all())

# Export
runner.export_results("results.csv", "summary.json")
```
### Output: `results.csv`
| run_id | scenario_id | condition_id | policy_name | task_success | approval_requested | injected_error | error_caught |
|---|---|---|---|---|---|---|---|
| abc123 | email_draft | risk_based | risk_based | 1 | 1 | 0 | 0 |
| def456 | email_draft | risk_based | risk_based | 0 | 1 | 1 | 1 |
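The per-run rows in `results.csv` can be aggregated with nothing beyond the standard library. A minimal sketch (inline sample data stands in for a real file; column names as in the table above):

```python
import csv
import io

# Sample rows shaped like results.csv; in practice, read open("results.csv").
sample = io.StringIO(
    "run_id,injected_error,error_caught\n"
    "abc123,0,0\n"
    "def456,1,1\n"
    "ghi789,1,0\n"
)
# Catch rate is computed only over runs with an injected error.
injected = [r for r in csv.DictReader(sample) if r["injected_error"] == "1"]
catch_rate = sum(r["error_caught"] == "1" for r in injected) / len(injected)
print(catch_rate)  # 0.5: one of the two injected errors was caught
```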
### Output: `summary.json`

```json
{
  "risk_based": {
    "n_runs": 20,
    "success_rate": 0.85,
    "approval_rate": 0.65,
    "error_catch_rate": 0.75,
    "false_reject_rate": 0.05,
    "human_latency_mean_ms": 1200.5
  }
}
```
## Research Alignment

hitloop metrics map directly to the research framework:

| Metric | Research Concept | Description |
|---|---|---|
| `success_rate` | Quality | Task completion rate |
| `approval_rate` | Leverage proxy | Human involvement frequency |
| `error_catch_rate` | Appropriate reliance | Injected error detection |
| `false_reject_rate` | Appropriate reliance | Unnecessary rejections |
| `human_latency_ms` | Human burden | Time cost per decision |
| `cost_proxy` | Efficiency | Token/call overhead |
## Injected Errors for Ground Truth

The error injector provides ground truth for measuring oversight effectiveness:

```python
from hitloop.eval import ErrorInjector, InjectionConfig, InjectionType

injector = ErrorInjector(InjectionConfig(
    injection_rate=0.2,
    injection_types=[
        InjectionType.WRONG_RECIPIENT,
        InjectionType.WRONG_RECORD_ID,
    ],
))

# Every action has known correctness
result = injector.maybe_inject(action)
if result.injected:
    # This action is KNOWN to be wrong:
    #   if the human approves it -> false negative
    #   if the human rejects it  -> true positive (error caught)
    ...
```
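Each (injected, approved) pair then maps onto a standard confusion matrix. A tiny scoring helper (hypothetical, not part of hitloop) makes the bookkeeping explicit:

```python
from collections import Counter

def classify(injected: bool, approved: bool) -> str:
    """Map one trial onto the confusion matrix (hypothetical helper)."""
    if injected:
        # Known-bad action: rejecting it means the error was caught.
        return "true_positive" if not approved else "false_negative"
    # Known-good action: rejecting it is an unnecessary rejection.
    return "true_negative" if approved else "false_positive"

# One trial per cell in this toy sample.
trials = [(True, False), (True, True), (False, True), (False, False)]
counts = Counter(classify(i, a) for i, a in trials)
print(dict(counts))
```

From these counts, `error_catch_rate` is true positives over all injected trials, and `false_reject_rate` is false positives over all clean trials.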
## Project Structure

```
hitloop/
├── src/hitloop/
│   ├── core/
│   │   ├── models.py          # Action, Decision, TraceEvent
│   │   ├── interfaces.py      # ApprovalBackend, HITLPolicy
│   │   └── logger.py          # TelemetryLogger (SQLite)
│   ├── policies/
│   │   ├── always_approve.py
│   │   ├── risk_based.py
│   │   └── audit_plus_escalate.py
│   ├── backends/
│   │   ├── cli_backend.py
│   │   └── humanlayer_backend.py  # Optional
│   ├── langgraph/
│   │   └── nodes.py           # hitl_gate_node, execute_tool_node
│   ├── scenarios/
│   │   ├── email_draft.py
│   │   └── record_update.py
│   └── eval/
│       ├── runner.py          # ExperimentRunner
│       ├── injectors.py       # ErrorInjector
│       └── metrics.py         # MetricsCalculator
├── tests/
├── examples/
├── pyproject.toml
└── README.md
```
## Instrumentation

Every run emits structured events:

```
# Per run
{
  "run_id": "abc123",
  "scenario_id": "email_draft",
  "condition_id": "risk_based",
  "seed": 42
}

# Per action
{
  "action_id": "xyz789",
  "tool_name": "send_email",
  "args_hash": "a1b2c3d4",
  "risk_class": "medium",
  "injected_error": false
}

# Per approval
{
  "channel": "cli",
  "latency_ms": 1250.0,
  "decision": true,
  "decided_by": "human"
}

# Per execution
{
  "success": true,
  "execution_time_ms": 45.2
}
```
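Given event dicts shaped like the per-approval record above, summary statistics such as `human_latency_mean_ms` reduce to one-liners (stdlib sketch over in-memory events, not a query against the SQLite store):

```python
from statistics import mean

# Approval events shaped like the per-approval record above.
approvals = [
    {"channel": "cli", "latency_ms": 1250.0, "decision": True},
    {"channel": "cli", "latency_ms": 950.0, "decision": False},
]
human_latency_mean_ms = mean(e["latency_ms"] for e in approvals)
approval_rate = sum(e["decision"] for e in approvals) / len(approvals)
print(human_latency_mean_ms, approval_rate)  # 1100.0 0.5
```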
## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Type checking
mypy src/hitloop

# Linting
ruff check src/hitloop
ruff format src/hitloop
```
## License

MIT License - see the LICENSE file.