Native reinforcement learning SDK for AI agents using Azure AI Evaluation judge metrics.
azure-agents-learning-sdk
Native reinforcement learning SDK for AI agents. Replaces the
agent-lightning LLM fine-tuning loop with an in-process learner
that optimizes a small, interpretable policy over discrete agent
configuration choices (prompt variants, retrieval-k, tool selection
strategies, …) using Azure AI Evaluation judge metrics as the reward
signal.
Why native?
agent-lightning shipped agent improvement as LLM weight
fine-tuning, which requires Azure OpenAI fine-tune jobs, GPU
infrastructure, and an opaque update cycle. The native SDK takes a
different approach:
- The policy is a softmax distribution over `N` discrete actions (e.g., "use prompt template A", "use template B"). It lives in Python and updates in milliseconds.
- Each episode is judged by three Azure AI Evaluation evaluators (`IntentResolutionEvaluator`, `TaskAdherenceEvaluator`, and `TaskCompletionEvaluator`) whose scores are combined into a single scalar reward.
- A REINFORCE-with-baseline learner updates the policy logits directly from logged episodes. Updates are tiny gradient steps that run on CPU and persist immediately to Cosmos DB.
The result is the same lineage and audit trail that agent-lightning
provided (episodes, rewards, runs, deployments) without the cost or
operational burden of LLM fine-tuning.
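To make the update rule concrete, here is a minimal sketch of a softmax policy and a REINFORCE-with-baseline step in plain Python. It is not the SDK's implementation: `TinySoftmaxPolicy` and `reinforce_update` are hypothetical names, the rewards are made up for illustration, and the `lr` and `decay` defaults simply mirror the documented `AGENT_LEARNING_LR` and `AGENT_LEARNING_BASELINE_DECAY` defaults.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinySoftmaxPolicy:
    """Hypothetical stand-in for the SDK's policy: one logit per discrete action."""

    def __init__(self, action_ids):
        self.action_ids = list(action_ids)
        self.logits = [0.0] * len(self.action_ids)

    def choose(self, rng=random):
        """Sample an action index and return it with its log-probability."""
        probs = softmax(self.logits)
        index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        return index, math.log(probs[index])


def reinforce_update(policy, episodes, baseline, lr=0.05, decay=0.9):
    """Apply one gradient step per logged (action_index, reward) pair.

    Returns the updated exponential-moving-average baseline.
    """
    for action_index, reward in episodes:
        probs = softmax(policy.logits)
        advantage = reward - baseline
        for i, p in enumerate(probs):
            # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - pi(i).
            grad = (1.0 if i == action_index else 0.0) - p
            policy.logits[i] += lr * advantage * grad
        baseline = decay * baseline + (1.0 - decay) * reward
    return baseline


policy = TinySoftmaxPolicy(["concise", "detailed"])
# Illustrative data only: pretend the judges score "detailed" (index 1) higher.
logged = [(0, 0.2), (1, 0.9)] * 50
baseline = reinforce_update(policy, logged, baseline=0.0)
print(softmax(policy.logits))
```

Subtracting the baseline means an action is reinforced only when its reward beats the recent average, which keeps the steps small and reduces variance compared with using raw rewards.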
Architecture
┌──────────────────────────────────────────────────────────┐
│ Orchestrator turn │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ policy.choose() → Action │ │
│ │ EpisodeCapture.start(action_id=…, logprob=…) │ │
│ │ … run agent, record tool calls … │ │
│ │ EpisodeCapture.end(assistant_output=…) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Cosmos DB: episodes, metrics, rewards, policies │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ LearningRunner.run_offline_batch(agent_id) │ │
│ │ ┌─ evaluate (3 judges) │ │
│ │ ├─ shape (weighted sum + penalties → reward) │ │
│ │ ├─ persist per-metric + aggregate rewards │ │
│ │ └─ ReinforceLearner.update(policy, episodes) │ │
│ └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
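The capture step at the top of the diagram can be pictured with a small in-memory sketch. Everything here is hypothetical (`EpisodeRecord`, `InMemoryCapture`, the field names, the tool name); the SDK's real record types and `EpisodeCapture` signature may differ. The point is only that the sampled action and its log-probability are logged when the decision is made, so the offline learner can use them later.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EpisodeRecord:
    """Hypothetical episode shape; the SDK's real record types may differ."""
    episode_id: str
    user_input: str
    action_id: str
    action_logprob: float
    tool_calls: List[str] = field(default_factory=list)
    assistant_output: Optional[str] = None


class InMemoryCapture:
    """Collects finished episodes in a list instead of Cosmos DB."""

    def __init__(self):
        self.episodes: List[EpisodeRecord] = []

    def start(self, user_input, action_id, action_logprob):
        # Log the sampled action and its log-probability at decision time.
        return EpisodeRecord(str(uuid.uuid4()), user_input, action_id, action_logprob)

    def end(self, record, assistant_output):
        # Close the episode and keep it for later offline training.
        record.assistant_output = assistant_output
        self.episodes.append(record)
        return record


capture = InMemoryCapture()
record = capture.start("Summarise Q3 sales", action_id="concise", action_logprob=-0.69)
record.tool_calls.append("sales_db.query")  # hypothetical tool name
capture.end(record, assistant_output="(agent answer)")
print(len(capture.episodes), capture.episodes[0].action_id)
```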
Install
pip install -e .
Configure
The SDK reads its configuration from environment variables. The most important ones are:
| Variable | Purpose | Default |
|---|---|---|
| `AGENT_LEARNING_COSMOS_ENDPOINT` | Cosmos DB account URL (enables persistence) | unset |
| `AGENT_LEARNING_COSMOS_DATABASE` | Database name | `dq-rl` |
| `AGENT_LEARNING_JUDGE_ENDPOINT` | Azure OpenAI endpoint used by the judge | unset |
| `AGENT_LEARNING_JUDGE_DEPLOYMENT` | Judge deployment name | unset |
| `AGENT_LEARNING_W_INTENT` | Weight for intent-resolution reward | `0.4` |
| `AGENT_LEARNING_W_ADHERENCE` | Weight for task-adherence reward | `0.3` |
| `AGENT_LEARNING_W_COMPLETION` | Weight for task-completion reward | `0.3` |
| `AGENT_LEARNING_LR` | REINFORCE learning rate | `0.05` |
| `AGENT_LEARNING_BASELINE_DECAY` | EMA decay on the value baseline | `0.9` |
When the Cosmos endpoint or judge configuration is missing, the SDK falls back to an in-memory store and skips evaluations so unit tests still pass.
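As a rough illustration of how the three weights could turn judge scores into one reward, the sketch below reads the weight variables with their documented defaults and takes a weighted sum. `load_weights` and `shape_reward` are hypothetical helpers, not SDK functions; the sketch assumes scores are already normalised to `[0, 1]` and leaves out penalties, which are not specified here.

```python
import os

# Environment variable name and documented default for each judge metric.
WEIGHT_VARS = {
    "intent": ("AGENT_LEARNING_W_INTENT", 0.4),
    "adherence": ("AGENT_LEARNING_W_ADHERENCE", 0.3),
    "completion": ("AGENT_LEARNING_W_COMPLETION", 0.3),
}


def load_weights(env=None):
    """Read the three reward weights, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        metric: float(env.get(var, default))
        for metric, (var, default) in WEIGHT_VARS.items()
    }


def shape_reward(scores, weights):
    """Weighted sum of per-metric scores (assumed normalised to [0, 1])."""
    return sum(weights[metric] * scores[metric] for metric in weights)


weights = load_weights(env={})  # pass env=None to read os.environ instead
reward = shape_reward({"intent": 1.0, "adherence": 0.5, "completion": 0.0}, weights)
print(f"{reward:.2f}")  # prints 0.55
```

With the default weights summing to 1, a reward built this way stays in the same range as the individual scores, which makes the learning rate easier to reason about.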
Use it
from agent_learning import (
    Action, EpisodeCapture, LearningRunner, SoftmaxPolicy,
)

actions = [
    Action(id="concise"),
    Action(id="detailed"),
]
policy = SoftmaxPolicy.from_actions(actions, agent_id="dq")

# At inference time
decision = policy.choose()
capture = EpisodeCapture()
ctx = capture.start(
    user_input="Summarise Q3 sales",
    policy_id=policy.snapshot().id,
    policy_version=policy.snapshot().version,
    action_id=decision.action.id,
    action_logprob=decision.logprob,
)
# … run your agent, then call capture.end(ctx, assistant_output="…")

# Periodically (cron, manual, event-driven)
runner = LearningRunner(policy=policy)
run = runner.run_offline_batch("dq", episode_limit=500)
The included CLI exposes the same flow:
agent-learn init-policy --agent-id dq --actions ./actions.json
agent-learn train --agent-id dq --limit 500
agent-learn policy --agent-id dq
Layout
src/agent_learning/
├── types.py # Durable record types
├── config.py # Env-driven configuration
├── capture.py # Episode capture hook
├── storage/ # LearningStore (Cosmos + in-memory)
├── metrics/ # IntentResolution/TaskAdherence/TaskCompletion judges
├── rewards/ # Shaping + writer
├── policy/ # SoftmaxPolicy
├── learners/ # REINFORCE
├── training/ # End-to-end runner
└── cli.py # `agent-learn` command-line
Testing
pytest -q
The test suite covers types, the in-memory store, the policy, reward shaping, the REINFORCE learner, and an end-to-end training loop with a stubbed metric evaluator.