Native reinforcement learning SDK for AI agents using Azure AI Evaluation judge metrics.

azure-agents-learning-sdk

Native reinforcement learning SDK for AI agents. It replaces the agent-lightning LLM fine-tuning loop with an in-process learner that optimizes a small, interpretable policy over discrete agent configuration choices (prompt variants, retrieval-k, tool selection strategies, …), using Azure AI Evaluation judge metrics as the reward signal.

Why native?

agent-lightning shipped agent improvement as LLM weight fine-tuning, which requires Azure OpenAI fine-tune jobs, GPU infrastructure, and an opaque update cycle. The native SDK takes a different approach:

- The policy is a softmax distribution over N discrete actions (e.g., "use prompt template A", "use template B"). It lives in Python and updates in milliseconds.
- Each episode is judged by three Azure AI Evaluation evaluators (IntentResolutionEvaluator, TaskAdherenceEvaluator, and TaskCompletionEvaluator), whose scores are combined into a single scalar reward.
- A REINFORCE-with-baseline learner updates the policy logits directly from logged episodes. Updates are tiny gradient steps that run on CPU and persist immediately to Cosmos DB.

The result is the same lineage and audit trail that agent-lightning provided (episodes, rewards, runs, deployments) without the cost or operational burden of LLM fine-tuning.
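
To make the softmax policy and the REINFORCE-with-baseline update concrete, here is a minimal pure-Python sketch. It is not the SDK's implementation: `TinySoftmaxPolicy`, `train_demo`, and the stub rewards are hypothetical names and values chosen for illustration, and the learning rate and baseline decay simply reuse the defaults documented under Configure.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinySoftmaxPolicy:
    """Hypothetical stand-in for a softmax policy (not the SDK's real API)."""

    def __init__(self, action_ids, seed=0):
        self.action_ids = list(action_ids)
        self.logits = [0.0] * len(self.action_ids)  # uniform to start
        self.baseline = 0.0  # exponential moving average of rewards
        self._rng = random.Random(seed)

    def probs(self):
        return softmax(self.logits)

    def choose(self):
        """Sample an action index and return it with its log-probability."""
        p = self.probs()
        index = self._rng.choices(range(len(p)), weights=p)[0]
        return index, math.log(p[index])

    def update(self, index, reward, lr=0.05, baseline_decay=0.9):
        """Apply one REINFORCE-with-baseline step to the logits."""
        advantage = reward - self.baseline
        p = self.probs()
        for k in range(len(self.logits)):
            # d log pi(a) / d logit_k = 1[k == a] - pi(k)
            grad = (1.0 if k == index else 0.0) - p[k]
            self.logits[k] += lr * advantage * grad
        # Refresh the baseline after the step, so the current reward
        # does not leak into its own advantage.
        self.baseline = (
            baseline_decay * self.baseline + (1.0 - baseline_decay) * reward
        )


def train_demo(episodes=2000, seed=0):
    """Train on made-up rewards that stand in for judge scores."""
    policy = TinySoftmaxPolicy(["concise", "detailed"], seed=seed)
    stub_reward = {0: 0.2, 1: 1.0}  # assumption: "detailed" is judged better
    for _ in range(episodes):
        index, _logprob = policy.choose()
        policy.update(index, stub_reward[index])
    return policy
```

Calling `train_demo().probs()` returns the learned action probabilities; under these stub rewards the probability mass should shift toward the second action. Each update touches only N floats, which is why this kind of learner can run on CPU in well under a millisecond per episode.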

Architecture

┌──────────────────────────────────────────────────────────┐
│  Orchestrator turn                                       │
│  ┌─────────────────────────────────────────────────────┐ │
│  │ policy.choose() → Action                            │ │
│  │ EpisodeCapture.start(action_id=…, logprob=…)        │ │
│  │ … run agent, record tool calls …                    │ │
│  │ EpisodeCapture.end(assistant_output=…)              │ │
│  └─────────────────────────────────────────────────────┘ │
│                       │                                  │
│                       ▼                                  │
│  ┌─────────────────────────────────────────────────────┐ │
│  │ Cosmos DB: episodes, metrics, rewards, policies     │ │
│  └─────────────────────────────────────────────────────┘ │
│                       │                                  │
│                       ▼                                  │
│  ┌─────────────────────────────────────────────────────┐ │
│  │ LearningRunner.run_offline_batch(agent_id)          │ │
│  │   ┌─ evaluate (3 judges)                            │ │
│  │   ├─ shape (weighted sum + penalties → reward)       │ │
│  │   ├─ persist per-metric + aggregate rewards          │ │
│  │   └─ ReinforceLearner.update(policy, episodes)       │ │
│  └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘

Install

pip install azure-agents-learning-sdk   # from PyPI
pip install -e .                        # from a source checkout

Configure

The SDK reads its configuration from environment variables. The most important ones are:

Variable	Purpose	Default
`AGENT_LEARNING_COSMOS_ENDPOINT`	Cosmos DB account URL (enables persistence)	unset
`AGENT_LEARNING_COSMOS_DATABASE`	Database name	`dq-rl`
`AGENT_LEARNING_JUDGE_ENDPOINT`	Azure OpenAI endpoint used by the judge	unset
`AGENT_LEARNING_JUDGE_DEPLOYMENT`	Judge deployment name	unset
`AGENT_LEARNING_W_INTENT`	Weight for intent-resolution reward	`0.4`
`AGENT_LEARNING_W_ADHERENCE`	Weight for task-adherence reward	`0.3`
`AGENT_LEARNING_W_COMPLETION`	Weight for task-completion reward	`0.3`
`AGENT_LEARNING_LR`	REINFORCE learning rate	`0.05`
`AGENT_LEARNING_BASELINE_DECAY`	EMA decay on the value baseline	`0.9`

When the Cosmos endpoint is missing, the SDK falls back to an in-memory store; when the judge configuration is missing, it skips evaluations. Either way, unit tests still pass without Azure resources.
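
As an illustration of how the three weight variables could combine judge scores into one scalar reward, here is a small sketch. It assumes each score has already been normalized to [0, 1] and models penalties as a single subtracted number; the SDK's actual normalization and penalty terms are not described here, and `load_weights` and `shape_reward` are hypothetical helper names.

```python
import os

# Environment variable names and defaults as listed in the Configure table.
WEIGHT_VARS = {
    "intent": ("AGENT_LEARNING_W_INTENT", 0.4),
    "adherence": ("AGENT_LEARNING_W_ADHERENCE", 0.3),
    "completion": ("AGENT_LEARNING_W_COMPLETION", 0.3),
}


def load_weights(env=None):
    """Read the three reward weights, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        name: float(env.get(var, default))
        for name, (var, default) in WEIGHT_VARS.items()
    }


def shape_reward(scores, weights, penalty=0.0):
    """Weighted sum of per-metric scores minus an optional penalty.

    Assumption: every score is already in [0, 1].
    """
    total = sum(weights[name] * scores[name] for name in weights)
    return total - penalty
```

For example, with the default weights, scores of 1.0 (intent), 0.5 (adherence), and 0.0 (completion) give 0.4 + 0.15 + 0.0 = 0.55. Passing a plain dict as `env` makes the helper easy to exercise in tests without touching the process environment.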

Use it

from agent_learning import (
    Action, EpisodeCapture, LearningRunner, SoftmaxPolicy,
)

actions = [
    Action(id="concise"),
    Action(id="detailed"),
]
policy = SoftmaxPolicy.from_actions(actions, agent_id="dq")

# At inference time
decision = policy.choose()
capture = EpisodeCapture()
ctx = capture.start(
    user_input="Summarise Q3 sales",
    policy_id=policy.snapshot().id,
    policy_version=policy.snapshot().version,
    action_id=decision.action.id,
    action_logprob=decision.logprob,
)
# … run your agent, then call capture.end(ctx, assistant_output="…")

# Periodically (cron, manual, event-driven)
runner = LearningRunner(policy=policy)
run = runner.run_offline_batch("dq", episode_limit=500)

The included CLI exposes the same flow:

agent-learn init-policy --agent-id dq --actions ./actions.json
agent-learn train --agent-id dq --limit 500
agent-learn policy --agent-id dq

Layout

src/agent_learning/
├── types.py            # Durable record types
├── config.py           # Env-driven configuration
├── capture.py          # Episode capture hook
├── storage/            # LearningStore (Cosmos + in-memory)
├── metrics/            # IntentResolution/TaskAdherence/TaskCompletion judges
├── rewards/            # Shaping + writer
├── policy/             # SoftmaxPolicy
├── learners/           # REINFORCE
├── training/           # End-to-end runner
└── cli.py              # `agent-learn` command-line

Testing

pytest -q

The test suite covers types, the in-memory store, the policy, reward shaping, the REINFORCE learner, and an end-to-end training loop with a stubbed metric evaluator.

Release history

- 0.2.0 (May 17, 2026)
- 0.1.1 (May 16, 2026)
- 0.1.0 (May 15, 2026), this version

Download files

Source Distribution

azure_agents_learning_sdk-0.1.0.tar.gz (37.0 kB)

Uploaded May 15, 2026 Source

Built Distribution

azure_agents_learning_sdk-0.1.0-py3-none-any.whl (41.5 kB)

Uploaded May 15, 2026 Python 3

File details

Details for the file azure_agents_learning_sdk-0.1.0.tar.gz.

File metadata

Download URL: azure_agents_learning_sdk-0.1.0.tar.gz
Upload date: May 15, 2026
Size: 37.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for azure_agents_learning_sdk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`13c2f2ac1310345660658928e99304939968cbf1ef0c5ab3dbc135e90e332cb9`
MD5	`ec5b536ef2e05dd3ebb23faf8c298083`
BLAKE2b-256	`6c32099828bc5727275b27dbb52acb80da7909dd4668431f1eebf5acf936dfb8`
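
One way to check a downloaded file against the SHA256 digest listed above is to hash it locally with Python's standard `hashlib` module. This is a generic sketch rather than anything shipped with the package; `sha256_of` and `matches` are hypothetical helper names, and the file path is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches("azure_agents_learning_sdk-0.1.0.tar.gz", "<SHA256 from the table>")` returns `True` only when the local file is byte-for-byte identical to the one that was hashed at upload time.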

Provenance

The following attestation bundles were made for azure_agents_learning_sdk-0.1.0.tar.gz:

Publisher: publish.yaml on microsoft/azure-agents-learning-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: azure_agents_learning_sdk-0.1.0.tar.gz
- Subject digest: 13c2f2ac1310345660658928e99304939968cbf1ef0c5ab3dbc135e90e332cb9
- Sigstore transparency entry: 1549962726
- Sigstore integration time: May 15, 2026
Source repository:
- Permalink: microsoft/azure-agents-learning-sdk@564860982d453425640e1d3c63ee29799cebb644
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/microsoft
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@564860982d453425640e1d3c63ee29799cebb644
- Trigger Event: push

File details

Details for the file azure_agents_learning_sdk-0.1.0-py3-none-any.whl.

File metadata

Download URL: azure_agents_learning_sdk-0.1.0-py3-none-any.whl
Upload date: May 15, 2026
Size: 41.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for azure_agents_learning_sdk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`511344754a3ae4dc228121cccf81b8ccc0718f62339464ec8bcce79c00be693d`
MD5	`35801507a3b86a521a50a31743b4c49e`
BLAKE2b-256	`f16772f92a68acbb4005d158868e722963e3ee3e066c298b029315fda0eea78d`

Provenance

The following attestation bundles were made for azure_agents_learning_sdk-0.1.0-py3-none-any.whl:

Publisher: publish.yaml on microsoft/azure-agents-learning-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: azure_agents_learning_sdk-0.1.0-py3-none-any.whl
- Subject digest: 511344754a3ae4dc228121cccf81b8ccc0718f62339464ec8bcce79c00be693d
- Sigstore transparency entry: 1549962780
- Sigstore integration time: May 15, 2026
Source repository:
- Permalink: microsoft/azure-agents-learning-sdk@564860982d453425640e1d3c63ee29799cebb644
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/microsoft
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@564860982d453425640e1d3c63ee29799cebb644
- Trigger Event: push
