FastICRL
In-Context Reinforcement Learning for LLMs — no fine-tuning, no gradient updates, no GPU.
FastICRL implements the ICRL paradigm from Reward Is Enough: LLMs Are In-Context Reinforcement Learners (Song et al., 2025). A learner LLM improves its outputs purely by reading its own history of attempts and rewards inside the context window — guided by a meta-cognitive strategist. No training, no infrastructure, just inference.
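To make "learning from the context window" concrete, here is a minimal sketch of how a history of attempts and rewards could be rendered into a prompt. The `Attempt` fields mirror the (task, output, score) triple described in this README, but `build_learner_prompt` and the prompt wording are hypothetical; they are not FastICRL's actual prompt template.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One past episode: the task, what the learner produced, and its reward."""
    task: str
    output: str
    score: float  # assumed to be on the 0-10 scale used by the reward agent


def build_learner_prompt(task: str, history: list[Attempt], strategy: str = "") -> str:
    """Render the strategy and the scored history into one prompt (hypothetical format)."""
    lines = []
    if strategy:
        lines.append(f"Current strategy: {strategy}")
    lines.append("Previous attempts and their rewards:")
    for i, attempt in enumerate(history, start=1):
        lines.append(
            f"{i}. task={attempt.task!r} output={attempt.output!r} reward={attempt.score}/10"
        )
    lines.append(f"New task: {task}")
    lines.append("Write an output that should earn a higher reward than the attempts above.")
    return "\n".join(lines)


history = [
    Attempt("Portable espresso maker", "Makes coffee.", 3.0),
    Attempt("Portable espresso maker", "Barista-grade espresso anywhere, in 90 seconds.", 8.0),
]
prompt = build_learner_prompt(
    "Ergonomic standing desk", history, strategy="Lead with a concrete benefit."
)
print(prompt)
```

Because the model weights never change, everything the learner "knows" about past rewards has to fit in this text, which is why the history and the strategy are the only state worth saving.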
How it works
Three LLM agents collaborate in a feedback loop:
┌──────────────────────────────────────────────────┐
│                    ICRLLearner                   │
│                                                  │
│  Task ──► Learner ──► Output ──► Reward Agent    │
│              ▲                         │         │
│              │         Attempt         │         │
│              │  (task, output, score)  │         │
│              └─────────────────────────┘         │
│                           │                      │
│                  (every N episodes)              │
│                           ▼                      │
│                      Strategist                  │
│                (refines the strategy)            │
└──────────────────────────────────────────────────┘
| Agent | Role |
|---|---|
| Learner | Generates task outputs; balances exploration vs. exploitation based on reward history |
| Reward | Scores each output on a 0–10 scale (acts as the reward function) |
| Strategist | Analyzes past attempts to synthesize actionable strategies for future episodes |
Each agent can be backed by a different model — e.g. a cheap model for reward, a powerful one for the learner.
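The feedback loop in the diagram can be sketched in plain Python with stub functions standing in for the three LLM agents. This is an illustration under stated assumptions, not FastICRL's implementation: `run_icrl_loop` and the stub agents are hypothetical names, one episode is assumed to handle one task, and scores are clamped to the 0–10 range.

```python
from typing import Callable

AttemptTuple = tuple[str, str, float]  # (task, output, score)


def run_icrl_loop(
    tasks: list[str],
    learner: Callable[[str, list[AttemptTuple], str], str],
    reward: Callable[[str, str], float],
    strategist: Callable[[list[AttemptTuple], str], str],
    episodes: int,
    strategy_update_interval: int,
) -> tuple[list[AttemptTuple], str]:
    """Run the learner -> reward -> (periodic) strategist loop and return the final state."""
    buffer: list[AttemptTuple] = []
    strategy = ""
    for episode in range(1, episodes + 1):
        task = tasks[(episode - 1) % len(tasks)]  # cycle through the task list
        output = learner(task, buffer, strategy)
        score = min(10.0, max(0.0, reward(task, output)))  # keep the score within 0-10
        buffer.append((task, output, score))
        if episode % strategy_update_interval == 0:
            strategy = strategist(buffer, strategy)
    return buffer, strategy


# Stub agents: a real setup would call an LLM in each of these.
def stub_learner(task: str, buffer: list[AttemptTuple], strategy: str) -> str:
    return f"{task}: draft" + (" (following strategy)" if strategy else "")


def stub_reward(task: str, output: str) -> float:
    return 8.0 if "strategy" in output else 4.0


def stub_strategist(buffer: list[AttemptTuple], strategy: str) -> str:
    best = max(buffer, key=lambda attempt: attempt[2])
    return f"Imitate attempts like {best[1]!r}"


buffer, strategy = run_icrl_loop(
    ["desk", "headphones"],
    stub_learner,
    stub_reward,
    stub_strategist,
    episodes=4,
    strategy_update_interval=2,
)
print([score for _, _, score in buffer])
print(strategy)
```

Passing the agents in as separate callables mirrors the point above: nothing in the loop requires the learner, the reward agent, and the strategist to share a model.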
Installation
pip install fasticrl
Or with uv:
uv add fasticrl
Model provider extras (install whichever you use):
pip install "fasticrl[openai]" # OpenAI
pip install "fasticrl[ollama]" # Ollama (local models)
Requires Python ≥ 3.13.
Quick start
from fasticrl import ICRLLearner
from agno.models.openai import OpenAIChat

model = OpenAIChat(id="gpt-4o-mini")

learner = ICRLLearner(
    learner_model=model,
    reward_model=model,
    strategy_model=model,
    task_description="Write a concise, compelling product description for an e-commerce listing.",
    tasks=[
        "Wireless noise-cancelling headphones",
        "Ergonomic standing desk",
        "Portable espresso maker",
    ],
)

# Run 3 episodes with batch_size=2 (tasks run in parallel), refresh the
# strategy every 2 episodes, and show a progress bar
learner.auto_learn(episodes=3, batch_size=2, cli_mode=True, strategy_update_interval=2)

# Inspect what the agent learned
print(learner.strategy)
API
ICRLLearner
ICRLLearner(
    learner_model,     # agno Model for the learner agent
    reward_model,      # agno Model for the reward agent
    strategy_model,    # agno Model for the strategist agent
    task_description,  # describes the overall task domain (required)
    tasks,             # list of concrete task instances to cycle through
    buffer,            # optional: pre-loaded list of Attempt objects
    strategy,          # optional: pre-loaded strategy string
)
Key methods
| Method | Description |
|---|---|
| auto_learn(episodes, batch_size, cli_mode, strategy_update_interval) | Run N episodes. batch_size > 1 parallelizes tasks with a thread pool. cli_mode=True shows a progress bar. strategy_update_interval=K refreshes the strategy every K episodes. |
| generate_action(task) | Run the learner on a single task and return its output |
| generate_reward(task, action) | Score a learner output with the reward agent |
| generate_attempt_by_present_task() | Single step: generate + score the current task |
| update_strategy() | Ask the strategist to refine the strategy from the current buffer |
| to_yaml(path) | Persist the full agent state (buffer + strategy) to a YAML file |
| ICRLLearner.from_yaml(path, ...) | Resume from a saved state |
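As a rough illustration of what "parallelizes tasks with a thread pool" means, the sketch below runs a stubbed generate-and-score step across a batch with `concurrent.futures.ThreadPoolExecutor`. The names `attempt` and `run_batch` are hypothetical and the constant score is a placeholder; how FastICRL schedules its batches internally may differ. Threads fit this workload because each step mostly waits on a network call to a model.

```python
from concurrent.futures import ThreadPoolExecutor


def attempt(task: str) -> tuple[str, str, float]:
    """Stand-in for one generate + score step; a real step would call two LLMs."""
    output = f"Description of {task}"
    score = 5.0  # placeholder reward, not a real evaluation
    return task, output, score


def run_batch(tasks: list[str], batch_size: int) -> list[tuple[str, str, float]]:
    """Run up to `batch_size` attempts concurrently and return them in task order."""
    if batch_size <= 1:
        return [attempt(task) for task in tasks]
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        # pool.map yields results in input order, even if calls finish out of order.
        return list(pool.map(attempt, tasks))


results = run_batch(
    ["Wireless noise-cancelling headphones", "Ergonomic standing desk"],
    batch_size=2,
)
for task, output, score in results:
    print(task, score)
```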
Saving and resuming
# Save
learner.to_yaml("my_agent.yaml")

# Resume later
learner = ICRLLearner.from_yaml(
    "my_agent.yaml",
    learner_model=model,
    reward_model=model,
    strategy_model=model,
)
learner.auto_learn(episodes=5)
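Since the saved state is just the buffer plus the strategy string, a round trip is easy to picture. The sketch below is a stand-in only: it uses the standard-library `json` module so that it runs without extra dependencies, whereas FastICRL writes YAML, and the field names and file layout shown here are assumptions rather than the library's real schema.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Attempt:
    """Hypothetical record mirroring the (task, output, score) triple."""
    task: str
    output: str
    score: float


def save_state(path: Path, buffer: list[Attempt], strategy: str) -> None:
    """Write the buffer and strategy to disk (JSON here, as a stand-in for YAML)."""
    state = {"strategy": strategy, "buffer": [asdict(a) for a in buffer]}
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def load_state(path: Path) -> tuple[list[Attempt], str]:
    """Rebuild the buffer and strategy from a file written by save_state."""
    state = json.loads(path.read_text(encoding="utf-8"))
    return [Attempt(**a) for a in state["buffer"]], state["strategy"]


buffer = [Attempt("Portable espresso maker", "Espresso anywhere.", 7.0)]
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "my_agent.json"
    save_state(state_file, buffer, "Lead with a concrete benefit.")
    restored_buffer, restored_strategy = load_state(state_file)

print(restored_buffer == buffer, restored_strategy)
```

Note that the models are not part of the saved state, which is consistent with `from_yaml` taking the three models as arguments again.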
Using Ollama (local models)
from agno.models.ollama import Ollama

learner = ICRLLearner(
    learner_model=Ollama(id="llama3.2"),
    reward_model=Ollama(id="llama3.2"),
    strategy_model=Ollama(id="llama3.2"),
    task_description="...",
    tasks=[...],
)
Any agno-compatible model works.
Citation
This project is based on and inspired by the following papers:
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra, Shangtong Zhang, Yanjun Qi
arXiv:2506.06303 — https://arxiv.org/abs/2506.06303
Large Language Models as Optimizers
Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen
arXiv:2309.03409 — https://arxiv.org/abs/2309.03409
Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
Yifan Zhou, Sachin Grover, Mohamed El Mistiri, Kamalesh Kalirathinam, Pratyush Kerhalkar, Swaroop Mishra, Neelesh Kumar, Sanket Gaurav, Oya Aran, Heni Ben Amor
NeurIPS 2025 — https://openreview.net/forum?id=95plu1Mo20
License
MIT