Multi-language REPL engines for DSPy Recursive Language Models (RLM) with trajectory tracing and benchmark tooling
DSPy-REPL
Modular non-Python REPL engines for DSPy Recursive Language Models.
dspy-repl is a modular package for non-Python REPL-based RLM engines compatible with DSPy, inspired by the Recursive Language Models paper.
Scope
- Keeps Python dspy.RLM inside DSPy as the canonical Python implementation.
- Provides modular engines for:
  - SchemeRLM
  - SQLRLM
  - HaskellRLM
  - JavaScriptRLM
- Exposes extension points for adding new REPL languages.
Install
pip install dspy-repl
For local development:
pip install -e ".[dev]"
Quick usage
import dspy
from dspy_repl import SchemeRLM, SQLRLM, HaskellRLM, JavaScriptRLM
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
scheme_rlm = SchemeRLM("context, query -> answer")
result = scheme_rlm(context="...", query="...")
print(result.answer)
print(result.trajectory) # step-by-step REPL history
js_rlm = JavaScriptRLM("context, query -> answer")
js_result = js_rlm(context="...", query="...")
print(js_result.answer)
SQLRLM
SQLRLM uses Python's built-in sqlite3 as its REPL environment -- no external runtime needed. The LLM writes SQL to explore data, call tools, and produce results.
Basic usage
import dspy
from dspy_repl import SQLRLM
dspy.configure(lm=dspy.LM("openai/gpt-4o"))
rlm = SQLRLM("context, query -> answer")
result = rlm(context="...", query="...")
print(result.answer)
Pre-loaded schemas
When working with relational data, you can pre-load a SQL schema so the LLM sees the table structure from iteration 1 instead of spending iterations writing CREATE TABLE statements:
rlm = SQLRLM(
    "query -> answer",
    preload_sql="schema.sql",        # path to .sql file or raw SQL string
    db_path="data/my_project.db",    # persist to file (default: ":memory:")
    skip_variable_tables={"query"},  # don't create a table for this input
)
result = rlm(query="Find all active projects led by engineers")
preload_sql accepts either a file path (e.g. "schema.sql") or a raw SQL string. The DDL is executed once at startup. All tables -- including pre-loaded ones -- are visible in the LLM's prompt with column types, row counts, foreign key relationships, and CHECK constraints.
db_path controls where the SQLite database lives. Use a file path for persistence across runs. When reopening an existing database, preload_sql detects that tables already exist and skips the DDL.
skip_variable_tables prevents specified input variables from being materialized as SQL tables. Useful for string or structured inputs that serve as context rather than queryable data. These appear as plain text in the prompt instead.
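The skip-on-reopen behavior described for db_path can be pictured with the standard sqlite3 module. This is a minimal sketch only: the helper name preload_if_empty and the "any user table already exists" check are assumptions made for illustration, not dspy-repl's actual detection logic.

```python
import sqlite3


def preload_if_empty(conn: sqlite3.Connection, ddl: str) -> bool:
    """Run `ddl` only when the database has no user tables yet.

    Hypothetical helper: returns True if the DDL ran, False if it was skipped.
    """
    existing = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchone()[0]
    if existing:
        return False
    conn.executescript(ddl)
    return True


conn = sqlite3.connect(":memory:")
ddl = "CREATE TABLE projects (id TEXT PRIMARY KEY, status TEXT);"
first = preload_if_empty(conn, ddl)   # fresh database: DDL runs
second = preload_if_empty(conn, ddl)  # table now exists: DDL is skipped
print(first, second)
```

A real implementation might compare table names against the DDL instead of checking for any table at all; the coarse check keeps the sketch short.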
The LLM prompt shows the full schema from iteration 1:
Input variables:
- query: "Find all active projects led by engineers"
Database tables:
- departments (id TEXT, name TEXT, budget REAL) -- 5 rows
CHECKs: budget >= 0
- employees (id TEXT, name TEXT, department_id TEXT, role TEXT) -- 12 rows
FKs: department_id -> departments.id
CHECKs: role IN ('engineer','manager','designer','analyst')
- projects (id TEXT, name TEXT, lead_id TEXT, status TEXT) -- 8 rows
FKs: lead_id -> employees.id
CHECKs: status IN ('active','completed','cancelled')
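For readers curious how a listing like the one above can be derived from a live database, here is a rough sketch using SQLite's PRAGMA table_info and PRAGMA foreign_key_list. The function describe_table is hypothetical and is not the package's own code. CHECK constraints are left out because SQLite exposes no PRAGMA for them; recovering them would mean parsing the CREATE TABLE text stored in sqlite_master.

```python
import sqlite3


def describe_table(conn: sqlite3.Connection, table: str) -> str:
    """Render 'name (col TYPE, ...) -- N rows' plus an optional FKs line."""
    cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    col_text = ", ".join(f"{c[1]} {c[2]}" for c in cols)
    n_rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    lines = [f"- {table} ({col_text}) -- {n_rows} rows"]
    fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    if fks:
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        refs = ", ".join(f"{fk[3]} -> {fk[2]}.{fk[4]}" for fk in fks)
        lines.append(f"  FKs: {refs}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id TEXT PRIMARY KEY, name TEXT, budget REAL);
CREATE TABLE employees (
    id TEXT PRIMARY KEY,
    name TEXT,
    department_id TEXT REFERENCES departments(id)
);
INSERT INTO departments VALUES ('d1', 'Engineering', 100.0);
""")
summary = describe_table(conn, "employees")
print(summary)
```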
Using the SQLInterpreter directly
The underlying SQLInterpreter supports the same features and can be used standalone:
from dspy_repl.interpreters.sql_interpreter import SQLInterpreter
schema = """
CREATE TABLE authors (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    author_id TEXT NOT NULL REFERENCES authors(id),
    genre TEXT CHECK(genre IN ('fiction','nonfiction','poetry'))
);
"""
with SQLInterpreter(preload_sql=schema) as interp:
    # describe_tables() returns columns, row counts, FKs, and CHECKs
    for t in interp.describe_tables():
        print(t["name"], t.get("foreign_keys", []), t.get("checks", []))
    interp.execute("INSERT INTO authors VALUES ('a1', 'Tolkien');")
    interp.execute("INSERT INTO books VALUES ('b1', 'The Hobbit', 'a1', 'fiction');")
    print(interp.execute("SELECT * FROM books;"))
Tools as SQL functions
Custom Python functions can be registered as SQLite UDFs, callable directly from SQL:
def classify(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"

rlm = SQLRLM(
    "reviews -> summary",
    preload_sql="CREATE TABLE results (id INTEGER PRIMARY KEY, sentiment TEXT);",
    tools=[classify],
)
# The LLM can write: INSERT INTO results SELECT id, classify(text) FROM reviews;
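The underlying mechanism is SQLite's user-defined function support, which the standard library exposes as Connection.create_function. The sketch below shows that mechanism on its own, without dspy-repl; the reviews table and its rows are made up for the example, and how SQLRLM wires tools internally is not shown here.

```python
import sqlite3


def classify(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"


conn = sqlite3.connect(":memory:")
# Register the Python function under its own name, taking one SQL argument.
conn.create_function(classify.__name__, 1, classify)

conn.executescript("""
CREATE TABLE reviews (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE results (id INTEGER PRIMARY KEY, sentiment TEXT);
INSERT INTO reviews (text) VALUES ('Really good value'), ('Broke in a week');
""")

# The same statement shape as in the comment above.
conn.execute("INSERT INTO results SELECT id, classify(text) FROM reviews")
rows = conn.execute("SELECT id, sentiment FROM results ORDER BY id").fetchall()
print(rows)
```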
Observability and debugging
dspy-repl is designed to expose what happened inside an RLM run:
- result.trajectory contains the full iterative REPL trace.
- Each trajectory step includes:
  - reasoning: model reasoning for that step
  - code: code sent to the language REPL
  - output: interpreter output/error text
- SQLRLM additionally exposes last_sql_profile timing breakdowns after each run.
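A small helper can make a trace easier to skim. The sketch below assumes each step behaves like a dict with reasoning, code, and output keys, which matches the field names listed above but is not a confirmed type; format_trajectory is a hypothetical name, and the sample data is hand-written rather than real model output.

```python
def format_trajectory(trajectory, max_chars=80):
    """Return three summary lines per step, truncating long fields."""

    def clip(value):
        text = str(value).replace("\n", " ")
        return text if len(text) <= max_chars else text[: max_chars - 3] + "..."

    lines = []
    for i, step in enumerate(trajectory, start=1):
        lines.append(f"[{i}] reasoning: {clip(step.get('reasoning', ''))}")
        lines.append(f"    code: {clip(step.get('code', ''))}")
        lines.append(f"    output: {clip(step.get('output', ''))}")
    return lines


# Hand-written stand-in for result.trajectory (assumed dict-like steps).
sample = [
    {
        "reasoning": "Count rows first.",
        "code": "SELECT COUNT(*) FROM books;",
        "output": "1",
    },
]
for line in format_trajectory(sample):
    print(line)
```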
Enable verbose engine logs:
scheme_rlm = SchemeRLM("context, query -> answer", verbose=True)
With verbose=True, each iteration is logged with previews of its reasoning, code, and output, which helps when iterating on prompts and tools or debugging a run.
What happens inside an RLM
At a high level, each RLM run follows this loop:
- Build REPL variable metadata from inputs.
- Generate next action (reasoning + code) from the LM.
- Execute code in the target REPL (Scheme/Haskell/SQL/JavaScript).
- Append {reasoning, code, output} to trajectory.
- Repeat until final output is submitted or max iterations is reached.
- If max iterations is reached, run fallback extraction from accumulated trajectory.
This loop is shared in dspy_repl.core.base_rlm and specialized by language-specific wrappers.
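The loop can be sketched in a few lines with stand-ins for the LM and the REPL. Everything here is illustrative: run_loop, propose, execute, fallback, and the "SUBMIT" convention are hypothetical names, not the API of dspy_repl.core.base_rlm, and the metadata-building step is omitted.

```python
def run_loop(propose, execute, fallback, max_iterations=5):
    """Generate, execute, record; stop on a final answer or at the cap."""
    trajectory = []
    for _ in range(max_iterations):
        reasoning, code = propose(trajectory)   # LM proposes the next action
        output, final = execute(code)           # REPL runs it
        trajectory.append({"reasoning": reasoning, "code": code, "output": output})
        if final is not None:                   # final output was submitted
            return final, trajectory
    # Cap reached: extract an answer from what was accumulated.
    return fallback(trajectory), trajectory


# Scripted stand-in for the LM: two actions, the second one submits.
script = iter([("look at the data", "inspect"), ("done", "SUBMIT 42")])


def propose(trajectory):
    return next(script)


def execute(code):
    # Stand-in REPL: "SUBMIT <value>" ends the run, anything else is echoed.
    if code.startswith("SUBMIT "):
        return "", code[len("SUBMIT "):]
    return f"ran {code}", None


def fallback(trajectory):
    return trajectory[-1]["output"] if trajectory else ""


answer, trajectory = run_loop(propose, execute, fallback)
print(answer, len(trajectory))
```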
Architecture
- dspy_repl.core: shared execution loop and shared tool plumbing
- dspy_repl.languages: language-specific prompt templates and wrappers
- dspy_repl.interpreters: interpreter adapter exports
- dspy_repl.compat: thin compatibility shims for DSPy touchpoints
DSPy compatibility
Requires dspy>=3.0.0.
Runtime prerequisites
- SQLRLM: no external runtime (uses Python sqlite3)
- SchemeRLM: requires guile
- HaskellRLM: requires ghci (GHC)
- JavaScriptRLM: requires node
Install REPL runtimes
If you want to run all REPL-based engines and benchmark comparisons (including Python dspy.RLM), install:
- Python REPL engine in benchmarks (dspy.RLM): deno
- Scheme REPL engine (SchemeRLM): guile
- Haskell REPL engine (HaskellRLM): ghci from GHC
- JavaScript REPL engine (JavaScriptRLM): node
macOS (Homebrew):
brew install deno guile ghc node
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y deno guile-3.0 ghc nodejs npm
Verify tools are available:
deno --version
guile --version
ghci --version
node --version
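The same check can be done from Python before launching a benchmark run, for example to skip engines whose runtime is missing. This sketch uses only shutil.which from the standard library; the RUNTIMES mapping restates the list above, and find_runtimes is a hypothetical helper, not part of dspy-repl.

```python
import shutil

# Engine -> executable it needs, as listed in this section.
RUNTIMES = {
    "dspy.RLM (benchmarks)": "deno",
    "SchemeRLM": "guile",
    "HaskellRLM": "ghci",
    "JavaScriptRLM": "node",
}


def find_runtimes(runtimes=RUNTIMES):
    """Map each engine to the resolved executable path, or None if missing."""
    return {engine: shutil.which(exe) for engine, exe in runtimes.items()}


for engine, path in find_runtimes().items():
    print(f"{engine}: {path or 'not found on PATH'}")
```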
Python package dependencies for benchmarks
For Oolong benchmarks, you also need:
- dspy-repl (this package)
- dspy
- datasets (Hugging Face datasets loader used by Oolong adapter)
Example:
pip install -e ".[dev]" datasets
Benchmarking (Oolong dataset)
The repository includes an Oolong benchmark runner with artifact saving and trajectory diagnostics.
Run benchmarks:
python -m dspy_repl.benchmarks.oolong_runner --model "gemini/gemini-3-flash-preview" --languages "python,scheme,sql,haskell"
Run OOLONG-Pairs benchmarks:
python -m dspy_repl.benchmarks.oolong_pairs_runner --model "gemini/gemini-3-flash-preview" --languages "sql,scheme,js" --max-samples 20
Run S-NIAH synthetic scaling benchmarks:
python -m dspy_repl.benchmarks.niah_runner --languages "python,sql,scheme" --num-tasks 50 --context-lengths "8192,32768,131072"
Generate a single HTML analytics report (tables + Plotly charts + insights):
python -m dspy_repl.benchmarks.report_runner --run-dir benchmark_results/<run_id>
Compare several runs in one report:
python -m dspy_repl.benchmarks.report_runner --run-dirs benchmark_results/<id1>,benchmark_results/<id2>
Multiprocessing
By default, selected languages run in parallel per sample using multiprocessing.
- Enable explicitly: --parallel
- Disable: --no-parallel
- Cap processes: --max-workers 2
Example:
python -m dspy_repl.benchmarks.oolong_runner --languages "scheme,sql,haskell" --max-workers 2
Useful benchmark flags
- --max-samples 20
- --sample-id <id>
- --engine-timeout-seconds 240
- --verbose
- --save-dir benchmark_results
- --config ./benchmark.json
Where results are saved
Each run creates a timestamped directory under save_dir with:
- benchmark.log: structured lifecycle logs
- run_config.json: effective run config
- incremental_results.jsonl: live per-sample writes (if enabled)
- results.jsonl: per-sample records with trajectory diagnostics
- summary.json and by_engine.csv: aggregate metrics
- trajectory_stats.json and per_engine_trajectory_stats.json
- trajectories/<engine>/<sample_id>.json: full trajectories
To inspect one execution deeply, start with a trajectory file and then correlate with the same sample in results.jsonl and benchmark.log.
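A short script can do the trajectory-to-results correlation. The file layout follows the list above, but the record schema is an assumption: this sketch guesses that each results.jsonl line is a JSON object carrying a "sample_id" key (configurable through id_key), and the "engine" key in the demo data is likewise made up. load_sample is a hypothetical helper, and the demo writes a fake run directory to a temporary folder rather than reading real benchmark output.

```python
import json
import tempfile
from pathlib import Path


def load_sample(run_dir, engine, sample_id, id_key="sample_id"):
    """Return (trajectory, matching results.jsonl records) for one sample."""
    run_dir = Path(run_dir)
    traj_path = run_dir / "trajectories" / engine / f"{sample_id}.json"
    trajectory = json.loads(traj_path.read_text(encoding="utf-8"))
    records = []
    with (run_dir / "results.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            if str(record.get(id_key)) == str(sample_id):
                records.append(record)
    return trajectory, records


# Demo against a fabricated run directory (not real benchmark data).
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp)
    (run / "trajectories" / "sql").mkdir(parents=True)
    (run / "trajectories" / "sql" / "s1.json").write_text(
        '[{"code": "SELECT 1;"}]', encoding="utf-8"
    )
    (run / "results.jsonl").write_text(
        '{"sample_id": "s1", "engine": "sql"}\n{"sample_id": "s2"}\n',
        encoding="utf-8",
    )
    trajectory, records = load_sample(run, "sql", "s1")

print(len(trajectory), records)
```

If the real records use a different identifier field, pass it through id_key.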
Full benchmark usage guide: BENCHMARKS.md.
Local validation before release
python -m build
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q
python -m twine check --strict dist/*
Backlog
- Add shared context with PostgreSQL/MySQL.
- Test shared context in a multi-agent environment.
- Extend benchmarks with additional long-context suites.
- Optimize REPL instructions with GEPA.