Multi-language REPL engines for DSPy Recursive Language Models (RLM) with trajectory tracing and benchmark tooling

DSPy-REPL

Python 3.10+ | License: MIT

Modular non-Python REPL engines for DSPy Recursive Language Models.

dspy-repl packages RLM engines whose REPL runs a language other than Python. It is compatible with DSPy and inspired by the Recursive Language Models paper.

Scope

  • Keeps Python dspy.RLM inside DSPy as the canonical Python implementation.
  • Provides modular engines for:
    • SchemeRLM
    • SQLRLM
    • HaskellRLM
    • JavaScriptRLM
  • Exposes extension points for adding new REPL languages.

Install

pip install dspy-repl

For local development:

pip install -e ".[dev]"

Quick usage

import dspy
from dspy_repl import SchemeRLM, SQLRLM, HaskellRLM, JavaScriptRLM

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

scheme_rlm = SchemeRLM("context, query -> answer")
result = scheme_rlm(context="...", query="...")
print(result.answer)
print(result.trajectory)  # step-by-step REPL history

js_rlm = JavaScriptRLM("context, query -> answer")
js_result = js_rlm(context="...", query="...")
print(js_result.answer)

SQLRLM

SQLRLM uses Python's built-in sqlite3 as its REPL environment -- no external runtime needed. The LLM writes SQL to explore data, call tools, and produce results.

Basic usage

import dspy
from dspy_repl import SQLRLM

dspy.configure(lm=dspy.LM("openai/gpt-4o"))

rlm = SQLRLM("context, query -> answer")
result = rlm(context="...", query="...")
print(result.answer)

Pre-loaded schemas

When working with relational data, you can pre-load a SQL schema so the LLM sees the table structure from iteration 1 instead of spending iterations writing CREATE TABLE statements:

rlm = SQLRLM(
    "query -> answer",
    preload_sql="schema.sql",          # path to .sql file or raw SQL string
    db_path="data/my_project.db",      # persist to file (default: ":memory:")
    skip_variable_tables={"query"},    # don't create a table for this input
)
result = rlm(query="Find all active projects led by engineers")

preload_sql accepts either a file path (e.g. "schema.sql") or a raw SQL string. The DDL is executed once at startup. All tables -- including pre-loaded ones -- are visible in the LLM's prompt with column types, row counts, foreign key relationships, and CHECK constraints.

db_path controls where the SQLite database lives. Use a file path to persist data across runs. When an existing database is reopened, the preload step detects that the tables already exist and skips the preload_sql DDL.

skip_variable_tables prevents the specified input variables from being materialized as SQL tables. This is useful for string or structured inputs that serve as context rather than queryable data; they appear as plain text in the prompt instead.
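
How the existing-table check is implemented is not described here. As a rough illustration only, the sketch below shows one way such a check could work with plain sqlite3: look for user tables in sqlite_master and run the DDL only when none exist. The helper name preload_schema is hypothetical and is not part of dspy-repl.

```python
import sqlite3

def preload_schema(conn: sqlite3.Connection, ddl: str) -> bool:
    """Run `ddl` unless the database already has user tables.

    Hypothetical helper for illustration; returns True if the DDL was executed.
    """
    existing = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchone()[0]
    if existing:
        return False  # tables already present: skip the DDL
    conn.executescript(ddl)
    return True

DDL = "CREATE TABLE departments (id TEXT PRIMARY KEY, name TEXT, budget REAL CHECK (budget >= 0));"

conn = sqlite3.connect(":memory:")      # a file path here would persist across runs
first = preload_schema(conn, DDL)       # schema is created
second = preload_schema(conn, DDL)      # schema is detected and the DDL is skipped
print(first, second)
```

A coarse "any table exists" test is the simplest choice; a real implementation might compare table names individually instead.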

The LLM prompt shows the full schema from iteration 1:

Input variables:
- query: "Find all active projects led by engineers"

Database tables:
- departments (id TEXT, name TEXT, budget REAL) -- 5 rows
  CHECKs: budget >= 0
- employees (id TEXT, name TEXT, department_id TEXT, role TEXT) -- 12 rows
  FKs: department_id -> departments.id
  CHECKs: role IN ('engineer','manager','designer','analyst')
- projects (id TEXT, name TEXT, lead_id TEXT, status TEXT) -- 8 rows
  FKs: lead_id -> employees.id
  CHECKs: status IN ('active','completed','cancelled')
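
For readers curious where this kind of metadata can come from, here is a minimal sketch that collects columns, row counts, and foreign keys with SQLite's PRAGMA statements and formats them in a similar style. It is an assumption about the general technique, not the package's actual code; introspect_tables and format_table are hypothetical names, and CHECK constraints are omitted because SQLite has no PRAGMA for them (they would have to be parsed out of the stored CREATE TABLE text).

```python
import sqlite3

def introspect_tables(conn):
    """Return columns, row count, and foreign keys for each user table."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    tables = []
    for name in names:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2]) for r in conn.execute(f'PRAGMA table_info("{name}")')]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = [(r[3], r[2], r[4]) for r in conn.execute(f'PRAGMA foreign_key_list("{name}")')]
        rows = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        tables.append({"name": name, "columns": columns, "rows": rows, "foreign_keys": fks})
    return tables

def format_table(table):
    """Render one table in a prompt-like, human-readable form."""
    cols = ", ".join(f"{col} {typ}" for col, typ in table["columns"])
    lines = [f"- {table['name']} ({cols}) -- {table['rows']} rows"]
    if table["foreign_keys"]:
        fks = ", ".join(f"{src} -> {ref}.{dst}" for src, ref, dst in table["foreign_keys"])
        lines.append(f"  FKs: {fks}")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id TEXT PRIMARY KEY, name TEXT, budget REAL);
CREATE TABLE employees (
    id TEXT PRIMARY KEY,
    name TEXT,
    department_id TEXT REFERENCES departments(id),
    role TEXT
);
INSERT INTO departments VALUES ('d1', 'Platform', 100.0);
""")

summary = "\n".join(format_table(t) for t in introspect_tables(conn))
print(summary)
```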

Using the SQLInterpreter directly

The underlying SQLInterpreter supports the same features and can be used standalone:

from dspy_repl.interpreters.sql_interpreter import SQLInterpreter

schema = """
CREATE TABLE authors (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    author_id TEXT NOT NULL REFERENCES authors(id),
    genre TEXT CHECK(genre IN ('fiction','nonfiction','poetry'))
);
"""

with SQLInterpreter(preload_sql=schema) as interp:
    # describe_tables() returns columns, row counts, FKs, and CHECKs
    for t in interp.describe_tables():
        print(t["name"], t.get("foreign_keys", []), t.get("checks", []))

    interp.execute("INSERT INTO authors VALUES ('a1', 'Tolkien');")
    interp.execute("INSERT INTO books VALUES ('b1', 'The Hobbit', 'a1', 'fiction');")
    print(interp.execute("SELECT * FROM books;"))

Tools as SQL functions

Custom Python functions can be registered as SQLite UDFs, callable directly from SQL:

def classify(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"

rlm = SQLRLM(
    "reviews -> summary",
    preload_sql="CREATE TABLE results (id INTEGER PRIMARY KEY, sentiment TEXT);",
    tools=[classify],
)
# The LLM can write: INSERT INTO results SELECT id, classify(text) FROM reviews;
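
The mechanism underneath is SQLite's user-defined function support. The standalone sketch below uses only the standard library's sqlite3.Connection.create_function to show the same idea without an LLM in the loop; the reviews table and its rows are made up for illustration, and how SQLRLM registers tools internally may differ.

```python
import sqlite3

def classify(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"

conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function named "classify".
conn.create_function("classify", 1, classify)

conn.executescript("""
CREATE TABLE reviews (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE results (id INTEGER PRIMARY KEY, sentiment TEXT);
INSERT INTO reviews (text) VALUES ('Good value'), ('Broke after a week');
""")

# The same statement an LLM could emit inside the REPL.
conn.execute("INSERT INTO results SELECT id, classify(text) FROM reviews")
rows = conn.execute("SELECT id, sentiment FROM results ORDER BY id").fetchall()
print(rows)
```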

Observability and debugging

dspy-repl is designed to expose what happened inside an RLM run:

  • result.trajectory contains the full iterative REPL trace.
  • Each trajectory step includes:
    • reasoning: model reasoning for that step
    • code: code sent to the language REPL
    • output: interpreter output/error text
  • SQLRLM additionally exposes last_sql_profile timing breakdowns after each run.

Enable verbose engine logs:

scheme_rlm = SchemeRLM("context, query -> answer", verbose=True)

With verbose=True, each iteration is logged with previews of its reasoning, code, and output, which is useful when iterating on prompts and tools or debugging a run.
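
The trajectory fields described above lend themselves to quick programmatic inspection. The sketch below assumes, without confirmation from the package, that a trajectory can be treated as a list of dicts with reasoning, code, and output keys; the sample data and the summarize_trajectory helper are hypothetical.

```python
def summarize_trajectory(trajectory, width=60):
    """Return one compact line per step, flagging outputs that mention an error."""
    lines = []
    for i, step in enumerate(trajectory, start=1):
        # Collapse whitespace so multi-line code and output fit on one line.
        code = " ".join(step.get("code", "").split())
        output = " ".join(step.get("output", "").split())
        flag = "ERR" if "error" in output.lower() else "ok"
        lines.append(f"{i:>2} [{flag}] {code[:width]} => {output[:width]}")
    return lines

# Made-up steps in the assumed {reasoning, code, output} shape.
sample = [
    {"reasoning": "Inspect the schema first.",
     "code": "SELECT name FROM sqlite_master;",
     "output": "projects"},
    {"reasoning": "Typo in the column name.",
     "code": "SELECT nme FROM projects;",
     "output": "Error: no such column: nme"},
]

lines = summarize_trajectory(sample)
print("\n".join(lines))
```

With a real run, result.trajectory would take the place of sample, provided it has the assumed shape.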

What happens inside an RLM

At a high level, each RLM run follows this loop:

  1. Build REPL variable metadata from inputs.
  2. Generate next action (reasoning + code) from the LM.
  3. Execute code in the target REPL (Scheme/Haskell/SQL/JavaScript).
  4. Append {reasoning, code, output} to trajectory.
  5. Repeat until final output is submitted or max iterations is reached.
  6. If max iterations is reached, run fallback extraction from accumulated trajectory.

This loop is shared in dspy_repl.core.base_rlm and specialized by language-specific wrappers.
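
To make the loop concrete, here is a deliberately simplified sketch with a scripted stand-in for the LM and an in-memory SQLite connection as the REPL. Everything in it is an assumption for illustration: the run_loop function, the "SUBMIT " prefix used to signal a final answer, and the fallback that simply returns the last output are not taken from dspy_repl.core.base_rlm.

```python
import sqlite3

def run_loop(propose, execute, max_iterations=5):
    """Generate -> execute -> record, until an answer is submitted or iterations run out."""
    trajectory = []
    for _ in range(max_iterations):
        reasoning, code = propose(trajectory)           # step 2: next action
        if code.startswith("SUBMIT "):                  # hypothetical submit convention
            return code[len("SUBMIT "):], trajectory
        try:
            output = execute(code)                      # step 3: run code in the REPL
        except Exception as exc:
            output = f"Error: {exc}"                    # errors are fed back as output
        trajectory.append({"reasoning": reasoning, "code": code, "output": output})
    # Step 6 stand-in: a real fallback would extract an answer from the trajectory.
    return (trajectory[-1]["output"] if trajectory else ""), trajectory

conn = sqlite3.connect(":memory:")

def execute(code):
    return str(conn.execute(code).fetchall())

# Scripted "LM": three REPL steps, then submit the last output.
script = iter([
    ("Create a table.", "CREATE TABLE t (n INTEGER)"),
    ("Insert rows.", "INSERT INTO t VALUES (1), (2), (3)"),
    ("Sum the values.", "SELECT SUM(n) FROM t"),
])

def propose(trajectory):
    step = next(script, None)
    if step is not None:
        return step
    return ("Done.", "SUBMIT " + trajectory[-1]["output"])

answer, trajectory = run_loop(propose, execute)
print(answer, len(trajectory))
```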

Architecture

  • dspy_repl.core: shared execution loop and shared tool plumbing
  • dspy_repl.languages: language-specific prompt templates and wrappers
  • dspy_repl.interpreters: interpreter adapter exports
  • dspy_repl.compat: thin compatibility shims for DSPy touchpoints

DSPy compatibility

Requires dspy>=3.0.0.

Runtime prerequisites

  • SQLRLM: no external runtime (uses Python sqlite3)
  • SchemeRLM: requires guile
  • HaskellRLM: requires ghci (GHC)
  • JavaScriptRLM: requires node

Install REPL runtimes

If you want to run all REPL-based engines and benchmark comparisons (including Python dspy.RLM), install:

  • Python REPL engine in benchmarks (dspy.RLM): deno
  • Scheme REPL engine (SchemeRLM): guile
  • Haskell REPL engine (HaskellRLM): ghci from GHC
  • JavaScript REPL engine (JavaScriptRLM): node

macOS (Homebrew):

brew install deno guile ghc node

Ubuntu/Debian:

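sudo apt-get update
sudo apt-get install -y guile-3.0 ghc nodejs npm
# deno is typically not available from the default apt repositories; use the official installer
curl -fsSL https://deno.land/install.sh | sh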

Verify tools are available:

deno --version
guile --version
ghci --version
node --version
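
The same check can be scripted from Python, which may be convenient in CI. This is a small sketch using shutil.which; the mapping mirrors the runtime list above, and the missing_runtimes helper is a hypothetical name rather than something dspy-repl provides.

```python
import shutil

# Engine -> executable, as listed in the runtime prerequisites.
RUNTIMES = {
    "dspy.RLM (benchmarks)": "deno",
    "SchemeRLM": "guile",
    "HaskellRLM": "ghci",
    "JavaScriptRLM": "node",
}

def missing_runtimes(runtimes=RUNTIMES, which=shutil.which):
    """Return the engines whose executable cannot be found on PATH."""
    return {engine: exe for engine, exe in runtimes.items() if which(exe) is None}

for engine, exe in missing_runtimes().items():
    print(f"{engine}: '{exe}' not found on PATH")
```

SQLRLM is left out because it needs no external runtime.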

Python package dependencies for benchmarks

For Oolong benchmarks, you also need:

  • dspy-repl (this package)
  • dspy
  • datasets (Hugging Face datasets loader used by Oolong adapter)

Example:

pip install -e ".[dev]" datasets

Benchmarking (Oolong dataset)

The repository includes an Oolong benchmark runner with artifact saving and trajectory diagnostics.

Run benchmarks:

python -m dspy_repl.benchmarks.oolong_runner --model "gemini/gemini-3-flash-preview" --languages "python,scheme,sql,haskell"

Run OOLONG-Pairs benchmarks:

python -m dspy_repl.benchmarks.oolong_pairs_runner --model "gemini/gemini-3-flash-preview" --languages "sql,scheme,js" --max-samples 20

Run S-NIAH synthetic scaling benchmarks:

python -m dspy_repl.benchmarks.niah_runner --languages "python,sql,scheme" --num-tasks 50 --context-lengths "8192,32768,131072"

Generate a single HTML analytics report (tables + Plotly charts + insights):

python -m dspy_repl.benchmarks.report_runner --run-dir benchmark_results/<run_id>

Compare several runs in one report:

python -m dspy_repl.benchmarks.report_runner --run-dirs benchmark_results/<id1>,benchmark_results/<id2>

Multiprocessing

By default, selected languages run in parallel per sample using multiprocessing.

  • Enable explicitly: --parallel
  • Disable: --no-parallel
  • Cap processes: --max-workers 2

Example:

python -m dspy_repl.benchmarks.oolong_runner --languages "scheme,sql,haskell" --max-workers 2

Useful benchmark flags

  • --max-samples 20
  • --sample-id <id>
  • --engine-timeout-seconds 240
  • --verbose
  • --save-dir benchmark_results
  • --config ./benchmark.json

Where results are saved

Each run creates a timestamped directory under save_dir with:

  • benchmark.log: structured lifecycle logs
  • run_config.json: effective run config
  • incremental_results.jsonl: live per-sample writes (if enabled)
  • results.jsonl: per-sample records with trajectory diagnostics
  • summary.json and by_engine.csv: aggregate metrics
  • trajectory_stats.json and per_engine_trajectory_stats.json
  • trajectories/<engine>/<sample_id>.json: full trajectories

To inspect one execution deeply, start with a trajectory file and then correlate with the same sample in results.jsonl and benchmark.log.
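
A short sketch of that workflow follows. It relies only on the file layout listed above; the record fields inside results.jsonl are not documented here, so the code treats each line as an opaque JSON object. The helper names load_jsonl and trajectory_path are hypothetical.

```python
import json
from pathlib import Path

def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

def trajectory_path(run_dir, engine, sample_id):
    """Build the path trajectories/<engine>/<sample_id>.json inside a run directory."""
    return Path(run_dir) / "trajectories" / engine / f"{sample_id}.json"
```

Given a run directory, load_jsonl(Path(run_dir) / "results.jsonl") returns the per-sample records, and json.loads(trajectory_path(run_dir, engine, sample_id).read_text()) loads the matching full trajectory.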

Full benchmark usage guide: BENCHMARKS.md.

Local validation before release

python -m build
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q
python -m twine check --strict dist/*

Backlog

  • Add shared context with PostgreSQL/MySQL.
  • Test shared context in a multi-agent environment.
  • Extend benchmarks with additional long-context suites.
  • Optimize REPL instructions with GEPA.

Project details

Release files for version 0.4.0:

  • dspy_repl-0.4.0.tar.gz (source distribution, 57.6 kB)
  • dspy_repl-0.4.0-py3-none-any.whl (built distribution, Python 3, 80.1 kB)

Both files were uploaded with twine/6.1.0 (CPython/3.13.7) using Trusted Publishing, via publish.yml on Archelunch/dspy-repl.