Multi-language REPL engines for DSPy Recursive Language Models (RLM) with trajectory tracing and benchmark tooling

DSPy-REPL

Python 3.10+ | License: MIT

Modular non-Python REPL engines for DSPy Recursive Language Models.

dspy-repl provides REPL-based RLM engines for languages other than Python. It is compatible with DSPy and inspired by the Recursive Language Models paper.

Scope

  • Keeps Python dspy.RLM inside DSPy as the canonical Python implementation.
  • Provides modular engines for:
    • SchemeRLM
    • SQLRLM
    • HaskellRLM
    • JavaScriptRLM
  • Exposes extension points for adding new REPL languages.

Install

pip install dspy-repl

For local development:

pip install -e ".[dev]"

Quick usage

import dspy
from dspy_repl import SchemeRLM, SQLRLM, HaskellRLM, JavaScriptRLM

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

scheme_rlm = SchemeRLM("context, query -> answer")
result = scheme_rlm(context="...", query="...")
print(result.answer)
print(result.trajectory)  # step-by-step REPL history

js_rlm = JavaScriptRLM("context, query -> answer")
js_result = js_rlm(context="...", query="...")
print(js_result.answer)

Observability and debugging

dspy-repl is designed to expose what happened inside an RLM run:

  • result.trajectory contains the full iterative REPL trace.
  • Each trajectory step includes:
    • reasoning: model reasoning for that step
    • code: code sent to the language REPL
    • output: interpreter output/error text
  • SQLRLM additionally exposes last_sql_profile timing breakdowns after each run.
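A minimal sketch of how such a trace could be rendered for quick inspection. It assumes each step exposes reasoning, code, and output either as dict keys or as attributes; the helper names (step_field, format_trajectory) are hypothetical and not part of dspy-repl, and the sample data is hand-written rather than produced by a real run.

```python
from typing import Any, Iterable


def step_field(step: Any, name: str) -> str:
    """Read a field from a step that may be a dict or an attribute-style object."""
    if isinstance(step, dict):
        value = step.get(name, "")
    else:
        value = getattr(step, name, "")
    return "" if value is None else str(value)


def format_trajectory(steps: Iterable[Any], preview: int = 80) -> str:
    """Render each step as a numbered block with truncated field previews."""
    lines = []
    for index, step in enumerate(steps, start=1):
        lines.append(f"step {index}")
        for name in ("reasoning", "code", "output"):
            text = step_field(step, name).replace("\n", " ")
            if len(text) > preview:
                text = text[: preview - 3] + "..."
            lines.append(f"  {name}: {text}")
    return "\n".join(lines)


# Hand-written sample standing in for result.trajectory (assumed shape).
sample = [
    {"reasoning": "Count the items.", "code": "(length items)", "output": "3"},
    {"reasoning": "Show the count.", "code": "(display 3)", "output": "3"},
]
print(format_trajectory(sample))
```

With a real run, passing result.trajectory in place of sample should work as long as the step shape matches the assumption above.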

Enable verbose engine logs:

scheme_rlm = SchemeRLM("context, query -> answer", verbose=True)

With verbose=True, each iteration is logged with previews of its reasoning, code, and output, which is useful when debugging prompts and tools.

What happens inside an RLM

At a high level, each RLM run follows this loop:

  1. Build REPL variable metadata from inputs.
  2. Generate next action (reasoning + code) from the LM.
  3. Execute code in the target REPL (Scheme/Haskell/SQL/JavaScript).
  4. Append {reasoning, code, output} to trajectory.
  5. Repeat until final output is submitted or max iterations is reached.
  6. If max iterations is reached, run fallback extraction from accumulated trajectory.

This loop is shared in dspy_repl.core.base_rlm and specialized by language-specific wrappers.
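The six steps above can be sketched as a small, self-contained loop. This is an illustration under stated assumptions, not the code in dspy_repl.core.base_rlm: the names run_loop, Action, RunResult, and toy_execute are hypothetical, the LM is replaced by a scripted callable, and the "interpreter" is a toy that treats a line starting with SUBMIT as the final answer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    reasoning: str
    code: str


@dataclass
class RunResult:
    answer: Optional[str]
    trajectory: list = field(default_factory=list)


def toy_execute(code: str) -> tuple:
    """Toy interpreter: returns (output, final_answer_or_None)."""
    if code.startswith("SUBMIT "):
        return "", code[len("SUBMIT "):]
    return "peeked", None


def last_output(trajectory: list) -> str:
    """Hypothetical fallback: reuse the last interpreter output."""
    return trajectory[-1]["output"] if trajectory else ""


def run_loop(
    propose: Callable[[dict, list], Action],
    execute: Callable[[str], tuple],
    inputs: dict,
    max_iterations: int = 5,
    fallback: Callable[[list], str] = last_output,
) -> RunResult:
    # Step 1: describe the inputs instead of inlining their full values.
    metadata = {
        name: {"type": type(value).__name__, "length": len(str(value))}
        for name, value in inputs.items()
    }
    trajectory = []
    for _ in range(max_iterations):
        action = propose(metadata, trajectory)   # step 2: reasoning + code
        output, final = execute(action.code)     # step 3: run in the REPL
        trajectory.append(                       # step 4: record the step
            {"reasoning": action.reasoning, "code": action.code, "output": output}
        )
        if final is not None:                    # step 5: stop on final output
            return RunResult(final, trajectory)
    # Step 6: iteration budget exhausted, extract an answer from the trace.
    return RunResult(fallback(trajectory), trajectory)


# Scripted stand-in for the LM: inspect once, then submit.
script = iter([Action("Inspect the context.", "PEEK"), Action("Answer.", "SUBMIT 42")])
result = run_loop(lambda meta, traj: next(script), toy_execute, {"context": "abc"})
print(result.answer, len(result.trajectory))
```

Keeping propose and execute as injected callables mirrors the split the text describes between a shared loop and language-specific wrappers, though the real interfaces may differ.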

Architecture

  • dspy_repl.core: shared execution loop and shared tool plumbing
  • dspy_repl.languages: language-specific prompt templates and wrappers
  • dspy_repl.interpreters: interpreter adapter exports
  • dspy_repl.compat: thin compatibility shims for DSPy touchpoints

DSPy compatibility

Requires dspy>=3.0.0.

Runtime prerequisites

  • SQLRLM: no external runtime (uses Python sqlite3)
  • SchemeRLM: requires guile
  • HaskellRLM: requires ghci (GHC)
  • JavaScriptRLM: requires node
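A quick way to check these prerequisites before constructing an engine is to look each binary up on PATH. The sketch below uses only the standard library; the mapping and the missing_runtimes helper are illustrative assumptions built from the list above, not an API provided by dspy-repl.

```python
import shutil

# Engine name -> required executable, taken from the list above.
# None means no external runtime is needed.
REQUIRED_BINARIES = {
    "SQLRLM": None,
    "SchemeRLM": "guile",
    "HaskellRLM": "ghci",
    "JavaScriptRLM": "node",
}


def missing_runtimes(engines, which=shutil.which):
    """Return {engine: binary} for every engine whose binary is not on PATH."""
    missing = {}
    for engine in engines:
        binary = REQUIRED_BINARIES[engine]
        if binary is not None and which(binary) is None:
            missing[engine] = binary
    return missing


problems = missing_runtimes(REQUIRED_BINARIES)
for engine, binary in problems.items():
    print(f"{engine}: '{binary}' not found on PATH")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is enough.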

Install REPL runtimes

If you want to run all REPL-based engines and benchmark comparisons (including Python dspy.RLM), install:

  • Python REPL engine in benchmarks (dspy.RLM): deno
  • Scheme REPL engine (SchemeRLM): guile
  • Haskell REPL engine (HaskellRLM): ghci from GHC
  • JavaScript REPL engine (JavaScriptRLM): node

macOS (Homebrew):

brew install deno guile ghc node

Ubuntu/Debian:

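sudo apt-get update
sudo apt-get install -y guile-3.0 ghc nodejs npm
# deno is not packaged in the default apt repositories; use the official installer instead
curl -fsSL https://deno.land/install.sh | sh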

Verify tools are available:

deno --version
guile --version
ghci --version
node --version

Python package dependencies for benchmarks

For Oolong benchmarks, you also need:

  • dspy-repl (this package)
  • dspy
  • datasets (Hugging Face datasets loader used by Oolong adapter)

Example:

pip install -e ".[dev]" datasets

Benchmarking (Oolong dataset)

The repository includes an Oolong benchmark runner with artifact saving and trajectory diagnostics.

Run benchmarks:

python -m dspy_repl.benchmarks.oolong_runner --model "gemini/gemini-3-flash-preview" --languages "python,scheme,sql,haskell"

Run OOLONG-Pairs benchmarks:

python -m dspy_repl.benchmarks.oolong_pairs_runner --model "gemini/gemini-3-flash-preview" --languages "sql,scheme,js" --max-samples 20

Run S-NIAH synthetic scaling benchmarks:

python -m dspy_repl.benchmarks.niah_runner --languages "python,sql,scheme" --num-tasks 50 --context-lengths "8192,32768,131072"

Generate a single HTML analytics report (tables + Plotly charts + insights):

python -m dspy_repl.benchmarks.report_runner --run-dir benchmark_results/<run_id>

Compare several runs in one report:

python -m dspy_repl.benchmarks.report_runner --run-dirs benchmark_results/<id1>,benchmark_results/<id2>

Multiprocessing

By default, selected languages run in parallel per sample using multiprocessing.

  • Enable explicitly: --parallel
  • Disable: --no-parallel
  • Cap processes: --max-workers 2

Example:

python -m dspy_repl.benchmarks.oolong_runner --languages "scheme,sql,haskell" --max-workers 2

Useful benchmark flags

  • --max-samples 20
  • --sample-id <id>
  • --engine-timeout-seconds 240
  • --verbose
  • --save-dir benchmark_results
  • --config ./benchmark.json

Where results are saved

Each run creates a timestamped directory under save_dir with:

  • benchmark.log: structured lifecycle logs
  • run_config.json: effective run config
  • incremental_results.jsonl: live per-sample writes (if enabled)
  • results.jsonl: per-sample records with trajectory diagnostics
  • summary.json and by_engine.csv: aggregate metrics
  • trajectory_stats.json and per_engine_trajectory_stats.json
  • trajectories/<engine>/<sample_id>.json: full trajectories

To inspect a single execution in depth, start with its trajectory file, then correlate it with the same sample in results.jsonl and benchmark.log.
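A hedged sketch of that correlation step, using only the file layout listed above. The record key sample_id, the JSON shape of a trajectory file, and the log line format are assumptions made for illustration; collect_sample and build_fake_run are hypothetical helpers, and the demo runs against a throwaway directory rather than real benchmark output.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Load one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def collect_sample(run_dir, engine, sample_id, id_key="sample_id"):
    """Gather the trajectory, result records, and log lines for one sample."""
    run_dir = Path(run_dir)
    trajectory_path = run_dir / "trajectories" / engine / f"{sample_id}.json"
    trajectory = json.loads(trajectory_path.read_text(encoding="utf-8"))
    records = [
        record
        for record in read_jsonl(run_dir / "results.jsonl")
        if str(record.get(id_key)) == str(sample_id)  # id_key is an assumed field name
    ]
    log_text = (run_dir / "benchmark.log").read_text(encoding="utf-8")
    # Plain substring match; adjust to the real log format.
    log_lines = [line for line in log_text.splitlines() if str(sample_id) in line]
    return {"trajectory": trajectory, "records": records, "log_lines": log_lines}


def build_fake_run(root):
    """Create a tiny run directory with made-up contents for the demo."""
    run_dir = Path(root) / "run"
    (run_dir / "trajectories" / "scheme").mkdir(parents=True)
    steps = [{"reasoning": "r", "code": "c", "output": "o"}]
    (run_dir / "trajectories" / "scheme" / "s1.json").write_text(json.dumps(steps), encoding="utf-8")
    rows = [{"sample_id": "s1", "engine": "scheme"}, {"sample_id": "s2", "engine": "scheme"}]
    (run_dir / "results.jsonl").write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
    (run_dir / "benchmark.log").write_text(
        "sample=s1 engine=scheme done\nsample=s2 engine=scheme done\n", encoding="utf-8"
    )
    return run_dir


with tempfile.TemporaryDirectory() as root:
    found = collect_sample(build_fake_run(root), "scheme", "s1")
    print(len(found["trajectory"]), len(found["records"]), len(found["log_lines"]))
```

For a real run, point collect_sample at benchmark_results/<run_id> and set id_key to whatever field results.jsonl actually uses for the sample identifier.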

Full benchmark usage guide: BENCHMARKS.md.

Local validation before release

python -m build
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q
python -m twine check --strict dist/*

Backlog

  • Add shared context with PostgreSQL/MySQL.
  • Test shared context in a multi-agent environment.
  • Extend benchmarks with additional long-context suites.
  • Optimize REPL instructions with GEPA.
