Skip to main content

NL -> ProblemSpec -> Search(D&C+RLM) -> SQL Evaluation -> Optimal Solution

Project description

syrch — Symbolic Search Runtime

English | [한국어]

NL Problem → ProblemSpec → Search(D&C+RLM) → SQL Executor → Optimal Solution

Project Goal

symbolic-search-runtime (syrch) is a search harness that finds optimal solutions to natural language problems over structured data. Unlike a simple QA agent that answers in one shot, syrch explores multiple reasoning paths using Divide & Conquer decomposition and Recursive Language Models, executing candidate solutions against real databases to select the best result.

Key Ideas

  • Divide & Conquer: Decompose a problem into logically independent sub-problems (sub-tasks), solve each independently, then merge results. Sub-problems can depend on each other forming a DAG.
  • RLM (Recursive Language Model): Each sub-task runs its own REPL loop — generate code → validate syntax → validate schema → execute SQL → check quality → evaluate confidence → refine or stop. Multiple reasoning paths are explored per node.
  • Confidence Calibration: LLM self-assessed confidence is discounted by execution signals (retries, errors, empty results) for more reliable scoring.
  • Grid Search: Systematic hyperparameter testing (max_depth, high_confidence, max_attempts, calibration_enabled) to find optimal configs.
  • Multi-table Schema: Planner and RLM see all database tables, not just one.
  • Search over reasoning, not execution: D&C splits the problem space, not the SQL. Each sub-problem is a complete reasoning unit (think → code → validate → execute → evaluate).
  • Pluggable Executors: Abstract BaseExecutor with SQLite, JDBC, and Databricks implementations — PEP 249 compatible.

Architecture

User Question
    │
    ▼
┌──────────────────┐
│    Planner       │  ← LLM decomposes question into sub-task DAG
│  (D&C)           │     (depends_on, is_atomic, expected_output)
│                  │     Multi-table schema: all tables visible
└──────┬───────────┘
       │ TaskDAG (topo_layers)
       ▼
┌──────────────────┐
│    Scheduler     │  ← Layer-by-layer DAG execution
│                   │
│  For each node:   │
│  ┌─────────────┐  │
│  │ RLM Agent    │  │  ← 3-step validation loop:
│  │ 1. SQLGlot   │  │     1. Syntax check (sqlglot.parse_one)
│  │    syntax    │  │     2. Schema AST check (valid columns)
│  │ 2. Schema    │  │     3. Execute + quality check
│  │    AST check │  │     Confidence calibration applied
│  │ 3. Execute   │  │
│  │ 4. Quality   │  │
│  │ 5. Calibrate │  │
│  └─────────────┘  │
│                   │
│  Pruning:          │
│  conf ≥ threshold → greedy stop
│  max_attempts hit → best path selected
└──────┬───────────┘
       │ NodeResults (DataFrames + SQL + confidence)
       ▼
┌──────────────────┐
│   Aggregator     │  ← Merge leaf results → final answer
│                   │     Tiebreaker: equal confidence → lower token_cost
└──────┬───────────┘
       │ FinalSolution
       ▼
 Optimal Answer + SQL + Reasoning Trace

 ═══════ Optional: RLM Clarification ═══════
       │
       ▼ (if --interactive)
┌──────────────────┐
│  RLM Clarifier   │  ← RLM exhaustion detected
│                   │     ambiguity score >= threshold
│  Node-level:      │     → ask user question
│  ┌─────────────┐  │     → refine ProblemSpec
│  │ no_sql       │  │     → re-run pipeline
│  │ empty_result │  │
│  │ quality_fail │  │
│  │ low_confidence│  │
│  └─────────────┘  │
└──────┬───────────┘
       │ Clarification answer → refined problem
       ▼
     Back to Planner (retry)

 ═══════ Optional: Grid Search ═══════
       │
       ▼
┌──────────────────┐
│   Grid Search    │  ← 27-54 cells (Product of params)
│                   │     ProcessPoolExecutor (max_workers=3)
│                   │     Reports: config.json, results.json,
│                   │              best.json, summary.md
└──────┬───────────┘
       │ Best config → run_pipeline again

How a Sub-Task Executes (RLM Node)

Node "Find top 10% customers"
    │
    ├── Attempt 1: SQL path A
    │   ├── [PASS] Syntax check (sqlglot)
    │   ├── [PASS] Schema column check
    │   ├── [PASS] Execute → 5,234 rows
    │   ├── [WARN] Quality: returned 5234 rows (>1000)
    │   └── confidence: 0.72 (below threshold, retry)
    │
    ├── Attempt 2: SQL path B
    │   ├── [PASS] Syntax check
    │   ├── [PASS] Schema column check
    │   ├── [PASS] Execute → 534 rows
    │   ├── [PASS] Quality: OK
    │   └── confidence: 0.91 → calibrated: 0.86 (above threshold, stop)
    │
    └── Return best (calibrated) result to parent

Directory Structure

syrch/
├── pyproject.toml
├── README.md
├── AGENTS.md
├── PLAN.md
├── LICENSE
├── .gitignore
├── benchmarks/example.jsonl
├── src/syrch/
│   ├── __init__.py
│   ├── cli/app.py                # Typer CLI
│   ├── core/
│   │   ├── models.py             # Data types (dataclasses)
│   │   ├── config.py             # ExecutionConfig + config loader
│   │   └── logging.py            # Structured logging
│   ├── executors/
│   │   ├── base.py               # BaseExecutor (ABC)
│   │   ├── sqlite_executor.py    # SQLite
│   │   ├── jdbc_executor.py      # JDBC via SQLAlchemy
│   │   ├── databricks_executor.py # Databricks SQL
│   │   ├── spark_executor.py     # SparkSession (Databricks/EMR/standalone)
│   │   └── cached_executor.py    # diskcache-backed SQL cache
│   ├── llm/
│   │   ├── base.py               # BaseLLM (ABC)
│   │   ├── openai_llm.py         # OpenAI
│   │   ├── anthropic_llm.py      # Anthropic Claude
│   │   └── cache.py              # CachedLLM + CentralCache
│   ├── search/
│   │   ├── planner.py            # D&C: NL -> TaskDAG
│   │   ├── scheduler.py          # DAG execution engine
│   │   ├── rlm_engine.py         # RLM REPL loop
│   │   ├── aggregator.py         # Result merge
│   │   ├── calibrator.py         # Confidence calibration
│   │   ├── clarify.py            # Ambiguity detection
│   │   ├── grid.py               # Grid search
│   │   └── pipeline.py           # Orchestrator
│   └── eval/
│       ├── runner.py             # Benchmark harness
│       ├── metrics.py            # Evaluation metrics
│       └── report.py             # Report export
├── validate_real.py              # Real LLM validation
├── orders_10dim.sqlite           # TPC-H derived (7.5M rows)
├── wikipedia_clickstream.sqlite  # Clickstream data (3K rows)
└── tests/
    ├── test_cache.py
    ├── test_clarify.py
    ├── test_e2e.py
    ├── test_eval.py
    ├── test_integration.py
    ├── test_planner.py
    ├── test_rlm_engine.py
    └── test_scheduler.py

Data Model

ProblemSpec { question, schema, all_schemas, goal_metric }
    │
    ▼
TaskDAG { nodes: {A, B, C, ...}, root_id, topo_layers }
    │  각 TaskNode: { id, description, depends_on, is_atomic, join_type }
    ▼
Scheduler → NodeResult { node_id, data(DataFrame), sql, confidence,
                         reasoning_paths, cost_tokens, error }
    │
    ▼
Aggregator → FinalSolution { answer, sql, confidence, data, token_cost, tree }
             (tiebreaker: equal confidence → lower cost_tokens wins)

Installation

# Core (CLI + SQLite)
pip install syrch

# Databricks SQL Warehouse (external connection)
pip install "syrch[databricks-sql]"

# Spark executor (Databricks Runtime, EMR, standalone)
pip install "syrch[spark]"

# Development (tests + lint)
pip install -e ".[dev]"

# Everything
pip install "syrch[all]"

Python API (Library Mode)

Use directly from Databricks notebooks or Python scripts:

from syrch import query

result = query(
    question="What discount × shipping combo maximizes revenue?",
    executor_type="databricks-sql",
    model="gpt-4o",
)
print(result.answer)
print(result.sql)
print(result.confidence)
print(result.data)

CLI Usage

# Install
pip install syrch

# Inspect database schema
syrch schema wikipedia_clickstream.sqlite
syrch schema orders_10dim.sqlite -t orders_10dim

# Show default config
syrch config

# Solve a problem (requires LLM API key)
export OPENAI_API_KEY="sk-..."
syrch search -q "What discount × shipping combo maximizes revenue for top 10% customers?"

# With config file
syrch search -q "..." --config syrch.yml

# With options
syrch search -q "Which click type generates the most traffic?" \
  --db wikipedia_clickstream.sqlite \
  --max-depth 3 \
  --high-conf 0.85 \
  --max-attempts 3 \
  --verbose

# Grid search over hyperparameters (54 cells)
syrch search -q "..." --db orders_10dim.sqlite --grid

# Benchmark against expected results
syrch eval -q "..." --db orders_10dim.sqlite --expected expected.csv

# Run benchmark suite
syrch benchmark benchmarks/orders.jsonl

CLI Reference

Command Option Description
search -q / --question Natural language problem (required)
--db Database path (default: orders_10dim.sqlite)
--max-depth Max D&C recursion depth (default: 3)
--executor sqlite / databricks-sql / spark / jdbc
--max-attempts Max RLM attempts per node (default: 3)
--high-conf Confidence threshold for greedy stop (default: 0.85)
--budget Token budget (default: 100000)
--llm openai / anthropic
--model LLM model name (default: qwen3.5-4b-4bit)
-v / --verbose Show reasoning traces
--cache/--no-cache Enable/disable LLM + SQL cache (default: on)
--cache-ttl Cache TTL in seconds (default: 86400)
--grid Run grid search over hyperparameters
--grid-parallel/--grid-sequential Parallel vs sequential grid execution
--grid-max-workers Max concurrent API calls (default: 3)
--max-concurrency Max concurrent LLM calls (default: 5; use 1 for local models)
--interactive Ask clarification questions when SQL cannot solve the task
--non-interactive One-shot mode with no clarification (default)
--config Path to YAML config file (syrch.yml or ~/.syrch/config.yml)
eval -q Question
--db Database path
--executor Executor type
--expected Expected results CSV
--report-format md / json
benchmark PATH JSONL benchmark file (positional)
--executor Executor type
--report Output report path
schema DB Database path (positional)
-t / --table Specific table
config --db Database path

Configuration

Config loaded from (priority order): CLI args > env vars (SYRCH_*) > config file > Databricks Secrets > defaults.

Config File (syrch.yml)

llm:
  provider: openai
  model: qwen3.5-4b-4bit
  base_url: http://localhost:11434/v1
  temperature: 0.7
  max_tokens_per_call: 4096
  timeout_seconds: 120

execution:
  executor_type: sqlite
  max_depth: 3
  max_attempts_per_node: 3
  high_confidence: 0.85
  token_budget: 100000
  cache_enabled: true
  cache_ttl: 86400
  verbose: false

Search paths: ./syrch.yml > ~/.syrch/config.yml > --config <path> explicit override

Environment Variables

Variable Maps to Example
SYRCH_MODEL llm.model gpt-4o
SYRCH_API_KEY llm.api_key sk-...
SYRCH_BASE_URL llm.base_url http://localhost:11434/v1
SYRCH_MAX_DEPTH execution.max_depth 3
SYRCH_VERBOSE execution.verbose true

Databricks Connection

Variable Auth type Description
DATABRICKS_SERVER_HOSTNAME all Databricks workspace URL
DATABRICKS_HTTP_PATH all SQL Warehouse HTTP path
DATABRICKS_TOKEN pat Personal Access Token
DATABRICKS_AUTH_TYPE all pat (default), databricks-oauth, or azure
DATABRICKS_CLIENT_ID oauth/azure OAuth client ID
DATABRICKS_CLIENT_SECRET oauth/azure OAuth client secret
AZURE_TENANT_ID azure Azure AD tenant ID

Structured Logging

Internal diagnostics go to stderr via logging. User-facing output (Solution, SQL) goes to stdout via rich.

# Default: WARNING+ only to stderr
syrch search -q "..."

# Verbose: INFO level
syrch search -q "..." -v

# Library mode
python -c "
from syrch import query
result = query('Total revenue?', verbose=True)
"

Log format: LEVEL:logger_name:message (stdlib logging default)

INFO:syrch.scheduler:Layer 0: dispatching 2 nodes
WARNING:syrch.rlm_engine:Empty result, confidence penalized

CI

GitHub Actions (push/PRmain):

Step Command
Lint ruff check src/syrch/
Type check mypy src/syrch/ --ignore-missing-imports
Test pytest tests/ -v --cov=src/syrch/ (Python 3.11 + 3.12)

Confidence Calibration

LLM self-assessed confidence is adjusted by execution signals:

Signal Weight Effect
syntax_error 0.10 ×0.90 per occurrence
execution_error 0.10 ×0.90 per occurrence
empty_result 0.15 ×0.85 if result is empty
schema_error 0.05 ×0.95 per occurrence
null_column 0.05 ×0.95 if result has all-NULL columns
retry_ratio 0.05 Scales with attempts used

Heuristic penalties (aggregator):

  • Empty result: +0.15 per node
  • Error present: +0.15 per node
  • TOP-N mismatch: +0.05 per node
  • "by year" without year column: +0.10 (once, global)
  • Capped at 0.40 total

Formula: calibrated = raw × Π(1 - weight_if_applicable)

Disabled by passing --no-cache (sets calibration_enabled=False in ExecutionConfig).

Grid Search

Automated hyperparameter search for optimal configuration:

syrch search -q "What discount × shipping combo maximizes revenue?" \
  --db orders_10dim.sqlite --grid

Default parameter grid (54 cells):

Parameter Values
max_depth 1, 3, 5
high_confidence 0.7, 0.85, 0.95
max_attempts_per_node 1, 3, 5
calibration_enabled True, False

Output: autoresearch/reports/{YYYYMMDD-HHMMSS}/{config,results,best}.json + summary.md

Best config selection: exact_match > confidence (cells with errors are skipped).

Pruning Strategy

The RLM engine uses a confidence-based pruning strategy:

  1. Generate first reasoning path → SQL → 3-step validation (syntax → schema → quality)
  2. Execute → score
  3. Apply confidence calibration (if enabled)
  4. If calibrated confidence ≥ HIGH_CONFIDENCE (0.85) → greedy accept, stop
  5. If below threshold → generate alternative path
  6. After max_attempts → pick best path by calibrated confidence

This balances search thoroughness with token budget. Simple problems resolve quickly (greedy path), while complex ones explore multiple candidates.

Executor Abstraction

All executors conform to BaseExecutor:

class BaseExecutor(ABC):
    def execute(sql: str) -> DataFrame: ...
    def get_schema(table_name?: str) -> TableSchema: ...
    def list_tables() -> list[str]: ...
    def close(): ...
Executor Backend Connection
SQLiteExecutor SQLite sqlite3 (thread-safe via threading.local)
JDBCExecutor Any JDBC SQLAlchemy
DatabricksExecutor Databricks SQL databricks-sql-connector (PEP 249)
SparkExecutor SparkSession pyspark (SparkSession.builder.getOrCreate())

Caching

All LLM and SQL calls are cached via diskcache (24h TTL):

Layer Cache Key
LLM generate() CachedLLM SHA256(system + user + model + temperature)
LLM generate_json() CachedLLM SHA256(system + user + model + temperature)
SQL execute() CachedExecutor SHA256(sql)

Toggle with --cache/--no-cache flag; TTL configurable with --cache-ttl.

Datasets

Dataset Rows Size Description
wikipedia_clickstream.sqlite 3,138 280 KB Aggregated Wikipedia clickstream data with mutual information metadata
orders_10dim.sqlite 7,500,000 1.3 GB TPC-H derived synthetic orders with 10 dimension columns

Testing

# Unit tests (FakeLLM, no API key required)
pytest tests/ -v

# All 69 tests pass:
#   7  cache tests (CentralCache, CachedLLM, CachedExecutor)
#   9  clarify tests (ambiguity score, question generation, worst detection)
#   9  e2e tests (real SQLite DBs + pipeline)
#   14 eval tests (runner, metrics, benchmark, join merge)
#   8  integration tests (DAG, grid, clarification loop, multi-table)
#   8  planner tests (decompose, cycle, join keys, recursive)
#   10 rlm_engine tests (validation, calibration, quality, calibrator)
#   4  scheduler tests (DAG execution)

Real-world Validation

# Run full validation (requires LLM API key)
python validate_real.py

# Specific level
python validate_real.py --level 3 --verbose

# Custom question
python validate_real.py --question "Total revenue by year?" --db orders_10dim.sqlite

# With local model
python validate_real.py --model qwen3.5-4b --max-concurrency 1

# Results (2026-06-15, minimax-m3:cloud):
#   L1 Easy           3/3 PASS
#   L2 Medium         3/3 PASS
#   L3 Complex        2/2 PASS
#   L4 Very Complex   2/2 PASS
#   L5 Ambiguous      2/2 AMBIGUOUS (expected)
#   ─────────────────────────────
#   Total             10/10 PASS  100% (2 AMBIGUOUS)

Research Background

  • RLM (Recursive Language Model): MIT CSAIL OASYS Lab, 2025. Inference paradigm where LLMs recursively decompose input via REPL environments. paper code
  • RDD (Recursive Decomposition with Dependencies): Formal D&C framework with dependency DAGs. paper
  • PAC-MCTS: Bias-aware pruning with formal guarantees for tree search. paper
  • ROMA: Recursive meta-agent framework with Atomizer/Planner/Executor/Aggregator roles. paper
  • Graph Harness: Structured DAG execution with immutable plan versions. paper
  • AdaptOrch: Topology-aware multi-agent orchestration (parallel/sequential/hierarchical/hybrid). paper
  • DST: Adaptive tree search with confidence-based pruning (26-75% computation reduction). paper
  • LLM Compiler: Parallel task scheduling via dependency graphs; closely related to syrch's DAG scheduler and layer-by-layer execution. paper

Open Research Questions

Question Approach
When to stop dividing? (Unit case detection) Experiment with LLM self-assessment + complexity heuristics
How to merge sub-task results? DAG-based REPL variable passing + Aggregator role
How to prune search space? Confidence-based pruning + uncertainty-aware allocation
Optimal D&C strategy? Topology routing (AdaptOrch) based on DAG structure metrics
Optimal calibration weights? Grid search over penalty coefficients per signal
Join key inference? Planner emits join_keys between sub-tasks
Recursive decomposition? Planner recurses on non-atomic sub-tasks
When SQL cannot solve? RLM clarification: ambiguity score → interactive feedback → re-decompose
Optimal clarification threshold? Grid search over score weights + decision boundary

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

syrch-0.1.2.tar.gz (58.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

syrch-0.1.2-py3-none-any.whl (48.2 kB view details)

Uploaded Python 3

File details

Details for the file syrch-0.1.2.tar.gz.

File metadata

  • Download URL: syrch-0.1.2.tar.gz
  • Upload date:
  • Size: 58.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for syrch-0.1.2.tar.gz
Algorithm Hash digest
SHA256 ab2712e4f6c76a9eac891ae6cdfc025f14300e21b6e92c6842dd60f81dec8738
MD5 95236054b6f9c5b288b4f308cfea7d55
BLAKE2b-256 50cbe683d775be22a1c8af3a7b4ef0e61313b18ae32d095cf2d3c5fa94447702

See more details on using hashes here.

File details

Details for the file syrch-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: syrch-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 48.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for syrch-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7047bc5febff915ee63c9787c39f58ef9b396ce446169e0eaf8b22a2b13b626b
MD5 fc68ab9a6ee59f00c7c83e21bb713d20
BLAKE2b-256 9f381860bf4f880276b6c1242f7c67b6d6c57470c21e1293b236827ef2e40efc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page