
RLM Engine

Recursive Language Model - Process unlimited context by having LLMs write and execute code.

Overview

RLM addresses the context-length limitation of LLMs by treating the model as the core of a "neurosymbolic operating system." Instead of feeding an entire document to the model, RLM:

  1. Gives the LLM a Python REPL environment
  2. The LLM writes code to explore and search the document
  3. RLM executes the code and returns the results
  4. The LLM iterates until it finds the answer

This enables processing 10M+ character documents that would overflow traditional context windows.
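At its core, this is a read-execute loop. The sketch below illustrates the shape of that loop only; it is not the package's internal API, and the three callables are hypothetical stand-ins for the model call, code extraction, and sandboxed execution steps:

def rlm_loop(llm_call, extract_code, run_code, query, context, max_iterations=10):
    """Illustrative RLM loop; the callables are hypothetical stand-ins."""
    history = []
    for _ in range(max_iterations):
        reply = llm_call(query, history)       # ask the model for its next step
        if reply.startswith("FINAL("):
            return reply[len("FINAL("):-1]     # model committed to an answer
        code = extract_code(reply)             # pull the code block out of the reply
        output = run_code(code, context)       # execute it in the sandboxed REPL
        history.append((code, output))         # feed the result back to the model
    return None                                # gave up after max_iterations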

Installation

# From a source checkout (editable install)
pip install -e .

# Optional: For YAML config support
pip install pyyaml

# Optional: For API server
pip install fastapi uvicorn

Quick Start

Python API

from rlm import RLM, RLMConfig

# With OpenAI
rlm = RLM(backend="openai", model="gpt-4o")

# With vLLM (self-hosted)
rlm = RLM(
    backend="vllm",
    model="meta-llama/Llama-3.1-70B-Instruct",
    base_url="http://localhost:8000/v1"
)

# Process a document
result = rlm.completion(
    query="What is the secret code?",
    context=huge_document,  # Can be 10M+ characters
)

print(result.answer)
print(f"Iterations: {result.iterations}")
print(f"Time: {result.execution_time:.2f}s")

CLI

# Query a file
rlm query "What is the revenue?" --file report.txt

# Pipe from stdin
cat document.txt | rlm query "Summarize this"

# Use specific backend
rlm query "Find dates" --file data.txt --backend vllm --base-url http://localhost:8000/v1

# Output as JSON
rlm query "Count words" --file doc.txt --json

API Server

# Start server
rlm serve --port 8080

# Or with Python
python -m rlm.server

# Query the API
curl -X POST http://localhost:8080/v1/rlm/completion \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the revenue?",
    "context": "Q4 Report: Revenue $500M..."
  }'
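
The same endpoint can be called from Python with requests; a minimal client sketch, assuming the JSON response mirrors the RLMResult fields (answer, iterations) from the Python API:

import requests

resp = requests.post(
    "http://localhost:8080/v1/rlm/completion",
    json={
        "query": "What is the revenue?",
        "context": "Q4 Report: Revenue $500M...",
    },
)
resp.raise_for_status()
body = resp.json()
print(body["answer"])      # assumed field name, mirroring RLMResult.answer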

Configuration

Environment Variables

export RLM_BACKEND=vllm
export RLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
export RLM_BASE_URL=http://localhost:8000/v1
export RLM_MAX_ITERATIONS=10
export RLM_VERBOSE=true

Config File (rlm.yaml)

model:
  backend: vllm
  model: meta-llama/Llama-3.1-70B-Instruct
  base_url: http://localhost:8000/v1

rlm:
  max_iterations: 10
  max_depth: 3
  temperature: 0.7
  verbose: false

optimizations:
  cache_enabled: true
  parallel_chunks: 5

server:
  host: 0.0.0.0
  port: 8080

# Initialize config
rlm init
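
This page does not show a loader that feeds rlm.yaml into the Python API, so here is a hedged sketch using pyyaml (the optional dependency from Installation), assuming RLM accepts the model fields as keyword arguments as in Quick Start:

import yaml  # optional dependency: pip install pyyaml

from rlm import RLM

with open("rlm.yaml") as f:
    cfg = yaml.safe_load(f)

# Assumption: backend/model/base_url pass straight through as kwargs,
# matching the Quick Start constructor calls.
rlm = RLM(**cfg["model"])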

Optimized Variants

FastRLM

Optimized for speed with relevance filtering:

from rlm import FastRLM

rlm = FastRLM(
    backend="vllm",
    base_url="http://localhost:8000/v1",
    use_relevance_filtering=True,
)

result = await rlm.fast_completion(query, context)

ScalableRLM

Optimized for large documents with chunking and caching:

from rlm import ScalableRLM

rlm = ScalableRLM(
    backend="openai",
    enable_cache=True,
    max_concurrent=10,
)

result = await rlm.scalable_completion(
    query="Summarize this 10M-character document",
    context=massive_document,
)

Streaming

from rlm import RLM, StreamingRLM

rlm = RLM(backend="openai")
streaming = StreamingRLM(rlm)

async for event in streaming.stream_completion(query, context):
    if event.event_type == "code":
        print(f"Executing: {event.data}")
    elif event.event_type == "output":
        print(f"Output: {event.data}")
    elif event.event_type == "answer":
        print(f"Answer: {event.data}")

Supported Backends

Backend      Requires API Key           Self-Hosted
openai       Yes (OPENAI_API_KEY)       No
anthropic    Yes (ANTHROPIC_API_KEY)    No
vllm         No                         Yes
ollama       No                         Yes
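
The self-hosted backends follow the vLLM pattern from Quick Start. An ollama sketch, assuming the backend accepts a base_url and that Ollama's OpenAI-compatible endpoint is used (the model name here is illustrative):

from rlm import RLM

# Assumption: the ollama backend takes base_url like the vllm backend does.
rlm = RLM(
    backend="ollama",
    model="llama3.1",
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
)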

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         RLM Engine                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Query + Context                                           │
│         │                                                   │
│         ▼                                                   │
│   ┌─────────────┐                                           │
│   │  System     │ ◄── Few-shot examples for code writing   │
│   │  Prompt     │                                           │
│   └─────────────┘                                           │
│         │                                                   │
│         ▼                                                   │
│   ┌─────────────┐      ┌─────────────┐                     │
│   │    LLM      │ ──► │   Parser    │ ──► Extract code    │
│   │   Call      │      └─────────────┘                     │
│   └─────────────┘                                           │
│         │                                                   │
│         ▼                                                   │
│   ┌─────────────┐                                           │
│   │  Python     │ ◄── Safe sandbox with context access     │
│   │   REPL      │                                           │
│   └─────────────┘                                           │
│         │                                                   │
│         ▼                                                   │
│   Output fed back to LLM ──────────────────► Loop          │
│         │                                                   │
│         ▼                                                   │
│   FINAL(answer) ──────────────────────────► Return         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
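
The Parser box extracts either code to execute or a FINAL(answer) terminator from the model's reply. A hypothetical sketch of that step (the regexes are illustrative; the real parser.py may differ):

import re

# Illustrative patterns: a fenced Python block, or a FINAL(...) terminator.
CODE_RE = re.compile(r"```python\n(.*?)```", re.DOTALL)
FINAL_RE = re.compile(r"FINAL\((.*)\)", re.DOTALL)

def parse_reply(reply: str):
    final = FINAL_RE.search(reply)
    if final:
        return ("answer", final.group(1))
    code = CODE_RE.search(reply)
    if code:
        return ("code", code.group(1))
    return ("none", None)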

Benchmarks

Run benchmarks:

rlm benchmark --backend vllm --base-url http://localhost:8000/v1 -o results.json

Model            Accuracy    Avg Latency
GPT-4o           95%         5s
Claude Sonnet    92%         6s
Llama-3.1-70B    88%         4s
Phi-3.5-mini     25-100%*    3-7s

*Accuracy varies by task complexity

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run specific tests
pytest tests/test_parser.py -v

# Run with coverage
pytest tests/ --cov=rlm --cov-report=html

Project Structure

rlm-engine/
├── rlm/
│   ├── __init__.py         # Package exports
│   ├── core.py             # Main RLM implementation
│   ├── fast_rlm.py         # Speed-optimized variant
│   ├── scalable_rlm.py     # Scale-optimized variant
│   ├── parser.py           # Code/answer extraction
│   ├── prompts.py          # System prompts
│   ├── repl.py             # Python REPL sandbox
│   ├── streaming.py        # Streaming support
│   ├── server.py           # FastAPI server
│   ├── cli.py              # CLI tool
│   ├── config.py           # Configuration
│   ├── logging_config.py   # Structured logging
│   ├── clients/            # LLM backend clients
│   └── optimizations/      # Caching, chunking, etc.
├── tests/                  # Test suite
├── examples/               # Usage examples
├── benchmarks/             # Benchmark suite
└── README.md

License

MIT License
