
RLM Engine

Recursive Language Model - Process unlimited context by having LLMs write and execute code.

Overview

RLM addresses the context-length limitation of LLMs by treating the model as the core of a "neurosymbolic operating system." Instead of feeding an entire document to the model, RLM:

  1. Gives the LLM a Python REPL environment
  2. The LLM writes code to explore and search the document
  3. RLM executes the code and returns the results
  4. The LLM iterates until it finds the answer

This enables processing 10M+ character documents that would overflow traditional context windows.
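At its core, this is a read-execute loop. The sketch below illustrates the shape of that loop only; it is not the package's internal API, and the three callables are hypothetical stand-ins for the model call, code extraction, and sandboxed execution steps:

def rlm_loop(llm_call, extract_code, run_code, query, context, max_iterations=10):
    """Illustrative RLM loop; the callables are hypothetical stand-ins."""
    history = []
    for _ in range(max_iterations):
        reply = llm_call(query, history)       # ask the model for its next step
        if reply.startswith("FINAL("):
            return reply[len("FINAL("):-1]     # model committed to an answer
        code = extract_code(reply)             # pull the code block out of the reply
        output = run_code(code, context)       # execute it in the sandboxed REPL
        history.append((code, output))         # feed the result back to the model
    return None                                # gave up after max_iterations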

Installation

# From a source checkout (editable install)
pip install -e .

# Optional: For YAML config support
pip install pyyaml

# Optional: For API server
pip install fastapi uvicorn

Quick Start

Python API

from rlm import RLM, RLMConfig

# With OpenAI
rlm = RLM(backend="openai", model="gpt-4o")

# With vLLM (self-hosted)
rlm = RLM(
    backend="vllm",
    model="meta-llama/Llama-3.1-70B-Instruct",
    base_url="http://localhost:8000/v1"
)

# Process a document
result = rlm.completion(
    query="What is the secret code?",
    context=huge_document,  # Can be 10M+ characters
)

print(result.answer)
print(f"Iterations: {result.iterations}")
print(f"Time: {result.execution_time:.2f}s")

CLI

# Query a file
rlm query "What is the revenue?" --file report.txt

# Pipe from stdin
cat document.txt | rlm query "Summarize this"

# Use specific backend
rlm query "Find dates" --file data.txt --backend vllm --base-url http://localhost:8000/v1

# Output as JSON
rlm query "Count words" --file doc.txt --json

API Server

# Start server
rlm serve --port 8080

# Or with Python
python -m rlm.server

# Query the API
curl -X POST http://localhost:8080/v1/rlm/completion \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the revenue?",
    "context": "Q4 Report: Revenue $500M..."
  }'
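
The same endpoint can be called from Python with requests; a minimal client sketch, assuming the JSON response mirrors the RLMResult fields (answer, iterations) from the Python API:

import requests

resp = requests.post(
    "http://localhost:8080/v1/rlm/completion",
    json={
        "query": "What is the revenue?",
        "context": "Q4 Report: Revenue $500M...",
    },
)
resp.raise_for_status()
body = resp.json()
print(body["answer"])      # assumed field name, mirroring RLMResult.answer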

Configuration

Environment Variables

export RLM_BACKEND=vllm
export RLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
export RLM_BASE_URL=http://localhost:8000/v1
export RLM_MAX_ITERATIONS=10
export RLM_VERBOSE=true

Config File (rlm.yaml)

model:
  backend: vllm
  model: meta-llama/Llama-3.1-70B-Instruct
  base_url: http://localhost:8000/v1

rlm:
  max_iterations: 10
  max_depth: 3
  temperature: 0.7
  verbose: false

optimizations:
  cache_enabled: true
  parallel_chunks: 5

server:
  host: 0.0.0.0
  port: 8080

# Initialize config
rlm init
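
This page does not show a loader that feeds rlm.yaml into the Python API, so here is a hedged sketch using pyyaml (the optional dependency from Installation), assuming RLM accepts the model fields as keyword arguments as in Quick Start:

import yaml  # optional dependency: pip install pyyaml

from rlm import RLM

with open("rlm.yaml") as f:
    cfg = yaml.safe_load(f)

# Assumption: backend/model/base_url pass straight through as kwargs,
# matching the Quick Start constructor calls.
rlm = RLM(**cfg["model"])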

Optimized Variants

FastRLM

Optimized for speed with relevance filtering:

from rlm import FastRLM

rlm = FastRLM(
    backend="vllm",
    base_url="http://localhost:8000/v1",
    use_relevance_filtering=True,
)

result = await rlm.fast_completion(query, context)

ScalableRLM

Optimized for large documents with chunking and caching:

from rlm import ScalableRLM

rlm = ScalableRLM(
    backend="openai",
    enable_cache=True,
    max_concurrent=10,
)

result = await rlm.scalable_completion(
    query="Summarize this 10M-character document",
    context=massive_document,
)

Streaming

from rlm import RLM, StreamingRLM

rlm = RLM(backend="openai")
streaming = StreamingRLM(rlm)

async for event in streaming.stream_completion(query, context):
    if event.event_type == "code":
        print(f"Executing: {event.data}")
    elif event.event_type == "output":
        print(f"Output: {event.data}")
    elif event.event_type == "answer":
        print(f"Answer: {event.data}")

Supported Backends

Backend      Requires API Key           Self-Hosted
openai       Yes (OPENAI_API_KEY)       No
anthropic    Yes (ANTHROPIC_API_KEY)    No
vllm         No                         Yes
ollama       No                         Yes
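
The self-hosted backends follow the vLLM pattern from Quick Start. An ollama sketch, assuming the backend accepts a base_url and that Ollama's OpenAI-compatible endpoint is used (the model name here is illustrative):

from rlm import RLM

# Assumption: the ollama backend takes base_url like the vllm backend does.
rlm = RLM(
    backend="ollama",
    model="llama3.1",
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
)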

Architecture

┌─────────────────────────────────────────────────────────────┐
│                         RLM Engine                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Query + Context                                           │
│         │                                                   │
│         ▼                                                   │
│   ┌─────────────┐                                           │
│   │  System     │ ◄── Few-shot examples for code writing   │
│   │  Prompt     │                                           │
│   └─────────────┘                                           │
│         │                                                   │
│         ▼                                                   │
│   ┌─────────────┐      ┌─────────────┐                     │
│   │    LLM      │ ──► │   Parser    │ ──► Extract code    │
│   │   Call      │      └─────────────┘                     │
│   └─────────────┘                                           │
│         │                                                   │
│         ▼                                                   │
│   ┌─────────────┐                                           │
│   │  Python     │ ◄── Safe sandbox with context access     │
│   │   REPL      │                                           │
│   └─────────────┘                                           │
│         │                                                   │
│         ▼                                                   │
│   Output fed back to LLM ──────────────────► Loop          │
│         │                                                   │
│         ▼                                                   │
│   FINAL(answer) ──────────────────────────► Return         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
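
The Parser box extracts either code to execute or a FINAL(answer) terminator from the model's reply. A hypothetical sketch of that step (the regexes are illustrative; the real parser.py may differ):

import re

# Illustrative patterns: a fenced Python block, or a FINAL(...) terminator.
CODE_RE = re.compile(r"```python\n(.*?)```", re.DOTALL)
FINAL_RE = re.compile(r"FINAL\((.*)\)", re.DOTALL)

def parse_reply(reply: str):
    final = FINAL_RE.search(reply)
    if final:
        return ("answer", final.group(1))
    code = CODE_RE.search(reply)
    if code:
        return ("code", code.group(1))
    return ("none", None)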

Benchmarks

Run benchmarks:

rlm benchmark --backend vllm --base-url http://localhost:8000/v1 -o results.json

Model            Accuracy    Avg Latency
GPT-4o           95%         5s
Claude Sonnet    92%         6s
Llama-3.1-70B    88%         4s
Phi-3.5-mini     25-100%*    3-7s

*Accuracy varies by task complexity

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run specific tests
pytest tests/test_parser.py -v

# Run with coverage
pytest tests/ --cov=rlm --cov-report=html

Project Structure

rlm-engine/
├── rlm/
│   ├── __init__.py         # Package exports
│   ├── core.py             # Main RLM implementation
│   ├── fast_rlm.py         # Speed-optimized variant
│   ├── scalable_rlm.py     # Scale-optimized variant
│   ├── parser.py           # Code/answer extraction
│   ├── prompts.py          # System prompts
│   ├── repl.py             # Python REPL sandbox
│   ├── streaming.py        # Streaming support
│   ├── server.py           # FastAPI server
│   ├── cli.py              # CLI tool
│   ├── config.py           # Configuration
│   ├── logging_config.py   # Structured logging
│   ├── clients/            # LLM backend clients
│   └── optimizations/      # Caching, chunking, etc.
├── tests/                  # Test suite
├── examples/               # Usage examples
├── benchmarks/             # Benchmark suite
└── README.md

License

MIT License
