Skip to main content

Reinforcement learning-based PostgreSQL query optimizer

Project description

rl-query-optimizer

A reinforcement learning-based PostgreSQL query optimizer.

Automatically selects the best query execution plan hints — such as forcing a Hash Join, Index Scan, or Nested Loop — based on a pre-trained PPO model, improving performance for complex JOINs, aggregations, and large table scans.


Installation

pip install rl-query-optimizer

Requires Python 3.8+ and a running PostgreSQL instance.


Quick start

from rl_query_optimizer import QueryOptimizer

optimizer = QueryOptimizer(db_config={
    "host":     "localhost",
    "database": "mydb",
    "user":     "admin",
    "password": "secret",
})

# Run a query with automatic plan optimization
result = optimizer.execute(
    "SELECT s.name, c.course_name "
    "FROM students s "
    "JOIN courses c ON s.course_id = c.id "
    "WHERE s.age > 21;"
)

print(result["rows"])
print(f"Action: {result['action_name']}  ({result['optimized_ms']:.1f}ms)")

API

QueryOptimizer(db_config, model_path=None, verbose=False)

Parameter Type Description
db_config dict psycopg2 connection kwargs (host, database, user, password)
model_path str|None Path to a .zip model file. Uses bundled default if None.
verbose bool Print action and timing for each query. Default False.

optimizer.execute(sql, params=None, compare=False)

Execute a query using the optimizer's recommended plan.

result = optimizer.execute("SELECT * FROM orders JOIN customers ON ...", compare=True)

print(result["rows"])            # query results
print(result["action_name"])     # e.g. "Prefer Hash Join"
print(result["optimized_ms"])    # execution time with optimizer hints
print(result["default_ms"])      # execution time without hints (if compare=True)
print(result["improvement_pct"]) # % improvement (if compare=True)

optimizer.suggest(sql)

Get the optimizer's recommendation without executing the query.

suggestion = optimizer.suggest("SELECT * FROM orders JOIN customers ON ...")
# {
#     "action":     3,
#     "name":       "Prefer Hash Join",
#     "hints":      ["SET enable_hashjoin = ON", ...],
#     "confidence": 8.42
# }

optimizer.replace_model(model_path)

Hot-swap the model without restarting. The new model is validated before the old one is replaced — if validation fails, the original keeps running.

optimizer.replace_model("models/improved_model_v2.zip")

When you release an improved model, simply bump the package version and developers get the new model automatically via pip install --upgrade rl-query-optimizer.


optimizer.invalidate_cache(sql=None)

Clear EXPLAIN and feature caches. Call after major schema changes (new indexes, ANALYZE, etc.).

optimizer.invalidate_cache()        # clear all
optimizer.invalidate_cache(sql)     # clear one query

Context manager

with QueryOptimizer(db_config=...) as opt:
    result = opt.execute("SELECT ...")
# connection closed automatically

Error handling

from rl_query_optimizer import (
    QueryOptimizer,
    ModelValidationError,
    DatabaseConnectionError,
    QueryExecutionError,
)

try:
    optimizer.replace_model("bad_model.zip")
except ModelValidationError as e:
    print(f"Model rejected: {e}")

try:
    result = optimizer.execute("SELECT ...")
except QueryExecutionError as e:
    print(f"Query failed: {e}")

Replacing the model

The package ships with a default trained model. To use your own:

# Option 1: pass at construction time
optimizer = QueryOptimizer(db_config=..., model_path="my_model.zip")

# Option 2: hot-swap at any time
optimizer.replace_model("my_model.zip")

The model must be a Stable Baselines3 PPO or DQN model trained with:

  • Observation space: shape (10,)
  • Action space: Discrete(7)

Supported actions

ID Action When it helps
0 Default (no hints) When PostgreSQL already picks the best plan
1 Force Index Scan Highly selective WHERE clauses
2 Force Sequential Scan Full table scans with low selectivity
3 Prefer Hash Join Large table JOINs
4 Prefer Merge Join Pre-sorted data JOINs
5 Prefer Nested Loop Small inner tables, highly filtered JOINs
6 Disable Parallel Query Simple queries where parallelism adds overhead

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rl_query_optimizer-0.2.0.tar.gz (149.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rl_query_optimizer-0.2.0-py3-none-any.whl (146.0 kB view details)

Uploaded Python 3

File details

Details for the file rl_query_optimizer-0.2.0.tar.gz.

File metadata

  • Download URL: rl_query_optimizer-0.2.0.tar.gz
  • Upload date:
  • Size: 149.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for rl_query_optimizer-0.2.0.tar.gz
Algorithm Hash digest
SHA256 1c6e8fecddecdeb24e86f60668c75c20625e0561eb2100a14335f06b52d33640
MD5 308ff4ecc8619fb666a7597662b0b266
BLAKE2b-256 259e295a4bc6f103d030ca0b53825db4d58f5b4d78079eae501ab70a38d1bb70

See more details on using hashes here.

File details

Details for the file rl_query_optimizer-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for rl_query_optimizer-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 25f8b0a91ecb9fed29307849e3eec3a8c541cea1a984bb0bf2db6f3219a447f5
MD5 bdcb7f2d26e3a595f61ac320728f21af
BLAKE2b-256 78d59f613629bacbf8a04d84be334dab9bbe0d3d31d70161530816b8bc75371c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page