Skip to main content

AI-powered natural language database query system using OpenAI Agents SDK

Project description

DB Query Agent ๐Ÿค–๐Ÿ’ฌ

AI-powered natural language database query system using OpenAI Agents SDK

PyPI version Python 3.10+ License: MIT Downloads

A powerful, production-ready Python package that lets you query databases using natural language. Built with OpenAI Agents SDK, featuring intelligent safety guardrails, streaming responses, and optimized for speed.

โœจ Features

  • ๐Ÿ—ฃ๏ธ Natural Language Queries - Ask questions in plain English, get SQL and results
  • โšก Blazing Fast - Streaming responses, adaptive model selection, multi-layer caching
  • ๐Ÿ”’ Production-Ready Safety - Read-only mode, SQL injection prevention, query validation
  • ๐Ÿ’ฌ Session Management - Maintains conversation context across multiple queries
  • ๐ŸŽฏ Smart Schema Loading - Only loads relevant tables for faster responses
  • ๐Ÿ”Œ Universal Database Support - PostgreSQL, MySQL, SQLite, SQL Server
  • ๐Ÿ“ฆ Easy Integration - Works with Django, Flask, FastAPI, or any Python app

๐Ÿš€ Quick Start

Installation

pip install db-query-agent

# With database-specific drivers
pip install db-query-agent[postgres]  # PostgreSQL
pip install db-query-agent[mysql]     # MySQL
pip install db-query-agent[all]       # All drivers

Basic Usage

Option 1: Load from .env (Recommended)

# Create .env file
DATABASE_URL=postgresql://user:pass@localhost/mydb
OPENAI_API_KEY=sk-...
FAST_MODEL=gpt-4o-mini
READ_ONLY=true
from db_query_agent import DatabaseQueryAgent

# Load everything from .env
agent = DatabaseQueryAgent.from_env()

# Or override specific values
agent = DatabaseQueryAgent.from_env(
    fast_model="gpt-4.1",
    enable_statistics=True
)

Option 2: Direct Configuration

from db_query_agent import DatabaseQueryAgent

# Pass all parameters directly
agent = DatabaseQueryAgent(
    database_url="postgresql://user:pass@localhost/mydb",
    openai_api_key="sk-...",
    fast_model="gpt-4o-mini",
    read_only=True,
    enable_cache=True
)

Query the Database

# Query in natural language (async)
result = await agent.query("How many users signed up last month?")

print(result["natural_response"])
# Output: "245 users signed up last month"

print(result["sql"])
# Output: "SELECT COUNT(*) FROM users WHERE created_at >= '2025-09-01'"

With Streaming (Recommended)

# Stream responses for better UX
async for chunk in agent.query_stream("Show me top 10 customers by revenue"):
    print(chunk, end="", flush=True)

Session-based Chat

# Create a session for multi-turn conversations
session = agent.create_session(session_id="user_123")

# First query
response1 = await session.ask("Show me all products")

# Follow-up query (maintains context)
response2 = await session.ask("Filter those by category=electronics")

# Another follow-up
response3 = await session.ask("Sort by price descending")

๐Ÿ”ง Utility Methods

Session Management

# List all active sessions
sessions = agent.list_sessions()

# Get conversation history
history = agent.get_session_history("user_123")

# Clear session history
agent.clear_session("user_123")

# Delete session
agent.delete_session("user_123")

Schema Exploration

# Get basic schema
schema = agent.get_schema()

# Get detailed schema with relationships
schema_info = agent.get_schema_info(include_foreign_keys=True)
print(f"Total tables: {schema_info['total_tables']}")
print(f"Relationships: {len(schema_info['relationships'])}")

Statistics and Monitoring

# Get comprehensive statistics
stats = agent.get_stats()

print(f"Total queries: {stats['total_queries']}")
print(f"Cache hit rate: {stats['cache_hits'] / stats['total_queries'] * 100:.1f}%")
print(f"Active connections: {stats['pool']['checked_out']}")
print(f"Total sessions: {stats['sessions']['total_sessions']}")

๐ŸŽฏ Framework Integration

Django

# views.py
from django.conf import settings
from db_query_agent import DatabaseQueryAgent

agent = DatabaseQueryAgent(
    database_url=settings.DATABASES['default']['URL'],
    openai_api_key=settings.OPENAI_API_KEY
)

def query_database(request):
    question = request.POST.get('question')
    result = agent.query(question)
    return JsonResponse(result)

FastAPI

# main.py
from fastapi import FastAPI
from db_query_agent import DatabaseQueryAgent

app = FastAPI()
agent = DatabaseQueryAgent(database_url=os.getenv("DATABASE_URL"))

@app.post("/query")
async def query_db(question: str):
    return agent.query(question)

Flask

# app.py
from flask import Flask, request
from db_query_agent import DatabaseQueryAgent

app = Flask(__name__)
agent = DatabaseQueryAgent(database_url=os.getenv("DATABASE_URL"))

@app.route('/query', methods=['POST'])
def query():
    return agent.query(request.json['question'])

โš™๏ธ Configuration

Environment Variables (Recommended)

Create a .env file with all configuration:

# Required
OPENAI_API_KEY=sk-your-api-key
DATABASE_URL=postgresql://user:pass@localhost/db

# Model Configuration
MODEL_STRATEGY=adaptive
FAST_MODEL=gpt-4o-mini
BALANCED_MODEL=gpt-4.1-mini
COMPLEX_MODEL=gpt-4.1

# Cache Configuration
CACHE_ENABLED=true
CACHE_BACKEND=memory
CACHE_SCHEMA_TTL=3600
CACHE_QUERY_TTL=300
CACHE_LLM_TTL=3600

# Safety Configuration
READ_ONLY=true
QUERY_TIMEOUT=30
MAX_RESULT_ROWS=10000

# Connection Configuration
DB_POOL_SIZE=10
DB_MAX_OVERFLOW=20

# Performance Configuration
LAZY_SCHEMA_LOADING=true
ENABLE_STREAMING=true
WARMUP_ON_INIT=false

Then load with a single line:

agent = DatabaseQueryAgent.from_env()

Direct Configuration

Pass parameters directly (overrides .env):

from db_query_agent import DatabaseQueryAgent

agent = DatabaseQueryAgent(
    database_url="postgresql://...",
    openai_api_key="sk-...",
    
    # Model configuration
    model_strategy="adaptive",  # Use fast model for simple queries
    fast_model="gpt-4o-mini",   # 2s generation time
    balanced_model="gpt-4.1-mini",  # 3s generation time
    complex_model="gpt-4.1",     # 5s generation time
    
    # Cache configuration
    enable_cache=True,
    cache_backend="redis",
    schema_cache_ttl=3600,  # 1 hour
    query_cache_ttl=300,    # 5 minutes
    llm_cache_ttl=3600,     # 1 hour
    
    # Safety configuration
    read_only=True,
    allowed_tables=["users", "orders", "products"],
    blocked_tables=["sensitive_data"],
    max_query_timeout=30,
    max_result_rows=10000,
    
    # Connection configuration
    pool_size=10,
    max_overflow=20,
    
    # Performance configuration
    lazy_schema_loading=True,
    max_tables_in_context=5,
    enable_streaming=True,
    warmup_on_init=False,
    
    # Statistics configuration
    enable_statistics=True,  # Track queries, cache hits, etc.
    
    # Session configuration
    session_backend="sqlite",
    session_db_path="./sessions.db"
)

Mixed Configuration

Load from .env and override specific values:

# Load most settings from .env, override specific ones
agent = DatabaseQueryAgent.from_env(
    fast_model="gpt-4.1",  # Override model
    read_only=False,       # Override safety
    enable_statistics=True  # Add statistics
)

๐Ÿ“Š Performance

With all optimizations enabled:

Scenario Response Time Cache Hit
Simple query (cached) 0.5s โœ…
Simple query (uncached) 1.5s โŒ
Complex query (cached) 2s โœ…
Complex query (uncached) 5s โŒ
  • 90% of queries complete in < 3 seconds
  • First token appears in < 500ms with streaming
  • Cache hit rate typically > 60% in production

๐Ÿ”’ Security Features

  • โœ… Read-only mode by default (only SELECT queries)
  • โœ… SQL injection prevention with query parsing and validation
  • โœ… Table access control with allowlist/blocklist
  • โœ… Query timeout enforcement
  • โœ… Dangerous keyword detection (DROP, DELETE, etc.)
  • โœ… Input/output guardrails with OpenAI Agents SDK

๐Ÿ“š Documentation

Core Documentation

Examples

๐Ÿงช Development

Setup

# Clone repository
git clone https://github.com/Ocolus1/db-query-agent
cd db-query-agent

# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run demo UI
streamlit run demo/streamlit_app.py

Learn More

๐Ÿ“ Project Structure

db-query-agent/
โ”œโ”€โ”€ src/db_query_agent/          # Main package source code
โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”œโ”€โ”€ agent.py                 # DatabaseQueryAgent - main interface
โ”‚   โ”œโ”€โ”€ simple_multi_agent_system.py  # Multi-agent orchestration
โ”‚   โ”œโ”€โ”€ agents/                  # Agent implementations
โ”‚   โ”œโ”€โ”€ tools/                   # Agent tools
โ”‚   โ”œโ”€โ”€ schema_extractor.py      # Database schema introspection
โ”‚   โ”œโ”€โ”€ cache_manager.py         # Multi-layer caching system
โ”‚   โ”œโ”€โ”€ connection_manager.py    # Database connection pooling
โ”‚   โ”œโ”€โ”€ query_validator.py       # SQL validation and safety
โ”‚   โ”œโ”€โ”€ session_manager.py       # Conversation session management
โ”‚   โ”œโ”€โ”€ config.py                # Configuration classes
โ”‚   โ””โ”€โ”€ exceptions.py            # Custom exceptions
โ”‚
โ”œโ”€โ”€ docs/                        # Documentation
โ”‚   โ”œโ”€โ”€ API_REFERENCE.md         # Complete API documentation
โ”‚   โ”œโ”€โ”€ INTEGRATION_GUIDES.md    # Framework integration guides
โ”‚   โ”œโ”€โ”€ TROUBLESHOOTING.md       # Common issues & solutions
โ”‚   โ””โ”€โ”€ ARCHITECTURE.md          # System architecture
โ”‚
โ”œโ”€โ”€ examples/                    # Usage examples
โ”‚   โ”œโ”€โ”€ basic_usage.py           # 7 basic examples
โ”‚   โ”œโ”€โ”€ advanced_usage.py        # 8 advanced examples
โ”‚   โ””โ”€โ”€ README.md                # Examples documentation
โ”‚
โ”œโ”€โ”€ demo/                        # Demo application
โ”‚   โ”œโ”€โ”€ streamlit_app.py         # Interactive demo UI
โ”‚   โ””โ”€โ”€ create_demo_db.py        # Demo database setup
โ”‚
โ”œโ”€โ”€ tests/                       # Test suite (133 tests)
โ”‚   โ”œโ”€โ”€ test_agent_integration.py
โ”‚   โ”œโ”€โ”€ test_cache_manager.py
โ”‚   โ”œโ”€โ”€ test_dynamic_configuration.py
โ”‚   โ”œโ”€โ”€ test_streaming.py
โ”‚   โ””โ”€โ”€ ...
โ”‚
โ””โ”€โ”€ pyproject.toml               # Package configuration

๐Ÿค Contributing

Contributions are welcome! Here's how you can help:

  1. Report bugs - Open an issue with details about the problem
  2. Suggest features - Share your ideas for improvements
  3. Submit PRs - Fix bugs or add features
  4. Improve docs - Help make the documentation better
  5. Share feedback - Let us know how you're using the package

Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

๐Ÿ“ง Support


Made with โค๏ธ for developers who want to query databases with natural language

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

db_query_agent-0.1.4.tar.gz (74.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

db_query_agent-0.1.4-py3-none-any.whl (37.1 kB view details)

Uploaded Python 3

File details

Details for the file db_query_agent-0.1.4.tar.gz.

File metadata

  • Download URL: db_query_agent-0.1.4.tar.gz
  • Upload date:
  • Size: 74.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for db_query_agent-0.1.4.tar.gz
Algorithm Hash digest
SHA256 c52e0c0fbf349068074386fb44e17955a7a8a1962d5545d33813bde929a1965d
MD5 efef6303614997c63dadaaf9eb7010b6
BLAKE2b-256 0e0835a1ce868cb1642a9ee3deb4a353b708efdc8af216049678d5743f0dfe90

See more details on using hashes here.

File details

Details for the file db_query_agent-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: db_query_agent-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 37.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for db_query_agent-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 67112e5002716d85635b9ba648d1ee31848f8a17d9b0c7f3daafbd6b8de5aec8
MD5 148bf48bb1af1e86afb26b77e3429a38
BLAKE2b-256 452c1c76d7bb11af27ce6765386a01bb7d2fd6d2923caeb15761600ae0a4284d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page