Skip to main content

AI-powered natural language database query system using OpenAI Agents SDK

Project description

DB Query Agent ๐Ÿค–๐Ÿ’ฌ

AI-powered natural language database query system using OpenAI Agents SDK

Python 3.10+ License: MIT

A powerful, production-ready Python package that lets you query databases using natural language. Built with OpenAI Agents SDK, featuring intelligent safety guardrails, streaming responses, and optimized for speed.

โœจ Features

  • ๐Ÿ—ฃ๏ธ Natural Language Queries - Ask questions in plain English, get SQL and results
  • โšก Blazing Fast - Streaming responses, adaptive model selection, multi-layer caching
  • ๐Ÿ”’ Production-Ready Safety - Read-only mode, SQL injection prevention, query validation
  • ๐Ÿ’ฌ Session Management - Maintains conversation context across multiple queries
  • ๐ŸŽฏ Smart Schema Loading - Only loads relevant tables for faster responses
  • ๐Ÿ”Œ Universal Database Support - PostgreSQL, MySQL, SQLite, SQL Server
  • ๐Ÿ“ฆ Easy Integration - Works with Django, Flask, FastAPI, or any Python app

๐Ÿš€ Quick Start

Installation

pip install db-query-agent

# With database-specific drivers
pip install db-query-agent[postgres]  # PostgreSQL
pip install db-query-agent[mysql]     # MySQL
pip install db-query-agent[all]       # All drivers

Basic Usage

Option 1: Load from .env (Recommended)

# Create .env file
DATABASE_URL=postgresql://user:pass@localhost/mydb
OPENAI_API_KEY=sk-...
FAST_MODEL=gpt-4o-mini
READ_ONLY=true
from db_query_agent import DatabaseQueryAgent

# Load everything from .env
agent = DatabaseQueryAgent.from_env()

# Or override specific values
agent = DatabaseQueryAgent.from_env(
    fast_model="gpt-4.1",
    enable_statistics=True
)

Option 2: Direct Configuration

from db_query_agent import DatabaseQueryAgent

# Pass all parameters directly
agent = DatabaseQueryAgent(
    database_url="postgresql://user:pass@localhost/mydb",
    openai_api_key="sk-...",
    fast_model="gpt-4o-mini",
    read_only=True,
    enable_cache=True
)

Query the Database

# Query in natural language (async)
result = await agent.query("How many users signed up last month?")

print(result["natural_response"])
# Output: "245 users signed up last month"

print(result["sql"])
# Output: "SELECT COUNT(*) FROM users WHERE created_at >= '2025-09-01'"

With Streaming (Recommended)

# Stream responses for better UX
async for chunk in agent.query_stream("Show me top 10 customers by revenue"):
    print(chunk, end="", flush=True)

Session-based Chat

# Create a session for multi-turn conversations
session = agent.create_session(session_id="user_123")

# First query
response1 = await session.ask("Show me all products")

# Follow-up query (maintains context)
response2 = await session.ask("Filter those by category=electronics")

# Another follow-up
response3 = await session.ask("Sort by price descending")

๐Ÿ”ง Utility Methods

Session Management

# List all active sessions
sessions = agent.list_sessions()

# Get conversation history
history = agent.get_session_history("user_123")

# Clear session history
agent.clear_session("user_123")

# Delete session
agent.delete_session("user_123")

Schema Exploration

# Get basic schema
schema = agent.get_schema()

# Get detailed schema with relationships
schema_info = agent.get_schema_info(include_foreign_keys=True)
print(f"Total tables: {schema_info['total_tables']}")
print(f"Relationships: {len(schema_info['relationships'])}")

Statistics and Monitoring

# Get comprehensive statistics
stats = agent.get_stats()

print(f"Total queries: {stats['total_queries']}")
print(f"Cache hit rate: {stats['cache_hits'] / stats['total_queries'] * 100:.1f}%")
print(f"Active connections: {stats['pool']['checked_out']}")
print(f"Total sessions: {stats['sessions']['total_sessions']}")

๐ŸŽฏ Framework Integration

Django

# views.py
from django.conf import settings
from db_query_agent import DatabaseQueryAgent

agent = DatabaseQueryAgent(
    database_url=settings.DATABASES['default']['URL'],
    openai_api_key=settings.OPENAI_API_KEY
)

def query_database(request):
    question = request.POST.get('question')
    result = agent.query(question)
    return JsonResponse(result)

FastAPI

# main.py
from fastapi import FastAPI
from db_query_agent import DatabaseQueryAgent

app = FastAPI()
agent = DatabaseQueryAgent(database_url=os.getenv("DATABASE_URL"))

@app.post("/query")
async def query_db(question: str):
    return agent.query(question)

Flask

# app.py
from flask import Flask, request
from db_query_agent import DatabaseQueryAgent

app = Flask(__name__)
agent = DatabaseQueryAgent(database_url=os.getenv("DATABASE_URL"))

@app.route('/query', methods=['POST'])
def query():
    return agent.query(request.json['question'])

โš™๏ธ Configuration

Environment Variables (Recommended)

Create a .env file with all configuration:

# Required
OPENAI_API_KEY=sk-your-api-key
DATABASE_URL=postgresql://user:pass@localhost/db

# Model Configuration
MODEL_STRATEGY=adaptive
FAST_MODEL=gpt-4o-mini
BALANCED_MODEL=gpt-4.1-mini
COMPLEX_MODEL=gpt-4.1

# Cache Configuration
CACHE_ENABLED=true
CACHE_BACKEND=memory
CACHE_SCHEMA_TTL=3600
CACHE_QUERY_TTL=300
CACHE_LLM_TTL=3600

# Safety Configuration
READ_ONLY=true
QUERY_TIMEOUT=30
MAX_RESULT_ROWS=10000

# Connection Configuration
DB_POOL_SIZE=10
DB_MAX_OVERFLOW=20

# Performance Configuration
LAZY_SCHEMA_LOADING=true
ENABLE_STREAMING=true
WARMUP_ON_INIT=false

Then load with a single line:

agent = DatabaseQueryAgent.from_env()

Direct Configuration

Pass parameters directly (overrides .env):

from db_query_agent import DatabaseQueryAgent

agent = DatabaseQueryAgent(
    database_url="postgresql://...",
    openai_api_key="sk-...",
    
    # Model configuration
    model_strategy="adaptive",  # Use fast model for simple queries
    fast_model="gpt-4o-mini",   # 2s generation time
    balanced_model="gpt-4.1-mini",  # 3s generation time
    complex_model="gpt-4.1",     # 5s generation time
    
    # Cache configuration
    enable_cache=True,
    cache_backend="redis",
    schema_cache_ttl=3600,  # 1 hour
    query_cache_ttl=300,    # 5 minutes
    llm_cache_ttl=3600,     # 1 hour
    
    # Safety configuration
    read_only=True,
    allowed_tables=["users", "orders", "products"],
    blocked_tables=["sensitive_data"],
    max_query_timeout=30,
    max_result_rows=10000,
    
    # Connection configuration
    pool_size=10,
    max_overflow=20,
    
    # Performance configuration
    lazy_schema_loading=True,
    max_tables_in_context=5,
    enable_streaming=True,
    warmup_on_init=False,
    
    # Statistics configuration
    enable_statistics=True,  # Track queries, cache hits, etc.
    
    # Session configuration
    session_backend="sqlite",
    session_db_path="./sessions.db"
)

Mixed Configuration

Load from .env and override specific values:

# Load most settings from .env, override specific ones
agent = DatabaseQueryAgent.from_env(
    fast_model="gpt-4.1",  # Override model
    read_only=False,       # Override safety
    enable_statistics=True  # Add statistics
)

๐Ÿ“Š Performance

With all optimizations enabled:

Scenario Response Time Cache Hit
Simple query (cached) 0.5s โœ…
Simple query (uncached) 1.5s โŒ
Complex query (cached) 2s โœ…
Complex query (uncached) 5s โŒ
  • 90% of queries complete in < 3 seconds
  • First token appears in < 500ms with streaming
  • Cache hit rate typically > 60% in production

๐Ÿ”’ Security Features

  • โœ… Read-only mode by default (only SELECT queries)
  • โœ… SQL injection prevention with query parsing and validation
  • โœ… Table access control with allowlist/blocklist
  • โœ… Query timeout enforcement
  • โœ… Dangerous keyword detection (DROP, DELETE, etc.)
  • โœ… Input/output guardrails with OpenAI Agents SDK

๐Ÿ“š Documentation

Core Documentation

Examples

Additional Resources

๐Ÿงช Development

Setup

# Clone repository
git clone https://github.com/yourusername/db-query-agent
cd db-query-agent

# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run demo UI
streamlit run demo/streamlit_app.py

Learn More


### Project Structure

db-query-agent/ โ”œโ”€โ”€ src/db_query_agent/ โ”‚ โ”œโ”€โ”€ init.py โ”‚ โ”œโ”€โ”€ agent.py # Main agent class โ”‚ โ”œโ”€โ”€ simple_multi_agent_system.py # Multi-agent orchestration โ”‚ โ”œโ”€โ”€ schema_extractor.py # Schema introspection โ”‚ โ”œโ”€โ”€ cache_manager.py # Multi-layer caching โ”‚ โ”œโ”€โ”€ connection_manager.py # DB connection pooling โ”‚ โ”œโ”€โ”€ query_validator.py # SQL validation โ”‚ โ”œโ”€โ”€ session_manager.py # Session management โ”‚ โ”œโ”€โ”€ config.py # Configuration classes โ”‚ โ””โ”€โ”€ exceptions.py # Custom exceptions โ”œโ”€โ”€ docs/ โ”‚ โ”œโ”€โ”€ API_REFERENCE.md # Complete API documentation โ”‚ โ”œโ”€โ”€ INTEGRATION_GUIDES.md # Framework integration guides โ”‚ โ”œโ”€โ”€ TROUBLESHOOTING.md # Common issues & solutions โ”‚ โ””โ”€โ”€ ARCHITECTURE.md # System architecture โ”œโ”€โ”€ examples/ โ”‚ โ”œโ”€โ”€ basic_usage.py # 7 basic examples โ”‚ โ”œโ”€โ”€ advanced_usage.py # 8 advanced examples โ”‚ โ””โ”€โ”€ README.md # Examples documentation โ”œโ”€โ”€ demo/ โ”‚ โ”œโ”€โ”€ streamlit_app.py # Interactive demo UI โ”‚ โ””โ”€โ”€ create_demo_db.py # Demo database setup โ”œโ”€โ”€ tests/ # 133 tests (all passing) โ”‚ โ”œโ”€โ”€ test_dynamic_configuration.py โ”‚ โ”œโ”€โ”€ test_streaming.py โ”‚ โ”œโ”€โ”€ test_utility_methods.py โ”‚ โ””โ”€โ”€ test_phase4_integration.py โ””โ”€โ”€ pyproject.toml # Package configuration


## ๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## ๐Ÿ“„ License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## ๐Ÿ™ Acknowledgments

- Built with [OpenAI Agents SDK](https://github.com/openai/openai-agents-python)
- Database abstraction by [SQLAlchemy](https://www.sqlalchemy.org/)
- SQL parsing by [sqlparse](https://github.com/andialbrecht/sqlparse)

## ๐Ÿ“ง Support

- ๐Ÿ“– [Documentation](https://github.com/yourusername/db-query-agent#readme)
- ๐Ÿ› [Issue Tracker](https://github.com/yourusername/db-query-agent/issues)
- ๐Ÿ’ฌ [Discussions](https://github.com/yourusername/db-query-agent/discussions)

---

**Made with โค๏ธ for developers who want to query databases with natural language**

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

db_query_agent-0.1.0.tar.gz (81.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

db_query_agent-0.1.0-py3-none-any.whl (30.9 kB view details)

Uploaded Python 3

File details

Details for the file db_query_agent-0.1.0.tar.gz.

File metadata

  • Download URL: db_query_agent-0.1.0.tar.gz
  • Upload date:
  • Size: 81.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.6

File hashes

Hashes for db_query_agent-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6f10d09dca6391fa1876092230ba38180ce9d16b094bf366b5dfbf89ee166a2e
MD5 4f0d82db2c64c81453a82363a62f0cfa
BLAKE2b-256 9938cdbd6a3ed2054598672af0146d9b174bdec1660b8eb0fdf7ff65a31e2bcc

See more details on using hashes here.

File details

Details for the file db_query_agent-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: db_query_agent-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 30.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.6

File hashes

Hashes for db_query_agent-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e7a12944c91cdd4812aa219b78be4488bfc2b817fdddd9273d12d252b9fd7546
MD5 7c5d84ebdcd57f4c52337d9375785018
BLAKE2b-256 fe55c3d674bde7d8d82d34acb8eca8efa698487da7f5cd202726ef6bbeae0e9f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page