Skip to main content

Enterprise-ready Natural Language to SQL converter with multi-provider support. Built for production scale (1000+ tables) with Clean Architecture.

Project description

nlp2sql

License: MIT Python 3.9+ Code style: black

Enterprise-ready Natural Language to SQL converter with multi-provider support

A powerful Python library for converting natural language queries to optimized SQL using multiple AI providers. Built with Clean Architecture principles for enterprise-scale applications handling 1000+ table databases.

๐Ÿš€ Why nlp2sql?

Unlike academic frameworks focused on composability, nlp2sql is built for enterprise production environments from day one:

  • ๐Ÿข Enterprise Scale: Handle databases with 1000+ tables efficiently
  • ๐Ÿค– Multi-Provider Native: OpenAI, Anthropic, Gemini support - no vendor lock-in
  • โšก Production Ready: Advanced caching, async support, schema optimization
  • ๐Ÿ› ๏ธ Developer First: Professional CLI, Docker setup, automated installation
  • ๐Ÿ—๏ธ Clean Architecture: Maintainable, testable, extensible codebase
  • ๐Ÿ“Š Performance Focused: Benchmarking, schema filtering, vector embeddings

โœจ Features

  • ๐Ÿค– Multiple AI Providers: OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI
  • ๐Ÿ—„๏ธ Database Support: PostgreSQL (with MySQL, SQLite, Oracle, MSSQL coming soon)
  • ๐Ÿ“Š Large Schema Handling: Advanced strategies for databases with 1000+ tables
  • โšก Smart Caching: Intelligent result caching for improved performance
  • ๐Ÿ” Query Optimization: Built-in SQL query optimization
  • ๐Ÿง  Schema Analysis: AI-powered relevance scoring and schema compression
  • ๐Ÿ” Vector Embeddings: Semantic search for schema elements
  • ๐Ÿ“ˆ Token Management: Efficient token usage across different providers
  • โšก Async Support: Full async/await support for better performance
  • ๐Ÿ—๏ธ Clean Architecture: Ports & Adapters pattern for maintainability

๐Ÿš€ Quick Start

Installation

# Install with UV (recommended)
uv add nlp2sql

# Or with pip
pip install nlp2sql

# Release candidate with latest features (multi-provider support)
pip install nlp2sql==0.2.0rc1

# With specific providers
pip install nlp2sql[anthropic,gemini]  # Multiple providers
pip install nlp2sql[all-providers]     # All providers

One-Line Usage (Simplest)

import asyncio
import os
from nlp2sql import generate_sql_from_db

async def main():
    # Automatic provider detection
    providers = [
        {"name": "openai", "key": os.getenv("OPENAI_API_KEY")},
        {"name": "anthropic", "key": os.getenv("ANTHROPIC_API_KEY")},
        {"name": "gemini", "key": os.getenv("GOOGLE_API_KEY")}
    ]
    
    # Use first available provider
    selected = next((p for p in providers if p["key"]), None)
    if not selected:
        raise ValueError("No API key found. Set OPENAI_API_KEY, ANTHROPIC_API_KEY, or GOOGLE_API_KEY")
    
    result = await generate_sql_from_db(
        database_url="postgresql://testuser:testpass@localhost:5432/testdb",
        question="Show me all active users",
        ai_provider=selected["name"],
        api_key=selected["key"]
    )
    print(result['sql'])

asyncio.run(main())

Pre-Initialized Service (Better Performance)

import asyncio
import os
from nlp2sql import create_and_initialize_service

async def main():
    # Smart provider detection
    api_key = os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY") or os.getenv("GOOGLE_API_KEY")
    provider = "openai" if os.getenv("OPENAI_API_KEY") else \
               "anthropic" if os.getenv("ANTHROPIC_API_KEY") else "gemini"
    
    # Initialize once with Docker test database
    service = await create_and_initialize_service(
        database_url="postgresql://testuser:testpass@localhost:5432/testdb",
        ai_provider=provider,
        api_key=api_key
    )
    
    # Use multiple times
    result1 = await service.generate_sql("Count total users")
    result2 = await service.generate_sql("Find inactive accounts")
    result3 = await service.generate_sql("Show user registration trends")
    
    print(f"Using {provider} provider")
    for i, result in enumerate([result1, result2, result3], 1):
        print(f"Query {i}: {result['sql']}")

asyncio.run(main())

Manual Service Creation (Full Control)

import asyncio
import os
from nlp2sql import create_query_service, DatabaseType

async def main():
    # Create service with schema filtering for large databases
    service = create_query_service(
        database_url="postgresql://demo:demo123@localhost:5433/enterprise",
        ai_provider="anthropic",  # Good for large schemas
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        schema_filters={
            "include_schemas": ["sales", "finance"],
            "exclude_system_tables": True
        }
    )
    
    # Initialize (loads schema automatically)
    await service.initialize(DatabaseType.POSTGRES)
    
    # Generate SQL
    result = await service.generate_sql(
        question="Show revenue by month for the sales team",
        database_type=DatabaseType.POSTGRES
    )
    
    print(f"SQL: {result['sql']}")
    print(f"Confidence: {result['confidence']}")
    print(f"Explanation: {result['explanation']}")
    print(f"Valid: {result['validation']['is_valid']}")

asyncio.run(main())

๐Ÿค– Multiple AI Providers Support

nlp2sql supports multiple AI providers - you're not locked into OpenAI!

Supported Providers

# OpenAI GPT-4 (default)
service = await create_and_initialize_service(
    database_url="postgresql://testuser:testpass@localhost:5432/testdb",
    ai_provider="openai",
    api_key="your-openai-key"
)

# Anthropic Claude
service = await create_and_initialize_service(
    database_url="postgresql://testuser:testpass@localhost:5432/testdb", 
    ai_provider="anthropic",
    api_key="your-anthropic-key"
)

# Google Gemini
service = await create_and_initialize_service(
    database_url="postgresql://testuser:testpass@localhost:5432/testdb",
    ai_provider="gemini", 
    api_key="your-google-key"
)

Provider Comparison

Provider Context Size Cost/1K tokens Best For
OpenAI GPT-4 128K $0.030 Complex reasoning
Anthropic Claude 200K $0.015 Large schemas
Google Gemini 1M $0.001 High volume/cost

๐Ÿ“Š Large Schema Support

For databases with 1000+ tables, use schema filters:

# Basic filtering
filters = {
    "exclude_system_tables": True,
    "exclude_tables": ["audit_log", "temp_data", "migration_history"]
}

service = await create_and_initialize_service(
    database_url="postgresql://demo:demo123@localhost:5433/enterprise",
    api_key="your-api-key",
    schema_filters=filters
)

# Business domain filtering
business_filters = {
    "include_tables": [
        "users", "customers", "orders", "products",
        "invoices", "payments", "addresses"
    ],
    "exclude_system_tables": True
}

# Multi-schema filtering for enterprise databases
enterprise_filters = {
    "include_schemas": ["sales", "hr", "finance"],
    "exclude_schemas": ["archive", "temp"],
    "include_tables": ["customers", "orders", "employees", "transactions"],
    "exclude_tables": ["audit_logs", "system_logs"],
    "exclude_system_tables": True
}

๐Ÿ—๏ธ Architecture

nlp2sql follows Clean Architecture principles with clear separation of concerns:

nlp2sql/
โ”œโ”€โ”€ core/           # Business entities and domain logic
โ”œโ”€โ”€ ports/          # Interfaces/abstractions
โ”œโ”€โ”€ adapters/       # External service implementations
โ”œโ”€โ”€ services/       # Application services
โ”œโ”€โ”€ schema/         # Schema management strategies
โ”œโ”€โ”€ config/         # Configuration management
โ””โ”€โ”€ exceptions/     # Custom exceptions

Configuration

Environment Variables

# AI Provider API Keys (at least one required)
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
export GOOGLE_API_KEY="your-google-key"  # Note: GOOGLE_API_KEY, not GEMINI_API_KEY

# Database (Docker test databases)
export DATABASE_URL="postgresql://testuser:testpass@localhost:5432/testdb"  # Simple DB
# export DATABASE_URL="postgresql://demo:demo123@localhost:5433/enterprise"  # Large DB

# Optional Settings
export NLP2SQL_MAX_SCHEMA_TOKENS=8000
export NLP2SQL_CACHE_ENABLED=true
export NLP2SQL_LOG_LEVEL=INFO

Development

Setup Development Environment

# Clone repository
git clone https://github.com/luiscarbonel1991/nlp2sql.git
cd nlp2sql

# Install dependencies
uv sync

# Setup Docker test databases
cd docker
docker-compose up -d
cd ..

# Test CLI with Docker database
export OPENAI_API_KEY=your-key
uv run nlp2sql query \
  --database-url "postgresql://testuser:testpass@localhost:5432/testdb" \
  --question "How many users are there?" \
  --provider openai

# Run tests
uv run pytest

# Format code
uv run ruff format .

# Lint code
uv run ruff check .

# Type checking
uv run mypy src/

๐Ÿข Enterprise Use Cases

Data Analytics Teams

  • Large Schema Navigation: Query enterprise databases with 1000+ tables
  • Multi-Tenant Support: Schema filtering for different business units
  • Performance Optimization: Intelligent caching and query optimization

DevOps & Platform Teams

  • Multi-Provider Strategy: Avoid vendor lock-in, optimize costs
  • Infrastructure as Code: Docker setup, automated deployment
  • Monitoring & Benchmarking: Performance tracking across providers

Business Intelligence

  • Self-Service Analytics: Non-technical users query databases naturally
  • Audit & Compliance: Explainable queries with confidence scoring
  • Cost Management: Provider comparison and optimization

๐Ÿ“Š Performance & Scale

Metric nlp2sql Typical Framework
Max Tables Supported 1000+ ~100
AI Providers 3+ (OpenAI, Anthropic, Gemini) Usually 1
Query Cache โœ… Advanced โŒ Basic/None
Schema Optimization โœ… Vector embeddings โŒ Manual
Enterprise CLI โœ… Professional โŒ Basic/None
Docker Setup โœ… Production-ready โŒ Manual

๐Ÿ”„ Migration from Other Frameworks

Coming from other NLP-to-SQL frameworks? nlp2sql provides:

  • Drop-in replacement for most common patterns
  • Enhanced performance with minimal code changes
  • Additional features without breaking existing workflows

See our Migration Guide for framework-specific instructions.

๐Ÿค Contributing

We welcome contributions! This project follows enterprise development practices:

  • Clean Architecture patterns
  • Comprehensive testing
  • Type safety with mypy
  • Code formatting with black/ruff

See CONTRIBUTING.md for guidelines.

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ‘จโ€๐Ÿ’ป Author & Maintainer

Luis Carbonel - Initial work and ongoing development

Built with enterprise needs in mind, refined through real-world production use cases.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nlp2sql-0.2.0rc1.tar.gz (311.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nlp2sql-0.2.0rc1-py3-none-any.whl (51.2 kB view details)

Uploaded Python 3

File details

Details for the file nlp2sql-0.2.0rc1.tar.gz.

File metadata

  • Download URL: nlp2sql-0.2.0rc1.tar.gz
  • Upload date:
  • Size: 311.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for nlp2sql-0.2.0rc1.tar.gz
Algorithm Hash digest
SHA256 93c4895f51232c2348885fb775f736f39298b6e01315af50d922caaa489a7498
MD5 fec20857e6b34f8f3c955b2f13ab36e8
BLAKE2b-256 3148859981448b1f209bd31ea62ba00ef2d55e5731691ce02cfed575d16f9ede

See more details on using hashes here.

Provenance

The following attestation bundles were made for nlp2sql-0.2.0rc1.tar.gz:

Publisher: publish-pypi.yml on luiscarbonel1991/nlp2sql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nlp2sql-0.2.0rc1-py3-none-any.whl.

File metadata

  • Download URL: nlp2sql-0.2.0rc1-py3-none-any.whl
  • Upload date:
  • Size: 51.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for nlp2sql-0.2.0rc1-py3-none-any.whl
Algorithm Hash digest
SHA256 68323fc628a3ad826e86cbe03121ce8b884097866ff5f95b0a81aeae79e3dcc1
MD5 1dc7d876b576e6d6873fdc79ec1d3507
BLAKE2b-256 c6f1ea9c69fdda31824248fd85d3e0c99a3061686cb860c1bf569022fd60adfa

See more details on using hashes here.

Provenance

The following attestation bundles were made for nlp2sql-0.2.0rc1-py3-none-any.whl:

Publisher: publish-pypi.yml on luiscarbonel1991/nlp2sql

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page