Enterprise-ready Natural Language to SQL converter with multi-provider support. Built for production scale (1000+ tables) with Clean Architecture.

nlp2sql

License: MIT | Python 3.9+ | Code style: black

Convert natural language queries to optimized SQL using multiple AI providers. Built with Clean Architecture principles for enterprise-scale applications handling 1000+ table databases.

Features

  • Multiple AI Providers: OpenAI, Anthropic Claude, Google Gemini - no vendor lock-in
  • Database Support: PostgreSQL, Amazon Redshift
  • Large Schema Handling: Vector embeddings and intelligent filtering for 1000+ tables
  • Smart Caching: Query and schema embedding caching for improved performance
  • Async Support: Full async/await support
  • Clean Architecture: Ports & Adapters pattern for maintainability
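
The "Large Schema Handling" feature relies on ranking tables by similarity to the question. The sketch below illustrates that general idea only; it is not nlp2sql's implementation. A bag-of-words count vector stands in for a real embedding model, and `embed`, `cosine`, and `top_tables` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_tables(question: str, tables: dict, k: int = 2) -> list:
    """Return the k table names whose descriptions best match the question."""
    q = embed(question)
    ranked = sorted(tables, key=lambda name: cosine(q, embed(tables[name])), reverse=True)
    return ranked[:k]


# Hypothetical table descriptions, e.g. table name plus column names.
tables = {
    "users": "users id name email active signup date",
    "orders": "orders id user total amount order date",
    "audit_log": "audit log event timestamp actor",
}
print(top_tables("total amount of orders by date", tables, k=1))
```

Only the top-ranked tables would then be sent to the AI provider, which keeps the prompt small even when the database has many tables.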

Documentation

Document          Description
----------------  ------------------------------------------
Architecture      Component diagram and data flow
API Reference     Python API and CLI command reference
Configuration     Environment variables and schema filters
Enterprise Guide  Large-scale deployment and migration
Redshift Support  Amazon Redshift setup and examples
Contributing      Contribution guidelines

Installation

# With UV (recommended)
uv add nlp2sql

# With pip
pip install nlp2sql

# With specific providers (quotes keep shells such as zsh from expanding the brackets)
pip install "nlp2sql[anthropic,gemini]"
pip install "nlp2sql[all-providers]"

# With embeddings
pip install "nlp2sql[embeddings-local]"   # Local embeddings (free)
pip install "nlp2sql[embeddings-openai]"  # OpenAI embeddings
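
To confirm that the install succeeded, the standard library can report the installed version. This is a minimal sketch; `installed_version` is a hypothetical helper, not part of nlp2sql.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("nlp2sql"))
```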

Quick Start

1. Set Environment Variables

# At least one AI provider key required
export OPENAI_API_KEY="your-openai-key"
# export ANTHROPIC_API_KEY="your-anthropic-key"
# export GOOGLE_API_KEY="your-google-key"
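
Since only one key is required, an application may want to detect which provider is configured. The sketch below is one way to do that. `pick_provider` is a hypothetical helper, and the `"gemini"` provider string is an assumption based on the name of the install extra; check the Configuration document for the exact value.

```python
import os
from typing import Mapping, Optional, Tuple

# Variable names come from the step above. The "gemini" provider string is an
# assumption, not confirmed by this README.
PROVIDER_ENV_VARS = (
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("gemini", "GOOGLE_API_KEY"),
)


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[Tuple[str, str]]:
    """Return (provider, api_key) for the first provider whose key is set."""
    for provider, var in PROVIDER_ENV_VARS:
        key = env.get(var)
        if key:
            return provider, key
    return None


if pick_provider() is None:
    print("No AI provider key found; set at least one of the variables above.")
```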

2. One-Line Usage

import asyncio
import os
from nlp2sql import generate_sql_from_db

async def main():
    result = await generate_sql_from_db(
        database_url="postgresql://user:pass@localhost:5432/mydb",
        question="Show me all active users",
        ai_provider="openai",
        api_key=os.getenv("OPENAI_API_KEY")
    )
    print(result['sql'])
    print(f"Confidence: {result['confidence']}")

asyncio.run(main())
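
The result exposes `sql` and `confidence` keys, so a caller can refuse low-confidence queries before executing them. The following is a sketch under the assumption that `confidence` is a number between 0 and 1; `accept_sql` and the 0.8 threshold are hypothetical.

```python
from typing import Any, Mapping


def accept_sql(result: Mapping[str, Any], min_confidence: float = 0.8) -> str:
    """Return result['sql'], or raise if the reported confidence is too low."""
    # Assumption: 'confidence' is numeric in [0, 1]; a missing value counts as 0.
    confidence = float(result.get("confidence", 0.0))
    if confidence < min_confidence:
        raise ValueError(f"confidence {confidence:.2f} is below {min_confidence:.2f}")
    return result["sql"]


# Hand-written sample dict standing in for a real result.
sample = {"sql": "SELECT * FROM users WHERE active", "confidence": 0.93}
print(accept_sql(sample))
```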

3. Pre-Initialized Service (Better Performance)

import asyncio
import os
from nlp2sql import create_and_initialize_service

async def main():
    # Initialize once
    service = await create_and_initialize_service(
        database_url="postgresql://user:pass@localhost:5432/mydb",
        ai_provider="openai",
        api_key=os.getenv("OPENAI_API_KEY")
    )

    # Use multiple times
    result1 = await service.generate_sql("Count total users")
    result2 = await service.generate_sql("Show recent orders")

asyncio.run(main())
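
In a long-running application, the "initialize once" pattern can be wrapped so that concurrent callers share a single initialization. This is a sketch only: `LazyService` is a hypothetical helper, and a fake factory stands in for `create_and_initialize_service` so that the example runs without a database.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class LazyService:
    """Create the service on first use and reuse it afterwards."""

    def __init__(self, factory: Callable[[], Awaitable[Any]]) -> None:
        self._factory = factory
        self._service: Optional[Any] = None
        self._lock: Optional[asyncio.Lock] = None

    async def get(self) -> Any:
        # Create the lock lazily so it belongs to the running event loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            if self._service is None:
                self._service = await self._factory()
        return self._service


async def demo() -> int:
    calls = 0

    async def fake_factory() -> object:
        # Stand-in for: await create_and_initialize_service(...)
        nonlocal calls
        calls += 1
        await asyncio.sleep(0)
        return object()

    lazy = LazyService(fake_factory)
    first, second = await asyncio.gather(lazy.get(), lazy.get())
    assert first is second  # both callers receive the same instance
    return calls  # number of times the factory actually ran


print(asyncio.run(demo()))
```

With the real library, the factory could be `lambda: create_and_initialize_service(database_url=..., ai_provider=..., api_key=...)`.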

4. Large Database with Schema Filtering

import asyncio
import os
from nlp2sql import create_and_initialize_service

async def main():
    service = await create_and_initialize_service(
        database_url="postgresql://localhost/enterprise",
        ai_provider="anthropic",  # Best for large schemas (200K context)
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        schema_filters={
            "include_schemas": ["sales", "finance"],
            "exclude_system_tables": True
        }
    )

    result = await service.generate_sql("Show revenue by month")

asyncio.run(main())
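
To make the effect of `schema_filters` concrete, the sketch below applies the two keys from the example to a list of `(schema, table)` pairs. The semantics shown are an assumption about how such filters would plausibly behave, not nlp2sql's actual logic; `filter_tables` is a hypothetical function.

```python
from typing import Any, Iterable, List, Mapping, Tuple

# PostgreSQL's built-in system schemas.
SYSTEM_SCHEMAS = {"pg_catalog", "information_schema"}


def filter_tables(
    tables: Iterable[Tuple[str, str]], filters: Mapping[str, Any]
) -> List[Tuple[str, str]]:
    """Keep only the (schema, table) pairs allowed by the filters."""
    include = filters.get("include_schemas")
    drop_system = filters.get("exclude_system_tables", False)
    kept = []
    for schema, table in tables:
        if drop_system and schema in SYSTEM_SCHEMAS:
            continue  # skip system catalog tables
        if include is not None and schema not in include:
            continue  # skip schemas outside the allow-list
        kept.append((schema, table))
    return kept


all_tables = [
    ("sales", "orders"),
    ("finance", "invoices"),
    ("hr", "employees"),
    ("pg_catalog", "pg_class"),
]
filters = {"include_schemas": ["sales", "finance"], "exclude_system_tables": True}
print(filter_tables(all_tables, filters))
```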

5. CLI Usage

# Generate SQL
nlp2sql query \
  --database-url postgresql://user:pass@localhost:5432/mydb \
  --question "Show all active users" \
  --explain

# Inspect schema
nlp2sql inspect --database-url postgresql://localhost/mydb

# Benchmark providers
nlp2sql benchmark --database-url postgresql://localhost/mydb
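
The CLI can also be driven from a script. The sketch below builds the `query` command as an argument list, using only the flags shown above; `build_query_command` is a hypothetical helper. Passing a list to `subprocess.run` avoids shell quoting problems in the question text.

```python
import shlex
from typing import List


def build_query_command(database_url: str, question: str, explain: bool = False) -> List[str]:
    """Build the argv list for `nlp2sql query` using the flags from this README."""
    cmd = ["nlp2sql", "query", "--database-url", database_url, "--question", question]
    if explain:
        cmd.append("--explain")
    return cmd


cmd = build_query_command(
    "postgresql://user:pass@localhost:5432/mydb",
    "Show all active users",
    explain=True,
)
print(shlex.join(cmd))  # shell-safe rendering, for logging
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```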

Provider Comparison

Provider          Context Size  Best For
----------------  ------------  ----------------------------
OpenAI GPT-4      128K          Complex reasoning
Anthropic Claude  200K          Large schemas
Google Gemini     1M            High volume, cost efficiency

See Configuration for detailed provider setup.
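
One way to use the context sizes from the comparison is to pick the smallest window that still fits the estimated prompt. This is an illustrative sketch: the provider keys, the 50% headroom, and `smallest_fitting_provider` are assumptions, and the token counts are the rounded figures listed above.

```python
from typing import Optional

# Approximate context sizes in tokens, taken from the provider comparison.
CONTEXT_TOKENS = {"openai": 128_000, "anthropic": 200_000, "gemini": 1_000_000}


def smallest_fitting_provider(estimated_tokens: int, headroom: float = 0.5) -> Optional[str]:
    """Pick the provider with the smallest window that fits the prompt.

    Only `headroom` (a fraction of the window) may be used by the prompt,
    leaving the rest for the question and the generated answer.
    """
    for provider, limit in sorted(CONTEXT_TOKENS.items(), key=lambda kv: kv[1]):
        if estimated_tokens <= limit * headroom:
            return provider
    return None  # schema too large; filter it before prompting


print(smallest_fitting_provider(90_000))
```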

Architecture

nlp2sql/
├── core/           # Business entities
├── ports/          # Interfaces/abstractions
├── adapters/       # External implementations (AI providers, databases)
├── services/       # Application services
├── schema/         # Schema management and embeddings
├── config/         # Configuration
└── exceptions/     # Custom exceptions
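
The split between `ports/` and `adapters/` follows the Ports & Adapters pattern: services depend on an interface, and each provider implements it. The sketch below shows the pattern in miniature. All class names are hypothetical and are not taken from the nlp2sql source.

```python
from typing import Protocol


class SQLGeneratorPort(Protocol):
    """Port: the interface the application service depends on."""

    def generate(self, question: str, schema: str) -> str: ...


class EchoAdapter:
    """Adapter: a fake provider, used here only for illustration."""

    def generate(self, question: str, schema: str) -> str:
        return f"-- {question}\nSELECT 1;"


class QueryService:
    """Application service: knows the port, not the concrete provider."""

    def __init__(self, generator: SQLGeneratorPort) -> None:
        self._generator = generator

    def ask(self, question: str) -> str:
        return self._generator.generate(question, schema="")


service = QueryService(EchoAdapter())
print(service.ask("Count total users"))
```

Swapping providers then means passing a different adapter to the service, with no change to the service code.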

Development

# Clone and install
git clone https://github.com/luiscarbonel1991/nlp2sql.git
cd nlp2sql
uv sync

# Start test databases
cd docker && docker-compose up -d

# Run tests
uv run pytest

# Code quality
uv run ruff format .
uv run ruff check .
uv run mypy src/

MCP Server

nlp2sql includes a Model Context Protocol server for AI assistant integration.

{
  "mcpServers": {
    "nlp2sql": {
      "command": "python",
      "args": ["/path/to/nlp2sql/mcp_server/server.py"],
      "env": {
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "NLP2SQL_DEFAULT_DB_URL": "postgresql://user:pass@localhost:5432/mydb"
      }
    }
  }
}

Tools: ask_database, explore_schema, run_sql, list_databases, explain_sql

See mcp_server/README.md for complete setup.
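
The MCP configuration shown earlier in this section can also be generated from a script, which helps when the server path or database URL differs per machine. This is a minimal sketch; `mcp_config` is a hypothetical helper and the values are the placeholders from the example.

```python
import json


def mcp_config(server_path: str, db_url: str) -> dict:
    """Build the mcpServers entry for nlp2sql as a plain dictionary."""
    return {
        "mcpServers": {
            "nlp2sql": {
                "command": "python",
                "args": [server_path],
                "env": {
                    # Left as a placeholder, as in the example configuration.
                    "OPENAI_API_KEY": "${OPENAI_API_KEY}",
                    "NLP2SQL_DEFAULT_DB_URL": db_url,
                },
            }
        }
    }


config = mcp_config(
    "/path/to/nlp2sql/mcp_server/server.py",
    "postgresql://user:pass@localhost:5432/mydb",
)
print(json.dumps(config, indent=2))
```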

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE.

Author

Luis Carbonel - @luiscarbonel1991
