Skip to main content

A FastMCP server for generating natural language to SQL templates from database schemas.

Project description

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program, Google Cloud Platform/SecOps Terms of Service, How Gemini for Google Cloud uses your data. This tool is provided "as is" without warranty of any kind. Users are solely responsible for understanding and managing the tool's interaction with their databases. Use of this tool constitutes acceptance of all risks associated with database access, reading, usage, and modifications.

Context Engineering Agent

[!IMPORTANT] Repository Reorganization The original Go-based DB Schema Enricher CLI has been archived and moved to the legacy/go-enricher/ directory. This repository root is now dedicated to the Python-based Context Engineering Agent (MCP Server).

This project is a Gemini CLI extension that bridges the gap between Large Language Models (LLMs) and structured databases by generating and managing tailored context. This context helps the LLM understand database schema, business logic, and terminology, enabling more accurate Natural Language to SQL generation.

Core Concepts

A ContextSet is the central artifact generated by the Context Engineering Agent, containing structured knowledge in three primary forms:

  • Templates: End-to-end mappings linking a natural language query to a complete, runnable SQL query. They teach the system overarching operational logic, table join infrastructures, and broad business rules.
    • Full Structure Example:
      {
        "nl_query": "How many accounts are in London?",
        "sql": "SELECT count(*) FROM account WHERE account.city = 'London'",
        "intent": "How many accounts are in London?",
        "manifest": "How many accounts are in a given city?",
        "parameterized": {
          "parameterized_sql": "SELECT count(*) FROM account WHERE account.city = $1",
          "parameterized_intent": "How many accounts are in $1?"
        }
      }
      
  • Facets: Reusable, modular SQL fragments (like a WHERE clause or specialized join) linked to specific vocabulary or terminology. They are not standalone queries but dynamically injected filters.
    • Full Structure Example:
      {
        "sql_snippet": "products.rating > 4.5",
        "intent": "highly rated products (above 4.5)",
        "manifest": "highly rated products (above a given number)",
        "parameterized": {
          "parameterized_sql_snippet": "products.rating > $1",
          "parameterized_intent": "highly rated products (above $1)"
        }
      }
      
  • Value Searches: Specialized queries used when a value in the natural language query (e.g., "Lndn") does not perfectly match the stored value in the database ("London"). They employ mapping functions (like fuzzy trigram matching or semantic similarity) to find candidate values and their distance from the search term.
    • Full Structure Example (Conceptual Fuzzy Match):
      {
        "concept_type": "City",
        "query": "SELECT T.\"location\" AS value, 'users.location' AS columns, 'City' AS concept_type, fuzzy_distance(T.\"location\", $value) AS distance FROM \"users\" T WHERE fuzzy_match(T.\"location\", $value)",
        "description": "Fuzzy match for city in location column"
      }
      

Prerequisites

Before you begin, ensure you have completed the following setup:

1. Required Services & Access

  • Enable the Data Analytics API with Gemini, Gemini for Google Cloud API, and Dataplex Universal Catalog API in your Google Cloud project.
  • Ensure your IAM user or service account has the necessary roles for database access (appropriate for your database engine), serviceusage.serviceUsageConsumer, and geminidataanalytics.queryDataUser.
  • Ensure the database instance has executesql permission enabled (Data API allowed).

2. Gemini CLI

3. Google Cloud Authentication

  • Configure Application Default Credentials (ADC) by running:
    gcloud auth application-default login
    

Installation

To install the Context Engineering Agent via the Gemini CLI:

gemini extensions install https://github.com/GoogleCloudPlatform/db-context-enrichment

Note: The extension requires a Gemini API key at installation or via environment variable GEMINI_API_KEY.

Optional: VSCode Integration

For an enhanced editing and diffing experience when reviewing context changes:

  1. Install VSCode.
  2. Install the Gemini CLI Companion extension from the VSCode Marketplace.
  3. Open your workspace folder in VSCode, open the integrated terminal, and run gemini. You can verify the IDE extension is active by running /ide status.

Workflows

The extension is designed to support the Critical User Journeys (CUJs) for context engineering, following an iterative optimization loop:

  1. Bootstrap: Generate an initial baseline context.
  2. Evaluate: Measure context effectiveness against a golden dataset.
  3. Hill-Climbing: Perform gap analysis on failures and generate automated fixes.
  4. Iterate: Apply the improved context and re-run evaluation to continuously improve quality.
  5. Final Validation (Optional): Verify mutations against a separated test set to ensure generalization and prevent overfitting.

Automated Iterative Optimization (Autoctx)

The extension automates this loop via the following commands. Start the Gemini CLI by running gemini in your workspace folder:

  1. Initialize (/autoctx:init): Sets up your local workspace by creating an autoctx/ directory. It checks for the presence of a valid tools.yaml configuration inside it. If missing, the agent will interactively prompt you for your database connection details and generate the file for you. It also creates the state.md file to track experiment progress and an experiments/ directory, all within autoctx/.
  2. Generate Dataset (/autoctx:generate-dataset): Rapidly creates or expands a baseline of evaluation questions (golden dataset). It asks you for sample queries or descriptions of what users might ask, and generates a JSON file with Natural Language Queries (NLQs) and Golden SQL statements.
  3. Bootstrap (/autoctx:bootstrap): Generates an initial context set. It performs progressive schema discovery to understand your database structure and qualified tables. It then generates starting Templates and Facets based on the schema and any user-provided documentation or sample queries.
  4. Evaluate (/autoctx:evaluate): Measures context effectiveness against your golden dataset. This step automatically generates all necessary Evalbench configuration files (db_config.yaml, model_config.yaml, run_config.yaml, llmrater_config.yaml) inside the experiment folder and runs the evaluation pipeline to produce accuracy scores and identify failure cases.
  5. Hill-Climb (/autoctx:hillclimb): Performs gap analysis on failures identified in the evaluation step. It reads the failure cases, determines why the LLM failed to generate correct SQL, and proposes updates or new additions to the context set (Templates or Facets) to improve performance in the next iteration.

Targeted Manual Generation

These are more basic workflows for context engineering to manually author specific context elements:

  • Generate Templates (/generate_targeted_templates): Initiates a guided workflow where you provide a sample question and SQL, and the agent helps you parameterize and save it as a template.
  • Generate Facets (/generate_targeted_facets): Guides you to define a specific intent and the corresponding SQL snippet (e.g., filter condition) to save as a facet.
  • Generate Value Searches (/generate_targeted_value_searches): Helps you configure how the system searches for and matches specific values within a concept type (e.g., setting up exact match or trigram fuzzy search for product names).

Development and Testing

To develop a new feature and test it with the correct dependencies and full bundling (including Evalbench and Toolbox binaries), it is recommended to create a custom release in your fork.

1. Integration Testing (Fork & Release Workflow)

  1. Create a fork of the repository on GitHub.
  2. Set up the fork/upstream relationship in your local environment:
    git clone https://github.com/YOUR-USERNAME/db-context-enrichment.git
    git remote add upstream https://github.com/GoogleCloudPlatform/db-context-enrichment
    
  3. Develop changes in a new branch in your local repo and push them to your fork.
  4. Create a new release for your fork on GitHub (e.g., using tag 0.0.1-test).
  5. Wait for the release assets to be generated by GitHub Actions in your fork.
  6. Install the custom release via Gemini CLI:
    gemini extensions install https://github.com/YOUR-USERNAME/db-context-enrichment --ref 0.0.1-test
    
    Note: Use gemini extensions uninstall google-cloud-db-context-engineering to remove previous installations if needed.

2. Official Release Pipeline

  • Releases are versioned and prepared automatically by the Release Please GitHub App.
  • When functional PRs are merged, Release Please opens/updates a pending Release PR (bumping the extension version and updating the changelog).
  • Merging the Release PR signals Release Please to tag the commit and create an official GitHub Release.
  • The creation of the GitHub Release triggers the .github/workflows/release.yml pipeline.
  • The pipeline uses PyInstaller to build standalone binary executables for Linux (x64), macOS (arm64), and Windows (x64).
  • The pipeline packages the binary, LICENSE, GEMINI.md, and dynamically updates gemini-extension.json into .tar.gz and .zip archives.
  • These archives are automatically attached back to the GitHub release as downloadable assets.
  • Users receive the update the next time they install or upgrade the extension via Gemini CLI (gemini extensions update --all).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

google_cloud_db_context_engineering-0.5.1.tar.gz (38.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file google_cloud_db_context_engineering-0.5.1.tar.gz.

File metadata

File hashes

Hashes for google_cloud_db_context_engineering-0.5.1.tar.gz
Algorithm Hash digest
SHA256 5dc41fb9a035097416129a938deb60f0c6151caf1332c4cb359c633566dded7c
MD5 816d25d8104793f26509dffb01323451
BLAKE2b-256 a2877acf9f574ff0fa9fd6977e32c5fc5c594fbd494fa46a62431338856be450

See more details on using hashes here.

Provenance

The following attestation bundles were made for google_cloud_db_context_engineering-0.5.1.tar.gz:

Publisher: google-cloud-sdk-py@oss-exit-gate-prod.iam.gserviceaccount.com

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
  • Statement: Publication detail:
    • Token Issuer: https://accounts.google.com
    • Service Account: google-cloud-sdk-py@oss-exit-gate-prod.iam.gserviceaccount.com

File details

Details for the file google_cloud_db_context_engineering-0.5.1-py3-none-any.whl.

File metadata

File hashes

Hashes for google_cloud_db_context_engineering-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 01afa5d775a1022e911c1ef148ac7f04a9ce6c641eaef1172fbf1c7e7c26d8fb
MD5 778d873f06ead02158445d4f1dc70100
BLAKE2b-256 fedb95a79662352eb42ee41b263abd511184cba1f4e6355fd2baabf7c3859c23

See more details on using hashes here.

Provenance

The following attestation bundles were made for google_cloud_db_context_engineering-0.5.1-py3-none-any.whl:

Publisher: google-cloud-sdk-py@oss-exit-gate-prod.iam.gserviceaccount.com

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
  • Statement: Publication detail:
    • Token Issuer: https://accounts.google.com
    • Service Account: google-cloud-sdk-py@oss-exit-gate-prod.iam.gserviceaccount.com

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page