A FastMCP server for generating natural language to SQL templates from database schemas.
Project description
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program, Google Cloud Platform/SecOps Terms of Service, How Gemini for Google Cloud uses your data. This tool is provided "as is" without warranty of any kind. Users are solely responsible for understanding and managing the tool's interaction with their databases. Use of this tool constitutes acceptance of all risks associated with database access, reading, usage, and modifications.
Context Engineering Agent
[!IMPORTANT] Repository Reorganization The original Go-based DB Schema Enricher CLI has been archived and moved to the legacy/go-enricher/ directory. This repository root is now dedicated to the Python-based Context Engineering Agent (MCP Server).
This project is a Gemini CLI extension that bridges the gap between Large Language Models (LLMs) and structured databases by generating and managing tailored context. This context helps the LLM understand database schema, business logic, and terminology, enabling more accurate Natural Language to SQL generation.
Core Concepts
A ContextSet is the central artifact generated by the Context Engineering Agent, containing structured knowledge in three primary forms:
- Templates: End-to-end mappings linking a natural language query to a complete, runnable SQL query. They teach the system overarching operational logic, table join infrastructures, and broad business rules.
- Full Structure Example:
{ "nl_query": "How many accounts are in London?", "sql": "SELECT count(*) FROM account WHERE account.city = 'London'", "intent": "How many accounts are in London?", "manifest": "How many accounts are in a given city?", "parameterized": { "parameterized_sql": "SELECT count(*) FROM account WHERE account.city = $1", "parameterized_intent": "How many accounts are in $1?" } }
- Full Structure Example:
- Facets: Reusable, modular SQL fragments (like a
WHEREclause or specialized join) linked to specific vocabulary or terminology. They are not standalone queries but dynamically injected filters.- Full Structure Example:
{ "sql_snippet": "products.rating > 4.5", "intent": "highly rated products (above 4.5)", "manifest": "highly rated products (above a given number)", "parameterized": { "parameterized_sql_snippet": "products.rating > $1", "parameterized_intent": "highly rated products (above $1)" } }
- Full Structure Example:
- Value Searches: Specialized queries used when a value in the natural language query (e.g., "Lndn") does not perfectly match the stored value in the database ("London"). They employ mapping functions (like fuzzy trigram matching or semantic similarity) to find candidate values and their distance from the search term.
- Full Structure Example (Conceptual Fuzzy Match):
{ "concept_type": "City", "query": "SELECT T.\"location\" AS value, 'users.location' AS columns, 'City' AS concept_type, fuzzy_distance(T.\"location\", $value) AS distance FROM \"users\" T WHERE fuzzy_match(T.\"location\", $value)", "description": "Fuzzy match for city in location column" }
- Full Structure Example (Conceptual Fuzzy Match):
Prerequisites
Before you begin, ensure you have completed the following setup:
1. Required Services & Access
- Enable the Data Analytics API with Gemini, Gemini for Google Cloud API, and Dataplex Universal Catalog API in your Google Cloud project.
- Ensure your IAM user or service account has the necessary roles for database access (appropriate for your database engine),
serviceusage.serviceUsageConsumer, andgeminidataanalytics.queryDataUser. - Ensure the database instance has
executesqlpermission enabled (Data API allowed).
2. Gemini CLI
- Install the Gemini CLI. See Get Started with Gemini CLI.
- Run
gemini --versionto ensure it's installed correctly.
3. Google Cloud Authentication
- Configure Application Default Credentials (ADC) by running:
gcloud auth application-default login
Installation
To install the Context Engineering Agent via the Gemini CLI:
gemini extensions install https://github.com/GoogleCloudPlatform/db-context-enrichment
Optional: VSCode Integration
For an enhanced editing and diffing experience when reviewing context changes:
- Install VSCode.
- Install the Gemini CLI Companion extension from the VSCode Marketplace.
- Open your workspace folder in VSCode, open the integrated terminal, and run
gemini. You can verify the IDE extension is active by running/ide status.
Workflows
The extension is designed to support the Critical User Journeys (CUJs) for context engineering, following an iterative optimization loop:
- Bootstrap: Generate an initial baseline context.
- Evaluate: Measure context effectiveness against a golden dataset.
- Hill-Climbing: Perform gap analysis on failures and generate automated fixes.
- Iterate: Apply the improved context and re-run evaluation to continuously improve quality.
- Final Validation (Optional): Verify mutations against a separated test set to ensure generalization and prevent overfitting.
All workflows are exposed as Agent Skills. Activate them by asking the agent in natural language (e.g. "bootstrap the context for my database") or by invoking the skill by name. Start the Gemini CLI by running gemini in your workspace folder, or launch Claude Code with the plugin installed.
Automated Iterative Optimization (Autoctx)
- Initialize (
autoctx-init): Sets up your local workspace by creating anautoctx/directory. It checks for the presence of a validtools.yamlconfiguration inside it. If missing, the agent will interactively prompt you for your database connection details and generate the file for you. It also creates thestate.mdfile to track experiment progress and anexperiments/directory, all withinautoctx/. - Generate Dataset (
autoctx-dataset-generation): Rapidly creates or expands a baseline of evaluation questions (golden dataset). It asks you for sample queries or descriptions of what users might ask, and generates a JSON file with Natural Language Queries (NLQs) and Golden SQL statements. - Bootstrap (
autoctx-bootstrap): Generates an initial context set. It performs progressive schema discovery to understand your database structure and qualified tables. It then generates starting Templates and Facets based on the schema and any user-provided documentation or sample queries. - Evaluate (
autoctx-evaluate): Measures context effectiveness against your golden dataset. This step automatically generates all necessary Evalbench configuration files (db_config.yaml,model_config.yaml,run_config.yaml,llmrater_config.yaml) inside the experiment folder and runs the evaluation pipeline to produce accuracy scores and identify failure cases. - Hill-Climb (
autoctx-hillclimb): Performs gap analysis on failures identified in the evaluation step. It reads the failure cases, determines why the LLM failed to generate correct SQL, and proposes updates or new additions to the context set (Templates or Facets) to improve performance in the next iteration.
Targeted Manual Generation
These are more basic workflows for context engineering to manually author specific context elements, each exposed via the context-generation-guide skill scoped to the relevant context type:
- Generate Templates: Initiates a guided workflow where you provide a sample question and SQL, and the agent helps you parameterize and save it as a template.
- Generate Facets: Guides you to define a specific intent and the corresponding SQL snippet (e.g., filter condition) to save as a facet.
- Generate Value Searches: Helps you configure how the system searches for and matches specific values within a concept type (e.g., setting up exact match or trigram fuzzy search for product names).
Development and Testing
To develop a new feature and test it with the correct dependencies and full bundling (including Evalbench and Toolbox binaries), it is recommended to create a custom release in your fork.
1. Integration Testing (Fork & Release Workflow)
- Create a fork of the repository on GitHub.
- Set up the fork/upstream relationship in your local environment:
git clone https://github.com/YOUR-USERNAME/db-context-enrichment.git git remote add upstream https://github.com/GoogleCloudPlatform/db-context-enrichment
- Develop changes in a new branch in your local repo and push them to your fork.
- Create a new release for your fork on GitHub (e.g., using tag
0.0.1-test). - Wait for the release assets to be generated by GitHub Actions in your fork.
- Install the custom release via Gemini CLI:
gemini extensions install https://github.com/YOUR-USERNAME/db-context-enrichment --ref 0.0.1-test
Note: Usegemini extensions uninstall google-cloud-db-context-engineeringto remove previous installations if needed.
2. Official Release Pipeline
- Releases are versioned and prepared automatically by the Release Please GitHub App.
- When functional PRs are merged, Release Please opens/updates a pending Release PR (bumping the extension version and updating the changelog).
- Merging the Release PR signals Release Please to tag the commit and create an official GitHub Release.
- The creation of the GitHub Release triggers the
.github/workflows/release.ymlpipeline. - The pipeline uses PyInstaller to build standalone binary executables for Linux (x64), macOS (arm64), and Windows (x64).
- The pipeline packages the binary,
LICENSE,GEMINI.md, and dynamically updatesgemini-extension.jsoninto.tar.gzand.ziparchives. - These archives are automatically attached back to the GitHub release as downloadable assets.
- Users receive the update the next time they install or upgrade the extension via Gemini CLI (
gemini extensions update --all).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file google_cloud_db_context_engineering-0.6.0.tar.gz.
File metadata
- Download URL: google_cloud_db_context_engineering-0.6.0.tar.gz
- Upload date:
- Size: 25.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
45187f1752a3f98e3e818fd515d734cc6e8c1ed0e0f10fd9fe7631db56c992e3
|
|
| MD5 |
97fa864fe081ce56bd5b2a353bf49813
|
|
| BLAKE2b-256 |
e508ba495f9c920261eb405e708eb8390361803a8c31f6424f7fd560eab6c9e3
|
Provenance
The following attestation bundles were made for google_cloud_db_context_engineering-0.6.0.tar.gz:
Publisher:
google-cloud-sdk-py@oss-exit-gate-prod.iam.gserviceaccount.com
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
google_cloud_db_context_engineering-0.6.0.tar.gz -
Subject digest:
45187f1752a3f98e3e818fd515d734cc6e8c1ed0e0f10fd9fe7631db56c992e3 - Sigstore transparency entry: 1831720906
- Sigstore integration time:
-
Token Issuer:
https://accounts.google.com -
Service Account:
google-cloud-sdk-py@oss-exit-gate-prod.iam.gserviceaccount.com
-
Statement type:
File details
Details for the file google_cloud_db_context_engineering-0.6.0-py3-none-any.whl.
File metadata
- Download URL: google_cloud_db_context_engineering-0.6.0-py3-none-any.whl
- Upload date:
- Size: 28.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b53a7eea4c8c3761c753f8229adb0c8b8305df9914b976776583ccdb3074c5fd
|
|
| MD5 |
e8304bd01388f844f163809dc302d984
|
|
| BLAKE2b-256 |
70c1d5c25e822c295bf880e6f68444111e1a581ae467a39ea36b7367a784afa3
|
Provenance
The following attestation bundles were made for google_cloud_db_context_engineering-0.6.0-py3-none-any.whl:
Publisher:
google-cloud-sdk-py@oss-exit-gate-prod.iam.gserviceaccount.com
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
google_cloud_db_context_engineering-0.6.0-py3-none-any.whl -
Subject digest:
b53a7eea4c8c3761c753f8229adb0c8b8305df9914b976776583ccdb3074c5fd - Sigstore transparency entry: 1831720848
- Sigstore integration time:
-
Token Issuer:
https://accounts.google.com -
Service Account:
google-cloud-sdk-py@oss-exit-gate-prod.iam.gserviceaccount.com
-
Statement type: