Gemini Enterprise Connector — Evaluation Toolkit

A CLI toolkit for evaluating Gemini Enterprise Connector search quality. It compares the connector's actual responses against a golden dataset and produces pass/fail grades, root cause analysis, and an interactive HTML dashboard.

Installation

Using uv (recommended)

uv pip install ge-eval

Using pip

pip install ge-eval

From source

git clone https://github.com/cloud-ai-fde/weiyih-gemini-enterprise-connector-eval.git
cd weiyih-gemini-enterprise-connector-eval
uv sync
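
Whichever method is used, it can be useful to confirm that the package is visible to the active Python environment. The snippet below is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name, and the distribution name `ge-eval` is taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "ge-eval") -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None
```

Calling `installed_version()` inside the environment where the install ran should return a version string; `None` suggests the install targeted a different environment.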

Quick Start

# Step 1: Scaffold a working directory with sample inputs
ge-eval init

# Step 2: Edit .env with your Google Cloud project settings

# Step 3: Validate configuration
ge-eval check

# Step 4: Query the API (sends questions, gets responses)
ge-eval run

# Step 5: Run LLM-judge evaluation
ge-eval eval

# Step 6: View results in the browser
ge-eval serve
# Then open http://localhost:8080

Note: Run ge-eval init to generate an INSTRUCTION.md file with a comprehensive setup guide covering authentication, input structure, and command details.
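
For unattended runs (for example in CI), the non-interactive steps above can be chained from a script. The following is a rough sketch, not part of ge-eval: `run_pipeline` and its `runner` parameter are hypothetical names, and it assumes each subcommand signals failure with a non-zero exit code, which this README does not confirm.

```python
import subprocess
from typing import Callable, List, Sequence

# Non-interactive subcommands from the Quick Start, in order.
# `init` is one-time setup and `serve` presumably blocks while the
# server runs, so both are left out of this sketch.
PIPELINE: List[List[str]] = [
    ["ge-eval", "check"],
    ["ge-eval", "run"],
    ["ge-eval", "eval"],
]


def run_pipeline(
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[str]:
    """Run each step in order and stop at the first non-zero exit code.

    Returns the names of the subcommands that completed successfully.
    Assumption: ge-eval exits non-zero when a step fails.
    """
    completed: List[str] = []
    for command in PIPELINE:
        if runner(command) != 0:
            break
        completed.append(command[1])
    return completed
```

Calling `run_pipeline()` with the default runner executes the real commands and raises `FileNotFoundError` if `ge-eval` is not on the `PATH`. Passing a custom `runner` makes the ordering logic testable without touching the API.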

What the Pipeline Does

ge-eval eval orchestrates 6 steps in sequence:

| Step | Action | Description |
|------|--------|-------------|
| 1 | Validate Inputs | Checks that golden dataset, CSV, and HTML folder exist; validates question alignment |
| 2 | Generate Summaries | Extracts agent trajectories from dolphin debug HTML files → outputs/summaries/ |
| 3 | Run LLM Judge | Calls Gemini to evaluate all questions |
| 4 | Enrich Golden Source | Adds expected citations from golden dataset to the CSV |
| 5 | Normalize Columns | Reorders CSV to final 15-column schema |
| 6 | Generate Report | Creates outputs/G_REPORT.md with stats and detailed RCA |
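
The README does not say how step 1 checks question alignment. One plausible reading is that every question in the golden dataset must also appear in the responses CSV, and vice versa. The sketch below illustrates that idea only; the `question` column name and both function names are assumptions, not ge-eval's actual schema or API.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_questions(csv_text: str, column: str = "question") -> List[str]:
    """Return the stripped values of `column` from CSV text.

    The default column name is hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader]


def find_misaligned(
    golden: Iterable[str], actual: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Compare two question lists by exact text match.

    Returns (missing, unexpected): questions found only in the golden
    dataset, and questions found only in the responses file.
    Two empty lists mean the inputs are aligned.
    """
    golden_list = list(golden)
    actual_list = list(actual)
    golden_set = set(golden_list)
    actual_set = set(actual_list)
    missing = [q for q in golden_list if q not in actual_set]
    unexpected = [q for q in actual_list if q not in golden_set]
    return missing, unexpected
```

Exact string matching is the simplest choice here; a real check might also normalize case or whitespace, or compare row order.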

CLI Commands

| Command | Description |
|---------|-------------|
| ge-eval init | Scaffold a working directory with sample inputs, .env, and INSTRUCTION.md |
| ge-eval check | Validate config, env vars, and input file alignment |
| ge-eval run | Batch-query the GE streamAssist API |
| ge-eval eval | Run the full LLM-judge evaluation pipeline |
| ge-eval serve | Start a local HTTP server for the evaluation viewer |
| ge-eval clean | Remove all files from inputs/ and outputs/, and delete INSTRUCTION.md |

Documentation

Full documentation is available at: https://cloud-ai-fde.github.io/weiyih-gemini-enterprise-connector-eval/

After running ge-eval init, see the generated INSTRUCTION.md for a detailed setup guide including authentication, input structure, and output column schema.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines on how to get started.

Development Setup

# Clone and install dev dependencies
git clone https://github.com/cloud-ai-fde/weiyih-gemini-enterprise-connector-eval.git
cd weiyih-gemini-enterprise-connector-eval
uv sync

# Run tests
uv run pytest tests/ -v

# Build documentation locally
uv run mkdocs serve

Running Tests

# Run all tests
uv run pytest tests/ -v

# Run specific test module
uv run pytest tests/test_ge_eval_cli.py -v

# Run with coverage
uv run pytest tests/ --cov=ge_eval -v

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
