Gemini Enterprise Connector — Evaluation Toolkit

A CLI toolkit for evaluating Gemini Enterprise Connector search quality. It compares actual responses against a golden dataset and produces pass/fail grades, root cause analysis, and an interactive HTML dashboard.

Installation

Using uv (recommended)

uv pip install ge-eval

Using pip

pip install ge-eval

From source

git clone https://github.com/gcp-ai-fde/weiyih-gemini-enterprise-connector-eval.git
cd weiyih-gemini-enterprise-connector-eval
uv sync

Quick Start

# Step 1: Scaffold a working directory with sample inputs
ge-eval init

# Step 2: Edit .env with your Google Cloud project settings

# Step 3: Validate configuration
ge-eval check

# Step 4: Query the API (sends questions, gets responses)
ge-eval run

# Step 5: Run LLM-judge evaluation
ge-eval eval

# Step 6: View results in the browser
ge-eval serve
# Then open http://localhost:8080

Note: Run ge-eval init to generate an INSTRUCTION.md file with a comprehensive setup guide covering authentication, input structure, and command details.
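
The Quick Start steps that need no manual editing (check, run, eval) can also be chained from a script. The following is a minimal Python sketch and not part of ge-eval itself: `run_steps` and `QUICK_START_STEPS` are hypothetical names, and the sketch assumes each ge-eval command exits with a non-zero status on failure, which this README does not state.

```python
import subprocess
from typing import Callable, Sequence

# Commands copied from the Quick Start. `ge-eval init` and the .env edit are
# left out because they are one-time manual steps, and `ge-eval serve` is left
# out because it starts a long-running server.
QUICK_START_STEPS: list[list[str]] = [
    ["ge-eval", "check"],
    ["ge-eval", "run"],
    ["ge-eval", "eval"],
]


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run each command in order and stop at the first failure.

    Returns the commands that completed successfully, in order.
    """
    completed: list[str] = []
    for step in steps:
        result = runner(list(step))
        # Assumption: a non-zero exit code means the step failed.
        if result.returncode != 0:
            break
        completed.append(" ".join(step))
    return completed
```

Calling `run_steps(QUICK_START_STEPS)` from an initialized working directory would run the three commands back to back. The `runner` parameter exists only so the function can be exercised without ge-eval installed.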

What the Pipeline Does

ge-eval eval orchestrates 6 steps in sequence:

| Step | Action | Description |
| --- | --- | --- |
| 1 | Validate Inputs | Checks that golden dataset, CSV, and HTML folder exist; validates question alignment |
| 2 | Generate Summaries | Extracts agent trajectories from dolphin debug HTML files → outputs/summaries/ |
| 3 | Run LLM Judge | Calls Gemini to evaluate all questions |
| 4 | Enrich Golden Source | Adds expected citations from golden dataset to the CSV |
| 5 | Normalize Columns | Reorders CSV to final 15-column schema |
| 6 | Generate Report | Creates outputs/G_REPORT.md with stats and detailed RCA |
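
To make step 5 concrete, here is a rough Python sketch of what reordering a CSV to a fixed schema can look like. It is an illustration only, not ge-eval's implementation: `normalize_columns` and the column names in `EXAMPLE_SCHEMA` are hypothetical, and the real 15-column schema is the one documented in the generated INSTRUCTION.md.

```python
import csv
import io

# Hypothetical column names for illustration; not the tool's actual schema.
EXAMPLE_SCHEMA = ["question", "actual_answer", "grade"]


def normalize_columns(csv_text: str, schema: list[str]) -> str:
    """Rewrite CSV text so its columns follow `schema` exactly.

    Columns missing from the input are emitted empty, and columns not in the
    schema are dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=schema,
        restval="",             # fill columns absent from the input
        extrasaction="ignore",  # drop columns absent from the schema
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Driving the output from an explicit schema list keeps the column order stable regardless of how earlier steps appended their columns.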

CLI Commands

| Command | Description |
| --- | --- |
| ge-eval init | Scaffold a working directory with sample inputs, .env, and INSTRUCTION.md |
| ge-eval check | Validate config, env vars, and input file alignment |
| ge-eval run | Batch-query the GE streamAssist API |
| ge-eval eval | Run the full LLM-judge evaluation pipeline |
| ge-eval serve | Start a local HTTP server for the evaluation viewer |
| ge-eval clean | Remove all files from inputs/ and outputs/, and delete INSTRUCTION.md |

Documentation

Full documentation is available at: https://gcp-ai-fde.github.io/weiyih-gemini-enterprise-connector-eval/

After running ge-eval init, see the generated INSTRUCTION.md for a detailed setup guide including authentication, input structure, and output column schema.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines on how to get started.

Development Setup

# Clone and install dev dependencies
git clone https://github.com/gcp-ai-fde/weiyih-gemini-enterprise-connector-eval.git
cd weiyih-gemini-enterprise-connector-eval
uv sync

# Run tests
uv run pytest tests/ -v

# Build documentation locally
uv run mkdocs serve

Running Tests

# Run all tests
uv run pytest tests/ -v

# Run specific test module
uv run pytest tests/test_ge_eval_cli.py -v

# Run with coverage
uv run pytest tests/ --cov=ge_eval -v

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

Download files

Source Distribution

ge_eval-0.2.0.tar.gz (61.1 kB)

Built Distribution

ge_eval-0.2.0-py3-none-any.whl (92.6 kB)

File details

Details for the file ge_eval-0.2.0.tar.gz.

File metadata

  • Download URL: ge_eval-0.2.0.tar.gz
  • Upload date:
  • Size: 61.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ge_eval-0.2.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 9ef9147bf9bcdbce6590ae48f29720509e6367d5bb6743116bd4df3c0eb34ef5 |
| MD5 | 008b336d7f0afb1036ff7fa057ae2d8a |
| BLAKE2b-256 | d4d830a8b8ba032d9cd25063716fa38fc3fc25aa73bbddcfe26e9bbc84b7a414 |
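
A downloaded file can be checked against the SHA256 digest listed above before installing it. The snippet below is a minimal sketch using only Python's standard library; `sha256_matches` is a hypothetical helper name, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's SHA256 digest equals `expected_hex`.

    The file is read in chunks so large archives are not loaded into memory
    at once. The comparison ignores case and surrounding whitespace.
    """
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, `sha256_matches(Path("ge_eval-0.2.0.tar.gz"), "9ef9147bf9bcdbce6590ae48f29720509e6367d5bb6743116bd4df3c0eb34ef5")` should return `True` for an unmodified copy of the source distribution.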

File details

Details for the file ge_eval-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: ge_eval-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 92.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ge_eval-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | fbee6a4c68cd4d204596d8e418d758b9937dbf501dcbff43f9e487c18c740187 |
| MD5 | c510d73ab3b3ed1350ef6aaf7efd87a1 |
| BLAKE2b-256 | c89eccc4df1c2894213e3799e4c3f85dc0cc8db4a8e53d6617f405f428755a72 |