CLI toolkit for Gemini Enterprise Connector evaluation — init, check, run, and evaluate RAG pipelines against golden datasets.

Gemini Enterprise Connector — Evaluation Toolkit

A CLI toolkit for evaluating Gemini Enterprise Connector RAG pipelines. It compares actual responses against a golden dataset and produces pass/fail grades, root cause analysis, and an interactive HTML dashboard.

Installation

Using uv (recommended)

uv pip install ge-eval

Using pip

pip install ge-eval

From source

git clone https://github.com/yapweiyih/gemini-enterprise-connector-eval.git
cd gemini-enterprise-connector-eval
uv sync

Quick Start

# Step 1: Scaffold a working directory with sample inputs
ge-eval init

# Step 2: Edit .env with your Google Cloud project settings

# Step 3: Validate configuration
ge-eval check

# Step 4: Query the API (sends questions, gets responses)
ge-eval run

# Step 5: Run LLM-judge evaluation
ge-eval eval

# Step 6: View results in the browser
ge-eval serve
# Then open http://localhost:8080

Note: Run ge-eval init to generate an INSTRUCTION.md file with a comprehensive setup guide covering authentication, input structure, and command details.
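
Once `ge-eval init` has been run and `.env` has been edited, the remaining non-interactive commands can be chained from a script. The following is a minimal sketch and not part of ge-eval: `run_steps`, `QUICK_START_STEPS`, and the `runner` parameter are hypothetical names, and the sketch assumes each command reports failure through a non-zero exit code, which this README does not confirm.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands taken from the Quick Start. `ge-eval serve` is left out because
# it starts a local server and would block the script.
QUICK_START_STEPS: List[List[str]] = [
    ["ge-eval", "check"],
    ["ge-eval", "run"],
    ["ge-eval", "eval"],
]


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Run each command in order and stop at the first failure.

    Assumption: a non-zero exit code means the step failed.
    Returns the commands that completed, as display strings.
    """
    completed: List[str] = []
    for cmd in steps:
        display = " ".join(cmd)
        exit_code = runner(list(cmd))
        if exit_code != 0:
            raise RuntimeError(f"'{display}' exited with code {exit_code}")
        completed.append(display)
    return completed
```

Calling `run_steps(QUICK_START_STEPS)` runs the three commands in Quick Start order. The `runner` argument exists only so the sequencing can be exercised without the CLI installed.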

What the Pipeline Does

ge-eval eval orchestrates 6 steps in sequence:

| Step | Action | Description |
|------|--------|-------------|
| 1 | Validate Inputs | Checks that golden dataset, CSV, and HTML folder exist; validates question alignment |
| 2 | Generate Summaries | Extracts agent trajectories from dolphin debug HTML files → outputs/summaries/ |
| 3 | Run LLM Judge | Calls Gemini to evaluate all questions |
| 4 | Enrich Golden Source | Adds expected citations from golden dataset to the CSV |
| 5 | Normalize Columns | Reorders CSV to final 15-column schema |
| 6 | Generate Report | Creates outputs/G_REPORT.md with stats and detailed RCA |
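
The "steps in sequence" idea can be illustrated with a small fail-fast runner. This is only a sketch of the general pattern, not the toolkit's actual implementation: `run_pipeline`, `placeholder`, the context keys (`golden_dataset`, `csv`, `html_dir`), and the step bodies are all hypothetical stand-ins; only the six step names come from the table above.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Step = Tuple[str, Callable[[Context], Context]]


def validate_inputs(ctx: Context) -> Context:
    # Hypothetical check: the three inputs must be present before anything runs.
    missing = [key for key in ("golden_dataset", "csv", "html_dir") if key not in ctx]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    return ctx


def placeholder(marker: str) -> Callable[[Context], Context]:
    # Stand-in for real work: records that the step ran and nothing else.
    def action(ctx: Context) -> Context:
        return {**ctx, marker: True}
    return action


# Step names follow the table; the actions are illustrative only.
STEPS: List[Step] = [
    ("Validate Inputs", validate_inputs),
    ("Generate Summaries", placeholder("summaries")),
    ("Run LLM Judge", placeholder("judged")),
    ("Enrich Golden Source", placeholder("enriched")),
    ("Normalize Columns", placeholder("normalized")),
    ("Generate Report", placeholder("report")),
]


def run_pipeline(steps: List[Step], context: Context) -> Tuple[Context, List[str]]:
    """Run steps in order, passing each step's output to the next.

    Stops at the first failing step and names it in the error.
    """
    completed: List[str] = []
    for number, (name, action) in enumerate(steps, start=1):
        try:
            context = action(context)
        except Exception as exc:
            raise RuntimeError(f"Step {number} ({name}) failed: {exc}") from exc
        completed.append(name)
    return context, completed
```

Running validation first means a missing input stops the run before any LLM calls are made in the later steps.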

CLI Commands

| Command | Description |
|---------|-------------|
| ge-eval init | Scaffold a working directory with sample inputs, .env, and INSTRUCTION.md |
| ge-eval check | Validate config, env vars, and input file alignment |
| ge-eval run | Batch-query the GE streamAssist API |
| ge-eval eval | Run the full LLM-judge evaluation pipeline |
| ge-eval serve | Start a local HTTP server for the evaluation viewer |
| ge-eval clean | Remove all files from inputs/ and outputs/, and delete INSTRUCTION.md |

Documentation

Full documentation is available at: https://yapweiyih.github.io/gemini-enterprise-connector-eval/

After running ge-eval init, see the generated INSTRUCTION.md for a detailed setup guide including authentication, input structure, and output column schema.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines on how to get started.

Development Setup

# Clone and install dev dependencies
git clone https://github.com/yapweiyih/gemini-enterprise-connector-eval.git
cd gemini-enterprise-connector-eval
uv sync

# Run tests
uv run pytest tests/ -v

# Build documentation locally
uv run mkdocs serve

Running Tests

# Run all tests
uv run pytest tests/ -v

# Run specific test module
uv run pytest tests/test_ge_eval_cli.py -v

# Run with coverage
uv run pytest tests/ --cov=ge_eval -v

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
