GEM-BENCH
This repository provides a comprehensive benchmark for Generative Engine Marketing (GEM), an emerging field that focuses on monetizing generative AI by seamlessly integrating advertisements into Large Language Model (LLM) responses. Our work addresses the core problem of ad-injected response (AIR) generation and provides a framework for its evaluation.
- Generative Engine Marketing (GEM): A new ecosystem where relevant ads are integrated directly into responses from generative AI assistants, such as LLM-based chatbots.
- Ad-injected Response (AIR) Generation: The process of creating responses that seamlessly include relevant advertisements while maintaining a high-quality user experience and satisfying advertiser objectives.
- GEM-BENCH: The first comprehensive benchmark designed for the generation and evaluation of ad-injected responses.
🔧 Installation
Prerequisites
- Python 3.12 or higher
- Conda (recommended for environment management)
Setup
```bash
# Clone the repository
git clone https://github.com/Generative-Engine-Marketing/GEM-Bench.git
cd GEM-Bench

# Create and activate a conda environment
conda create --name GemBench python=3.12
conda activate GemBench

# Install the project in editable mode
pip install -e .
```
Environment Configuration
Create a .env file in the root directory with the following variables:
```bash
# Fill in your own API keys
OPENAI_API_KEY="<LLMs API Key>"
BASE_URL="<LLMs Base URL>"
TRANSFORMERS_OFFLINE=1 # Enable offline mode for Hugging Face Transformers
HF_HUB_OFFLINE=1 # Enable offline mode for Hugging Face Hub

# Embedding service
EMBEDDING_API_KEY="<Embedding API Key>"
EMBEDDING_BASE_URL="<Embedding Base URL>"
```
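The repository presumably reads these variables through its own configuration code; as an illustration of what the file's contents map to at runtime, here is a minimal, stdlib-only `.env` loader (the function name `load_env` is ours, not part of GEM-Bench):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: parse KEY=VALUE lines into os.environ.

    Handles comment lines, inline comments, and double-quoted values;
    existing environment variables are not overwritten.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and full-line comments
            if not line or line.startswith("#"):
                continue
            # Split on the first '=' and drop any trailing inline comment
            key, _, value = line.partition("=")
            value = value.split(" #")[0].strip().strip('"')
            os.environ.setdefault(key.strip(), value)
```

In practice a library such as python-dotenv does the same job; the sketch above only shows the shape of the file the setup step expects.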
🚀 Getting Started
After setting up your environment and configuration, you can run the main script to reproduce the experiments from our paper.
```bash
python paper.py
```
To customize the evaluation, edit `paper.py` and adjust the `data_sets` list, the `solutions` dictionary, and the `model_name`/`judge_model` parameters.
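As a hypothetical illustration of those knobs (the actual layout of `paper.py` may differ; all paths and values below are placeholders, only the names `data_sets`, `solutions`, `model_name`, and `judge_model` come from the description above):

```python
# Hypothetical sketch of the configuration described above; the real
# paper.py layout may differ. Paths and option values are placeholders.
data_sets = [
    "dataset/mt_human.json",   # placeholder path: MT-Human
    "dataset/lm_market.json",  # placeholder path: LM-Market
]

solutions = {
    # solution name -> keyword arguments; entries are illustrative only
    "Ad-Chat": {"model_name": "doubao-1-5-lite-32k"},
    "Ad-LLM-GIR-R": {"embedding_model": "<your embedding model>"},
}

model_name = "doubao-1-5-lite-32k"  # LLM that generates the responses
judge_model = "<your judge model>"  # LLM that scores the responses
```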
Available Datasets
The GEM-BENCH benchmark includes three curated datasets that cover both chatbot and search scenarios. You can find their paths within the `paper.py` script.
- MT-Human: Based on the humanities questions from the MT-Bench benchmark, this dataset is suitable for ad injection in a multi-turn chatbot scenario.
- LM-Market: Curated from the LMSYS-Chat-1M dataset, it contains real user-LLM conversations focused on marketing-related topics.
- CA-Prod: Simulates the AI overview feature in search engines using commercial advertising data from a search engine.
Evaluation Methods
GEM-BENCH provides a multi-faceted metric ontology for evaluating ad-injected responses, covering both quantitative and qualitative aspects of user satisfaction and engagement. The evaluation logic is located in the `evaluation/` directory.
- Quantitative Metrics:
  - Response Flow & Coherence: measure the semantic smoothness and topic consistency of the full response.
  - Ad Flow & Coherence: assess how well the injected ad sentence integrates with the surrounding text.
  - Injection Rate & Click-Through Rate (CTR): capture the system's ability to deliver ads and elicit user engagement.
- Qualitative Metrics:
  - User Satisfaction: evaluated on dimensions such as Accuracy, Naturalness (interruptiveness, authenticity), Personality (helpfulness, salesmanship), and Trust (credibility, bias).
  - User Engagement: measured by Notice (awareness, attitude) and Click (awareness of sponsored links, likelihood to click).
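To make the quantitative side concrete, here is a toy sketch of two of the metrics above; this is not the benchmark's actual implementation (which lives in `evaluation/`), and the field name `has_ad` and the use of adjacent-sentence cosine similarity as a "flow" proxy are our assumptions:

```python
import math

def injection_rate(responses):
    """Fraction of responses that contain an injected ad.

    Each response is a dict with a (hypothetical) 'has_ad' flag.
    """
    return sum(1 for r in responses if r["has_ad"]) / len(responses)

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def flow_score(sentence_embeddings):
    """Toy 'flow' proxy: mean cosine similarity of adjacent sentences,
    so abrupt topic shifts (e.g. a jarring ad) lower the score."""
    pairs = zip(sentence_embeddings, sentence_embeddings[1:])
    sims = [cosine(u, v) for u, v in pairs]
    return sum(sims) / len(sims)
```

A real evaluation would obtain sentence embeddings from a model and add the LLM-judged qualitative dimensions on top; the sketch only shows the arithmetic shape of the quantitative metrics.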
Supported Solutions
The benchmark provides implementations of several baseline solutions, allowing for flexible experimentation. You can find their configurations and exposed parameters in the `paper.py` file.

- Ad-Chat: An existing solution that integrates ads into the system prompt of the LLM.
  - Parameters: `model_name` (default: `doubao-1-5-lite-32k`).
- Ad-LLM: A multi-agent framework inspired by recent work, implemented in three configurations:
  - GI-R: Generate and Inject with ad Retrieval based on the raw response; a retrieval-augmented generation (RAG) approach that skips the final rewriting step.
  - GIR-R: Generate, Inject, and Rewrite with ad Retrieval based on the raw response.
  - GIR-P: Generate, Inject, and Rewrite with ad Retrieval based on the user Prompt.
  - Parameters: all Ad-LLM solutions expose `embedding_model` and `ad_retriever` as configurable parameters; the `response_rewriter` and `ad_injector` modules also have internal parameters that can be modified.
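The retrieval step shared by the Ad-LLM variants can be sketched as follows; this is an illustrative stand-in, not GEM-Bench's `ad_retriever` (we substitute a toy bag-of-words "embedding" where the real pipeline would call the configured `embedding_model`):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real ad_retriever would call an
    embedding model API here instead."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_ad(query, ads):
    """Return the ad most similar to the query: the raw response for the
    GI-R/GIR-R variants, or the user prompt for GIR-P."""
    q = embed(query)
    return max(ads, key=lambda ad: cosine(q, embed(ad)))
```

The retrieved ad would then be handed to the injection (and, for GIR-*, rewriting) steps to produce the final ad-injected response.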
📖 Citation
If you use GEM-BENCH in your research, please cite our paper:
```bibtex
@article{hu2025gembench,
  title={GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing},
  author={Hu, Silan and Zhang, Shiqi and Shi, Yimin and Xiao, Xiaokui},
  journal={arXiv preprint arXiv:2509.14221},
  year={2025}
}
```
For more information, visit our website: https://gem-bench.org
📄 License
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.