GEM-BENCH

This repository provides a comprehensive benchmark for Generative Engine Marketing (GEM), an emerging field that focuses on monetizing generative AI by seamlessly integrating advertisements into Large Language Model (LLM) responses. Our work addresses the core problem of ad-injected response (AIR) generation and provides a framework for its evaluation.

Generative Engine Marketing (GEM): A new ecosystem where relevant ads are integrated directly into responses from generative AI assistants, such as LLM-based chatbots.
Ad-injected Response (AIR) Generation: The process of creating responses that seamlessly include relevant advertisements while maintaining a high-quality user experience and satisfying advertiser objectives.
GEM-BENCH: The first comprehensive benchmark designed for the generation and evaluation of ad-injected responses.

🔧 Installation

Prerequisites

Python 3.12 or higher
Conda (recommended for environment management)

Setup

# Clone the repository
git clone https://github.com/Generative-Engine-Marketing/GEM-Bench.git
cd GEM-Bench

# Create and activate conda environment
conda create --name GemBench python=3.12
conda activate GemBench

# Install Project
pip install -e .
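
After installation, a quick sanity check can confirm that the interpreter meets the 3.12 requirement and that the package metadata is visible. The following is a minimal sketch, not part of the project: `check_environment` is a hypothetical helper, and the distribution name `gembench` is an assumption taken from the published wheel file name.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 12)  # from the Prerequisites section


def check_environment(dist_name="gembench", version_info=None):
    """Return a list of human-readable problems; an empty list means OK.

    dist_name is an assumption (taken from the wheel file name).
    version_info can be passed explicitly to make the check testable.
    """
    major_minor = tuple(version_info or sys.version_info)[:2]
    problems = []
    if major_minor < MIN_PYTHON:
        problems.append(
            f"Python {major_minor[0]}.{major_minor[1]} found, 3.12 or higher required"
        )
    try:
        metadata.version(dist_name)  # raises if the distribution is not installed
    except metadata.PackageNotFoundError:
        problems.append(f"distribution '{dist_name}' is not installed")
    return problems
```

Calling `check_environment()` inside the activated conda environment should return an empty list if the setup steps succeeded under these assumptions.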

Environment Configuration

Create a .env file in the root directory with the following variables:

# Please fill in your own API keys here and change the file name to .env
OPENAI_API_KEY="<LLMs API Key>"
BASE_URL="<LLMs Base URL>"

TRANSFORMERS_OFFLINE=1 # Enable offline mode for Hugging Face Transformers
HF_HUB_OFFLINE=1 # Enable offline mode for Hugging Face Hub

# Embedding
EMBEDDING_API_KEY="<Embedding API Key>"
EMBEDDING_BASE_URL="<Embedding Base URL>"
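
Before running any experiment, it can help to verify that the .env file is complete. The sketch below parses the file with the standard library only and reports keys that are missing or still hold the `<...>` placeholders from the template. It is an illustration under assumptions: `parse_env_text`, `missing_keys`, and `check_env_file` are hypothetical helpers, and the project itself may load the file differently.

```python
from pathlib import Path

# Variable names come from the .env template above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "BASE_URL", "EMBEDDING_API_KEY", "EMBEDDING_BASE_URL")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep everything up to the matching closing quote.
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing inline comment.
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(values):
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("<")
    ]


def check_env_file(path=".env"):
    """Read the file at `path` and return the list of unresolved keys."""
    return missing_keys(parse_env_text(Path(path).read_text(encoding="utf-8")))
```

An empty list from `check_env_file()` means all four credentials have been filled in.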

🚀 Getting Started

After setting up your environment and configuration, you can run the main script to reproduce the experiments from our paper.

python paper.py

To customize the evaluation, edit paper.py and adjust data_sets, the solutions dictionary, and the model_name and judge_model parameters.

Available Datasets

The GEM-BENCH benchmark includes three curated datasets that cover both chatbot and search scenarios. You can find their paths within the paper.py script.

MT-Human: Based on the humanities questions from the MT-Bench benchmark, this dataset is suitable for ad injection in a multi-turn chatbot scenario.
LM-Market: Curated from the LMSYS-Chat-1M dataset, it contains real user-LLM conversations focused on marketing-related topics.
CA-Prod: Simulates the AI overview feature in search engines using commercial advertising data from a search engine.

Evaluation Methods

GEM-BENCH provides a multi-faceted metric ontology for evaluating ad-injected responses, covering both quantitative and qualitative aspects of user satisfaction and engagement. The evaluation logic is located in evaluation/.

Quantitative Metrics:
- Response Flow & Coherence: Measure the semantic smoothness and topic consistency of the response.
- Ad Flow & Coherence: Specifically assess how well the ad sentence integrates with the surrounding text.
- Injection Rate & Click-Through Rate (CTR): Capture how often the system delivers an ad and how often users engage with it.
Qualitative Metrics:
- User Satisfaction: Evaluated on dimensions like Accuracy, Naturalness (interruptiveness, authenticity), Personality (helpfulness, salesmanship), and Trust (credibility, bias).
- User Engagement: Measured by Notice (awareness, attitude) and Click (awareness of sponsored links, likelihood to click).
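
To make the quantitative metrics more concrete, the sketch below shows one plausible reading of them. It is not the benchmark's implementation: the exact formulas live in evaluation/ and may differ. The record fields `has_ad` and `clicked`, the helper names, and the use of mean adjacent-sentence cosine similarity as a flow score are all assumptions made for illustration.

```python
import math


def injection_rate(responses):
    """Fraction of responses that contain at least one injected ad."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if r["has_ad"]) / len(responses)


def click_through_rate(responses):
    """Clicks divided by ad impressions (responses that carry an ad)."""
    impressions = [r for r in responses if r["has_ad"]]
    if not impressions:
        return 0.0
    return sum(1 for r in impressions if r["clicked"]) / len(impressions)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flow_score(sentence_vectors):
    """Mean cosine similarity between adjacent sentence embeddings."""
    pairs = list(zip(sentence_vectors, sentence_vectors[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

In this reading, a response whose sentences stay on topic yields a flow score close to 1, while an abrupt ad sentence lowers the similarity with its neighbours. The `clicked` flag is a stand-in for however the click signal is actually obtained.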

Supported Solutions

The benchmark provides implementations for several baseline solutions, allowing for flexible experimentation. You can find their configurations and exposed parameters within the paper.py file.

Ad-Chat: An existing solution that integrates ads into the system prompt of the LLM.
- Parameters: model_name (default: doubao-1-5-lite-32k).
Ad-LLM: A multi-agent framework inspired by recent work, implemented with different configurations:
- GI-R: Generate and Inject with ad Retrieval based on the raw response. This is a retrieval-augmented generation (RAG) approach that skips the final rewriting step.
- GIR-R: Generate, Inject, and Rewrite with ad Retrieval based on the raw response.
- GIR-P: Generate, Inject, and Rewrite with ad Retrieval based on the user Prompt.
- Parameters: All Ad-LLM solutions expose the embedding_model and ad_retriever as configurable parameters. The response_rewriter and ad_injector modules also have internal parameters that can be modified.
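
The three Ad-LLM configurations differ in only two choices: whether the ad is retrieved from the raw response or from the user prompt, and whether a final rewriting step runs. The sketch below captures that structure with placeholder stages. It is a hypothetical illustration, not the project's code: `AdPipeline` and the toy stage functions are invented here, and the real modules (ad_retriever, ad_injector, response_rewriter) have their own interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AdPipeline:
    """Hypothetical composition of the stages named in GI-R / GIR-R / GIR-P."""
    generate: Callable[[str], str]           # prompt -> raw response
    retrieve: Callable[[str], str]           # query text -> ad text
    inject: Callable[[str, str], str]        # (response, ad) -> response with ad
    rewrite: Optional[Callable[[str], str]]  # None skips the rewriting step
    retrieve_on: str = "response"            # "response" (-R) or "prompt" (-P)

    def run(self, prompt: str) -> str:
        raw = self.generate(prompt)
        # -R variants query with the raw response, -P variants with the prompt.
        query = raw if self.retrieve_on == "response" else prompt
        ad = self.retrieve(query)
        injected = self.inject(raw, ad)
        return self.rewrite(injected) if self.rewrite else injected


# Toy stand-ins for the LLM, retriever, injector, and rewriter.
def toy_generate(prompt): return f"Answer about {prompt}."
def toy_retrieve(query): return f"Ad<{query.split()[0]}>"
def toy_inject(response, ad): return f"{response} {ad}"
def toy_rewrite(text): return text.replace("Ad<", "Sponsored<")


gi_r = AdPipeline(toy_generate, toy_retrieve, toy_inject, rewrite=None)
gir_r = AdPipeline(toy_generate, toy_retrieve, toy_inject, rewrite=toy_rewrite)
gir_p = AdPipeline(toy_generate, toy_retrieve, toy_inject, rewrite=toy_rewrite,
                   retrieve_on="prompt")
```

Running all three on the same prompt makes the difference visible: only the `-P` variant retrieves an ad keyed on the prompt text, and only the GIR variants pass through the rewriter.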

📖 Citation

If you use GEM-BENCH in your research, please cite our paper:

@article{hu2025gembench,
  title={GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing},
  author={Hu, Silan and Zhang, Shiqi and Shi, Yimin and Xiao, Xiaokui},
  journal={arXiv preprint arXiv:2509.14221},
  year={2025}
}

For more information, visit our website: https://gem-bench.org

📄 License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
