Benchmarking guided infilling models.

GIMBench

GIMBench is a benchmarking framework for evaluating Guided Infilling Models (GIMs).

Overview

This project provides tools and benchmarks for evaluating how well models perform guided infilling, that is, generating text that follows specific constraints and patterns.

Installation

Install GIMBench using pip:

pip install gimbench

For development, run the following from a clone of the repository:

make install-dev

Usage

GIMBench provides several benchmark types:

  • CV Parsing: Evaluate models on structured information extraction from CVs
  • Regex Matching: Test models' ability to generate text matching specific patterns
  • Multiple Choice QA: Assess guided generation in question-answering contexts
  • Perplexity: Measure language modeling quality with constraints
  • Code Infilling: Evaluate code infilling via unit-test execution (pass@k)
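
To make the Regex Matching idea concrete, here is a minimal sketch of how pattern-constrained outputs might be scored. It is not GIMBench's own scorer: the function name `regex_match_rate` and the sample date pattern are hypothetical, and the sketch assumes an output counts only if the entire string matches the pattern.

```python
import re


def regex_match_rate(pattern: str, outputs: list[str]) -> float:
    """Return the fraction of outputs that match `pattern` in full.

    Hypothetical metric for illustration; GIMBench may score differently.
    """
    if not outputs:
        return 0.0
    compiled = re.compile(pattern)
    # fullmatch requires the whole string to match, not just a prefix.
    hits = sum(1 for text in outputs if compiled.fullmatch(text) is not None)
    return hits / len(outputs)


if __name__ == "__main__":
    date_pattern = r"\d{4}-\d{2}-\d{2}"  # assumed example constraint
    samples = ["2024-01-31", "Jan 31, 2024", "2024-1-31", "1999-12-01"]
    print(regex_match_rate(date_pattern, samples))
```

Using `fullmatch` rather than `search` is a deliberate choice here: a guided-infilling output that merely contains a valid substring would otherwise be counted as correct.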

Example Commands

Run MMLU-Pro benchmark:

python -m gimbench.mcqa.mmlu_pro \
    --model_type vllm \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --base_url http://localhost:8000/v1

Run GPQA Diamond benchmark:

python -m gimbench.mcqa.gpqa_diamond \
    --model_type openai \
    --model_name gpt-4 \
    --api_key YOUR_API_KEY

Run GIM-SFT perplexity evaluation:

python -m gimbench.ppl.gim_sft \
    --model_type vllm-offline \
    --model_name meta-llama/Llama-3.1-8B-Instruct
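
For reference, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below computes it from a list of per-token natural-log probabilities. The helper name `perplexity` is hypothetical, and how GIMBench selects which tokens to score under constraints is not shown here.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities (illustrative helper)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token is, on average,
    # as uncertain as a uniform choice among four options.
    logprobs = [math.log(0.25)] * 4
    print(perplexity(logprobs))
```

Lower values indicate that the model assigns higher probability to the reference text.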

Run HumanEval Infilling benchmark (code generation + unit-test execution, pass@k):

# GIM-guided infilling (default), pass@1
python -m gimbench.code.humaneval_infilling \
    --model_type vllm \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --base_url http://localhost:8000/v1

# Sample 20 completions per problem, report pass@1 and pass@10
python -m gimbench.code.humaneval_infilling \
    --model_type vllm \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --base_url http://localhost:8000/v1 \
    --temperature 0.8 \
    --num_samples 20 \
    --pass_k 1 10

# Plain LLM (no GIMKit)
python -m gimbench.code.humaneval_infilling \
    --model_type vllm \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --base_url http://localhost:8000/v1 \
    --no_gimkit
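
The pass@k metric reported above is commonly computed with the unbiased estimator introduced alongside HumanEval: with n samples per problem, of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that formula for one problem. Whether GIMBench uses exactly this estimator is an assumption, and `pass_at_k` is a hypothetical helper name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled, c: completions that passed, k: budget to evaluate.
    Illustrative helper; not taken from the GIMBench source.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Mirrors the example flags: --num_samples 20 --pass_k 1 10,
    # with an assumed 5 passing completions.
    for k in (1, 10):
        print(k, pass_at_k(n=20, c=5, k=k))
```

The benchmark-level score is then the mean of this per-problem estimate over all problems. Note that k cannot exceed the number of samples, which is why the example draws 20 samples to report pass@10.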

Development

Run linting:

make lint

Fix linting issues automatically:

make lint-fix

Run pre-commit hooks:

make pre-commit
