Benchmarking guided infilling models.
GIMBench
GIMBench is a benchmarking framework for evaluating Guided Infilling Models (GIM).
Overview
This project provides tools and benchmarks for evaluating how well models perform guided infilling tasks: generating the text that fills specified gaps while following given constraints and patterns.
Installation
Install GIMBench using pip:
pip install gimbench
For development:
make install-dev
Usage
GIMBench provides several benchmark types:
- CV Parsing: Evaluate models on structured information extraction from CVs
- Regex Matching: Test models' ability to generate text matching specific patterns
- Multiple Choice QA: Assess guided generation in question-answering contexts
- Perplexity: Measure language modeling quality with constraints
- Code Infilling: Evaluate code infilling via unit-test execution (pass@k)
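For the Regex Matching benchmark, one simple scoring rule is the fraction of infilled spans that fully match their target pattern. The sketch below assumes that rule. The Hole class and the fill_is_valid and match_rate functions are hypothetical names used for illustration. They are not GIMBench's API, and GIMBench's actual metric may differ.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Hole:
    """Hypothetical infilling slot: the model's fill must match `pattern` in full."""
    name: str
    pattern: str


def fill_is_valid(hole: Hole, fill: str) -> bool:
    # re.fullmatch requires the entire fill to match, not just a prefix or substring.
    return re.fullmatch(hole.pattern, fill) is not None


def match_rate(holes: List[Hole], fills: List[str]) -> float:
    """Fraction of holes whose generated fill satisfies its regex constraint."""
    if len(holes) != len(fills):
        raise ValueError("expected exactly one fill per hole")
    if not holes:
        return 0.0
    hits = sum(fill_is_valid(h, f) for h, f in zip(holes, fills))
    return hits / len(holes)


holes = [
    Hole("date", r"\d{4}-\d{2}-\d{2}"),
    Hole("email", r"[\w.+-]+@[\w-]+\.[\w.]+"),
    Hole("phone", r"\+?\d[\d -]{7,}\d"),
]
fills = ["2024-05-01", "jane.doe@example.com", "call me"]
print(match_rate(holes, fills))  # two of three fills match
```

Using fullmatch rather than search is a deliberate choice. A fill that only contains a valid substring, such as "on 2024-05-01", should not count as satisfying the pattern.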
Example Commands
Run MMLU-Pro benchmark:
python -m gimbench.mcqa.mmlu_pro \
--model_type vllm \
--model_name meta-llama/Llama-3.1-8B-Instruct \
--base_url http://localhost:8000/v1
Run GPQA Diamond benchmark:
python -m gimbench.mcqa.gpqa_diamond \
--model_type openai \
--model_name gpt-4 \
--api_key YOUR_API_KEY
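Multiple-choice benchmarks are typically scored by pulling an option letter out of the model's response and computing accuracy against the gold letter. MMLU-Pro questions have up to ten options (A to J) and GPQA Diamond questions have four (A to D). The sketch below illustrates that scoring under stated assumptions. The extraction regex and the extract_choice and accuracy helpers are hypothetical, and GIMBench may instead constrain the answer slot so that no free-text parsing is needed.

```python
import re
from typing import List, Optional


def extract_choice(response: str, num_options: int) -> Optional[str]:
    """Hypothetical extractor: find an option letter in a free-text response."""
    letters = "ABCDEFGHIJ"[:num_options]
    # Match phrasings such as "The answer is (C)" or "Answer: B".
    # Letters stay case-sensitive so words like "a" are not mistaken for option A.
    m = re.search(rf"[Aa]nswer(?:\s+is)?\s*:?\s*\(?([{letters}])\b", response)
    if m:
        return m.group(1)
    # Fall back to a bare letter response such as "C" or "(C)".
    stripped = response.strip().strip("().")
    return stripped if len(stripped) == 1 and stripped in letters else None


def accuracy(responses: List[str], gold: List[str], num_options: int) -> float:
    """Share of responses whose extracted letter equals the gold letter."""
    if not gold:
        return 0.0
    correct = sum(extract_choice(r, num_options) == g for r, g in zip(responses, gold))
    return correct / len(gold)


responses = ["The answer is (C).", "Answer: B", "I am not sure."]
gold = ["C", "A", "D"]
print(accuracy(responses, gold, num_options=4))  # only the first response is correct
```

Unparseable responses are counted as wrong rather than skipped. Skipping them would inflate accuracy for models that often fail to commit to an option.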
Run GIM-SFT perplexity evaluation:
python -m gimbench.ppl.gim_sft \
--model_type vllm-offline \
--model_name meta-llama/Llama-3.1-8B-Instruct
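Perplexity is the exponential of the mean negative log-likelihood per token: PPL = exp(-(1/N) * sum of log p(token_i)). The sketch below computes it from natural-log token probabilities. It is not stated here whether GIMBench scores every token or only the infilled spans. The optional mask argument is a hypothetical knob for restricting the average to chosen positions.

```python
import math
from typing import Optional, Sequence


def perplexity(token_logprobs: Sequence[float], mask: Optional[Sequence[bool]] = None) -> float:
    """exp of the mean negative log-likelihood over the selected tokens.

    token_logprobs: natural-log probabilities, one per token.
    mask: hypothetical selector; True keeps a position (e.g. infilled tokens only).
    """
    if mask is None:
        mask = [True] * len(token_logprobs)
    if len(mask) != len(token_logprobs):
        raise ValueError("mask must have one entry per token")
    selected = [lp for lp, keep in zip(token_logprobs, mask) if keep]
    if not selected:
        raise ValueError("no tokens selected")
    mean_nll = -sum(selected) / len(selected)
    return math.exp(mean_nll)


logprobs = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(perplexity(logprobs))                       # about 4.0, up to floating-point rounding
print(perplexity(logprobs, [False, True, True]))  # about 5.657 (2 ** 2.5)
```

Averaging in log space before exponentiating gives a geometric mean of inverse probabilities. This keeps the result comparable across sequences of different lengths.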
Run HumanEval Infilling benchmark (code generation + unit-test execution, pass@k):
# GIM-guided infilling (default), pass@1
python -m gimbench.code.humaneval_infilling \
--model_type vllm \
--model_name meta-llama/Llama-3.1-8B-Instruct \
--base_url http://localhost:8000/v1
# Sample 20 completions per problem, report pass@1 and pass@10
python -m gimbench.code.humaneval_infilling \
--model_type vllm \
--model_name meta-llama/Llama-3.1-8B-Instruct \
--base_url http://localhost:8000/v1 \
--temperature 0.8 \
--num_samples 20 \
--pass_k 1 10
# Plain LLM (no GIMKit)
python -m gimbench.code.humaneval_infilling \
--model_type vllm \
--model_name meta-llama/Llama-3.1-8B-Instruct \
--base_url http://localhost:8000/v1 \
--no_gimkit
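When several completions are sampled per problem (for example --num_samples 20 with --pass_k 1 10), pass@k is commonly computed with the unbiased estimator introduced alongside the original HumanEval benchmark. For n samples of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below implements that formula. It is not confirmed here that GIMBench uses this exact estimator. The pass_at_k and mean_pass_at_k helpers are illustrative names.

```python
import math
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them passed."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# One problem, 20 samples, 5 passing.
print(pass_at_k(20, 5, 1))   # 0.25, i.e. c / n
print(pass_at_k(20, 5, 10))  # about 0.984
```

With k = 1 the estimator reduces to c / n. Larger k rewards models whose correct answers appear anywhere in a sample, so pass@10 should be read together with the sampling temperature used.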
Development
Run linting:
make lint
Fix linting issues automatically:
make lint-fix
Run pre-commit hooks:
make pre-commit
Download files
File details
Details for the file gimbench-0.4.0.tar.gz.
File metadata
- Download URL: gimbench-0.4.0.tar.gz
- Upload date:
- Size: 21.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.5 (uv publish, CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b5ec2e58a2f0ce325be1eec350a617b7a2228c05023998e1c08c811c4c8b953 |
| MD5 | a4b1421eddb891f2508cb3e465bf43f1 |
| BLAKE2b-256 | 1287e1dd328e733b33c932fdafd216c225404c34caee385141d7f2eb95bacbaa |
File details
Details for the file gimbench-0.4.0-py3-none-any.whl.
File metadata
- Download URL: gimbench-0.4.0-py3-none-any.whl
- Upload date:
- Size: 33.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.5 (uv publish, CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dca1cb3aa50582b0fa5d562370b0f92469f0bc124cfc00623b8f05da7efad7ec |
| MD5 | a0a12ddad203e22374dcd87f6bb5ac5d |
| BLAKE2b-256 | 9426023d4e232c289de35296c1d6b051d01ed421e5e54ec459941e6aae51f3aa |