Benchmarking the guided infilling models.
# GIMBench
GIMBench is a benchmarking framework for evaluating Guided Infilling Models (GIM).
## Overview
This project provides tools and benchmarks for evaluating how well models perform guided infilling tasks: generating text that follows specific constraints and patterns.
## Installation
Install GIMBench using pip:
```bash
pip install gimbench
```
For development:
```bash
make install-dev
```
## Usage
GIMBench provides several benchmark types:
- CV Parsing: Evaluate models on structured information extraction from CVs
- Regex Matching: Test models' ability to generate text matching specific patterns
- Multiple Choice QA: Assess guided generation in question-answering contexts
- Perplexity: Measure language modeling quality with constraints
- Code Infilling: Evaluate code infilling via unit-test execution (pass@k)
- SciERC Relation Extraction: Evaluate scientific relation extraction on the Hugging Face dataset `Sculpt-AI/GIMBench-sci-erc`
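To make the Regex Matching idea concrete, here is a minimal, library-independent sketch that scores generations by whether each one fully matches its target pattern. The helper name `regex_match_rate` and the full-match rule are assumptions for illustration only; GIMBench's actual scorer may use a different matching rule or normalize the text first.

```python
import re


def regex_match_rate(outputs: list[str], patterns: list[str]) -> float:
    """Fraction of outputs that fully match their paired regex pattern.

    Hypothetical helper: full-match semantics (re.fullmatch) is an assumption,
    not necessarily what GIMBench uses.
    """
    if len(outputs) != len(patterns):
        raise ValueError("outputs and patterns must have the same length")
    if not outputs:
        return 0.0
    hits = sum(
        1 for text, pattern in zip(outputs, patterns) if re.fullmatch(pattern, text)
    )
    return hits / len(outputs)


# Example: two date slots and one phone-number slot.
outputs = ["2024-05-17", "555-0199", "May 17"]
patterns = [r"\d{4}-\d{2}-\d{2}", r"\d{3}-\d{4}", r"\d{4}-\d{2}-\d{2}"]
print(regex_match_rate(outputs, patterns))  # 2 of 3 outputs match
```

A guided infilling model is expected to score high here because the constraint is enforced during generation rather than checked afterwards.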
### Example Commands
Run the MMLU-Pro benchmark:

```bash
python -m gimbench.mcqa.mmlu_pro \
  --model_type vllm \
  --model_name meta-llama/Llama-3.1-8B-Instruct \
  --base_url http://localhost:8000/v1
```
Run the GPQA Diamond benchmark:

```bash
python -m gimbench.mcqa.gpqa_diamond \
  --model_type openai \
  --model_name gpt-4 \
  --api_key YOUR_API_KEY
```
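MMLU-Pro questions have up to ten options (A to J), GPQA Diamond questions have four (A to D), and multiple-choice QA is usually scored by accuracy. The sketch below shows one way to pull an option letter out of a free-form answer and compute accuracy. The extraction heuristic and the names `extract_choice` and `accuracy` are hypothetical, not GIMBench's implementation; under guided decoding the answer slot may already be constrained to a single letter, which makes extraction trivial.

```python
import re
from typing import Optional

OPTION_LETTERS = "ABCDEFGHIJ"  # MMLU-Pro uses up to 10 options; GPQA uses A-D


def extract_choice(text: str, num_options: int) -> Optional[str]:
    """Return the option letter a free-form answer picks, or None.

    Hypothetical heuristic: prefer an explicit "answer is (X)" or "answer is X",
    otherwise fall back to the last standalone valid capital letter. The
    fallback is crude (an article "A" at a sentence start would also match).
    """
    letters = OPTION_LETTERS[:num_options]
    explicit = re.search(rf"answer is \(?([{letters}])\)?", text)
    if explicit:
        return explicit.group(1)
    standalone = re.findall(rf"\b([{letters}])\b", text)
    return standalone[-1] if standalone else None


def accuracy(predictions: list[Optional[str]], golds: list[str]) -> float:
    """Share of questions whose extracted letter equals the gold letter."""
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)


outputs = ["The answer is (C).", "I think B is right.", "No idea."]
golds = ["C", "A", "D"]
preds = [extract_choice(o, num_options=4) for o in outputs]
print(preds)                   # ['C', 'B', None]
print(accuracy(preds, golds))  # 1 of 3 correct
```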
Run the GIM-SFT perplexity evaluation:

```bash
python -m gimbench.ppl.gim_sft \
  --model_type vllm-offline \
  --model_name meta-llama/Llama-3.1-8B-Instruct
```
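Perplexity is the exponential of the mean negative log-likelihood over the scored tokens, PPL = exp(-(1/N) Σ log p(x_i)). Here is a minimal sketch that assumes you already have per-token natural-log probabilities. Which tokens the GIM-SFT evaluation actually scores (for example, only the infilled spans) is not stated here, so treat that as an open assumption.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity from natural-log probabilities of the scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# If every token gets probability 0.25, the model is "choosing among 4": PPL = 4.
logprobs = [math.log(0.25)] * 8
print(round(perplexity(logprobs), 6))  # 4.0
```

For a corpus-level score, pooling all scored tokens into one list (rather than averaging per-document perplexities) weights every token equally; which convention GIMBench follows is an assumption.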
Run the HumanEval Infilling benchmark (code generation plus unit-test execution, scored as pass@k):
```bash
# GIM-guided infilling (default), pass@1
python -m gimbench.code.humaneval_infilling \
  --model_type vllm \
  --model_name meta-llama/Llama-3.1-8B-Instruct \
  --base_url http://localhost:8000/v1

# Sample 20 completions per problem, report pass@1 and pass@10
python -m gimbench.code.humaneval_infilling \
  --model_type vllm \
  --model_name meta-llama/Llama-3.1-8B-Instruct \
  --base_url http://localhost:8000/v1 \
  --temperature 0.8 \
  --num_samples 20 \
  --pass_k 1 10

# Plain LLM (no GIMKit)
python -m gimbench.code.humaneval_infilling \
  --model_type vllm \
  --model_name meta-llama/Llama-3.1-8B-Instruct \
  --base_url http://localhost:8000/v1 \
  --no_gimkit
```
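The README does not say how pass@k is computed. The usual choice, sketched here as an assumption, is the unbiased estimator from the HumanEval paper (Chen et al., 2021). You sample n completions per problem, count the c that pass all unit tests, and average 1 - C(n-c, k)/C(n, k) over problems. Under this estimator k cannot exceed n, which is consistent with pairing `--pass_k 1 10` with `--num_samples 20`.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (Chen et al., 2021).

    n: completions sampled, c: completions that passed all unit tests,
    k: evaluation budget. Returns 1 - C(n - c, k) / C(n, k), the probability
    that a random size-k subset of the n samples contains a passing one.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset includes at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# One problem, 20 samples (as with --num_samples 20), 3 of them passing.
print(round(pass_at_k(20, 3, 1), 4))   # 0.15
print(round(pass_at_k(20, 3, 10), 4))  # 0.8947

# Benchmark score: average the per-problem estimates (hypothetical counts).
per_problem = [(20, 3), (20, 0), (20, 20)]
print(round(sum(pass_at_k(n, c, 1) for n, c in per_problem) / len(per_problem), 4))  # 0.3833
```

Drawing more samples than k and using this estimator gives a lower-variance number than literally running k attempts. That is also why a nonzero `--temperature` is used in the multi-sample command: the completions need to differ.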
Run the SciERC relation extraction benchmark (Hugging Face dataset):

```bash
python -m gimbench.scierc.scierc \
  --model_type vllm \
  --model_name meta-llama/Llama-3.1-8B-Instruct \
  --base_url http://localhost:8000/v1 \
  --scierc_split test

# Plain LLM (no GIMKit)
python -m gimbench.scierc.scierc \
  --model_type vllm \
  --model_name meta-llama/Llama-3.1-8B-Instruct \
  --base_url http://localhost:8000/v1 \
  --scierc_split dev \
  --no_gimkit
```
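SciERC relation extraction is conventionally scored with micro-averaged precision, recall and F1 over predicted (head, relation, tail) triples. The sketch below assumes exact string matching of all three fields, and `micro_prf` is a hypothetical name. GIMBench's matching rules could differ, for instance by normalizing entity spans or by treating the symmetric relations COMPARE and CONJUNCTION as unordered.

```python
Triple = tuple[str, str, str]  # (head entity, relation type, tail entity)


def micro_prf(
    preds: list[set[Triple]], golds: list[set[Triple]]
) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1, pooled over all documents."""
    tp = fp = fn = 0
    for pred, gold in zip(preds, golds):
        tp += len(pred & gold)  # exact matches on all three fields
        fp += len(pred - gold)  # predicted but not in gold
        fn += len(gold - pred)  # gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{("CNN", "USED-FOR", "image classification"),
         ("ImageNet", "EVALUATE-FOR", "CNN")}]
pred = [{("CNN", "USED-FOR", "image classification"),
         ("CNN", "PART-OF", "ImageNet")}]
print(micro_prf(pred, gold))  # (0.5, 0.5, 0.5)
```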
If you need to rebuild and upload the dataset, use:
```bash
python benchmarks/GIMBench-sci-erc/1_build_dataset.py \
  --raw_dir benchmarks/GIMBench-sci-erc/data/raw_data \
  --repo_id Sculpt-AI/GIMBench-sci-erc \
  --push_to_hub
```
## Development
Run linting:
```bash
make lint
```
Fix linting issues automatically:
```bash
make lint-fix
```
Run pre-commit hooks:
```bash
make pre-commit
```
## Download files
### File details

Details for the file `gimbench-0.5.0.tar.gz`.

#### File metadata
- Download URL: gimbench-0.5.0.tar.gz
- Upload date:
- Size: 24.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.6
#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `8fa1ad40663a15a68a1003bac89508340de94d33c8a2efed59fbd85d2057b27c` |
| MD5 | `d1f1c757859b041b7d998aa27e3f50bf` |
| BLAKE2b-256 | `8d2edd5731f2adb552cf0661c29998be31800e13cae852a95fff1a79fc72a221` |
### File details

Details for the file `gimbench-0.5.0-py3-none-any.whl`.

#### File metadata
- Download URL: gimbench-0.5.0-py3-none-any.whl
- Upload date:
- Size: 37.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.6
#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `c69ae3e615ffd1a484cdc4718f77cc8d537fddac5cf59b3b6cbc2924f0219aa3` |
| MD5 | `b48d4989fd5697c404ea8a62cd3de559` |
| BLAKE2b-256 | `d4ae18dfe7d50b660bf3ae9f98e42596104719a56cb2d03eadbbfdeec675e428` |
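The following standard-library sketch checks a downloaded archive against the SHA256 digest published for the source distribution. It assumes the sdist was saved as `gimbench-0.5.0.tar.gz` in the current directory. The helper names `sha256_of` and `verify` are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for gimbench-0.5.0.tar.gz.
EXPECTED_SDIST_SHA256 = "8fa1ad40663a15a68a1003bac89508340de94d33c8a2efed59fbd85d2057b27c"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large archives stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """True if the file's digest equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("gimbench-0.5.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:<digest>` entries in a requirements file).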