"RepoQA for Evaluating Long-Context Code Understanding"
Project description
RepoQA: Evaluating Long-Context Code Understanding
🏠 Homepage: https://evalplus.github.io/repoqa.html
🚀 Installation
# without vLLM (can run openai, anthropic, and huggingface backends)
pip install --upgrade repoqa
# To enable vLLM
pip install --upgrade "repoqa[vllm]"
⏬ Install nightly version
pip install --upgrade "git+https://github.com/evalplus/repoqa.git" # without vLLM
pip install --upgrade "repoqa[vllm] @ git+https://github.com/evalplus/repoqa@main" # with vLLM
⏬ Using RepoQA as a local repo?
git clone https://github.com/evalplus/repoqa.git
cd repoqa
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt
🏁 Search Needle Function (SNF)
Search Needle Function is the first and foundational RepoQA task. It exercises LLMs' ability to understand and retrieve code from long contexts, mirroring the real-life scenario of performing precise code search from a natural-language function description.
🔎 More dataset details
[!Note]
SNF includes 500 tests (5 programming languages x 10 repos x 10 needle functions) where an LLM is given:
- A large code context whose files are sorted in dependency order
- A natural-language (NL) description of the needle function that avoids revealing keywords such as the function name
- An instruction to retrieve the described function
The evaluator passes a test if the retrieved function is syntactically closer to the ground truth than any other function in the repository (functions are parsed with tree-sitter) and its similarity exceeds a user-defined threshold (0.8 by default).
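To make the pass criterion concrete, here is a minimal sketch of such a check. It is not RepoQA's actual scoring code: it uses a generic sequence similarity (Python's difflib) instead of tree-sitter-based syntactic comparison, and the function names are hypothetical.
# Hypothetical sketch of the pass criterion described above (not RepoQA's implementation).
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Generic text similarity; RepoQA compares functions parsed by tree-sitter.
    return SequenceMatcher(None, a, b).ratio()

def passes(prediction: str, needle: str, other_functions: list[str], threshold: float = 0.8) -> bool:
    # Pass if the prediction is closest to the ground-truth needle among all
    # candidate functions AND its similarity exceeds the threshold.
    score = similarity(prediction, needle)
    best_other = max((similarity(prediction, f) for f in other_functions), default=0.0)
    return score >= best_other and score > threshold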
You can run the SNF evaluation using various backends:
OpenAI Compatible Servers
repoqa.search_needle_function --model "gpt-4-turbo" --backend openai
# 💡 If you use openai API compatible server such as vLLM servers:
# repoqa.search_needle_function --base-url "http://localhost:[PORT]/v1" \
# --model "Qwen/CodeQwen1.5-7B-Chat" --backend openai
Anthropic Compatible Servers
repoqa.search_needle_function --model "claude-3-haiku-20240307" --backend anthropic
vLLM
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm
🔎 Context extension for small-ctx models
There are two ways to unlock a model's context at inference time:
- Direct Extension: Edit max_position_embeddings in the model's config.json (e.g., hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/[hash]/config.json) to something like 22528.
- Dynamic RoPE Scaling: To extend Meta-Llama-3-8B-Instruct from 8k to 32k (4x), add the following to config.json:
"rope_scaling": {"type": "dynamic", "factor": 4.0}
Note: This works for vLLM <0.4.3 and HuggingFace transformers. RepoQA will automatically configure dynamic RoPE for vLLM >= 0.4.3.
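If you prefer to script these edits, the following is a minimal sketch; the snapshot path and the chosen values are illustrative (taken from the example above), not something RepoQA provides or requires.
# Illustrative sketch (not part of RepoQA): patch a local HF config.json for either method.
# The snapshot path is an assumption; locate the actual directory in your HF cache first.
import json, pathlib

config_path = pathlib.Path(
    "hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/[hash]/config.json"
)
config = json.loads(config_path.read_text())

# Option 1 -- direct extension: raise the position limit.
config["max_position_embeddings"] = 22528
# Option 2 -- dynamic RoPE scaling: 8k -> 32k is a 4x factor (use one option, not both).
config["rope_scaling"] = {"type": "dynamic", "factor": 4.0}

config_path.write_text(json.dumps(config, indent=2))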
[!Note]
Reference evaluation time:
- Llama3-8B-Instruct: 45 minutes on 2xA6000 (PCIe NVLink)
- Llama3-70B-Instruct: 100 minutes on 4xA100 (PCIe NVLink)
HuggingFace transformers
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend hf --trust-remote-code
[!Tip]
Installing flash-attn and additionally setting --attn-implementation "flash_attention_2" can largely lower the memory requirement.
🔨 Having trouble installing `flash-attn`?
If you have trouble with pip install flash-attn --no-build-isolation, you can try to directly use pre-built wheels:
export FLASH_ATTN_VER=2.5.8  # check latest version at https://github.com/Dao-AILab/flash-attention/releases
export CUDA_VER="cu122"      # check available ones at https://github.com/Dao-AILab/flash-attention/releases
export TORCH_VER=$(python -c "import torch; print('.'.join(torch.__version__.split('.')[:2]))")
export PY_VER=$(python -c "import platform; print(''.join(platform.python_version().split('.')[:2]))")
export OS_ARCH=$(python -c "import platform; print(f'{platform.system().lower()}_{platform.machine()}')")
export WHEEL=flash_attn-${FLASH_ATTN_VER}+${CUDA_VER}torch${TORCH_VER}cxx11abiFALSE-cp${PY_VER}-cp${PY_VER}-${OS_ARCH}.whl
wget https://github.com/Dao-AILab/flash-attention/releases/download/v${FLASH_ATTN_VER}/${WHEEL}
pip install ${WHEEL}
Google Generative AI API (Gemini)
repoqa.search_needle_function --model "gemini-1.5-pro-latest" --backend google
CLI Usage
- Input:
  - --model: Hugging Face model ID, such as ise-uiuc/Magicoder-S-DS-6.7B
  - --backend: vllm (default) or openai
  - --base-url: OpenAI API base URL
  - --code-context-size (default: 16384): number of tokens (by the DeepSeekCoder tokenizer) of repository context
  - --caching (default: True): accelerate subsequent runs by caching preprocessing; use --nocaching to disable
  - --max-new-tokens (default: 1024): maximum number of new tokens to generate
  - --system-message (default: None): system message (note that some models do not support it)
  - --tensor-parallel-size: number of GPUs for tensor parallelism (only for vLLM)
  - --languages (default: None): list of languages to evaluate (None means all)
  - --result-dir (default: "results"): directory to save the model outputs and evaluation results
  - --ignore-comments (default: False): during evaluation, ignore comments in the ground truth and model outputs
  - --trust-remote-code (default: False): allow remote code (for HuggingFace transformers and vLLM)
  - --attn-implementation (default: None): use "flash_attention_2" if your HF backend hits OOM
- Output:
  - results/ntoken_{code-context-size}/{model}.jsonl: model-generated outputs
  - results/ntoken_{code-context-size}/{model}-SCORE.json: evaluation results
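For quick post-hoc inspection, the generated .jsonl can be read line by line as standard JSON. The sketch below is generic: the file path is an illustrative placeholder and no particular field names are assumed.
# Minimal sketch for peeking at a RepoQA output file; adjust the path to your run.
import json

output_path = "results/ntoken_16384/your-model.jsonl"  # illustrative placeholder
with open(output_path) as f:
    for line in f:
        record = json.loads(line)
        print(sorted(record.keys()))  # see which fields each record carries
        break  # only inspect the first record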
Compute Scores
By default, the repoqa.search_needle_function command will evaluate model outputs and compute scores after text generation.
However, you can also separately compute scores using the following command:
repoqa.compute_score --model-output-path={model-output}.jsonl
[!Tip]
- Input: path to the model-generated outputs.
- Output: the evaluation scores are stored in {model-output}-SCORES.json.
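For example, assuming an output file produced with the default settings above (the concrete file name here is only illustrative):
repoqa.compute_score --model-output-path=results/ntoken_16384/your-model.jsonl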
📚 Read More
File details
Details for the file repoqa-0.1.2.tar.gz.
File metadata
- Download URL: repoqa-0.1.2.tar.gz
- Upload date:
- Size: 5.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c08efb7f700c40c4d2e6f89d748049c3d14eaea3f14fda7192bcd1392d16cd60 |
| MD5 | 387d7403add90106a4762c91e0156e5f |
| BLAKE2b-256 | 1363c8154ab5680cdd6d8d56408a57af8e64cb95c6d52dbdf5e19f5866722777 |
File details
Details for the file repoqa-0.1.2-py3-none-any.whl.
File metadata
- Download URL: repoqa-0.1.2-py3-none-any.whl
- Upload date:
- Size: 28.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f18e82dc678ff90478a80473a171a63fffc49563283a0714a942dabdc1be35ab |
| MD5 | 46f4617c338f380875d0fcf8baa43c89 |
| BLAKE2b-256 | e30de85e54061821c76af16fcc9264b6019a0c1391ce7024d6b6d5a96154cd4b |