RepoQA: Evaluating Long-Context Code Understanding
🚀 Installation • 🏁 Search Needle Function • 📚 Read More
🚀 Installation
# without vLLM (can run openai, anthropic, and huggingface backends)
pip install --upgrade repoqa
# with vLLM
pip install --upgrade "repoqa[vllm]"
⏬ Install nightly version
pip install --upgrade "git+https://github.com/evalplus/repoqa.git" # without vLLM
pip install --upgrade "repoqa[vllm] @ git+https://github.com/evalplus/repoqa@main" # with vLLM
⏬ Using RepoQA as a local repo?
git clone https://github.com/evalplus/repoqa.git
cd repoqa
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt
🏁 Search Needle Function
Inference with OpenAI Compatible Servers
repoqa.search_needle_function --model "gpt-4-turbo" --caching --backend openai
# 💡 If you use a customized server such as vLLM:
# repoqa.search_needle_function --base-url "http://url.to.vllm.server/v1" \
#                               --model "gpt-4-turbo" --caching --backend openai
Inference with Anthropic Compatible Servers
repoqa.search_needle_function --model "claude-3-haiku-20240307" --caching --backend anthropic
Inference with vLLM
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
--caching --backend vllm
Inference with HuggingFace transformers
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" \
                              --caching --backend hf --trust-remote-code
Usage
[!Tip]
- Input:
  - --model: Hugging Face model ID, such as ise-uiuc/Magicoder-S-DS-6.7B
  - --backend: vllm (default) or openai
  - --base-url: OpenAI API base URL
  - --code-context-size (default: 16384): Number of tokens (counted with the DeepSeekCoder tokenizer) of code in the long context
  - --caching (default: False): If enabled, tokenization and chunking results are cached to accelerate subsequent runs
  - --max-new-tokens (default: 1024): Maximum number of new tokens to generate
  - --system-message (default: None): If given, the model uses a system message (note that some models do not support system messages)
  - --tensor-parallel-size: Degree of tensor parallelism (vLLM only)
  - --languages (default: None): List of languages to evaluate (None means all)
  - --result-dir (default: "results"): Directory to save model outputs and evaluation results
- Output:
  - results/ntoken_{code-context-size}/{model}.jsonl: Model-generated outputs
  - results/ntoken_{code-context-size}/{model}-SCORES.json: Evaluation scores (also see Compute Scores)
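Putting the flags above together, a fuller invocation might look like the sketch below. The model name, context size, and language are illustrative values drawn from the examples and defaults in this document, not a recommended configuration.

```shell
# Sketch: evaluate CodeQwen via vLLM on Python only, with caching enabled.
# All flags are documented above; the specific values are examples.
repoqa.search_needle_function \
  --model "Qwen/CodeQwen1.5-7B-Chat" \
  --backend vllm \
  --code-context-size 16384 \
  --caching \
  --languages python \
  --result-dir results
```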
Compute Scores
By default, the repoqa.search_needle_function
command will also compute scores after producing model outputs.
However, you can also compute scores separately using the following command:
repoqa.compute_score --model-output-path={model-output}.jsonl
[!Tip]
- Input: Path to the model generated outputs.
- Output: The evaluation scores are stored in
{model-output}-SCORES.json
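Model outputs are stored as JSONL, i.e. one JSON object per line, so they can be loaded with the standard library alone. A minimal sketch of parsing such a file follows; note that the field names used here (`name`, `output`) are hypothetical placeholders, not RepoQA's actual output schema.

```python
import json
import tempfile
from pathlib import Path

# Write a tiny stand-in results file; real outputs live under
# results/ntoken_{code-context-size}/{model}.jsonl. The field names
# below are hypothetical, not RepoQA's actual schema.
sample = [
    {"name": "needle_fn_1", "output": "def find(): ..."},
    {"name": "needle_fn_2", "output": "def search(): ..."},
]
path = Path(tempfile.mkdtemp()) / "example.jsonl"
with path.open("w") as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

# JSONL: parse each non-empty line as an independent JSON object.
records = [
    json.loads(line)
    for line in path.read_text().splitlines()
    if line.strip()
]
print(len(records))  # 2
```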
📚 Read More