RepoQA: Evaluating Long-Context Code Understanding
🚀 Installation • 🏁 Search Needle Function • 📚 Read More
🚀 Installation
pip install repoqa
⏬ Install nightly version
pip install "git+https://github.com/evalplus/repoqa.git" --upgrade
⏬ Using RepoQA as a local repo?
git clone https://github.com/evalplus/repoqa.git
cd repoqa
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt
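Whichever install route you take, a quick way to confirm the package is importable is a small check like the following (a generic sketch; `repoqa` is simply the package name from the steps above):

```python
import importlib.util

def is_installed(package: str) -> bool:
    """Return True if `package` can be found on the current Python path."""
    return importlib.util.find_spec(package) is not None

# After a successful `pip install repoqa`, this should print True:
print(is_installed("repoqa"))
```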
🏁 Search Needle Function
Inference with vLLM
repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --caching --backend vllm
Inference with OpenAI Compatible Servers
repoqa.search_needle_function --base-url "https://api.openai.com/v1" \
                              --model "gpt-4-turbo" --caching --backend openai
Inference with HuggingFace transformers
repoqa.search_needle_function --model "gpt2" "Qwen/CodeQwen1.5-7B-Chat" --caching --backend hf
Usage
> [!Tip]
>
> - Input:
>   - `--model`: Hugging Face model ID, such as `ise-uiuc/Magicoder-S-DS-6.7B`
>   - `--backend`: `vllm` (default) or `openai`
>   - `--base-url`: OpenAI API base URL
>   - `--code-context-size` (default: 16384): number of tokens (using the DeepSeekCoder tokenizer) of code in the long context
>   - `--caching` (default: False): if enabled, tokenization and chunking results are cached to accelerate subsequent runs
>   - `--max-new-tokens` (default: 1024): maximum number of new tokens to generate
>   - `--system-message` (default: None): if given, the model uses a system message (note that some models do not support system messages)
>   - `--tensor-parallel-size`: degree of tensor parallelism (vLLM only)
>   - `--languages` (default: None): list of languages to evaluate (None means all)
>   - `--result-dir` (default: "results"): directory to save model outputs and evaluation results
> - Output:
>   - `results/ntoken_{code-context-size}/{model}.jsonl`: model-generated outputs
>   - `results/ntoken_{code-context-size}/{model}-SCORES.json`: evaluation scores (also see Compute Scores)
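The output layout can be sketched as a small path helper. Note the slash-flattening of model IDs below is an assumption for illustration (model IDs like `Qwen/CodeQwen1.5-7B-Chat` contain a `/`, which cannot appear verbatim in a filename), not necessarily RepoQA's exact convention:

```python
from pathlib import Path

def result_paths(result_dir: str, code_context_size: int, model: str):
    """Build the expected output locations for a run.

    The replacement of "/" with "--" in the model ID is a hypothetical
    convention used here only to produce a valid filename.
    """
    safe_model = model.replace("/", "--")
    base = Path(result_dir) / f"ntoken_{code_context_size}"
    return base / f"{safe_model}.jsonl", base / f"{safe_model}-SCORES.json"

outputs, scores = result_paths("results", 16384, "Qwen/CodeQwen1.5-7B-Chat")
print(outputs)
print(scores)
```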
Compute Scores
By default, the `repoqa.search_needle_function` command also computes scores after producing model outputs. You can compute scores separately with:
repoqa.compute_score --model-output-path={model-output}.jsonl
> [!Tip]
>
> - Input: path to the model-generated outputs (`.jsonl`).
> - Output: evaluation scores are stored in `{model-output}-SCORES.json`.
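As a rough sketch of what a scorer over such a JSONL file might look like: the record field names (`output`, `needle`) and the exact-match criterion below are assumptions for illustration, not RepoQA's actual schema or metric.

```python
import json

def exact_match_score(jsonl_path: str) -> float:
    """Hypothetical scorer: fraction of records whose model output exactly
    matches the ground-truth needle function (field names are assumed)."""
    total = correct = 0
    with open(jsonl_path) as f:
        for line in f:
            rec = json.loads(line)
            total += 1
            if rec.get("output", "").strip() == rec.get("needle", "").strip():
                correct += 1
    return correct / total if total else 0.0
```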
📚 Read More