
RepoQA: Evaluating Long-Context Code Understanding

🚀 Installation • 🏁 Search Needle Function • 📚 Read More

🚀 Installation

pip install repoqa
⏬ Install nightly version
pip install "git+https://github.com/evalplus/repoqa.git" --upgrade
⏬ Using RepoQA as a local repo?
git clone https://github.com/evalplus/repoqa.git
cd repoqa
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt
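After a pip install, the console script used throughout this README should be on your PATH; a quick sanity check (assuming the CLI supports the conventional --help flag) is:

repoqa.search_needle_function --help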

🏁 Search Needle Function

Inference with vLLM

repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --caching --backend vllm
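The flags documented under Usage below compose with this command; for example, a run with a larger code context on two GPUs might look like the following (the context size and parallelism values are purely illustrative):

repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm --caching \
                              --code-context-size 32768 --tensor-parallel-size 2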

Inference with OpenAI Compatible Servers

repoqa.search_needle_function --base-url "https://api.openai.com/v1" \
                              --model "gpt-4-turbo" --caching --backend openai

Inference with HuggingFace transformers

repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --caching --backend hf

Usage

[!Tip]

  • Input:
    • --model: Hugging Face model ID, such as ise-uiuc/Magicoder-S-DS-6.7B
    • --backend: vllm (default), openai, or hf
    • --base-url: OpenAI API base URL
    • --code-context-size (default: 16384): Number of code tokens (measured with the DeepSeekCoder tokenizer) in the long context
    • --caching (default: False): if enabled, tokenization and chunking results are cached to accelerate subsequent runs
    • --max-new-tokens (default: 1024): Maximum number of new tokens to generate
    • --system-message (default: None): if given, the model is prompted with this system message (note that some models do not support system messages)
    • --tensor-parallel-size: Degree of tensor parallelism (vLLM only)
    • --languages (default: None): List of languages to evaluate (None means all)
    • --result-dir (default: "results"): Directory to save the model outputs and evaluation results
  • Output:
    • results/ntoken_{code-context-size}/{model}.jsonl: Model-generated outputs (see the loading sketch after this list)
    • results/ntoken_{code-context-size}/{model}-SCORES.json: Evaluation scores (also see Compute Scores)
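For downstream analysis, the JSONL outputs can be loaded line by line. The sketch below only assumes the documented file layout and that each line is a standalone JSON object; the per-record field names are not specified here, and the path is illustrative.

import json

# Illustrative path following the results/ntoken_{code-context-size}/{model}.jsonl
# layout described above; substitute the file actually produced by your run.
output_path = "results/ntoken_16384/my-model.jsonl"

with open(output_path) as f:
    # One generated output per line, each a JSON object.
    records = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(records)} model outputs from {output_path}")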

Compute Scores

By default, the repoqa.search_needle_function command will also compute scores after producing model outputs. However, you can also compute scores separately using the following command:

repoqa.compute_score --model-output-path={model-output}.jsonl

[!Tip]

  • Input: Path to the model-generated outputs ({model-output}.jsonl).
  • Output: The evaluation scores are stored in {model-output}-SCORES.json.
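For example, to re-score an earlier 16k-context run (the path is hypothetical and follows the output layout described under Usage):

repoqa.compute_score --model-output-path=results/ntoken_16384/my-model.jsonl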

📚 Read More


