
RepoQA: Evaluating Long-Context Code Understanding

🚀 Installation • 🏁 Search Needle Function • 📚 Read More

🚀 Installation

pip install repoqa
⏬ Install nightly version
pip install "git+https://github.com/evalplus/repoqa.git" --upgrade
⏬ Using RepoQA as a local repo?
git clone https://github.com/evalplus/repoqa.git
cd repoqa
export PYTHONPATH=$PYTHONPATH:$(pwd)
pip install -r requirements.txt
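After a pip install, the console script used throughout this README should be on your PATH; a quick sanity check (assuming the CLI supports the conventional --help flag) is:

repoqa.search_needle_function --help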

🏁 Search Needle Function

Inference with vLLM

repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --caching --backend vllm
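The flags documented under Usage below compose with this command; for example, a run with a larger code context on two GPUs might look like the following (the context size and parallelism values are purely illustrative):

repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --backend vllm --caching \
                              --code-context-size 32768 --tensor-parallel-size 2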

Inference with OpenAI Compatible Servers

repoqa.search_needle_function --base-url "https://api.openai.com/v1" \
                              --model "gpt-4-turbo" --caching --backend openai

Inference with HuggingFace transformers

repoqa.search_needle_function --model "Qwen/CodeQwen1.5-7B-Chat" --caching --backend hf

Usage

[!Tip]

  • Input:
    • --model: Hugging Face model ID, such as ise-uiuc/Magicoder-S-DS-6.7B
    • --backend: vllm (default), openai, or hf
    • --base-url: OpenAI API base URL
    • --code-context-size (default: 16384): Number of code tokens (measured with the DeepSeekCoder tokenizer) in the long context
    • --caching (default: False): if enabled, tokenization and chunking results are cached to accelerate subsequent runs
    • --max-new-tokens (default: 1024): Maximum number of new tokens to generate
    • --system-message (default: None): if given, the model is prompted with this system message (note that some models do not support system messages)
    • --tensor-parallel-size: Degree of tensor parallelism (vLLM only)
    • --languages (default: None): List of languages to evaluate (None means all)
    • --result-dir (default: "results"): Directory to save the model outputs and evaluation results
  • Output:
    • results/ntoken_{code-context-size}/{model}.jsonl: Model-generated outputs (see the loading sketch after this list)
    • results/ntoken_{code-context-size}/{model}-SCORES.json: Evaluation scores (also see Compute Scores)
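For downstream analysis, the JSONL outputs can be loaded line by line. The sketch below only assumes the documented file layout and that each line is a standalone JSON object; the per-record field names are not specified here, and the path is illustrative.

import json

# Illustrative path following the results/ntoken_{code-context-size}/{model}.jsonl
# layout described above; substitute the file actually produced by your run.
output_path = "results/ntoken_16384/my-model.jsonl"

with open(output_path) as f:
    # One generated output per line, each a JSON object.
    records = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(records)} model outputs from {output_path}")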

Compute Scores

By default, the repoqa.search_needle_function command will also compute scores after producing model outputs. However, you can also compute scores separately using the following command:

repoqa.compute_score --model-output-path={model-output}.jsonl

[!Tip]

  • Input: Path to the model-generated outputs ({model-output}.jsonl).
  • Output: The evaluation scores are stored in {model-output}-SCORES.json.
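For example, to re-score an earlier 16k-context run (the path is hypothetical and follows the output layout described under Usage):

repoqa.compute_score --model-output-path=results/ntoken_16384/my-model.jsonl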

📚 Read More


