Code Model Trust Evaluation
IdentityChain
Installation
Create and Activate a Conda Environment.
conda create -n idchain python=3.10
conda activate idchain
Clone this Repository to your Local Environment.
git clone https://github.com/marcusm117/IdentityChain.git
Install the Library with all Dependencies.
cd IdentityChain
make develop
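If the installation succeeded, the library should be importable. A minimal sanity check, assuming the installed top-level module is named identitychain (matching the package name on PyPI):

```python
# Minimal sanity check: confirm the library is importable after `make develop`.
# The module name `identitychain` is assumed to match the PyPI package name.
import identitychain

print(identitychain.__file__)  # location of the installed package
```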
Usage
Before running the self-consistency evaluation, make sure that one of the following conditions is satisfied (an illustration of fill-in-the-middle prompting follows the list):
- Your model is an Instruction-tuned Code LLM, and it's trained on both NL-to-PL and PL-to-NL tasks.
- Your model is a Foundation Code LLM, and it's trained on both completion and fill-in-the-middle tasks.
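For reference, fill-in-the-middle generation is usually driven by special sentinel tokens. The sketch below is only an illustration using StarCoder-style FIM tokens; the exact tokens and prompt layout depend on how your model was trained.

```python
# Illustration only: a StarCoder-style fill-in-the-middle (FIM) prompt.
# The sentinel tokens below are the ones used by the StarCoder family;
# other Foundation Code LLMs may use different FIM tokens.
prefix = 'def add(a, b):\n    """'
suffix = '"""\n    return a + b\n'

fim_prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# The model is asked to complete fim_prompt; the generated text is the missing
# middle (here, a docstring), which is how a completion/FIM model can be used
# for the PL-to-NL direction.
print(fim_prompt)
```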
To evaluate your model using IdentityChain, you need to prepare the following (a hypothetical sketch of the two generation functions follows the list):
- An evaluation dataset in the format of one of the two provided datasets (which you can also use directly)
- An NL-to-PL prompt for your model
- A PL-to-NL prompt for your model
- An NL-to-PL generation function for your model
- A PL-to-NL generation function for your model
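The exact interfaces are defined in the example scripts mentioned below; the following is only a hypothetical sketch of what such a pair of generation functions can look like, with made-up names and a stubbed model call.

```python
# Hypothetical sketch of the two generation functions you need to provide.
# The function names, arguments, and the stubbed model call are illustrative,
# not the library's actual API; see run_identity_chain_openai.py and
# run_identity_chain_huggingface.py for the real interfaces.

def model_generate(prompt: str) -> str:
    """Stand-in for your model's decoding call (OpenAI API, transformers, ...)."""
    return "<model output for: " + prompt[:30] + "...>"


def nl_to_pl(nl_prompt: str, specification: str) -> str:
    """NL-to-PL: generate code from a natural-language specification."""
    return model_generate(nl_prompt + specification)


def pl_to_nl(pl_prompt: str, code: str) -> str:
    """PL-to-NL: generate a natural-language specification from code."""
    return model_generate(pl_prompt + code)


print(nl_to_pl("# Write a Python function for the spec below.\n", "Add two numbers."))
```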
See run_identity_chain_openai.py for an example of how to use IdentityChain to evaluate OpenAI models.
See run_identity_chain_huggingface.py for an example of how to use IdentityChain to evaluate HuggingFace open-source models. This example script already includes the following models (a minimal loading sketch follows the list):
- CodeLlama-Instruct-hf (7B, 13B, 34B)
- CodeLlama-hf (7B, 13B, 34B)
- starchat-beta
- starcoder
- starcoderplus
- starcoderbase (1B, 3B, 7B, 15B)
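Loading any of these checkpoints follows the standard transformers pattern. A minimal sketch, where the checkpoint name is just one example from the list and the generation settings are placeholders rather than the values used by the example script:

```python
# Minimal sketch: load one of the supported checkpoints with transformers.
# The checkpoint and generation settings are examples, not the values used
# by run_identity_chain_huggingface.py.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoderbase-1b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```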
Example
Use run_identity_chain.sh to execute run_identity_chain_openai.py or run_identity_chain_huggingface.py, which conducts several IdentityChain evaluations in a batch. Make sure that you modify the following before running the script:
- export CUDA_VISIBLE_DEVICES=0 to specify the local GPU device you want to use
- export HF_HOME=YOUR_OWN_PATH/huggingface to specify your own Hugging Face home path, where the model checkpoints will be cached
- export IDENTITY_CHAIN_HOME=YOUR_OWN_PATH/IdentityChain to specify your own IdentityChain home path
- other parameters in the script for your own needs
Then run the script:
cd examples
bash run_identity_chain.sh
This script will create a temporary folder tmp under your IdentityChain home path and store the results of each IdentityChain evaluation in this folder as a jsonl file, for example tmp/starcoderbase-1b/IDChain_starcoderbase-1b_tmp0.0g_len5_pb_all_m_v1_EvalPlus-Mini-v0.1.6_reformatted.jsonl.
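Each line of the jsonl file is one JSON record, so the raw results can be inspected with a few lines of Python; the record fields depend on the run, so the sketch below only prints the keys of the first record.

```python
# Sketch: load an IdentityChain results file (one JSON object per line).
import json

path = "tmp/starcoderbase-1b/IDChain_starcoderbase-1b_tmp0.0g_len5_pb_all_m_v1_EvalPlus-Mini-v0.1.6_reformatted.jsonl"

with open(path) as f:
    records = [json.loads(line) for line in f if line.strip()]

print(len(records), "records")
print("fields of the first record:", sorted(records[0].keys()))
```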
Use analyze_results.py to analyze the results of the IdentityChain evaluation. It will generate an xlsx file, which contains the following information:
- The SC (Self-Consistency) and SSC (Strong Self-Consistency) scores of the model at each self-iteration step. Note that SSC_0 is just Pass@1.
- The aggregated TOM score (also BLEU and CodeBLEU) information at each step for the following 4 types of results: Pass-Pass, Pass-Fail, Fail-Fail, Fail-Pass.
- The TOM score (also BLEU and CodeBLEU) trajectory at each self-iteration step for each sample in the evaluation set.
- The raw test case outputs at each self-iteration step
cd ../scripts
python analyze_results.py --input_path ../tmp/starcoderbase-1b/IDChain_starcoderbase-1b_tmp0.0g_len5_pb_all_m_v1_EvalPlus-Mini-v0.1.6_reformatted.jsonl --chain_length 5
The analyzed results will give you a sense of the model's overall performance, and the TOM score trajectory will help you pinpoint the exact step where the model makes a mistake.
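The generated xlsx file can also be inspected programmatically, for example with pandas. The sheet names and column layout are determined by analyze_results.py and the file name below is hypothetical, so the sketch only lists what the workbook contains.

```python
# Sketch: open the workbook produced by analyze_results.py with pandas
# (requires openpyxl). The file name is hypothetical; sheet names and columns
# depend on the script, so we only list them instead of assuming a layout.
import pandas as pd

sheets = pd.read_excel("IDChain_starcoderbase-1b_analysis.xlsx", sheet_name=None)

for name, df in sheets.items():
    print(name, df.shape, list(df.columns)[:5])
```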
Use browse_results.py to browse the results of IdentityChain evaluation. You can use this script to manually examine and study the mistakes made by the model for specific samples.
cd ../scripts
python browse_results.py --input_path ../tmp/starcoderbase-1b/IDChain_starcoderbase-1b_tmp0.0g_len5_pb_all_m_v1_EvalPlus-Mini-v0.1.6_reformatted.jsonl --chain_length 5 --start 0
Linting & Testing
We use a Makefile as a command registry:
- make format: autoformat this library with black
- make lint: perform static analysis of this library with black and flake8
- make annotate: run type checking using mypy
- make test: run automated tests
- make check: check assets for packaging
Make sure that make lint, make test, and make check all pass locally before submitting a Pull Request.
File details
Details for the file identitychain-0.0.1.tar.gz.
File metadata
- Download URL: identitychain-0.0.1.tar.gz
- Upload date:
- Size: 47.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | e754b2b1af274c31f0a1216b5057fa7fbcdb44140daba8b86350aea8573b4ad5
MD5 | 0e8fb48c4be18383bc8474ac7045c7a4
BLAKE2b-256 | 6da0ecdeea9f291f6d8234030a9602847223c47d528fa323fe287be082b1157a
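To check a downloaded archive against the hashes above, for example:

```python
# Verify the SHA256 digest of the downloaded source distribution against the
# value listed in the table above.
import hashlib

expected = "e754b2b1af274c31f0a1216b5057fa7fbcdb44140daba8b86350aea8573b4ad5"

with open("identitychain-0.0.1.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "hash mismatch")
```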