
Code Model Trust Evaluation

Project description

IdentityChain

Installation

Create and Activate a Conda Environment.

conda create -n idchain python=3.10
conda activate idchain

Clone this Repository to your Local Environment.

git clone https://github.com/marcusm117/IdentityChain.git

Install the Library with all Dependencies.

cd IdentityChain
make develop

Usage

Before running a self-consistency evaluation, make sure that one of the following is satisfied:

  1. Your model is an Instruction-tuned Code LLM, and it's trained on both NL-to-PL and PL-to-NL tasks.
  2. Your model is a Foundation Code LLM, and it's trained on both completion and fill-in-the-middle tasks.

To evaluate your model using IdentityChain, you need to prepare the following:

  1. An evaluation dataset in a supported format (you can also directly use the two provided datasets)
  2. An NL-to-PL prompt for your model
  3. A PL-to-NL prompt for your model
  4. An NL-to-PL generation function for your model
  5. A PL-to-NL generation function for your model (see the sketch after this list)
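
Below is a minimal sketch of what the two generation functions could look like. Everything here is an illustrative assumption rather than IdentityChain's actual interface: the function names, signatures, and the query_your_model placeholder are hypothetical, so consult run_identity_chain_openai.py and run_identity_chain_huggingface.py for the exact shape expected.

def query_your_model(prompt: str) -> str:
    # Hypothetical placeholder: replace with a real call to your model,
    # e.g. an OpenAI chat completion or a Hugging Face generate() call.
    raise NotImplementedError


def nl_to_pl_generate(nl_spec: str, nl_to_pl_prompt: str) -> str:
    # NL-to-PL: produce a program from a natural-language specification.
    return query_your_model(nl_to_pl_prompt + "\n" + nl_spec)


def pl_to_nl_generate(code: str, pl_to_nl_prompt: str) -> str:
    # PL-to-NL: produce a natural-language specification from a program.
    return query_your_model(pl_to_nl_prompt + "\n" + code)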

See run_identity_chain_openai.py for an example of how to use IdentityChain to evaluate OpenAI models.

See run_identity_chain_huggingface.py for an example of how to use IdentityChain to evaluate HuggingFace open-source models. This example script already includes the following models:

  1. CodeLlama-Instruct-hf (7B, 13B, 34B)
  2. CodeLlama-hf (7B, 13B, 34B)
  3. starchat-beta
  4. starcoder
  5. starcoderplus
  6. starcoderbase (1B, 3B, 7B, 15B)

Example

Use run_identity_chain.sh to execute run_identity_chain_openai.py or run_identity_chain_huggingface.py, which conducts several IdentityChain evaluations in a batch. Make sure that you modify the following before running the script:

  1. export CUDA_VISIBLE_DEVICES=0 to specify the local GPU device you want to use
  2. export HF_HOME=YOUR_OWN_PATH/huggingface to specify your own huggingface home path, where the model checkpoints will be cached
  3. export IDENTITY_CHAIN_HOME=YOUR_OWN_PATH/IdentityChain to specify your own IdentityChain home path
  4. other parameters in the script, as needed

Then run the script:

cd examples
bash run_identity_chain.sh

This script will create a temporary folder tmp under your IdentityChain home path and store the results of each IdentityChain evaluation there as a jsonl file, for example tmp/starcoderbase-1b/IDChain_starcoderbase-1b_tmp0.0g_len5_pb_all_m_v1_EvalPlus-Mini-v0.1.6_reformatted.jsonl.
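
Since the output is plain jsonl, you can peek at it directly. A minimal sketch in Python (the exact record fields depend on the IdentityChain version and run settings, so inspect the keys first):

import json

# Hypothetical path: substitute the jsonl file produced by your own run.
path = "tmp/starcoderbase-1b/IDChain_starcoderbase-1b_tmp0.0g_len5_pb_all_m_v1_EvalPlus-Mini-v0.1.6_reformatted.jsonl"
with open(path) as f:
    first_record = json.loads(f.readline())
print(sorted(first_record.keys()))  # see which fields each record carries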

Use analyze_results.py to analyze the results of an IdentityChain evaluation. It generates an xlsx file that contains the following information:

  1. The SC (Self-Consistency) and SSC (Strong Self-Consistency) scores of the model at each self-iteration step. Note that SSC_0 is just Pass@1.
  2. The aggregated TOM score (also BLEU and CodeBLEU) information at each step for the following 4 types of results: Pass-Pass, Pass-Fail, Fail-Fail, Fail-Pass.
  3. The TOM score (also BLEU and CodeBLEU) trajectory at each self-iteration step for each sample in the evaluation set.
  4. The raw test case outputs at each self-iteration step.

cd ../scripts
python analyze_results.py --input_path ../tmp/starcoderbase-1b/IDChain_starcoderbase-1b_tmp0.0g_len5_pb_all_m_v1_EvalPlus-Mini-v0.1.6_reformatted.jsonl --chain_length 5
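
As a toy illustration of how the self-consistency scores can be read, the sketch below assumes (our interpretation, not the library's implementation) that SSC_k is the fraction of samples whose generated program passes the test suite at every self-iteration step up to k; under that reading, SSC_0 coincides with Pass@1 as noted above.

# passes[i][k] is True iff sample i's program passes the tests at step k.
passes = [
    [True, True, True],     # stays correct through every step
    [True, False, False],   # correct at step 0, breaks at step 1
    [False, False, False],  # incorrect from the start
]

def ssc(passes: list[list[bool]], k: int) -> float:
    # Strong Self-Consistency at step k (assumed definition above).
    return sum(all(row[: k + 1]) for row in passes) / len(passes)

print(ssc(passes, 0))  # 2/3, i.e. Pass@1 under this reading
print(ssc(passes, 2))  # 1/3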

The analyzed results will give you a sense of the model's overall performance, and the TOM score trajectory will help you pinpoint the exact step where the model makes a mistake.

Use browse_results.py to browse the results of IdentityChain evaluation. You can use this script to manually examine and study the mistakes made by the model for specific samples.

cd ../scripts
python browse_results.py --input_path ../tmp/starcoderbase-1b/IDChain_starcoderbase-1b_tmp0.0g_len5_pb_all_m_v1_EvalPlus-Mini-v0.1.6_reformatted.jsonl --chain_length 5 --start 0

Linting & Testing

We use a Makefile as a command registry:

  • make format: autoformat this library with black
  • make lint: perform static analysis of this library with black and flake8
  • make annotate: run type checking using mypy
  • make test: run automated tests
  • make check: check assets for packaging

Make sure that make lint, make test, and make check all pass locally before submitting a Pull Request.

Download files

Download the file for your platform.

Source Distribution

identitychain-0.0.1.tar.gz (47.8 kB)


File details

Details for the file identitychain-0.0.1.tar.gz.

File metadata

  • Download URL: identitychain-0.0.1.tar.gz
  • Upload date:
  • Size: 47.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for identitychain-0.0.1.tar.gz:

  • SHA256: e754b2b1af274c31f0a1216b5057fa7fbcdb44140daba8b86350aea8573b4ad5
  • MD5: 0e8fb48c4be18383bc8474ac7045c7a4
  • BLAKE2b-256: 6da0ecdeea9f291f6d8234030a9602847223c47d528fa323fe287be082b1157a

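If you want to check the integrity of the download, the published SHA256 can be verified with Python's standard hashlib module:

import hashlib

EXPECTED_SHA256 = "e754b2b1af274c31f0a1216b5057fa7fbcdb44140daba8b86350aea8573b4ad5"

with open("identitychain-0.0.1.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == EXPECTED_SHA256, "hash mismatch: do not use this file"
print("SHA256 verified")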
