What-If-Code-Tool

Visualization Tool for Code Generation Model Analysis

Main Idea

Google's What-If Tool (WIT) was the main inspiration for this project. Our goal is to create a similar tool focused purely on ML models for software engineering tasks, such as code completion and code generation.

BertViz is a good first example of where our tool is headed. We hope to support a full dashboard of views that help researchers analyze their models. This would probably include charts of generated word counts, probability distributions for new tokens, and attention views.
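
To make the "probability distributions for new tokens" view more concrete, here is a minimal sketch of how raw model scores could be turned into the probabilities such a chart would plot. It assumes the model exposes one logit per candidate token; the function names `softmax` and `top_k` and the example logits are hypothetical and are not part of codewit-semeru.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert per-token logits into probabilities that sum to 1."""
    # Subtract the largest logit first so exp() cannot overflow.
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def top_k(logits: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable candidate tokens, most probable first."""
    probs = softmax(logits)
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical logits for the next token after some code prefix.
example_logits = {"def": 2.0, "return": 1.0, "import": 0.0}
for tok, prob in top_k(example_logits, k=2):
    print(f"{tok}: {prob:.3f}")
```

A dashboard view would typically plot the `top_k` result as a bar chart for each generated position.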

Development

  • Pip tool: users can install this tool with pip/conda and use it with their NLP model.
  • Python backend: the user passes a dataset and a model as parameters to our tool. The tool then runs the model and stores the resulting vector data in its object.
  • Jupyter-Dash frontend: Jupyter-Dash makes it easy to build data dashboards and to write callback methods in plain Python.

Development Plans

  • Code concept groupings view: categorize each token generated in the output by its type in the programming language (declaration, assignment, function, etc.)
  • Display statistics about the output generated by a specific model (median, max, min, etc.)
  • Dynamic re-execution of the pipeline when:
    • User edits # of tokens
    • User edits # of input sequences
    • User changes model
    • User selects new descriptive statistic
  • Implement BertViz attention views inside the app with Dash, if possible
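
As a rough illustration of the code concept groupings idea, the sketch below labels each token of a generated Python snippet with a coarse type using the standard library `tokenize` module. This is an assumption about one possible approach, not the tool's actual implementation; `classify_tokens` is a hypothetical helper, and the categories are the tokenizer's own (NAME, OP, NUMBER, ...) plus a KEYWORD label, which is coarser than the declaration/assignment grouping described above.

```python
import io
import keyword
import tokenize
from collections import Counter
from typing import List, Tuple

# Structural tokens that carry no visible text of their own.
_SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def classify_tokens(source: str) -> List[Tuple[str, str]]:
    """Return (token_text, category) pairs for a Python snippet."""
    pairs: List[Tuple[str, str]] = []
    reader = io.StringIO(source).readline
    try:
        for tok in tokenize.generate_tokens(reader):
            if tok.type in _SKIPPED:
                continue
            if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
                category = "KEYWORD"
            else:
                category = tokenize.tok_name[tok.type]
            pairs.append((tok.string, category))
    except (tokenize.TokenError, IndentationError):
        # Generated code is often incomplete; keep whatever was tokenized.
        pass
    return pairs


snippet = "def add(a, b):\n    return a + b\n"
type_counts = Counter(category for _, category in classify_tokens(snippet))
print(type_counts)
```

The resulting `type_counts` is the kind of per-category tally a token type distribution chart could be built from.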

Current Diagrams

Components UML

[Figure: Components UML diagram]

Sequence Diagram

[Figure: Sequence diagram]

Supported Features

  • 4 different views to visually classify code generation models (individual tokens, token distribution, Python token types, token type distribution)
  • 4 pre-trained models for code generation from Hugging Face (GPT2, CodeGen, CodeParrot, GPT-Neo)
  • Descriptive stats for datasets with many input sequences
  • Dynamic re-execution on user inputs
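
The descriptive statistics feature can be pictured with a small sketch like the one below. It assumes each generated output is already available as a list of token strings and that a statistic is computed per token over its counts across all sequences; how the tool aggregates internally is not documented here, so `token_frequencies`, `summarize`, and the sample data are hypothetical.

```python
import statistics
from collections import Counter
from typing import Callable, Dict, List

# Statistics a user could pick from a dropdown (assumed set).
STATS: Dict[str, Callable] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


def token_frequencies(sequences: List[List[str]]) -> Dict[str, List[int]]:
    """Map each token to its count in every sequence (0 where absent)."""
    counts = [Counter(seq) for seq in sequences]
    vocab = sorted(set().union(*counts))
    return {tok: [c[tok] for c in counts] for tok in vocab}


def summarize(sequences: List[List[str]], stat: str = "mean") -> Dict[str, float]:
    """Reduce the per-sequence counts of every token with the chosen statistic."""
    reducer = STATS[stat]
    return {tok: reducer(vals) for tok, vals in token_frequencies(sequences).items()}


# Hypothetical tokenized outputs for three input sequences.
outputs = [
    ["def", "f", "(", ")"],
    ["def", "g", "(", "x", ")"],
    ["return", "x"],
]
print(summarize(outputs, "median"))
```

Selecting a different statistic only changes the `stat` argument, which is the kind of input a re-execution callback would pass along.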

Installation

The first prototype is currently available on PyPI. Users will need to generate their own Hugging Face API token.

# Chunk 1: install the packages and enable auto-reload (notebook magics)
%pip install codewit-semeru
%load_ext autoreload
%autoreload 2
%pip install datasets

# Chunk 2: load the CodeXGLUE code completion dataset and keep short inputs
from datasets.load import load_dataset
import pandas as pd

DATA_LEN = 1024  # maximum input length, in characters
NUM_DATA = 20    # number of input sequences to keep

dataset = load_dataset("code_x_glue_cc_code_completion_line", "python", split="train")

pruned_dataset = []
for input_seq in dataset:
    temp = input_seq["input"]  # type: ignore
    if len(temp) <= DATA_LEN:
        pruned_dataset.append(temp)
    if len(pruned_dataset) >= NUM_DATA:
        break
pd.DataFrame(pruned_dataset).describe()

# Chunk 3: set your Hugging Face API token and launch the tool
import os

os.environ["HF_API_TOKEN"] = "{Insert token here}"

from codewit_semeru import WITCode
WITCode("codeparrot/codeparrot-small", pruned_dataset)

These lines can be run directly from your notebook. Python 3.8 is required. The first chunk installs the pip packages and loads the autoreload extension. The second chunk loads the CodeXGLUE code completion dataset for use with our tool. The last chunk runs our tool in the notebook. Users need to supply their own API token to query Hugging Face models.
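
The pruning step in the second chunk can also be written as a small reusable function. The sketch below is only an illustration: `prune_inputs` is a hypothetical helper that is not part of codewit-semeru, and it assumes each record is a mapping with a string `input` field, as in the dataset above. Note that the length limit is measured in characters, not model tokens.

```python
from itertools import islice
from typing import Iterable, List, Mapping


def prune_inputs(
    records: Iterable[Mapping[str, str]],
    max_chars: int,
    limit: int,
    field: str = "input",
) -> List[str]:
    """Keep the first `limit` inputs whose length is at most `max_chars`."""
    # The generator is lazy, so iteration stops once `limit` items are found.
    kept = (rec[field] for rec in records if len(rec[field]) <= max_chars)
    return list(islice(kept, limit))


# Small stand-in records; a real run would pass the loaded dataset instead.
sample = [
    {"input": "a" * 5},
    {"input": "b" * 20},
    {"input": "c" * 3},
    {"input": "d"},
]
print(prune_inputs(sample, max_chars=10, limit=2))
```

With the notebook variables above, the equivalent call would be `prune_inputs(dataset, DATA_LEN, NUM_DATA)`.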

Build and Run Docker Image

Start Docker.

Navigate to the project folder and run docker-compose up -d --build to build the image.

Navigate to localhost:8888 to open the Jupyter notebook. The password is wit.

To stop the Docker container, run docker-compose down.
