Code What-If-Tool

What-If-Code-Tool

Visualization Tool for Code Generation Model Analysis

Main Idea

Google WIT (What-If Tool) was the main inspiration for this project. Our goal is to create a similar tool focused purely on ML models for software engineering, such as code completion and code generation models.

BertViz is a good first example of where our tool is headed. We hope to support a full dashboard of views that researchers would find helpful for analyzing their models. This would likely include charts of newly generated word counts, probability distributions for new tokens, and attention views.
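As a rough illustration of what a word count view might be built on, the sketch below counts generated tokens and turns the counts into relative frequencies. The function name `token_distribution` and the sample tokens are hypothetical and are not part of the package API; this is a minimal sketch, not the tool's implementation.

```python
from collections import Counter


def token_distribution(generated_tokens):
    """Return (counts, relative frequencies) for a list of generated tokens."""
    counts = Counter(generated_tokens)
    total = sum(counts.values())
    # Guard against empty model output so we never divide by zero.
    freqs = {tok: n / total for tok, n in counts.items()} if total else {}
    return counts, freqs


# Hypothetical model output, already split into tokens.
sample = ["def", "foo", "(", ")", ":", "return", "foo", "(", ")"]
counts, freqs = token_distribution(sample)
print(counts.most_common(3))
```

The counts could feed a bar chart directly, while the relative frequencies make outputs of different lengths comparable.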

Development

  • Pip tool: users can install the tool from pip/conda and use it with their NLP model.
  • Python backend: the user passes a dataset and a model as parameters. The tool then runs the model and stores the resulting vector dataset in its object.
  • Jupyter-Dash frontend: Jupyter-Dash makes it easy to build data dashboards and supports callback methods written in plain Python.

Development Plans

  • Code concept groupings view: categorize each generated output token by its type in the code language (declaration, assignment, function, etc.)
  • Display statistics about the output generated by a specific model (median, max, min, etc.)
  • Dynamic re-execution of the pipeline when:
    • User edits the number of tokens
    • User edits the number of input sequences
    • User changes the model
    • User selects a new descriptive statistic
  • Implement BertViz attention views inside the app with Dash if possible
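One possible way to approach the code concept groupings for Python output is the standard library `tokenize` module. The sketch below assigns each token a coarse category; the function name `python_token_types` and the `KEYWORD` category are assumptions made for illustration, and the package may group tokens differently.

```python
import io
import keyword
import tokenize
from collections import Counter

# Layout-only tokens that carry no code concept of their own.
SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def python_token_types(source):
    """Return a list of (token_string, category) pairs for Python source."""
    labelled = []
    reader = io.StringIO(source).readline
    try:
        for tok in tokenize.generate_tokens(reader):
            if tok.type in SKIPPED:
                continue
            if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
                category = "KEYWORD"  # hypothetical extra category
            else:
                category = tokenize.tok_name[tok.type]  # e.g. NAME, OP, NUMBER
            labelled.append((tok.string, category))
    except (tokenize.TokenError, SyntaxError):
        # Generated code is often truncated; return what was labelled so far.
        pass
    return labelled


snippet = "def add(a, b):\n    return a + b\n"
print(Counter(category for _, category in python_token_types(snippet)))
```

Counting the categories per output, as in the last line, gives the kind of data a token type distribution view would need.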

Current Diagrams

Components UML

[Image: components UML diagram]

Sequence Diagram

[Image: sequence diagram]

Supported Features

  • 4 different views for visually analyzing code generation models (individual tokens, token distribution, Python token types, token type distribution)
  • 4 pre-trained models for code generation from Hugging Face (GPT2, CodeGen, CodeParrot, GPT-Neo)
  • Descriptive stats for datasets with many input sequences
  • Dynamic re-execution on user inputs
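The last two features can be sketched together: a selectable descriptive statistic applied to per-sequence counts, with the expensive model run cached so that changing only the statistic does not trigger a new run. Everything below (`STATS`, `run_pipeline`, `refresh`, and the fake counts) is hypothetical and stands in for the real pipeline, which is not shown in this document.

```python
import statistics
from functools import lru_cache

# Hypothetical registry of the statistics a user could select.
STATS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


@lru_cache(maxsize=32)
def run_pipeline(model_name, num_tokens, num_sequences):
    """Stand-in for the expensive model run; returns fake per-sequence counts."""
    # A real pipeline would generate `num_tokens` tokens per input sequence.
    return tuple(
        (i * 3 + len(model_name)) % (num_tokens + 1)
        for i in range(num_sequences)
    )


def refresh(model_name, num_tokens, num_sequences, stat_name):
    """Recompute the displayed value after any user input changes."""
    # Cached: only re-runs when model, token count, or sequence count change.
    counts = run_pipeline(model_name, num_tokens, num_sequences)
    return STATS[stat_name](counts)
```

With this layout, calling `refresh("gpt2", 10, 5, "median")` and then `refresh("gpt2", 10, 5, "max")` runs the pipeline once and reuses the cached counts for the second call. In a Dash app, a function like `refresh` would typically be called from a callback.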

Installation

The first prototype is currently available on PyPI. Users will need to generate their own Hugging Face API token.

# Chunk 1: install the package and enable auto-reload
%pip install codewit-semeru
%load_ext autoreload
%autoreload 2
%pip install datasets

# Chunk 2: load the CodeXGLUE code completion dataset and keep a few short inputs
from datasets import load_dataset
import pandas as pd

DATA_LEN = 1024  # maximum length of an input sequence, in characters
NUM_DATA = 20    # number of input sequences to keep

dataset = load_dataset("code_x_glue_cc_code_completion_line", "python", split="train")

pruned_dataset = []
for input_seq in dataset:
    temp = input_seq["input"]  # type: ignore
    if len(temp) <= DATA_LEN:
        pruned_dataset.append(temp)
    if len(pruned_dataset) >= NUM_DATA:
        break
pd.DataFrame(pruned_dataset).describe()

# Chunk 3: set the Hugging Face API token and launch the tool
import os

os.environ["HF_API_TOKEN"] = "{Insert token here}"

from codewit_semeru import WITCode
WITCode("codeparrot/codeparrot-small", pruned_dataset)

These lines can be run directly from your notebook. Python 3.8 is required. The first chunk installs the pip package and loads the autoreload extension. The second chunk loads the CodeXGLUE code completion dataset for use with the tool. The last chunk launches the tool in the notebook. You need to supply your own API token to query Hugging Face models.
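Since the token is read from the `HF_API_TOKEN` environment variable, it may help to check that it is set before launching the tool. The helper below is a hypothetical convenience, not part of the package; it only assumes the variable name used in the snippet above.

```python
import os


def require_hf_token(env=None, var_name="HF_API_TOKEN"):
    """Return the API token, or raise if it is missing or still a placeholder."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    # The "{Insert token here}" placeholder starts with a brace.
    if not token or token.startswith("{"):
        raise RuntimeError(
            f"Set {var_name} to your Hugging Face API token before running the tool."
        )
    return token
```

Calling `require_hf_token()` in the notebook just before `WITCode(...)` fails early with a readable message instead of a failed request later on.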

Build and Run Docker Image

Start Docker.

Navigate to the project folder and run docker-compose up -d --build to build the image.

Navigate to localhost:8888 to open the Jupyter notebook. The password is wit.

To stop the Docker container, run docker-compose down.
