HeaderGen: Automated cell header generator

HeaderGen

HeaderGen is a tool-based approach to improving the comprehension and navigation of undocumented Python-based Jupyter notebooks by automatically creating a narrative structure in the notebook.

Data scientists typically build an ML notebook by first preparing the data, then extracting key features, and finally creating and training a model. HeaderGen leverages this implicit narrative structure to annotate the notebook with structural Markdown headers.
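The idea of deriving headers from ML phases can be illustrated with a minimal Python sketch. It assumes a hypothetical, hand-written mapping (`PHASES`) from fully qualified call names to the phases named above, and assumes each cell's calls have already been resolved. HeaderGen's actual taxonomy (see framework_models) and its call resolution are more elaborate; the helper name `header_for_cell` and the phase labels are illustrative only.

```python
# Hypothetical call-to-phase mapping, for illustration only; it is not
# HeaderGen's actual ML taxonomy.
PHASES = {
    "pandas.read_csv": "Data Preparation",
    "pandas.DataFrame.dropna": "Data Preparation",
    "sklearn.feature_extraction.text.TfidfVectorizer.fit_transform": "Feature Extraction",
    "sklearn.linear_model.LogisticRegression.fit": "Model Training",
}


def header_for_cell(calls):
    """Build a Markdown header from the ML phases of a cell's resolved calls.

    Returns None when no call maps to a known phase, so the cell gets no header.
    """
    phases = []
    for call in calls:
        phase = PHASES.get(call)
        if phase is not None and phase not in phases:  # keep first-seen order
            phases.append(phase)
    return "## " + " | ".join(phases) if phases else None


# Each inner list holds the (already resolved) calls made in one code cell.
cells = [
    ["pandas.read_csv", "pandas.DataFrame.dropna"],
    ["sklearn.feature_extraction.text.TfidfVectorizer.fit_transform"],
    ["sklearn.linear_model.LogisticRegression.fit", "print"],
    ["print"],
]
for calls in cells:
    # prints: ## Data Preparation / ## Feature Extraction / ## Model Training / None
    print(header_for_cell(calls))
```

Deduplicating phases while preserving their order keeps a header short when a cell mixes several operations from the same phase.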

Features

  • Automated Markdown Header Insertion: Using a taxonomy of machine-learning operations, HeaderGen annotates code cells with relevant Markdown headers.

  • Function Call Taxonomy: Methodically classifies function calls based on a machine-learning operations taxonomy.

  • Advanced Call Graph Analysis: Extends the PyCG call-graph framework with flow sensitivity and return-type resolution for external libraries.

  • Precision in External Libraries: Accurately resolves the return types of external-library functions using type stubs.

  • Syntax Pattern Matching: Uses the resolved type information for syntax pattern matching.
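To see why flow sensitivity and return-type resolution matter, consider a notebook that reuses one variable name for different kinds of objects. The sketch below is an illustration under stated assumptions, not PyCG's or HeaderGen's implementation: it walks top-level statements in order with Python's `ast` module, uses a small hypothetical `RETURN_TYPES` table as a stand-in for typestubs, and takes import aliases as an argument instead of parsing import statements. The function names `qualified_name` and `resolve_calls` are hypothetical.

```python
import ast

# Hypothetical stand-in for a typestub database: maps a fully qualified
# callable to the type it returns. Entries are illustrative assumptions.
RETURN_TYPES = {
    "pandas.read_csv": "pandas.DataFrame",
    "numpy.array": "numpy.ndarray",
}


def qualified_name(func, aliases, env):
    """Resolve `var.attr` or `alias.attr` to a fully qualified name, else None."""
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        base = func.value.id
        if base in env:  # a variable: use its type at this program point
            return f"{env[base]}.{func.attr}"
        if base in aliases:  # an imported module alias such as `pd`
            return f"{aliases[base]}.{func.attr}"
    return None


def resolve_calls(source, aliases):
    """Flow-sensitive sketch: visit top-level statements in order, update a
    variable's type on each assignment, and record resolved call names."""
    env = {}  # variable name -> type at the current statement
    calls = []
    for stmt in ast.parse(source).body:
        value = getattr(stmt, "value", None)  # Assign/Expr carry a value
        if not isinstance(value, ast.Call):
            continue
        name = qualified_name(value.func, aliases, env)
        if name is None:
            continue
        calls.append(name)
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            target = stmt.targets[0].id
            if name in RETURN_TYPES:
                env[target] = RETURN_TYPES[name]
            else:
                env.pop(target, None)  # unknown return type: drop stale type
    return calls


NOTEBOOK_CODE = '''
import pandas as pd
import numpy as np
data = pd.read_csv("train.csv")
data.info()
data = np.array([1, 2, 3])
data.mean()
'''
print(resolve_calls(NOTEBOOK_CODE, {"pd": "pandas", "np": "numpy"}))
```

Because types are tracked statement by statement, `data.info()` resolves to `pandas.DataFrame.info` and `data.mean()` to `numpy.ndarray.mean`. A flow-insensitive analysis would associate both types with `data` at every use, giving each call two candidate targets.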

Folder Structure

  • callsites-jupyternb-micro-benchmark: Micro benchmark
  • callsites-jupyternb-real-world-benchmark: Real-world benchmark
  • evaluation: Contains manual header annotation and user study results
  • framework_models: Function calls to ML Taxonomy mapping
  • typestub-database: Type stubs for ML libraries
  • headergen: Source code of HeaderGen
  • pycg_extended: Source code of extended PyCG
  • headergen-extension: Jupyter notebook plugin for HeaderGen
  • headergen_output: Folder where the generated notebooks from the docker container are stored

1. Build the container

  • Get the source files

    git clone --recursive
    git submodule update --init --recursive
    git pull --recurse-submodules
    
  • Linux

    docker build -t headergen .
    docker run -v "$PWD"/headergen_output:/results -it headergen bash
    
  • Windows (Command Prompt)

    docker build -t headergen .
    docker run -v "%cd%"/headergen_output:/results -it headergen bash
    

2. Run the HeaderGen benchmarks from inside the container

The output generated by the following commands (annotated notebooks, reports, call sites, headers, etc.) is stored in the local headergen_output folder once the commands finish executing.

  • Micro Benchmark (generates a CSV file with the results)

    make microbench
    
  • Real-world Benchmark (generates annotated notebooks and a CSV file that reproduce Table 2 of the paper)

    make realworldbench
    
  • Both Benchmarks

    make all
    
  • Clean generated output

    make clean
    

This repo contains the code for the paper "Enhancing Comprehension and Navigation in Jupyter Notebooks with Static Analysis", published at the SANER Conference 2023.

Download files

Source Distribution

headergen-1.0.0.tar.gz (6.4 MB)

Built Distribution

headergen-1.0.0-py3-none-any.whl (14.0 MB, Python 3)

File details

Details for the file headergen-1.0.0.tar.gz.

File metadata

  • Download URL: headergen-1.0.0.tar.gz
  • Size: 6.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.31.0 rfc3986/1.5.0 tqdm/4.65.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for headergen-1.0.0.tar.gz
Algorithm Hash digest
SHA256 3ef43590101f61b92fbe4076c3eabefa7cc45cd573b2eaf3959496d32cf8ba28
MD5 20066f0fbf30752a6e677f85f6babe04
BLAKE2b-256 8953efc30b64f85016ffbe2bb29ae6f8c809603a2e367ea1ca4a860be7f051a3
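The digests above can be used to check a downloaded file. A minimal sketch using Python's standard `hashlib` module follows; the helper names `sha256_of` and `verify` are illustrative, and the usage comment assumes the sdist was saved to the current directory. pip's hash-checking mode (`--require-hashes`) is an alternative when installing.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True if the file matches the published SHA256 digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage, assuming the sdist was downloaded to the current directory:
# verify("headergen-1.0.0.tar.gz",
#        "3ef43590101f61b92fbe4076c3eabefa7cc45cd573b2eaf3959496d32cf8ba28")
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 6.4 MB archive but costs nothing.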

File details

Details for the file headergen-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: headergen-1.0.0-py3-none-any.whl
  • Size: 14.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.31.0 rfc3986/1.5.0 tqdm/4.65.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for headergen-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 06fc012d079e56650fea720e8680fd54eb6a984ce352324ad37003ec8d29a1db
MD5 c72a7f6e5040e5c000ab0b29739fee96
BLAKE2b-256 74382a6c2520420998b21e1eed14218a85cb6af5fa2c283e34fd135781f1d6bf
