LibSA4Py: Light-weight static analysis for extracting type hints and features

Intro

LibSA4Py is a static analysis library for Python, which extracts type hints and features for training ML-based type inference models.

Requirements

  • Python 3.7 or newer (Python 3.8 is recommended)
  • Watchman (for running pyre) [Optional]
  • macOS or Linux

Quick Installation

git clone https://github.com/saltudelft/libsa4py.git
cd libsa4py && pip install .

Usage

Processing projects

Given Python repositories, run the following command to process source code files and generate JSON-formatted outputs:

libsa4py process --p $REPOS_PATH --o $OUTPUT_PATH --d $DUPLICATE_PATH --j $WORKERS_COUNT --l $LIMIT --c --no-nlp --pyre

Description:

  • --p $REPOS_PATH: The path to the Python corpus or dataset.
  • --o $OUTPUT_PATH: Path to store processed projects.
  • --d $DUPLICATE_PATH: Path to the list of duplicate files in the given dataset (i.e., the jsonl.gz file produced by the CD4Py tool). [Optional]
  • --s: Path to the CSV file for splitting the given dataset. [Optional]
  • --j $WORKERS_COUNT: Number of workers for processing projects. [Optional, default=no. of available CPU cores]
  • --l $LIMIT: Number of projects to be processed. [Optional]
  • --c: Whether to skip projects that have already been processed. [Optional, default=False]
  • --no-nlp: Disables the standard NLP techniques that are otherwise applied to extracted identifiers. [Optional; NLP is applied by default]
  • --pyre: Whether to run pyre to infer the types of variables for given projects. [Optional, default=False]
  • --tc: Whether to type-check type annotations in projects. [Optional, default=False]
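
The options above can be assembled programmatically, for example when the command is launched from a script. The following is a minimal sketch, not part of LibSA4Py itself: the helper name build_process_command and its parameter names are hypothetical, and only the flags listed above are used. The --s and --tc flags are left out for brevity.

```python
import shlex
from typing import List, Optional


def build_process_command(
    repos_path: str,
    output_path: str,
    duplicates_path: Optional[str] = None,
    workers: Optional[int] = None,
    limit: Optional[int] = None,
    skip_processed: bool = False,
    no_nlp: bool = False,
    use_pyre: bool = False,
) -> List[str]:
    """Hypothetical helper: build the argument list for `libsa4py process`."""
    # --p and --o are the only options not marked [Optional] above.
    cmd = ["libsa4py", "process", "--p", repos_path, "--o", output_path]
    # Optional flags are appended only when the caller sets them.
    if duplicates_path is not None:
        cmd += ["--d", duplicates_path]
    if workers is not None:
        cmd += ["--j", str(workers)]
    if limit is not None:
        cmd += ["--l", str(limit)]
    if skip_processed:
        cmd.append("--c")
    if no_nlp:
        cmd.append("--no-nlp")
    if use_pyre:
        cmd.append("--pyre")
    return cmd


# Placeholder paths; the command is only printed here, not executed.
cmd = build_process_command("repos", "out", workers=4, use_pyre=True)
print(" ".join(shlex.quote(arg) for arg in cmd))
# prints: libsa4py process --p repos --o out --j 4 --pyre
```

Building a list of arguments rather than a single string avoids shell-quoting problems when paths contain spaces.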

Merging projects

To merge all the processed JSON-formatted projects into a single dataframe, run the following command:

libsa4py merge --o $OUTPUT_PATH --l $LIMIT

Description:

  • --o $OUTPUT_PATH: Path to the processed projects, used in the previous processing step.
  • --l $LIMIT: Number of projects to be merged. [Optional]
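
When the merge step is driven from Python, it may help to check that the libsa4py executable is on the PATH before invoking it. The sketch below assumes exactly the two merge options described above; the helper names build_merge_command and run_if_available are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_merge_command(output_path: str, limit: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build the argument list for `libsa4py merge`."""
    cmd = ["libsa4py", "merge", "--o", output_path]
    if limit is not None:
        cmd += ["--l", str(limit)]
    return cmd


def run_if_available(cmd: List[str]) -> Optional[int]:
    """Run cmd and return its exit code, or None if the executable is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # "out" is a placeholder for the $OUTPUT_PATH used in the processing step.
    exit_code = run_if_available(build_merge_command("out", limit=10))
    print("libsa4py not found" if exit_code is None else f"exit code: {exit_code}")
```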

Applying types

To apply Pyre's inferred types to projects, run the following command:

libsa4py apply --p $REPOS_PATH --o $OUTPUT_PATH

Description:

  • --p $REPOS_PATH: The path to the Python corpus or dataset.
  • --o $OUTPUT_PATH: Path to the processed projects, used in the previous processing step.
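
Since the apply step reads the output of the processing step, the two commands share the same paths. The sketch below lays out that ordering as a dry run, under the assumption that the processing step was run with --pyre so that inferred types are available to apply. The function name plan_pyre_pipeline is hypothetical and nothing is executed.

```python
import shlex
from typing import List


def plan_pyre_pipeline(repos_path: str, output_path: str) -> List[List[str]]:
    """Hypothetical helper: list the commands for processing, then applying types."""
    # Step 1: process the projects with Pyre inference enabled (assumed prerequisite).
    process_cmd = ["libsa4py", "process", "--p", repos_path, "--o", output_path, "--pyre"]
    # Step 2: apply the inferred types, reusing the same corpus and output paths.
    apply_cmd = ["libsa4py", "apply", "--p", repos_path, "--o", output_path]
    return [process_cmd, apply_cmd]


# Placeholder paths; each planned command is printed, not run.
for step in plan_pyre_pipeline("repos", "out"):
    print(" ".join(shlex.quote(arg) for arg in step))
```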

JSON Output

After processing each project, a JSON-formatted file is produced, which is described here.
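
To get a first look at the generated files without relying on their schema, the top-level keys of each file can be listed. This is a rough sketch that assumes the JSON files live somewhere under $OUTPUT_PATH; the directory layout and the function name summarize_json_outputs are assumptions, not documented behavior.

```python
import json
from pathlib import Path
from typing import Dict, List


def summarize_json_outputs(output_path: str) -> Dict[str, List[str]]:
    """Map each JSON file found under output_path to its sorted top-level keys."""
    root = Path(output_path)
    summary: Dict[str, List[str]] = {}
    if not root.is_dir():
        return summary
    # Search recursively, since the exact output layout is not assumed here.
    for json_file in sorted(root.rglob("*.json")):
        with json_file.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Files whose top level is not an object get an empty key list.
        summary[str(json_file)] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    # "out" is a placeholder for the $OUTPUT_PATH used in the processing step.
    for path, keys in summarize_json_outputs("out").items():
        print(path, keys)
```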

Download files

Source Distribution

libsa4py-0.4.0.tar.gz (88.5 kB)

Built Distribution

libsa4py-0.4.0-py3-none-any.whl (41.8 kB)
