LibSA4Py: Light-weight static analysis for extracting type hints and features

Intro

LibSA4Py is a static analysis library for Python, which extracts type hints and features for training ML-based type inference models.

Requirements

  • Python 3.6 or newer (Python 3.8 is recommended)
  • Watchman (for running Pyre) [Optional]
  • macOS or Linux

Quick Installation

git clone https://github.com/saltudelft/libsa4py.git
cd libsa4py && pip install .

Usage

Processing projects

Given Python repositories, run the following command to process source code files and generate JSON-formatted outputs:

libsa4py process --p $REPOS_PATH --o $OUTPUT_PATH --d $DUPLICATE_PATH --j $WORKERS_COUNT --l $LIMIT --c --no-nlp --pyre

Description:

  • --p $REPOS_PATH: The path to the Python corpus or dataset.
  • --o $OUTPUT_PATH: Path to store processed projects.
  • --d $DUPLICATE_PATH: Path to the file listing duplicate files in the given dataset (i.e., the jsonl.gz file produced by the CD4Py tool). [Optional]
  • --s: Path to the CSV file for splitting the given dataset. [Optional]
  • --j $WORKERS_COUNT: Number of workers for processing projects. [Optional, default=no. of available CPU cores]
  • --l $LIMIT: Number of projects to be processed. [Optional]
  • --c: Whether to skip projects that have already been processed. [Optional, default=False]
  • --no-nlp: Disables the standard NLP techniques that are otherwise applied to extracted identifiers. [Optional, NLP is applied by default]
  • --pyre: Whether to run Pyre to infer the types of variables for the given projects. [Optional, default=False]
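When the processing step is launched from a script, it can help to assemble the argument list programmatically so that optional flags are only passed when they are set. The following is a minimal sketch, assuming the flags behave as described above. The helper names (build_process_command, render) and the example paths are hypothetical and are not part of LibSA4Py; the sketch only builds and prints the command, it does not run it.

```python
import shlex
from typing import List, Optional


def build_process_command(
    repos_path: str,
    output_path: str,
    duplicate_path: Optional[str] = None,
    split_csv: Optional[str] = None,
    workers: Optional[int] = None,
    limit: Optional[int] = None,
    skip_processed: bool = False,
    no_nlp: bool = False,
    pyre: bool = False,
) -> List[str]:
    """Build the argv list for `libsa4py process` (hypothetical helper)."""
    # --p and --o are the only arguments not marked [Optional] in the list above.
    cmd = ["libsa4py", "process", "--p", repos_path, "--o", output_path]

    # Options that take a value are added only when a value was given.
    optional_values = [
        ("--d", duplicate_path),
        ("--s", split_csv),
        ("--j", workers),
        ("--l", limit),
    ]
    for flag, value in optional_values:
        if value is not None:
            cmd += [flag, str(value)]

    # Boolean switches are added only when enabled.
    switches = [("--c", skip_processed), ("--no-nlp", no_nlp), ("--pyre", pyre)]
    for flag, enabled in switches:
        if enabled:
            cmd.append(flag)
    return cmd


def render(cmd: List[str]) -> str:
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    print(render(build_process_command("/data/repos", "/data/out", workers=4, pyre=True)))
```

Passing the resulting list to subprocess.run(cmd, check=True) would execute the step; keeping the arguments as a list avoids shell-quoting problems with paths that contain spaces.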

Merging projects

To merge all the processed JSON-formatted projects into a single dataframe, run the following command:

libsa4py merge --o $OUTPUT_PATH --l $LIMIT

Description:

  • --o $OUTPUT_PATH: Path to the processed projects, used in the previous processing step.
  • --l $LIMIT: Number of projects to be merged. [Optional]

Applying types

To apply Pyre's inferred types to projects, run the following command:

libsa4py apply --p $REPOS_PATH --o $OUTPUT_PATH

Description:

  • --p $REPOS_PATH: The path to the Python corpus or dataset.
  • --o $OUTPUT_PATH: Path to the processed projects, used in the previous processing step.
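The three commands above share the same repository and output paths, so they can be chained from one script. The sketch below is an illustration under assumptions: the function names (pipeline_commands, run_pipeline) are hypothetical, the steps are run in the order this document presents them (process, merge, apply), and whether merge should precede or follow apply in a given workflow is not stated here. By default it only prints the commands.

```python
import subprocess
from typing import List, Optional


def pipeline_commands(
    repos_path: str, output_path: str, limit: Optional[int] = None
) -> List[List[str]]:
    """Return argv lists for the process, merge, and apply steps (hypothetical helper)."""
    # --pyre is enabled so that inferred types exist for the apply step.
    process = ["libsa4py", "process", "--p", repos_path, "--o", output_path, "--pyre"]
    merge = ["libsa4py", "merge", "--o", output_path]
    apply_types = ["libsa4py", "apply", "--p", repos_path, "--o", output_path]

    # --l is documented for process and merge only, so apply does not receive it.
    if limit is not None:
        process += ["--l", str(limit)]
        merge += ["--l", str(limit)]
    return [process, merge, apply_types]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example paths are placeholders; dry_run=True avoids executing anything.
    run_pipeline(pipeline_commands("/data/repos", "/data/out", limit=10), dry_run=True)
```

Setting dry_run=False runs each step in turn and raises subprocess.CalledProcessError if one of them exits with a non-zero status.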

JSON Output

After processing each project, a JSON-formatted file is produced, which is described here.
