
LibSA4Py: Light-weight static analysis for extracting type hints and features

Project description

Intro


LibSA4Py is a static analysis library for Python, which extracts type hints and features for training ML-based type inference models.

Requirements

  • Python 3.7 or newer (Python 3.8 is recommended)
  • Watchman (needed for running Pyre) [Optional]
  • macOS or Linux
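Before installing, the requirements above can be checked with a small pre-flight script. The sketch below is a hypothetical helper (`check_requirements` is not part of LibSA4Py) that only tests the conditions listed here: the Python version, the operating system, and whether a `watchman` executable is on PATH.

```python
import platform
import shutil
import sys


def check_requirements() -> dict:
    """Report whether this machine meets LibSA4Py's listed requirements.

    Hypothetical helper, not part of LibSA4Py. It mirrors the README's
    list: Python 3.7+, macOS or Linux, and (optionally) Watchman for Pyre.
    """
    return {
        # Python 3.7 or newer is required.
        "python_ok": sys.version_info >= (3, 7),
        # platform.system() returns "Darwin" on macOS and "Linux" on Linux.
        "os_ok": platform.system() in ("Darwin", "Linux"),
        # Watchman is optional; it is only needed when running Pyre.
        "watchman_found": shutil.which("watchman") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A missing Watchman only matters if you plan to use the `--pyre` option.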

Quick Installation

git clone https://github.com/saltudelft/libsa4py.git
cd libsa4py && pip install .

Usage

Processing projects

To process the source code files of a set of Python repositories and generate JSON-formatted outputs, run the following command:

libsa4py process --p $REPOS_PATH --o $OUTPUT_PATH --d $DUPLICATE_PATH --j $WORKERS_COUNT --l $LIMIT --c --no-nlp --pyre

Description:

  • --p $REPOS_PATH: The path to the Python corpus or dataset.
  • --o $OUTPUT_PATH: Path to store processed projects.
  • --d $DUPLICATE_PATH: Path to the duplicate files of the given dataset, i.e. the jsonl.gz file produced by the CD4Py tool. [Optional]
  • --s: Path to the CSV file for splitting the given dataset. [Optional]
  • --j $WORKERS_COUNT: Number of workers for processing projects. [Optional, default=number of available CPU cores]
  • --l $LIMIT: Number of projects to be processed. [Optional]
  • --c: If set, projects that have already been processed are skipped. [Optional, default=False]
  • --no-nlp: If set, disables the standard NLP techniques that are otherwise applied to extracted identifiers. [Optional; by default, NLP is applied]
  • --pyre: Whether to run Pyre to infer the types of variables in the given projects. [Optional, default=False]
  • --tc: Whether to type-check type annotations in projects. [Optional, default=False]
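When the processing step is scripted, the options above can be assembled programmatically. The sketch below is a hypothetical helper (`build_process_cmd` and its parameter names are not part of LibSA4Py) that maps each documented option to its flag: value-taking options are added only when a value is supplied, and switches only when enabled. It assumes the `libsa4py` executable is on PATH when the command is actually run.

```python
import shlex
from typing import List, Optional


def build_process_cmd(
    repos_path: str,
    output_path: str,
    duplicate_path: Optional[str] = None,
    split_csv: Optional[str] = None,
    workers: Optional[int] = None,
    limit: Optional[int] = None,
    skip_processed: bool = False,
    no_nlp: bool = False,
    pyre: bool = False,
    type_check: bool = False,
) -> List[str]:
    """Assemble argv for `libsa4py process` (hypothetical helper)."""
    cmd = ["libsa4py", "process", "--p", repos_path, "--o", output_path]
    # Options that take a value are added only when a value is given.
    if duplicate_path is not None:
        cmd += ["--d", duplicate_path]
    if split_csv is not None:
        cmd += ["--s", split_csv]
    if workers is not None:
        cmd += ["--j", str(workers)]
    if limit is not None:
        cmd += ["--l", str(limit)]
    # Boolean switches are either present or absent; they take no value.
    if skip_processed:
        cmd.append("--c")
    if no_nlp:
        cmd.append("--no-nlp")
    if pyre:
        cmd.append("--pyre")
    if type_check:
        cmd.append("--tc")
    return cmd


if __name__ == "__main__":
    cmd = build_process_cmd("repos", "out", workers=4, pyre=True)
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To run it, pass the list to `subprocess.run(cmd, check=True)`; using an argument list rather than a shell string avoids quoting problems with paths that contain spaces.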

Merging projects

To merge all the processed JSON-formatted projects into a single dataframe, run the following command:

libsa4py merge --o $OUTPUT_PATH --l $LIMIT

Description:

  • --o $OUTPUT_PATH: Path to the processed projects, used in the previous processing step.
  • --l $LIMIT: Number of projects to be merged. [Optional]

Applying types

To apply Pyre's inferred types to projects, run the following command:

libsa4py apply --p $REPOS_PATH --o $OUTPUT_PATH

Description:

  • --p $REPOS_PATH: The path to the Python corpus or dataset.
  • --o $OUTPUT_PATH: Path to the processed projects, used in the previous processing step.
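The three commands can be chained in a single script. The sketch below is illustrative only: `pipeline_commands` and `run_pipeline` are hypothetical helpers, only the documented flags are used, and it assumes that `apply` needs the projects to have been processed with `--pyre` first, since that is where Pyre's inferred types come from. By default it performs a dry run and only prints the commands.

```python
import shlex
import subprocess
from typing import List


def pipeline_commands(repos_path: str, output_path: str) -> List[List[str]]:
    """Return the process, merge and apply steps in order (hypothetical helper)."""
    return [
        # 1. Process the projects; --pyre is included on the assumption that
        #    `apply` needs Pyre's inferred types from this step.
        ["libsa4py", "process", "--p", repos_path, "--o", output_path, "--pyre"],
        # 2. Merge the per-project JSON files into a single dataframe.
        ["libsa4py", "merge", "--o", output_path],
        # 3. Apply Pyre's inferred types to the projects.
        ["libsa4py", "apply", "--p", repos_path, "--o", output_path],
    ]


def run_pipeline(repos_path: str, output_path: str, dry_run: bool = True) -> None:
    for cmd in pipeline_commands(repos_path, output_path):
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at a failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("repos", "out")  # dry run: prints the three commands
```

Because each step reads the output of the previous one from `$OUTPUT_PATH`, stopping at the first failure prevents later steps from running on incomplete results.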

JSON Output

After processing, a JSON-formatted file is produced for each project; its format is described in the LibSA4Py GitHub repository.
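To inspect the processed output from Python, the JSON files can be loaded with the standard library. This is a minimal sketch under stated assumptions: `load_processed_projects` is a hypothetical helper, it assumes the per-project files end in `.json` and sit somewhere below `$OUTPUT_PATH`, and it does not interpret the schema.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_processed_projects(output_path: str) -> Dict[str, Any]:
    """Load every *.json file under output_path, keyed by its relative path.

    Hypothetical helper; assumes the per-project JSON files are stored
    somewhere below output_path and leaves their contents uninterpreted.
    """
    root = Path(output_path)
    projects = {}
    for json_file in sorted(root.rglob("*.json")):
        with json_file.open(encoding="utf-8") as f:
            projects[str(json_file.relative_to(root))] = json.load(f)
    return projects


if __name__ == "__main__":
    import sys

    for name, data in load_processed_projects(sys.argv[1]).items():
        # Show the top-level keys of each file as a quick structural overview.
        keys = list(data) if isinstance(data, dict) else type(data).__name__
        print(name, keys)
```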



Download files


Source Distribution

libsa4py-0.4.0.tar.gz (88.5 kB)

Built Distribution

libsa4py-0.4.0-py3-none-any.whl (41.8 kB)

File details

Details for the file libsa4py-0.4.0.tar.gz.

File metadata

  • Download URL: libsa4py-0.4.0.tar.gz
  • Size: 88.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.16

File hashes

Hashes for libsa4py-0.4.0.tar.gz
Algorithm Hash digest
SHA256 953f4b2e2204af2bd5901f88779c0ed41287ba43e5f26096377b3507a0161163
MD5 ab5defaa45c52a10acfaef05f7723344
BLAKE2b-256 c843bed5e39db0ff9b18354c2764be54155e668a8acf0aca5b216bdbe2a92943


File details

Details for the file libsa4py-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: libsa4py-0.4.0-py3-none-any.whl
  • Size: 41.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.16

File hashes

Hashes for libsa4py-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2c89559d5bcbedf75741b17c0c60ee2eb8ecf3d8273c54e61f19641ad94d160d
MD5 b09e7152f00bcbeae78ad6364daed973
BLAKE2b-256 60b5d68becae77e0e7c3417af0ea3a03ecfaa6d24af643e1712130bba4eee697
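A downloaded release file can be checked against the published SHA256 digests with Python's `hashlib`. In the sketch below, the digests are copied from the hash listings for version 0.4.0, while `sha256_of` and `verify` are hypothetical helper names; the file is read in chunks so it never has to fit in memory at once.

```python
import hashlib

# Published SHA256 digests for the libsa4py 0.4.0 release files.
EXPECTED_SHA256 = {
    "libsa4py-0.4.0.tar.gz": "953f4b2e2204af2bd5901f88779c0ed41287ba43e5f26096377b3507a0161163",
    "libsa4py-0.4.0-py3-none-any.whl": "2c89559d5bcbedf75741b17c0c60ee2eb8ecf3d8273c54e61f19641ad94d160d",
}


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's digest with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    name = "libsa4py-0.4.0.tar.gz"  # assumes the file is in the current directory
    print(name, "OK" if verify(name, EXPECTED_SHA256[name]) else "MISMATCH")
```

For installs, pip can enforce the same check itself: list `libsa4py==0.4.0 --hash=sha256:<digest>` in a requirements file and install it with `pip install --require-hashes -r requirements.txt`.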

