LibSA4Py: Light-weight static analysis for extracting type hints and features
Intro
LibSA4Py is a static analysis library for Python that extracts type hints and features for training ML-based type inference models.
Requirements
- Python 3.6 or newer (Python 3.8 is recommended)
- Watchman (for running pyre) [Optional]
- macOS or Linux
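The requirements above can be checked before installation with a small script. This is only a sketch: the helper name `check_requirements` is hypothetical and not part of LibSA4Py, and it assumes that Watchman, when installed, is available on the PATH as `watchman`.

```python
import platform
import shutil
import sys
from typing import List, Optional, Tuple


def check_requirements(
    version: Optional[Tuple[int, int]] = None,
    system: Optional[str] = None,
    has_watchman: Optional[bool] = None,
) -> Tuple[List[str], List[str]]:
    """Return (problems, warnings) for the requirements listed above.

    Hypothetical helper, not part of LibSA4Py. Arguments default to the
    values detected on the current machine.
    """
    if version is None:
        version = (sys.version_info[0], sys.version_info[1])
    if system is None:
        system = platform.system()
    if has_watchman is None:
        # Assumption: the Watchman executable is called "watchman".
        has_watchman = shutil.which("watchman") is not None

    problems = []
    warnings = []
    if version < (3, 6):
        problems.append("Python 3.6 or newer is required")
    if system not in ("Darwin", "Linux"):
        problems.append("Only macOS and Linux are supported")
    if not has_watchman:
        # Watchman is optional; it is only needed for running pyre.
        warnings.append("Watchman not found; needed only for running pyre")
    return problems, warnings


if __name__ == "__main__":
    found_problems, found_warnings = check_requirements()
    for message in found_problems:
        print("ERROR:", message)
    for message in found_warnings:
        print("WARNING:", message)
```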
Quick Installation
git clone https://github.com/saltudelft/libsa4py.git
cd libsa4py && pip install .
Usage
Processing projects
Given Python repositories, run the following command to process source code files and generate JSON-formatted outputs:
libsa4py process --p $REPOS_PATH --o $OUTPUT_PATH --d $DUPLICATE_PATH --j $WORKERS_COUNT --l $LIMIT --c --no-nlp --pyre
Description:
- --p $REPOS_PATH: The path to the Python corpus or dataset.
- --o $OUTPUT_PATH: Path to store processed projects.
- --d $DUPLICATE_PATH: Path to duplicate files of the given dataset (i.e. jsonl.gz file produced by the CD4Py tool). [Optional]
- --s: Path to the CSV file for splitting the given dataset. [Optional]
- --j $WORKERS_COUNT: Number of workers for processing projects. [Optional, default=no. of available CPU cores]
- --l $LIMIT: Number of projects to be processed. [Optional]
- --c: Whether to ignore processed projects. [Optional, default=False]
- --no-nlp: Whether to apply standard NLP techniques to extracted identifiers. [Optional, default=True]
- --pyre: Whether to run pyre to infer the types of variables for given projects. [Optional, default=False]
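When the processing step is driven from a script, the command line can be assembled from the options described above. The following is a minimal sketch: `build_process_command` and `format_command` are hypothetical helpers, not part of LibSA4Py, and they only emit the flags listed in this section.

```python
import shlex
from typing import List, Optional


def build_process_command(
    repos_path: str,
    output_path: str,
    duplicate_path: Optional[str] = None,
    split_csv: Optional[str] = None,
    workers: Optional[int] = None,
    limit: Optional[int] = None,
    skip_processed: bool = False,
    no_nlp: bool = False,
    pyre: bool = False,
) -> List[str]:
    """Build the argument list for `libsa4py process` (hypothetical helper)."""
    cmd = ["libsa4py", "process", "--p", repos_path, "--o", output_path]
    # Optional arguments that take a value are added only when provided.
    if duplicate_path is not None:
        cmd += ["--d", duplicate_path]
    if split_csv is not None:
        cmd += ["--s", split_csv]
    if workers is not None:
        cmd += ["--j", str(workers)]
    if limit is not None:
        cmd += ["--l", str(limit)]
    # Boolean switches take no value.
    if skip_processed:
        cmd.append("--c")
    if no_nlp:
        cmd.append("--no-nlp")
    if pyre:
        cmd.append("--pyre")
    return cmd


def format_command(cmd: List[str]) -> str:
    """Render the argument list as a shell-quoted string for logging."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once LibSA4Py is installed; passing a list rather than a single string avoids shell quoting issues with paths that contain spaces.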
Merging projects
To merge all the processed JSON-formatted projects into a single dataframe, run the following command:
libsa4py merge --o $OUTPUT_PATH --l $LIMIT
Description:
- --o $OUTPUT_PATH: Path to the processed projects, used in the previous processing step.
- --l $LIMIT: Number of projects to be merged. [Optional]
Applying types
To apply Pyre's inferred types to projects, run the following command:
libsa4py apply --p $REPOS_PATH --o $OUTPUT_PATH
Description:
- --p $REPOS_PATH: The path to the Python corpus or dataset.
- --o $OUTPUT_PATH: Path to the processed projects, used in the previous processing step.
JSON Output
After processing each project, a JSON-formatted file is produced, which is described here.
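To get a quick overview of the produced files without relying on their exact schema, the outputs can be loaded with the standard `json` module. This is a sketch under assumptions: it assumes the output files carry a `.json` extension and are located somewhere under $OUTPUT_PATH, and `summarize_outputs` is a hypothetical helper, not part of LibSA4Py. It only reports the top-level keys of each file.

```python
import json
from pathlib import Path
from typing import Dict, List


def summarize_outputs(output_path: str) -> Dict[str, List[str]]:
    """Map each JSON file under output_path to its sorted top-level keys.

    Hypothetical helper. Files whose top-level value is not a JSON
    object are mapped to an empty list.
    """
    summary = {}
    # Assumption: outputs end in ".json"; rglob also searches subfolders.
    for json_file in sorted(Path(output_path).rglob("*.json")):
        with json_file.open(encoding="utf-8") as handle:
            data = json.load(handle)
        keys = sorted(data) if isinstance(data, dict) else []
        summary[str(json_file)] = keys
    return summary
```

Calling `summarize_outputs` on the directory passed as --o returns a dictionary that can be printed or compared against the documented output format.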