Polyglot Piranha is a library for performing structural find and replace with deep cleanup.
Polyglot Piranha
Polyglot Piranha is a flexible multilingual structural search/replace engine that allows users to apply chains of interdependent structural search/replace rules for deeper cleanups. Polyglot Piranha builds upon tree-sitter queries for expressing the structural search/replace rules.
This repository contains the Polyglot Piranha framework and pre-built cleanup rules that can be leveraged for deleting code related to stale feature flags.
Overview
This is the higher level architecture of Polyglot Piranha.
At its heart, Polyglot Piranha is a structural find/replace (rewrite) engine with pre-built language-specific cleanup rules, such as simplifying boolean expressions, simplifying if-else statements, deleting empty classes, deleting files with no type declarations, inlining local variables, and many more.
A user provides:
- A set (or a graph) of structural find/replace rules
- Path to the code base
- Arguments to modify Piranha's behavior (like deleting associated comments)

When Piranha applies the set (or graph) of user-defined rules, it triggers the pre-built language-specific cleanup rules to do a deep cleanup.
When is Polyglot Piranha useful?
Example 1 (Stale Feature Flag Cleanup)
Let's take an example, where we know for a fact that the expression `exp.isTreated("SHOW_MENU")` always returns `true` (i.e. the feature *Show Menu* is treated):

```
public String fooBar(boolean x) {
    if(exp.isTreated("SHOW_MENU")|| x){
        String menu = getMenu();
        return menu;
    }
    return "";
}
```

To clean up this code with Piranha, a user would have to write *one* rule to update expressions like `exp.isTreated("SHOW_MENU")` to `true` and hook it to the pre-built boolean simplification rules. It would result in:

```
public String fooBar(boolean x) {
    String menu = getMenu();
    return menu;
}
```

Note how the user only specified the seed rule to update the expression to true, and Piranha simplified the disjunction (`exp.isTreated("SHOW_MENU")|| x` => `true`), then removed the stale if condition and finally deleted the unreachable return statement (`return "";`).

Example 2 (Structural Find/Replace with built-in cleanup)
Let's say a user writes a Piranha rule to delete an unused enum case (let's say `LOW`). However, this enum case coincidentally is the only enum case in this enum declaration.

```
enum Level {
  LOW,
}
```

If the user hooks up this *enum case deletion* rule to the pre-built rules, it would not only delete the enum case (`LOW`), but also the consequently empty enum declaration, and optionally the consequently empty compilation unit.

Example 3 (Structural Find/Replace with custom cleanup)
Let's take a canonical example of replacing `Arrays.asList` with `Collections.singletonList`, when possible.
This task involves two steps: (i) replacing the expression, and (ii) adding the import statement for `Collections` if absent (assuming google java format takes care of the unused imports :)).
However, Piranha does not contain pre-built rules to add such custom import statements.
  import java.util.ArrayList;
  import java.util.Arrays;
+ import java.util.Collections;
  class Character{
      String name;
      List<String> friends;
      List<String> enemies;
      Character(String name) {
          this.name = name;
          this.friends = new ArrayList<>();
-         this.enemies = Arrays.asList(this.name);
+         this.enemies = Collections.singletonList(this.name);
      }
  }
For such a scenario, a developer could first write a seed rule for replacing the expression and then craft a custom "cleanup" rule (triggered by the seed rule) to add the import statement if it is absent within the same file.
Note that a user can also craft a set of rules that trigger no other rule, i.e. use Piranha as a simple structural find/replace tool.
If you end up implementing a cleanup rule that could be useful for the community, feel free to make a PR to add it to the pre-built language-specific rules.
Using Polyglot Piranha
Polyglot Piranha can be used as a python library or as a command line tool.
:snake: Python API
Installing the Python API
pip install polyglot_piranha
Currently, we support one simple API (`run_piranha_cli`) that wraps the command line usage of Polyglot Piranha. We believe this makes it easy to incorporate Piranha in "pipelining".
`run_piranha_cli`
from polyglot_piranha import run_piranha_cli
path_to_codebase = "..."
path_to_configurations = "..."
piranha_summary = run_piranha_cli(path_to_codebase,
                                  path_to_configurations,
                                  should_rewrite_files=True)
Arguments
- `path_to_codebase` : Path to source code folder
- `path_to_configurations` : A directory containing files named `piranha_arguments.toml`, `rules.toml` and optionally `edges.toml`
  * `piranha_arguments.toml`: Allows a user to choose the language (`java`, `kotlin`, ...), opt in or out of other features like cleaning up comments, or even provide arguments to the Piranha rules ([reference](#piranha-arguments))
  * `rules.toml`: *Piranha rules* express the specific AST patterns to match and the __replacement patterns__ for these matches (in-place). These rules can also specify the pre-built language-specific cleanups to trigger.
  * `edges.toml` (_optional_): expresses the flow between the rules
- `should_rewrite_files` : Enables in-place rewriting of code

Returns
`[Piranha_Output]`: a `PiranhaOutputSummary` for each file touched or analyzed by Piranha. It contains useful information like the matches found (for match-only rules), the rewrites performed, and the content of the file after the rewrite. The content is particularly useful when `should_rewrite_files` is passed as `False`.
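Since the configuration directory must contain specific file names, it can help to check it before invoking Piranha. The following is a minimal sketch, not part of the Piranha API: the helper names `missing_configuration_files` and `run_checked` are hypothetical, and only the `run_piranha_cli` call itself is taken from the usage shown above.

```python
from pathlib import Path

# File names required in the configuration directory; edges.toml is optional.
REQUIRED_FILES = ("piranha_arguments.toml", "rules.toml")


def missing_configuration_files(path_to_configurations):
    """Return the required configuration files absent from the directory."""
    config_dir = Path(path_to_configurations)
    return [name for name in REQUIRED_FILES if not (config_dir / name).is_file()]


def run_checked(path_to_codebase, path_to_configurations, should_rewrite_files=False):
    """Validate the configuration directory, then delegate to run_piranha_cli."""
    missing = missing_configuration_files(path_to_configurations)
    if missing:
        raise FileNotFoundError(f"missing configuration files: {', '.join(missing)}")
    # Imported lazily so the check above works even without the package installed.
    from polyglot_piranha import run_piranha_cli
    return run_piranha_cli(path_to_codebase, path_to_configurations,
                           should_rewrite_files=should_rewrite_files)
```

Defaulting `should_rewrite_files` to `False` in this wrapper is a design choice for dry runs: the returned summaries can be inspected before any file is modified.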
:computer: Command-line Interface
Get the platform-specific binary from releases or build it from source following the steps below:
- Install Rust
- `git clone https://github.com/uber/piranha.git`
- `cd piranha/polyglot/piranha`
- `cargo build --release` (`cargo build --release --no-default-features` for macOS)
- The binary will be generated under `target/release`
Polyglot Piranha
A refactoring tool that eliminates dead code related to stale feature flags.
USAGE:
polyglot_piranha --path-to-codebase <PATH_TO_CODEBASE> --path-to-configurations <PATH_TO_CONFIGURATIONS>
OPTIONS:
-c, --path-to-codebase <PATH_TO_CODEBASE>
Path to source code folder
-f, --path-to-configurations <PATH_TO_CONFIGURATIONS>
Directory containing the configuration files - `piranha_arguments.toml`, `rules.toml`,
and `edges.toml` (optional)
-h, --help
Print help information
-j, --path-to-output-summary <PATH_TO_OUTPUT_SUMMARY>
Path to output summary json
The output JSON is the serialization of the `PiranhaOutputSummary` produced for each file touched or analyzed by Piranha.
As the options show, the Python API is basically a wrapper around this command line interface.
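For pipelines that call the binary directly, the flags listed in the help text can be assembled from Python. This is a sketch under assumptions: the function names are hypothetical, the binary is assumed to be reachable under the name or path passed in, and the structure of the summary JSON is not interpreted here.

```python
import json
import subprocess


def build_piranha_command(binary, path_to_codebase, path_to_configurations,
                          path_to_output_summary=None):
    """Assemble the argument list using the -c, -f and -j flags from the help text."""
    command = [binary, "-c", path_to_codebase, "-f", path_to_configurations]
    if path_to_output_summary is not None:
        command += ["-j", path_to_output_summary]
    return command


def run_and_load_summary(binary, path_to_codebase, path_to_configurations,
                         path_to_output_summary):
    """Run the binary and return the parsed output summary JSON."""
    command = build_piranha_command(binary, path_to_codebase,
                                    path_to_configurations, path_to_output_summary)
    # check=True raises CalledProcessError if the binary exits with an error.
    subprocess.run(command, check=True)
    with open(path_to_output_summary, encoding="utf-8") as handle:
        return json.load(handle)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with paths that contain spaces.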
Languages supported
Language | Structural Find-Replace | Chaining Structural Find Replace | Stale Feature Flag Cleanup |
---|---|---|---|
Java | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
Kotlin | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
Java + Kotlin | :x: | :calendar: | :calendar: |
Swift | :heavy_check_mark: | :construction: | :construction: |
Go | :construction: | :construction: | :construction: |
Python | :calendar: | :calendar: | :calendar: |
TypeScript | :calendar: | :calendar: | :calendar: |
C# | :calendar: | :calendar: | :calendar: |
JavaScript | :calendar: | :calendar: | :calendar: |
Strings Resource | :heavy_check_mark: | :x: | :x: |
Contributions for the :calendar: (`planned`) languages or any other languages are welcome :)
Getting Started with demos
Running the Demos
We believe the easiest way to get started with Piranha is to build upon the demos.
To set up the demos, please follow the steps below:
- Clone the repository and change into the Polyglot Piranha directory: `git clone https://github.com/uber/piranha.git`, then `cd piranha/polyglot/piranha`
- Create a virtual environment: `python3 -m venv .env`, then `source .env/bin/activate`
- Install Polyglot Piranha:
  - `pip install .` to run the demos against the current source code (please install Rust, it takes less than a minute)
  - Or, `pip install polyglot_piranha` to run the demos against the latest release.
Currently, we have demos for the following:

Stale Feature Flag Cleanup:
- Run `python3 demo/stale_feature_flag_cleanup_demos.py`. It will execute the scenarios listed under demo/java/ff and demo/kt/ff. These scenarios use a simple feature flag API.
- In these demos the `configurations` contain:
  - `rules.toml`: expresses how to capture different feature flag APIs (`isTreated`, `enum constant`)
  - `piranha_arguments.toml`: expresses the flag behavior, i.e. the flag name and whether it is treated or not. Basically, the `substitutions` provided in `piranha_arguments.toml` can be used to instantiate the rules (reference).

Match-only rules:
- Run `python3 demo/match_only_demos.py`
- This demo also shows how the Piranha summary output can be used.
- `rules.toml`: expresses how to capture two patterns: (i) invocation of the method `fooBar("...")`, and (ii) invocation of the method `barFoo("...")` (but only in static methods)

Structural Find/Replace:
- Run `python3 demo/find_replace_demos.py`
- This demo shows how to use Piranha as a simple structural find/replace tool (that optionally hooks up to built-in cleanup rules)

Structural Find/Replace with Custom Cleanup:
- Run `python3 demo/find_replace_custom_cleanup_demos.py`
- This demo shows how to replace `new ArrayList<>()` with `Collections.emptyList()`. Note that it also adds the required import statement.

Please refer to our test cases at `/polyglot/piranha/test-resources/<language>/` as a reference for handling complicated scenarios.
Building upon the stale feature flag cleanup demo
First, check if Polyglot Piranha supports stale feature flag cleanup for the required language.
Then see if your API usage is similar to the ones shown in the demo (java-demo) or in the test resources (java-ff_system1, java-ff_system2, kt-ff_system1, kt-ff_system2).
If not, try to adapt these examples to your requirements. Further, you can study the tree-sitter query documentation to understand how tree-sitter queries work. It is recommended to read the section Adding support for a new feature flag system.
Then adapt the arguments file as per your requirements. For instance, you may want to update the values corresponding to `@stale_flag_name` and `@treated`. If your rules do not require other tags, feel free to remove them from your arguments file. In most cases the edges file is not required, unless your feature flag system API rules are inter-dependent.
More details for configuring Piranha: Adding support for a new feature flag system and Adding Cleanup Rules.
One can similarly build upon the other demos too.
Stale Feature Flag Cleanup in depth
Adding support for a new feature flag system
To onboard a new feature flag system, users will have to specify the `/rules.toml` and `/edges.toml` files (look [here](/polyglot/piranha/src/cleanup_rules/java)). The `rules.toml` will contain rules that identify the usage of a feature flag system API. Defining `edges.toml` is required if your feature flag system API rules are inter-dependent. For instance, you may want to delete a method declaration with specific annotations and then update its usages with some boolean value. Please refer to `test-resources/java` for detailed examples.

Adding a new API usage
The example below shows a usage of a feature flag API (`experiment.isTreated(STALE_FLAG)`) in an `if_statement`.
class PiranhaDemo {
    void demoMethod(ExperimentAPI experiment){
        // Some code
        if (experiment.isTreated(STALE_FLAG)) {
            // Do something
        } else {
            // Do something else
        }
        // Do other things
    }
}
When `STALE_FLAG` is treated, we would expect Piranha to refactor the code as shown below:
class PiranhaDemo {
    void demoMethod(ExperimentAPI experiment){
        // Some code
        // Do something
        // Do other things
    }
}
This can be achieved by adding a rule in the `input_rules.toml` file (as shown below):
[[rules]]
name = "Enum Based, toggle enabled"
query = """((
    (method_invocation
        name : (_) @n1
        arguments: ((argument_list
                        ([
                          (field_access field: (_)@f)
                          (_) @f
                        ])) )
    ) @mi
)
(#eq? @n1 "isTreated")
(#eq? @f "@stale_flag_name")
)"""
replace_node = "mi"
replace = "@treated"
groups = [ "replace_expression_with_boolean_literal"]
holes = ["treated", "stale_flag_name"]
This specifies a rule that matches against expressions like `exp.isTreated(SOME_FLAG_NAME)` and replaces them with `true` or `false`.
The `query` property of the rule contains a tree-sitter query that is matched against the source code.
The node captured by the tag name specified in the `replace_node` property is replaced with the pattern specified in the `replace` property.
The `replace` pattern can use the tags from the `query` to construct a replacement based on the match (like regex-replace).
Each rule also contains the `groups` property, which specifies the kind of change performed by this rule. Based on this group, the appropriate cleanup will be performed by Piranha. For instance, `replace_expression_with_boolean_literal` will trigger deep cleanups to eliminate dead code (like eliminating the `consequent` of an `if statement`) caused by replacing an expression with a boolean literal.
Currently, Piranha provides deep cleanups for edits that belong to the groups `replace_expression_with_boolean_literal`, `delete_statement`, and `delete_method`. Basically, by adding an appropriate entry to the groups, a user can hook up their rules to the pre-built cleanup rules.
Adding `"Cleanup Rule"` to the `groups` ensures that the user-defined rule is treated as a cleanup rule, not as a seed rule (for more details, refer to `demo/find_replace_custom_cleanup`).
A user can also define exclusion filters for a rule (`rules.constraints`). These constraints allow matching against the context of the primary match. For instance, we can write a rule that matches the expression `new ArrayList<>()` and excludes all instances that do not occur inside static methods (for more details, refer to `demo/match_only`).
At a higher level, we can say that Piranha first selects AST nodes matching `rules.query`, excluding those that match any of the `rules.constraints.queries` (within `rules.constraints.matcher`). It then replaces the node identified as `rules.replace_node` with the formatted (using matched tags) content of `rules.replace`.
Parameterizing the behavior of the feature flag API
The `rule` contains `holes`, or template variables, that need to be instantiated.
For instance, in the above rule `@treated` and `@stale_flag_name` need to be replaced with some concrete value so that the rule matches only the feature flag API usages corresponding to a specific flag, and replaces them specifically with `true` or `false`. To specify such a behavior, the user should create a `piranha_arguments.toml` file as shown below (assuming that the behavior of STALE_FLAG is treated):
language = ["java"]
substitutions = [
["stale_flag_name", "STALE_FLAG"],
["treated", "true"]
]
This file specifies that the user wants to perform this refactoring for `java` files.
The `substitutions` field captures the mapping between the tags and their corresponding concrete values. In this example, we specify that the tag named `stale_flag_name` should be replaced with `STALE_FLAG` and `treated` with `true`.
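To make the idea of instantiating holes concrete, here is a small illustrative sketch. It is not Piranha's implementation; the function name `instantiate_holes` is hypothetical, and the approach (a plain textual substitution of `@name` occurrences) is an assumption about how such templates can be filled in. Tags that are not listed in the substitutions, such as `@f`, are left untouched.

```python
import re


def instantiate_holes(text, substitutions):
    """Replace each @hole named in substitutions; leave other tags untouched."""
    values = dict(substitutions)

    def replace(match):
        name = match.group(1)
        # Unknown tags (e.g. query captures like @f) are returned unchanged.
        return values.get(name, match.group(0))

    return re.sub(r"@(\w+)", replace, text)


substitutions = [["stale_flag_name", "STALE_FLAG"], ["treated", "true"]]
predicate = '(#eq? @f "@stale_flag_name")'
print(instantiate_holes(predicate, substitutions))   # (#eq? @f "STALE_FLAG")
print(instantiate_holes("@treated", substitutions))  # true
```

With these substitutions, the query only matches calls whose argument is `STALE_FLAG`, and the replacement becomes the literal `true`.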
Adding Cleanup Rules
This section describes how to configure Piranha to support a new language. Users who do not intend to onboard a new language can skip this section.
This section will describe how to encode cleanup rules that are triggered based on the update applied to the flag API usages.
These rules should perform cleanups like simplifying boolean expressions, or if statements when the condition is constant, or deleting empty interfaces, or in-lining variables.
For instance, the example below shows a rule that simplifies an `or` operation where its `RHS` is true.
[[rules]]
name = "Or - right operand is True"
query = """
(
(binary_expression
    left : (_)* @other
    operator:"||"
    right: (true)
)
@b)"""
replace_node = "b"
replace = "true"
Currently, Piranha picks up the language-specific configurations from `src/cleanup_rules/<language>`.
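The effect of such boolean cleanup rules can be illustrated on a toy expression tree. The sketch below is only an analogy under stated assumptions: it works on nested Python tuples rather than tree-sitter ASTs, the `simplify` function is hypothetical, and it covers just `or` and `and` with a constant operand.

```python
def simplify(expr):
    """Simplify a toy boolean expression tree bottom-up.

    An expression is True, False, a variable name (str), or a tuple
    ("or" | "and", left, right).
    """
    if not isinstance(expr, tuple):
        return expr
    operator, left, right = expr
    left, right = simplify(left), simplify(right)
    if operator == "or":
        # Mirrors rules like "Or - right operand is True".
        if left is True or right is True:
            return True
        if left is False:
            return right
        if right is False:
            return left
    elif operator == "and":
        if left is False or right is False:
            return False
        if left is True:
            return right
        if right is True:
            return left
    return (operator, left, right)


print(simplify(("or", "x", True)))                   # True
print(simplify(("or", ("and", "a", False), "b")))    # b
```

Simplifying children first means that one rewrite can enable another higher up the tree, which is the same reason Piranha chains cleanup rules to enclosing nodes.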
Example
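Let's consider an example where we want to define a cleanup for a method `foobar` that stores the result of the flag API call in a local variable `x`, uses it in the condition `if (x || someCondition()) { return 100; }`, and ends with `return 0;`. After the cleanup, the body of `foobar` should only contain `return 100;`.
We would first define flag API rules as discussed in the section Adding support for a new feature flag system. Assuming this rule replaces the occurrence of the flag API corresponding to `SOME_STALE_FLAG` with `true`, we would have to define more cleanup rules as follows: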
- `R0`: deletes the enclosing variable declaration (i.e. `x`) (e.g. java-rules: `delete_variable_declarations`)
- `R1`: replaces the identifier with the RHS of the deleted variable declaration, within the body of the enclosing method where `R0` was applied, i.e. replaces `x` with `true` within the method body of `foobar` (e.g. java-rules: `replace_expression_with_boolean_literal`)
- `R2`: simplifies the boolean expression that encloses the node where `R1` was applied, for example replaces `true || someCondition()` with `true` (e.g. java-rules: `true_or_something`)
- `R3`: eliminates the enclosing if statement with a constant condition where `R2` was applied (`if (true) { return 100;}` → `return 100;`) (e.g. java-rules: `simplify_if_statement_true`, `remove_unnecessary_nested_block`)
- `R4`: eliminates unreachable code (`return 0;` in `return 100; return 0;`) in the enclosing block where `R3` was applied (e.g. java-rules: `delete_all_statements_after_return`)
The fact that `R2` has to be applied to the enclosing node where `R1` was applied is expressed by specifying the `edges.toml` file.
To define how these cleanup rules should be chained, one needs to specify edges (e.g. the java-edges file) between the groups and/or individual rules.
The edges can be labelled as `Parent`, `Global`, or even much finer scopes like `Method` or `Class` (or, let's say, `functions` in `go-lang`).
- A `Parent` edge implies that after Piranha applies the `"from"` rule to update the node `n1` in the AST to node `n2`, Piranha tries to apply the `"to"` rules on any ancestor of `n2` (e.g. `R1` → `R2`, `R2` → `R3`, `R3` → `R4`)
- A `Method` edge implies that after Piranha applies the `"from"` rule to update the node `n1` in the AST to node `n2`, Piranha tries to apply the `"to"` rules within the enclosing method's body (e.g. `R0` → `R1`)
- A `Class` edge implies that after Piranha applies the `"from"` rule to update the node `n1` in the AST to node `n2`, Piranha tries to apply the `"to"` rules within the enclosing class body (e.g. inlining a private field)
- A `Global` edge implies that after Piranha applies the `"from"` rule to update the node `n1` in the AST to node `n2`, Piranha tries to apply the `"to"` rules in the entire code base (e.g. inlining a public field)
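The chaining described by these scoped edges can be pictured as a small labelled graph. The sketch below is only a model of the `R0` to `R4` example under assumptions: the triple representation and the function name `triggered_rules` are hypothetical, and it computes which rules become candidates after a seed rule fires, not how Piranha actually matches or rewrites code.

```python
from collections import deque

# (from_rule, scope, to_rule) triples mirroring the R0..R4 example.
EDGES = [
    ("R0", "Method", "R1"),
    ("R1", "Parent", "R2"),
    ("R2", "Parent", "R3"),
    ("R3", "Parent", "R4"),
]


def triggered_rules(edges, seed):
    """Return (rule, scope) pairs reachable from seed, in breadth-first order."""
    successors = {}
    for source, scope, target in edges:
        successors.setdefault(source, []).append((target, scope))
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        rule = queue.popleft()
        for target, scope in successors.get(rule, []):
            if target not in seen:
                seen.add(target)
                order.append((target, scope))
                queue.append(target)
    return order


for rule, scope in triggered_rules(EDGES, "R0"):
    print(f"{rule} is tried in scope {scope}")
```

The scope attached to each edge tells where the next rule is attempted: `R1` anywhere in the enclosing method, and `R2` through `R4` only on ancestors of the node that was just rewritten.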
The `scope_config.toml` file specifies how to capture these fine-grained scopes like `method`, `function`, `lambda`, `class`.
First decide what scopes you need to capture; for instance, in Java we capture "Method" and "Class" scopes. Once you decide on the scopes, construct scope query generators similar to java-scope_config. Each scope query generator has two parts: (i) `matcher` is a tree-sitter query that matches the AST for the scope, and (ii) `generator` is a tree-sitter query with holes that is instantiated with the code snippets corresponding to tags when `matcher` is matched.
Piranha Arguments
The purpose of Piranha Arguments is to determine the behavior of Piranha.
- `language`: The programming language used by the source code
- `substitutions`: Seed substitutions for the rules (if any). In the case of stale feature flag cleanup, we pass the stale feature flag name and whether it is treated or not.
- `delete_file_if_empty`: enables deleting a file if it consequently becomes empty
- `delete_consecutive_new_lines`: enables deleting consecutive empty new lines
- `cleanup_comments`: enables cleaning up the comments associated with the deleted code elements like fields, methods or classes
- `cleanup_comments_buffer`: determines how many lines above to look up for a comment.
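When the arguments file is generated by a script, its text can be rendered from Python values. This is a hedged sketch: the function name `render_piranha_arguments` is hypothetical, and treating the optional flags as TOML booleans and `cleanup_comments_buffer` as an integer is an assumption based on their descriptions above, not something this document specifies.

```python
import json


def render_piranha_arguments(language, substitutions, **options):
    """Render the text of a piranha_arguments.toml file."""
    # json.dumps produces double-quoted strings, which are valid TOML strings.
    lines = [f"language = {json.dumps([language])}", "substitutions = ["]
    lines += [f"  {json.dumps(list(pair))}," for pair in substitutions]
    lines.append("]")
    for key, value in options.items():
        # TOML booleans are lowercase; integers are written as-is.
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines) + "\n"


print(render_piranha_arguments(
    "java",
    [("stale_flag_name", "STALE_FLAG"), ("treated", "true")],
    cleanup_comments=True,          # assumed to be a boolean
    cleanup_comments_buffer=2,      # assumed to be an integer
))
```

The `language` and `substitutions` entries follow the shape of the example arguments file shown earlier; only the optional settings rely on the assumed types.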
Contributing
Naming conventions for the rules
- We name the rules in the format `<verb>_<ast_kind>`. E.g., `delete_method_declaration` or `replace_expression_with_boolean_literal`
- We name the dummy rules in the format `<ast_kind>_cleanup`. E.g., `statement_cleanup` or `boolean_literal_cleanup`. Using dummy rules (e.g. java-rules: `boolean_literal_cleanup`) makes it easier and cleaner to specify the flow between rules.
Writing tests
Currently we maintain:
- Unit tests for the internal functionality, which can be found under `<models|utilities>/unit_test`.
- End-to-end tests for the configurations, which execute Piranha on the test scenarios in `test-resources/<language>/input` and check if the output is as expected (`test-resources/<language>/expected_treated` and `test-resources/<language>/expected_control`).

To add new scenarios to the existing tests for a given language, you can add them to a new file in the `input` directory and then create similarly named files with the expected output in the `expected_treated` and `expected_control` directories.
Update the `piranha_arguments_treated.toml` and `piranha_arguments_control.toml` files too.
To add tests for a new language, please add a new `<language>` folder inside `test-resources/` and populate the `input`, `expected_treated` and `expected_control` directories appropriately.