
Bicycle Repair Man - Rewrite Python Sources


Bicycle Repair Man

BRM is a Python source rewriting library with the freedom you are looking for. It lets you intervene at the lexing stage, before any tree-like structure is constructed. You are free to do anything: change already constructed tokens by matching them against your patterns, add new tokens and modify the lexer rules, or refactor tons of Python files without losing any information (full round-trippability).

That long paragraph can be boring, so let me show you some examples of what you can actually do.

I hate the plus operator

from brm import TokenTransformer

class DestroyAllOfThem(TokenTransformer):
    # Called for every plus token; swap its text for a minus sign.
    def visit_plus(self, token):
        return token._replace(string="-")

transformer = DestroyAllOfThem()
assert transformer.transform("(2p) + 2 # with my precious comment") == "(2p) - 2 # with my precious comment"

Why do I have to write x ** 0.5 to get the square root of x? Can't I just use an operator?

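from brm import TokenTransformer, pattern

class SquareRoot(TokenTransformer):

    # Registers "√" as a new token that patterns can refer to as "squareroot".
    def register_squareroot(self):
        return "√"

    # Matches "√" followed by a number and rewrites it as int(<number> ** 0.5).
    @pattern("squareroot", "number")
    def remove_varprefix(self, operator, token):
        return self.quick_tokenize(f"int({token.string} ** 0.5)")

sqr = SquareRoot()
assert eval(sqr.transform("√9 # some more comments")) == 3
# The comment survives the rewrite.
assert "some more comments" in sqr.transform("√9 # some more comments")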

Making transformers permanent

If you like your transformer and want to use it on real Python files, you can put it in the ~/.brm folder. Actually, you shouldn't depend on that exact folder; just write your transformer and run cp transformer.py $(python -m brm), since python -m brm should print the right location for transformers. After that, add a # coding: brm comment to every Python file that should use your transformers. If the file already declares an encoding, you can keep it with # coding: brm-<encoding>, for example # coding: brm-utf8. Let's do an example.

from brm import TokenTransformer, pattern

class AlwaysTrue(TokenTransformer):

    STRICT = False

    @pattern("name", "*any", "colon")
    def always_true_if(self, *tokens):
        statement, *_, colon = tokens
        if statement.string not in {"if", "elif", "while"}:
            return
        true, = self.quick_tokenize("True")
        return (statement, true, colon)

First of all, the STRICT flag says whether we care about how the result looks. In this case we don't: think of this as a pre-processor, where no one except the interpreter itself is going to see the processed text. So we are free to swallow all location information (that won't break the integrity of the code).

This example's pattern starts with a name token, which could be a keyword, although during lexing you can't know whether it is one. Then it catches everything that follows until it gets to a colon. The transformer function assigns the first token to statement and the last token to colon, and swallows all of the tokens between them. The swallowed tokens make up the actual condition, but they won't be needed because we'll just replace the condition with True. After setting statement, we access its string attribute (which contains the value of the token) and check whether it is one of the keywords we care about. If it is not, we return None, which means this transformer didn't change anything and BRM should continue. If it is, we use the quick_tokenize method to get tokens from a string. It returns a sequence, so we unpack it with a trailing comma. At the end we return a sequence of tokens: the statement itself (e.g. the if keyword), the True token and the colon (:).

Once the transformer is finished, we put it in BRM's pre-processor folder. Transformers in that folder are executed on every Python interpreter run, and they transform Python sources that use the special brm coding (# coding: brm).

(.venv) [  9:12ÖS ]  [ isidentical@x200:~ ]
 $ cat -n r.py
     1  # coding: brm
     2
     3  a = 2
     4  if a > 2:
     5      print("LOL")
(.venv) [  9:12ÖS ]  [ isidentical@x200:~ ]
 $ cp test.py $(python -m brm)
(.venv) [  9:12ÖS ]  [ isidentical@x200:~ ]
 $ python r.py
LOL

TA-DA!

BRM Pattern Syntax

When BRM searches for a pattern, it sees Python source code as a stream of token types. Imagine this code:

if a == x:
    2 + 2 # lol

The text representation of these tokens looks like this:

NAME NAME EQEQUAL NAME COLON NEWLINE INDENT NUMBER PLUS NUMBER COMMENT NEWLINE DEDENT ENDMARKER
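If you want to check that stream yourself, here is a minimal sketch that uses only the standard library's tokenize and token modules, not BRM itself. The helper name token_type_names is made up for this illustration, and BRM's own lexer may add or rename tokens on top of this.

```python
import io
import token
import tokenize

SOURCE = "if a == x:\n    2 + 2 # lol\n"

def token_type_names(source):
    """Return the exact token type name of every token in `source`.

    Hypothetical helper for illustration; it only wraps the stdlib tokenizer.
    """
    readline = io.StringIO(source).readline
    return [
        token.tok_name[tok.exact_type]  # exact_type tells PLUS and COLON apart from a generic OP
        for tok in tokenize.generate_tokens(readline)
    ]

print(" ".join(token_type_names(SOURCE)))
```

Note that INDENT, DEDENT and ENDMARKER show up even though nothing in the source text spells them out; the tokenizer generates them from the layout.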

And BRM processes it like this:

brm pattern show gif

If you want to match the binary plus operation here (2 + 2), you can create a pattern with number, plus, number. If you want to match the if statement's body, you can use the implicit tokens (which we can't show in the gif :D) called INDENT and DEDENT. So if there are only simple expressions or statements inside the if's body, you can create a pattern that starts with an indent token and takes anything between that indent and the dedent token: indent, *any, dedent. Any is a pattern that expands to any token, like regex's (.*?) capturing group.
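To make the idea concrete, here is a rough sketch of matching such patterns against a stream of token type names. This is not BRM's real matcher: match_at, find_first and lowercase_types are names invented for this example, and it assumes that *any is lazy (like (.*?)) and may also match zero tokens.

```python
import io
import token
import tokenize

def lowercase_types(source):
    """Tokenize `source` with the stdlib and return lowercase exact type names."""
    readline = io.StringIO(source).readline
    return [token.tok_name[tok.exact_type].lower()
            for tok in tokenize.generate_tokens(readline)]

def match_at(types, pattern, start):
    """Return the end index (exclusive) of a match beginning at `start`, else None."""
    if not pattern:
        return start
    head, rest = pattern[0], pattern[1:]
    if head == "*any":
        # Assumption: lazy matching, so try to swallow 0, 1, 2, ... tokens
        # and stop at the first point where the rest of the pattern fits.
        for end in range(start, len(types) + 1):
            result = match_at(types, rest, end)
            if result is not None:
                return result
        return None
    if start < len(types) and types[start] == head:
        return match_at(types, rest, start + 1)
    return None

def find_first(types, pattern):
    """Return (start, end) of the first match of `pattern` in `types`, else None."""
    for start in range(len(types)):
        end = match_at(types, pattern, start)
        if end is not None:
            return start, end
    return None

types = lowercase_types("if a == x:\n    2 + 2 # lol\n")
print(find_first(types, ["number", "plus", "number"]))
print(find_first(types, ["indent", "*any", "dedent"]))
```

The returned pair is a slice into the token list, so types[start:end] gives back the matched token types; a real transformer would receive the tokens themselves rather than their names.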

brm pattern matching complex gif

Wanna try this? Just run python visualize.py --help

Some minor functions

  • quick_tokenize(source: str) => List[TokenInfo], give it some source and get back tokens that can reconstruct that same source.
  • quick_untokenize(tokens: Sequence[TokenInfo]) => str, give it a sequence of tokens and get their string form without any use of positions. It helps when you don't want to deal with token location issues. If you want pretty output that uses the token locations, call tokenize.untokenize directly.
  • directional_length(tokens: Sequence[TokenInfo]) => int, calculate the X distance between the start and the end of the sequence.
  • shift_all(tokens: Sequence[TokenInfo], x_offset: int, y_offset: int) => int, shift positions of all tokens in the given sequence
  • get_type(token: TokenInfo)
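
To give a feel for what position helpers like these work with, here is a small sketch built only on the standard library's TokenInfo tuples. It does not call BRM: horizontal_span and shift_positions are hypothetical stand-ins written for this example, and their exact behaviour (for instance what they return) is an assumption, not a description of directional_length or shift_all.

```python
import io
import tokenize

def tokens_of(source):
    """Tokenize `source` with the stdlib tokenizer."""
    return list(tokenize.generate_tokens(io.StringIO(source).readline))

def horizontal_span(tokens):
    """Column distance from the start of the first token to the end of the last.

    Assumption: this is roughly what an "X distance" over a sequence means.
    """
    return tokens[-1].end[1] - tokens[0].start[1]

def shift_positions(tokens, x_offset=0, y_offset=0):
    """Return copies of `tokens` moved by x_offset columns and y_offset rows."""
    shifted = []
    for tok in tokens:
        (start_row, start_col), (end_row, end_col) = tok.start, tok.end
        shifted.append(tok._replace(
            start=(start_row + y_offset, start_col + x_offset),
            end=(end_row + y_offset, end_col + x_offset),
        ))
    return shifted

# Keep only the three tokens of the expression itself: 2, +, 2.
expression = tokens_of("2 + 2\n")[:3]
moved = shift_positions(expression, x_offset=4, y_offset=2)

print(horizontal_span(expression))
print(moved[0].start, moved[-1].end)
```

TokenInfo is a named tuple, so _replace builds a new token and leaves the original untouched, which is the same trick the transformers above use to change a token's string.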
