🐍 depyf: decompile Python functions, from bytecode to source code!

depyf is used primarily to understand the bytecode produced by TorchDynamo, part of the PyTorch 2.0 compiler stack.

Installation

Stable release on PyPI: pip install depyf

Nightly code: pip install git+https://github.com/youkaichao/depyf.git

Usage

Simple Usage:

# obtain a callable object or a code object
def func():
    print("hello, world!")

# import the `decompile` function
from depyf import decompile

# and decompile it into source code!
print(decompile(func))

Example output:

def func():
    print('hello, world!')
    return None

The output source code is semantically equivalent to the original function, but not syntactically identical. It spells out details that are implicit in the Python source. For example, the output above explicitly returns None, a statement that is usually omitted.
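
The explicit return can be confirmed with the standard-library `dis` module, independently of depyf. The following is a minimal sketch; the helper name `has_explicit_return` is made up for illustration, and the exact opcode names depend on the Python version.

```python
import dis


def func():
    print("hello, world!")


def has_explicit_return(fn):
    """Return True if the compiled bytecode of `fn` contains a return instruction."""
    # Opcode names differ between Python versions (e.g. RETURN_VALUE, RETURN_CONST),
    # so match on the shared prefix rather than on one exact name.
    return any(ins.opname.startswith("RETURN") for ins in dis.get_instructions(fn))


# The source of `func` has no `return` statement, yet its bytecode returns a value.
print(has_explicit_return(func))
```

A decompiler works from the bytecode, so it sees this return instruction and writes it out as `return None`.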

Understanding PyTorch-generated bytecode

First, run a PyTorch program compiled by TorchDynamo. This example uses the torch._dynamo.optimize decorator with a custom backend:

from typing import List

import torch
from torch import _dynamo as torchdynamo


# custom backend: print the captured FX graph, then run it unchanged
def my_compiler(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]):
    print("my_compiler() called with FX graph:")
    gm.graph.print_tabular()
    return gm.forward  # return a python callable


@torchdynamo.optimize(my_compiler)
def toy_example(a, b):
    x = a / (torch.abs(a) + 1)
    if b.sum() < 0:
        b = b * -1
    return x * b


for _ in range(100):
    toy_example(torch.randn(10), torch.randn(10))

Second, get the compiled code and the guard code from PyTorch:

from torch._dynamo.eval_frame import _debug_get_cache_entry_list
cache_entries = _debug_get_cache_entry_list(toy_example._torchdynamo_orig_callable.__code__)
guard, code = cache_entries[0]

Third, decompile both to see how they work:

from depyf import decompile

print("guard code:")
print(decompile(guard))

print("compiled code:")
print(decompile(code))

Output on my computer:

guard code:
def guard(L):
    if not getattr(___guarded_code, 'valid'):
        return False
    else:
        _var0 = L['a']
        if not hasattr(_var0, '_dynamo_dynamic_indices') == False:
            return False
        else:
            _var1 = L['b']
            if not hasattr(_var1, '_dynamo_dynamic_indices') == False:
                return False
            elif not ___is_grad_enabled():
                return False
            elif ___are_deterministic_algorithms_enabled():
                return False
            elif not ___is_torch_function_enabled():
                return False
            elif not getattr(utils_device, 'CURRENT_DEVICE') == None:
                return False
            elif not ___check_tensors(_var0, _var1, tensor_check_names=
                tensor_check_names):
                return False
            else:
                return True

compiled code:
def toy_example(a, b):
    __temp_1 = __compiled_fn_0(a, b)
    x = __temp_1[0]
    if __temp_1[1]:
        return __resume_at_30_1(b, x)
    else:
        return __resume_at_38_2(b, x)
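
The guard and the compiled code work as a pair: the guard receives the frame's local variables (the `L` argument above) and decides whether the cached compiled code may be reused. The sketch below imitates that check-then-dispatch pattern in plain Python. It is a simplified illustration, not Dynamo's actual implementation: `CacheEntry`, `run_with_cache`, and the toy guard are hypothetical names, and a plain fallback function stands in for what happens when no guard passes.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class CacheEntry(NamedTuple):
    """Hypothetical pairing of a guard with the code it protects."""
    guard: Callable[[Dict[str, Any]], bool]
    code: Callable[..., Any]


def run_with_cache(entries: List[CacheEntry], fallback: Callable[..., Any], **local_vars):
    """Run the first cached code whose guard accepts the locals, else the fallback."""
    for entry in entries:
        if entry.guard(local_vars):
            return entry.code(**local_vars)
    return fallback(**local_vars)


def int_guard(L):
    # Toy guard: the cached code is only valid when both inputs are ints.
    return isinstance(L["a"], int) and isinstance(L["b"], int)


def fast_add(a, b):
    return ("cached", a + b)


def slow_add(a, b):
    return ("fallback", a + b)


entries = [CacheEntry(guard=int_guard, code=fast_add)]
print(run_with_cache(entries, slow_add, a=1, b=2))    # guard passes
print(run_with_cache(entries, slow_add, a=1.5, b=2))  # guard fails
```

Reading the decompiled guard next to the decompiled code shows which input properties the cached code depends on.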

Hopefully, this package makes Python bytecode easier to understand!

⚠️ The example above should be run with a PyTorch nightly build. Some debug functions, such as _debug_get_cache_entry_list, might not exist in stable releases yet.

Python Version Coverage

The following Python versions are tested:

  • Python 3.11
  • Python 3.10
  • Python 3.9
  • Python 3.8
  • Python 3.7

You can see the coverage report by simply running python python_coverage.py.

Full Python Syntax Is Not Supported

This package is intended for understanding the bytecode generated by PyTorch, and does not aim to cover all Python syntax. For example, asynchronous constructs such as async/await are not supported.

Support for very complicated control flow (while-loop/for-loop) is limited.

Contributions are welcome!

If you find any error in the decompilation, feel free to open issues or pull requests to fix it!

How it works

The code first analyzes the bytecode to discover basic blocks (straight-line instruction sequences with no internal control flow). It then builds a control flow graph and decompiles the bytecode into source code by traversing the graph.
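
The first step, finding where basic blocks begin, can be sketched with the standard-library `dis` module. This is an illustrative approximation under simple assumptions, not depyf's actual implementation: the function name `find_block_starts` is hypothetical, and exception-handling edges are ignored.

```python
import dis

# Instructions after which control never falls through to the next one.
TERMINATORS = ("RETURN_VALUE", "RETURN_CONST", "RAISE_VARARGS", "RERAISE")


def find_block_starts(fn):
    """Return the sorted bytecode offsets at which basic blocks begin."""
    instructions = list(dis.get_instructions(fn))
    jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)
    starts = {instructions[0].offset}  # the entry point starts a block
    for index, ins in enumerate(instructions):
        is_jump = ins.opcode in jump_opcodes
        if is_jump:
            # For jump instructions, dis resolves argval to the target offset.
            starts.add(ins.argval)
        if (is_jump or ins.opname in TERMINATORS) and index + 1 < len(instructions):
            # The instruction following a jump or return starts a new block.
            starts.add(instructions[index + 1].offset)
    return sorted(starts)


def branchy(x):
    if x:
        return 1
    return 2


def straight(a):
    return a + 1


print(find_block_starts(branchy))   # several blocks; offsets vary by Python version
print(find_block_starts(straight))  # a single block starting at the entry point
```

Connecting each block to its jump targets and fall-through successor then gives the control flow graph that the decompiler traverses.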
