A Python 3 decompiler

pygetsource

pygetsource is a decompiler for Python 3, aiming to convert compiled bytecode instructions back into Python code.

Overview

When Python reads code, it first converts the instructions into bytecode. For instance:

a = 2

is converted into

LOAD_CONST 1
STORE_FAST 0

The latter form is typically stored in .pyc files and in the __code__ attribute of function objects. The goal of pygetsource is to reverse this process.
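As a minimal sketch of where this bytecode lives, the standard dis module can list the instructions attached to a function's __code__ attribute. The function name assign is only an illustration, and the exact opcode names printed depend on the Python version in use.

```python
import dis


def assign():
    a = 2


# The compiled form of the function body is stored on the function object.
code = assign.__code__

# List the opcode names; the exact sequence varies between Python versions.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# Names of the local variables, indexed by the argument of STORE_FAST.
print(code.co_varnames)
```

The integer arguments shown in the listing above (such as the 0 in STORE_FAST 0) are indices into tables like co_varnames and co_consts.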

The project takes its name from the inspect.getsource function, which returns the source code of a function. That function is not always applicable, for example when the original source file is no longer available.

pygetsource is still in development. It should be able to recover the source code of simple functions compiled with Python 3.7 to Python 3.11. It cannot yet recover classes, import statements, or try/except, match, and with blocks, and it does not support Python 2. While functional, the codebase has not been optimized and needs significant refactoring.

Finally, this software is distributed under the permissive MIT license.

Installation

Install the package using pip:

pip install pygetsource

Usage

import pygetsource


def func():
    a = 5

    while i < 10:
        if a == 2:
            break
        elif a == 4:
            return 3

        a = i // 5

    d = 3
    e = 4
    return e + d


print(pygetsource.getsource(func.__code__))

produces the following output:

a = 5
while i < 10:
    if a == 2:
        break
    elif a == 4:
        return 3
    else:
        a = i // 5
d = 3
e = 4
return e + d

Notice that an else clause was added after the elif branch. The two programs are nonetheless functionally equivalent.

When is this useful?

pygetsource proves useful when you need to recover the source code from a .pyc file, or when you want the source code of a function created through an eval call or a lambda expression. Indeed, inspect.getsource fails in these cases, either because the file the function originated from is not available, or because Python does not record the exact boundaries of the expression, which are required for lambda functions.
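The failure of inspect.getsource can be reproduced with the standard library alone. The sketch below builds a lambda through eval; the filename "<dynamic>" is an arbitrary placeholder chosen for this illustration.

```python
import inspect

# Build a function from a string; no source file backs this code object.
# "<dynamic>" is a made-up filename used only for this illustration.
dynamic = eval(compile("lambda x: x + 1", "<dynamic>", "eval"))

try:
    inspect.getsource(dynamic)
    source_available = True
except (OSError, TypeError):
    # inspect cannot locate any source text for the function.
    source_available = False

print(source_available)

# The bytecode is still attached to the function object, which is the
# input that pygetsource.getsource(dynamic.__code__) would work from.
print(len(dynamic.__code__.co_code) > 0)
```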

Alternatives

uncompyle6 is a Python decompiler that supports Python 2 and Python 3 up to 3.8. It uses a grammar-based approach to rebuild code from bytecode patterns. This approach is less effective for newer versions, which introduce various bytecode optimizations, especially for complex control structures like loops or the example given above. At the moment, it supports a larger range of Python syntax (such as with blocks or try/except blocks). It is also licensed under a copyleft GPL license, making it less suitable for larger projects with permissive licenses.

decompyle++ (pycdc) uses a state machine approach to build an AST iteratively by processing bytecode instructions. It's written in C++ and supports more Python versions than uncompyle6, but has more trouble decompiling complex control structures like nested loops, break patterns, comprehensions, or the example given above. It also uses the copyleft GPL license.

How does it work?

pygetsource uses a distinct approach. The bytecode instructions are initially converted into a directed graph, representing the program's flow. This graph is then iteratively reduced, processing each node based on its opcode, argument, and position and generating the AST as it goes. This method allows us to rely more on high-level patterns and less on Python’s idiosyncrasies when recreating complex structures like nested loops or break/return statements, and handle Python versions from 3.7 to 3.11 with the same codebase.
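To illustrate the first step only, here is a rough sketch of how bytecode can be turned into a directed graph with the standard dis module. This is not pygetsource's implementation: the function name build_flow_graph and the NO_FALLTHROUGH list are assumptions made for the example, and exception handling edges are ignored.

```python
import dis

# Opcodes after which execution never continues to the next instruction.
# This list is a simplification chosen for the sketch.
NO_FALLTHROUGH = {
    "RETURN_VALUE", "RETURN_CONST", "RAISE_VARARGS", "RERAISE",
    "JUMP_FORWARD", "JUMP_ABSOLUTE", "JUMP_BACKWARD",
    "JUMP_BACKWARD_NO_INTERRUPT",
}
JUMP_OPCODES = set(dis.hasjrel) | set(dis.hasjabs)


def build_flow_graph(code):
    """Map each instruction offset to the offsets that may execute next."""
    instructions = list(dis.get_instructions(code))
    graph = {ins.offset: [] for ins in instructions}
    for index, ins in enumerate(instructions):
        # For jump instructions, dis resolves argval to the target offset.
        if ins.opcode in JUMP_OPCODES:
            graph[ins.offset].append(ins.argval)
        # Most instructions also fall through to the following one.
        if ins.opname not in NO_FALLTHROUGH and index + 1 < len(instructions):
            graph[ins.offset].append(instructions[index + 1].offset)
    return graph


def sample(n):
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total


graph = build_flow_graph(sample.__code__)
for offset, successors in graph.items():
    print(offset, "->", successors)
```

In the printed graph, conditional jumps appear as nodes with two successors, and the loop appears as an edge pointing back to an earlier offset.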

In contrast with uncompyle6 and pycdc, pygetsource uses the ast and astunparse libraries to generate the source code from the generated AST.

Here is an example of a graph being reduced:

[Figure: Graph reduction]

When is a decompilation successful?

Since the compilation process is not injective, it's impossible to recover the exact original source code: multiple Python programs can yield the same bytecode instructions. Also, the original source code is typically unavailable for comparison (why would you use this software otherwise?).

If we recompile the generated program, we can compare the two sets of bytecode instructions to ensure functional equivalence. However, Python may introduce no-op codes (like 'NOP') that might cause this verification to fail despite the two code objects being functionally equivalent.
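A simplified version of this check can be sketched with compile and dis. The helper name opnames_without_noops and the two source strings are assumptions made for the example; the comparison only looks at opcode names and ignores arguments.

```python
import dis


def opnames_without_noops(source):
    """Compile source and list its opcode names, skipping NOP instructions."""
    code = compile(source, "<example>", "exec")
    return [
        ins.opname
        for ins in dis.get_instructions(code)
        if ins.opname != "NOP"
    ]


# Hypothetical original and recovered programs: they differ only in a comment.
original = "x = 1  # initial value\ny = x + 2\n"
recovered = "x = 1\ny = x + 2\n"
different = "x = 1\ny = x\n"

same = opnames_without_noops(original) == opnames_without_noops(recovered)
changed = opnames_without_noops(original) != opnames_without_noops(different)
print(same, changed)
```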

Instead, pygetsource compares the graph of the original code object with the graph of the recovered code object, after a pruning step. During this step, no-op codes are removed, jump instructions are pruned (while maintaining edges between source and target nodes), and dead-code is eliminated.
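The pruning step can be pictured on a toy graph. The sketch below is an illustration under assumptions, not the library's code: the function name prune, the dictionary layout, and the example nodes are all hypothetical.

```python
def prune(graph, labels, entry, skip=("NOP", "JUMP_FORWARD")):
    """Drop unreachable nodes and bypass nodes whose label is in skip.

    graph maps a node to its successors; labels maps a node to an opcode name.
    """
    # Dead-code elimination: keep only the nodes reachable from the entry.
    reachable, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(graph[node])

    def resolve(node, seen=()):
        # Follow skipped nodes until reaching nodes that are kept.
        if labels[node] not in skip:
            return [node]
        targets = []
        for nxt in graph[node]:
            if nxt not in seen:
                targets.extend(resolve(nxt, seen + (node,)))
        return targets

    pruned = {}
    for node in sorted(reachable):
        if labels[node] in skip:
            continue
        successors = []
        for nxt in graph[node]:
            for target in resolve(nxt):
                if target not in successors:
                    successors.append(target)
        # Edges through skipped nodes are kept between the kept endpoints.
        pruned[node] = successors
    return pruned


# Toy graph: node 4 is unreachable, nodes 1 and 2 are a no-op and a jump.
toy_graph = {0: [1], 1: [2], 2: [3], 3: [], 4: [3]}
toy_labels = {
    0: "LOAD_CONST", 1: "NOP", 2: "JUMP_FORWARD",
    3: "RETURN_VALUE", 4: "LOAD_CONST",
}
pruned_graph = prune(toy_graph, toy_labels, entry=0)
print(pruned_graph)
```

Two code objects whose pruned graphs match in this sense would be treated as equivalent, even if one of them contains extra no-ops or jumps.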

Contributing

Contributions are welcome. Feel free to open an issue or submit a pull request. Any issue related to the decompilation process should include the version of the Python interpreter used to generate the bytecode, the source code of the function, and the bytecode instructions as printed by dis.dis(code).

To install the project in development mode, clone the repository and run pip install -e '.[dev]' in the root directory. You can then run the tests using pytest. Make sure graphviz is installed first. If you're on macOS and have trouble with the Python pygraphviz package, try installing it using the following command:

pip install \
    --global-option=build_ext \
    --global-option="-I$(brew --prefix graphviz)/include/" \
    --global-option="-L$(brew --prefix graphviz)/lib/" \
    pygraphviz

To inspect a failure case, use the debug=True parameter of the getsource function. This will display the graph at different stages of the reduction process, as well as various debug information.
