Python cross-version byte-code deparser
uncompyle6
A native Python cross-version Decompiler and Fragment Decompiler. The successor to decompyle, uncompyle, and uncompyle2.
Introduction
uncompyle6 translates Python bytecode back into equivalent Python source code. It accepts bytecode from Python version 1.5 and from versions 2.1 through 3.7 or so, including PyPy bytecode and Dropbox’s Python 2.5 bytecode.
Why this?
There were a number of decompyle, uncompyle, uncompyle2, and uncompyle3 forks around. All of them came from basically the same code base, and almost all of them were no longer actively maintained. Only one handled Python 3, and even there, only 3.2 or 3.3, depending on which code is used. This code pulls these together and moves forward.
This project has the most complete support for Python 3.3 and above and the best all-around Python support.
We are serious about testing, and use automated processes to find bugs. In the issue trackers for other decompilers, you will find a number of bugs we’ve found along the way. Very few, if any, of them have been fixed in the other decompilers.
Another thing that makes this different from other CPython bytecode decompilers is the ability to deparse just fragments and give source-code information around a given bytecode offset.
I use this to deparse fragments of code inside my trepan debuggers. For that, I need to record text fragments for all bytecode offsets (of interest). This purpose, although largely compatible with the original intention, is a little bit different. See this for more information.
The idea of Python fragment deparsing given an instruction offset can be used in showing stack traces, or in any program that wants to show a location in more detail than just a line number. It can also be used when source-code information does not exist and there is just bytecode information.
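To see why an instruction offset is a finer-grained location than a line number, here is a minimal sketch that uses only the standard library `dis` module, not uncompyle6's own fragment API. The names `line_for_offset` and `sample` are made up for this illustration.

```python
import dis


def line_for_offset(code, offset):
    """Return the source line covering a bytecode offset, or None.

    Hypothetical helper for illustration; it relies only on
    dis.findlinestarts(), which yields (offset, lineno) pairs in
    increasing offset order.
    """
    line = None
    for start, lineno in dis.findlinestarts(code):
        if start > offset:
            break
        line = lineno
    return line


def sample(a, b):
    total = a + b
    return total * 2


code = sample.__code__
offsets = [ins.offset for ins in dis.get_instructions(code)]
lines = {line_for_offset(code, off) for off in offsets}

# Several instruction offsets share each source line, so an offset
# pins down a location more precisely than a line number can.
for off in offsets:
    print(off, line_for_offset(code, off))
```

A fragment deparser goes one step further and associates each offset with the piece of source text it came from, rather than with a whole line.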
Requirements
This project requires Python 2.6 or later, PyPy 3-2.4, or PyPy-5.0.1. Python versions 2.4-2.7 are supported in the python-2.4 branch. The bytecode files it can read have been tested on Python bytecodes from versions 1.5, 2.1-2.7, and 3.0-3.6, and from the above-mentioned PyPy versions.
Installation
This uses setup.py, so it follows the standard Python routine:
pip install -e .
pip install -r requirements-dev.txt
python setup.py install # may need sudo
# or if you have pyenv:
python setup.py develop
A GNU makefile is also provided, so make install (possibly as root or with sudo) will do the steps above.
Testing
make check
A GNU makefile has been added to smooth over running the right command, and to run tests from fastest to slowest.
If you have remake installed, you can see the list of all tasks, including tests, via remake --tasks
Usage
Run
$ uncompyle6 *compiled-python-file-pyc-or-pyo*
For usage help:
$ uncompyle6 -h
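If you do not have a compiled file at hand, a small sketch like the following produces one using the standard library `py_compile` module. The file names and the helper `make_sample_pyc` are made up for illustration, and the resulting bytecode is only useful as input here if the Python that runs the sketch is one of the supported bytecode versions.

```python
import os
import py_compile
import tempfile


def make_sample_pyc(directory):
    """Write a tiny module into `directory` and byte-compile it.

    Hypothetical helper: the file names are arbitrary.
    """
    src = os.path.join(directory, "sample.py")
    with open(src, "w") as f:
        f.write("def greet(name):\n    return 'hello ' + name\n")
    pyc = os.path.join(directory, "sample.pyc")
    # doraise=True turns compilation problems into exceptions.
    py_compile.compile(src, cfile=pyc, doraise=True)
    return pyc


with tempfile.TemporaryDirectory() as workdir:
    pyc_path = make_sample_pyc(workdir)
    pyc_size = os.path.getsize(pyc_path)
    print(pyc_path, pyc_size)
```

The path printed is what you would pass on the command line in place of the compiled-file argument shown above.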
If you want strong verification of the correctness of the decompilation process, add the --verify option. However, there are situations where this will report a failure even though the generated program is semantically equivalent. The --weak-verify option will tell you if something is definitely wrong. Generally, large swaths of code are decompiled correctly, if not the entire program.
You can also cross-compare the results with pycdc. Since the two work differently, bugs here often do not appear there, and vice versa.
Known Bugs/Restrictions
The biggest known and possibly fixable (but hard) problem has to do with handling control flow. (Python has probably the most diverse and screwy set of compound statements I’ve ever seen; there are a number, like else clauses on loops and try blocks, that I suspect most programmers don’t know about.)
All of the Python decompilers I have looked at have the same problem. In some cases we can detect an erroneous decompilation and report that.
Verification is the process of decompiling bytecode, compiling the result with a Python for that bytecode version, and then comparing the bytecode produced with the original bytecode. Some allowance is made for inessential differences, but other semantically equivalent differences are not caught. For example, 1 and 0 is decompiled to the equivalent 0; remnants of the first true evaluation (1) are lost when Python compiles this. When Python next compiles 0, the resulting code is simpler.
Weak verification, on the other hand, does not check bytecode for equivalence, but does check that the resulting decompiled source is a valid Python program by running the Python interpreter. Because the Python language has changed so much, for best results you should check with the same Python version that was used to produce the bytecode.
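The compile-and-compare idea can be sketched with the built-in `compile()` function. This is not uncompyle6's actual verification code; `code_signature` and `same_bytecode` are hypothetical helpers, and the sketch compares two source strings under the running interpreter rather than starting from a bytecode file.

```python
import types


def code_signature(code):
    """Reduce a code object to the parts a bytecode comparison cares about.

    Line numbers and file names are deliberately left out, as one
    example of an "inessential difference".
    """
    consts = tuple(
        code_signature(c) if isinstance(c, types.CodeType)
        else (type(c).__name__, repr(c))
        for c in code.co_consts
    )
    return (code.co_code, consts, code.co_names, code.co_varnames)


def same_bytecode(original_src, decompiled_src):
    """Compile both sources and compare the resulting bytecode."""
    original = compile(original_src, "<original>", "exec")
    decompiled = compile(decompiled_src, "<decompiled>", "exec")
    return code_signature(original) == code_signature(decompiled)


# Comments and redundant parentheses do not survive compilation,
# so these two sources verify as equal.
print(same_bytecode("x = a + b  # note\n", "x = (a + b)\n"))
# A genuinely different operation is caught.
print(same_bytecode("x = a + b\n", "x = a - b\n"))
```

A strict comparison like this is exactly what reports a mismatch for sources that are semantically equivalent but compile to different instructions.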
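A minimal sketch of the weak check, assuming the checking interpreter is the one running the sketch: ask Python to compile the decompiled text and see whether it raises a syntax error. The helper name `is_valid_python` is made up for illustration and is not part of uncompyle6.

```python
def is_valid_python(source_text):
    """Return True if the running interpreter accepts `source_text`.

    Only syntax is checked; the code is compiled but never executed.
    """
    try:
        compile(source_text, "<decompiled>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_python("for i in range(3):\n    pass\n"))
print(is_valid_python("def broken(:\n    pass\n"))
```

Because syntax differs between Python versions, the answer depends on which interpreter does the checking, which is why matching the bytecode's Python version matters.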
Finally, we have automated running the standard Python tests after first compiling and decompiling the test program. Results here are a bit weak (if not better than most other Python decompilers). But over time this will probably get better.
Python support is strongest in Python 2 for 2.7 and drops off as you get further away from that. Support is also probably pretty good for Python 2.3-2.4, since a lot of the goodness of the early version of the decompiler from that era has been preserved (and Python compilation in that era was minimal).
Later distributions average about 200 files. There is some work to do on the lower end Python versions which is more difficult for us to handle since we don’t have a Python interpreter for versions 1.5, 1.6, and 2.0.
In the Python 3 series, Python support is strongest around 3.4 or 3.3 and drops off as you move further away from those versions. Python 3.6 changes things drastically by using word codes rather than byte codes. That has been addressed, but 3.6 also changes the function-call opcodes and their semantics, and it has more problems with control flow than 3.5 has. Between Python 3.5, 3.6, and 3.7 there have been major changes to the MAKE_FUNCTION and CALL_FUNCTION instructions. Those are not handled yet.
Currently not all Python magic numbers are supported. Specifically, in some versions of Python, notably Python 3.6, the magic number has changed several times within a version. We support only the released magic. There are also customized Python interpreters, notably Dropbox’s, which use their own magic and encrypt bytecode. With the exception of Dropbox’s old Python 2.5 interpreter, this kind of thing is not handled.
We also don’t handle PJOrion obfuscated code. For that, try PJOrion Deobfuscator to unscramble the bytecode into valid bytecode before trying this tool.
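For readers unfamiliar with magic numbers, here is a small sketch of reading one from the start of a bytecode file. It assumes the usual CPython layout of a two-byte little-endian number followed by `\r\n`. The `KNOWN_MAGICS` table is a hypothetical one-entry stand-in holding only the Python 2.7 value, and `read_magic` and `describe` are illustrative names, not uncompyle6 functions.

```python
import struct

# Hypothetical allow-list for illustration; a real tool carries a
# much larger table, and customized interpreters use other values.
KNOWN_MAGICS = {62211: "2.7"}


def read_magic(header):
    """Return the integer magic number from the first four bytes of a .pyc."""
    if len(header) < 4 or header[2:4] != b"\r\n":
        raise ValueError("not a recognizable CPython bytecode header")
    (number,) = struct.unpack("<H", header[:2])
    return number


def describe(header):
    """Map a header to a version string, or report it as unsupported."""
    return KNOWN_MAGICS.get(read_magic(header), "unsupported")


print(describe(b"\x03\xf3\r\n"))   # header bytes written by Python 2.7
print(describe(b"\x00\x00\r\n"))   # a value missing from the table
```

A magic number that changed during a development cycle, or one chosen by a customized interpreter, simply falls outside the table and is reported as unsupported.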
Handling pathologically long lists of expressions or statements is slow.
There is lots to do, so please dig in and help.
See Also
https://github.com/zrax/pycdc : supports all versions of Python and is written in C++. Support for later Python 3 versions is a bit lacking though.
https://code.google.com/archive/p/unpyc3/ : supports Python 3.2 only. The above projects use a different decompiling technique than what is used here.
https://github.com/figment/unpyc3/ : fork of the above, but supports Python 3.3 only. Includes some fixes, such as support for function annotations.
The HISTORY file.
https://github.com/rocky/python-xdis : Cross Python version disassembler
https://github.com/rocky/python-xasm : Cross Python version assembler
Hashes for uncompyle6-2.14.0-py35-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 11b594b10715fab724d3e3757c28adc25f302aff3dc8d40046929ee2d696373c
MD5 | 6810f9f02bbd7f2da093c976d8d822c5
BLAKE2b-256 | 3b1d87f434930465893f59436d887fac9b1f3823e84a145046e0eae17db48eb4
Hashes for uncompyle6-2.14.0-py34-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 14ece25031d8137be31cc1919d800a2c9e53f643356dbedbc001450a50315543
MD5 | 1cb1aa0702d8f62a50693bad6c32f86a
BLAKE2b-256 | 6c8bfc4891626356732e7292090b37b45bde04c369fbf84e7395bd34b99469bd
Hashes for uncompyle6-2.14.0-py33-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e9c5778cd5e37113a7943e5caaabc46a2ffe2eb0734ae5472ed78d102b79e5e1
MD5 | cb41dc35fcea623d276e8b9304a042fd
BLAKE2b-256 | 415b9e02d340c2e978f8a2297b830bbce461382637b555a3341c905b8a21f1c4
Hashes for uncompyle6-2.14.0-py26-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 979c761909a08f29492aa32a6fc03cb9d832dee66ecf349c037f6b458496fa90
MD5 | ee1c50a74ba5165172d309eb4e23d504
BLAKE2b-256 | ed404fe4ce89a301de7cfe36358efbf56c3846a30b048a6dfa545fbd086d40d2