
Decompile LLM

An LLM-powered Python decompiler that restores source code from .pyc files.

Introduction

This program aims to restore human-readable source code from .pyc files. decompile-llm uses LLMs to reconstruct source code for all Python versions, including future ones.

Why

As Python evolves, decompilation has become more difficult: traditional tools no longer support newer Python versions. Fortunately, LLMs have a deep understanding of Python bytecode, allowing this tool to automatically reconstruct source code from every Python version. Keep in mind, however, that because the tool relies on LLMs, the accuracy of the reconstruction varies with the target code and the model you choose.

If the reconstruction is not working as expected, or you would prefer a more traditional decompiler, I recommend uncompyle6 or decompyle3.

Since accuracy heavily depends on the model you choose, the currently most capable model is noted below.


Requirements

  • .pyc file
  • Python 3.10 or higher
  • OpenAI API key or Google Gemini API key

Features

  • Works with .pyc files from all Python versions
  • Automatically disassembles bytecode using xdis (see the sketch after this list)
  • Supports multiple LLM providers
    • OpenAI (GPT-4.1 by default)
    • Google Gemini (Gemini 2.5 Flash by default; free tier available)
  • Syntax verification of decompiled code
  • Smart chunking for large files
  • Progress bar
  • Streamed outputs
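
To make the xdis-based disassembly concrete, below is a minimal sketch of reading a .pyc with only the standard library. Two assumptions apply: the file uses the 16-byte header introduced in Python 3.7, and the running interpreter matches the .pyc's bytecode version (the very limitation xdis removes); the file name is a placeholder.

import dis
import marshal

# A Python 3.7+ .pyc starts with a 16-byte header: a 4-byte magic
# number, a 4-byte bit field, and 8 bytes of source metadata. The
# marshalled code object follows immediately after.
with open("target.pyc", "rb") as f:  # placeholder file name
    header = f.read(16)
    code = marshal.loads(f.read())

# Valid only when the running interpreter matches the .pyc's bytecode
# version; xdis handles mismatched versions.
dis.dis(code)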

Installation

The most straightforward way to install is from PyPI.

pip install decompilellm

You can also install from source by cloning the repository and installing dependencies manually.

# Clone the repository
git clone https://github.com/iancdev/python-decompile-llm
cd python-decompile-llm

# Install required dependencies
pip install -r requirements.txt

Quick Start

The tool is designed to run out of the box without much configuration. Advanced arguments are available, but you can get started with the basic commands below.

Python version is automatically detected for disassembly.

Most Capable and Cost Optimized (Free Tier Available)

Use this for the best output at no cost. This option uses Gemini 2.5 Flash with reasoning effort set to high and runs 3 iterations.

decompilellm --provider google --key <gemini_key> --verify yes --iter 3 --output decompiled.py --effort high <targetfile>.pyc

Free tier Gemini model requests may be used by Google to train their models. Always review the provider's terms of service before use.

Using GPT-4.1 (Default OpenAI Model)

decompilellm --key <openai_key> <targetfile>.pyc

Using Gemini 2.5 Flash (Default Google Model; Free Tier Available)

decompilellm --provider google --key <gemini_key> <targetfile>.pyc

Save output to a file

decompilellm --key <openai_key> --output decompiled.py <targetfile>.pyc

Advanced options

For advanced users seeking additional control over the program's behavior, the following options are available.

You may also run decompilellm --help to view all available options.

| Flag | Purpose | Default |
| --- | --- | --- |
| --model MODEL | Which LLM model to use (e.g. gpt-4.1, gemini-2.5-pro). | Provider-specific |
| --key KEY | API key for the chosen provider (overrides env vars). | |
| --systemmsg MSG | Custom system prompt for the decompiler LLM. | Built-in prompt |
| --iter N | Number of times to run the model and keep the best answer. | 1 |
| --verify {yes,no} | Check Python syntax of the decompiled code (see the sketch after this table). | yes |
| --retry N | How many extra attempts to make if a run fails verification. | 0 |
| --output FILE | Write result to FILE instead of stdout. | stdout |
| --stream / --no-stream | Stream tokens live to the terminal; enabled by default unless writing to a file. | on |
| --multithreaded / --no-multithreaded | Run iterations in parallel threads. | on |
| --threads N | Explicit thread count when multithreading. | Same as --iter (capped) |
| --provider {chatgpt,gpt,gemini,google,openai} | Backend to hit. | openai |
| --split N | Manually break the disassembly into N equal-sized chunks; overrides --auto-split. | 0 (off) |
| --auto-split | Automatically chunk large bytecode based on --max-tokens. | off |
| --max-tokens N | Target tokens per chunk when auto-splitting; requires tiktoken. | 10000 |
| --max-chars N | Fallback char length per chunk if token counts aren't available. | 50000 |
| --temp FLOAT | Sampling temperature (0.0–2.0 OpenAI, 0.0–1.0 Gemini). | 0.5 |
| --topp FLOAT | Nucleus-sampling top_p value. | 1.0 |
| --effort {none,low,medium,high} | Hint for reasoning depth; higher can improve accuracy at a cost. | none |
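
For a rough picture of what --verify yes amounts to, a syntax check can be as simple as trying to parse the candidate output. The sketch below uses the standard ast module and is an assumption about the general approach, not the tool's actual implementation.

import ast

def passes_syntax_check(source: str) -> bool:
    # Return True if the candidate decompilation parses as valid Python.
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

print(passes_syntax_check("def f(): return 1"))  # True
print(passes_syntax_check("def f() return 1"))   # False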

Extracting .pyc from PyInstaller executables

For PyInstaller-generated binaries, you can use pyinstxtractor to retrieve the embedded .pyc files. Detect-It-Easy or similar tools can help identify how the executable was packaged.
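
A typical pyinstxtractor invocation looks like this (the executable name is a placeholder; the extracted .pyc files land in a <target>.exe_extracted directory):

# Extract the embedded .pyc files from a PyInstaller executable
python pyinstxtractor.py <target>.exe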


Environment variables

Instead of passing API keys on the command line you can set:

  • OPENAI_API_KEY - For OpenAI models
  • GEMINI_API_KEY - For Google Gemini models

This is recommended for repeated use and keeps your keys out of your shell history.
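
For example, in a POSIX shell:

# Set once per shell session; the keys are picked up automatically
export OPENAI_API_KEY=<openai_key>
export GEMINI_API_KEY=<gemini_key>

decompilellm <targetfile>.pyc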


FAQ + Troubleshooting

The output code does not reproduce the same functionality as the original code.

This is expected in most cases, since some information is lost during compilation. The reconstruction may include optimizations or other changes that were not in the original source, which can break some functionality. You are encouraged to run other decompilers as well, both to cross-check the output and to better understand the original code.

The output code is incorrect.

As mentioned above, the output may be incorrect in cases where information is simply lost. For major errors, however, you can try to increase accuracy by choosing a reasoning model and setting reasoning effort to high. You can also improve the output by raising the iteration count to 3 or more and enabling syntax verification.

Below is a copy-and-paste command for Gemini 2.5 Pro with reasoning effort set to high, running 5 parallel iterations.

decompilellm --provider google --key <gemini_key> --verify yes --iter 5 --effort high --multithreaded --model gemini-2.5-pro <targetfile>.pyc

The output code is incomplete.

This can happen when automatic splitting fails to detect the correct model for token calculation, or when no split occurred even though one should have. You can correct this either by using a model with a larger context window, or by specifying splits manually. Below is a command line that splits manually into 5 chunks. Note that chunks are split evenly, which may cause artifacts in the output code and may cause syntax verification to fail. In those cases you may choose to disable verification and fix the final output by hand.

decompilellm --provider google --key <gemini_key> --verify yes --iter 3 --multithreaded --split 5 <targetfile>.pyc

For large projects, it is currently recommended to use Gemini models with their larger context windows and to split manually, since tiktoken does not support automated token calculation for Google Gemini models.
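
To make the token-based chunking concrete, here is a minimal sketch of the kind of splitting --max-tokens implies, using tiktoken. The cl100k_base encoding and the function name are illustrative assumptions, not the tool's internals.

import tiktoken  # pip install tiktoken

def chunk_by_tokens(text: str, max_tokens: int = 10000) -> list[str]:
    # Split text into even pieces of at most max_tokens tokens each.
    enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]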

The output code is empty.

Check that your .pyc file is valid. If your target file's Python version is newer than 3.13, you may need to manually update xdis to the latest version.
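
For example:

pip install --upgrade xdis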

I have another issue

You can open an issue on GitHub, and I will try to help. Please include as much detail as possible in your issue.


Notes

  • The quality of decompilation depends on the complexity of the code, the model you're using, and the reasoning effort you chose (if any).
  • For especially complex code, consider OpenAI o3, or Gemini 2.5 Pro or Flash models, with reasoning effort set to high. (This can get costly!)
  • Alternatives: uncompyle6 and decompyle3

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
