A JIT compiler wrapper for CPython
Pyjion is a JIT extension for CPython that compiles your Python code into native CIL and executes it using the .NET CLR.
You can test out Pyjion now at www.trypyjion.com.
Read the full documentation at pyjion.readthedocs.io.
$ pip install pyjion
Compiling from source
- CPython 3.9.0
- CMake 3.2 +
- .NET 6 Preview 6
$ git clone git@github.com:tonybaloney/pyjion --recurse-submodules
$ cd pyjion
$ python -m pip install .
To get started, you need to have .NET installed, with Python 3.9 and the Pyjion package (I also recommend using a virtual environment).
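Before installing, you can sanity-check these prerequisites with a small script like the one below. It is a sketch, not part of Pyjion: `check_prerequisites` is a hypothetical helper, and it assumes .NET is discoverable through a `dotnet` executable on PATH, which is the usual layout but not the only one.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Hypothetical helper: report whether the tools this README lists are present.

    It only inspects the local environment; it does not install anything.
    """
    return {
        # The README targets CPython 3.9.
        "python_3_9": sys.version_info[:2] == (3, 9),
        # Assumption: the .NET host is reachable as `dotnet` on PATH.
        "dotnet_on_path": shutil.which("dotnet") is not None,
        # CMake is only needed when compiling from source.
        "cmake_on_path": shutil.which("cmake") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```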
After importing pyjion, enable it by calling pyjion.enable(), which sets the compilation threshold to 0 (the code only needs to run once to be compiled by the JIT):
>>> import pyjion
>>> pyjion.enable()
Any Python code you define or import after enabling pyjion will be JIT compiled. You don't need to call functions through any special API; it's completely transparent:
>>> def half(x):
...     return x/2
...
>>> half(2)
1.0
Pyjion will have compiled the half function into machine code on the fly and stored a cached version of that compiled function inside the function object.
You can see some basic stats by running pyjion.info(f), where f is the function object:
>>> pyjion.info(half)
JitInfo(failed=False, compile_result=<CompilationResult.Success: 1>, compiled=True, optimizations=<OptimizationFlags.InlineFramePushPop|InlineDecref: 10>, pgc=1, run_count=1)
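The REPL session above can also be written as a script. This is a sketch: it assumes JitInfo exposes its fields (compiled, run_count) as attributes, as its repr suggests, and it falls back to plain CPython when pyjion can't be imported (for example when .NET is missing), so half() works either way.

```python
try:
    import pyjion  # requires a local .NET installation
except ImportError:
    pyjion = None  # fall back to plain CPython so the sketch still runs

if pyjion is not None:
    pyjion.enable()  # threshold 0: functions are compiled on their first call


def half(x):
    return x / 2


result = half(2)  # with Pyjion enabled, this call compiles and then runs half()
print(result)

if pyjion is not None:
    info = pyjion.info(half)
    # Assumption: field names match the JitInfo repr shown above.
    print(info.compiled, info.run_count)
```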
You can also execute Pyjion against any script or module:
$ pyjion script.py
Or, for an existing Python module:
$ pyjion -m calendar
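Conceptually, running a script "under Pyjion" means enabling the JIT and then executing the script as __main__. The helper below sketches that idea with the standard library's runpy; run_script_with_jit and demo are hypothetical names, and this is not how the real pyjion CLI is implemented.

```python
import os
import runpy
import sys
import tempfile


def run_script_with_jit(path, *args):
    """Hypothetical helper: enable Pyjion if importable, then run `path` as __main__."""
    try:
        import pyjion
    except ImportError:
        pyjion = None  # no .NET/Pyjion: run on plain CPython
    if pyjion is not None:
        pyjion.enable()
    saved_argv = sys.argv
    sys.argv = [path, *args]  # let the script see its own command line
    try:
        # run_name="__main__" makes `if __name__ == "__main__":` blocks fire.
        return runpy.run_path(path, run_name="__main__")
    finally:
        sys.argv = saved_argv  # restore the caller's argv


def demo():
    """Write a throwaway script, run it through the helper, return what it saw."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello.py")
        with open(script, "w") as fh:
            fh.write("import sys\nARGS = sys.argv[1:]\n")
        namespace = run_script_with_jit(script, "--name", "pyjion")
    return namespace["ARGS"]


if __name__ == "__main__":
    print(demo())  # ['--name', 'pyjion']
```

For the module form, runpy.run_module("calendar", run_name="__main__") plays the same role as -m.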
You can see the machine code for the compiled function by disassembling it in the Python REPL.
Pyjion has essentially compiled your small Python function into a small, standalone application.
Install distorm3 and rich first to disassemble x86-64 assembly, then run:
>>> import pyjion.dis
>>> pyjion.dis.dis_native(half)
00000000: PUSH RBP
00000001: MOV RBP, RSP
00000004: PUSH R14
00000006: PUSH RBX
00000007: MOV RBX, RSI
0000000a: MOV R14, [RDI+0x40]
0000000e: CALL 0x1b34
00000013: CMP DWORD [RAX+0x30], 0x0
00000017: JZ 0x31
00000019: CMP QWORD [RAX+0x40], 0x0
0000001e: JZ 0x31
00000020: MOV RDI, RAX
00000023: MOV RSI, RBX
00000026: XOR EDX, EDX
00000028: POP RBX
00000029: POP R14
...
The complex logic of converting a portable instruction set into low-level machine instructions is done by .NET's CLR JIT compiler.
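To see the portable instruction set Pyjion starts from, the standard library's dis module lists a function's CPython bytecode. The sketch below prints that bytecode and, only when Pyjion and its disassembler dependencies are importable, the native code as well. Opcode names vary between Python versions; on 3.9 the division in half() appears as BINARY_TRUE_DIVIDE.

```python
import dis

try:
    import pyjion
    import pyjion.dis  # native disassembler: needs distorm3; rich adds pretty-printing
except ImportError:
    pyjion = None  # no Pyjion/.NET: only the bytecode part below runs

if pyjion is not None:
    pyjion.enable()


def half(x):
    return x / 2


half(2)  # with Pyjion enabled, this first call triggers compilation

# The portable instruction set: CPython bytecode for half().
opnames = [ins.opname for ins in dis.get_instructions(half)]
print(opnames)

if pyjion is not None:
    # The machine code produced by the CLR JIT from Pyjion's CIL.
    pyjion.dis.dis_native(half)
```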
All Python code executed after the JIT is enabled will be compiled into native machine code at runtime and cached on disk. For example, to enable the JIT on a simple app.py for a Flask web app:
import pyjion
pyjion.enable()  # enable first so Flask and your app code are JIT compiled

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

app.run()
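Another option is to wrap the WSGI application so the JIT is switched on before each request reaches the app. The changelog mentions a WSGI middleware shipped with Pyjion, but its name isn't given here, so JitMiddleware below is a hypothetical stand-in, not Pyjion's API. It assumes calling pyjion.enable() more than once is harmless.

```python
try:
    import pyjion
except ImportError:
    pyjion = None  # without Pyjion the middleware simply passes requests through


class JitMiddleware:
    """Hypothetical WSGI middleware that enables Pyjion before delegating to `app`."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if pyjion is not None:
            pyjion.enable()  # assumption: repeated calls are harmless
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """A plain WSGI app standing in for Flask's app.wsgi_app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, World!"]


wrapped = JitMiddleware(hello_app)
```

With Flask, the usual way to install a WSGI middleware is app.wsgi_app = JitMiddleware(app.wsgi_app).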
How do you pronounce "Pyjion"?
Like the word "pigeon". @DinoV wanted a name that had something with "Python" -- the "Py" part -- and something with "JIT" -- the "JI" part -- and have it be pronounceable.
How does this compare to ...
PyPy is an implementation of Python with its own JIT. The biggest difference compared to Pyjion is that PyPy doesn't support all C extension modules without modification unless they use CFFI or work with the select subset of CPython's C API that PyPy does support. Pyjion also aims to support many JIT compilers, while PyPy only supports its own custom JIT compiler.
Pyston is an implementation of Python using LLVM as a JIT compiler. Compared to Pyjion, Pyston has partial CPython C API support but not complete support. Pyston also only supports LLVM as a JIT compiler.
Numba is a JIT compiler for "array-oriented and math-heavy Python code". This means that Numba is focused on scientific computing while Pyjion tries to optimize all Python code. Numba also only supports LLVM.
IronPython is an implementation of Python that is implemented using .NET. While IronPython tries to be usable from within .NET, Pyjion does not have a compatibility story with .NET. This also means IronPython cannot use C extension modules while Pyjion can.
Psyco was a module that monkeypatched CPython to add a custom JIT compiler. Pyjion wants to introduce a proper C API for adding a JIT compiler to CPython instead of monkeypatching it. It should be noted the creator of Psyco went on to be one of the co-founders of PyPy.
Unladen Swallow was an attempt to make LLVM be a JIT compiler for CPython. Unfortunately the project lost funding before finishing their work after having to spend a large amount of time fixing issues in LLVM's JIT compiler (which has greatly improved over the subsequent years).
Nuitka and Shedskin?
Both Nuitka and Shedskin are Python-to-C++ transpilers, which means they translate Python code into equivalent C++ code. Being a JIT, Pyjion is not a transpiler.
Will this ever ship with CPython?
Goal #1 is explicitly to add a C API to CPython to support JIT compilers. There is no expectation, though, to ship a JIT compiler with CPython. This is because CPython compiles with nothing more than a C89 compiler, which allows it to run on many platforms. But adding a JIT compiler to CPython would immediately limit it to only the platforms that the JIT supports.
Does this help with using CPython w/ .NET or UWP?
Code of Conduct
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
- BINARY_MULTIPLY and BINARY_POWER will assume the resulting integer is a big integer (not unboxed)
- Introduced two optimizations, IntegerUnboxingMultiply and IntegerUnboxingPower, which are applied at optimization level 2. If you work with smaller integer values, try level 2 for better performance.
- Pyjion will infer that range(n) generates integers in iterators to improve unboxing
- LOAD_BUILD_CLASS will infer a function type instead of Any (#42)
- Instruction graphs will include the name of fast locals
- Instruction graph const values are capped to 40 characters
- Added abstract types for all builtins (#339)
- pyjion.info() will now return a JitInfo object
- Optimization flags that were applied to a function during compilation are available in the optimizations field of the JitInfo object
- All optimizations are now runtime flags instead of compile-time features
- Unboxing PGC errors will raise pyjion.PyjionUnboxingError(ValueError) instead of ValueError
- Instruction graphs will show conditional branches (in orange)
- Fixed a bug in generators where mixed unboxed/boxed fast locals would yield the wrong values with PGC between the first and second compilation stages
- Fixed a de-optimization that happened in rc1 due to PGC asserting the abstract kind on the profiling phase and then always asserting that integers were big-integers
- Fixed a bug where unboxed locals were decrefing frame locals on yield
- Generators will not unbox fast locals for stability reasons
- Fixed a regression on unboxed integers, caused by PGC values being set as Any
- Assert return types for float object methods (as_integer_ratio, conjugate, is_integer, hex)
- Added a pyjion command-line script to complement the python -m pyjion command
- The pyjion CLI has flags for enabling profiling, tracing, optimization level, graphs and debugging
- Unboxing integers that don't fit into long long will raise a ValueError.
- Pyjion will mark any values above 1 billion as "big integers" and not escape them to reduce the chance of overflows.
- Floating point __pow__ with negative values matches all behaviour of CPython
- Raising 0 to a negative power will raise an exception, as in CPython
- PGC no longer uses a reference to probed values, dramatically reducing memory consumption between the first and second compile cycles
- Fixed a bug where statistics.variance([0, 0, 1]) would raise an assertion error because of an overflow raised in Fraction arithmetic (#326)
- Fixed a bug where calling sys.settrace(None) would cause a segmentation fault (#330)
- Fixed a bug where optimized calls to custom types would crash on the 3rd execution because of the way PGC held and released references.
- Refactored Pyjion's test suite to Pytest
- Rewrote the documentation site
- Fixed a bug in the native disassembler printing empty comment lines
- Corrected the type signature of
- Fixed a bug where changed methods on an object (like a global) caused crashes because of the way load_method was optimized (#335)
- Added the pyjion.symbols(callable) API to fetch the dictionary of external call tokens
- Extended the dis_native() methods with a flag to not print the program counter
- Improved the dis_native() method to print the name of the method call after the call instructions as a line comment
- Fixed a bug in dis_native() which showed unresolved sequence points at the top of the printout
- Fixed a bug where the in (CONTAINS_OP) result wasn't being checked for exceptions, so the next operation would segfault if the in operation returned an error result.
- The IL in dis() is closer in syntax to ILDasm and easier to read
- Added a pyjion.status() method to get runtime data on the JIT
- Windows will now observe the
- Updated to .NET 6 preview 6
- Fixed a bug where the ord() builtin would return the wrong type (#315)
- pyjion.dis.dis_native() will show sequence points as comments
- The BINARY_POWER and INPLACE_POWER opcodes will always return a native python long instead of an escaped integer, to avoid overflows
- Fixed a bug on large dictionary literals (>100 keys)
- Improved the efficiency of the BUILD_TUPLE, BUILD_LIST, BUILD_CONST_KEY_MAP, and BUILD_SET opcodes
- Fixed a bug with comparison of numpy arrays being unboxed into a boolean instead of staying as an array (#310)
- Support for the yield keyword and Python generators
- Instruction graphs can be enabled with pyjion.enable_graphs() and then exported by
- Pyjion will raise an ImportError if .NET is missing instead of a system exit
- Fast locals can store unboxed values
- Method calls are optimized for known types by asserting the return type, e.g. str.upper() returns a string
- Updated to .NET 6 preview 5
- Pyjion can be run using python -m pyjion <command>, e.g. python -m pyjion script.py or python -m pyjion -m unittest
- Added unboxing for integers (OPT-16)
- Added unboxing for bool
- Fixed a bug with interned hash maps on Windows
- Added unboxing and escape analysis for floating point objects (OPT-16)
- Removed OPT-8 as it is superseded by OPT-16
- Updated to .NET 6 preview 4
- Debuggable JIT methods can be toggled at runtime using
- Added option for including Python bytecode disassembly in Pyjion disassemble printouts on
- Added API pyjion.get_offsets(callable) to get the offsets of Python Opcode <> IL offset <> native offset.
- Moved internal representations to fixed width standard types.
- Pyjion uses .NET 6 Preview 3 as the compiler; on Linux and macOS, make sure you have installed it first
- Rich comparisons (==, <, >) of floating point numbers are significantly faster (OPT-17)
- All method calls are faster by enforcing vectorcall protocol and inlining anything below 10 arguments (OPT-16)
- PGC now observes and optimizes heap-allocated (user-defined) types
- Fixed a crash on certain recursive functions with PGC enabled
- Fixed macOS wheel name
- LOAD_ATTR is now optimized for types that implement tp_getattr by prehashing the names (OPT-15)
- JIT will emit a direct call to LOAD_ATTR tp_getattro/tp_getattr slots for builtin types
- macOS wheels are now compiled with Clang PGO
- PGC will only profile non-heap-allocated types (i.e. not user-specified types), as type objects could be deallocated between compilation cycles
- Reduced stack effect during frame calls/function calls
- Improved performance on function calls
- Py_MakePendingCalls will be called every 100 instructions (previously 10), configurable at compile-time through the
- Updated to .NET 5.0.5 (5.0.202)
- Fixed a bug in PGC for large functions meaning they wouldn't be optimized
- Implemented PGC for BINARY_SUBSCR (OPT-5)
- Implemented PGC for STORE_SUBSCR (OPT-6)
- Implemented PGC for all inplace and regular binary operators (+, -, /, etc.); see OPT-13
- The compiler will now fail (and default back to CPython) if .NET emits a FAST_FAIL helper
- UNPACK_SEQUENCE is rewritten to be more efficient and use optimized paths for LIST and TUPLE types
- f-string (BUILD_STRING) is rewritten to be more efficient
- UNPACK_EX is rewritten to remove the requirement for dynamic heap allocation (and the stack canary) and leverage .NET compiler's dynamic eval stack
- PGC implemented for UNPACK_SEQUENCE
- PGC implemented for BINARY_SUBSCR
- PGC implemented for CALL_FUNCTION/OPT-14
- Added PGC emitter to first compile pass
- Drastically simplified the compilation process, resulting in a smaller call stack and allowing for more recursion (and better performance)
- Added a field to the pyjion.info() dictionary, compile_result, indicating the cause of compilation failure (if failed), see
- Fixed a bug in pyjion.dump_native/pyjion.dis.dis_native disassembling the wrapper function
- Incompatible functions (those using the async or yield keywords) are marked as incompatible early in the compilation process
- Fixed a bug in OPT-13 if the type changed under certain circumstances
- Arguments to a frame are now marked as volatile and requiring type guards for certain optimizations
- Any Python type passed as an argument is now available to be optimized by OPT-13, OPT-12
- Fixed a bug occurring on Linux and Windows in sre_parse._compile which caused a GuardStackException when doing an inline decref operation.
- Added an environment variable DOTNET_LIB_PATH to allow specifying the exact path to libclrjit
- Added OPT-13 (OPTIMIZE_TYPESLOT_LOOKUPS) to optimize the type slots for all binary operators and resolve the precedence at compile-time (only for known types)
- Added OPT-14 (OPTIMIZE_FUNCTION_CALLS) to optimize calls to builtin functions
- Optimize all frame locals by determining abstract types on compilation
- Bugfix: Fixed a crash on f-strings with many (>255) arguments
- Bugfix: Will now skip all functions containing the use of exec() as it contains frame globals which are not supported
- Updated to .NET 5.0.3
- Updated the containers to Ubuntu 20
- Added fileobject abstract type
- Added enumerator abstract type
- Added code object abstract type
- Added integration tests for reference leaks for all binary operations (thanks @amaeckelberghe)
- Added module type (thanks @vacowboy75)
- Added OPT-12 (OPTIMIZE_BUILTIN_METHOD) to pre-lookup methods for builtin types and bypass LOAD_METHOD (PyObject_GetMethod)
- Optimized LOAD_METHOD to recycle lookups for the same object
- Expanded OPT-8, OPT-9, OPT-11, OPT-12 for nested stacks (e.g. inside expressions)
- Added a frozen set abstract type
- Added OPT-11 (OPTIMIZE_BINARY_SLICE) to optimize the BUILD_SLICE and BINARY_SUBSCR operations into a single function when the slice start, stop and step is None or a const number.
- Fixed a bug where the set_optimization_level() setting was being reset (thanks @tetsuo-cpp)
- Added a bytearray abstract value kind (thanks @tetsuo-cpp)
- Added a type abstract value kind (thanks @tetsuo-cpp)
- Optimized the compiled instructions to only update the frame last instruction field on error/exit branch
- Removed the "periodic work" method which was called for every for/while loop and put a function to call Py_MakePendingCalls for every 10th loop
- Added an improvement to the process stage to infer the abstract types of return values to methods of builtin types, e.g. str.encode
- Added a check in dis_native for when the compiled function wasn't compiled (thanks @tetsuo-cpp)
- dis_native will now pretty print the assembly code when the rich package is installed (thanks @C4ptainCrunch)
- pyjion[dis] is a new package bundled with distorm3 and rich (thanks @C4ptainCrunch)
- Enhanced the process stage of the compiler with new abstract types, iterable, bytearray, codeobject, frozenset, enumerator, file, type and module
- Process stage will assert the abstract return type of any call to a builtin function (e.g. list(), tuple()), which will kick in the optimizations for a broader set of scenarios
- Added OPT-8 (OPTIMIZE_BINARY_FUNCTIONS) to combine 2 sequential binary operations into a single operation. Adds about 15-20% performance gain on PyFloat operations.
- Added OPT-9 (OPTIMIZE_ITERATORS) to inline the FOR_ITER opcode of a listiter (List iterator) into native assembly instructions.
- Added OPT-10 (OPTIMIZE_HASHED_NAMES) to precompute the hashes for LOAD_NAME and LOAD_GLOBAL dictionary lookups
- Fixed a bug where looking up a known hash for a dictionary object (optimized BINARY_SUBSCR) wouldn't raise a KeyError. Seen in #157
- Fixed a bug in JUMP_IF_FALSE_OR_POP/JUMP_IF_TRUE_OR_POP opcodes emitting a stack growth, which would cause a stack underflow on subsequent branch checks. JIT will compile a broader range of functions now
- Implemented PEP590 vector calls for methods with 10+ arguments (thanks @tetsuo-cpp)
- Implemented PEP590 vector calls for functions with 10+ arguments
- Fixed a reference leak on method calls with large number of arguments
- Support for tracing of function calls with 10+ arguments
- Disabled OPT-4 as it is causing reference leaks
- Added OPT-6 optimization. Frame constants are now used to speed up assignments to lists and dictionaries. STORE_SUBSCR will assert if something is a list, or dict and shortcut the assignment logic.
- Added OPT-7 optimization. The binary subscript operator is compiled to faster path under a set of circumstances, especially if the index/key is a frame constant. Hashes are precomputed and indexes for integer constants are converted to native numbers at compile-time.
- The native machine-code disassembler will show the actual position of the JITed code in memory, instead of starting the offset at 0
- The pyjion.dump_native() function returns a tuple with bytes, length and position
- Type inferencing has been improved for all inplace and binary operations
- Windows builds from source are fixed for when the user wants to compile against a checkout of .NET
- Implemented FAST_DISPATCH for additional opcodes
- Added a test runner for the CPython regression suite that tests the JIT in isolation
- Fixed a reference leak of (self) for the LOAD_METHOD opcode
- Fixed a reference leak of non C functions being called via Call<N> (CALL_FUNCTION)
- Fixed a bug where (very) large tuples being created via the BUILD_TUPLE opcode would cause an overflow error
- Fixed a bug where BUILD_MAP being called with very large dictionaries caused a fatal error
- Added OPT-4 optimization. Frame locals (named variables known at compilation) using the LOAD_FAST, STORE_FAST and DELETE_FAST opcodes will use native .NET locals instead of using the frame's f_localsplus array.
- Improved performance in LOAD_FAST and STORE_FAST through OPT-4
- Added OPT-5 optimization. Frame push/pop on entry/exit are now inline CIL instructions.
- LOAD_FAST skips unbound local checks when preceded by a STORE_FAST (i.e. slot is definitely assigned)
- Fixed a crash bug where CPython checks recursion depth from ceval state, which may not be set
- Implemented a faster check for recursion depth
- Fixed a bug on LOAD_CLOSURE operator not being set
- Fixed OPT-2 on Windows and Linux
- Fixed a bug where the wrong CIL opcode was being used to subtract values, would throw an overflow error and fail back into EFD.
- Implemented the .NET EE exception handlers for guard stack canaries, overflow errors, and null reference exceptions
- Implemented a more efficient case of ld_i(1)
- Corrected cases of ob_refcnt to use 64-bit signed integers
- No longer print error messages on release code for unimplemented .NET EE methods
- Fixed a bug on the incorrect vtable relative field being set
- Fixed a bug where tracing and profiling would be emitted even when not explicitly enabled
- .NET Exceptions are transferred into Python exceptions at runtime
- Added an optimization (OPT-1/OPTIMIZE_IS) to inline the "is"/ "is not" statement into a simple pointer comparison with jump statement. Compiles to inline machine code instead of a method call
- Added an optimization (OPT-2/OPTIMIZE_DECREF) to decrement the refcount without a method call, when the object refcount is >1 and then call _Py_dealloc if the ref count becomes 0. Replaces the previous method call
- Windows now uses the system page size instead of the default value of 1MB
- Added support for .NET 5.0.1
- Implemented a CIL modulus emitter
- Added support for profiling compiled functions by enabling profiling (
- Added support for profiling C function calls, returns and exceptions
- Implemented a faster call path for functions and methods for 5-10 arguments
- Fixed a bug where the page size defaulted to 0 in the .NET EE, which caused a failed assertion (and fails to compile the function), would fix a large % of functions that previously failed to compile
- Added support for debugging compiled functions and modules by enabling tracing (
- Added support for debugging to catch unhandled/handled exceptions at runtime when tracing is enabled
- Added support for opcode-level tracing
- Fixed a bug where executing Pyjion with pydevd (VS Code/PyCharm debugger) would cause the Python process to crash because of a doubly-freed code object (#7)
- Added a WSGI middleware function to enable Pyjion for Flask and Django (#67)
- Fix a bug on dictionary merging for mapping types incorrectly raising a type error (#66)
- Implemented support for disassembling "large" methods into CIL (#27)
- Added type stubs for the pyjion C extension
- Fix a bug where merging or updating a subclassed dictionary would fail with a type error. (#28)
- Fixed a critical bug where method calls with large numbers of arguments, and the argument was a tuple could cause a segmentation fault on GC collection.
- Tested support for IPython REPL
- Fixed a bug where importing pyjion.dis after enabling the JIT would cause a stack overflow
- Has around 50% chance of working and not causing your computer to explode, or worse, segmentation fault
- Added a stack probe helper for Linux (will use JIT in more scenarios)
- Enabled support for running unit tests in Linux
- Fixed a bug where JIT would crash when a method call failed because of a bad-lookup
- Implemented helper method redirection for Linux to support PIC compiled symbols
- Has around 35% chance of working and not causing your computer to explode, or worse, segmentation fault
- Improved discovery of .NET libraries on Linux
- Fixed a bug where a garble-named log file would be generated (should be JIT timings log)
- Installable bdist_wheel for Ubuntu, Debian, macOS 10.15, 11 (10.16) and Windows x64
- Installable manylinux2014 wheel with clrjit.so bundled in
- Added multithreading/multiprocessing support
- Fixed a bug where the wheel would be broken if there are two distributions of Python 3.9 on the system
- Has around 30% chance of working and not causing your computer to explode, or worse, segmentation fault.
- Installable source distribution support for macOS, Windows and (barely) Linux.
- It compiles on my machine
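Two entries in these notes describe integer unboxing limits: integers that don't fit into a C long long raise a ValueError when unboxed, and values above 1 billion are treated as big integers. The sketch below restates those rules in plain Python so the boundaries are easy to check. classify and its labels are hypothetical, not Pyjion internals, and treating "above 1 billion" as a magnitude (abs) is an assumption.

```python
import ctypes

# Bounds of a signed C `long long`, derived from its size on this platform.
_BITS = ctypes.sizeof(ctypes.c_longlong) * 8
LLONG_MIN = -(1 << (_BITS - 1))
LLONG_MAX = (1 << (_BITS - 1)) - 1

# Per the changelog, values above one billion are kept as big integers.
BIG_INT_THRESHOLD = 1_000_000_000


def fits_long_long(n: int) -> bool:
    """True if n is representable as a signed C long long."""
    return LLONG_MIN <= n <= LLONG_MAX


def classify(n: int) -> str:
    """Hypothetical classification mirroring the changelog's unboxing rules."""
    if not fits_long_long(n):
        return "overflow"   # unboxing would raise ValueError
    if abs(n) > BIG_INT_THRESHOLD:  # assumption: magnitude, not signed value
        return "big"        # left as a boxed Python int
    return "unboxable"
```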