pcc is a C compiler built on Python and LLVM.
pcc
pcc is a Python-authored compiler toolchain. Its most mature path is a C
frontend that lowers C to LLVM IR and runs real third-party projects. The
repository also contains an experimental typed-Python frontend, a Python runtime
being re-authored in pcc-Python, and an in-tree experimental backend that emits
native code without LLVM for selected targets.
This is a research compiler with practical integration tests, not a drop-in replacement for Clang or CPython.
Status
| Area | Current state |
|---|---|
| C frontend | Mature relative to the rest of the repo; validated through C tests, GCC/Clang-derived suites, and real projects such as Lua, SQLite, PostgreSQL libpq, zlib, lz4, zstd, PCRE, OpenSSL, readline, and nginx. |
| Python frontend | Experimental. Typed code can lower to native IR; unsupported Python idioms may route through a CPython bridge unless --python-libpython=off forbids it. |
| Runtime | Active migration from C runtime sources to pcc-Python modules under pcc/py_runtime/py/, using pcc.unsafe and pcc.extern for low-level operations. |
| Self backend | Experimental LLVM-free emission path for AArch64 Darwin and x86_64 Linux subsets. It is used by bootstrap/build gates, but the default public backend remains LLVM unless explicitly selected. |
| Bootstrap | macOS arm64 three-stage bootstrap completes as pcc1 -> pcc2 -> pcc3 in both the default path and the strict self-backend path (--backend self --python-libpython=off --ir-scaffold=on). In the strict path, the emitted IR for pcc2 and pcc3 is byte-identical with 0 py_cpy_* calls; linked binaries have no libpython dependency and are byte-identical after Mach-O signature normalization. |
| Tests | Broad unit, corpus, self-backend, and integration coverage. The Python corpus currently contains 177 end-to-end programs. |
Historical bootstrap gap notes live in docs/issues/open-bootstrap-issues.md. This README records the current verified bootstrap status.
Install
pip install python-cc
For repository development:
git clone https://github.com/jiamo/pcc
cd pcc
uv sync
The package requires Python 3.13+. Source builds may build the Python runtime
archive at wheel time through hatch_build.py. That hook prefers the self
backend and falls back to LLVM if needed; a missing prebuilt archive can also be
rebuilt lazily on first use.
Quick Start
Compile C
pcc hello.c
pcc hello.c -o hello
pcc hello.c -- arg1 arg2
Common C project modes:
pcc myproject/
pcc --separate-tus myproject/
pcc --sources-from-make lua projects/lua-5.5.0
pcc --system-link --link-arg=-lm mathprog.c
pcc --emit-llvm out.ll hello.c
pcc --emit-obj out.o --target x86_64-unknown-linux-gnu hello.c
Compile Python
pcc hello.py
pcc hello.py -o hello
pcc hello.py --emit-llvm
pcc hello.py --python-libpython=off
pcc-static hello.py
pcc hello.py --backend self
pcc hello.py compiles the file to a native executable and runs it. Passing
-o writes the executable without running it. Passing --emit-llvm stops after
IR generation. pcc-static hello.py is the strict no-libpython entrypoint: it
uses the same compiler as pcc, but defaults to
--python-libpython=off --ir-scaffold=on. Explicit CLI flags and existing
PCC_PYTHON_LIBPYTHON / PCC_IR_SCAFFOLD environment variables still take
precedence.
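The precedence described above (explicit CLI flag, then environment variable, then the pcc-static default) can be sketched in a few lines. This is only an illustration of the ordering: `resolve_option` and `effective_static_settings` are hypothetical helpers, not pcc's actual option-routing code.

```python
from typing import Mapping, Optional

# pcc-static defaults as stated in this README.
PCC_STATIC_DEFAULTS = {"PCC_PYTHON_LIBPYTHON": "off", "PCC_IR_SCAFFOLD": "on"}


def resolve_option(
    cli_value: Optional[str],
    env: Mapping[str, str],
    env_name: str,
    entrypoint_default: str,
) -> str:
    """Hypothetical resolver: CLI flag first, then environment, then default."""
    if cli_value is not None:
        return cli_value
    env_value = env.get(env_name)
    if env_value:
        return env_value
    return entrypoint_default


def effective_static_settings(
    cli: Mapping[str, Optional[str]], env: Mapping[str, str]
) -> dict:
    """Resolve both strict-mode settings for a pcc-static style entrypoint."""
    return {
        name: resolve_option(cli.get(name), env, name, default)
        for name, default in PCC_STATIC_DEFAULTS.items()
    }


# No flags and no environment: the strict defaults apply.
print(effective_static_settings({}, {}))
# An existing environment variable overrides the entrypoint default.
print(effective_static_settings({}, {"PCC_PYTHON_LIBPYTHON": "auto"}))
```

Keeping the entrypoint default as the lowest-priority layer is what lets pcc-static share one compiler with pcc while differing only in its starting configuration.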
The strict command is available from the normal package install:
pip install python-cc
pcc-static hello.py
There is no separate python-cc[no-libpython] extra; it would install the same
files without changing runtime behavior. Use pcc-static or pass
--python-libpython=off --ir-scaffold=on explicitly.
For Python inputs, the most important controls are:
| Option | Meaning |
|---|---|
| --python-libpython=auto | Default. Link libpython only if codegen needed a CPython fallback. |
| --python-libpython=on | Always allow/link the CPython fallback surface. |
| --python-libpython=off | Hard error if the program would need a CPython fallback. |
| --ir-scaffold=on | Experimental closed-world lowering used by the strict self-host work. |
| pcc-static | Console-script shortcut for --python-libpython=off --ir-scaffold=on. |
| --backend {llvm,llvm_capi,self} | Select the backend. llvm is the public default; self is experimental. |
Use pcc From Python
The public Python API is for C compilation.
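from pcc.evaluater.c_evaluator import CEvaluator

# Compile a C snippet and call its entry function with the given arguments.
ev = CEvaluator()
print(ev.evaluate(
    "int add(int a, int b) { return a + b; }",
    entry="add",
    args=[3, 7],
))

from pcc import build, module

# Build a multi-file C project and report where the output was written.
artifact = build(["src/main.c", "src/util.c"], include_dirs=["include"])
print(artifact.output_path)

# Load a C file as a Python module and call one of its functions.
m = module("arith.c")
print(m.add(3, 4))
print(m.__pcc_artifact__.exports)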
Architecture
CLI / Python API
-> project collection
-> C frontend or Python frontend
-> optimization / lowering passes
-> LLVM, LLVM-C compatibility, or self backend
-> MCJIT, object emission, system link, or native executable
| Layer | Main paths | Role |
|---|---|---|
| CLI | pcc/cli_core.py, pcc/pcc.py, pcc/cli_bootstrap.py | User command line, bootstrap CLI, option routing. |
| Public API | pcc/api.py, pcc/evaluater/c_evaluator.py | Embeddable C build/evaluate/module APIs. |
| Project collection | pcc/project.py | Directory scanning, make-derived source sets, dependency projects, translation-unit setup. |
| C frontend | pcc/lex/, pcc/parse/, pcc/codegen/, pcc/evaluater/ | C preprocessing, parsing, semantic lowering, execution/emission. |
| Python frontend | pcc/py_frontend/, pcc/parse/py_* | Python parse/lift, type inference, native lowering, CPython fallback decisions. |
| Runtime | pcc/py_runtime/, pcc/extern/, pcc/unsafe/ | Runtime objects, extern-C bridge, compiler-recognized low-level intrinsics. |
| Backends | pcc/llvm_capi/, pcc/backend/ | LLVM compatibility layer and experimental self backend. |
| Tests/integrations | tests/, projects/, bench/, benchmarks/ | Regression, corpus, real-project, and performance coverage. |
See docs/system-architecture.md for the fuller architecture guide.
Capabilities
C Frontend
The C pipeline is the most mature part of the repository. It supports:
- C99-oriented source parsing and semantic lowering.
- Scalars, pointers, arrays, structs, unions, enums, typedefs, function pointers, control flow, casts, arithmetic, bitwise operations, shifts, and variadic functions.
- Preprocessing with macro expansion and conditional compilation.
- Merged-directory builds, separate translation units, make-derived source selection, dependency projects, compile caching, and host-system linking.
- LLVM IR, object, assembly, MCJIT, and executable workflows.
- Explicit signedness tracking on top of LLVM integer types, including compile-time constant evaluation and runtime lowering as separate semantic paths.
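The last point matters because LLVM integer types such as i32 carry no sign; signedness lives in the instruction chosen (for example sdiv versus udiv, or a signed versus unsigned comparison). The plain-Python sketch below shows how one 32-bit pattern gives different results under the two interpretations. It is an illustration only, not pcc code, and the helper names are made up for this example.

```python
def as_unsigned(bits: int, width: int = 32) -> int:
    """Interpret the low `width` bits as an unsigned integer."""
    return bits & ((1 << width) - 1)


def as_signed(bits: int, width: int = 32) -> int:
    """Interpret the low `width` bits as a two's-complement signed integer."""
    value = bits & ((1 << width) - 1)
    return value - (1 << width) if value >> (width - 1) else value


def c_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as C requires."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


pattern = 0xFFFFFFF6  # the same 32 bits encode -10 (signed) or 4294967286 (unsigned)

signed_quotient = c_div(as_signed(pattern), 3)      # what a signed divide computes
unsigned_quotient = c_div(as_unsigned(pattern), 3)  # what an unsigned divide computes
signed_less_than_one = as_signed(pattern) < 1       # signed comparison
unsigned_less_than_one = as_unsigned(pattern) < 1   # unsigned comparison

print(signed_quotient, unsigned_quotient)            # -3 1431655762
print(signed_less_than_one, unsigned_less_than_one)  # True False
```

A frontend that loses the C type's signedness before picking the instruction would produce one of these results where the other was required, which is why the tracking has to sit above the LLVM type.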
Representative C integration commands:
uv run pcc \
--cpp-arg=-DLUA_USE_JUMPTABLE=0 \
--cpp-arg=-DLUA_NOBUILTIN \
projects/lua-5.5.0/onelua.c -- projects/lua-5.5.0/testes/math.lua
uv run pcc \
--cpp-arg=-DHAVE_CONFIG_H \
--depends-on projects/pcre-8.45=libpcre.la \
projects/test_pcre_main.c
Python Frontend
The Python frontend is experimental. It is useful for typed-native programs, runtime-authoring work, and the self-host track, but it does not implement the full Python data model.
Supported or actively exercised areas include:
- Typed functions and local variables lowered to native LLVM IR.
- Native int, bool, float, str, list, tuple, dict, set, class, exception, dunder, and selected stdlib/runtime paths in the corpus.
- Direct C interop through pcc.extern.
- Low-level runtime authoring through pcc.unsafe.
- CPython fallback for imports and dynamic idioms that are not yet in the native subset.
- Multi-file/bootstrap compilation through scripts/pcc_multi.py and pcc/cli_bootstrap.py.
The core limitation is not parsing; it is preserving Python semantics without falling back to CPython. The strict gate is:
pcc program.py --python-libpython=off --ir-scaffold=on
That mode still fails on programs that need an unsupported CPython bridge. The bootstrap and fallback ratchets live in tests/fallback_baseline.json and tests/bootstrap_gate_baseline.json.
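The ratchet idea behind those baseline files can be sketched as follows. The schema of tests/fallback_baseline.json is not described in this README, so the flat name-to-count mapping, the sample entries, and the `find_regressions` helper are all assumptions made for illustration.

```python
import json

# Hypothetical baseline shape: {"source file": allowed fallback count}.
# The real schema of tests/fallback_baseline.json may differ.
BASELINE_JSON = '{"pkg/a.py": 2, "pkg/b.py": 0}'


def find_regressions(baseline: dict, current: dict) -> dict:
    """Return {name: (allowed, observed)} for entries that exceed their baseline."""
    regressions = {}
    for name, observed in current.items():
        allowed = baseline.get(name, 0)  # entries missing from the baseline get zero
        if observed > allowed:
            regressions[name] = (allowed, observed)
    return regressions


baseline = json.loads(BASELINE_JSON)

# Fewer fallbacks than the baseline is fine: the ratchet only blocks increases.
improved = find_regressions(baseline, {"pkg/a.py": 1, "pkg/b.py": 0})

# Any increase, or a new file with fallbacks, is reported.
regressed = find_regressions(baseline, {"pkg/a.py": 2, "pkg/b.py": 1, "pkg/c.py": 3})

print(improved)
print(regressed)
```

A gate built this way lets the fallback count fall over time while failing any change that raises it.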
Self Backend
The self backend is the in-tree LLVM-free emitter. It currently targets selected AArch64 Darwin and x86_64 Linux IR shapes and is validated against LLVM-backed output through dedicated tests. Use it explicitly:
pcc --backend self hello.c
pcc --backend self --target x86_64-unknown-linux-gnu --emit-obj out.o hello.c
pcc hello.py --backend self
The self backend is not yet the universal default. It is the default in the macOS arm64 bootstrap script and in the runtime wheel-build hook where supported.
Implementation note: the self backend and pass framework were developed with AI assistance from LLVM's published behavior and IR semantics. They are tested against the LLVM-backed path; they are not a source-code port of LLVM.
Bootstrap
The supported macOS arm64 bootstrap flow is:
CPython runs pcc -> pcc1
pcc1 compiles pcc -> pcc2
pcc2 compiles pcc -> pcc3
compare pcc2 and pcc3 after Mach-O signature normalization
Run it with:
scripts/bootstrap.sh
scripts/bootstrap.sh --backend llvm
scripts/bootstrap.sh --backend self
scripts/bootstrap.sh --stage 1
The strict no-libpython macOS arm64 path has also been verified to complete:
pcc --ir-scaffold=on --python-libpython=off --backend self pcc/__main__.py -o pcc1
./pcc1 --ir-scaffold=on --python-libpython=off --backend self pcc/__main__.py -o pcc2
./pcc2 --ir-scaffold=on --python-libpython=off --backend self pcc/__main__.py -o pcc3
As of 2026-04-29, this strict path verifies with 0 py_cpy_* calls in the
pcc2/pcc3 emitted IR, no libpython entry in otool -L, byte-identical
pcc2/pcc3 emitted IR, and byte-identical pcc2/pcc3 binaries after
Mach-O signature removal. The public default CLI still uses the LLVM backend
and --python-libpython=auto unless strict flags or pcc-static are selected.
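The fixed-point check at the end of the bootstrap (stage 2 and stage 3 must match byte for byte) can be sketched with a hash comparison. This is a minimal illustration rather than what scripts/bootstrap.sh does: `sha256_of` and `is_fixed_point` are hypothetical names, and the Mach-O signature normalization applied to binaries is not shown, so the sketch fits emitted IR files or already-normalized binaries.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_fixed_point(stage2: Path, stage3: Path) -> bool:
    """True when the two stage outputs are byte-identical."""
    return sha256_of(stage2) == sha256_of(stage3)
```

For example, `is_fixed_point(Path("pcc2.ll"), Path("pcc3.ll"))` would compare two emitted IR files; those file names are placeholders.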
Testing
Use uv run ... for repository commands.
uv run pytest -q
uv run pytest -m integration
uv run pytest tests/test_lua.py -q -n0
uv run pytest tests/test_sqlite.py -q -n0
uv run pytest tests/test_postgres.py -q -n0
Focused gates:
uv run pytest tests/test_c_testsuite.py tests/test_clang_c.py -q -n0
uv run pytest tests/test_gcc_torture_execute.py -q -n0
uv run pytest tests/test_py_multi_file_compile.py tests/test_py_multi_file_bootstrap_shim.py -q -n0
uv run pytest tests/test_self_backend.py tests/test_self_backend_bootstrap_gate.py -q -n0
Codex users should unset LC_ALL before uv run ... commands in this repo:
env -u LC_ALL uv run pytest -q
env -u LC_ALL uv run pcc hello.c
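When these commands are launched from a Python script rather than a shell, the same effect as env -u LC_ALL can be had by dropping the variable from the child environment. The helper names below are hypothetical and the sketch relies only on the standard library.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def env_without_lc_all(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment (os.environ by default) with LC_ALL removed."""
    env = dict(os.environ if base is None else base)
    env.pop("LC_ALL", None)
    return env


def run_without_lc_all(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command with LC_ALL unset, leaving the parent environment untouched."""
    return subprocess.run(list(argv), env=env_without_lc_all(), check=False)
```

A call such as `run_without_lc_all(["uv", "run", "pytest", "-q"])` then mirrors the first shell command above.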
Documentation
| Topic | Path |
|---|---|
| Architecture | docs/system-architecture.md |
| Python tutorial | docs/python-tutorial.md |
| Python how-to | docs/python-howto.md |
| Python limitations | docs/python-limitations.md |
| Python scorecard | docs/python-scorecard.md |
| Bootstrap issue history | docs/issues/open-bootstrap-issues.md |
| Python data-model gaps | docs/issues/python-data-model-gaps.md |
| Python semantics discussion | docs/issues/python-semantics-preservation.md |
| GC semantics gap | docs/issues/gc-semantics-gap.md |
| Investigation reports | docs/investigations/ |
| Plans | docs/plans/ |
| Contributor/agent notes | AGENTS.md |
Recommended investigation reports:
- docs/investigations/lua-sort-random-pivot-signedness.md
- docs/investigations/pcre-op-lengths-incomplete-array-binding.md
- docs/investigations/zlib-integration-static-local-arrays-and-layout.md
- docs/investigations/sqlite-integration-vfs-init-and-mcjit-lifecycle.md
- docs/investigations/sqlite-forward-declared-bitfield-struct-tags.md
- docs/investigations/nbody-shootout-fp-contract-and-vectorization.md
Repository Map
| Path | Role |
|---|---|
| pcc/cli_core.py | Installed pcc CLI entrypoint. |
| pcc/pcc.py | Compatibility CLI wrapper. |
| pcc/api.py | build(...) and module(...) APIs for C. |
| pcc/project.py | Source collection and build orchestration. |
| pcc/evaluater/c_evaluator.py | C compile/evaluate/link coordinator. |
| pcc/codegen/c_codegen.py | Main C semantic lowering implementation. |
| pcc/py_frontend/ | Python type inference and native lowering. |
| pcc/py_runtime/ | Python runtime archive sources and pcc-Python ports. |
| pcc/backend/ | Experimental self backend. |
| pcc/llvm_capi/ | In-repo LLVM-C compatibility path. |
| pcc/extern/ | Python-to-C extern declarations. |
| pcc/unsafe/ | Compiler-recognized low-level Python intrinsics. |
| utils/fake_libc_include/ | Fake libc headers used by the C frontend. |
| tests/ | Unit, corpus, bootstrap, and integration tests. |
| projects/ | Third-party projects used as stress targets. |
| bench/, benchmarks/ | Performance tooling. |
Development
Requires Python 3.13+ and uv.
uv sync
uv run pytest -q
Environment Controls
Command-line flags are preferred when an option has both CLI and environment forms. The variables below are the supported development/debug controls used by the compiler, bootstrap, runtime build, and local gates.
General compiler controls:
| Variable | Values | Effect |
|---|---|---|
| PCC_BACKEND | llvm, llvm_capi, self | Default backend when --backend is not passed. |
| PCC_PYTHON_LIBPYTHON | auto, on, off | Default Python fallback/linkage policy when --python-libpython is not passed. |
| PCC_IR_SCAFFOLD | off, on, auto | Default for the experimental closed-world Python IR scaffold. |
| PCC_COMPILE_CACHE_DIR | path | Override the translation-unit compile cache directory. |
| PCC_DISABLE_COMPILE_CACHE | 1, true, yes, on | Disable the translation-unit compile cache. |
| PCC_USE_PLY_C_PARSER | 1 | Use the legacy PLY C parser instead of the native C parser. |
| PCC_PLY_CACHE_DIR | path | Override the legacy PLY parser table cache directory. |
| PCC_BINARY | path/command | Runtime build helper: choose the pcc command used when rebuilding runtime archives. |
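A boolean control such as PCC_DISABLE_COMPILE_CACHE accepts the values 1, true, yes, and on. The sketch below shows one way such a flag could be read. `env_flag` is a hypothetical helper, and the case-insensitive matching and whitespace trimming are assumptions; this README does not say how pcc itself parses the value.

```python
from typing import Mapping

# Accepted truthy spellings, taken from the table above.
TRUTHY_VALUES = frozenset({"1", "true", "yes", "on"})


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Return True when the variable is set to a recognized truthy value.

    Assumption: matching ignores case and surrounding whitespace.
    """
    return env.get(name, "").strip().lower() in TRUTHY_VALUES


print(env_flag({"PCC_DISABLE_COMPILE_CACHE": "yes"}, "PCC_DISABLE_COMPILE_CACHE"))
print(env_flag({}, "PCC_DISABLE_COMPILE_CACHE"))
```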
LLVM, IR-builder, and pass controls:
| Variable | Values | Effect |
|---|---|---|
| PCC_USE_LLVMLITE | 1 | Force all LLVM compatibility shims back to llvmlite. |
| PCC_USE_LLVMLITE_C | 1 | Force only the C codegen path back to llvmlite. |
| PCC_USE_LLVMLITE_PY | 1 | Force only the Python codegen path back to llvmlite. |
| PCC_USE_LLVMLITE_PASSES | 1 | Force only the pass layer back to llvmlite. |
| PCC_LIBLLVM_PATH | path | Point the native LLVM-C binding at a specific libLLVM-C. |
| PCC_DISABLE_PASSES | comma-separated pass names | Disable managed passes by name or alias. |
| PCC_LLVM_DISABLE_PASSES | comma-separated LLVM pass names | Disable concrete LLVM passes in the text pipeline. |
| PCC_CHEAP_LLVM_PIPELINE | 1 / pass list / false value | Enable or customize the cheap LLVM pass bundle for low-opt paths. |
| PCC_LLVM_PIPELINE | 1, default, or pipeline spec | Run an external LLVM text pipeline. |
| PCC_LLVM_OPT_BIN | path | LLVM opt binary used by external LLVM pipeline selection. |
Python runtime, linking, packaging, and bootstrap controls:
| Variable | Values | Effect |
|---|---|---|
| PCC_RUNTIME_CC | pcc, cc | Select whether Python runtime archives are built with pcc or the host C compiler. |
| PCC_RUNTIME_HIGH | py, c | Select pcc-Python or C implementations for high-level runtime modules. |
| PCC_HOST_PYTHON | command | Host Python used for subprocess boundaries such as self-backend emission. |
| PCC_PYTHON_LDFLAGS | linker flags | Override python-config --ldflags --embed for libpython fallback linking. |
| PCC_PYTHON_CONFIG | command/path | Override the python*-config command used for libpython flags. |
| PCC_WITH_LIBPYTHON | 1 | Runtime Makefile toggle for building libpython-compatible runtime archives. |
| PCC_BUILD_SKIP | 1 | Wheel build hook: skip runtime archive prebuild. |
| PCC_BUILD_BACKEND | self, llvm | Wheel build hook: backend used for runtime archive prebuild. |
| PCC_BUILD_TARGET | make target | Wheel build hook: runtime Makefile target to build. |
| PCC_BOOTSTRAP_OUT_DIR | path | scripts/bootstrap.sh output directory. |
| PCC_BOOTSTRAP_RUNTIME_CC | pcc, cc | Bootstrap wrapper default for PCC_RUNTIME_CC. |
| PCC_BOOTSTRAP_RUNTIME_HIGH | py, c | Bootstrap wrapper default for PCC_RUNTIME_HIGH. |
| PCC_BOOTSTRAP_PYTHON_LIBPYTHON | auto, on, off | Bootstrap wrapper default for --python-libpython. |
Diagnostics and local gates:
| Variable | Values | Effect |
|---|---|---|
| PCC_DUMP_BAD_IR | directory | Dump invalid or unparsable LLVM IR snapshots on LLVM parse failures. |
| PCC_DEBUG_PHI_TYPES | file path | Append SSA phi type-mismatch diagnostics to a log file. |
| PCC_DEBUG_SSA_LOWER_FAIL | 1 | Print tracebacks when SSA lowering fails and falls back. |
| PCC_PROBE_CLOSURE | probe mode | Select the stage-1 closure probe mode. |
| PCC_PROBE_VERBOSE | truthy value | Print verbose probe output for closure-probe scripts. |
| PCC_CSMITH_SEEDS | integer | Number of Csmith seeds used by the Csmith pytest harness. |
| PCC_SELF_BACKEND_LINUX_X86_64_IMAGE | Docker image tag | Override the Linux x86_64 self-backend Docker image tag. |
| PCC_SELF_BACKEND_DOCKER_REBUILD | 1 | Force rebuild of the Linux x86_64 self-backend Docker image. |
Compiler changes should include a minimized regression test and, when relevant, a real-project confirmation. Read AGENTS.md before making semantic frontend or codegen changes; it documents the repository's debugging workflow and testing policy.
License
MIT. See LICENSE.