Normalize PE files for reproducible MSVC++ builds
msvcpp-normalize-pe - Normalize PE Files for Reproducible MSVC++ Builds
⚠️ AI-Assisted Development Notice: This project was developed as an experiment in AI-assisted "vibe coding" using Claude Code. While the code has comprehensive tests and linting, it was primarily generated through AI assistance. The implementation is based on reverse-engineering PE file formats and may have edge cases or behaviors that haven't been thoroughly tested with all possible MSVC configurations. Use with caution in production environments and verify results with your specific toolchain.
A Python tool to patch Windows PE (Portable Executable) files to make MSVC builds reproducible by normalizing timestamps, GUIDs, and other non-deterministic debug metadata.
The Problem
When compiling Windows executables with Microsoft Visual C++ (MSVC), even with the /Brepro flag enabled, builds are not fully reproducible. The same source code compiled twice produces different binaries due to non-deterministic debug information:
- COFF Header TimeDateStamp: Build timestamp in PE header
- Debug Directory Timestamps: 4 separate timestamps in debug entries (CODEVIEW, VC_FEATURE, POGO, REPRO)
- CODEVIEW GUID: Random GUID linking .exe to .pdb file
- CODEVIEW Age: Incremental counter that varies between builds
- REPRO Hash: Composite hash containing the GUID and timestamps
This makes binary verification in CI impossible - you can't verify that committed binaries match the source code because every rebuild produces different bytes, even though the executable code is identical.
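To see which bytes actually vary between two builds, a small comparison helper can be useful. The following is a minimal sketch, not part of this tool; the function name `differing_ranges` is hypothetical.

```python
from typing import List, Tuple


def differing_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return half-open (start, end) byte ranges where a and b differ.

    Bytes are compared position by position over the common length.
    If the inputs differ in size, the extra tail is reported as one
    additional range.
    """
    ranges: List[Tuple[int, int]] = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # open a new differing run
        elif start is not None:
            ranges.append((start, i))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

Running this on two unpatched builds of the same source would be expected to report a handful of short ranges in the headers and debug data, which is where the fields listed above live.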
The Solution
This tool patches all non-deterministic fields in PE files to fixed, deterministic values:
- All timestamps → 0x00000001 (January 1, 1970 + 1 second)
- CODEVIEW GUID → 00000000-0000-0000-0000-000000000000
- CODEVIEW Age → 1
- REPRO Hash → All zeros
After patching, identical source code produces byte-for-byte identical binaries, enabling reproducible builds and CI verification.
What Gets Patched
Fields Patched (8 total)
- PE COFF Header TimeDateStamp (offset varies, typically 0xC0-0x100)
- Debug CODEVIEW Entry Timestamp
- Debug CODEVIEW GUID (16 bytes)
- Debug CODEVIEW Age (4 bytes)
- Debug VC_FEATURE Entry Timestamp
- Debug POGO Entry Timestamp
- Debug REPRO Entry Timestamp
- Debug REPRO Hash (36 bytes)
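As an illustration of what patching the debug fields involves, here is a minimal sketch that operates on in-memory buffers. It assumes the standard 28-byte debug directory entry layout and the RSDS layout (GUID at +4, Age at +20); the helper names are hypothetical and this is not the tool's actual implementation.

```python
import struct

# Debug directory entry: Characteristics, TimeDateStamp, MajorVersion,
# MinorVersion, Type, SizeOfData, AddressOfRawData, PointerToRawData.
DEBUG_ENTRY = struct.Struct("<IIHHIIII")  # 28 bytes


def set_entry_timestamp(buf: bytearray, entry_offset: int, timestamp: int = 1) -> None:
    """Overwrite TimeDateStamp (at +4) of the debug entry at entry_offset."""
    struct.pack_into("<I", buf, entry_offset + 4, timestamp)


def normalize_rsds(record: bytearray, age: int = 1) -> None:
    """Zero the GUID and set Age in a CodeView RSDS record, in place."""
    if len(record) < 24:
        raise ValueError("RSDS record too short")
    if record[:4] != b"RSDS":
        raise ValueError("not an RSDS CodeView record")
    record[4:20] = bytes(16)                  # GUID: 16 bytes at +4
    struct.pack_into("<I", record, 20, age)   # Age: 4 bytes at +20
```

Both helpers modify only fixed-width fields, so the file size and all following offsets (including the PDB path that follows the Age field) stay unchanged.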
What Doesn't Change
- All executable code (.text section)
- All program data (.data, .rdata sections)
- Import/Export tables
- Section headers
- Relocations
The binary behaves identically at runtime - only metadata used for debugging is normalized.
Installation
From PyPI (Recommended)
pip install msvcpp-normalize-pe
From Source
git clone https://github.com/mithro/msvcpp-normalize-pe.git
cd msvcpp-normalize-pe
pip install .
Using uv
uv pip install msvcpp-normalize-pe
Usage
Command Line
After installation, the msvcpp-normalize-pe command is available:
# Basic usage
msvcpp-normalize-pe program.exe
# Custom timestamp
msvcpp-normalize-pe program.exe 1234567890
# Verbose output
msvcpp-normalize-pe --verbose program.exe
# See all options
msvcpp-normalize-pe --help
Python API
You can also use msvcpp-normalize-pe as a library in your Python code:
from pathlib import Path

from msvcpp_normalize_pe import patch_pe_file

result = patch_pe_file(Path("program.exe"), timestamp=1, verbose=True)
if result.success:
    print(f"Patched {result.patches_applied} fields")
else:
    print(f"Errors: {result.errors}")
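When several binaries need patching, the call above can be wrapped in a loop. The sketch below is an assumption about how one might do that; `patch_all` is a hypothetical helper, and the patch function is passed in as a parameter so the wrapper relies only on the `timestamp` and `verbose` keywords and the `success` attribute shown above.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def patch_all(paths: Iterable, patch: Callable, timestamp: int = 1) -> List[Path]:
    """Patch every file in paths and return the ones that failed.

    `patch` is expected to behave like patch_pe_file: it accepts a Path
    plus `timestamp` and `verbose` keywords and returns an object with a
    boolean `success` attribute.
    """
    failed: List[Path] = []
    for path in paths:
        path = Path(path)
        result = patch(path, timestamp=timestamp, verbose=False)
        if not result.success:
            failed.append(path)
    return failed
```

A call such as `patch_all(Path("build").glob("*.exe"), patch_pe_file)` would then return an empty list when every file was patched successfully.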
Example Output
[1/1] COFF header: 0x829692a8 -> 0x00000001
[2/?] Debug CODEVIEW timestamp: 0x829692a8 -> 0x00000001
[3/?] Debug CODEVIEW GUID: e97b6ac706ea9b2dd577392d2bf08df7 -> 00000000000000000000000000000000
[4/?] Debug CODEVIEW Age: 7 -> 1
[5/?] Debug VC_FEATURE timestamp: 0x829692a8 -> 0x00000001
[6/?] Debug POGO timestamp: 0x829692a8 -> 0x00000001
[7/?] Debug REPRO timestamp: 0x829692a8 -> 0x00000001
[8/?] Debug REPRO hash: 20000000e97b6ac7... -> 000000000000000000...
Total: 8 timestamp(s) patched in program.exe
Integration with Build Systems
Makefile Integration (Native MSVC)
# Native MSVC builds
ifeq ($(USE_NATIVE_MSVC),1)
program.exe: program.cpp
	cl.exe /O2 /Zi program.cpp /link /DEBUG:FULL /Brepro
	msvcpp-normalize-pe program.exe 1
endif
CI/CD Verification Workflow
name: Verify Binary Reproducibility
jobs:
  verify:
    runs-on: windows-latest
    steps:
      - name: Build from source
        run: |
          cl.exe /O2 program.cpp /link /DEBUG:FULL /Brepro
          msvcpp-normalize-pe program.exe 1
      - name: Compare with committed binary
        run: |
          fc /b program.exe committed/program.exe
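Where `fc /b` is not available (for example, when the comparison step runs on a non-Windows runner), the same check can be done by hashing both files. This is a minimal sketch using only the standard library; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(Path(path), "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_match(built, committed) -> bool:
    """True if both files have identical contents (by SHA-256)."""
    return sha256_of(built) == sha256_of(committed)
```

A CI step could call `binaries_match("program.exe", "committed/program.exe")` and exit with a non-zero status when it returns `False`.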
Requirements
- Python 3.9+ (type hints, dataclasses)
- Target files: Windows PE executables (.exe) or DLLs (.dll)
- Architecture: Works with both 32-bit (PE32) and 64-bit (PE32+) binaries
No runtime dependencies - uses only Python standard library (struct, sys, pathlib, dataclasses).
Limitations and Known Issues
What This Tool Fixes
- ✅ Makes PE executables reproducible (timestamps, GUIDs)
- ✅ Works with native MSVC (cl.exe + link.exe)
- ✅ Preserves debugging capability (PDB files still work)
What This Tool Cannot Fix
- ❌ PDB files remain non-deterministic (~11% of PDB content varies)
  - PDB files contain thousands of small differences (padding, internal offsets, GUIDs)
  - Microsoft's PDB format has fundamental non-determinism issues
  - Industry solution: Use clang-cl + lld-link instead of native MSVC
- ❌ Does not work with stripped binaries (no debug directory to patch)
Alternative: Use clang-cl + lld-link
For fully reproducible builds including PDB files, use LLVM's Windows toolchain:
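clang-cl /O2 /std:c++17 -fuse-ld=lld program.cpp /link /DEBUG:FULL /Brepro /TIMESTAMP:1
The /TIMESTAMP: flag is only supported by lld-link, not native MSVC link.exe. Because clang-cl invokes link.exe by default, pass -fuse-ld=lld so that lld-link performs the link.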
Technical Details
PE File Structure
The tool parses the PE file structure to locate and patch:
- DOS Header (offset 0x3C) → PE signature offset
- PE Signature (offset varies) → Verify "PE\0\0"
- COFF Header (after PE sig) → TimeDateStamp at +4
- Optional Header (after COFF) → Contains Data Directories
- Data Directory #6 → Debug Directory (RVA + Size)
- Debug Directory Entries → 28-byte structures with timestamps
- CODEVIEW RSDS Structure → GUID at +4, Age at +20
- REPRO Hash → Full hash data
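The first steps of that walk can be sketched as follows. This is an illustrative sketch based on the public PE layout, not the tool's actual code; `locate_pe_fields` and `PeLayout` are hypothetical names. It stops at the Debug Directory's RVA and size, since translating the RVA into a file offset additionally requires the section table, which is omitted here.

```python
import struct
from typing import NamedTuple

DEBUG_DIRECTORY_INDEX = 6  # data directory slot holding the Debug Directory


class PeLayout(NamedTuple):
    timestamp_offset: int  # file offset of the COFF TimeDateStamp
    debug_rva: int         # RVA of the Debug Directory (0 if absent)
    debug_size: int        # size of the Debug Directory in bytes


def locate_pe_fields(data: bytes) -> PeLayout:
    """Locate the COFF timestamp and the Debug Directory entry."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4       # COFF header follows the signature
    optional = coff + 20       # COFF header is 20 bytes long
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic == 0x10B:         # PE32
        count_offset, dirs_offset = optional + 92, optional + 96
    elif magic == 0x20B:       # PE32+
        count_offset, dirs_offset = optional + 108, optional + 112
    else:
        raise ValueError(f"unknown optional header magic {magic:#x}")
    (count,) = struct.unpack_from("<I", data, count_offset)
    if count <= DEBUG_DIRECTORY_INDEX:
        return PeLayout(coff + 4, 0, 0)  # no debug data directory slot
    rva, size = struct.unpack_from(
        "<II", data, dirs_offset + 8 * DEBUG_DIRECTORY_INDEX
    )
    return PeLayout(coff + 4, rva, size)
```

A `debug_size` of zero corresponds to the stripped-binary case, where there is no debug directory to patch; otherwise `debug_size // 28` gives the number of debug entries.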
Why /Brepro Isn't Enough
MSVC's /Brepro flag:
- ✅ Removes some non-determinism
- ✅ Uses hash-based timestamps instead of wall clock time
- ❌ Still produces different hashes for each build
- ❌ GUID remains random
- ❌ Age field increments
This is because /Brepro computes a hash of build inputs, but includes random/variable data in that hash.
Comparison with Alternatives
vs. ducible
ducible is an older tool with similar goals:
- ❌ Unmaintained (last update 2018)
- ❌ Only patches COFF header timestamp
- ❌ Does not patch Debug Directory timestamps
- ❌ Does not patch GUIDs or Age fields
vs. clang-cl + lld-link
Using LLVM's toolchain:
- ✅ Fully reproducible (including PDB files)
- ✅ Supports /TIMESTAMP: flag
- ❌ Not always possible (may need native MSVC for compatibility)
This tool fills the gap when you must use native MSVC but still want reproducible .exe files.
Research and References
The non-determinism of MSVC builds with debug symbols is well-documented:
- Microsoft PDB Repository Issue #9: PDB non-determinism issues (GUIDs, padding, uninitialized buffers)
- Chromium Project: Uses clang-cl + lld-link specifically for reproducible builds
- Bazel Team: Marked /experimental:deterministic as "not planned" because "PDBs are not deterministic"
- Reproducible Builds Mailing List (Dec 2024): "there is no way to really solve this issue" with MSVC
- Stack Overflow (Nov 2024): "No complete solution currently exists for achieving fully reproducible MSVC builds with debug symbols"
License
Apache License 2.0 - See LICENSE file
Contributing
Contributions welcome! Please test thoroughly with your build system before submitting PRs.
Credits
Developed as part of the ghidra-optimized-stdvector-decompiler project to enable CI verification of demo binaries compiled with multiple MSVC versions.