profgen (cv_formatter)
Convert candidate CVs into standardised Word profiles — without inventing facts.
profgen (the tool is called cv_formatter) turns a candidate CV
(PDF/DOCX/TXT) into a standardised Word profile through a
verbatim-extract → typed-structure → grounding-check → render → review-report
pipeline. The profile is rendered against a template you supply, so any house
style — including a private or corporate one — can be applied without the
template living in the package.
The one hard rule: no invented facts
Omitting information is acceptable; fabricating a company, tool, date, qualification, institution or project is a defect. Concretely:
- Anything absent from the source CV is marked "Not stated" (scalars) or left as an empty list, never guessed.
- A deterministic, LLM-independent grounding check verifies that every extracted tool, certification, institution and project name actually appears in the source text. Anything it cannot find is flagged in the review report.
- Employers are anonymised. Experience is rendered as Project N | <domain> rather than by company name (the company is still extracted, purely so the grounding check can confirm nothing was invented).
- No derived fields. Years of experience, seniority and similar figures are never computed; the skills table's "Years Experience" column always reads "Not stated" unless the CV states a figure explicitly.
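The grounding check described above can be pictured as a plain substring test over normalised text. The sketch below is illustrative only: find_ungrounded and _normalise are hypothetical names, not functions from the package, and the real check may normalise or match differently.

```python
import re


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_ungrounded(names: list[str], source_text: str) -> list[str]:
    """Return the extracted names that do not appear in the source text.

    Hypothetical helper: purely deterministic, no LLM involved.
    """
    haystack = _normalise(source_text)
    return [name for name in names if _normalise(name) not in haystack]


# Synthetic CV text; "Apache Airflow" is split across a line break on purpose.
cv_text = "Built data pipelines in Python and Apache\nAirflow at a retail bank."
extracted_tools = ["Python", "Apache Airflow", "Kubernetes"]

print(find_ungrounded(extracted_tools, cv_text))  # ['Kubernetes']
```

Anything returned by such a check would be a candidate for the review report rather than being silently dropped or silently kept.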
Each conversion therefore writes two files: the .docx profile and a sibling
*.review.md listing missing information and everything to verify before customer
submission.
Installation
Not yet published to PyPI. Install from source (Python 3.11+):
git clone https://github.com/ksteptoe/profgen
cd profgen
make dev # editable install with all dev/docs extras
# or, equivalently:
pip install -e ".[dev]"
Quickstart
# 1. Generate a starter .docx style-donor template (neutral default styles).
profgen make-template templates/profile_template.docx
# 2. Convert a CV offline (no API key, no network) — produces out.docx AND out.review.md.
profgen convert cv.pdf --output out.docx --offline
When --output is omitted the profile is written to <source-stem>_profile.docx
in the current directory (so cv.pdf becomes cv_profile.docx), with the review
report alongside.
cv-formatter is an identical alias for profgen, and python -m profgen works
too. Run profgen convert --help for the full option list.
Bring your own template
The renderer binds content to five logical roles — title, date_heading,
body, bullet and legal — rather than to fixed style names. By default each
role maps to a neutral built-in or starter style (DEFAULT_STYLE_MAP):
| Role | Default style |
|---|---|
| title | Profile Title |
| date_heading | Profile Date |
| body | Normal |
| bullet | List Bullet |
| legal | Profile Legal |
To apply your own house style, pass your branded document with --template and a
TOML style map with --style-map that points each role at the real paragraph
style names in your document:
profgen convert cv.pdf --template my_template.docx --style-map my-style-map.toml
# my-style-map.toml — map the logical roles to YOUR template's style names.
title = "My Heading Style"
date_heading = "My Date Style"
legal = "My Legal Style"
The map may be partial: any role you omit falls back to its default. This is how a private or corporate template can be applied without it ever living in the package.
One-step branded profile (make profile)
For repeated runs against a confidential template there is a convenience target:
cp examples/style-map.example.toml local/style-map.toml # then edit to taste
# drop your branded template at local/template.docx
make profile CV=cv.pdf # Claude path (needs ANTHROPIC_API_KEY)
make profile CV=cv.pdf OFFLINE=1 # deterministic, network-free path
make profile CV=cv.pdf OUT=out.docx
make profile renders against local/template.docx using local/style-map.toml.
The local/ directory and .env are gitignored, so confidential templates
and API keys stay out of the repository.
Offline vs real Claude path
The structuring stage has two interchangeable backends behind one interface:
- Offline (--offline): the deterministic, network-free HeuristicStructuringClient. Needs no API key, makes no network call, and is what the entire test suite uses. Ideal for plumbing checks and CI.
- Real Claude (default): the ClaudeStructuringClient, which calls the Anthropic API and needs ANTHROPIC_API_KEY. This path is deliberately never exercised in CI; it is smoke-tested only behind an explicit opt-in (see examples/smoke_real_path.py).
Example
A runnable, fully-offline example builds a profile from a bundled synthetic CV with no API key:
.venv/bin/python examples/build_example_profile.py
It reads examples/input_cvs/sample_cv.txt, runs the offline pipeline, and writes
the profile and its review report into examples/output_profiles/ (gitignored).
Development
make dev # editable install with all dependencies
make test # run the fully-offline test suite
make lint # ruff
make format # ruff --fix
make docs # build the Sphinx HTML User Guide
make docs-pdf # build a single PDF of the docs (needs a LaTeX toolchain)
Quality gates: ruff clean, mypy --strict clean (scoped to src/), and
pytest green with the network disabled. See the Sphinx User Guide
(make docs) for the full pipeline walkthrough, and the cv_formatter_SPEC.md
file in the repository root for the build contract.
make docs-pdf produces docs/_build/latex/profgen.pdf. It needs a system LaTeX
toolchain on PATH: xelatex, latexmk, and makeindex (install TeX Live or, on
Windows, MiKTeX). The toolchain is not pip-installable and is optional: the
target fails fast with a clear message if latexmk is missing.
Note
This project has been set up using PyScaffold 4.6 with the ClickStart extension.