Convert candidate CVs into a standardised Word profile, with no invented facts.

profgen (cv_formatter)

profgen (the tool is called cv_formatter) turns a candidate CV (PDF/DOCX/TXT) into a standardised Word profile through a verbatim-extract → typed-structure → grounding-check → render → review-report pipeline. The profile is rendered against a template you supply, so any house style — including a private or corporate one — can be applied without the template living in the package.

The one hard rule: no invented facts

Omitting information is acceptable; fabricating a company, tool, date, qualification, institution or project is a defect. Concretely:

  • Anything absent from the source CV is marked "Not stated" (scalars) or left as an empty list — never guessed.
  • A deterministic, LLM-independent grounding check verifies that every extracted tool, certification, institution and project name actually appears in the source text. Anything it cannot find is flagged in the review report.
  • Employers are anonymised. Experience is rendered as Project N | <domain> rather than by company name (the company is still extracted, purely so the grounding check can confirm nothing was invented).
  • No derived fields. Years of experience, seniority and similar figures are never computed; the skills table's "Years Experience" column always reads "Not stated" unless the CV states a figure explicitly.
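As a rough illustration of the grounding rule above, the check can be thought of as a plain substring test against the normalised source text. This is a minimal sketch, not the package's implementation: the function names `normalise` and `find_ungrounded` are hypothetical, and the real check may match names differently.

```python
import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so line wraps do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_ungrounded(source_text: str, extracted_names: list[str]) -> list[str]:
    """Return the extracted names that do not appear in the source text.

    Hypothetical helper: anything returned here would be flagged for review.
    """
    haystack = normalise(source_text)
    return [name for name in extracted_names if normalise(name) not in haystack]


# "Apache Airflow" is split across a line break in the source, as PDFs often do.
cv_text = "Built data pipelines in Python and Apache\nAirflow at Acme Ltd."
names = ["Python", "Apache Airflow", "Kubernetes"]
print(find_ungrounded(cv_text, names))  # prints ['Kubernetes']
```

A substring test is deliberately simple and has no model in the loop, but it can over-accept: "Java" would be found inside "JavaScript". A stricter variant could match on word boundaries.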

Each conversion therefore writes two files: the .docx profile and a sibling *.review.md listing missing information and everything to verify before customer submission.

Installation

Only a release candidate (0.0.1rc1) is published on PyPI so far. Install from source (Python 3.11+):

git clone https://github.com/ksteptoe/profgen
cd profgen
make dev                  # editable install with all dev/docs extras
# or, equivalently:
pip install -e ".[dev]"

Quickstart

# 1. Generate a starter .docx style-donor template (neutral default styles).
profgen make-template templates/profile_template.docx

# 2. Convert a CV offline (no API key, no network) — produces out.docx AND out.review.md.
profgen convert cv.pdf --output out.docx --offline

When --output is omitted the profile is written to <source-stem>_profile.docx in the current directory (so cv.pdf becomes cv_profile.docx), with the review report alongside.

cv-formatter is an identical alias for profgen, and python -m profgen works too. Run profgen convert --help for the full option list.

Bring your own template

The renderer binds content to five logical roles (title, date_heading, body, bullet and legal) rather than to fixed style names. By default each role maps to a neutral built-in or starter style (DEFAULT_STYLE_MAP):

Role          Default style
title         Profile Title
date_heading  Profile Date
body          Normal
bullet        List Bullet
legal         Profile Legal

To apply your own house style, pass your branded document with --template and a TOML style map with --style-map that points each role at the real paragraph style names in your document:

profgen convert cv.pdf --template my_template.docx --style-map my-style-map.toml

# my-style-map.toml: map the logical roles to YOUR template's style names.
title        = "My Heading Style"
date_heading = "My Date Style"
legal        = "My Legal Style"

The map may be partial: any role you omit falls back to its default. This is how a private or corporate template can be applied without it ever living in the package.

One-step branded profile (make profile)

For repeated runs against a confidential template there is a convenience target:

cp examples/style-map.example.toml local/style-map.toml   # then edit to taste
# drop your branded template at local/template.docx
make profile CV=cv.pdf            # Claude path (needs ANTHROPIC_API_KEY)
make profile CV=cv.pdf OFFLINE=1  # deterministic, network-free path
make profile CV=cv.pdf OUT=out.docx

make profile renders against local/template.docx using local/style-map.toml. The local/ directory and .env are gitignored, so confidential templates and API keys stay out of the repository.

Offline vs real Claude path

The structuring stage has two interchangeable backends behind one interface:

  • Offline (--offline) — the deterministic, network-free HeuristicStructuringClient. Needs no API key, makes no network call, and is what the entire test suite uses. Ideal for plumbing checks and CI.
  • Real Claude (default) — the ClaudeStructuringClient, which calls the Anthropic API and needs ANTHROPIC_API_KEY. This path is deliberately never exercised in CI; it is smoke-tested only behind an explicit opt-in (see examples/smoke_real_path.py).
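One way to picture "two backends behind one interface" is a typing.Protocol with a selector function. Everything named here is hypothetical: the `structure` method, the stub classes and `choose_client` stand in for the real HeuristicStructuringClient and ClaudeStructuringClient, whose actual signatures are not shown in this README.

```python
from collections.abc import Mapping
from typing import Protocol


class StructuringClient(Protocol):
    """Hypothetical shared interface: CV text in, structured fields out."""

    def structure(self, cv_text: str) -> dict[str, object]: ...


class OfflineStub:
    """Stand-in for the deterministic, network-free backend."""

    def structure(self, cv_text: str) -> dict[str, object]:
        # Absent scalars read "Not stated"; absent lists stay empty.
        return {"name": "Not stated", "tools": []}


class RemoteStub:
    """Stand-in for the API-backed backend; the network call is omitted."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def structure(self, cv_text: str) -> dict[str, object]:
        raise NotImplementedError("network call omitted from this sketch")


def choose_client(offline: bool, env: Mapping[str, str]) -> StructuringClient:
    """Pick a backend the way the --offline flag is described as doing."""
    if offline:
        return OfflineStub()
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required unless --offline is used")
    return RemoteStub(api_key)


client = choose_client(offline=True, env={})
print(client.structure("any CV text"))
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the selector testable with the network disabled.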

Example

A runnable, fully-offline example builds a profile from a bundled synthetic CV with no API key:

.venv/bin/python examples/build_example_profile.py

It reads examples/input_cvs/sample_cv.txt, runs the offline pipeline, and writes the profile and its review report into examples/output_profiles/ (gitignored).

Development

make dev      # editable install with all dependencies
make test     # run the fully-offline test suite
make lint     # ruff
make format   # ruff --fix
make docs     # build the Sphinx HTML User Guide
make docs-pdf # build a single PDF of the docs (needs a LaTeX toolchain)

Quality gates: ruff clean, mypy --strict clean (scoped to src/), and pytest green with the network disabled. See the Sphinx User Guide (make docs) for the full pipeline walkthrough, and the cv_formatter_SPEC.md file in the repository root for the build contract.

make docs-pdf produces docs/_build/latex/profgen.pdf. It needs a system LaTeX toolchain on PATH: xelatex, latexmk, and makeindex (install TeX Live or, on Windows, MiKTeX). The toolchain is not pip-installable and is optional: the target fails fast with a clear message if latexmk is missing.

Note

This project has been set up using PyScaffold 4.6 with the ClickStart extension.

Download files

Source Distribution

profgen-0.0.1rc1.tar.gz (85.0 kB)

Uploaded Source

Built Distribution

profgen-0.0.1rc1-py3-none-any.whl (32.1 kB)

Uploaded Python 3

File details

Details for the file profgen-0.0.1rc1.tar.gz.

File metadata

  • Download URL: profgen-0.0.1rc1.tar.gz
  • Size: 85.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for profgen-0.0.1rc1.tar.gz
Algorithm    Hash digest
SHA256       deee7c4c433a6520c46cdeb4c7d72681d0e210127d24083c564fc858252a1200
MD5          fa2f1539c73a1e9836a518d90f0fdbc5
BLAKE2b-256  b421a6df833ad836ed7fe0cddeffcbca7ce5f2fb678e0c5e30b908eb3c36741b

Provenance

The following attestation bundles were made for profgen-0.0.1rc1.tar.gz:

Publisher: ci.yml on ksteptoe/profgen

File details

Details for the file profgen-0.0.1rc1-py3-none-any.whl.

File metadata

  • Download URL: profgen-0.0.1rc1-py3-none-any.whl
  • Size: 32.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for profgen-0.0.1rc1-py3-none-any.whl
Algorithm    Hash digest
SHA256       e618beaa4a64118076953df90dbf3207da35ba8c430e623bcc69c7f6814755fd
MD5          8c5dec43be618d4a7f9f33997f56688e
BLAKE2b-256  d0e036182f7d65d0b6e35a9684b7e84afa8f04095d59eca6628e6a87f229ca65

Provenance

The following attestation bundles were made for profgen-0.0.1rc1-py3-none-any.whl:

Publisher: ci.yml on ksteptoe/profgen
