
Convert candidate CVs into a standardised Word profile, with no invented facts.


profgen (cv_formatter)

Project generated with PyScaffold

Convert candidate CVs into standardised Word profiles — without inventing facts.

profgen (the tool is called cv_formatter) turns a candidate CV (PDF/DOCX/TXT) into a standardised Word profile through a verbatim-extract → typed-structure → grounding-check → render → review-report pipeline. The profile is rendered against a template you supply, so any house style — including a private or corporate one — can be applied without the template living in the package.

The one hard rule: no invented facts

Omitting information is acceptable; fabricating a company, tool, date, qualification, institution or project is a defect. Concretely:

  • Anything absent from the source CV is marked "Not stated" (scalars) or left as an empty list — never guessed.
  • A deterministic, LLM-independent grounding check verifies that every extracted tool, certification, institution and project name actually appears in the source text. Anything it cannot find is flagged in the review report.
  • Employers are anonymised. Experience is rendered as Project N | <domain> rather than by company name (the company is still extracted, purely so the grounding check can confirm nothing was invented).
  • No derived fields. Years of experience, seniority and similar figures are never computed; the skills table's "Years Experience" column always reads "Not stated" unless the CV states a figure explicitly.

Each conversion therefore writes two files: the .docx profile and a sibling *.review.md listing missing information and everything to verify before customer submission.
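
The grounding rule above can be illustrated with a minimal sketch. This is not the package's actual implementation: the function names `normalise` and `find_ungrounded` are hypothetical, and plain case-insensitive substring matching is an assumption made for illustration.

```python
import re


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so matching ignores layout differences."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_ungrounded(source_text: str, extracted_names: list[str]) -> list[str]:
    """Return the extracted names that do not appear in the source text.

    Hypothetical helper: no LLM is involved, so the same inputs always
    produce the same list of names to flag for review.
    """
    haystack = normalise(source_text)
    return [name for name in extracted_names if normalise(name) not in haystack]


source = "Built data pipelines with Python and Apache Airflow.\nAWS Certified Developer."
extracted = ["Python", "Apache  Airflow", "Kubernetes", "AWS Certified Developer"]

flagged = find_ungrounded(source, extracted)
print(flagged)  # names a reviewer would need to verify
```

Substring matching is deliberately crude: a short name such as "Java" would also match inside "JavaScript", so a stricter check would probably match on word boundaries.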

Installation

Not yet published to PyPI. Install from source (Python 3.11+):

git clone https://github.com/ksteptoe/profgen
cd profgen
make dev                  # editable install with all dev/docs extras
# or, equivalently:
pip install -e ".[dev]"

Quickstart

# 1. Generate a starter .docx style-donor template (neutral default styles).
profgen make-template templates/profile_template.docx

# 2. Convert a CV offline (no API key, no network) — produces out.docx AND out.review.md.
profgen convert cv.pdf --output out.docx --offline

When --output is omitted the profile is written to <source-stem>_profile.docx in the current directory (so cv.pdf becomes cv_profile.docx), with the review report alongside.
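
The naming convention can be sketched with `pathlib`. The helper names below are hypothetical, and deriving the report name by swapping the `.docx` suffix for `.review.md` is an assumption inferred from the `out.docx` / `out.review.md` pair above.

```python
from pathlib import Path


def default_profile_path(source: Path, cwd: Path) -> Path:
    """<source-stem>_profile.docx in the current directory (hypothetical helper)."""
    return cwd / f"{source.stem}_profile.docx"


def review_report_path(profile: Path) -> Path:
    """Sibling review report: same location, .docx replaced by .review.md."""
    return profile.with_suffix(".review.md")


profile = default_profile_path(Path("inbox/cv.pdf"), Path("."))
report = review_report_path(profile)
print(profile.name, report.name)
```

Note that only the stem of the source path is used, so a CV in another directory still produces its profile in the current one.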

cv-formatter is an identical alias for profgen, and python -m profgen works too. Run profgen convert --help for the full option list.

Bring your own template

The renderer binds content to five logical roles (title, date_heading, body, bullet and legal) rather than to fixed style names. By default each role maps to a neutral built-in or starter style (DEFAULT_STYLE_MAP):

Role          Default style
title         Profile Title
date_heading  Profile Date
body          Normal
bullet        List Bullet
legal         Profile Legal

To apply your own house style, pass your branded document with --template and a TOML style map with --style-map that points each role at the real paragraph style names in your document:

profgen convert cv.pdf --template my_template.docx --style-map my-style-map.toml

# my-style-map.toml: map the logical roles to YOUR template's style names.
title        = "My Heading Style"
date_heading = "My Date Style"
legal        = "My Legal Style"

The map may be partial: any role you omit falls back to its default. This is how a private or corporate template can be applied without it ever living in the package.

One-step branded profile (make profile)

For repeated runs against a confidential template there is a convenience target:

cp examples/style-map.example.toml local/style-map.toml   # then edit to taste
# drop your branded template at local/template.docx
make profile CV=cv.pdf            # Claude path (needs ANTHROPIC_API_KEY)
make profile CV=cv.pdf OFFLINE=1  # deterministic, network-free path
make profile CV=cv.pdf OUT=out.docx

make profile renders against local/template.docx using local/style-map.toml. The local/ directory and .env are gitignored, so confidential templates and API keys stay out of the repository.

Offline vs real Claude path

The structuring stage has two interchangeable backends behind one interface:

  • Offline (--offline) — the deterministic, network-free HeuristicStructuringClient. Needs no API key, makes no network call, and is what the entire test suite uses. Ideal for plumbing checks and CI.
  • Real Claude (default) — the ClaudeStructuringClient, which calls the Anthropic API and needs ANTHROPIC_API_KEY. This path is deliberately never exercised in CI; it is smoke-tested only behind an explicit opt-in (see examples/smoke_real_path.py).
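
As a rough illustration of "two backends behind one interface", the sketch below uses a `typing.Protocol`. Every name in it (`StructuringClient`, `OfflineClient`, `ApiClient`, `choose_client`, the `structure` method) is hypothetical and is not the package's real API; the network call is deliberately left out.

```python
import os
from collections.abc import Mapping
from typing import Protocol


class StructuringClient(Protocol):
    """Hypothetical shared interface for both backends."""

    def structure(self, cv_text: str) -> dict[str, object]: ...


class OfflineClient:
    """Stand-in for the deterministic, network-free backend."""

    def structure(self, cv_text: str) -> dict[str, object]:
        # Toy heuristic: report only what is literally present in the text.
        lines = [line.strip() for line in cv_text.splitlines() if line.strip()]
        return {"first_line": lines[0] if lines else "Not stated", "tools": []}


class ApiClient:
    """Stand-in for the API-backed backend; the network call is omitted."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def structure(self, cv_text: str) -> dict[str, object]:
        raise NotImplementedError("network call omitted from this sketch")


def choose_client(offline: bool, env: Mapping[str, str]) -> StructuringClient:
    """Pick a backend; the API path refuses to start without a key."""
    if offline:
        return OfflineClient()
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required unless --offline is used")
    return ApiClient(api_key)


client = choose_client(offline=True, env=os.environ)
result = client.structure("Jane Example\nPython developer")
print(result)
```

Because callers depend only on the shared interface, a test suite can use the offline backend everywhere without touching the network.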

Example

A runnable, fully-offline example builds a profile from a bundled synthetic CV with no API key:

.venv/bin/python examples/build_example_profile.py

It reads examples/input_cvs/sample_cv.txt, runs the offline pipeline, and writes the profile and its review report into examples/output_profiles/ (gitignored).

Development

make dev      # editable install with all dependencies
make test     # run the fully-offline test suite
make lint     # ruff
make format   # ruff --fix
make docs     # build the Sphinx HTML User Guide
make docs-pdf # build a single PDF of the docs (needs a LaTeX toolchain)

Quality gates: ruff clean, mypy --strict clean (scoped to src/), and pytest green with the network disabled. See the Sphinx User Guide (make docs) for the full pipeline walkthrough, and the cv_formatter_SPEC.md file in the repository root for the build contract.

make docs-pdf produces docs/_build/latex/profgen.pdf. It needs a system LaTeX toolchain on PATH: xelatex, latexmk, and makeindex (install TeX Live or, on Windows, MiKTeX). The toolchain is not pip-installable and is optional: the target fails fast with a clear message if latexmk is missing.

Note

This project has been set up using PyScaffold 4.6 with the ClickStart extension.
