Local-first DOCX formatter for academic papers with a content-fingerprint guard that proves the text was left untouched — only the formatting changed.

Paper Format Agent

An open-source DOCX formatter for academic papers that can prove it didn't rewrite your text.

Paper Format Agent reformats a thesis or paper — fonts, indents, alignment, spacing, headings, captions — to match a target format guide, and it ships with a verifiable content fingerprint so you can confirm the wording of your paper came out unchanged. It compares a fingerprint of your body and table text (with whitespace and stray bullet characters normalized out) before and after formatting; the run is fail-closed, so if that text changed it aborts instead of writing a file. Everything runs locally on your machine. It's also packaged as an installable agent skill (SKILL.md + agents/openai.yaml), so tools like Claude Code or Codex CLI can invoke it directly instead of a human clicking through a GUI.
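
A minimal sketch of how such a fingerprint could be computed, for readers who want to reason about what "unchanged" means. The hash algorithm (SHA-256, guessed from the 64-hex-digit values in the report), the bullet character set, and the function names are all assumptions for illustration, not the project's actual implementation.

```python
import hashlib

# Assumed set of "stray bullet characters"; the project's real list may differ.
BULLET_CHARS = "•●○■□▪◦"


def normalize(text: str) -> str:
    """Drop all whitespace and the assumed bullet characters."""
    return "".join(ch for ch in text if not ch.isspace() and ch not in BULLET_CHARS)


def content_fingerprint(blocks: list[str]) -> str:
    """Hash the normalized text of body/table blocks; empty blocks are skipped."""
    digest = hashlib.sha256()  # assumption: SHA-256, not confirmed by the source
    for block in blocks:
        cleaned = normalize(block)
        if cleaned:
            digest.update(cleaned.encode("utf-8"))
            digest.update(b"\n")  # separator so block boundaries still matter
    return digest.hexdigest()


before = ["• 摘要", "This is  the body text."]
after = ["摘要", "  This is the body text.  "]
changed = ["摘要", "This is the edited body text."]

print(content_fingerprint(before) == content_fingerprint(after))    # True
print(content_fingerprint(before) == content_fingerprint(changed))  # False
```

Stripping all whitespace is the simplest reading of "normalized out"; it means a fingerprint match proves the same characters in the same order, not the same spacing.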

Proof, not a promise

Real fields from an actual run (--engine python, the fully-guarded path), taken from the produced format_report.json:

{
  "content_fingerprint_before": "793e6533fd670418141d11fdcf014be19750408129ecff8b1b78a2641a3786db",
  "content_fingerprint_after":  "793e6533fd670418141d11fdcf014be19750408129ecff8b1b78a2641a3786db",
  "content_changed": false,
  "content_guard_enforced": true
}

The before/after fingerprints match, and a paragraph-by-paragraph .text comparison I ran across the whole document confirms every word survived. What did change on that same file: body text went from unset font/indent/alignment to SimSun (宋体) 12pt, a 2-character first-line indent, and justified alignment; the abstract title became SimSun 18pt centered; the Chinese keywords line became SimSun 12pt left-aligned. The same run also reported the real problems it found — char_below_min (document under the guide's minimum length) and blank_page_risk — rather than silently claiming a perfect score.
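
The paragraph-by-paragraph comparison mentioned above can be reproduced independently of the tool. The sketch below works on two lists of paragraph text; the helper name is hypothetical, and how you extract the lists is up to you (with python-docx, for instance, something like `[p.text for p in Document(path).paragraphs]` would be one way, though the source does not say which library was used).

```python
from itertools import zip_longest
from typing import Optional


def changed_paragraphs(
    before: list[str], after: list[str]
) -> list[tuple[int, Optional[str], Optional[str]]]:
    """Return (index, before_text, after_text) for every paragraph that differs.

    A missing paragraph on either side shows up as None.
    """
    diffs = []
    for index, (old, new) in enumerate(zip_longest(before, after)):
        if old != new:
            diffs.append((index, old, new))
    return diffs


before = ["Abstract", "Body text.", "Keywords: a; b"]
after_same = list(before)
after_edit = ["Abstract", "Body text!", "Keywords: a; b"]

print(changed_paragraphs(before, after_same))  # []
print(changed_paragraphs(before, after_edit))  # [(1, 'Body text.', 'Body text!')]
```

This check is stricter than the fingerprint because it does not normalize whitespace, so it is a useful second opinion rather than a replacement.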

Why This Exists

Every closed-source formatting service (论文无忧, WPS 论文排版, 大以论文, AIPoliDoc, and similar) asks you to trust that your content survives the reformatting pass — none of them let you verify it.

The content guard is the smallest honest promise: change the formatting, but not the wording of your body and table text. If that can't be confirmed, the run aborts with an error (content guard failed) instead of shipping a silently altered document. It's fail-closed and enforced by default. Scope: the guard normalizes whitespace and stray bullet characters before comparing, and it covers body paragraphs and tables; headers and footers, which the formatter sets on purpose, are out of scope. The fully guarded path is --engine python; other engines run a local post-processor (for example, to refresh the table of contents) after the check.
Open-source and auditable: read the code, or just diff the fingerprint yourself.
Formatting-only automation across margins, fonts, line spacing, headings, captions, tables, and references, plus required-section checks (abstracts, keywords, table of contents) and running headers / centered page-number footers.
Reports are usable by students, supervisors, reviewers, and CI.
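
To make "fail-closed" concrete, here is a minimal sketch of the pattern: compare fingerprints first, and only write the output if they match. It uses plain text files as a stand-in for DOCX, and the names `ContentGuardError`, `fingerprint`, and `write_if_unchanged` are hypothetical; the project's real code may be organized quite differently.

```python
import hashlib
import os
import tempfile


class ContentGuardError(RuntimeError):
    """Raised when the text changed during formatting (hypothetical name)."""


def fingerprint(text: str) -> str:
    # Simplified normalization: remove all whitespace, then hash (SHA-256 assumed).
    return hashlib.sha256("".join(text.split()).encode("utf-8")).hexdigest()


def write_if_unchanged(original_text: str, formatted_text: str, out_path: str) -> str:
    """Write formatted_text only if the guard passes; otherwise raise and write nothing."""
    if fingerprint(original_text) != fingerprint(formatted_text):
        raise ContentGuardError("content guard failed")
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(formatted_text)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    ok_path = os.path.join(tmp, "ok.txt")
    write_if_unchanged("Body text.", "  Body   text.  ", ok_path)
    print(os.path.exists(ok_path))  # True

    bad_path = os.path.join(tmp, "bad.txt")
    try:
        write_if_unchanged("Body text.", "Body text, edited.", bad_path)
    except ContentGuardError as exc:
        print(exc, os.path.exists(bad_path))  # content guard failed False
```

The key design point is ordering: the check runs before any file is opened for writing, so a failed guard leaves no partial output behind.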

Status

This project is a practical open-source MVP. It is suitable for demos, internal pilots, agent workflows, and synthetic benchmark development. Before relying on it for high-stakes submissions, expand the regression corpus, template coverage, and object-level scoring for tables, figures, equations, footnotes, headers, and footers.

Agent Skill

This repository includes a top-level SKILL.md and agents/openai.yaml, so agent users can treat the repo as an installable skill.

The skill teaches an agent how to:

inspect input files safely
run the formatter in content-preserving mode
review format_report.json
validate changes before returning results
add new template rules with tests

MCP Server

The same pipeline is also exposed as an optional MCP server, so Claude Code, Codex CLI, or any MCP client can call it directly (requires Python 3.10+):

pip install "paper-format-agent[mcp]"
paper-format-agent-mcp

Tools: format_paper (content-guarded reformat), extract_format_rules, and score_paper (read-only). See docs/MCP.md for the client config and tool reference.

Quick Start

pip install -r requirements.txt

python -m paper_format_agent.cli \
  --format-file "format_guide.docx" \
  --paper-file "paper.docx" \
  --out-dir "./output" \
  --engine auto \
  --strict-required-sections

Optional GUI:

python run_gui.py

Batch processing:

python -m paper_format_agent.cli \
  --format-file "format_guide.docx" \
  --paper-dir "./papers" \
  --out-dir "./batch_output" \
  --engine python \
  --strict-required-sections

Batch mode writes one output folder per paper plus batch_summary.json, including pass rate, score averages, content-change count, and per-paper report locations.
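
If you want to double-check a batch run yourself, one option is to scan the per-paper reports rather than rely on the summary alone. The sketch below assumes each per-paper folder contains a format_report.json directly inside it (the exact layout is an assumption) and reads only the content_changed field documented in this README; `summarize_batch` is a hypothetical helper, not part of the package.

```python
import json
import tempfile
from pathlib import Path


def summarize_batch(batch_dir: str) -> dict:
    """Count per-paper reports and list the folders whose content changed."""
    reports = sorted(Path(batch_dir).glob("*/format_report.json"))
    changed = []
    for path in reports:
        report = json.loads(path.read_text(encoding="utf-8"))
        # A missing field is treated as "changed" so the check stays fail-closed.
        if report.get("content_changed", True):
            changed.append(path.parent.name)
    return {"papers": len(reports), "content_changed": changed}


# Demo with synthetic reports in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, flag in [("paper_a", False), ("paper_b", True)]:
        folder = Path(tmp) / name
        folder.mkdir()
        (folder / "format_report.json").write_text(
            json.dumps({"content_changed": flag}), encoding="utf-8"
        )
    print(summarize_batch(tmp))
    # {'papers': 2, 'content_changed': ['paper_b']}
```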

Template Packs And Synthetic Examples

The repository includes privacy-safe template packs and synthetic examples so users can try the workflow without uploading real papers:

templates/ contains JSON presets for Chinese thesis, journal article, and IEEE-style conference formatting.
examples/ contains a synthetic format guide and sample reports for demos, issues, and PRs.
docs/TEMPLATE_PACKS.md explains the template contract and contribution checklist.

Template files are intentionally plain JSON. They are easy to review, easy to customize locally, and safe to extend through small PRs.

Outputs

File	Purpose
`formatted_paper_v3.docx`	repaired DOCX document
`format_rules.json`	extracted formatting rules
`format_report.json`	machine-readable score and checks
`format_report.html`	human-readable report
`modify_log.json`	formatting operation log
`engine_report.json`	Word COM / LibreOffice / Python post-process result
`marker_dump.json`	optional paragraph classification dump

Safety Model

By default, the pipeline enforces a content guard. Reports include:

content_changed
content_guard_enforced
content_fingerprint_before
content_fingerprint_after
diagnostics with severity, evidence, and suggested fixes for failed checks

For normal academic formatting, content_changed should be false.
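
A supervisor or CI job could turn these fields into a simple gate. The following is a sketch only: it uses the four field names listed above, while the function name and the wording of the messages are made up for illustration.

```python
import json


def guard_problems(report: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the guard held."""
    problems = []
    if report.get("content_guard_enforced") is not True:
        problems.append("content guard was not enforced")
    if report.get("content_changed") is not False:
        problems.append("content_changed is not false")
    before = report.get("content_fingerprint_before")
    after = report.get("content_fingerprint_after")
    if not before or before != after:
        problems.append("fingerprints are missing or differ")
    return problems


sample = json.loads("""
{
  "content_fingerprint_before": "793e6533fd670418141d11fdcf014be19750408129ecff8b1b78a2641a3786db",
  "content_fingerprint_after":  "793e6533fd670418141d11fdcf014be19750408129ecff8b1b78a2641a3786db",
  "content_changed": false,
  "content_guard_enforced": true
}
""")

print(guard_problems(sample))  # []
print(guard_problems({"content_changed": True}))
# ['content guard was not enforced', 'content_changed is not false', 'fingerprints are missing or differ']
```

In a real workflow you would load format_report.json from the output directory and exit non-zero when the list is not empty.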

Validation

python tools/validate_skill.py
python -m unittest discover -s tests -p "test_*.py"
python tools/compile_check.py
python tools/release_audit.py

Before publishing from a local workspace, also run:

python tools/release_audit.py --include-local

This optional check includes untracked and ignored local artifacts, such as generated outputs, scratch files, caches, and private document formats.

Good First PRs

We want many small, reviewable PRs. Good contribution areas:

Add a synthetic test for a school, journal, or conference formatting rule.
Add a new synthetic template pack in templates/.
Improve a narrowly scoped rule extractor.
Add scoring coverage for tables, figures, references, equations, headers, or footers.
Improve report wording or diagnostics.
Add local-first integrations such as MCP, GitHub Actions, or batch processing.
Improve this repo's SKILL.md workflow for agent users.

New contributors can start from the task-ready board in docs/CONTRIBUTOR_TASKS.md. Each task lists user pain, expected PR shape, and suggested labels.

See CONTRIBUTING.md, ROADMAP.md, and AGENTS.md.

Architecture

format guide + paper.docx
  -> rule extraction
  -> paragraph type tagging
  -> style application
  -> numbering cleanup
  -> optional engine post-process
  -> scoring and reports

Privacy

Do not commit real papers, private school templates, reviewer comments, API keys, or generated documents. Use synthetic fixtures or anonymized snippets in tests.

License

MIT. See LICENSE.

Project details

This version

3.1.0

Jul 4, 2026

Download files

Source Distribution

paper_format_agent-3.1.0.tar.gz (51.3 kB)

Uploaded Jul 4, 2026 Source

Built Distribution

paper_format_agent-3.1.0-py3-none-any.whl (45.4 kB)

Uploaded Jul 4, 2026 Python 3

File details

Details for the file paper_format_agent-3.1.0.tar.gz.

File metadata

Download URL: paper_format_agent-3.1.0.tar.gz
Upload date: Jul 4, 2026
Size: 51.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.19

File hashes

Hashes for paper_format_agent-3.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ad4d33fc606a95bf08ae911daa8bdad888327a4f2168b0e763d14c5b88080766`
MD5	`16cb24413d60d555d6b2eb1c40672a0b`
BLAKE2b-256	`1d2db94dc0dd47adb8695538b84caa6bc53d82aa418b0b5312a0205b211a294c`

File details

Details for the file paper_format_agent-3.1.0-py3-none-any.whl.

File metadata

Download URL: paper_format_agent-3.1.0-py3-none-any.whl
Upload date: Jul 4, 2026
Size: 45.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.19

File hashes

Hashes for paper_format_agent-3.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`85e37249a103f605f04523f6b147b94ab3a34a775ce795b72cacb36c2d4db37f`
MD5	`3727a8a85274d842992ea372a3295b59`
BLAKE2b-256	`bc0315d77b1b88f0ed73c4d0412c9e85b12707660350fd6ab81022a99c5ae742`
