Render Markdown reports to multiple formats (DOCX, with pluggable renderers)

md-reports

Convert Markdown to DOCX using a configurable Word template. Designed for embedding in Python scripts (no CLI). Extensible to other output formats.

Install

uv add md-reports

Quick start

from md_reports import convert_markdown_text, convert_markdown_file

# from a string
convert_markdown_text(
    "# Title\n\nHello **world**.",
    "out.docx",
)

# from a file (relative image paths resolve against the markdown file)
convert_markdown_file("doc.md", "doc.docx")

# inject script values via Jinja2 substitution
convert_markdown_text(
    "# Q{{ q }} report\n\nRevenue grew by **{{ pct }}%**.",
    "report.docx",
    context={"q": 2, "pct": 14.5},
)

Reusable converter (avoids reloading the template each call; supports a default_context shared across all conversions):

from md_reports import (
    ConversionOptions, DocxRenderer, MarkdownConverter,
)

conv = MarkdownConverter(
    renderer=DocxRenderer(
        template_path="house_style.docx",
        options=ConversionOptions(strict_mode=True),
    ),
    default_context={"site": "Acme"},
)
conv.convert_file("a.md", "a.docx", context={"doc": "Q1"})
conv.convert_file("b.md", "b.docx", context={"doc": "Q2"})

The renderer argument selects the output format. DocxRenderer is the only built-in renderer today; the abstraction is in place for additional renderers (e.g. HTML) to be added without changes to parse, the model, options, or MarkdownConverter.

What's supported

Block elements:

  • Headings # to ###### (mapped to Heading 1 through Heading 6, with cascading fallback)
  • Paragraphs, block quotes, fenced code blocks
  • Bullet and ordered lists (with nesting)
  • Standard markdown tables (header + body rows, alignment markers)
  • CSV embedding via fenced blocks — both file-backed and inline literal data
  • Embedded images (![alt](path)) as figures with auto-numbered captions

Inline elements: bold, italic, inline code, markdown links, and minimal inline <a href="...">…</a> HTML.

Figures

Block-level images become a figure with an auto-numbered caption sourced from the alt text:

![Quarterly revenue chart](charts/revenue.png)

renders the image followed by a Caption-styled paragraph Figure 1: Quarterly revenue chart. The number is a Word SEQ field, so it stays correct after copy/paste or reordering (Word updates fields on print or F9).

Table captions

Markdown has no native table caption syntax. md-reports consumes the paragraph immediately preceding a table when that paragraph begins with the prefix Table: (including the colon):

Table: Quarterly revenue by region.

| Region | Q1 | Q2 |
|--------|----|----|
| EMEA   | 1  | 2  |

The caption is emitted above the table as Table 1: Quarterly revenue by region. styled with Caption. Figure and table counters are independent. The prefix is configurable via ConversionOptions.table_caption_prefix.
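The caption rule and the independent counters described above can be sketched in a few lines. This is an illustration only, not md-reports' implementation: caption_text and CaptionCounters are hypothetical names, and the real library emits Word SEQ fields rather than plain strings.

```python
import re
from collections import defaultdict


def caption_text(paragraph, prefix="Table"):
    """Return the caption body if the paragraph starts with '<prefix>:', else None."""
    match = re.match(rf"{re.escape(prefix)}:\s*(.*)", paragraph.strip(), flags=re.DOTALL)
    return match.group(1).strip() if match else None


class CaptionCounters:
    """Keeps one running number per caption prefix (hypothetical helper)."""

    def __init__(self):
        self._counts = defaultdict(int)

    def next_caption(self, prefix, text):
        # Each prefix advances its own counter, so figures and tables
        # are numbered independently.
        self._counts[prefix] += 1
        return f"{prefix} {self._counts[prefix]}: {text}"


counters = CaptionCounters()
body = caption_text("Table: Quarterly revenue by region.")
print(counters.next_caption("Figure", "Quarterly revenue chart"))
print(counters.next_caption("Table", body))
```

A paragraph such as "Tables are useful." does not match, because the prefix must be followed directly by a colon.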

Cross-references

Attach a label to a figure or table by appending {#label} to its alt text or caption, then refer to it from anywhere in the document with a markdown link whose target is #label:

![Quarterly revenue {#fig-revenue}](charts/revenue.png)

Table: Sales by region {#tab-sales}

| Region | Total |
|--------|------:|
| EMEA   | 100   |

See [Figure 1](#fig-revenue) and [](#tab-sales) for details.

Each labelled caption is wrapped in a Word bookmark; each #label link becomes a Word REF field pointing at that bookmark. The link text becomes the cached display value; an empty link text auto-fills as "<Prefix> <Number>" (e.g. Table 1). Forward references work — the parser resolves all labels before rendering. After F9 (or print), Word recomputes both the SEQ counters and the REF fields so reordering or inserting figures keeps numbering and cross-references in sync.

Unknown #label targets degrade to plain text with a warning (or raise in strict_mode).
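To make the display-text rules concrete, here is a rough sketch of how #label links could be resolved once all labels are known. It models only the cached display text, not the Word bookmark or REF field, and resolve_links plus the shape of the labels mapping are assumptions made for this example.

```python
import re

# Matches markdown links whose target is "#label", e.g. [Figure 1](#fig-revenue).
LABEL_LINK = re.compile(r"\[([^\]]*)\]\(#([^)\s]+)\)")


def resolve_links(text, labels):
    """Replace #label links with their display text.

    labels maps label -> (prefix, number) and is assumed to be complete
    before this runs, which is what makes forward references work.
    Returns (rendered_text, warnings).
    """
    warnings = []

    def replace(match):
        link_text, label = match.group(1), match.group(2)
        if label not in labels:
            # Unknown target: degrade to plain text and record a warning.
            warnings.append(f"unknown label: {label}")
            return link_text
        prefix, number = labels[label]
        # Empty link text auto-fills as "<Prefix> <Number>".
        return link_text or f"{prefix} {number}"

    return LABEL_LINK.sub(replace, text), warnings


labels = {"fig-revenue": ("Figure", 1), "tab-sales": ("Table", 1)}
print(resolve_links("See [Figure 1](#fig-revenue) and [](#tab-sales) for details.", labels))
```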

Preview-friendly label form

The bare {#label} marker shows up as literal text when the markdown is viewed in a plain markdown previewer (GitHub, VS Code, etc.). For table and CSV captions you can wrap the marker in an HTML comment so the marker is invisible in previews while still being picked up by md-reports:

Table: Sales by region <!-- {#tab-sales} -->

| Region | Total |
|--------|------:|
| EMEA   | 100   |

Both forms are supported and behave identically; pick whichever you prefer. The comment must come at the end of the caption. Image alt text is already invisible in previews, so the bare form is fine there and no comment variant is needed.
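As an illustration of how both label forms can be recognised, the sketch below strips a trailing {#label} marker, bare or wrapped in an HTML comment, from a caption. The regular expression and the split_label name are assumptions for this example; the characters md-reports actually allows in a label may differ.

```python
import re

# A label marker at the very end of the caption, either bare or inside
# an HTML comment. Labels are assumed to be word characters and hyphens.
LABEL_MARKER = re.compile(
    r"\s*(?:<!--\s*\{#([\w-]+)\}\s*-->|\{#([\w-]+)\})\s*$"
)


def split_label(caption):
    """Return (caption_without_marker, label_or_None)."""
    match = LABEL_MARKER.search(caption)
    if match is None:
        return caption.strip(), None
    label = match.group(1) or match.group(2)
    return caption[: match.start()].strip(), label


print(split_label("Sales by region {#tab-sales}"))
print(split_label("Sales by region <!-- {#tab-sales} -->"))
```

Anchoring the pattern at the end of the string mirrors the rule that the comment must come at the end of the caption.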

CSV embedding

Two fenced-block variants render CSV data as a DOCX table.

From a file — the body is a single path resolved against project_root (or the markdown file's directory):

Table: Quarterly revenue.

```csv-file
data/quarterly.csv
```

Inline — the body is the CSV literal itself:

```csv
region,q1,q2
EMEA,1,2
APAC,3,4
```

Either form accepts a no-header flag on the info string to suppress header-row treatment (no row gets bolded; all rows are body):

```csv-file no-header
data/raw.csv
```

CSV-derived tables share the same Table N counter as native markdown tables, accept the same preceding-Table: caption, and use the Table Grid style. The delimiter is auto-detected via csv.Sniffer (falls back to comma); encoding is UTF-8.
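The delimiter detection with a comma fallback can be approximated with the standard library alone. The sketch below is not the library's code: parse_csv is a hypothetical helper, and restricting the sniffer to a fixed set of candidate delimiters is a choice made here to keep the example predictable.

```python
import csv
import io


def parse_csv(text, no_header=False):
    """Split CSV text into (header, body_rows), sniffing the delimiter."""
    try:
        # Candidate delimiters are an assumption of this sketch.
        delimiter = csv.Sniffer().sniff(text, delimiters=",;\t|").delimiter
    except csv.Error:
        # Sniffer could not decide (e.g. single-column data): use a comma.
        delimiter = ","
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if no_header:
        # With the no-header flag every row is a body row.
        return None, rows
    return rows[0], rows[1:]


header, body = parse_csv("region;q1;q2\nEMEA;1;2\nAPAC;3;4\n")
print(header, body)
```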

Embedding a pandas DataFrame

Pass a DataFrame in the context and pipe it through the built-in to_csv Jinja2 filter inside a csv fence:

Table: Quarterly figures.

```csv
{{ df | to_csv }}
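```

import pandas as pd
from md_reports import convert_markdown_text

# markdown_text is assumed to hold the markdown snippet shown above.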

df = pd.DataFrame(
    {"region": ["EMEA", "APAC"], "q1": [1, 3], "q2": [2, 4]}
)
convert_markdown_text(markdown_text, "out.docx", context={"df": df})

The filter calls value.to_csv(index=False) and strips the trailing newline. Captions, the shared Table N counter, and the no-header flag all work the same as for any csv fence.

The filter is duck-typed on .to_csv() — pandas is not a dependency of md-reports. Any object with a compatible .to_csv() method works (your script provides it). Pass any kwargs supported by the underlying method, e.g.:

```csv
{{ df | to_csv(sep=';', na_rep='—', index=True) }}
```
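Because the filter is duck-typed, any object with a suitable .to_csv() method can stand in for a DataFrame. The sketch below shows the idea with a stand-alone function and a tiny table class; to_csv_filter and TinyTable are hypothetical names, and the actual filter inside md-reports may differ in detail.

```python
def to_csv_filter(value, **kwargs):
    """Call value.to_csv(), defaulting to index=False, and drop the trailing newline."""
    if not hasattr(value, "to_csv"):
        raise TypeError(f"{type(value).__name__} has no to_csv() method")
    kwargs.setdefault("index", False)  # caller-supplied kwargs take precedence
    return value.to_csv(**kwargs).rstrip("\n")


class TinyTable:
    """Minimal stand-in for a DataFrame (illustration only)."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def to_csv(self, index=False, sep=","):
        # The index argument is accepted for compatibility and ignored here.
        lines = [sep.join(self.columns)]
        lines += [sep.join(str(cell) for cell in row) for row in self.rows]
        return "\n".join(lines) + "\n"


table = TinyTable(["region", "q1"], [["EMEA", 1], ["APAC", 3]])
print(to_csv_filter(table))
print(to_csv_filter(table, sep=";"))
```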

Jinja2 context

Pass a context dict to inject script-side values into the markdown before parsing. Substitution runs once on the raw markdown text, so values flow into every textual position — body, headings, table cells, image paths, CSV file paths, inline CSV data, captions:

convert_markdown_text(
    "# {{ title | upper }}\n\nGrowth: **{{ pct }}%**",
    "out.docx",
    context={"title": "q1 results", "pct": 14.5},
)

The full Jinja2 syntax is available — variables, filters, conditionals, loops:

# {{ report_title }}

{% for finding in findings %}
- {{ finding }}
{% endfor %}

{% if show_appendix %}
## Appendix

See [details]({{ appendix_url }}).
{% endif %}

Supported value types include str, int, float, bool, None, list/tuple of those, and dict (for attribute access via {{ user.name }}).

Missing-variable behavior:

  • Default mode: a simple {{ name }} whose key is missing renders as the literal {{ name }} and emits a warning — visible breadcrumb, no silent data loss. More complex Jinja2 errors (syntax errors, iteration over an undefined sequence, etc.) cause the markdown to be left unchanged with a warning.
  • strict_mode=True: any undefined variable or template error raises ValidationError.

MarkdownConverter accepts a default_context at construction time and per-call context= overrides that merge over it (call-site keys win).
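The merge rule is the ordinary "later dict wins" pattern. A minimal sketch, assuming a shallow merge (merge_context is a hypothetical helper, not part of the md-reports API):

```python
def merge_context(default_context=None, context=None):
    """Shallow-merge two context dicts; call-site keys override defaults."""
    merged = dict(default_context or {})
    merged.update(context or {})
    return merged


defaults = {"site": "Acme", "doc": "draft"}
print(merge_context(defaults, {"doc": "Q1"}))
```

Neither input dict is modified, so the same defaults can be reused across many conversions.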

Options

from md_reports import ConversionOptions

ConversionOptions(
    strict_mode=False,          # raise instead of warn on issues
    figure_caption_prefix="Figure",
    table_caption_prefix="Table",
    project_root=None,          # root for resolving relative paths
                                # to images and CSV files
)

Templates

If template_path is omitted on DocxRenderer (or you don't pass a renderer at all), a packaged default DOCX template is used. To inspect or copy the default:

from md_reports import get_default_template_path

print(get_default_template_path())

The template should provide these styles (fallbacks apply when missing):

  • Normal, Heading 1 through Heading 6
  • List Bullet, List Number (and their 2/3 variants for nesting)
  • Quote, Caption
  • Table Grid
  • Code (optional; falls back to monospace runs in Normal)

Front matter

Whatever already lives in the template (cover page, headers/footers, title block, table of contents) is preserved — markdown content is appended after the existing body.

Document properties

Set DOCX core properties (the fields shown under File > Info in Word) via properties=:

convert_markdown_text(
    md,
    "report.docx",
    properties={
        "title": "Q4 Report",
        "author": "Jane Doe",
        "subject": "Quarterly review",
        "tags": "revenue, headcount",
        "comments": "Reference: REP-2026-Q4",
        "category": "Finance",
    },
)

To display these in the rendered document, edit the template and insert the matching field via Insert > Quick Parts > Field…, then pick Title, Author, Subject, Keywords, Comments, or Category. You can place these fields in the body, header, or footer. Word recomputes fields on F9 / print, same as SEQ and REF.

MarkdownConverter also accepts default_properties=, merged with per-call properties= (call-site keys win).

Accepted keys (case-insensitive) and the core property they target:

| Key (and aliases)     | Core property    |
|-----------------------|------------------|
| title                 | title            |
| author, creator       | author           |
| subject               | subject          |
| keywords, tags        | keywords         |
| comments, description | comments         |
| category, categories  | category         |
| content_status        | content_status   |
| identifier            | identifier       |
| language              | language         |
| version               | version          |
| last_modified_by      | last_modified_by |

Unknown keys warn (or raise under strict_mode). The datetime properties (created, modified, last_printed) and the revision counter are deliberately not exposed.

Company lives in DOCX extended properties (docProps/app.xml), not core properties, and is not currently writable via this API. Arbitrary user-defined properties (docProps/custom.xml) are likewise not yet supported — use subject/keywords/comments as a host for reference strings or project codes.
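The case-insensitive key and alias handling listed above amounts to a lookup table. The sketch below is an illustration under stated assumptions: normalize_properties is a hypothetical name, and ValueError stands in for whatever md-reports raises under strict_mode.

```python
# Alias table copied from the accepted keys listed in this section.
PROPERTY_ALIASES = {
    "title": "title",
    "author": "author", "creator": "author",
    "subject": "subject",
    "keywords": "keywords", "tags": "keywords",
    "comments": "comments", "description": "comments",
    "category": "category", "categories": "category",
    "content_status": "content_status",
    "identifier": "identifier",
    "language": "language",
    "version": "version",
    "last_modified_by": "last_modified_by",
}


def normalize_properties(properties, strict=False):
    """Map user keys to core property names; collect warnings for unknown keys."""
    normalized, warnings = {}, []
    for key, value in properties.items():
        target = PROPERTY_ALIASES.get(key.strip().lower())
        if target is None:
            if strict:
                raise ValueError(f"unknown property key: {key}")
            warnings.append(f"unknown property key: {key}")
            continue
        normalized[target] = value
    return normalized, warnings


print(normalize_properties({"Title": "Q4 Report", "tags": "revenue", "company": "Acme"}))
```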

Limitations (v1)

  • No CLI.
  • Remote (http(s)://) image fetching is not supported — use local files.
  • Cell merges (rowspan/colspan) and nested tables are not supported.
  • CSV embedding has no per-fence delimiter/encoding overrides yet (UTF-8 + csv.Sniffer only).
  • SEQ field numbers display correctly in Word once fields update (typically on print or pressing F9); the file is written with a pre-computed display value so first-open looks right too.
  • No footnotes, math, definition lists, or task lists.

Errors

All exceptions inherit from MdAstDocxError. Specific types:

  • TemplateError — template missing/unreadable
  • ParseError — markdown could not be parsed
  • RenderError — DOCX rendering failed
  • ValidationError — bad input arguments

Security model

By default, md-reports treats the markdown source as code-equivalent — same trust as the script that calls the library. Two consequences follow that you should know about:

  1. Jinja2 substitution runs in a non-sandboxed environment. Any {{ ... }} expression in the markdown can reach into context values' attributes (e.g. {{ obj.__class__... }}). Useful for things like {{ df | to_csv }}, dangerous if the markdown is untrusted.
  2. Asset paths are not confined. Image and CSV references (![](path), csv-file blocks) accept absolute paths and .. traversals. The library will read whatever the process can read and embed it into the output.

This is fine for the typical use case (developer-authored markdown, script-supplied context). If you ever pass user-controlled markdown into the library — e.g. a CMS, a comment field, or a user-uploaded file — turn on the two hardening flags:

from md_reports import ConversionOptions, convert_markdown_text

convert_markdown_text(
    untrusted_markdown,
    "out.docx",
    context={"title": "Report"},
    options=ConversionOptions(
        sandboxed_context=True,   # use Jinja2's SandboxedEnvironment
        confine_assets=True,      # reject paths outside project_root
        project_root="/srv/uploads/work_dir",
    ),
)

sandboxed_context=True swaps the Jinja2 environment for jinja2.sandbox.SandboxedEnvironment, which blocks access to unsafe attributes (for example underscore-prefixed internals such as __class__) in template expressions. Note this can break templates or filters that rely on such attribute access (the built-in to_csv filter still works because it uses hasattr/getattr in Python code, not expression-level attribute access). Verify your templates against the sandboxed environment before rolling it out.

confine_assets=True enforces that every resolved image and CSV path lies under the asset base — project_root if set, otherwise the markdown file's directory. Absolute paths and .. traversals that escape the base are rejected (warned, or raised under strict_mode).
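A confinement check of this kind is usually a resolve-then-compare on paths. The following is a minimal sketch, not the library's implementation: resolve_asset is a hypothetical helper, ValueError stands in for the library's own error, and Path.is_relative_to requires Python 3.9 or newer.

```python
from pathlib import Path


def resolve_asset(base, reference):
    """Resolve reference against base and reject anything that escapes base."""
    base_path = Path(base).resolve()
    # Joining with an absolute reference discards base_path, and resolve()
    # collapses ".." segments, so both escape routes surface here.
    candidate = (base_path / reference).resolve()
    if not candidate.is_relative_to(base_path):
        raise ValueError(f"asset path escapes {base_path}: {reference}")
    return candidate


print(resolve_asset("/srv/uploads/work_dir", "charts/revenue.png"))
```

Resolving before comparing matters: a plain string prefix check would accept a path such as work_dir/../secret.csv.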

Also note:

  • Templates (template_path=) are loaded by python-docx / lxml and are otherwise opaque to md-reports. Treat templates as trusted: don't load templates from untrusted sources without reviewing them first.
  • Hyperlink schemes are limited to http://, https://, and mailto:; other schemes (including javascript:, file:, data:) fall back to plain text.
  • Remote image fetching is not supported; http(s):// image paths are explicitly rejected.
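The hyperlink rule above is a scheme allow-list. As a rough sketch of that idea (is_allowed_link is a hypothetical name, and how md-reports treats scheme-less relative links is not stated here, so this example simply rejects them):

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_allowed_link(target):
    """Return True only for targets whose URL scheme is on the allow-list."""
    scheme = urlsplit(target).scheme.lower()
    return scheme in ALLOWED_SCHEMES


for target in ("https://example.com", "mailto:jane@example.com", "javascript:alert(1)"):
    print(target, is_allowed_link(target))
```

An allow-list is the safer default: new or unusual schemes are rejected until someone explicitly adds them.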

Development

uv sync --extra dev
uv run pytest
uv run ruff check src tests
uv run ruff format src tests

