Bidirectional Markdown <-> Confluence Storage XHTML converter with lossless opaque preservation.

cfxmark

Bidirectional Markdown ↔ Confluence Storage XHTML converter — with lossless opaque preservation for everything cfxmark doesn't explicitly know how to convert.

import cfxmark

# Markdown → Confluence storage XHTML
result = cfxmark.to_cfx(markdown_text)
result.xhtml          # str    — ready for Confluence REST PUT
result.attachments    # tuple  — local file refs the caller should upload
result.warnings       # tuple  — human-readable conversion warnings

# Confluence storage XHTML → Markdown
result = cfxmark.to_md(xhtml_text)
result.markdown       # str    — canonical markdown
result.warnings       # tuple

ConversionResult is the same dataclass for both directions — xhtml is populated for to_cfx, markdown for to_md.
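As a rough mental model, the result object could look like the frozen dataclass below. This is a sketch, not cfxmark's actual definition: only the four field names come from the examples above, while the class name, the defaults, the `frozen=True` setting, and the tuple element types are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionResultSketch:
    """Illustrative stand-in for cfxmark's ConversionResult (assumed shape)."""

    xhtml: str = ""                     # populated by to_cfx
    markdown: str = ""                  # populated by to_md
    attachments: tuple[str, ...] = ()   # local file refs the caller should upload
    warnings: tuple[str, ...] = ()      # human-readable conversion warnings


# One direction fills xhtml, the other fills markdown; the unused field stays empty.
to_cfx_like = ConversionResultSketch(xhtml="<p>Hello</p>", attachments=("diagram.png",))
to_md_like = ConversionResultSketch(markdown="Hello\n")
```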

Why another converter?

Two existing projects inspired this one — md2cf and md2conf — but both are one-directional (md → cf) and neither preserves unknown macros across a round trip. cfxmark fills both gaps:

Bidirectional. to_md(to_cfx(m)) is byte-identical to canonicalize(m) for every construct in the supported subset.
Opaque preservation. Confluence content cfxmark doesn't understand (custom plugins, drawio diagrams, exotic table cells) round-trips byte-for-byte, including the ac:macro-id UUID. Confluence treats the round-tripped macro as the same instance, so comments, attachments, and permissions stay attached.
Pure text-in / text-out. No Confluence API, no network, no attachment upload. The caller owns REST I/O. (See "Image assets" below for the helper function that lets the caller plug in network-bound logic without bloating cfxmark.)
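The bidirectional guarantee above can be phrased as a small property check. The helper below is a sketch that takes the three functions as parameters, so it runs without cfxmark installed; `failing_round_trips` and the toy converters are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable


def failing_round_trips(
    samples: Iterable[str],
    to_cfx: Callable[[str], str],
    to_md: Callable[[str], str],
    canonicalize: Callable[[str], str],
) -> list[str]:
    """Return the samples for which to_md(to_cfx(m)) != canonicalize(m)."""
    return [m for m in samples if to_md(to_cfx(m)) != canonicalize(m)]


# Toy stand-ins (not cfxmark) that wrap and unwrap a single paragraph.
def toy_to_cfx(markdown: str) -> str:
    return f"<p>{markdown.strip()}</p>"


def toy_to_md(xhtml: str) -> str:
    return xhtml.removeprefix("<p>").removesuffix("</p>") + "\n"


def toy_canonicalize(markdown: str) -> str:
    return markdown.strip() + "\n"


failures = failing_round_trips(
    ["Hello", "  Hello  \n"], toy_to_cfx, toy_to_md, toy_canonicalize
)
```

With cfxmark itself, the first two callables would presumably be thin wrappers such as `lambda m: cfxmark.to_cfx(m).xhtml` and `lambda x: cfxmark.to_md(x).markdown`. The Markdown canonicalizer is named only as canonicalize(m) in the text above, so its import path is left open here.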

Install

# With uv (recommended):
uv add cfxmark

# With pip:
pip install cfxmark

cfxmark depends on lxml and mistletoe and requires Python 3.10 or later.

The contract

cfxmark grades every Confluence construct into one of three buckets:

Grade	Description	Behaviour
I — Native	Standard CommonMark / GFM (headings, lists, tables, code fences, links, images, blockquote, hr, inline emphasis)	Lossless round-trip after canonicalization.
II — Directive	Confluence macros with a known Markdown directive mapping (`info`, `note`, `warning`, `tip`, `jira`, `expand`, `toc`)	Lossless after canonicalization. Pluggable via `MacroRegistry`.
III — Opaque	Everything else	Captured byte-for-byte through cfxmark's opaque-block / inline-opaque mechanism. Never dropped, never rewritten.

See docs/SPEC.md for the full mapping table and docs/OPAQUE.md for the opaque-block format.
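For macros, the grading in the table amounts to a lookup: a name with a known directive mapping is Grade II, anything else is Grade III. The sketch below illustrates only that decision. `Grade` and `grade_macro` are hypothetical names, and the real dispatch goes through `MacroRegistry`, whose interface is not shown here. Grade I is omitted because it covers plain Markdown constructs rather than macros.

```python
from enum import Enum


class Grade(Enum):
    DIRECTIVE = "II"
    OPAQUE = "III"


# Macro names listed in the Grade II row of the table above.
DIRECTIVE_MACROS = frozenset(
    {"info", "note", "warning", "tip", "jira", "expand", "toc"}
)


def grade_macro(name: str, known: frozenset[str] = DIRECTIVE_MACROS) -> Grade:
    """Classify a Confluence macro name as directive (II) or opaque (III)."""
    return Grade.DIRECTIVE if name in known else Grade.OPAQUE


# Registering a handler is modelled here as adding the name to the known set.
promoted = DIRECTIVE_MACROS | {"drawio"}
```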

Usage

Round-trip a Confluence page through Markdown

import cfxmark

# Whatever fetched the page (REST API call, exported XML file, …)
xhtml = my_confluence_client.get_storage_format(page_id)

# Convert to Markdown
md_result = cfxmark.to_md(xhtml)
markdown = md_result.markdown

# … user edits the Markdown …

# Convert back to Confluence storage XHTML
cfx_result = cfxmark.to_cfx(markdown)
my_confluence_client.update_page(page_id, cfx_result.xhtml)

# Optionally upload any newly referenced local images
for filename in cfx_result.attachments:
    my_confluence_client.upload_attachment(page_id, filename)

Image assets

When you convert a Confluence page that references uploaded attachments, the resulting Markdown looks like this:

![](image-3.png#cfxmark:w=700)<!-- cfxmark:asset src="image-3.png" -->

The image link still points at the original Confluence filename (broken in any local Markdown viewer until you fetch the bytes), and the trailing cfxmark:asset HTML comment carries enough metadata for a follow-up step to fetch and embed the image.

cfxmark.resolve_assets is that follow-up step. You provide a fetcher callback that returns bytes for one filename at a time, and choose between two output strategies:

import cfxmark
from pathlib import Path

def fetcher(filename: str) -> bytes:
    # Whatever you use to download from Confluence:
    return my_confluence_client.download_attachment(page_id, filename)

# Strategy A — sidecar directory (recommended for git-tracked docs).
# Saves bytes under asset_dir and rewrites links to relative paths.
md = cfxmark.resolve_assets(
    md_result.markdown,
    fetcher,
    mode="sidecar",
    asset_dir="docs/page-42/assets",
    md_path="docs/page-42.md",
)
Path("docs/page-42.md").write_text(md)
# docs/page-42/assets/image-3.png exists
# md link: ![](assets/image-3.png#cfxmark:w=700)<!-- cfxmark:asset src="image-3.png" -->

# Strategy B — inline data URIs (single self-contained file).
md = cfxmark.resolve_assets(md_result.markdown, fetcher, mode="inline")
# md link: ![](data:image/png;base64,iVBORw0K...)<!-- cfxmark:asset src="image-3.png" -->

The asset markers are preserved through both strategies, so resolve_assets is idempotent and a subsequent to_cfx call always recovers the original Confluence filename — even if the visible link target has been rewritten to a sidecar path or a data URI.
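To see why keeping the marker makes the step idempotent, here is a minimal standard-library sketch of the inline strategy. It is not cfxmark's implementation: `inline_assets`, `to_data_uri`, and the regular expression are hypothetical, and dropping the `#cfxmark:w=700` fragment from the rewritten link is an assumption.

```python
import base64
import mimetypes
import re
from typing import Callable

# An image link immediately followed by its cfxmark:asset marker.
ASSET_RE = re.compile(
    r'!\[(?P<alt>[^\]]*)\]\((?P<target>[^)]*)\)'
    r'(?P<marker><!-- cfxmark:asset src="(?P<src>[^"]+)" -->)'
)


def to_data_uri(filename: str, data: bytes) -> str:
    """Encode bytes as a data URI, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{payload}"


def inline_assets(markdown: str, fetcher: Callable[[str], bytes]) -> str:
    """Rewrite each marked image link to a data URI, leaving the marker intact."""

    def replace(match: re.Match) -> str:
        # Always fetch by the marker's src, never by the visible link target,
        # so a second pass over already-inlined Markdown gives the same result.
        src = match["src"]
        uri = to_data_uri(src, fetcher(src))
        return f"![{match['alt']}]({uri}){match['marker']}"

    return ASSET_RE.sub(replace, markdown)


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file
source = '![](image-3.png#cfxmark:w=700)<!-- cfxmark:asset src="image-3.png" -->'
once = inline_assets(source, lambda name: PNG_SIGNATURE)
twice = inline_assets(once, lambda name: PNG_SIGNATURE)
```

Because the filename is read from the marker rather than from the link target, `twice` equals `once`.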

Mermaid diagrams

cfxmark maps Markdown's ```mermaid fenced code block to Confluence's code macro with language=mermaid. If your Confluence instance has a Mermaid plugin installed (e.g. Mermaid Diagrams for Confluence) it will render the diagram automatically; otherwise the content is shown as a syntax-highlighted code block.

```mermaid
graph LR
  A --> B --> C
```
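For orientation, a code macro in Confluence storage format generally has the shape produced by the sketch below. `code_macro` is a hypothetical helper, and whether cfxmark emits exactly these elements or adds further parameters is not stated in this description.

```python
from xml.sax.saxutils import escape


def code_macro(language: str, body: str) -> str:
    """Build a Confluence storage-format code macro around plain text."""
    # A literal "]]>" would end the CDATA section early, so split it in two.
    safe_body = body.replace("]]>", "]]]]><![CDATA[>")
    return (
        '<ac:structured-macro ac:name="code">'
        f'<ac:parameter ac:name="language">{escape(language)}</ac:parameter>'
        f"<ac:plain-text-body><![CDATA[{safe_body}]]></ac:plain-text-body>"
        "</ac:structured-macro>"
    )


xhtml = code_macro("mermaid", "graph LR\n  A --> B --> C")
```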

Inline opaque references

Inline elements that have no native Markdown form — Confluence user mentions, inline Jira issue macros, custom widget invocations, … — become a short Markdown link with a cfx:op-... URL:

Contact the purchaser ([@user-2c9402cc](cfx:op-4fab0f8d))

The [label] is auto-derived from the underlying element type (@user-…, jira:PROJ-1, cfx:status, …) and the op-XXXXXXXX ID is a SHA-256 prefix of the original XML payload. The full XML lives in a cfxmark:payloads sidecar at the bottom of the same Markdown file:

<!-- cfxmark:payloads -->
<!-- op-4fab0f8d
<ac:link><ri:user ri:userkey="2c9402cc83d4bcc40183d976ef730001"/></ac:link>
-->
<!-- /cfxmark:payloads -->

The SHA-256 fingerprint means that the same link syntax typed by hand in a user's own Markdown is not silently re-interpreted as an opaque payload: verification fails and the region falls back to ordinary text.
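The check can be sketched with hashlib. The exact bytes cfxmark hashes (raw or normalized XML, and in which encoding) are not specified here, so hashing the UTF-8 payload and keeping eight hex digits, as the example IDs suggest, are assumptions. `opaque_id` and `is_genuine` are hypothetical names.

```python
import hashlib


def opaque_id(payload: str, digits: int = 8) -> str:
    """Derive an op-XXXXXXXX style ID from a payload (assumed scheme)."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"op-{digest[:digits]}"


def is_genuine(ref_id: str, payloads: dict[str, str]) -> bool:
    """True only if the sidecar holds a payload whose hash matches ref_id."""
    payload = payloads.get(ref_id)
    return payload is not None and opaque_id(payload) == ref_id


xml = '<ac:link><ri:user ri:userkey="2c9402cc83d4bcc40183d976ef730001"/></ac:link>'
ref = opaque_id(xml)
```

A reference with no sidecar entry, or with an entry that was edited after conversion, fails the check and would be treated as an ordinary link.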

Block opaque blocks

Block-level Confluence content cfxmark doesn't know how to convert (e.g. drawio diagrams, plantuml, complex tables) is wrapped in a fenced code block with sentinel comments:

<!-- cfxmark:opaque id="op-1188e2b4" -->
```cfx-storage
<ac:structured-macro ac:name="drawio" ac:macro-id="...">
  <ac:parameter ac:name="diagramName">flow</ac:parameter>
  ...
</ac:structured-macro>
```
<!-- /cfxmark:opaque -->

Editors render this as a clearly visible code block — a "do not touch" signal for human readers. The Markdown parser detects the sentinels first and round-trips the contents byte-for-byte, including the original ac:macro-id UUID that Confluence uses to identify macro instances.
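A reader for this format can be sketched with one regular expression that captures everything between the sentinels verbatim. The pattern and `extract_opaque_blocks` are hypothetical and skip the SHA-256 verification of the ID; they only show that the body is taken as-is rather than parsed as Markdown.

```python
import re

FENCE = "`" * 3  # built indirectly so this example can sit inside a fence itself

OPAQUE_BLOCK_RE = re.compile(
    r'<!-- cfxmark:opaque id="(?P<id>op-[0-9a-f]+)" -->\n'
    + FENCE
    + r"cfx-storage\n(?P<body>.*?)\n"
    + FENCE
    + r"\n<!-- /cfxmark:opaque -->",
    re.DOTALL,
)


def extract_opaque_blocks(markdown: str) -> dict[str, str]:
    """Map each opaque block ID to its untouched storage-format body."""
    return {m["id"]: m["body"] for m in OPAQUE_BLOCK_RE.finditer(markdown)}


sample = "\n".join(
    [
        "Intro paragraph.",
        "",
        '<!-- cfxmark:opaque id="op-1188e2b4" -->',
        FENCE + "cfx-storage",
        '<ac:structured-macro ac:name="drawio">',
        '  <ac:parameter ac:name="diagramName">flow</ac:parameter>',
        "</ac:structured-macro>",
        FENCE,
        "<!-- /cfxmark:opaque -->",
    ]
)
blocks = extract_opaque_blocks(sample)
```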

Header notice

When a converted Markdown document contains any opaque or directive markers, cfxmark prepends a single HTML comment that explains the conventions to human readers and AI agents:

<!-- cfxmark:notice Converted from Confluence storage format. Inline
[label](cfx:op-XXXXXXXX) references preserve Confluence content that
has no native Markdown form; the raw XML for each lives in the
cfxmark:payloads sidecar at the bottom of this file. Do not edit
those references or the sidecar — tampering invalidates a SHA-256
fingerprint and the round trip falls back to plain text. -->

The comment is invisible in any Markdown viewer.

Custom macros

Promote a Confluence macro from "opaque" to "directive" by registering a custom handler:

import cfxmark
from cfxmark.macros import MacroRegistry
from cfxmark.macros.builtins import AdmonitionHandler

# Start from the default registry and add your own.
my_registry = cfxmark.default_registry.copy()
# Built-in AdmonitionHandler accepts one of: "info", "note", "warning", "tip".
# To promote a previously-opaque macro, write a small MacroHandler subclass —
# see cfxmark/macros/builtins/admonition.py for a complete example.
my_registry.register(AdmonitionHandler("warning"))

result = cfxmark.to_md(xhtml, macros=my_registry)

Implementing a MacroHandler from scratch requires a small amount of lxml knowledge — see cfxmark/macros/builtins/admonition.py for a complete example. A higher-level handler API that hides lxml is planned for v0.2.

Canonicalization helpers

Two Confluence storage fragments are "the same" only after a deep normalization pass that strips volatile attributes, editor noise, and rendering hints. Use canonicalize_cfx to compare two snapshots:

import cfxmark

c1 = cfxmark.canonicalize_cfx(original_xhtml)
c2 = cfxmark.canonicalize_cfx(round_tripped_xhtml)
assert c1 == c2  # passes for any document in the supported subset

canonicalize_cfx is the same function the test suite uses to verify byte-identical round trips against real Confluence pages.

Security

cfxmark hardens its XML parser against XXE and billion-laughs attacks:

Inputs containing <!DOCTYPE> or <!ENTITY> declarations are rejected before lxml ever sees them.
The lxml parser is configured with no_network=True, load_dtd=False, and huge_tree=False.
Opaque-block sentinels are SHA-256 verified — accidental sentinel syntax in user-typed Markdown does not become a real opaque block.
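The first of these measures can be approximated with a pre-parse guard like the one below. It is a standard-library sketch, not cfxmark's code: `reject_dtd` and its pattern are hypothetical, and the check is deliberately conservative, so it also rejects these tokens inside CDATA or comments. The lxml parser settings from the second item are not reproduced here.

```python
import re

# Matches the start of a DOCTYPE or ENTITY declaration anywhere in the input.
_FORBIDDEN = re.compile(r"<!\s*(DOCTYPE|ENTITY)\b", re.IGNORECASE)


def reject_dtd(xhtml: str) -> str:
    """Return the input unchanged, or raise if it declares a DTD or entity."""
    match = _FORBIDDEN.search(xhtml)
    if match:
        raise ValueError(f"{match.group(1).upper()} declarations are not allowed")
    return xhtml


def is_rejected(xhtml: str) -> bool:
    """Convenience wrapper that reports rejection as a boolean."""
    try:
        reject_dtd(xhtml)
    except ValueError:
        return True
    return False
```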

If you find a security issue, please open a GitHub issue.

Development

git clone https://github.com/eunsanMountain/cfxmark
cd cfxmark
uv sync --all-extras

# Run all tests
uv run pytest

# Type-check
uv run mypy src/

# Lint
uv run ruff check .

# Build
uv build

The corpus tests look for .cfx files in tests/corpus/ (gitignored to keep your own private samples out of version control). Drop your own Confluence storage XHTML files there and pytest tests/test_corpus.py will exercise them.

License

MIT. See LICENSE.

Release history

0.4.0 (Apr 13, 2026)
0.3.0 (Apr 8, 2026)
0.2.0 (Apr 8, 2026)
0.1.3 (Apr 7, 2026, this version)

Download files

Source Distribution

cfxmark-0.1.3.tar.gz (78.5 kB)

Uploaded Apr 7, 2026 Source

Built Distribution

cfxmark-0.1.3-py3-none-any.whl (70.4 kB)

Uploaded Apr 7, 2026 Python 3

File details

Details for the file cfxmark-0.1.3.tar.gz.

File metadata

Download URL: cfxmark-0.1.3.tar.gz
Upload date: Apr 7, 2026
Size: 78.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for cfxmark-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`40e8a9f0be6e09acfba644cfcec0bfc3a97014691a376048c20a1940515362cd`
MD5	`19cbfc6758faafea3a6a1a3b3f17f04b`
BLAKE2b-256	`57653e68af6fc16c39ed3af8748e66b0590df4726265d90ffc0e83e342d09078`

File details

Details for the file cfxmark-0.1.3-py3-none-any.whl.

File metadata

Download URL: cfxmark-0.1.3-py3-none-any.whl
Upload date: Apr 7, 2026
Size: 70.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for cfxmark-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07bc0845889b593d19514cba3d93810dd7a8c4ebc3f36144c6d9eabacf3103cf`
MD5	`236368f4f3b78fd794295f131528a589`
BLAKE2b-256	`bb3f334e6d8de4e104645a58840ff617cd77c7d95a3cec5a1701634d8da475ab`
