Pandoc filter for embedding data-driven content using Jinja2 templates
pandoc-embedz
A powerful Pandoc filter for embedding data-driven content in Markdown documents using Jinja2 templates. Transform your data into beautiful documents with minimal setup.
Features
- Full Jinja2 support: loops, conditionals, filters, macros, and all template features
- 9 data formats: CSV, TSV, SSV, lines, JSON, YAML, TOML, SQLite, Excel
- Auto-detection of format from file extension
- Inline and external data sources
- SQL queries for filtering, aggregation, and multi-table JOINs
- Template reuse with define/template and {% include %}
- Variable scoping: local (with:), global (global:), type-preserving (bind:), and preamble
- Custom filters: to_dict, raise, regex_replace, regex_search, alias
- Standalone rendering mode for shell pipelines and non-Markdown output
tl;dr
Install:
pip install pandoc-embedz
Basic usage:
```embedz
---
data: data.csv
---
{% for row in data %}
- {{ row.name }}: {{ row.value }}
{% endfor %}
```
With template reuse:
```{.embedz define=item-list}
## {{ title }}
{% for item in data %}
- {{ item.name }}: {{ item.value }}
{% endfor %}
```
```{.embedz data=products.csv as=item-list}
with:
  title: Product List
```
Note: as= is shorthand. In YAML headers, template: is preferred. See Template Reuse for details.
Render:
pandoc report.md --filter pandoc-embedz -o output.pdf
Works with CSV, JSON, YAML, TOML, SQLite, Excel and more. See Basic Usage to get started, or jump to Advanced Features for SQL queries, multi-table operations, and database access.
Installation
Install from PyPI (stable release):
pip install pandoc-embedz
Or grab the latest main branch directly from GitHub:
pip install git+https://github.com/tecolicom/pandoc-embedz.git
Dependencies: panflute, jinja2, pandas, pyyaml
Note: Requires Pandoc to be installed separately. A comprehensive reference manual is available via man pandoc-embedz after installation.
Basic Usage
These examples cover the most common use cases. Start here to learn the basics.
CSV File (Auto-detected)
```embedz
---
data: data.csv
---
{% for row in data %}
- {{ row.name }}: {{ row.value }}
{% endfor %}
```
JSON Structure
```embedz
---
data: report.json
---
# {{ data.title }}
{% for section in data.sections %}
## {{ section.name }}
{% for item in section['items'] %}
- {{ item }}
{% endfor %}
{% endfor %}
```
Inline Data
```embedz
---
format: json
---
{% for item in data %}
- {{ item.name }}: {{ item.count }}
{% endfor %}
---
[
{"name": "Apple", "count": 10},
{"name": "Banana", "count": 5}
]
```
Conditionals
Use Jinja2 if/elif/else to show different content based on data values:
```embedz
---
data: alerts.csv
---
{% for row in data %}
{% if row.severity == 'high' %}
- **URGENT**: {{ row.title }} ({{ row.count }} cases)
{% elif row.severity == 'medium' %}
- {{ row.title }} - {{ row.count }} reported
{% else %}
- {{ row.title }}
{% endif %}
{% endfor %}
```
Template Reuse
Define templates once with define, then reuse them with template (or as for short). Perfect for consistent formatting across multiple data sources:
```{.embedz define=item-list}
## {{ title }}
{% for item in data %}
- {{ item.name }}: {{ item.value }}
{% endfor %}
```
```embedz
---
data: products.csv
template: item-list
with:
  title: Product List
---
```
Or more concisely with attribute syntax:
```{.embedz data=services.csv as=item-list}
with:
  title: Service List
```
Code Block Syntax
An embedz code block can have up to three sections separated by ---:
```embedz
---
YAML configuration
---
Jinja2 template
---
Inline data (optional)
```
- First ---: Opens YAML header
- Second ---: Closes YAML header, begins template section
- Third ---: Separates template from inline data (optional)
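The three-section layout can be sketched in Python. This is only an illustration of the separator rules listed above: `split_sections` is a hypothetical helper, not part of pandoc-embedz, and the filter's real parser may handle edge cases differently.

```python
def split_sections(block: str) -> dict:
    """Split an embedz block body into YAML header, template, and inline data.

    Hypothetical helper for illustration; not the filter's actual parser.
    """
    lines = block.splitlines()
    if not lines or lines[0].strip() != "---":
        # No opening separator: the body is a single section whose meaning
        # depends on the block attributes (returned here as the template).
        return {"header": "", "template": block, "data": ""}

    # Indexes of the separator lines that follow the opening one.
    seps = [i for i, line in enumerate(lines) if i > 0 and line.strip() == "---"]
    if not seps:
        raise ValueError("YAML header is never closed")

    close = seps[0]
    header = "\n".join(lines[1:close])
    if len(seps) > 1:
        # A third separator splits the template from the inline data.
        third = seps[1]
        template = "\n".join(lines[close + 1:third])
        data = "\n".join(lines[third + 1:])
    else:
        template = "\n".join(lines[close + 1:])
        data = ""
    return {"header": header, "template": template, "data": data}


example = """---
format: json
---
{% for item in data %}
- {{ item.name }}
{% endfor %}
---
[{"name": "Apple"}]"""

parts = split_sections(example)
print(parts["header"])
print(parts["data"])
```

An empty template section (two consecutive separators) falls out naturally: the template is an empty string and everything after the third separator is inline data.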
Block Types
Data processing (most common) --- loads data and renders with a template:
```{.embedz data=file.csv}
{% for row in data %}
- {{ row.name }}
{% endfor %}
```
Template definition --- stores a named template for reuse (no output):
```{.embedz define=my-template}
{% for item in data %}
- {{ item.value }}
{% endfor %}
```
Template usage --- applies a previously defined template:
```{.embedz data=file.csv as=my-template}
```
With YAML configuration via attributes:
```{.embedz data=file.csv as=my-template}
with:
  title: Report
```
With inline data (note the three --- separators):
```embedz
---
template: my-template
format: json
---
---
[{"value": "item1"}, {"value": "item2"}]
```
The structure is: YAML header -> (empty template section) -> inline data.
Variable definition --- sets global variables without output:
```embedz
---
global:
  author: John Doe
  version: 1.0
---
```
Content Interpretation (without ---)
When a block has no --- separator, the content is interpreted based on attributes:
| Attributes | Content Interpretation |
|---|---|
| data + template/as | YAML configuration |
| template/as only | Inline data |
| define | Template definition |
| (none) or data only | Template |
When --- is present, the standard three-section structure applies regardless of attributes.
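The table above amounts to a small decision rule. The sketch below restates it in Python; `interpret_content` is a hypothetical name, and checking define first is an assumption, since the table does not say what happens when define is combined with other attributes.

```python
def interpret_content(attrs: set) -> str:
    """Return how a separator-free block body is read, following the table."""
    has_template = "template" in attrs or "as" in attrs
    if "define" in attrs:
        # Assumption: define takes precedence over any other attribute.
        return "template definition"
    if has_template and "data" in attrs:
        return "YAML configuration"
    if has_template:
        return "inline data"
    # No attributes, or data only.
    return "template"


print(interpret_content({"data", "as"}))
print(interpret_content({"template"}))
print(interpret_content({"data"}))
```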
See man pandoc-embedz for the complete configuration options reference.
Variable Scoping
pandoc-embedz provides five mechanisms for managing variables:
| Mechanism | Scope | Type Handling | Use Case |
|---|---|---|---|
| with: | Block-local | As-is | Input parameters, local constants |
| bind: | Document-wide | Type-preserving (dict, list, int, bool) | Extracting data, computations |
| global: | Document-wide | String (templates expanded) | Labels, messages, query strings |
| alias: | Document-wide | Key aliasing | Alternative key names for dicts |
| preamble: | Document-wide | Jinja2 control structures | Macros, {% set %} variables |
Processing order: preamble -> with -> query -> data load -> bind -> global -> alias -> render
Local Variables with with:
Block-scoped variables for parameters and constants:
```embedz
---
data: products.csv
with:
  tax_rate: 0.08
  currency: USD
---
{% for item in data %}
- {{ item.name }}: {{ currency }} {{ (item.price * (1 + tax_rate)) | round(2) }}
{% endfor %}
```
Global Variables with global:
Document-wide variables. Values containing {{ or {% are expanded as templates; the result is always a string.
```embedz
---
global:
  author: John Doe
  version: 1.0
---
```
```embedz
---
data: report.csv
---
# Report by {{ author }}
{% for row in data %}
- {{ row.item }}
{% endfor %}
```
Note: The global. prefix is optional. For type-preserving values (dict, list, int, bool), use bind: instead.
Type-Preserving Bindings with bind:
Evaluate expressions while preserving their result types:
```embedz
---
format: csv
bind:
  first_row: data | first
  total: data | sum(attribute='value')
  has_data: data | length > 0
---
Name: {{ first_row.name }}, Total: {{ total }}, Has data: {{ has_data }}
---
name,value
Alice,100
Bob,200
```
Dot notation for setting nested values is supported in both bind: and global::
bind:
  record: data | first
  record.note: "'Added by bind'"
global:
  record.label: Description
See man pandoc-embedz for details on alias: and preamble:, as well as nested structures and dot notation.
Advanced Features
These features enable powerful data processing, database access, and complex document generation workflows.
SQL Queries on CSV/TSV
Filter, aggregate, and transform CSV/TSV data using SQL:
```embedz
---
data: sales.csv
query: |
  SELECT
    product,
    SUM(quantity) as total_quantity,
    SUM(amount) as total_sales
  FROM data
  GROUP BY product
  ORDER BY total_sales DESC
---
| Product | Quantity | Sales |
|---------|----------|-------|
{% for row in data -%}
| {{ row.product }} | {{ row.total_quantity }} | ${{ row.total_sales }} |
{% endfor -%}
```
Note: Table name is always data. CSV/TSV data is loaded into an in-memory SQLite database for querying.
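To make the note concrete, here is a minimal standard-library sketch of the same idea: CSV rows are copied into an in-memory SQLite table named data and then queried. It is not the filter's own code; `query_csv`, the sample rows, and the use of NUMERIC column affinity to convert number-like text are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for sales.csv.
CSV_TEXT = """product,quantity,amount
Widget,2,20.0
Gadget,1,15.5
Widget,3,30.0
"""


def query_csv(csv_text: str, sql: str) -> list:
    """Load CSV text into an in-memory table called data and run sql on it."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys())

    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    # NUMERIC affinity lets SQLite store number-like text as numbers.
    col_defs = ", ".join(f'"{name}" NUMERIC' for name in columns)
    conn.execute(f"CREATE TABLE data ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO data VALUES ({placeholders})",
        [[row[name] for name in columns] for row in rows],
    )
    result = [dict(record) for record in conn.execute(sql)]
    conn.close()
    return result


summary = query_csv(
    CSV_TEXT,
    """
    SELECT product,
           SUM(quantity) AS total_quantity,
           SUM(amount) AS total_sales
    FROM data
    GROUP BY product
    ORDER BY total_sales DESC
    """,
)
for row in summary:
    print(row["product"], row["total_quantity"], row["total_sales"])
```

Each result row is a plain dict, which matches how the template above reads row.product and row.total_sales.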
Query Template Variables
Share SQL query logic across multiple blocks using global variables:
```{.embedz}
---
global:
  year: 2024
  start_date: "{{ year }}-01-01"
  end_date: "{{ year }}-12-31"
  date_filter: date BETWEEN '{{ start_date }}' AND '{{ end_date }}'
---
```
```{.embedz data=sales.csv}
---
query: "SELECT * FROM data WHERE {{ date_filter }}"
---
{% for row in data %}
- {{ row.date }}: ${{ row.amount }}
{% endfor %}
```
Variables are expanded in definition order, so later variables can reference earlier ones.
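The effect of definition-order expansion can be sketched without Jinja2. The snippet below uses a simple regular expression as a stand-in for template rendering, so it only handles plain {{ name }} references; `expand_globals` is a hypothetical helper, and raising KeyError on a forward reference is a choice of this sketch, not documented filter behavior.

```python
import re

# Matches simple references such as {{ year }}; a stand-in for Jinja2.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_globals(definitions: dict) -> dict:
    """Expand each value in order, using only variables defined earlier."""
    resolved = {}
    for name, value in definitions.items():
        if isinstance(value, str):
            value = PLACEHOLDER.sub(
                lambda match: str(resolved[match.group(1)]), value
            )
        resolved[name] = value
    return resolved


variables = expand_globals({
    "year": 2024,
    "start_date": "{{ year }}-01-01",
    "end_date": "{{ year }}-12-31",
    "date_filter": "date BETWEEN '{{ start_date }}' AND '{{ end_date }}'",
})
print(variables["date_filter"])
```

Because date_filter is defined last, it can use start_date and end_date, which in turn were built from year.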
SQLite Database
Query SQLite database files directly:
```embedz
---
data: analytics.db
query: SELECT category, COUNT(*) as count FROM events WHERE date >= '2024-01-01' GROUP BY category
---
| Category | Count |
|----------|-------|
{% for row in data -%}
| {{ row.category }} | {{ row.count }} |
{% endfor -%}
```
Use the table parameter to read all rows from a specific table without a custom query.
Excel Files
Read .xlsx / .xls files directly. Requires openpyxl (pip install pandoc-embedz[excel]). Leading blank rows and all-blank columns are automatically skipped.
```embedz
---
data: report.xlsx
table: Sheet2
---
{% for row in data %}
- {{ row.item }}
{% endfor %}
```
Use startrow to skip leading description rows. Accepts an integer (1-indexed), a string to find automatically, or a list (AND logic):
```{.embedz data=report.xlsx startrow="name"}
{% for row in data %}
- {{ row.name }}: {{ row.value }}
{% endfor %}
```
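One way to read the three startrow forms is sketched below. The matching rule is an assumption: this sketch treats a string as "the first row containing a cell equal to it" and a list as "the first row containing all of them"; see the man page for the actual syntax. `find_start_row` is a hypothetical helper, and rows are modeled as lists of cell values.

```python
def find_start_row(rows: list, startrow) -> int:
    """Return the 0-based index of the header row.

    startrow may be a 1-indexed integer, a string, or a list of strings
    that must all appear in the same row (AND logic).
    """
    if isinstance(startrow, int):
        return startrow - 1  # convert 1-indexed to 0-based

    wanted = [startrow] if isinstance(startrow, str) else list(startrow)
    for index, row in enumerate(rows):
        cells = {str(cell) for cell in row if cell is not None}
        if all(item in cells for item in wanted):
            return index
    raise ValueError(f"no row contains {wanted!r}")


# Hypothetical sheet with a description row and a blank row before the header.
sheet = [
    ["Monthly report", None],
    [None, None],
    ["name", "value"],
    ["Alice", 100],
]
print(find_start_row(sheet, "name"))
print(find_start_row(sheet, ["name", "value"]))
print(find_start_row(sheet, 3))
```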
Use transpose: true when headers run down the first column. Use header: false when there is no header row.
See man pandoc-embedz for the full startrow syntax and Excel-specific details.
Multi-Table Data
Load multiple data files and access them directly or combine with SQL:
Direct access (no SQL):
```embedz
---
data:
  config: config.yaml
  sales: sales.csv
---
# {{ data.config.title }}
{% for row in data.sales %}
- {{ row.date }}: {{ row.amount }}
{% endfor %}
```
SQL JOIN (with query):
```embedz
---
data:
  products: products.csv
  sales: sales.csv
query: |
  SELECT p.product_name, SUM(s.quantity) as total
  FROM sales s
  JOIN products p ON s.product_id = p.product_id
  GROUP BY p.product_name
---
{% for row in data %}
- {{ row.product_name }}: {{ row.total }}
{% endfor %}
```
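The JOIN above can be tried outside the filter with Python's sqlite3 module. In this sketch each key of the data: dict becomes a table name, which is how the query above refers to them; the sample rows are invented, and an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per key in the data: dict (products, sales).
conn.execute("CREATE TABLE products (product_id INTEGER, product_name TEXT)")
conn.execute("CREATE TABLE sales (product_id INTEGER, quantity INTEGER)")

# Hypothetical rows standing in for products.csv and sales.csv.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "Widget"), (2, "Gadget")],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 2), (2, 1), (1, 3)],
)

rows = conn.execute(
    """
    SELECT p.product_name, SUM(s.quantity) AS total
    FROM sales s
    JOIN products p ON s.product_id = p.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
    """
).fetchall()
conn.close()

totals = dict(rows)
for name, total in totals.items():
    print(f"- {name}: {total}")
```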
file: dict with parameters (e.g., Excel sheets):
```embedz
---
data:
  incidents:
    file: data/report.xlsx
    table: Incidents
  phishing:
    file: data/report.xlsx
    table: Phishing
    startrow: year
query: |
  SELECT i.month, i.count, p.domestic
  FROM incidents i
  JOIN phishing p ON i.month = p.month
---
{% for row in data %}
- {{ row.month }}: {{ row.count }} (domestic: {{ row.domestic }})
{% endfor %}
```
Variable references, file paths, and inline data can be mixed freely within a data: dict.
See MULTI_TABLE.md for comprehensive examples and documentation.
Template Macros
Create reusable template functions with Jinja2 macros:
```{.embedz define=formatters}
{% macro format_item(title, date) -%}
**{{ title }}** ({{ date }})
{%- endmacro %}
```
```embedz
---
data: vulnerabilities.csv
---
{% from 'formatters' import format_item %}
{% for item in data %}
- {{ format_item(item.title, item.date) }}
{% endfor %}
```
Preamble & Macro Sharing
Use the preamble section to define reusable control structures across all blocks. Named templates can also share macros via {% from ... import %}:
```{.embedz define=sql-macros}
{%- macro BETWEEN(start, end) -%}
SELECT * FROM data WHERE date BETWEEN '{{ start }}' AND '{{ end }}'
{%- endmacro -%}
```
```embedz
---
global:
  fiscal_year: 2024
  start_date: "{{ fiscal_year }}-04-01"
  end_date: "{{ fiscal_year + 1 }}-03-31"
  _import: "{% from 'sql-macros' import BETWEEN %}"
  yearly_query: "{{ BETWEEN(start_date, end_date) }}"
---
```
Comments in CSV/TSV/SSV
Lines starting with # are treated as comments and skipped by default. The comment parameter controls behavior: line (default), head, inline, or none.
```{.embedz data=data.csv comment=head}
{% for row in data %}
- {{ row.name }}: {{ row.value }}
{% endfor %}
```
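The default line mode can be approximated in a few lines of Python. This sketch covers only that mode (lines whose first character is # are dropped before parsing); the head and inline modes are not modeled, and whether the filter ignores leading whitespace before # is not stated here. `read_csv_skipping_comments` is a hypothetical helper.

```python
import csv


def read_csv_skipping_comments(text: str) -> list:
    """Parse CSV text after dropping every line that starts with '#'."""
    kept = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(kept))


# Hypothetical file contents with a leading and a mid-file comment.
SAMPLE = """# exported 2024-01-01
name,value
Alice,100
# Bob was added later
Bob,200
"""

for row in read_csv_skipping_comments(SAMPLE):
    print(f"- {row['name']}: {row['value']}")
```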
Standalone Rendering
Render Markdown or LaTeX files without running full Pandoc:
pandoc-embedz --standalone templates/report.tex -c config/base.yaml -o build/report.tex
Command-line options:
- --standalone (-s) enables standalone mode
- --template TEXT (-t) specifies template text directly
- --format FORMAT (-f) specifies data format for stdin
- --config FILE (-c) loads external YAML config file(s) (repeatable)
- --output FILE (-o) writes output to file (default: stdout)
- --debug (-d) enables debug output to stderr
Quick examples:
# Format CSV data from stdin
cat data.csv | pandoc-embedz -s -t '{% for row in data %}{{ row.name }}\n{% endfor %}' -f csv
# Use template file (data auto-read from stdin)
cat data.csv | pandoc-embedz -s template.md
# Static template without data
pandoc-embedz -s -t 'Static content'
External Config Files
Both filter and standalone modes can load shared configuration:
```embedz
---
config:
- config/base.yaml
- config/overrides.yaml
---
```
pandoc-embedz -s report.md -c config/base.yaml -c config/latex.yaml
Config files support multiple YAML documents separated by --- for logical grouping.
See man pandoc-embedz for details on stdin behavior, multi-document YAML, and config merging.
Best Practices
CSV Output Escaping
When generating CSV from templates, use a macro for proper escaping:
{%- macro csv_escape(value) -%}
{%- set v = value | string -%}
{%- if ',' in v or '"' in v or '\n' in v -%}
"{{ v | replace('"', '""') }}"
{%- else -%}
{{ v }}
{%- endif -%}
{%- endmacro -%}
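For checking the macro's output, the same rule can be written as a small Python function. This is a direct transcription of the macro above, not part of pandoc-embedz: a value is quoted only when it contains a comma, a double quote, or a newline, and embedded double quotes are doubled.

```python
def csv_escape(value) -> str:
    """Mirror of the csv_escape macro: quote only when necessary."""
    text = str(value)
    if "," in text or '"' in text or "\n" in text:
        return '"' + text.replace('"', '""') + '"'
    return text


for sample in ["plain", "a,b", 'say "hi"', 42]:
    print(csv_escape(sample))
```

Comparing a rendered template against this function on a few tricky values is a quick way to confirm that whitespace control in the macro has not introduced stray spaces or newlines.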
File Extension Recommendations
- .emz - Recommended for standalone templates (non-Markdown output)
- .embedz - Descriptive alternative
- .md - Only for templates that generate Markdown
Pipeline Processing
Combine pandoc-embedz with other tools for data transformation:
extract_tool database table --columns 1-10 | \
pandoc-embedz -s transform.emz | \
post_process_tool > output.csv
Use -s (standalone mode) for pipeline processing. Each .emz file handles one transformation step.
Debugging
Enable debug output with the PANDOC_EMBEDZ_DEBUG environment variable (accepts 1, true, or yes) or the -d flag in standalone mode:
PANDOC_EMBEDZ_DEBUG=1 pandoc input.md --filter pandoc-embedz -o output.pdf
pandoc-embedz -s -d template.md
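A wrapper script that wants to follow the same convention could test the variable as sketched below. The accepted values (1, true, yes) come from the text above; treating them case-insensitively and trimming whitespace are assumptions of this sketch, and `debug_enabled` is a hypothetical helper rather than a pandoc-embedz function.

```python
import os


def debug_enabled(environ=None) -> bool:
    """Return True when PANDOC_EMBEDZ_DEBUG is set to 1, true, or yes."""
    env = os.environ if environ is None else environ
    value = env.get("PANDOC_EMBEDZ_DEBUG", "")
    # Assumption: comparison ignores case and surrounding whitespace.
    return value.strip().lower() in {"1", "true", "yes"}


print(debug_enabled({"PANDOC_EMBEDZ_DEBUG": "1"}))
print(debug_enabled({}))
```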
Related Tools
Similar Pandoc Filters (on PyPI)
- pantable - CSV/TSV to table with powerful options, table-focused
- pandoc-jinja - Document-wide metadata expansion, not for code blocks
- pandoc-include - Include external files with template support
- pandoc-pyrun - Execute Python code in code blocks
Additional Tools
- pandoc-csv2table (Haskell) - CSV to table conversion only
- Quarto - Comprehensive publishing system based on Pandoc. Excellent for data science and technical documents, but requires a dedicated environment and workflow
- R Markdown - Similar to Quarto, requires R environment
- Lua Filters - Requires custom Lua scripting for each use case
Why pandoc-embedz?
pandoc-embedz fills a unique niche:
- Full Jinja2 templating (loops, conditionals, filters)
- Multiple data formats (CSV, JSON, YAML, TOML, SQLite, Excel, etc.)
- Code block level processing (not document-wide)
- Lightweight - no heavy dependencies
- Works with existing Pandoc workflow
See COMPARISON.md for detailed comparison.
Documentation
- REFERENCE.md --- comprehensive reference manual (options, syntax, data formats, variable scoping, custom filters); also available via man pandoc-embedz
- MULTI_TABLE.md --- multi-table SQL query examples
- COMPARISON.md --- comparison with alternative tools
License
MIT License
Copyright © 2025-2026 Office TECOLI, LLC and Kazumasa Utashiro
See LICENSE file for details.
Author
Kazumasa Utashiro
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Development Setup
Using uv (Recommended)
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository
git clone https://github.com/tecolicom/pandoc-embedz.git
cd pandoc-embedz
# Install dependencies and setup development environment
uv sync --all-extras
# Run tests
uv run pytest tests/
Using pip
# Clone the repository
git clone https://github.com/tecolicom/pandoc-embedz.git
cd pandoc-embedz
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
For detailed development guidelines, see AGENTS.md.