Pandoc filter for embedding data-driven content using Jinja2 templates

pandoc-embedz

A Pandoc filter for embedding data-driven content in Markdown documents using Jinja2 templates. Point a code block at a data source, write a template, and the filter renders the result into the document.

Features

Full Jinja2 support: loops, conditionals, filters, macros, and all template features
9 data formats: CSV, TSV, SSV, lines, JSON, YAML, TOML, SQLite, Excel
Auto-detection of format from file extension
Inline and external data sources
SQL queries for filtering, aggregation, and multi-table JOINs
Template reuse with define/template and {% include %}
Variable scoping: local (with:), global (global:), type-preserving (bind:), and preamble
Custom filters: to_dict, raise, regex_replace, regex_search, alias
Standalone rendering mode for shell pipelines and non-Markdown output

tl;dr

Install:

pip install pandoc-embedz

Basic usage:

```embedz
---
data: data.csv
---
{% for row in data %}
- {{ row.name }}: {{ row.value }}
{% endfor %}
```

With template reuse:

```{.embedz define=item-list}
## {{ title }}
{% for item in data %}
- {{ item.name }}: {{ item.value }}
{% endfor %}
```

```{.embedz data=products.csv as=item-list}
with:
  title: Product List
```

Note: as= is shorthand for template=. In YAML headers, template: is preferred. See Template Reuse for details.

Render:

pandoc report.md --filter pandoc-embedz -o output.pdf

Works with CSV, JSON, YAML, TOML, SQLite, Excel and more. See Basic Usage to get started, or jump to Advanced Features for SQL queries, multi-table operations, and database access.

Installation

Install from PyPI (stable release):

pip install pandoc-embedz

Or grab the latest main branch directly from GitHub:

pip install git+https://github.com/tecolicom/pandoc-embedz.git

Dependencies: panflute, jinja2, pandas, pyyaml

Note: Requires Pandoc to be installed separately. A comprehensive reference manual is available via man pandoc-embedz after installation.

Basic Usage

These examples cover the most common use cases. Start here to learn the basics.

CSV File (Auto-detected)

```embedz
---
data: data.csv
---
{% for row in data %}
- {{ row.name }}: {{ row.value }}
{% endfor %}
```

JSON Structure

```embedz
---
data: report.json
---
# {{ data.title }}

{% for section in data.sections %}
## {{ section.name }}
{% for item in section['items'] %}
- {{ item }}
{% endfor %}
{% endfor %}
```
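
The template above implies a JSON document shaped roughly like the one in this sketch; only the key names come from the template, and the values are invented for illustration. The sketch also shows, in plain Python, a likely reason the template writes section['items'] rather than section.items: Jinja2 resolves a dotted name by trying attribute lookup first, and every dict has an items method.

```python
import json

# Hypothetical contents of report.json; only the key names come from the template.
REPORT_JSON = """
{
  "title": "Quarterly Report",
  "sections": [
    {"name": "Fruit", "items": ["Apple", "Banana"]},
    {"name": "Vegetables", "items": ["Carrot"]}
  ]
}
"""

data = json.loads(REPORT_JSON)
section = data["sections"][0]

# Subscript access returns the list stored under the "items" key.
by_key = section["items"]

# Attribute access finds the built-in dict.items method instead,
# which is what a dotted lookup such as section.items would hit first.
by_attr = getattr(section, "items")

print(by_key)             # the list of item names
print(callable(by_attr))  # True: this is a method, not the data
```

Any key that collides with a dict method name (items, keys, values, and so on) is safer to read with subscript syntax.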

Inline Data

```embedz
---
format: json
---
{% for item in data %}
- {{ item.name }}: {{ item.count }}
{% endfor %}
---
[
  {"name": "Apple", "count": 10},
  {"name": "Banana", "count": 5}
]
```

Conditionals

Use Jinja2 if/elif/else to show different content based on data values:

```embedz
---
data: alerts.csv
---
{% for row in data %}
{% if row.severity == 'high' %}
- **URGENT**: {{ row.title }} ({{ row.count }} cases)
{% elif row.severity == 'medium' %}
- {{ row.title }} - {{ row.count }} reported
{% else %}
- {{ row.title }}
{% endif %}
{% endfor %}
```

Template Reuse

Define templates once with define, then reuse them with template (or as for short). This keeps formatting consistent across multiple data sources:

```{.embedz define=item-list}
## {{ title }}
{% for item in data %}
- {{ item.name }}: {{ item.value }}
{% endfor %}
```

```embedz
---
data: products.csv
template: item-list
with:
  title: Product List
---
```

Or more concisely with attribute syntax:

```{.embedz data=services.csv as=item-list}
with:
  title: Service List
```

Code Block Syntax

An embedz code block can have up to three sections separated by ---:

```embedz
---
YAML configuration
---
Jinja2 template
---
Inline data (optional)
```

First ---: Opens YAML header
Second ---: Closes YAML header, begins template section
Third ---: Separates template from inline data (optional)
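
The separator rules above can be sketched in Python as follows. This is an illustration of the described layout, not the filter's actual parser: split_sections is a hypothetical helper, and it assumes a separator is a line consisting only of ---.

```python
def split_sections(block_text):
    """Split an embedz block body into (header, template, inline_data).

    Illustration only: assumes separators are lines consisting of exactly
    '---' and that a structured block starts with one.
    """
    lines = block_text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No opening separator: treat everything as template text.
        return "", block_text, None

    sections = [[]]
    for line in lines[1:]:
        # Only the second and third separators start a new section.
        if line.strip() == "---" and len(sections) < 3:
            sections.append([])
        else:
            sections[-1].append(line)

    header = "\n".join(sections[0])
    template = "\n".join(sections[1]) if len(sections) > 1 else ""
    inline_data = "\n".join(sections[2]) if len(sections) > 2 else None
    return header, template, inline_data


block = """---
format: json
---
{% for item in data %}
- {{ item.name }}: {{ item.count }}
{% endfor %}
---
[{"name": "Apple", "count": 10}]"""

header, template, inline_data = split_sections(block)
print(header)       # format: json
print(inline_data)  # [{"name": "Apple", "count": 10}]
```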

Block Types

Data processing (most common): loads data and renders it with a template.

```{.embedz data=file.csv}
{% for row in data %}
- {{ row.name }}
{% endfor %}
```

Template definition: stores a named template for reuse (no output).

```{.embedz define=my-template}
{% for item in data %}
- {{ item.value }}
{% endfor %}
```

Template usage: applies a previously defined template.

```{.embedz data=file.csv as=my-template}
```

With YAML configuration via attributes:

```{.embedz data=file.csv as=my-template}
with:
  title: Report
```

With inline data (note the three --- separators):

```embedz
---
template: my-template
format: json
---
---
[{"value": "item1"}, {"value": "item2"}]
```

The structure is: YAML header -> (empty template section) -> inline data.

Variable definition: sets global variables without output.

```embedz
---
global:
  author: John Doe
  version: 1.0
---
```

Content Interpretation (without `---`)

When a block has no --- separator, the content is interpreted based on attributes:

| Attributes | Content Interpretation |
|------------|------------------------|
| `data` + `template`/`as` | YAML configuration |
| `template`/`as` only | Inline data |
| `define` | Template definition |
| (none) or `data` only | Template |

When --- is present, the standard three-section structure applies regardless of attributes.
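
The table can be read as a small decision function. The sketch below is a hypothetical Python rendering of it, not code from the package; in particular, checking define before the other attributes is an assumption, since the table does not say how define combines with them.

```python
def interpret_content(attrs):
    """Return how a block body without '---' is read, per the table above.

    `attrs` is the set of attribute names present on the code block.
    Checking `define` first is an assumption of this sketch.
    """
    has_template = "template" in attrs or "as" in attrs
    if "define" in attrs:
        return "template definition"
    if has_template and "data" in attrs:
        return "YAML configuration"
    if has_template:
        return "inline data"
    # No attributes, or `data` alone: the body is the template itself.
    return "template"


print(interpret_content({"data", "as"}))  # YAML configuration
print(interpret_content({"template"}))    # inline data
print(interpret_content({"data"}))        # template
```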

See man pandoc-embedz for the complete configuration options reference.

Variable Scoping

pandoc-embedz provides five mechanisms for managing variables:

| Mechanism | Scope | Type Handling | Use Case |
|-----------|-------|---------------|----------|
| `with:` | Block-local | As-is | Input parameters, local constants |
| `bind:` | Document-wide | Type-preserving (dict, list, int, bool) | Extracting data, computations |
| `global:` | Document-wide | String (templates expanded) | Labels, messages, query strings |
| `alias:` | Document-wide | Key aliasing | Alternative key names for dicts |
| `preamble:` | Document-wide | Jinja2 control structures | Macros, `{% set %}` variables |

Processing order: preamble -> with -> query -> data load -> bind -> global -> alias -> render

Local Variables with `with:`

Block-scoped variables for parameters and constants:

```embedz
---
data: products.csv
with:
  tax_rate: 0.08
  currency: USD
---
{% for item in data %}
- {{ item.name }}: {{ currency }} {{ (item.price * (1 + tax_rate)) | round(2) }}
{% endfor %}
```

Global Variables with `global:`

Document-wide variables. Values containing {{ or {% are expanded as templates; the result is always a string.

```embedz
---
global:
  author: John Doe
  version: 1.0
---
```

```embedz
---
data: report.csv
---
# Report by {{ author }}

{% for row in data %}
- {{ row.item }}
{% endfor %}
```

Note: The global. prefix is optional. For type-preserving values (dict, list, int, bool), use bind: instead.

Type-Preserving Bindings with `bind:`

Evaluate expressions while preserving their result types:

```embedz
---
format: csv
bind:
  first_row: data | first
  total: data | sum(attribute='value')
  has_data: data | length > 0
---
Name: {{ first_row.name }}, Total: {{ total }}, Has data: {{ has_data }}
---
name,value
Alice,100
Bob,200
```
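
For readers more familiar with Python than with Jinja2 filters, the three bind: expressions above correspond roughly to the plain-Python computations below. This is a sketch of the semantics only; converting the value column to int is an assumption made here so that the sum is numeric.

```python
import csv
import io

INLINE_CSV = """name,value
Alice,100
Bob,200
"""

# Parse the inline data. Converting `value` to int is an assumption of
# this sketch; csv.DictReader itself yields strings.
rows = [
    {"name": r["name"], "value": int(r["value"])}
    for r in csv.DictReader(io.StringIO(INLINE_CSV))
]

first_row = rows[0]                    # data | first                 (dict)
total = sum(r["value"] for r in rows)  # data | sum(attribute='value') (int)
has_data = len(rows) > 0               # data | length > 0            (bool)

print(f"Name: {first_row['name']}, Total: {total}, Has data: {has_data}")
# Name: Alice, Total: 300, Has data: True
```

Because each result keeps its type, later blocks can index into first_row or do arithmetic on total, which would not be possible with a string.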

Both bind: and global: support dot notation for setting nested values:

bind:
  record: data | first
  record.note: "'Added by bind'"
global:
  record.label: Description

See man pandoc-embedz for details on alias: and preamble:, as well as nested structures and dot notation.
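
As a rough mental model of dot notation, a dotted key walks into a nested dict and assigns at the last segment. The helper below, set_dotted, is hypothetical and only illustrates that idea; how the filter handles missing intermediate keys or non-dict values is not covered here.

```python
def set_dotted(target, dotted_key, value):
    """Assign `value` at a dotted path inside nested dicts, creating
    intermediate dicts as needed. Hypothetical helper for illustration."""
    *parents, leaf = dotted_key.split(".")
    node = target
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return target


# Start from a record such as `data | first` might produce.
variables = {"record": {"name": "Alice", "value": 100}}
set_dotted(variables, "record.note", "Added by bind")
set_dotted(variables, "record.label", "Description")
print(variables["record"])
```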

Advanced Features

These features enable powerful data processing, database access, and complex document generation workflows.

SQL Queries on CSV/TSV

Filter, aggregate, and transform CSV/TSV data using SQL:

```embedz
---
data: sales.csv
query: |
  SELECT
    product,
    SUM(quantity) as total_quantity,
    SUM(amount) as total_sales
  FROM data
  GROUP BY product
  ORDER BY total_sales DESC
---
| Product | Quantity | Sales |
|---------|----------|-------|
{% for row in data -%}
| {{ row.product }} | {{ row.total_quantity }} | ${{ row.total_sales }} |
{% endfor -%}
```

Note: For a single data source, the table name is always data. CSV/TSV data is loaded into an in-memory SQLite database for querying.
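
To make the in-memory approach concrete, here is a minimal standard-library sketch of the same idea: parse CSV, load it into a SQLite table named data, and run the query. The sample rows and the column types are invented for illustration, and this is not the filter's own loading code.

```python
import csv
import io
import sqlite3

# Invented sample rows standing in for sales.csv.
SALES_CSV = """product,quantity,amount
Widget,2,20.0
Gadget,1,15.0
Widget,3,30.0
"""

rows = list(csv.DictReader(io.StringIO(SALES_CSV)))

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
# Column types are an assumption; SQLite converts the numeric-looking text.
conn.execute("CREATE TABLE data (product TEXT, quantity INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO data VALUES (:product, :quantity, :amount)", rows
)

query = """
    SELECT product,
           SUM(quantity) AS total_quantity,
           SUM(amount)   AS total_sales
    FROM data
    GROUP BY product
    ORDER BY total_sales DESC
"""
result = [dict(r) for r in conn.execute(query)]
for row in result:
    print(row["product"], row["total_quantity"], row["total_sales"])
```

Each result row behaves like a mapping keyed by column alias, which matches how the template reads row.product and row.total_sales.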

Query Template Variables

Share SQL query logic across multiple blocks using global variables:

```{.embedz}
---
global:
  year: 2024
  start_date: "{{ year }}-01-01"
  end_date: "{{ year }}-12-31"
  date_filter: date BETWEEN '{{ start_date }}' AND '{{ end_date }}'
---
```

```{.embedz data=sales.csv}
---
query: "SELECT * FROM data WHERE {{ date_filter }}"
---
{% for row in data %}
- {{ row.date }}: ${{ row.amount }}
{% endfor %}
```

Variables are expanded in definition order, so later variables can reference earlier ones.
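
Definition-order expansion can be pictured with the simplified sketch below. It uses a regular expression as a stand-in for Jinja2 rendering, so it handles only bare {{ name }} placeholders; expand_in_order is a hypothetical name, not part of the package.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_in_order(definitions):
    """Expand {{ name }} placeholders, resolving each value against the
    variables defined before it. A simplified stand-in for Jinja2."""
    resolved = {}
    for name, value in definitions.items():  # dicts keep insertion order
        if isinstance(value, str):
            # Referencing a name that is not defined yet raises KeyError.
            value = PLACEHOLDER.sub(
                lambda m: str(resolved[m.group(1)]), value
            )
        resolved[name] = value
    return resolved


variables = expand_in_order({
    "year": 2024,
    "start_date": "{{ year }}-01-01",
    "end_date": "{{ year }}-12-31",
    "date_filter": "date BETWEEN '{{ start_date }}' AND '{{ end_date }}'",
})
print(variables["date_filter"])
# date BETWEEN '2024-01-01' AND '2024-12-31'
```

Moving date_filter above start_date in this sketch would fail, which is the practical consequence of definition-order expansion.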

SQLite Database

Query SQLite database files directly:

```embedz
---
data: analytics.db
query: SELECT category, COUNT(*) as count FROM events WHERE date >= '2024-01-01' GROUP BY category
---
| Category | Count |
|----------|-------|
{% for row in data -%}
| {{ row.category }} | {{ row.count }} |
{% endfor -%}
```

Use the table parameter to read all rows from a specific table without a custom query.

Excel Files

Read .xlsx / .xls files directly. Requires openpyxl (pip install "pandoc-embedz[excel]"; the quotes keep shells such as zsh from interpreting the brackets). Leading blank rows and all-blank columns are automatically skipped.

```embedz
---
data: report.xlsx
table: Sheet2
---
{% for row in data %}
- {{ row.item }}
{% endfor %}
```

Use startrow to skip leading description rows. Accepts an integer (1-indexed), a string to find automatically, or a list (AND logic):

```{.embedz data=report.xlsx startrow="name"}
{% for row in data %}
- {{ row.name }}: {{ row.value }}
{% endfor %}
```

Use transpose: true when headers run down the first column. Use header: false when there is no header row.

See man pandoc-embedz for the full startrow syntax and Excel-specific details.

Multi-Table Data

Load multiple data files and access them directly or combine with SQL:

Direct access (no SQL):

```embedz
---
data:
  config: config.yaml
  sales: sales.csv
---
# {{ data.config.title }}
{% for row in data.sales %}
- {{ row.date }}: {{ row.amount }}
{% endfor %}
```

SQL JOIN (with query):

```embedz
---
data:
  products: products.csv
  sales: sales.csv
query: |
  SELECT p.product_name, SUM(s.quantity) as total
  FROM sales s
  JOIN products p ON s.product_id = p.product_id
  GROUP BY p.product_name
---
{% for row in data %}
- {{ row.product_name }}: {{ row.total }}
{% endfor %}
```
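
In the JOIN example, the keys of the data: dict (products, sales) appear as table names in the query. The sketch below reproduces that query with Python's sqlite3 module on invented rows; the table contents and the added ORDER BY (for a stable result) are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows standing in for products.csv and sales.csv.
conn.executescript("""
    CREATE TABLE products (product_id INTEGER, product_name TEXT);
    CREATE TABLE sales (product_id INTEGER, quantity INTEGER);
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO sales VALUES (1, 2), (2, 1), (1, 3);
""")

# Same query as above, plus ORDER BY so the output order is deterministic.
query = """
    SELECT p.product_name, SUM(s.quantity) AS total
    FROM sales s
    JOIN products p ON s.product_id = p.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
"""
totals = dict(conn.execute(query).fetchall())
print(totals)
```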

Dict entries with file: and per-source parameters (e.g., Excel sheets):

```embedz
---
data:
  incidents:
    file: data/report.xlsx
    table: Incidents
  phishing:
    file: data/report.xlsx
    table: Phishing
    startrow: year
query: |
  SELECT i.month, i.count, p.domestic
  FROM incidents i
  JOIN phishing p ON i.month = p.month
---
{% for row in data %}
- {{ row.month }}: {{ row.count }} (domestic: {{ row.domestic }})
{% endfor %}
```

Variable references, file paths, and inline data can be mixed freely within a data: dict.

See MULTI_TABLE.md for comprehensive examples and documentation.

Template Macros

Create reusable template functions with Jinja2 macros:

```{.embedz define=formatters}
{% macro format_item(title, date) -%}
**{{ title }}** ({{ date }})
{%- endmacro %}
```

```embedz
---
data: vulnerabilities.csv
---
{% from 'formatters' import format_item %}

{% for item in data %}
- {{ format_item(item.title, item.date) }}
{% endfor %}
```

Preamble & Macro Sharing

Use the preamble section to define reusable control structures across all blocks. Named templates can also share macros via {% from ... import %}:

```{.embedz define=sql-macros}
{%- macro BETWEEN(start, end) -%}
SELECT * FROM data WHERE date BETWEEN '{{ start }}' AND '{{ end }}'
{%- endmacro -%}
```

```embedz
---
global:
  fiscal_year: 2024
  start_date: "{{ fiscal_year }}-04-01"
  end_date: "{{ fiscal_year + 1 }}-03-31"
  _import: "{% from 'sql-macros' import BETWEEN %}"
  yearly_query: "{{ BETWEEN(start_date, end_date) }}"
---
```

Comments in CSV/TSV/SSV

Lines starting with # are treated as comments and skipped by default. The comment parameter controls behavior: line (default), head, inline, or none.

```{.embedz data=data.csv comment=head}
{% for row in data %}
- {{ row.name }}: {{ row.value }}
{% endfor %}
```
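
The default behavior described above (lines starting with # are skipped) can be approximated in a few lines of Python. This sketch models only that default line mode; the head, inline, and none modes are not covered, and the sample data is invented.

```python
import csv
import io

# Invented sample input with two comment lines.
RAW = """# exported for testing (sample comment line)
name,value
Alice,100
# Bob is on leave
Carol,300
"""


def strip_comment_lines(text):
    """Drop lines whose first character is '#'. Models only the default
    `line` mode as described above."""
    return "\n".join(
        line for line in text.splitlines() if not line.startswith("#")
    )


rows = list(csv.DictReader(io.StringIO(strip_comment_lines(RAW))))
print([r["name"] for r in rows])  # ['Alice', 'Carol']
```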

Standalone Rendering

Render Markdown or LaTeX files without running full Pandoc:

pandoc-embedz --standalone templates/report.tex -c config/base.yaml -o build/report.tex

Command-line options:

--standalone (-s) enables standalone mode
--template TEXT (-t) specifies template text directly
--format FORMAT (-f) specifies data format for stdin
--config FILE (-c) loads external YAML config file(s) (repeatable)
--output FILE (-o) writes output to file (default: stdout)
--debug (-d) enables debug output to stderr

Quick examples:

# Format CSV data from stdin
cat data.csv | pandoc-embedz -s -t '{% for row in data %}{{ row.name }}\n{% endfor %}' -f csv

# Use template file (data auto-read from stdin)
cat data.csv | pandoc-embedz -s template.md

# Static template without data
pandoc-embedz -s -t 'Static content'

External Config Files

Both filter and standalone modes can load shared configuration:

```embedz
---
config:
  - config/base.yaml
  - config/overrides.yaml
---
```

pandoc-embedz -s report.md -c config/base.yaml -c config/latex.yaml

Config files support multiple YAML documents separated by --- for logical grouping.

See man pandoc-embedz for details on stdin behavior, multi-document YAML, and config merging.

Best Practices

CSV Output Escaping

When generating CSV from templates, use a macro for proper escaping:

{%- macro csv_escape(value) -%}
  {%- set v = value | string -%}
  {%- if ',' in v or '"' in v or '\n' in v -%}
    "{{ v | replace('"', '""') }}"
  {%- else -%}
    {{ v }}
  {%- endif -%}
{%- endmacro -%}
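
As a cross-check on the macro's quoting rule, the sketch below mirrors it in Python and compares the result with the standard library's csv writer for a few sample values. The samples are invented, and the comparison covers only values of this kind (for example, it does not exercise carriage returns or empty fields).

```python
import csv
import io


def csv_escape(value):
    """Python mirror of the macro: quote only when the field contains a
    comma, a double quote, or a newline, doubling embedded quotes."""
    v = str(value)
    if "," in v or '"' in v or "\n" in v:
        return '"' + v.replace('"', '""') + '"'
    return v


def csv_module_field(value):
    """Reference result from the standard library's csv writer."""
    buf = io.StringIO()
    csv.writer(buf).writerow([value])
    # Remove only the row terminator the writer appends.
    return buf.getvalue().rstrip("\r\n")


for sample in ["plain", "a,b", 'say "hi"', "two\nlines"]:
    matches = csv_escape(sample) == csv_module_field(sample)
    print(repr(csv_escape(sample)), matches)
```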

File Extension Recommendations

.emz - Recommended for standalone templates (non-Markdown output)
.embedz - Descriptive alternative
.md - Only for templates that generate Markdown

Pipeline Processing

Combine pandoc-embedz with other tools for data transformation:

extract_tool database table --columns 1-10 | \
  pandoc-embedz -s transform.emz | \
  post_process_tool > output.csv

Use -s (standalone mode) for pipeline processing. Each .emz file handles one transformation step.

Debugging

Enable debug output with the PANDOC_EMBEDZ_DEBUG environment variable (accepts 1, true, or yes) or the -d flag in standalone mode:

PANDOC_EMBEDZ_DEBUG=1 pandoc input.md --filter pandoc-embedz -o output.pdf
pandoc-embedz -s -d template.md
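
If a wrapper script needs to honor the same switch, a check along these lines would accept the documented values. This is a hypothetical helper, not the package's code, and treating the value case-insensitively is an assumption.

```python
import os


def debug_enabled(environ=os.environ):
    """Return True when PANDOC_EMBEDZ_DEBUG holds 1, true, or yes.
    Case-insensitive matching is an assumption of this sketch."""
    value = environ.get("PANDOC_EMBEDZ_DEBUG", "")
    return value.strip().lower() in {"1", "true", "yes"}


print(debug_enabled({"PANDOC_EMBEDZ_DEBUG": "1"}))  # True
print(debug_enabled({}))                            # False
```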

Related Tools

Similar Pandoc Filters (on PyPI)

pantable - CSV/TSV to table with powerful options, table-focused
pandoc-jinja - Document-wide metadata expansion, not for code blocks
pandoc-include - Include external files with template support
pandoc-pyrun - Execute Python code in code blocks

Additional Tools

pandoc-csv2table (Haskell) - CSV to table conversion only
Quarto - Comprehensive publishing system based on Pandoc. Excellent for data science and technical documents, but requires a dedicated environment and workflow
R Markdown - Similar to Quarto, requires an R environment
Lua Filters - Require custom Lua scripting for each use case

Why pandoc-embedz?

pandoc-embedz fills a unique niche:

Full Jinja2 templating (loops, conditionals, filters)
Multiple data formats (CSV, JSON, YAML, TOML, SQLite, Excel, etc.)
Code block level processing (not document-wide)
Lightweight - no heavy dependencies
Works with existing Pandoc workflow

See COMPARISON.md for detailed comparison.

Documentation

REFERENCE.md: comprehensive reference manual (options, syntax, data formats, variable scoping, custom filters); also available via man pandoc-embedz
MULTI_TABLE.md: multi-table SQL query examples
COMPARISON.md: comparison with alternative tools

License

MIT License

See LICENSE file for details.

Author

Kazumasa Utashiro

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Development Setup

Using uv (Recommended)

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/tecolicom/pandoc-embedz.git
cd pandoc-embedz

# Install dependencies and set up the development environment
uv sync --all-extras

# Run tests
uv run pytest tests/

Using pip

# Clone the repository
git clone https://github.com/tecolicom/pandoc-embedz.git
cd pandoc-embedz

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

For detailed development guidelines, see AGENTS.md.
