static site generator built on pandoc + jinja2

pdj_sitegen

Pandoc and Jinja Site Generator

Installation:

pip install pdj-sitegen

You need Pandoc installed. If you do not already have it, you can run

python -m pdj_sitegen.install_pandoc

which installs Pandoc using pypandoc.

Usage

Quick Start

Scaffold a new site with all default files:

python -m pdj_sitegen.setup_site [directory]

This creates:

  • config.yml - default configuration
  • templates/default.html.jinja2 - default HTML template
  • content/index.md - sample index page
  • content/resources/style.css - basic stylesheet
  • content/resources/syntax.css - code syntax highlighting

Manual Setup

  1. Create a config file. For an example, see pdj_sitegen.config.DEFAULT_CONFIG_YAML, or print a copy of it via
python -m pdj_sitegen.config
  2. Adjust the config file to your needs. Most importantly:
# directory with markdown content files and resources, relative to cwd
content_dir: content
# templates directory, relative to cwd
templates_dir: templates
# default template file, relative to `templates_dir`
default_template: default.html.jinja2
# output directory, relative to cwd
output_dir: docs
  3. Populate the content directory with markdown files and resources (images, css, etc.), and adjust templates in the templates directory. See the demo site for usage examples.

  4. Run the generator:

python -m pdj_sitegen your_config.yaml

CLI Arguments

python -m pdj_sitegen your_config.yaml [-q] [-s]
  • -q, --quiet: Disable verbose output (suppress progress messages)
  • -s, --smart-rebuild: Only rebuild files modified since last build

Smart Rebuild

The smart rebuild feature (-s flag) enables incremental builds by tracking file modification times:

  1. A .build_time file in your project root stores the timestamp of the last successful build
  2. Source files are compared against this timestamp; only newer files are rebuilt
  3. Unchanged files are skipped, which shortens iteration on large sites during development
# Full rebuild (always safe)
python -m pdj_sitegen config.yml

# Smart rebuild (faster, for content-only changes)
python -m pdj_sitegen config.yml -s

When to use full rebuild: After modifying templates or config, since these changes affect all pages. The .build_time file is automatically created and updated.
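
To make the timestamp comparison concrete, here is a minimal sketch of the idea. It is not the package's implementation: the function name `files_to_rebuild` is hypothetical, and it assumes the .build_time file holds a single float (seconds since the epoch).

```python
from pathlib import Path


def files_to_rebuild(content_dir: Path, build_time_file: Path) -> list[Path]:
    """Return markdown files modified after the stored build timestamp.

    Assumption: the timestamp file contains one float, in seconds since the epoch.
    """
    if not build_time_file.exists():
        # No previous build recorded, so every file needs building
        return sorted(content_dir.rglob("*.md"))
    last_build = float(build_time_file.read_text().strip())
    return sorted(
        path
        for path in content_dir.rglob("*.md")
        if path.stat().st_mtime > last_build
    )
```

Because only content files are compared against the timestamp, a change to a template or to the config would not be detected by a check like this, which is why a full rebuild is the safe choice after such changes.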

Configuration

Config File Formats

pdj-sitegen supports multiple configuration file formats:

  • YAML (.yml, .yaml) - recommended, human-friendly
  • TOML (.toml) - also supported
  • JSON (.json) - for programmatic generation
python -m pdj_sitegen config.yml   # YAML
python -m pdj_sitegen config.toml  # TOML
python -m pdj_sitegen config.json  # JSON

Complete Configuration Examples

YAML Configuration
content_dir: content
templates_dir: templates
default_template: default.html.jinja2
output_dir: docs

copy_include: []
copy_exclude:
  - "*.md"

prettify: false
pandoc_fmt_from: markdown+smart
pandoc_fmt_to: html

__pandoc__:
  mathjax: true
  toc: true

jinja_env_kwargs: {}

globals_:
  site_name: "My Site"
  author: "Your Name"

TOML Configuration
content_dir = "content"
templates_dir = "templates"
default_template = "default.html.jinja2"
output_dir = "docs"

copy_include = []
copy_exclude = ["*.md"]

prettify = false
pandoc_fmt_from = "markdown+smart"
pandoc_fmt_to = "html"

[__pandoc__]
mathjax = true
toc = true

[jinja_env_kwargs]

[globals_]
site_name = "My Site"
author = "Your Name"

JSON Configuration
{
  "content_dir": "content",
  "templates_dir": "templates",
  "default_template": "default.html.jinja2",
  "output_dir": "docs",
  "copy_include": [],
  "copy_exclude": ["*.md"],
  "prettify": false,
  "pandoc_fmt_from": "markdown+smart",
  "pandoc_fmt_to": "html",
  "__pandoc__": {
    "mathjax": true,
    "toc": true
  },
  "jinja_env_kwargs": {},
  "globals_": {
    "site_name": "My Site",
    "author": "Your Name"
  }
}

Content Mirroring

Files from content_dir are automatically copied to output_dir, excluding markdown files (which are processed into HTML). Control this with copy_include and copy_exclude:

# Default: copy everything except .md files
copy_include: []
copy_exclude:
  - "*.md"

# Also exclude temp files and .git
copy_exclude:
  - "*.md"
  - "*.tmp"
  - ".git*"

# Copy only specific file types
copy_include:
  - "*.css"
  - "*.js"
  - "*.png"
  - "*.jpg"
copy_exclude: []

# Force copy .md files too (include wins over exclude)
copy_include:
  - "*.md"
copy_exclude:
  - "*.md"
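
The precedence rule ("include wins over exclude") can be sketched with glob matching from Python's standard library. This is an illustration under assumptions, not the package's code: the function name `should_copy` is hypothetical, patterns are assumed to be matched against the file name with `fnmatch`, and files that match neither list are copied. Whether a non-empty copy_include additionally acts as an allow-list is not modeled here.

```python
from fnmatch import fnmatch


def should_copy(filename: str, copy_include: list[str], copy_exclude: list[str]) -> bool:
    """Decide whether a file is mirrored; include patterns win over exclude patterns."""
    if any(fnmatch(filename, pattern) for pattern in copy_include):
        # An explicit include overrides any matching exclude pattern
        return True
    # Otherwise the file is copied unless an exclude pattern matches
    return not any(fnmatch(filename, pattern) for pattern in copy_exclude)


print(should_copy("post.md", [], ["*.md"]))
print(should_copy("post.md", ["*.md"], ["*.md"]))
```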

Additional Options

# Global template variables accessible in all templates
globals_:
  site_name: "My Site"
  author: "Your Name"

# Directory to save intermediate processing files (for debugging)
intermediates_dir: null  # or "_intermediates"

# Prettify HTML output (uses BeautifulSoup)
prettify: false

# Pandoc format settings
pandoc_fmt_from: "markdown+smart"
pandoc_fmt_to: "html"

# Global Pandoc options (can be overridden per-file in frontmatter)
__pandoc__:
  mathjax: true

# Jinja2 environment customization
jinja_env_kwargs: {}

Debugging with Intermediates

Setting intermediates_dir saves intermediate processing stages for debugging template and Pandoc issues:

intermediates_dir: _intermediates

This creates the following structure:

_intermediates/
  frontmatter_txt/    # Raw frontmatter as parsed
  frontmatter_json/   # Frontmatter as JSON (for inspection)
  md/                 # Rendered Markdown (after Jinja2, before Pandoc)
  html/               # Pandoc output (before template wrapping)

Useful for debugging Jinja2 template rendering in content, inspecting what Pandoc receives vs. outputs, and understanding frontmatter parsing issues.

HTML Prettification

When prettify: true is set, the final HTML output is reformatted using BeautifulSoup for readable, indented HTML:

prettify: true

Considerations: Increases build time and output file size slightly. Useful for debugging or when HTML readability matters. For production, false (default) produces more compact output.

Jinja2 Environment Customization

The jinja_env_kwargs option allows you to customize the Jinja2 environment:

jinja_env_kwargs:
  # Trim whitespace around blocks
  trim_blocks: true
  lstrip_blocks: true

  # Change template delimiters (useful if content conflicts with {{ }})
  variable_start_string: "[["
  variable_end_string: "]]"

For the full list of options, see the Jinja2 Environment documentation.

Error Reporting

pdj-sitegen reports build errors in two places: a short message in the terminal and a detailed dump on disk.

Terminal Output

When a build error occurs, you'll see a terse, actionable error message showing:

  • The file path and line number where the error occurred
  • The problematic source line (when available)
  • The root cause of the error

Example output:

on content/blog/post.md:6:
  {{ undefined_variable }}
UndefinedError: 'undefined_variable' is undefined

1/15 files failed to convert
  Full details: .pdj-sitegen/2024-01-27_14-30-45/

Detailed Error Dumps

For debugging complex errors, full context is saved to .pdj-sitegen/<timestamp>/:

  • traceback_<file>.txt - Full Python stack trace
  • context_<file>.json - Template context (all variables available)
  • template_<file>.txt - The template content that failed

This directory is created automatically when build errors occur. Add .pdj-sitegen/ to your .gitignore:

# pdj-sitegen error dumps
.pdj-sitegen/

Content Organization

pdj-sitegen supports both flat and nested content structures:

Flat structure (using dot notation):

content/
  index.md
  blog.md
  blog.post-1.md
  blog.post-2.md

Outputs: index.html, blog.html, blog.post-1.html, blog.post-2.html

Nested structure (using directories):

content/
  index.md
  blog/
    index.md
    post-1.md
    post-2.md

Outputs: index.html, blog/index.html, blog/post-1.html, blog/post-2.html

Both approaches work with child_docs_dotlist (path prefix matching) and child_docs_folder (same directory) in templates for hierarchical navigation.
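
The two matching rules can be sketched as follows. The helper names `dotlist_children` and `folder_children` are hypothetical, and the exact behavior (for example, whether the current page is excluded from its own folder listing) is an assumption made for illustration rather than the package's documented behavior.

```python
from pathlib import PurePosixPath


def dotlist_children(current: str, all_paths: list[str]) -> list[str]:
    """Paths that extend `current` with a dot-separated suffix (flat layout)."""
    prefix = current + "."
    return [path for path in all_paths if path.startswith(prefix)]


def folder_children(current: str, all_paths: list[str]) -> list[str]:
    """Other paths in the same directory as `current` (nested layout)."""
    folder = PurePosixPath(current).parent
    return [
        path
        for path in all_paths
        if path != current and PurePosixPath(path).parent == folder
    ]


flat = ["index", "blog", "blog.post-1", "blog.post-2"]
nested = ["index", "blog/index", "blog/post-1", "blog/post-2"]

print(dotlist_children("blog", flat))
print(folder_children("blog/index", nested))
```

In the flat layout the page blog finds its posts by prefix; in the nested layout blog/index finds them by shared directory.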

Pandoc Filters

pdj-sitegen includes two built-in pandoc filters:

links_md2html

Converts links ending in .md to .html during conversion. Enable in frontmatter or global config:

__pandoc__:
  filter: links_md2html
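
The rewrite this filter performs amounts to a suffix swap on each link target. The sketch below shows only that string transformation; the function name `md_link_to_html` is hypothetical, and the real filter operates on Pandoc's document tree rather than on plain strings.

```python
def md_link_to_html(url: str) -> str:
    """Rewrite a link target ending in .md so that it ends in .html."""
    if url.endswith(".md"):
        return url[: -len(".md")] + ".html"
    # Anything else (stylesheets, images, external pages) is left untouched
    return url


print(md_link_to_html("blog/post-1.md"))
print(md_link_to_html("resources/style.css"))
```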

csv_code_table

Converts fenced code blocks with class csv_table to HTML tables.

In your markdown, use a fenced code block with the csv_table class and options:

'''{.csv_table header=1 aligns=LCR caption="My Table"}
Name,Count,Status
Alice,42,Active
Bob,17,Pending
'''

NOTE: in the above, use backticks (`) instead of single quotes (') for the fenced code block; single quotes are used here to avoid rendering issues.

Options:

  • header: Number of header rows (default: 1)
  • source: Path to external CSV file
  • aligns: Column alignments (L=left, C=center, R=right, D=default)
  • caption: Table caption
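
As a rough picture of what the conversion produces, the following sketch turns CSV text into an HTML table with Python's standard library. It is not the filter's implementation: the function name `csv_to_html_table` is hypothetical, only the header and caption options are modeled, and aligns and source are left out.

```python
import csv
import html
import io


def csv_to_html_table(csv_text: str, header: int = 1, caption: str = "") -> str:
    """Render CSV text as an HTML table; the first `header` rows become <th> cells."""
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    lines = ["<table>"]
    if caption:
        lines.append(f"<caption>{html.escape(caption)}</caption>")
    for index, row in enumerate(rows):
        tag = "th" if index < header else "td"
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


table = csv_to_html_table(
    "Name,Count,Status\nAlice,42,Active\nBob,17,Pending\n",
    header=1,
    caption="My Table",
)
print(table)
```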

Template Variables

The following variables are available in templates:

| Variable | Description | Example |
|---|---|---|
| frontmatter | Full frontmatter dict from the current document | {"title": "My Page"} |
| file_meta.path | Relative path without extension | blog/post-1 |
| file_meta.path_html | HTML output path | blog/post-1.html |
| file_meta.path_raw | Original file path | content/blog/post-1.md |
| file_meta.path_to_root | Relative path prefix to site root (no trailing slash) | . or .. or ../.. |
| file_meta.modified_time | Unix timestamp of last modification | 1706380800.0 |
| file_meta.modified_time_str | Human-readable modification time | 2024-01-27 12:00:00 |
| config | Serialized site configuration | {"output_dir": "docs"} |
| docs | Dictionary of all documents in the site | {"index": {...}} |
| child_docs_dotlist | Documents matching by path prefix | {"blog.post-1": {...}} |
| child_docs_folder | Documents in the same directory | {"about": {...}} |
| dir_files | List of all filenames in the directory | ["index.md", "about.md"] |
| dir_subdirs | List of subdirectory names | ["images", "posts"] |
| dir_contents_recursive | List of all files recursively (relative paths) | ["images/logo.png"] |
| content | Rendered HTML content (in final template only) | <p>Hello</p> |

All frontmatter fields are also available directly (e.g., {{ title }}).
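
The file_meta.path_to_root values listed above (. or .. or ../..) follow from how deep a page sits below the site root. A minimal sketch of that relationship, using a hypothetical helper named `path_to_root` and assuming forward-slash paths like file_meta.path:

```python
from pathlib import PurePosixPath


def path_to_root(path: str) -> str:
    """Relative prefix from a page back to the site root, without a trailing slash."""
    # A page at the root has one path component, so its depth is zero
    depth = len(PurePosixPath(path).parts) - 1
    return "." if depth == 0 else "/".join([".."] * depth)


print(path_to_root("index"))
print(path_to_root("blog/post-1"))
```

A prefix like this lets a template link to shared resources with a relative URL regardless of how deeply the page is nested.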

Frontmatter Formats

Frontmatter can be written in YAML, JSON, or TOML:

YAML (recommended):

---
title: My Page
tags: [foo, bar]
---

JSON:

;;;
{"title": "My Page", "tags": ["foo", "bar"]}
;;;

TOML:

+++
title = "My Page"
tags = ["foo", "bar"]
+++
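
Each format is identified by its delimiter line. The sketch below shows one way to detect the format and separate the frontmatter from the body. The function name `split_frontmatter` is hypothetical, and this is an illustration of the delimiter convention, not the package's parser; parsing the extracted text is left to a YAML, JSON, or TOML library.

```python
from typing import Optional

# Delimiter line mapped to the frontmatter format it introduces
DELIMITERS = {"---": "yaml", ";;;": "json", "+++": "toml"}


def split_frontmatter(text: str) -> tuple[Optional[str], str, str]:
    """Return (format, frontmatter_text, body) for a document.

    Documents without a recognized, closed frontmatter block yield (None, "", text).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() not in DELIMITERS:
        return None, "", text
    delimiter = lines[0].strip()
    for index in range(1, len(lines)):
        if lines[index].strip() == delimiter:
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return DELIMITERS[delimiter], frontmatter, body
    # Opening delimiter without a closing one: treat everything as body
    return None, "", text


print(split_frontmatter('+++\ntitle = "My Page"\n+++\nHello'))
```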

Per-file Overrides

Override global settings in frontmatter:

---
title: My Page
__template__: custom.html.jinja2  # Use different template
__pandoc__:
  toc: true                        # Override pandoc options
  number-sections: true
---
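
One way to picture the override is as a dictionary merge in which the page's values take precedence. The sketch below assumes a shallow merge and uses a hypothetical function name, `merge_pandoc_options`; the package's actual merge rules may differ.

```python
def merge_pandoc_options(global_options: dict, frontmatter: dict) -> dict:
    """Combine global Pandoc options with a page's `__pandoc__` frontmatter block."""
    merged = dict(global_options)
    # Assumed precedence: keys set in frontmatter replace the global values
    merged.update(frontmatter.get("__pandoc__", {}))
    return merged


site_options = {"mathjax": True, "toc": False}
page = {"title": "My Page", "__pandoc__": {"toc": True, "number-sections": True}}

print(merge_pandoc_options(site_options, page))
```

Options the page does not mention, such as mathjax here, keep their global values.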

Similar Tools/Resources

This project is a descendant of my old project pandoc-sitegen, which was very similar but used mustache templates instead of jinja2.

Some other similar projects:

If you end up using this script for your site and would like me to list it here, email me or submit a PR :)
