Skip to main content

Science/engineering dynamic doc generation

Project description

sciengdox

A python package for creating scientific and engineering documents via pandoc including inline-executable Python code.

Key Features

  1. Pandoc filter for converting pandoc markdown to other formats (especially HTML and PDF).
  2. Codeblock execution
  3. Helper functions for generating tables, SVG matplotlib plots, etc.

Motivation

This is inspired by pweave, codebraid, knitr, and cousins, but I always seemed to have to do some pre/post-processing to get things the way I want them. I already use other pandoc filters (e.g. pandoc-citeproc, pandoc-crossref), so why not simply have another pandoc filter that will execute inline code and insert the results?

Another key is getting quality diagrams from scientific python code. For example, pweave automatically inserts generated images, but there doesn't seem to be a way to get SVG images without, again, pre- and post-processing in another script. SVG plots are, obviously, scalable and work much better for web and PDF outputs.

Development

Use poetry for local environment management. After cloning the repository:

$ cd <project-repo>
$ poetry install
$ poetry shell

To package and release:

$ poetry build
$ poetry publish

Be sure to configure your credentials prior to publishing.

See also this page.

Use and Example

An example Pandoc markdown file can be found in example. To process this file, you need to have pandoc installed and in your path. You also need to install the Pandoc filters pandoc-crossref and pandoc-citeproc which provide nice cross-referencing and reference/bibliography handling.

Installation

When working with macOS or Linux or Linux on Windows via WSL pandoc and the filters can be installed via Homebrew. (On Linux/WSL, install linuxbrew.) Then simply run:

$ brew install pandoc
$ brew install pandoc-crossref
$ brew install pandoc-citeproc
$ brew install librsvg

Then, of course, you need to install this filter and some other helpers for the example. The example helpers can be installed into your Python virtual environment by running:

$ poetry install -E examples

Windows-specific Install

To set up an environment for Windows from scratch including terminals, editors, Python, etc., see this gist. Additional installation steps to use this library include installing pandoc and additional filters and utilities.

Install pandoc by downloading the installer and following the standard instructions. This should also get you pandoc-citeproc.exe for managing citations.

Install pandoc-crossref (for managing intra-document cross-references) by downloading the zipped Windows release. Unzip it, and move pandoc-crossref.exe to a location that is on your system path. For example, you can move to next to pandoc-citeproc.exe in C:\Program Files\Pandoc.

Finally, to handle embedding SVG images in PDF documents, this library relies on rsvg-convert. This can be installed via Chocolatey. Install the Chocolatey package manager if you do not already have it, and then run:

$ choco install rsvg-convert

Instead of (or in addition to) Chocolately, you can also install the Scoop installer. Scoop does not currently have a formula for rsvg-convert, but it can also be installed from SourceForge if you do not want to use Chocolatey.

UTF-8 Note

The underlying Pandoc filter for executing Python code embedded in your documents relies on inter-process communication with a Python REPL behind the scenes. The default inter-process character encoding for Python on Windows is CP-1252, and this can cause problems if your Python scripts generate output with special characters (and if you are doing any scientific or engineering writing, they definitely will).

Fortunately, this is easily worked-around by setting a Windows environment variable PYTHONIOENCODING to utf-8. After setting this, be sure to restart any open terminal windows for the change to take effect.

Matplotlib Note

If you use matplotlib for generating plots in inline Python code in your document, you should explicity set the Agg backend early in your document (see the example/example.md in this repo). Without this, document conversion can hang when the svg_figure helper function is called.

Somewhere near the top of your Markdown document, add an executable Python code block (without .echo so it won't appear in the output) that includes:

import matplotlib
matplotlib.use('Agg')

Panflute Version Note

This plugin relies on the panflute Python package as a bridge between Python and pandoc's Haskell. The panflute README lists API compatibility requirements between versions of panflute and versions of pandoc. Double-check this if you run into errors that mention panflute when compiling a document.

If you are running an older version of pandoc (e.g. 2.9.2) and start a new project, you will need to explicitly install the compatible panflute version in your environment with e.g. poetry add panflute@1.12.5. Or alternatively install a pandoc version 2.11.x or later.

PDF Generation

To generate PDF files through Pandoc, you need to have xelatex installed. On Linux/WSL:

$ sudo apt-get install texlive-xetex

On macOS:

$ brew install --cask mactex

On Windows (without WSL):

Download the MikTeX installer and install as usual. Then ensure that the binary folder is in your path (e.g. C:\Users\<username>\AppData\Local\Programs\MiKTeX 2.9\miktex\bin\x64\). Note that the first time you generate a document, MikTex will prompt you to install a lot of packages, so watch for a MikTeX window popping up (possibly behind other windows) and follow the prompts.

Fonts

The example templates rely on having a few fonts installed. The fonts to get are the Google Source Sans Pro, Source Code Pro, and Source Serif Pro families.

On macOS or Windows (without WSL), these can simply be downloaded and installed as you would any other font. On Linux via WSL, you can install these normally on the Windows side and then synchronize the Windows font folder to the Linux side. To do this, edit (using sudo) /etc/fonts/local.conf and add:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
    <dir>/mnt/c/Windows/Fonts</dir>
</fontconfig>

Then update the font cache on the Linux side:

$ sudo fc-cache -fv

Stylesheets

The example file uses an HTML template that includes a CSS stylesheet that is generated from SCSS. To compile this automatically, you need to have SASS installed.

On macOS, this can be installed via Homebrew:

$ brew install sass/sass/sass

On macOS/Linux/WSL/Windows it can be installed as a Node.js package (assuming you already have Node.js/npm installed):

$ npm install -g sass

Building

This Python library provides a script, compiledoc, that will appear in your poetry or pipenv virtual environment's path (or globally) once the library is installed. In general, you provide an output directory and an input markdown file, and it will build an HTML output when the --html flag is used (and also by default).

$ compiledoc -o output --html mydoc.md

To build a PDF (via xelatex):

$ compiledoc -o output --pdf mydoc.md

To build a Markdown file with executable Python output included (e.g. for debugging purposes), specify --md. This will generate a file in the output directory with (perhaps confusingly) the same name as the input:

$ compiledoc -o output --md mydoc.md

To build everything, specify --all:

$ compiledoc -o output --all mydoc.md

To see all available command line options (for specifying templates, paths to required external executables, static files like images and bibliography files, etc.):

$ compiledoc --help

Building the Example

Once everything is setup, compile the example HTML file by running:

$ cd example
$ compiledoc -o output example.md

Open example/output/example.html in your browser or use e.g. the Live Server plugin for VS Code.

Auto Regen

To autoregenerate the document (e.g. the HTML version, the output of which is watched by the Live Server ), you can use Watchman.

To create a trigger on a particular directory (doc/ in this example) with a notebook.md file (change this to suit your purposes), copy the following into a temporary trigger.json file:

[
    "trigger",
    "doc/",
    {
        "name": "build_html",
        "expression": [
            "anyof",
            [
                "match",
                "notebook.md",
                "wholename"
            ]
        ],
        "command": [
            "poetry",
            "run",
            "compiledoc",
            "-o",
            "output",
            "--html",
            "notebook.md"
        ]
    }
]

Then from your project root directory run:

watchman -j < trigger.json
rm trigger.json

It is also recommended that you add a .watchmanconfig file to the watched directory (e.g. doc/; also add .watchmanconfig to your .gitignore) with the following contents:

{
  "settle": 3000
}

The settle parameter is in milliseconds.

To turn off watchman:

watchman shutdown-server

To turn it back on:

cd <project-root>
watchman watch doc/

To watch the Watchman:

tail -f /usr/local/var/run/watchman/<username>-state/log

(Note that on Windows/WSL, to get tail to work the way you expect, you need to add ---disable-inotify to the command; and yes, that's three - for some reason.)

Older pandoc Versions

For pandoc 2.9 and earlier, the citation manager pandoc-citeproc was a separate filter that gets added to the compliation pipeline. The path to this filter can be specified on the command line to compiledoc with the --pandoc-citeproc PATH flag.

In newer versions of pandoc (2.11 and beyond), the citeproc filter is built-in to pandoc and is run by adding --citeproc to the pandoc command-line. The compiledoc script adds this by default unless the flag --use-pandoc-citeproc is added, in which case the older filter will be used.

If you do not with to run citeproc at all, you can add the flag compiledoc --no-citeproc to skip citation processing altogether.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sciengdox-0.11.0.tar.gz (29.6 kB view hashes)

Uploaded Source

Built Distribution

sciengdox-0.11.0-py3-none-any.whl (32.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page