Augmentations to nbconvert to facilitate the incorporation of inline artifacts into web-able documents.
External artifacts inlining in web-able documents using jupyter nbconvert
Web-able documents, typically based on HTML 5, such as ye olde webpage or Jupyter notebooks, facilitate document associations through hyperlinking. However, there are situations where we would rather contain subdocuments inline than keep them as separate files. For instance, to send a report that includes figures, one must take care to include all of them as image files separate from the HTML document where the report is written, and then hope that the recipient unpacks all of that into a common directory. Another example is a document published in distinct formats: audience members reading the document off a screen may prefer an HTML document that typesets according to their preferences and constraints, whereas folks who prefer reading off paper will prefer a well-typeset PDF file to send to the printer (given how printing in most web browsers is a tacked-on, disused feature). In both cases, one would rather incorporate all figures or alternative renditions of the document into a single file.
A common URL specification enables such incorporation: the data that make up images or alternative documents are embedded directly into the URL. Modern web browsers support such data URLs fully, either displaying such embedded images as if they were externally linked, or enabling the user to "download" inline documents as if they were hosted remotely.
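To make the idea of a data URL concrete, here is a minimal sketch using only the Python standard library. The helper name `to_data_url` is made up for this illustration and is not part of this project; it simply base64-encodes some bytes and prefixes them with the MIME type.

```python
import base64


def to_data_url(content: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL of the given MIME type."""
    payload = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# A tiny text payload keeps the resulting URL short enough to read.
url = to_data_url(b"Hello, world!", "text/plain")
print(url)  # data:text/plain;base64,SGVsbG8sIHdvcmxkIQ==
```

An image works the same way: read the file's bytes, pass `image/png` as the MIME type, and use the result wherever an image URL is expected.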
However, document authoring tools do not make the addition of such inline artifacts easy.
This project aims to fill this gap when the document being authored is a Jupyter notebook, which can be processed into a form distributable on the web: HTML, Markdown, even merely a modified notebook.
The key is a jupyter nbconvert preprocessor.
Usage
The first step is to write up a Jupyter notebook. Even folks who don't require interactive computation can make good, productive use of Jupyter as an authoring environment based on Markdown notation.
Inlining external files
Inlining external artifacts refers to the incorporation of assets usually stored in external files, such as images and alternative texts, directly in the core document, in the form of data URLs. As suggested, these can be figures, or other documents one would want to bundle with the main text so they can be "downloaded offline," so to speak. Such artifacts must be described in the Markdown cells of the notebook, anywhere one would otherwise write a URL. The artifact description is as follows:
artifact:mime/type:path/to/file/to/inline
Even on Windows, components of the path must be separated by forward slashes (/).
Examples:
Notation | Purpose | Artifact description in context |
---|---|---|
Markdown | Figure | ![Alt text](artifact:image/png:images/figure.png) |
Markdown | Embedded document | [Text of the link](artifact:application/pdf:embed-this.pdf) |
HTML | Figure | <img src="artifact:image/png:images/figure.png" alt="Alt text"> |
HTML | Embedded document | <a href="artifact:application/pdf:embed-this.pdf" download="embed-this.pdf">Text of the link</a> |
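To illustrate what inlining amounts to, the sketch below rewrites artifact descriptions like those in the examples above into data URLs. It is not the preprocessor's actual implementation: the regular expression, the function `inline_artifacts` and its `root` parameter are assumptions made for this illustration, and only the file-based form `artifact:mime/type:path` is handled.

```python
import base64
import re
import tempfile
from pathlib import Path

# Matches artifact:<mime/type>:<path>, stopping at whitespace, ")" or a quote.
ARTIFACT_URL = re.compile(
    r"artifact:(?P<mime>[\w.+-]+/[\w.+-]+):(?P<path>[^\s)'\"]+)"
)


def inline_artifacts(text: str, root: Path) -> str:
    """Replace each artifact description in `text` with a base64 data URL."""

    def replace(match):
        # Paths always use forward slashes, so split on "/" and rejoin portably.
        content = root.joinpath(*match["path"].split("/")).read_bytes()
        payload = base64.b64encode(content).decode("ascii")
        return f"data:{match['mime']};base64,{payload}"

    return ARTIFACT_URL.sub(replace, text)


# Demonstration against a throwaway directory holding a stand-in "image".
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "images" / "figure.png").write_bytes(b"not really a PNG")
    result = inline_artifacts(
        "![Alt text](artifact:image/png:images/figure.png)", root
    )
print(result)
```

Because the pattern stops at a closing parenthesis or a quote, the same function handles both the Markdown and the HTML notations.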
Building the incorporated document from the command line
One can produce the document with embedded artifacts with a command such as this:
jupyter nbconvert --preprocessors nbconvert_inline_artifacts.ArtifactInlinePreprocessor #... <rest of the command>
For instance, to export an HTML file with inline artifacts:
jupyter nbconvert --preprocessors nbconvert_inline_artifacts.ArtifactInlinePreprocessor --to html document-as-notebook.ipynb
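When the build is part of a larger script, the same command can be assembled and run from Python. This is only a sketch of one way to wrap the command above: the helper names `nbconvert_command` and `build` are invented for this illustration, and `jupyter` is assumed to be on the PATH.

```python
import subprocess

PREPROCESSOR = "nbconvert_inline_artifacts.ArtifactInlinePreprocessor"


def nbconvert_command(notebook: str, to: str = "html") -> list:
    """Assemble the nbconvert invocation shown above as an argument list."""
    return [
        "jupyter", "nbconvert",
        "--preprocessors", PREPROCESSOR,
        "--to", to,
        notebook,
    ]


def build(notebook: str, to: str = "html") -> None:
    """Run nbconvert; raises CalledProcessError if the conversion fails."""
    subprocess.run(nbconvert_command(notebook, to), check=True)


command = nbconvert_command("document-as-notebook.ipynb")
print(" ".join(command))
```

Calling `build("document-as-notebook.ipynb")` then performs the conversion; passing the arguments as a list avoids shell quoting issues with file names that contain spaces.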
Scripting the incorporated document build in Python
The nbconvert tool can also be driven from a Python script.
This usage pattern enables the inlining of artifacts that exist in a process' memory, as opposed to external files. Such artifacts are named instead using a unique identifier, which is mapped in turn to an artifact expressed as a Python bytes string.
For instance, consider a source notebook where one Markdown cell tagged pdf-version contains the following text:
[Get the PDF version](artifact::pdf)
The following script first generates a PDF version of this notebook without that cell (this requires a LaTeX install and Pandoc), then an HTML version where that PDF version is inlined:
from nbconvert import PDFExporter, HTMLExporter
from traitlets.config import Config

# Export the notebook to PDF, leaving out the cell that links to the PDF itself.
c_pdf = Config()
c_pdf.Exporter.preprocessors = ["nbconvert.preprocessors.TagRemovePreprocessor"]
c_pdf.TagRemovePreprocessor.remove_cell_tags = ["pdf-version"]
pdf_exporter = PDFExporter(config=c_pdf)
# from_filename() returns a (body, resources) pair; the PDF body is bytes.
pdf, _ = pdf_exporter.from_filename("document.ipynb")

# Export to HTML, mapping the identifier "pdf" to the in-memory PDF.
c_html = Config()
c_html.Exporter.preprocessors = ["nbconvert_inline_artifacts.ArtifactInlinePreprocessor"]
c_html.ArtifactInlinePreprocessor.artifacts = {
    "pdf": {
        "mime_type": "application/pdf",
        "content": pdf
    }
}
html_exporter = HTMLExporter(config=c_html)
# The HTML body is a str, so the output file is opened in text mode.
html, _ = html_exporter.from_filename("document.ipynb")
with open("document.html", "w", encoding="utf-8") as file:
    file.write(html)
Look here for a larger example.
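In-memory artifacts need not come from another exporter. As a further sketch, the snippet below builds an entry of the same shape as the "pdf" one in the script above from CSV data generated on the fly. The identifier "data", the helper `csv_artifact` and the link text are hypothetical, and it is an assumption that such an entry would be referenced from a Markdown cell as [Get the data](artifact::data), by analogy with artifact::pdf.

```python
import csv
import io


def csv_artifact(rows):
    """Render rows as CSV in memory and wrap the bytes in an artifact entry."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return {
        "mime_type": "text/csv",
        "content": buffer.getvalue().encode("utf-8"),
    }


# Maps the hypothetical identifier "data" to the generated CSV bytes.
artifacts = {"data": csv_artifact([["x", "y"], ["1", "2"]])}
print(artifacts["data"]["content"])
```

The resulting dictionary has the structure assigned to ArtifactInlinePreprocessor.artifacts in the script above, so it could be merged with other entries before the exporter is created.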
Development
The development environment is put together easily using Conda:
conda env create
Checks on PEP8 conformance, typing coherence and unit tests:
conda run -n nbconvert-inline-artifacts --no-capture-output python script/checks.py