JuPDF

JuPDF is a small Python package to convert .ipynb files to single .pdf files via the use of Pandoc.

Supported Features

  • All standard markdown features supported in Pandoc conversion, such as lists, tables, etc.
  • PNG images within markdown (i.e. ![...](...png)).
  • LaTeX mathematics, including in-line LaTeX.
  • stdout code-cell output.
  • PNG code-cell output (e.g. the output of plot.show() from matplotlib).
  • YAML metadata for use with the eisvogel.tex template.

Support for other image types is planned for future versions. For now, please ensure that both markdown images and code-cell output images are .png files.

Requirements

  • In addition to the dependencies that pip installs for you, JuPDF requires Pandoc to be installed on your system, because the conversion process invokes the pandoc command.

  • You must also have a TeX engine installed on your system. For example, on Windows, I use MiKTeX.
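Before running a conversion, it can help to confirm that both external tools are reachable. The sketch below is not part of JuPDF; find_missing_tools is a hypothetical helper, and the engine names it looks for (pdflatex, xelatex, lualatex) are an assumption, since which TeX engine JuPDF asks Pandoc to use is not stated here.

```python
import shutil


def find_missing_tools(tex_engines=("pdflatex", "xelatex", "lualatex")):
    """Return a list describing required external tools that are not on PATH.

    Hypothetical helper, not part of JuPDF. The engine names are assumptions;
    adjust them to match the TeX distribution you have installed.
    """
    missing = []
    # Pandoc is invoked as the `pandoc` command during conversion.
    if shutil.which("pandoc") is None:
        missing.append("pandoc")
    # Any one TeX engine on PATH is treated as sufficient for this check.
    if not any(shutil.which(engine) for engine in tex_engines):
        missing.append("TeX engine (one of: " + ", ".join(tex_engines) + ")")
    return missing


if __name__ == "__main__":
    problems = find_missing_tools()
    if problems:
        print("Missing:", "; ".join(problems))
    else:
        print("Pandoc and a TeX engine were found on PATH.")
```

An empty list means both tools were found; it does not guarantee that the TeX installation has every LaTeX package the eisvogel.tex template needs.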


Basic Usage

Converting a single .ipynb file to .pdf

from jupdf.pypdfnb import PYPDFNB
from jupdf.pypdfnb_jobs import single_to_pdf

pypdfnb = PYPDFNB()
pypdfnb.read_ipynb('notebook.ipynb')
single_to_pdf(pypdfnb, 'notebook.pdf')

Converting multiple .ipynb files to .pdf

from jupdf.pypdfnb import PYPDFNB
from jupdf.pypdfnb_jobs import multiple_to_pdf

pypdfnb_a, pypdfnb_b = PYPDFNB(), PYPDFNB()
pypdfnb_a.read_ipynb('notebook_a.ipynb')
pypdfnb_b.read_ipynb('notebook_b.ipynb')
multiple_to_pdf([pypdfnb_a, pypdfnb_b], 'notebook.pdf')
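When there are many notebooks, building the list by hand gets tedious. The following is a minimal sketch that collects every .ipynb file in a directory and passes the instances to multiple_to_pdf. The helpers find_notebooks and merge_directory_to_pdf are hypothetical names, not part of JuPDF, and the sketch assumes that multiple_to_pdf places notebooks in the PDF in list order.

```python
from pathlib import Path


def find_notebooks(directory):
    """Return the .ipynb files directly inside `directory`, sorted by file name."""
    return sorted(Path(directory).glob("*.ipynb"), key=lambda p: p.name)


def merge_directory_to_pdf(directory, output_pdf):
    """Read every notebook in `directory` and convert them into one PDF.

    Hypothetical helper. Assumes multiple_to_pdf keeps the order of the list.
    """
    # Imported here so that find_notebooks can be used without JuPDF installed.
    from jupdf.pypdfnb import PYPDFNB
    from jupdf.pypdfnb_jobs import multiple_to_pdf

    instances = []
    for path in find_notebooks(directory):
        notebook = PYPDFNB()
        notebook.read_ipynb(str(path))
        instances.append(notebook)
    multiple_to_pdf(instances, output_pdf)
```

For example, merge_directory_to_pdf('notebooks', 'combined.pdf') would read each notebook in a notebooks directory in file-name order.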

Setting a PYPDFNB instance's contents property back to []

from jupdf.pypdfnb import PYPDFNB

pypdfnb_instance = PYPDFNB()
pypdfnb_instance.read_ipynb('notebook.ipynb')

# do something . . .

pypdfnb_instance.empty()

eisvogel.tex Metadata

Before converting a PYPDFNB instance to a .pdf file, you can set various YAML metadata attributes of the instance that will affect how Pandoc converts the notebook markdown to PDF using the eisvogel.tex template.

The code below shows the YAML metadata properties available for PYPDFNB instances, and their values on initialization.

def __init__(...):

    # other attributes ...

    self.title: Optional[str] = None
    self.author: Optional[str] = None
    self.date: Optional[str] = None
    self.subject: Optional[str] = None
    self.keywords: Optional[list[str]] = None
    self.lang: Optional[str] = None
    self.listings: bool = False
    self.titlepage: bool = False

  • title - setting this will place the string in the top-left of the PDF pages.
  • author - setting this will place the string in the bottom-left of the PDF pages.
  • date - setting this will place the string in the top-right of the PDF pages.
  • subject - non-visual - the subject of the PDF document.
  • keywords - non-visual - keywords associated with the PDF document.
  • lang - non-visual - the language code of the document (e.g. 'en').
  • listings - whether or not to use Pandoc listings during conversion.
  • titlepage - whether or not to insert a title page at the start of the PDF, which will include the title, author and date.
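As an illustration of setting these attributes, here is a small sketch. The set_metadata helper is hypothetical and not part of JuPDF; it only guards against misspelled attribute names, using the eight names listed above. A SimpleNamespace stands in for a PYPDFNB instance so the snippet runs without JuPDF installed.

```python
from types import SimpleNamespace

# The YAML metadata attribute names listed above.
METADATA_FIELDS = {
    "title", "author", "date", "subject",
    "keywords", "lang", "listings", "titlepage",
}


def set_metadata(instance, **fields):
    """Set metadata attributes on `instance`, rejecting unknown names.

    Hypothetical helper, not part of JuPDF.
    """
    unknown = set(fields) - METADATA_FIELDS
    if unknown:
        raise AttributeError("Unknown metadata field(s): " + ", ".join(sorted(unknown)))
    for name, value in fields.items():
        setattr(instance, name, value)
    return instance


# Stand-in object; with JuPDF installed, pass a PYPDFNB() instance instead.
notebook = SimpleNamespace()
set_metadata(
    notebook,
    title="Weekly Report",
    author="A. Author",
    date="2024-01-01",
    keywords=["jupyter", "pdf"],
    titlepage=True,
)
```

Plain assignment such as pypdfnb.title = 'Weekly Report' works just as well; the helper only adds a check that a typo does not silently create an unused attribute.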

Parsers

JuPDF provides several parsing callables that can be passed to PYPDFNB instances. These callables determine how a .ipynb file is read, and therefore how the PDF will look after conversion.

There are currently two distinct types of parsers: cell parsers and code parsers. Cell parsers determine how cells are placed within the document, whereas code parsers determine how code cells are handled within the document.

| Callable | Type | Description |
|---|---|---|
| cell_parser_regular | cell | Parses cells such that cells are placed in the next available space in a PDF. |
| cell_parser_one_cell_per_page | cell | Parses cells such that every cell ends with a page break. |
| cell_parser_one_md_cell_per_page | cell | Parses cells such that each markdown cell specifically starts with a page break. |
| code_parser_regular | code | Parses code cells such that both the code itself and the code's output are included within the PDF. |
| code_parser_source_only | code | Parses code cells such that only the code itself is included within the PDF. |
| code_parser_output_only | code | Parses code cells such that only the code's output is included within the PDF. |

An example of using these parsers is shown below.

# Convert a Jupyter Notebook to PDF, whereby each cell is placed on a separate page, and only the output of
# code cells is included in the PDF.

from jupdf.pypdfnb import PYPDFNB
from jupdf.pypdfnb_parsing import cell_parser_one_cell_per_page, code_parser_output_only
from jupdf.pypdfnb_jobs import single_to_pdf

pypdfnb = PYPDFNB(cell_parser_one_cell_per_page, code_parser_output_only)
pypdfnb.read_ipynb('notebook.ipynb')
single_to_pdf(pypdfnb, 'notebook.pdf')

Saving time with Saved Parses

Suppose you have a massive .ipynb file that you think you'll need to convert several times. If the file is big enough, then reading and parsing the file may take some time. As such, you likely do not wish to repeat this process again and again. This is where .pypdfnb files come into play.

Using the write_pypdfnb instance method, you can write the current contents of a PYPDFNB instance to a .pypdfnb file. Now, whenever you need to convert that massive file, you can use the read_pypdfnb instance method instead of read_ipynb, which will require no parsing.

from jupdf.pypdfnb import PYPDFNB
from jupdf.pypdfnb_jobs import single_to_pdf

pypdfnb = PYPDFNB()
pypdfnb.read_ipynb('massive_notebook.ipynb')

# writes massive_notebook.pypdfnb to a pypdfnbs directory - this method handles the .pypdfnb extension for you!
pypdfnb.write_pypdfnb('massive_notebook', dir='pypdfnbs')

pypdfnb.empty()


# Later on . . .
pypdfnb.read_pypdfnb('pypdfnbs/massive_notebook.pypdfnb')  # No time spent parsing!
single_to_pdf(pypdfnb, 'massive_notebook.pdf')
pypdfnb.empty()
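The read-or-parse decision can be wrapped in one function. The sketch below is not part of JuPDF: load_with_cache is a hypothetical helper that reads the saved parse when it exists and is at least as new as the notebook, and otherwise parses the notebook and saves the result. It assumes, as in the example above, that write_pypdfnb(name, dir=...) writes name.pypdfnb inside dir; whether JuPDF creates that directory itself is not stated, so the sketch creates it first.

```python
from pathlib import Path


def load_with_cache(pypdfnb, ipynb_path, cache_dir="pypdfnbs"):
    """Fill `pypdfnb` from a saved parse if one is up to date, else parse and save.

    Hypothetical helper, not part of JuPDF. Returns "cache" or "parsed" to show
    which path was taken.
    """
    ipynb_path = Path(ipynb_path)
    cache_file = Path(cache_dir) / (ipynb_path.stem + ".pypdfnb")

    # Reuse the saved parse only if it is not older than the notebook itself.
    if cache_file.exists() and cache_file.stat().st_mtime >= ipynb_path.stat().st_mtime:
        pypdfnb.read_pypdfnb(str(cache_file))
        return "cache"

    pypdfnb.read_ipynb(str(ipynb_path))
    # Assumption: the target directory may need to exist before writing.
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    pypdfnb.write_pypdfnb(ipynb_path.stem, dir=cache_dir)
    return "parsed"
```

Comparing modification times means that editing the notebook triggers a fresh parse the next time it is loaded. Note that a saved parse reflects the parsers that were in use when it was written, so changing parsers would also call for a fresh parse.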
