
Saves plots as PDF with embedded generating script

Project description

Overview

The pypdfplot package is an extension to Matplotlib that generates a PDF file of the plot with the generating Python script embedded.

Although Matplotlib is a very powerful and flexible plotting tool, once a plot is saved as a PNG or PDF file, the link between the plot and its generating Python script is lost. The philosophy of pypdfplot is that there should be no distinction between the Python script that generates a plot and its output PDF file, much like there is no such distinction in an Origin or Excel file. As far as pypdfplot is concerned, the generating script is the plot.

pypdfplot works by extending Matplotlib with a publish function. This function saves the current plot as a PDF file and embeds the generating Python script in that PDF file. The Python script is embedded in such a way that when the PDF file is renamed from .pdf to .py, the file can be read by a Python interpreter without alteration. The script can then be edited at will to implement changes in the plot, after which the script is run again to produce the updated PDF file of the plot, including the updated embedded generating script.

Since the resulting output file is compliant with both Python and PDF syntax, we will call such a file a PyPDF file for the remainder of this document.

Installation

Via Python Package

Install the package from the PyPI repository by opening a command prompt and entering:

pip install pypdfplot

Via Git or Download

Alternatively, the source files can be downloaded directly from the GitHub repository. After downloading the source files, navigate to the project directory, open a command prompt, and install the package by entering:

python setup.py install

Quickstart

The pypdfplot package is designed to seamlessly integrate with Matplotlib. In this section we will go through a basic example that shows how to use pypdfplot.

Simple Example

For this example, our goal is to produce a red plot of the quadratic function from -10 to 20. To start off, we will first plot this in the conventional way using Matplotlib.

Create a new Python file, let’s call it example.py.

Open the file and enter the following script:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-10,20,0.1)
y = x**2

plt.plot(x,y,'r')
plt.show()

After running this script, we should get the following figure:

https://pypdfplot.readthedocs.io/en/latest/_images/plot.png

Next, we will use pypdfplot to publish this plot as a PyPDF file. In order to do this we need to make two changes to the script:

  1. Instead of importing matplotlib.pyplot we have to import pypdfplot. Note that pypdfplot wraps all Matplotlib’s functions, so by importing pypdfplot as plt like before, no other modifications to the code are needed.

  2. Instead of calling show() we have to call publish(). Note that publish() can take all keywords that show() can, in addition to some new keywords (see XXX).

The code now looks as follows:

import pypdfplot as plt
import numpy as np

x = np.arange(-10,20,0.1)
y = x**2

plt.plot(x,y,'r')
plt.publish()

After running this script, if we look in the folder where our example.py file once was, we notice it has been replaced by a new file example.pdf. Of course the fact that the example.py file disappeared doesn’t mean the script is gone – it is now embedded in the PyPDF file example.pdf!

We can find evidence of this by opening the example.pdf file:

https://pypdfplot.readthedocs.io/en/latest/_images/plot_pdf.png

The table on the left shows all files that are embedded, and clearly example.py is there.

Most versions of Acrobat Reader don’t allow the embedded .py file to be opened for security reasons, which is probably a good thing. To access the Python script, rename example.pdf to example.py and open the file. This is what we should find:

#%PDF-1.3 23 0 obj << /Type /EmbeddedFile /Length 124 >> stream
import pypdfplot as plt
import numpy as np

x = np.arange(-10,20,0.1)
y = x**2

plt.plot(x,y,'r')
plt.publish()

"""
endstream
endobj
1 0 obj

<...>

startxref
8829
%%EOF
0000009410 00000
"""

The first line is the PDF header that helps the PDF reader determine that this is a valid PDF file. It also includes the object header for the EmbeddedFile object of our example.py file. This line must not be altered, as doing so will result in corruption of the PyPDF file.

What follows is our original Python script, followed by a massive multiline string. This multiline string contains all the PDF objects, including the data for any remaining embedded files (see XXX). Editing this string will again likely corrupt the file, so it is strongly discouraged as well.
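The layout described above can be checked with a short sketch. The miniature file below is a hypothetical, heavily shortened stand-in for a real PyPDF file (it is not a valid PDF); it only illustrates why the Python interpreter accepts such a file: the header line starts with #, so it is a comment, and the PDF data sits inside a triple-quoted string that Python evaluates and discards.

```python
import ast

# Hypothetical miniature of a PyPDF file: a '#'-prefixed PDF header line,
# a short script, and the remaining PDF data wrapped in a multiline string.
polyglot = (
    "#%PDF-1.3 23 0 obj << /Type /EmbeddedFile /Length 124 >> stream\n"
    "x = 2\n"
    "y = x**2\n"
    "\n"
    '"""\n'
    "endstream\n"
    "endobj\n"
    "startxref\n"
    "8829\n"
    "%%EOF\n"
    '"""\n'
)

tree = ast.parse(polyglot)

# The header line is a comment, so only the two assignments and the
# trailing string expression show up in the syntax tree.
kinds = [type(node).__name__ for node in tree.body]
print(kinds)

# Running the file executes the script part; the string has no effect.
namespace = {}
exec(compile(tree, "example.py", "exec"), namespace)
print(namespace["y"])
```

Breaking the closing triple quote or removing the leading # would turn the PDF data into Python syntax errors, which is one reason edits outside the script part are risky.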

In between the first line and the multiline string is our original Python script, which may be edited in any way. For example, let’s give the plot a title and change the color to blue:

#%PDF-1.3 23 0 obj << /Type /EmbeddedFile /Length 124 >> stream
import pypdfplot as plt
import numpy as np

x = np.arange(-10,20,0.1)
y = x**2

plt.plot(x,y,'b')
plt.title('Blue Example')
plt.publish()

"""
endstream
endobj
1 0 obj

<...>

startxref
8829
%%EOF
0000009410 00000
"""

Again, after running the script, the example.py file is replaced by the example.pdf file. When we open example.pdf, we should find the updated blue plot with its title:

https://pypdfplot.readthedocs.io/en/latest/_images/plot_pdf2.png
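The edit cycle in this example starts each time by renaming the file from .pdf to .py. This can be done by hand, but a small helper can also do it. The sketch below uses only the standard library; the function name pdf_to_py is hypothetical and is not part of pypdfplot.

```python
from pathlib import Path


def pdf_to_py(pdf_path):
    """Rename a PyPDF file from .pdf to .py and return the new path.

    Hypothetical helper, not part of pypdfplot.
    """
    pdf_path = Path(pdf_path)
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {pdf_path.name!r}")

    py_path = pdf_path.with_suffix(".py")
    if py_path.exists():
        # Refuse to overwrite a script that is already on disk.
        raise FileExistsError(f"{py_path} already exists")

    pdf_path.rename(py_path)
    return py_path
```

Calling pdf_to_py('example.pdf') would leave example.py in the same folder, ready to be edited and run again.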

Embedding Files

In many cases we would like to plot data that is stored in a separate file. In order for this to work, the external data file must be included in the PyPDF file as well. What follows is an example of how to embed external files with pypdfplot.

We will write a script that reads data from an external Excel file and reads the title and axis labels from an external text file.

Create an Excel file called data.xlsx. For this example, we will fill the file with the first 10 numbers of the Fibonacci sequence:

https://pypdfplot.readthedocs.io/en/latest/_images/excel_data.png
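The data file can also be generated from Python instead of typed in by hand. The sketch below rests on some assumptions: the column headers are x and y (matching the df.x and df.y used in the script further on), x counts from 1, and the sequence starts at 1, 1. The helper names fibonacci and write_data are hypothetical, and writing .xlsx files with pandas requires an Excel engine such as openpyxl.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the sequence starts 1, 1."""
    values = []
    a, b = 1, 1
    for _ in range(n):
        values.append(a)
        a, b = b, a + b
    return values


def write_data(path="data.xlsx", n=10):
    """Write the numbers to an Excel file with columns x and y (assumed layout)."""
    # Imported here so the Fibonacci helper works without pandas installed.
    import pandas as pd

    df = pd.DataFrame({"x": list(range(1, n + 1)), "y": fibonacci(n)})
    df.to_excel(path, index=False)  # needs an engine such as openpyxl
    return df


print(fibonacci(10))
```

Calling write_data() once would create data.xlsx in the current folder.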

Then we create a text file with our title and axis labels called title.txt:

https://pypdfplot.readthedocs.io/en/latest/_images/notepad_title.png

Finally, we create a new Python file called packing.py.

As before, let’s first have a look at how this script would look using Matplotlib. We will use Pandas to import the Excel file into Python. Open packing.py and enter the following script:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel('data.xlsx')
plt.plot(df.x,df.y,'r.')

with open('title.txt','r') as f:
    # readline() keeps the trailing newline, so strip it
    title = f.readline().strip()
    xlabel = f.readline().strip()
    ylabel = f.readline().strip()

plt.title(title)
plt.xlabel(xlabel)
plt.ylabel(ylabel)

plt.show()

After running this script, the following figure should pop up:

https://pypdfplot.readthedocs.io/en/latest/_images/plot2.png

In order to use pypdfplot to publish this as a PyPDF file, we change matplotlib.pyplot to pypdfplot and show() to publish() as before.

Additional files can be embedded in the PyPDF file by calling the function pack(flist). The argument flist is a list of filenames that are to be embedded.

By calling cleanup() after the publish() function, the local files are deleted after they are successfully embedded in the PyPDF file.

The script now looks as follows:

import pypdfplot as plt
import pandas as pd

df = pd.read_excel('data.xlsx')
plt.plot(df.x,df.y,'r.')

with open('title.txt','r') as f:
    # readline() keeps the trailing newline, so strip it
    title = f.readline().strip()
    xlabel = f.readline().strip()
    ylabel = f.readline().strip()

plt.title(title)
plt.xlabel(xlabel)
plt.ylabel(ylabel)

plt.pack(['data.xlsx',
          'title.txt'])

plt.publish()
plt.cleanup()

After running the script, the output file packing.pdf is generated and all three files packing.py, data.xlsx, and title.txt are deleted after being embedded in packing.pdf. This can be confirmed by opening packing.pdf:

https://pypdfplot.readthedocs.io/en/latest/_images/plot_pdf3.png

To maximize integration with Matplotlib, the PyPDF file is checked for embedded files at the time the pypdfplot package is imported. If embedded files are found, they are extracted provided there are no local files with the same filename. If a local file is found with the same filename, it is assumed this is a more recent version (e.g. a file that was extracted and then updated), and should therefore have precedence over the embedded file.
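The precedence rule above (a local file wins over an embedded file with the same name) can be sketched as follows. This is an illustration of the rule only, not pypdfplot's actual implementation: the function name extract_missing and the representation of the embedded files as a dictionary of filename to bytes are assumptions.

```python
from pathlib import Path


def extract_missing(embedded, directory="."):
    """Write embedded files to disk, skipping names that already exist locally.

    `embedded` is a hypothetical mapping of filename -> file contents (bytes).
    Returns the list of filenames that were actually written.
    """
    directory = Path(directory)
    written = []
    for name, data in embedded.items():
        target = directory / name
        if target.exists():
            # A local copy is assumed to be more recent, so it is left untouched.
            continue
        target.write_bytes(data)
        written.append(name)
    return written
```

With this rule, a data file that was extracted and then edited locally is never overwritten by the older embedded copy on the next run.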

In case you want to keep the files that are extracted from the PyPDF file, simply comment out the cleanup() function.

PyPDF File specification

The file generated by pypdfplot is both a PDF file and a Python file. It would be more accurate to describe it as conforming to the intersection of both specifications, i.e. a PyPDF file.

Classes

Description of the two classes

Changelog

Here we list all changes

