write graded unit tests for Jupyter Notebooks in a few lines of code

JPTest2

JPTest is a unit testing framework for Jupyter Notebooks that aims for fast test writing in fewer lines of code. It makes it possible to score and automatically grade exams using separate notebook (.ipynb) and test (.py) files.

Quick Start

JPTest relies on Jupyter. If your notebook cells need any other libraries, install them in your environment the usual way.

The preferred way to use JPTest is in a virtual environment:

python -m venv venv
source venv/bin/activate

Use pip to download and install JPTest. Make sure to install jptest2 and not the first version of JPTest, whose package name lacks the trailing 2. The first version is only available for compatibility reasons.

pip install jptest2

Now imagine that your students have to implement a function called fibonacci, which accepts a single parameter n and returns the n-th Fibonacci number. Inside the notebook to submit, you have prepared a cell with the tag task-1. A simple test could look like the following example:

from jptest2 import *
import asyncio


# Create test with name "Task 1" and a maximum score of 1.
# Execute *all* cells with tag "task-1" prior to executing the test function.
# Please note: Every test function must be async!
@JPTest('Task 1', max_score=1, execute=('task-1',))
async def test_task1(nb: Notebook):
    # Create a reference to the function `fibonacci` inside the notebook.
    fib_fun_in_nb = nb.ref('fibonacci')

    # Receive five results from the fibonacci function.
    result = await asyncio.gather(
        fib_fun_in_nb(1),
        fib_fun_in_nb(2),
        fib_fun_in_nb(3),
        fib_fun_in_nb(4),
        fib_fun_in_nb(5)
    )

    # Yield a tuple containing a condition, an award, a comment in case the
    # condition is false and a comment in case the condition is true.
    yield result == [1, 1, 2, 3, 5], 1, 'fib fun incorrect', 'fib fun correct'
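
For orientation, a notebook cell tagged task-1 that passes this test might contain something like the following. This is only an illustrative student solution, not part of JPTest. It assumes the sequence starts with fibonacci(1) == 1 and fibonacci(2) == 1, as the expected list in the test implies.

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, counting from fibonacci(1) == 1."""
    a, b = 0, 1
    for _ in range(n):
        # Advance one step along the sequence 0, 1, 1, 2, 3, 5, ...
        a, b = b, a + b
    return a


print([fibonacci(n) for n in range(1, 6)])  # [1, 1, 2, 3, 5]
```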

Let us assume the notebook file is called notebook.ipynb and the test file is called tests.py. Call JPTest with the following command:

python -m jptest2 notebook.ipynb tests.py

Please note the test function is an async function!
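
Because the test function contains a yield, it is an async generator, and every yielded tuple is one graded check. The sketch below uses only the standard library and is not JPTest's implementation: collect is a hypothetical helper, and it assumes that the award is granted only when the condition is true.

```python
import asyncio


async def sample_test():
    # Same tuple layout as in the test above: condition, award,
    # comment if the condition is false, comment if it is true.
    yield 1 + 1 == 2, 1, 'addition incorrect', 'addition correct'
    yield 2 * 2 == 5, 2, 'multiplication incorrect', 'multiplication correct'


async def collect(test_gen):
    """Hypothetical evaluator: sum the awards of all true conditions."""
    score, comments = 0, []
    async for condition, award, if_false, if_true in test_gen:
        if condition:
            score += award
            comments.append(if_true)
        else:
            comments.append(if_false)
    return score, comments


score, comments = asyncio.run(collect(sample_test()))
print(score, comments)
```

Here only the first check passes, so the score is 1 and the comments hold one success message and one failure message.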

Contexts and Processes

JPTest manages different processes. The first process that is started collects all annotations like @JPTest and stores them together with their respective test functions. Later, this process controls the startup of notebooks and takes over the evaluation. We refer to this process as the test context.

Jupyter uses kernels that are started in a separate process. JPTest supports Python3 kernels and does not share them between tests, so for each test at least one independent kernel process is started to run the contents of the notebook. However, as we will see later, it is also possible to start multiple kernels per test. We refer to this set of processes as the notebook context.

JPTest always runs on an in-memory copy of the notebook and does not modify files, but tests and code in the notebook still have the possibility to do so.

The Execute Parameter

You can control in detail which code is executed prior to a test. For this purpose, the execute parameter accepts different types, which can also be nested recursively:

String. If the parameter is of type str, the value is considered as code and injected into the notebook.

Tuple. If the parameter is of type tuple, the value is considered as tags. If there is one element in the tuple, every cell with this tag is executed. If there are two elements in the tuple, every cell between the first appearance of the first tag and the first appearance of the second tag (inclusive) is executed.

Function. If the parameter is a function (Callable), it will be executed in the notebook context using execute_fun.

List. If the parameter is of type list, every element is executed in order of appearance, following the rules stated above. Lists may also be nested, although Python's list operations make this rarely necessary.
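
The tag rules for tuples can be illustrated with a toy model. The sketch below does not use JPTest: cells are plain dictionaries and select_by_tags is a hypothetical helper that follows the description above. How JPTest treats missing tags or an end tag that appears before the start tag is not covered here.

```python
from typing import Any, Dict, List, Tuple

# Toy notebook model: each cell is a dict holding its tags and source.
Cell = Dict[str, Any]


def select_by_tags(cells: List[Cell], tags: Tuple[str, ...]) -> List[Cell]:
    """Pick cells for a one- or two-element tag tuple (illustrative only)."""
    if len(tags) == 1:
        # One tag: every cell carrying that tag.
        return [c for c in cells if tags[0] in c['tags']]

    first, last = tags
    # Two tags: from the first cell tagged `first` up to and including
    # the first cell tagged `last`.
    start = next(i for i, c in enumerate(cells) if first in c['tags'])
    end = next(i for i, c in enumerate(cells) if last in c['tags'])
    return cells[start:end + 1]


cells = [
    {'tags': ['intro'], 'source': 'import math'},
    {'tags': ['task-1'], 'source': 'a = 1'},
    {'tags': [], 'source': 'b = 2'},
    {'tags': ['task-2'], 'source': 'c = 3'},
    {'tags': ['task-1'], 'source': 'd = 4'},
]

print([c['source'] for c in select_by_tags(cells, ('task-1',))])
print([c['source'] for c in select_by_tags(cells, ('task-1', 'task-2'))])
```

The first call selects both cells tagged task-1, while the second selects the contiguous range from the first task-1 cell through the first task-2 cell, including the untagged cell in between.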

Execute Code

The easiest way to execute code in the notebook is via the cells property. It returns a list of all cells present in the notebook and allows you to filter and execute them one by one.

for cell in nb.cells:
    print(cell.tags)

    if cell.type == 'code':
        await cell.execute()

The function execute_cells represents a shortcut to select only code cells by tags prior to executing them in their order of appearance.

# execute all cells with tag `task-1`
await nb.execute_cells('task-1')

# execute all cells from `task-3` to `task-5`
await nb.execute_cells(from_tag='task-3', to_tag='task-5')

It is also possible to inject code into the notebook context. execute_code creates a new code cell from the given string, inserts it at the end of the notebook and executes it. Additional indentation of otherwise correct code is possible.

await nb.execute_code('''
    a = 5
    b = 10
''')
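
To see why uniformly indented code can still be executed, consider the standard library function textwrap.dedent, which strips the whitespace common to all lines. Whether JPTest uses this function internally is an assumption; the sketch only demonstrates the principle.

```python
import textwrap

indented = '''
    a = 5
    b = 10
'''

# Without dedenting, the leading spaces are a syntax error.
try:
    compile(indented, '<cell>', 'exec')
    compiles_as_is = True
except IndentationError:
    compiles_as_is = False

# `dedent` strips the whitespace common to all lines.
namespace = {}
exec(textwrap.dedent(indented), namespace)
print(compiles_as_is, namespace['a'], namespace['b'])
```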

Please note that there are functions store and stores to store values in the notebook. Unlike the previous example this also works with non-primitive types and references.

ref1 = await nb.store(5, 'a')
ref2 = await nb.store(6)
await nb.store(ref1, 'copy_of_ref1')

ref3, ref4 = await nb.stores(b=10, c={'tiger': 'dangerous'})

References

It is possible to interact with objects and code in the notebook context. The most important class in this regard is NotebookReference. References are returned, for example, by the ref and get functions, represent objects in the notebook context and may be used for interaction in various ways:

  • receive serializes the referenced object and transfers it from the notebook context to the test context. execute, on the other hand, executes a statement without processing the result and transferring it to the test context.
  • Calling await on a NotebookReference object directly triggers a call to receive.
  • Access to an object's attributes or items is possible with the usual syntax. Note that the result is not evaluated immediately and thus errors due to missing attributes or keys are carried over until the actual execution.
  • References may be called like functions. The parameters are either other references, then these are resolved within the notebook context, or local variables from the test context, then these are transferred into the notebook context and used for the call. Function calls are also not executed immediately.
  • For references to lists and other sequences there is a function len to determine the length. The built-in len function, however, cannot be used with the async/await syntax.

my_fun_in_nb = nb.my_fun
my_fun_return = my_fun_in_nb()

my_dict_in_nb = nb.my_dict
val_of_x = my_dict_in_nb['x']

# Raises an exception if `my_fun` does not exist
# or if it raises an exception itself.
print(await my_fun_return.receive())

# You can omit the call to `receive`.
print(await my_fun_return)

# Raises an exception if `my_dict` does not exist
# or if `x` is not a key in `my_dict`.
print(await val_of_x)
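
The deferred evaluation described above can be modelled in a few lines. The class LazyRef below is a hypothetical stand-in, not JPTest's NotebookReference: it merely records attribute access, item access and calls as an expression string, and evaluates that expression against a plain dictionary once it is awaited.

```python
import asyncio


class LazyRef:
    """Toy reference: builds up an expression and evaluates it on await."""

    def __init__(self, namespace, expr):
        self._namespace = namespace
        self._expr = expr

    def __getattr__(self, name):
        # Only called for attributes that are not found normally.
        if name.startswith('_'):
            raise AttributeError(name)
        return LazyRef(self._namespace, f'{self._expr}.{name}')

    def __getitem__(self, key):
        return LazyRef(self._namespace, f'{self._expr}[{key!r}]')

    def __call__(self, *args):
        rendered = ', '.join(repr(a) for a in args)
        return LazyRef(self._namespace, f'{self._expr}({rendered})')

    def __await__(self):
        async def evaluate():
            # Errors such as a missing key surface only here.
            return eval(self._expr, self._namespace)
        return evaluate().__await__()


async def demo():
    namespace = {'my_dict': {'x': 1}, 'my_fun': lambda: 42}
    val = LazyRef(namespace, 'my_dict')['x']
    missing = LazyRef(namespace, 'my_dict')['y']  # no error yet
    results = [await val, await LazyRef(namespace, 'my_fun')()]
    try:
        await missing
    except KeyError:
        results.append('KeyError')
    return results


outcome = asyncio.run(demo())
print(outcome)
```

Creating the reference to the missing key y succeeds; the KeyError only appears when that reference is awaited.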

Pickle is used to serialize and deserialize objects. Therefore, it is also possible to transfer more complex objects like Pandas DataFrames or NumPy Arrays.

References to a notebook's objects can be used as parameters to call a function within another notebook. JPTest transfers the value to the notebook containing the function prior to calling it, so the referenced object has to be serializable.
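
What "serializable" means in practice can be checked with the pickle module directly. The following sketch is independent of JPTest; is_picklable is a hypothetical helper that simply attempts the serialization.

```python
import pickle


def is_picklable(value) -> bool:
    """Return True if `value` survives `pickle.dumps` (illustrative helper)."""
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


record = {'name': 'tiger', 'traits': ['dangerous', 'striped'], 'weight': 220.5}

# A round trip yields an equal but independent copy of the object.
restored = pickle.loads(pickle.dumps(record))
print(restored == record, restored is record)

# Nested containers are fine, generators are not.
print(is_picklable(record), is_picklable(x for x in range(3)))
```

Objects such as generators or open file handles cannot be pickled, so values like these would have to stay in the context where they were created.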

Annotations and Parameters

You have already seen the annotation @JPTest. It has two additional optional parameters: timeout and prepare_second. timeout specifies a timeout in seconds per cell. As you may have noticed, the notebook is passed as a parameter to the test function after the execute parameter has been processed. Set prepare_second to True to receive a second notebook, prepared with the same settings, as a second parameter.

@JPTest('Task 1', max_score=1, execute=('task-1',), prepare_second=True)
async def test_task1(nb1: Notebook, nb2: Notebook):
    # `nb1` equals `nb2`, but they were created and prepared independently!
    pass

Furthermore, there is @JPTestGet if you are only interested in data stored within the notebook. This annotation takes a name, a maximum score, a timeout and an execute command. It also accepts a list of variable names from the notebook via get. All of these variables are transferred to the test context and used as parameters for your test function. You cannot access the notebook using this annotation!

@JPTestGet('Task 1', max_score=1, execute=('task-1',), get=['first_var', 'second_var'])
async def test_task1(first_value, second_value):
    # `first_value` and `second_value` are the received values.
    pass

The last annotation is @JPTestComparison. It allows using two notebooks prepared in different ways inside the test function. We use this mainly to compare the student's results with those from a sample solution.

def import_pandas():
    # noinspection PyUnresolvedReferences
    import pandas as pd


def sample_solution():
    correct_df = pd.read_csv('my_dataset.csv')


# Everything passed to `prepare` is executed in both notebooks independently.
# Everything passed to `execute_left` is executed in the first notebook.
# `execute_right` does the same in the second notebook.
# `hold_left` expects a list of variable names to copy to the test context
# from the first notebook. `hold_right` does the same to the second notebook.
# Every received value is used as a parameter for the test function.
@JPTestComparison('Task 1', max_score=1, execute=import_pandas,
                  prepare_left=('task-1',), hold_left='students_df',
                  prepare_right=sample_solution, hold_right='correct_df')
async def test_task1(students_val, correct_val):
    pass

Function Injection

There are two ways to inject functions:

The first method inject_fun transfers a function to the notebook context and returns a reference. This can be called as described before or passed as a parameter to another function.

def fun(i: int):
    return i + 1


injected = await nb.inject_fun(fun)
result = await injected(5)

# `result` equals `6`.

You can also send classes to the notebook context. However, superclasses they depend on are not transferred automatically.

The second method execute_fun executes a function's body in the notebook context while the header is only used in the test context. This makes it possible to write syntactically correct code with all the benefits of analysis within an IDE, although it is later executed in the notebook context.

def fun(i: int):
    k = i + 1


# `i` has to be available in the notebook context!
await nb.execute_fun(fun)

# `k` is defined globally in the notebook context
# after the execution.
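
The idea of running only a function's body can be sketched with the ast module. This is not JPTest's implementation: the sketch starts from a source string rather than a function object, and a plain dictionary plays the role of the notebook context.

```python
import ast

source = '''
def fun(i: int):
    k = i + 1
'''

# Parse the source and keep only the statements of the function body.
function_def = ast.parse(source).body[0]
body_only = ast.Module(body=function_def.body, type_ignores=[])

# Execute the body at the top level of a namespace that stands in for
# the notebook context; `i` must already exist there.
namespace = {'i': 5}
exec(compile(body_only, '<injected>', 'exec'), namespace)

print(namespace['k'])  # 6
```

Because the body runs at the top level of the namespace, the parameter i is looked up as an ordinary global and k ends up as a global as well, which matches the behaviour described in the comments above.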

Function Replacing

Functions in the notebook context can be replaced with others, for example to skip a slow network request and return a fixed response instead.

await nb.execute_code('''
    from time import sleep

    def my_fun():
        sleep(10)
        return 1
''')


def replacement():
    return 2


async with nb.replace_fun('my_fun', replacement):
    # executes `replacement` in notebook context.
    result = await nb.ref('my_fun')()
    # prints `2`
    print(result)
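
The temporary nature of such a replacement can be modelled with an async context manager from the standard library. The helper replace_in below is hypothetical and works on a plain dictionary instead of a notebook kernel. It only shows the swap-and-restore pattern, not how JPTest implements replace_fun.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def replace_in(namespace: dict, name: str, replacement):
    """Temporarily rebind `name` in `namespace` (illustrative only)."""
    original = namespace[name]
    namespace[name] = replacement
    try:
        yield
    finally:
        # Restore the original even if the block raises.
        namespace[name] = original


async def demo():
    namespace = {}
    exec('def my_fun():\n    return 1', namespace)

    async with replace_in(namespace, 'my_fun', lambda: 2):
        inside = namespace['my_fun']()
    after = namespace['my_fun']()
    return inside, after


print(asyncio.run(demo()))  # (2, 1)
```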

Function Tracking

Furthermore, it is possible to track function calls. This may be used to check if an implementation uses recursion. In addition, the parameters and return values used can be extracted.

await nb.execute_code('''
    def fib(i):
        return i if i <= 1 else fib(i-1) + fib(i-2)
''')

async with nb.track_fun('fib') as calls:
    await nb.ref('fib')(15).execute()

print(len(await calls) > 1000)
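
Tracking can detect recursion because a recursive function looks up its own name on every call. The sketch below demonstrates this with plain Python and a dictionary as a stand-in for the notebook context; the wrapper tracked is a hypothetical illustration, not JPTest's track_fun.

```python
import functools

namespace = {}
exec('''
def fib(i):
    return i if i <= 1 else fib(i - 1) + fib(i - 2)
''', namespace)

calls = []
original = namespace['fib']


@functools.wraps(original)
def tracked(*args, **kwargs):
    # Record parameters and return value of every call.
    result = original(*args, **kwargs)
    calls.append((args, kwargs, result))
    return result


# Recursive calls resolve `fib` in the namespace, so they hit the wrapper.
namespace['fib'] = tracked
try:
    value = namespace['fib'](15)
finally:
    namespace['fib'] = original

print(value, len(calls))  # 610 1973
```

A non-recursive implementation would produce exactly one recorded call, so the call count distinguishes the two approaches.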

Setup and Teardown Methods

Use @JPSetup and @JPTeardown to annotate async functions. Setup functions are run prior to all tests and teardown functions after all tests have completed.

@JPSetup
async def setup():
    print('setup')


@JPTeardown
async def teardown():
    print('teardown')

Multiple setup or teardown functions are run in parallel.
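
In practice this means setup functions should not rely on each other's side effects. The sketch below illustrates concurrent execution with asyncio.gather; that JPTest schedules setup functions this way is an assumption, and the two coroutines are hypothetical examples.

```python
import asyncio


async def start_database():
    await asyncio.sleep(0.02)
    return 'database ready'


async def download_dataset():
    await asyncio.sleep(0.01)
    return 'dataset ready'


async def run_all_setups():
    # Both coroutines run concurrently; neither waits for the other.
    return await asyncio.gather(start_database(), download_dataset())


setup_results = asyncio.run(run_all_setups())
print(setup_results)
```

gather returns the results in argument order, even though download_dataset finishes first here.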

Output Formats

The default output format is JSON. You can switch it to Markdown using the command line flag --md.

Parallelization

Since all notebook kernels are started in different processes, multicore processors can be fully utilized. However, there are a few things to keep in mind:

  • The test context can become a bottleneck because it uses only one thread. Therefore, the notebooks should work as independently as possible and the test context should only be used for coordination and evaluation.
  • Keep inter-process communication to a minimum and outsource computationally intensive operations to the notebook context.
  • Use the parameter --tests to limit the number of concurrently running tests.
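
Limiting the number of concurrently running tests is commonly done with a semaphore. The following sketch shows the pattern with asyncio.Semaphore; run_limited and fake_test are hypothetical names, and whether --tests is implemented this way is an assumption.

```python
import asyncio


async def run_limited(jobs, limit: int):
    """Run all jobs, but never more than `limit` at the same time."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def guarded(job):
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            try:
                return await job()
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_test():
    await asyncio.sleep(0.01)
    return 'done'


results, peak = asyncio.run(run_limited([fake_test] * 6, limit=2))
print(len(results), peak)  # 6 2
```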

Running Without Tests

If no test file is given on startup, JPTest will choose a default test set. It executes all cells once in the correct order, does not score anything and propagates exceptions. This can be used to check notebooks for syntax errors, or to determine whether libraries are missing within an image or data sets have not been shipped.

Use the command line parameter --quiet to suppress any output other than exceptions and stack traces.

Live Preview

The live preview is activated with the --live switch. It monitors changes to the given files and automatically reruns the evaluation as soon as one of them is modified. This mode is useful for developing tests or for demonstrations.

Install the package jptest2[demo] to receive the optional dependencies. The clear command is used to clear the terminal window and therefore must be available.

Other Kernels

While the main focus is on Python notebooks, development to support other kernels is possible. Set the kernel parameter to specify which language is used in the notebook. Besides Python3, two databases are supported at the moment. However, they do not start an actual Jupyter Kernel but send the queries directly to a connected database.

  1. SQLite using jptest2[sqlite]
  2. DuckDB using jptest2[duckdb]
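
Sending a query directly to a database, without a Jupyter kernel in between, can be pictured with the standard library's sqlite3 module. The table, the data and the variable cell_source are made up for illustration; this is not how JPTest is configured or invoked.

```python
import sqlite3

# Source of a hypothetical SQL notebook cell.
cell_source = 'SELECT name FROM animals WHERE dangerous = 1 ORDER BY name'

connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE animals (name TEXT, dangerous INTEGER)')
connection.executemany(
    'INSERT INTO animals VALUES (?, ?)',
    [('tiger', 1), ('rabbit', 0), ('shark', 1)],
)

# The cell content is passed to the database as-is.
rows = connection.execute(cell_source).fetchall()
connection.close()
print(rows)
```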
