
A quality checker for Jupyter


Julynter


Julynter is a linter for Jupyter Notebooks that aims to improve their quality and reproducibility based on the following guidelines [1]:

  • Use short titles with a restricted character set (A-Z a-z 0-9 .-) for notebook file names, and use markdown headings in the body for more descriptive titles.
  • Pay attention to the bottom of the notebook. Check whether it can benefit from descriptive markdown cells or can have code cells executed or removed.
  • Abstract code into functions, classes, and modules and test them.
  • Declare the dependencies in requirement files and pin the versions of all packages.
  • Use a clean environment for testing the dependencies to check if all of them are declared.
  • Put imports at the beginning of the notebooks.
  • Use relative paths for accessing data in the repository.
  • Re-run notebooks top to bottom before committing.
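The first guideline can be checked outside Jupyter Lab as well. The following is a minimal sketch, not part of Julynter: the helper names `is_safe_notebook_name` and `unsafe_notebooks` are hypothetical, and the check covers only the character set (A-Z a-z 0-9 . -), not how short a title is.

```python
from __future__ import annotations

import re
from pathlib import Path

# Character set from the first guideline: A-Z a-z 0-9 . -
SAFE_NAME = re.compile(r"[A-Za-z0-9.\-]+")


def is_safe_notebook_name(path: str) -> bool:
    """Return True if the file name (not the directory) uses only the allowed characters."""
    return SAFE_NAME.fullmatch(Path(path).name) is not None


def unsafe_notebooks(root: str) -> list[str]:
    """List the .ipynb files under `root` whose names break the guideline."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*.ipynb")
        if not is_safe_notebook_name(str(path))
    )
```

Note that underscores, spaces, and accented letters all fall outside the guideline's character set, so `data_prep.ipynb` would be reported.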


This repository provides an extension for Jupyter Lab comprising two parts: a Python package named julynter, which contains the server extension and the CLI commands, and an NPM package named @julynter/labextension, which contains the frontend extension.

Would you like to participate in a Julynter experiment?

Team

  • João Felipe Pimentel (UFF) (main developer)
  • Leonardo Murta (UFF)
  • Vanessa Braganholo (UFF)
  • Juliana Freire (NYU)

Publications

Install

Just install the Python package:

pip install julynter

Running

If you have NodeJS, run Jupyter Lab as usual:

jupyter lab

On the first run, Jupyter Lab should detect the Julynter installation and ask you to rebuild it.

If you do not have NodeJS, run the provided pre-compiled version of Jupyter Lab:

julynter lab

Julynter appears on the left panel of Jupyter Lab as a circle with a check mark (the Julynter icon). Open a notebook first and then open the Julynter panel to start linting.

CLI

The linting part of Julynter is restricted to the Jupyter Lab extension. However, Julynter also provides a command line interface for other operations:

Validate

This command checks the linting results stored in the notebook's metadata to indicate whether the notebook is valid (i.e., has no remaining linting messages). To use this command, you must first lint the notebook in Jupyter Lab.

julynter validate <notebook path>

This command also has options to display the linting messages in the terminal (-v 1) and to display different types of filtered messages (-t -r -i -m).

Run

This command runs a notebook and checks if it reproduces the stored results.

julynter run <notebook path>

It has many options. Some examples:

  • Specify the execution order: -c t follows the top-down order of cells with execution results; -c a follows the top-down order, but attempts to execute all cells; -c e follows the cell execution count order (i.e., the numbers that appear next to the cells).

  • Save a notebook with the new results: -o <path>

  • Prevent comparing the results: -x

  • Define normalizer order for cell comparison: -n <normalizer1> <normalizer2>

Use -h to check the other options.
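To make the three -c modes concrete, here is a minimal sketch of how each one orders the code cells of a notebook. It is not Julynter's implementation: `execution_order` is a hypothetical name, it reads nbformat v4 JSON (`cells`, `cell_type`, `execution_count`) already loaded into a dict, and it assumes that "cells with execution results" means code cells whose `execution_count` is set.

```python
from typing import Dict, List


def execution_order(notebook: Dict, mode: str) -> List[int]:
    """Return the indexes of code cells in the order a re-run would execute them.

    mode 't': top-down, only cells that have an execution count
    mode 'a': top-down, every code cell
    mode 'e': ascending execution count (the numbers shown next to cells)
    """
    code = [
        (index, cell.get("execution_count"))
        for index, cell in enumerate(notebook["cells"])
        if cell["cell_type"] == "code"
    ]
    if mode == "a":
        return [index for index, _ in code]
    executed = [(index, count) for index, count in code if count is not None]
    if mode == "t":
        return [index for index, _ in executed]
    if mode == "e":
        return [index for index, _ in sorted(executed, key=lambda item: item[1])]
    raise ValueError(f"unknown mode: {mode!r}")
```

For a notebook whose cells were executed out of order, -c t and -c e can produce different sequences, which is why a notebook may reproduce under one mode but not the other.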

Env

This command is similar to julynter run: it checks the reproducibility of notebooks in a directory. However, julynter env attempts to run the notebooks in a clean virtual environment to check the requirements files' completeness.

julynter env run venv

This command attempts to run all notebooks of the current directory in a new virtual environment created by venv. Before running them, it tries to discover all setup.py, requirements.txt, and Pipfile files and install them in the new environment.

It is possible to specify both a subset of notebooks using arguments after venv and a subset of requirement files using the flags -s <setups>, -r <requirements>, and -p <pipfiles>.
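The discovery step can be approximated as follows. This is a sketch under the assumption that the search is recursive from the current directory; `discover_requirement_files` is a hypothetical helper, and its three groups simply mirror the files you could otherwise pass explicitly with -s, -r, and -p.

```python
from pathlib import Path
from typing import Dict, List

# File names that `julynter env run` is described as looking for.
REQUIREMENT_FILES = {
    "setups": "setup.py",
    "requirements": "requirements.txt",
    "pipfiles": "Pipfile",
}


def discover_requirement_files(root: str) -> Dict[str, List[str]]:
    """Group every setup.py, requirements.txt and Pipfile under `root` (recursively)."""
    found: Dict[str, List[str]] = {kind: [] for kind in REQUIREMENT_FILES}
    for kind, file_name in REQUIREMENT_FILES.items():
        # rglob matches the exact file name, so Pipfile.lock is not picked up.
        for path in sorted(Path(root).rglob(file_name)):
            found[kind].append(str(path.relative_to(root)))
    return found
```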

To pass julynter run options to the julynter env command, add the flag -o at the end, followed by those options. For instance, to run the notebooks following the cell execution order:

julynter env run venv -o -c e

In addition to venv, julynter supports running the notebooks in a docker container or a conda environment, if you have these tools installed:

julynter env run docker
julynter env run conda

Each of these commands has specific options:

  • venv lets you specify the target environment name (-e <name>) and the command that creates the environment (-c 'python -m venv'). You may use the latter option to run virtualenv and specify a different Python version: -c 'virtualenv venv --python=python3.8'
  • docker lets you specify the base image name (-e <image>) and the target directory in the container volume (-t <path>).
  • conda lets you specify the Anaconda/Miniconda installation path (-a <path>), the target environment name (-e <name>), and the create arguments (-c <arguments>). You may use the create arguments either to specify a Python version (-c 'python=3.8') or to clone an existing conda environment (-c '--clone base').

Env prepare

After running and checking the reproducibility of the notebooks, julynter env run destroys the created environment/container. If you want to create permanent environments/containers, you may run:

julynter env prepare venv

Note, however, that this command will not check the reproducibility of the notebooks.

Env import

As an alternative to running all the code in the notebooks and checking its reproducibility, it is possible to use a command for running only the notebooks' import statements to check the completeness of requirements files.

julynter env import venv

(This currently works only with Python.)

Since this command's goal is to check the possibility of importing modules, I suggest adding the flags -o -x to prevent the comparison of results.

Additionally, this command will attempt to run all imports, including conditional imports that should not run in a normal situation. If you want to run only top-level imports, use the following command:

julynter env importtop venv
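The difference between import and importtop can be illustrated with Python's `ast` module. This is a sketch, not Julynter's code: `imported_modules` is a hypothetical helper that treats "top-level" as statements directly in the cell body, so imports nested inside `if`, `try`, or function definitions are skipped. Cells containing IPython magics (such as `%matplotlib inline`) would have to be transformed into plain Python before `ast.parse` accepts them; the sketch does not handle that.

```python
import ast
from typing import List


def imported_modules(source: str, top_level_only: bool = False) -> List[str]:
    """Return the root package names imported by a code cell.

    With top_level_only=True only statements directly in the cell body are
    inspected, so conditional imports (inside if/try/def) are ignored.
    """
    tree = ast.parse(source)
    nodes = tree.body if top_level_only else ast.walk(tree)
    modules = []
    for node in nodes:
        if isinstance(node, ast.Import):
            modules.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) refer to local files, not requirements.
            modules.append(node.module.split(".")[0])
    return sorted(set(modules))
```

For a cell that tries `import ujson as json` and falls back to `import json`, the full scan reports both modules, while the top-level scan reports neither, which matches the motivation for importtop given above.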

Troubleshoot

Known issues:

Linting list does not update

Sometimes Julynter does not update the linting list after an action. For instance, after updating the title, Julynter keeps the previous lint.

Sometimes Julynter updates before properly detecting the execution of a cell. It usually occurs when the cell throws an exception.

If either of these issues occurs, click on the notebook name in the Julynter tab to force a reload.

External and global options are not recognized

Julynter loads settings from .julynter/config.json when Jupyter Lab starts. To reload them, click the configuration icon, select Project settings at the top, and click Load.

User and Project settings do not apply to existing notebooks that have their own settings stored as metadata. To update the settings of these notebooks, click the configuration icon, select Project settings at the top, click Load, select Notebook settings at the top, and click Save.

Some linting messages never appear

Julynter connects to the kernel to perform some checks. When it cannot connect to the kernel, it shows "Kernel not found" in the status icon and skips the checks that depend on the kernel.

Currently, Julynter only supports the IPython kernel. If you are using a different kernel or programming language that you would like to be fully supported, please submit an issue. If your notebook is connected to a Python kernel, but Julynter still displays this status, please submit an issue as well.

Different results

After every cell the user executes, Julynter sends its own code to the kernel. This code usually does not interfere with the results, since each cell produces a result independently of the others. However, it is possible to add stateful hooks to cell execution in the kernel, and in that case Julynter's execution might interfere with the results.

I noticed this with an ipykernel bug that prevented post-execution hooks from running on async cells: Julynter's queries received the results of matplotlib images, so the images were not displayed (https://github.com/jupyter/notebook/issues/5568). This bug has been fixed in ipykernel 5.3.1.

If you find a different result in another situation, please submit an issue.

Frontend extension is not working

If you see the frontend extension, but it is not working, check that the server extension is enabled:

jupyter serverextension list

Server extension is installed and enabled, but Julynter does not appear

If the server extension is installed and enabled, but you do not see the frontend, check that the frontend extension is installed:

jupyter labextension list

If it is installed, try:

jupyter lab clean
jupyter lab build

Julynter works with the command julynter lab, but the command jupyter lab started to show exceptions

Jupyter Lab attempts to run NodeJS when an extension is registered. If you do not have NodeJS installed, it will show exceptions in the console. As far as I checked, these exceptions do not cause problems, but if you still want to fix them, there are two possible solutions:

1- Install NodeJS

2- Remove julynter-labextension-*.tgz from sys.prefix\share\jupyter\lab\extensions
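For the second solution, the following sketch locates the archives under `sys.prefix`. The function name `julynter_archives` and the `--delete` flag are hypothetical; the script only lists matches unless `--delete` is given, so you can check what it would remove first. It builds the path with `pathlib`, so it works with both the Windows-style path above and forward slashes.

```python
import sys
from pathlib import Path
from typing import List


def julynter_archives(prefix: str = sys.prefix) -> List[Path]:
    """Find julynter-labextension-*.tgz in <prefix>/share/jupyter/lab/extensions."""
    folder = Path(prefix) / "share" / "jupyter" / "lab" / "extensions"
    if not folder.is_dir():
        return []
    return sorted(folder.glob("julynter-labextension-*.tgz"))


if __name__ == "__main__":
    # Dry run by default; pass --delete to actually remove the archives.
    delete = "--delete" in sys.argv[1:]
    for archive in julynter_archives():
        print(("removing " if delete else "would remove ") + str(archive))
        if delete:
            archive.unlink()
```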

Supporting other programming languages

By default, Julynter works with Python and supports some basic checks for other programming languages. However, it is possible to create a custom linting kernel using a Comm.

To do it, first create a Comm in the desired programming language using the name julynter.comm, specify the receive callback and send a message with the following content:

{"operation": "init"}

The interface may send two types of operations:

  • query

    {
        "operation": "query",
        "requirements": "<requirements.txt file>"
    }
    

    This operation expects the Julynter kernel to perform the linting.

  • addModule

    {
        "operation": "addModule",
        "requirements": "<requirements.txt file>",
        "module": "<module name>"
    }
    

    This operation expects the Julynter kernel to add a module to the requirements file and perform the linting.

After processing an operation, the kernel should reply with the linting result:

{   // consider all the numbers as the cell execution count number
    "operation": "queryResult",
    "executed_code": {1: "<code1>", 2: "<code2>", ...}, // Cell history based on the execution count
    "cell_dependencies": { // Map of maps
        3: {
            "a": 1, // cell 3 depends on the definition of 'a' in cell 1 
            "b": 2
        }
    },
    "missing_dependencies": {
        1: ["c", "d"] // 'c' and 'd' appear on cell 1, but they were not defined
    },
    "absolute_paths": {
        1: ["/home/"] // cell 1 has the absolute path /home
    },
    "has_imports": [2, 3], // execution counts with imports
    "missing_requirements": { // Map of maps
        2: {
            "numpy": { // Module numpy that is imported by cell 2 is not on requirements
                "status": 3,
                "msg": "Module not found on requirements.txt"
            }
        }
    }
}
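As an illustration of how a kernel might fill some of these fields, here is a simplified Python sketch built on the `ast` module. It is not the analysis performed by julynter.kernel: `query_result` and `_names` are hypothetical names, scoping is ignored (a name defined anywhere in a cell, including function parameters, counts as defined by that cell), a string literal starting with "/" is treated as an absolute path, and `missing_requirements` is left empty because it would need the requirements file.

```python
import ast
import builtins
from typing import Dict, List, Set, Tuple

BUILTINS = set(dir(builtins))


def _names(tree: ast.AST) -> Tuple[Set[str], Set[str], bool, List[str]]:
    """Collect loaded names, defined names, an import flag and absolute paths."""
    loads: Set[str] = set()
    stores: Set[str] = set()
    has_import = False
    paths: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loads.add(node.id)
            elif isinstance(node.ctx, ast.Store):
                stores.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            stores.add(node.name)
        elif isinstance(node, ast.arg):
            stores.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            has_import = True
            for alias in node.names:
                if alias.name != "*":
                    stores.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if node.value.startswith("/") and len(node.value) > 1:
                paths.append(node.value)
    return loads, stores, has_import, paths


def query_result(executed_code: Dict[int, str]) -> Dict:
    """Build a simplified queryResult message from the cell history."""
    result: Dict = {
        "operation": "queryResult",
        "executed_code": executed_code,
        "cell_dependencies": {},
        "missing_dependencies": {},
        "absolute_paths": {},
        "has_imports": [],
        "missing_requirements": {},  # not computed in this sketch
    }
    defined_in: Dict[str, int] = {}  # name -> execution count of its last definition
    for count in sorted(executed_code):
        loads, stores, has_import, paths = _names(ast.parse(executed_code[count]))
        for name in sorted(loads - stores):
            if name in defined_in:
                result["cell_dependencies"].setdefault(count, {})[name] = defined_in[name]
            elif name not in BUILTINS:
                result["missing_dependencies"].setdefault(count, []).append(name)
        if has_import:
            result["has_imports"].append(count)
        if paths:
            result["absolute_paths"][count] = paths
        for name in stores:
            defined_in[name] = count
    return result
```

Like the example message, the sketch keys the maps by execution count as integers; serializing it with `json.dumps` would turn those keys into strings.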

If an error occurs during linting, the kernel may reply with an error:

{
    "operation": "error",
    "command": "<operation that caused the error>",
    "message": "<error message>"
}
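Putting the protocol together, the following is a minimal Python sketch of a kernel-side message loop using ipykernel's `Comm` class. It is not the julynter.kernel implementation: `handle_message`, `add_module`, `register`, and the `lint` callback are hypothetical names, and the sketch assumes the "requirements" field carries the path of the requirements file (the description above does not say whether it is a path or the file contents). A kernel for another language would follow the same structure with that language's Comm API.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical linting function: receives the requirements file path and
# returns a queryResult dictionary like the one shown above.
Lint = Callable[[str], Dict]


def add_module(requirements: str, module: str) -> None:
    """Append `module` as a new line of the requirements file (created if absent)."""
    path = Path(requirements)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    separator = "" if not existing or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + module + "\n", encoding="utf-8")


def handle_message(content: Dict, lint: Lint) -> Dict:
    """Turn one message from the Julynter interface into a reply."""
    operation = content.get("operation")
    try:
        if operation == "addModule":
            add_module(content["requirements"], content["module"])
            return lint(content["requirements"])
        if operation == "query":
            return lint(content["requirements"])
        raise ValueError(f"unsupported operation: {operation!r}")
    except Exception as error:  # any failure is reported back as an "error" reply
        return {"operation": "error", "command": operation, "message": str(error)}


def register(lint: Lint) -> None:
    """Open the julynter.comm Comm; must run inside an ipykernel process."""
    from ipykernel.comm import Comm

    comm = Comm(target_name="julynter.comm")

    def receive(msg: Dict) -> None:
        comm.send(handle_message(msg["content"]["data"], lint))

    comm.on_msg(receive)
    comm.send({"operation": "init"})
```

Keeping `handle_message` free of any Comm dependency makes the dispatch logic testable outside a running kernel; only `register` needs ipykernel.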

After defining the Comm, you may set the Julynter interface to automatically execute your Julynter kernel for a programming language, by configuring the Julynter options in ~/.julynter/config.json:

{
    ...
    "options": {
        ...
        "kernel": {
            "order": ["<newkernel>", "python", "default"],
            "values": {
                "<newkernel>": {
                    "language": "<regex to identify language>",
                    "initScript": "<command to create Julynter kernel Comm>",
                    "name": "<kernel name>"
                },
                "python": {
                    "language": "python",
                    "initScript": "import julynter.kernel; julynter.kernel.init()",
                    "name": "python"
                },
                "default": {
                    "kernel": ".*",
                    "initScript": null,
                    "name": "default"
                }
            }
        }
    }
}
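One plausible reading of the "order" and "values" options is that the interface walks "order" and uses the first entry whose regex matches the notebook's kernel language. The sketch below encodes that reading as an assumption: `select_kernel` is a hypothetical name, it uses a full regex match, and because the example stores the regex under "language" for most entries but under "kernel" for "default", it accepts either key.

```python
import re
from typing import Dict, Optional


def select_kernel(options: Dict, language: str) -> Optional[Dict]:
    """Return the first entry in kernel.order whose regex matches `language`."""
    kernels = options["kernel"]
    for key in kernels["order"]:
        entry = kernels["values"][key]
        # The example stores the regex under "language", except for "default",
        # which uses "kernel"; this sketch accepts either key.
        pattern = entry.get("language", entry.get("kernel"))
        if pattern is not None and re.fullmatch(pattern, language):
            return entry
    return None
```

With the order shown above, a Python notebook would select the "python" entry, and any language without a dedicated entry would fall through to "default", whose initScript is null.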

Contributing

Pull requests for bugfixes and new features are welcome!

Development install

Note: You will need NodeJS to build the extension package.

The jlpm command is JupyterLab's pinned version of yarn that is installed with JupyterLab. You may use yarn or npm in lieu of jlpm below.

# Clone the repo to your local environment
# Change directory to the julynter directory

# Install package in development mode
pip install -e .
# Link your development version of the extension with JupyterLab
jupyter labextension develop . --overwrite
# Server extension must be manually installed in develop mode
jupyter server extension enable julynter
# Rebuild extension Typescript source after making changes
jlpm run build

You can watch the source directory and run JupyterLab at the same time in different terminals to watch for changes in the extension's source and automatically rebuild the extension.

# Watch the source directory in one terminal, automatically rebuilding when needed
jlpm run watch
# Run JupyterLab in another terminal
jupyter lab

With the watch command running, every saved change will immediately be built locally and available in your running JupyterLab. Refresh JupyterLab to load the change in your browser (you may need to wait several seconds for the extension to be rebuilt).

By default, the jlpm run build command generates the source maps for this extension to make it easier to debug using the browser dev tools. To also generate source maps for the JupyterLab core extensions, you can run the following command:

jupyter lab build --minimize=False

Development uninstall

# Server extension must be manually disabled in develop mode
jupyter server extension disable julynter
pip uninstall julynter

In development mode, you will also need to remove the symlink created by jupyter labextension develop command. To find its location, you can run jupyter labextension list to figure out where the labextensions folder is located. Then you can remove the symlink named julynter within that folder.

Packaging the extension

See RELEASE



Download files


Source Distribution

julynter-0.4.2.tar.gz (297.3 kB)

Uploaded Source

Built Distribution

julynter-0.4.2-py3-none-any.whl (269.5 kB)

Uploaded Python 3

File details

Details for the file julynter-0.4.2.tar.gz.

File metadata

  • Download URL: julynter-0.4.2.tar.gz
  • Upload date:
  • Size: 297.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for julynter-0.4.2.tar.gz
Algorithm Hash digest
SHA256 838f350463e9964401bc3e9cede1930a35f1c35dccc780de19681d2cc0a9828c
MD5 d86714f7dabbc65a7ef9a713909e419a
BLAKE2b-256 2801c7d3ea97b06a65b80ac88398181007d8fe54a64b8d3397b89e571cb03aa4


File details

Details for the file julynter-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: julynter-0.4.2-py3-none-any.whl
  • Upload date:
  • Size: 269.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for julynter-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 5ffd04e8d2a1ea3104c91e070825a037f653ec3b2a898e94f866208dce5c1c83
MD5 de973d0a1dd6c3469c0dbad198be3ede
BLAKE2b-256 81c3b3a34e3be734964dc73ba5ff707b877b3bd3cd049f09ab73eb4e28acfc33
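To check a downloaded file against the SHA256 digests listed above, a short script with `hashlib` is enough. The helper names `sha256_of` and `verify` are illustrative; the digests are copied from the tables above. Alternatively, pip's hash-checking mode can do the same at install time: pin the package in a requirements file with a `--hash=sha256:<digest>` option and install with `pip install --require-hashes -r requirements.txt`.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for julynter 0.4.2.
EXPECTED_SHA256 = {
    "julynter-0.4.2.tar.gz": "838f350463e9964401bc3e9cede1930a35f1c35dccc780de19681d2cc0a9828c",
    "julynter-0.4.2-py3-none-any.whl": "5ffd04e8d2a1ea3104c91e070825a037f653ec3b2a898e94f866208dce5c1c83",
}


def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Compare a downloaded file against the published digest for its file name."""
    return sha256_of(path) == EXPECTED_SHA256[Path(path).name]
```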

