
A linter for Jupyter notebooks written in Python.


Pynblint


Many professional data scientists use Jupyter Notebook to accomplish their daily tasks, from preliminary data exploration to model prototyping. Notebooks' interactivity is particularly convenient for data-centric programming; moreover, their self-documenting nature greatly simplifies and enhances the communication of analytical results.

However, Jupyter Notebook has often been criticized for offering little native support for software engineering best practices and for encouraging bad programming habits. To really benefit from computational notebooks, practitioners need to be aware of their common pitfalls and learn how to avoid them.

In our paper "Eliciting Best Practices for Collaboration with Computational Notebooks" [1], we introduced a catalog of validated best practices for the collaborative use of notebooks in professional contexts.

To raise awareness of these best practices and promote their adoption, we have created Pynblint, a static analysis tool for Jupyter notebooks written in Python. Pynblint can be run as a standalone CLI application or as part of a CI/CD pipeline. It reveals potential defects in the Jupyter notebooks found in software repositories and recommends corrective actions.

The core linting rules that power Pynblint have been derived as operationalizations of the validated best practices from our catalog. Nonetheless, the tool is designed to be easily customized and extended with new rules.

Catalog of best practices

  • Use version control
  • Manage project dependencies
  • Use self-contained environments
  • Put imports at the beginning
  • Ensure re-executability (re-run notebooks top to bottom)
  • Modularize your code
  • Test your code
  • Name your notebooks consistently
  • Stick to coding standards
  • Use relative paths
  • Document your analysis
  • Leverage Markdown headings to structure your notebook
  • Keep your notebook clean
  • Keep your notebook concise
  • Distinguish production and development artifacts
  • Make your notebooks available
  • Make your data available
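To give a feel for how a practice from this catalog can be turned into an automated check, here is a minimal sketch covering two of them: "Put imports at the beginning" and "Ensure re-executability". It is not Pynblint's actual implementation. The function names (`code_cells`, `has_import`, `imports_outside_first_cell`, `executed_in_order`) are hypothetical, and the sketch assumes only the standard `.ipynb` JSON layout, in which each cell has a `cell_type`, a `source`, and (for code cells) an `execution_count`.

```python
import ast
from typing import List


def code_cells(notebook: dict) -> List[str]:
    """Return the source of every code cell, in notebook order."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            sources.append("".join(source) if isinstance(source, list) else source)
    return sources


def has_import(source: str) -> bool:
    """Return True if the cell source contains an import statement."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Cells with IPython magics are not plain Python; this sketch skips them.
        return False
    return any(isinstance(node, (ast.Import, ast.ImportFrom)) for node in ast.walk(tree))


def imports_outside_first_cell(notebook: dict) -> List[int]:
    """Indices (among code cells) of cells after the first that contain imports."""
    return [i for i, src in enumerate(code_cells(notebook)) if i > 0 and has_import(src)]


def executed_in_order(notebook: dict) -> bool:
    """Return True if the recorded execution counts never decrease top to bottom."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    counts = [c for c in counts if c is not None]  # ignore cells never executed
    return counts == sorted(counts)


# A small in-memory notebook used only for illustration.
example = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": ["import math\n", "x = 1\n"]},
        {"cell_type": "markdown", "source": ["# Analysis\n"]},
        {"cell_type": "code", "execution_count": 3, "source": "import os\nprint(x)\n"},
        {"cell_type": "code", "execution_count": 2, "source": ["y = x + 1\n"]},
    ]
}

print(imports_outside_first_cell(example))  # [1]
print(executed_in_order(example))  # False
```

To try the same checks on a real file, the notebook can be loaded with `json.load` from the standard library and passed in as a dictionary. A production linter would need to handle more cases than this sketch does, for example imports spread over several leading cells.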

Installation

To use Pynblint, clone this repository and install it with Poetry:

poetry install --no-dev

To install Pynblint for development purposes, simply omit the --no-dev option:

poetry install

At present, we are finalizing the first version of Pynblint (v0.1.0). When released, it will become available as a Python package on PyPI and installable via pip.

Usage

Once installed, Pynblint can be used to analyze:

  • a single notebook:

    pynblint path/to/the/notebook.ipynb
    
  • the set of notebooks found in the current working directory:

    pynblint .
    
  • the set of notebooks found in the directory located at the specified path:

    pynblint path/to/the/project/dir/
    
  • the set of notebooks found in a compressed .zip archive:

    pynblint path/to/the/compressed/archive.zip
    
  • the set of notebooks found in a public GitHub repository (support for private repositories is on our roadmap 🙂):

    pynblint --from-github https://github.com/collab-uniba/pynblint
    

For further information on the available options, please read Pynblint's CLI manual:

pynblint --help
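As an illustration of what "the set of notebooks found in" a directory or a .zip archive might mean in practice, the following sketch collects `.ipynb` files from either kind of target. It is an assumption-laden example rather than Pynblint's own discovery logic: the `find_notebooks` helper is hypothetical, and skipping `.ipynb_checkpoints` folders is a choice made here, not documented Pynblint behavior.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List

CHECKPOINT_DIR = ".ipynb_checkpoints"  # autosave folder created by Jupyter


def find_notebooks(target: str) -> List[str]:
    """List notebook paths under a directory, inside a .zip, or a single file."""
    path = Path(target)
    if path.is_dir():
        relative = (p.relative_to(path) for p in path.rglob("*.ipynb"))
        return sorted(p.as_posix() for p in relative if CHECKPOINT_DIR not in p.parts)
    if path.suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            return sorted(
                name
                for name in archive.namelist()
                if name.endswith(".ipynb") and CHECKPOINT_DIR not in name.split("/")
            )
    if path.suffix == ".ipynb":
        return [path.as_posix()]
    raise ValueError(f"Unsupported target: {target}")


# Build a throwaway project and archive to exercise both branches.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "project"
    (root / "analysis").mkdir(parents=True)
    (root / "analysis" / "eda.ipynb").write_text("{}")
    (root / "README.md").write_text("demo")
    (root / CHECKPOINT_DIR).mkdir()
    (root / CHECKPOINT_DIR / "eda-checkpoint.ipynb").write_text("{}")

    archive_path = Path(tmp) / "project.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.write(root / "analysis" / "eda.ipynb", "analysis/eda.ipynb")

    from_dir = find_notebooks(str(root))
    from_zip = find_notebooks(str(archive_path))

print(from_dir)  # ['analysis/eda.ipynb']
print(from_zip)  # ['analysis/eda.ipynb']
```

Returning paths relative to the target keeps the directory and archive cases comparable, which is convenient if the results are later reported per notebook.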

References

[1] Luigi Quaranta, Fabio Calefato, and Filippo Lanubile. 2022. Eliciting Best Practices for Collaboration with Computational Notebooks. Proc. ACM Hum.-Comput. Interact. 6, CSCW1, Article 87 (April 2022), 41 pages. https://doi.org/10.1145/3512934

