A linter for Jupyter notebooks written in Python.
Pynblint
Many professional data scientists use Jupyter Notebook to accomplish their daily tasks, from preliminary data exploration to model prototyping. Notebooks' interactivity is particularly convenient for data-centric programming; moreover, their self-documenting nature greatly simplifies and enhances the communication of analytical results.
However, Jupyter Notebook has often been criticized for offering little native support for software engineering best practices and for encouraging bad programming habits. To benefit fully from computational notebooks, practitioners need to be aware of their common pitfalls and learn how to avoid them.
In our paper "Eliciting Best Practices for Collaboration with Computational Notebooks" [1], we introduced a catalog of validated best practices for the collaborative use of notebooks in professional contexts.
To raise awareness of these best practices and promote their adoption, we have created Pynblint, a static analysis tool for Jupyter notebooks written in Python. Pynblint can be operated as a standalone CLI application or as part of a CI/CD pipeline. It reveals potential defects of Jupyter notebooks found in software repositories and recommends corrective actions.
The core linting rules that power Pynblint have been derived as operationalizations of the validated best practices from our catalog. Nonetheless, the tool is designed to be easily customized and extended with new rules.
Catalog of best practices
- Use version control
- Manage project dependencies
- Use self-contained environments
- Put imports at the beginning
- Ensure re-executability (re-run notebooks top to bottom)
- Modularize your code
- Test your code
- Name your notebooks consistently
- Stick to coding standards
- Use relative paths
- Document your analysis
- Leverage Markdown headings to structure your notebook
- Keep your notebook clean
- Keep your notebook concise
- Distinguish production and development artifacts
- Make your notebooks available
- Make your data available
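To make two of these practices concrete ("Put imports at the beginning" and "Ensure re-executability"), here is a minimal sketch of how such checks could be expressed against the raw `.ipynb` JSON. It is not Pynblint's implementation or API: the function names are hypothetical, the notebook is assumed to follow the nbformat 4 layout (a top-level `cells` list with `cell_type`, `source`, and `execution_count`), and "imports at the beginning" is simplified to "imports only in the first code cell".

```python
from __future__ import annotations

import ast
import json


def load_notebook(path: str) -> dict:
    """Read an .ipynb file (plain JSON) into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def code_cells(notebook: dict) -> list[dict]:
    """Return only the code cells, in document order."""
    return [c for c in notebook.get("cells", []) if c.get("cell_type") == "code"]


def cell_source(cell: dict) -> str:
    """The source may be stored as a string or a list of lines; normalize to str."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def has_linear_execution_order(notebook: dict) -> bool:
    """True if the executed cells carry strictly increasing execution counts."""
    counts = [c.get("execution_count") for c in code_cells(notebook)]
    executed = [n for n in counts if n is not None]
    return all(a < b for a, b in zip(executed, executed[1:]))


def imports_beyond_first_cell(notebook: dict) -> list[int]:
    """Indices (among code cells) of later cells that contain import statements."""
    offenders = []
    for index, cell in enumerate(code_cells(notebook)):
        if index == 0:
            continue
        try:
            tree = ast.parse(cell_source(cell))
        except SyntaxError:
            # Cells with IPython magics are not plain Python; this sketch skips them.
            continue
        if any(isinstance(n, (ast.Import, ast.ImportFrom)) for n in ast.walk(tree)):
            offenders.append(index)
    return offenders


# Hypothetical in-memory notebook, used only to illustrate the two checks.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis"]},
        {"cell_type": "code", "execution_count": 1, "source": ["import math\n"]},
        {"cell_type": "code", "execution_count": 3,
         "source": ["import os\n", "print(os.getcwd())\n"]},
        {"cell_type": "code", "execution_count": 2, "source": ["x = math.sqrt(2)\n"]},
    ]
}

linear = has_linear_execution_order(example)
late_imports = imports_beyond_first_cell(example)
```

In this example the cells were executed out of order (1, 3, 2) and the second code cell introduces a new import, so both checks flag the notebook. A real linter would need to handle more cases, such as magics and imports inside functions.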
Installation
To use Pynblint, clone this repository and install it with Poetry:
poetry install --no-dev
To install Pynblint for development purposes, simply omit the --no-dev option:
poetry install
At present, we are finalizing the first version of Pynblint (v0.1.0). When released, it will become available as a Python package on PyPI and installable via pip.
Usage
Once installed, Pynblint can be used to analyze:
- a single notebook:
  pynblint path/to/the/notebook.ipynb
- the set of notebooks found in the current working directory:
  pynblint .
- the set of notebooks found in the directory located at the specified path:
  pynblint path/to/the/project/dir/
- the set of notebooks found in a compressed .zip archive:
  pynblint path/to/the/compressed/archive.zip
- the set of notebooks found in a public GitHub repository (support for private repositories is on our roadmap 🙂):
  pynblint --from-github https://github.com/collab-uniba/pynblint
For further information on the available options, please read Pynblint's CLI manual:
pynblint --help
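When Pynblint is run from a script, for instance as a step in a CI/CD pipeline, the invocations listed in this section can be assembled programmatically. The following is a minimal sketch, assuming the `pynblint` executable is on the `PATH`. The helper names `build_pynblint_command` and `run_pynblint` are hypothetical, and only the command forms documented in this section are used. How Pynblint's exit code reflects linting results is not stated here, so the sketch returns the completed process without interpreting it.

```python
from __future__ import annotations

import shutil
import subprocess


def build_pynblint_command(target: str, from_github: bool = False) -> list[str]:
    """Assemble the argument list for one of the documented invocations.

    `target` is a notebook path, a directory, a .zip archive, or, when
    `from_github` is True, the URL of a public GitHub repository.
    """
    command = ["pynblint"]
    if from_github:
        command.append("--from-github")
    command.append(target)
    return command


def run_pynblint(target: str, from_github: bool = False) -> subprocess.CompletedProcess:
    """Run Pynblint and return the completed process (stdout, stderr, return code)."""
    if shutil.which("pynblint") is None:
        raise FileNotFoundError("pynblint executable not found on PATH")
    return subprocess.run(
        build_pynblint_command(target, from_github),
        capture_output=True,
        text=True,
        check=False,  # leave exit-code handling to the caller
    )


# Building the commands does not require Pynblint to be installed.
local_command = build_pynblint_command(".")
github_command = build_pynblint_command(
    "https://github.com/collab-uniba/pynblint", from_github=True
)
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems with paths that contain spaces.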
References
[1] Luigi Quaranta, Fabio Calefato, and Filippo Lanubile. 2022. Eliciting Best Practices for Collaboration with Computational Notebooks. Proc. ACM Hum.-Comput. Interact. 6, CSCW1, Article 87 (April 2022), 41 pages. https://doi.org/10.1145/3512934