
Wipe clean your Jupyter Notebooks!


nbwipers

License: MIT

nbwipers is a command-line tool, written in Rust, that wipes Jupyter notebooks clean.

The interface and functionality are based on nbstripout, and the idea of implementing it in Rust comes from nbstripout-fast.

Usage

nbwipers has a few subcommands that provide functionality related to cleaning Jupyter notebooks.

  • clean: clean a single notebook. This is more or less equivalent to nbstripout.
  • check: check notebooks in a given path for elements that would be removed by clean. This can be used in a CI context to enforce clean notebooks.
  • clean-all: clean all notebooks in a given path. Use this one carefully!
  • install: register nbwipers as a git filter for ipynb files. Equivalent to nbstripout --install.
  • uninstall: remove nbwipers as a git filter.
  • check-install: check that nbwipers or nbstripout is installed in the local repo. This is used in the pre-commit hook.
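
As a rough illustration of what check looks for, the following Python sketch reports the elements of a notebook that a clean would remove. It is not how nbwipers is implemented (nbwipers is written in Rust and is configurable); `find_unclean_elements` and `check_path` are hypothetical helpers, and the sketch assumes that only outputs and execution counts of code cells matter.

```python
import json
from pathlib import Path


def find_unclean_elements(nb: dict) -> list[str]:
    """List code-cell elements (outputs, execution counts) that a clean would remove."""
    findings = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no outputs
        if cell.get("outputs"):
            findings.append(f"cell {index}: outputs")
        if cell.get("execution_count") is not None:
            findings.append(f"cell {index}: execution_count")
    return findings


def check_path(root: str) -> dict[str, list[str]]:
    """Map each notebook under root that has findings to its list of findings."""
    report = {}
    for path in sorted(Path(root).rglob("*.ipynb")):
        notebook = json.loads(path.read_text(encoding="utf-8"))
        findings = find_unclean_elements(notebook)
        if findings:
            report[str(path)] = findings
    return report
```

In a CI job built on this sketch, a non-empty report from `check_path(".")` would be treated as a failure.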

The full options can be found in CommandLineHelp.md.

Examples

To set up nbwipers as a git filter in your repository, run:

nbwipers install local

To check the notebooks in the current folder, run:

nbwipers check .

pre-commit

As a friendly reminder, you can add the following to your .pre-commit-config.yaml file to ensure that nbwipers or nbstripout is installed in your repo:

  - repo: https://github.com/felixgwilliams/nbwipers
    rev: v0.3.0
    hooks:
      - id: nbwipers-check-install

Motivation

A working copy of a Jupyter notebook contains

  1. Code written by the author
  2. Notebook outputs: tables, logs, tracebacks, images, widgets and so on...
  3. Execution counts
  4. Metadata, such as whether cells are collapsed, scrollable etc.

Of these categories of data, only the first (code written by the author) should definitely be tracked by version control, since it is the product of the author's intention and hard work. The other categories change without any explicit intention on the author's part, and are generally noise from a version control perspective.

Moreover, including notebook outputs in version control

  • makes diffs harder to interpret, as they will contain lots of unintended changes
  • increases the risk of a tricky merge conflict if different users run the same cell and get a slightly different result
  • increases the amount of data committed, which can degrade repository performance
  • risks leaking sensitive data

By using nbwipers or nbstripout as a git filter, the problematic parts of the notebook are removed from the version of the file that git sees, while leaving your working copy intact.
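
To make the effect concrete, here is a minimal Python sketch of the kind of stripping a clean filter performs on the notebook JSON. It is an illustration only, not nbwipers' actual implementation; the function name `strip_notebook` and the choice of `collapsed` and `scrolled` as the metadata keys to drop are assumptions made for this example.

```python
import copy
import json

# Cell metadata keys treated as noise in this sketch
# (an assumption, not nbwipers' actual default list).
NOISY_CELL_METADATA = ("collapsed", "scrolled")


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook (nbformat 4 JSON loaded as a dict) without
    outputs, execution counts, or the noisy cell metadata keys listed above."""
    cleaned = copy.deepcopy(nb)  # the caller's working copy stays intact
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        metadata = cell.get("metadata", {})
        for key in NOISY_CELL_METADATA:
            metadata.pop(key, None)
    return cleaned


example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {"collapsed": True},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# The author's code survives; outputs, counts and noisy metadata do not.
print(json.dumps(strip_notebook(example)["cells"][0], indent=1))
```

The deep copy mirrors the behaviour described above: the stripped version is what would be handed to git, while the original dictionary, standing in for the working copy, is left unchanged.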

Note on pre-commit for Windows users

Using the above pre-commit hook requires compiling nbwipers from source. If you do not have Rust installed on your system, pre-commit can create a temporary Rust installation to do the compilation, but this can fail on Windows if the prerequisites for the MSVC toolchain are not met.

Running the rustup installer gives you the opportunity to install the prerequisites automatically. Once you have done this and have a working Rust installation, pre-commit will use it to compile.

An alternative is to install Rust with rustup, but customize the installation to specify x86_64-pc-windows-gnu as the default host triple. This way you do not need to install the multi-gigabyte MSVC dependencies.

Testing Coverage

To test coverage, use the command:

cargo tarpaulin -o stdout -o html -o lcov --engine llvm

Using the llvm engine means that integration tests contribute to coverage.

Acknowledgements

nbwipers relies on inspiration and code from several projects. For the projects whose code was used, please see LICENSE for the third-party notices.

nbstripout

strip output from Jupyter and IPython notebooks

nbstripout is an invaluable tool for working with Jupyter Notebooks in the context of version control. This project forms the basis of the interface and logic of this project and is also the source of the testing examples.

nbstripout-fast

A much faster version of nbstripout by writing it in rust (of course).

nbstripout-fast, like this project, implements the functionality of nbstripout in Rust, while also allowing repo-level configuration in a YAML file.

With nbwipers I hoped to recreate the idea of nbstripout-fast, but with the ability to install as a git filter, and configuration via pyproject.toml.

ruff

An extremely fast Python linter and code formatter, written in Rust.

Ruff is quickly becoming the go-to linter for Python code, thanks to its performance, extensive set of rules and ease of use. It was a definite source of knowledge for the organisation of the configuration and the file discovery. The schema for Jupyter Notebooks and some of the file discovery code were adapted from Ruff.
