Wipe clean your Jupyter Notebooks!

nbwipers

nbwipers is a command line tool to wipe clean Jupyter notebooks, written in Rust.

The interface and functionality are based on nbstripout, and the idea to implement it in Rust comes from nbstripout-fast.

Usage

nbwipers has a few subcommands that provide functionality related to cleaning Jupyter notebooks.

  • clean: clean a single notebook. This is more or less equivalent to nbstripout.
  • check: check notebooks in a given path for elements that would be removed by clean. This could be used in a CI context to enforce clean notebooks.
  • clean-all: clean all notebooks in a given path. This one should be used carefully!
  • install: register nbwipers as a git filter for ipynb files. Equivalent to nbstripout --install.
  • uninstall: remove nbwipers as a git filter.
  • check-install: check that nbwipers or nbstripout is installed in the local repo. This is used in the pre-commit hook.

The full options can be found in CommandLineHelp.md.

Examples

To set up nbwipers as a git filter in your repository, run:

nbwipers install local

If this step is performed on a pre-existing repo, you can touch your notebooks so that git can detect the changes. In bash:

for f in $(git ls-files '*.ipynb'); do touch "$f"; done

To check the notebooks in the current folder, run:

nbwipers check .
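To make concrete what such a check looks for, here is a minimal Python sketch. It is not nbwipers' implementation (which is written in Rust), and the helper names dirty_cells, check_path and code_cell are hypothetical. It assumes that a notebook "needs cleaning" when a code cell still has outputs or an execution count.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def dirty_cells(nb: dict) -> List[int]:
    """Indices of code cells that still carry outputs or an execution count."""
    return [
        i
        for i, cell in enumerate(nb.get("cells", []))
        if cell.get("cell_type") == "code"
        and (cell.get("outputs") or cell.get("execution_count") is not None)
    ]


def check_path(root: Path) -> Dict[str, List[int]]:
    """Map each notebook under root that needs cleaning to its dirty cell indices."""
    report = {}
    for path in sorted(root.rglob("*.ipynb")):
        nb = json.loads(path.read_text(encoding="utf-8"))
        found = dirty_cells(nb)
        if found:
            report[path.relative_to(root).as_posix()] = found
    return report


def code_cell(outputs: list, count) -> dict:
    """Build a minimal code cell for the demonstration below."""
    return {"cell_type": "code", "source": ["1 + 1"], "metadata": {},
            "outputs": outputs, "execution_count": count}


# Demonstration on two throwaway notebooks: one executed, one pristine.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    executed = {"cells": [code_cell([{"output_type": "execute_result"}], 1)]}
    pristine = {"cells": [code_cell([], None)]}
    (root / "dirty.ipynb").write_text(json.dumps(executed), encoding="utf-8")
    (root / "clean.ipynb").write_text(json.dumps(pristine), encoding="utf-8")
    report = check_path(root)

print(report)  # {'dirty.ipynb': [0]}
```

A CI job built on this idea would exit with a non-zero status whenever the report is non-empty.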

pre-commit

You can add the following to your .pre-commit-config.yaml file to ensure that nbwipers or nbstripout is installed in your repo, so that Jupyter notebook outputs are not committed to version control.

  - repo: https://github.com/felixgwilliams/nbwipers-pre-commit
    rev: v0.3.4
    hooks:
      - id: nbwipers-check-install

Alternatively, you can use the URL for this repo in your config, but this will compile nbwipers from source, rather than retrieving the binary from PyPI, and is therefore not recommended.

If you run your pre-commit configuration as part of CI, you should set the environment variable NBWIPERS_CHECK_INSTALL_EXIT_ZERO, which forces this check to pass, since you do not need nbwipers configured in your CI environment.

Motivation

A working copy of a Jupyter notebook contains:

  1. Code written by the author.
  2. Notebook outputs: tables, logs, tracebacks, images, widgets and so on...
  3. Execution counts.
  4. Metadata, such as whether cells are collapsed, scrollable etc.

Of these categories of data, only the first — code written by the author — should definitely be tracked by version control, since it is the product of the author's intention and hard work. The other categories of data are subject to change outside of the explicit intentions of the author and are generally noisy from a version control perspective.
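The four categories map directly onto fields of the notebook's JSON. The following Python sketch shows one way to keep only the first category. It is an illustration under assumptions, not nbwipers' actual logic: the function name strip_notebook is hypothetical, and the metadata keys it drops (collapsed, scrolled) are just two examples.

```python
import copy
import json


def strip_notebook(nb: dict) -> dict:
    """Return a copy of the notebook with outputs and execution counts removed."""
    cleaned = copy.deepcopy(nb)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # category 2: notebook outputs
            cell["execution_count"] = None  # category 3: execution counts
        # category 4: drop two purely presentational metadata keys (illustrative choice)
        for key in ("collapsed", "scrolled"):
            cell.get("metadata", {}).pop(key, None)
    return cleaned


example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {"collapsed": False},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = strip_notebook(example)
print(json.dumps(cleaned["cells"][0], indent=1))
```

The source of the cell survives unchanged, while everything that depends on when and where the notebook was run is gone.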

Moreover, including notebook outputs in version control

  • makes diffs harder to interpret, as they will contain lots of unintended changes.
  • increases the risk of a tricky merge conflict if different users run the same cell and get a slightly different result.
  • increases the amount of data committed, which can degrade repository performance.
  • risks leaking sensitive data.

An effective way to ensure you do not commit problematic parts of your notebooks is to use nbwipers or nbstripout as a git filter.

A git filter sits between your actual files and what git sees when you stage and commit your changes. This way, git only sees the transformed version of the file without the problematic elements. At the same time, you do not have to lose them from your local copy.

An exception is when you check out a branch or do a git pull that results in changes to the notebook. In this case, your local copy will be replaced by the clean version and you will lose your cell outputs.

Configuration

Most of the command line options can be set per project in a pyproject.toml, nbwipers.toml or .nbwipers.toml file. If you use pyproject.toml, put the configuration under [tool.nbwipers]. If you use nbwipers.toml or .nbwipers.toml, put the configuration at the top level.

For example, you can use extra-keys to specify additional notebook elements you want to strip. If you don't need the Python version or the details about the Jupyter kernel, you can include the following in your pyproject.toml file:

[tool.nbwipers]
extra-keys = ["metadata.kernelspec", "metadata.language_info.version"]

The equivalent for nbwipers.toml or .nbwipers.toml is just

extra-keys = ["metadata.kernelspec", "metadata.language_info.version"]

This can be useful when collaborating, as the precise Python version and the name assigned to the kernel are ephemeral and can change from person to person.
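To illustrate what a dotted key such as metadata.language_info.version refers to, here is a small Python sketch that removes such paths from a notebook's JSON. The helper pop_dotted_key is hypothetical and only handles notebook-level keys; how nbwipers resolves extra-keys internally is not shown here.

```python
def pop_dotted_key(data: dict, dotted: str) -> None:
    """Remove the entry addressed by a dotted path, if it exists."""
    *parents, leaf = dotted.split(".")
    node = data
    for part in parents:
        node = node.get(part)
        if not isinstance(node, dict):
            return  # path not present; nothing to remove
    node.pop(leaf, None)


notebook = {
    "cells": [],
    "metadata": {
        "kernelspec": {"display_name": "my-env", "language": "python", "name": "my-env"},
        "language_info": {"name": "python", "version": "3.11.4"},
    },
}

for key in ["metadata.kernelspec", "metadata.language_info.version"]:
    pop_dotted_key(notebook, key)

print(notebook["metadata"])  # {'language_info': {'name': 'python'}}
```

After the two keys are removed, the remaining metadata no longer differs between collaborators who use different kernel names or Python patch versions.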

Testing Coverage

To test coverage, use the command:

cargo tarpaulin -o stdout -o html -o lcov --engine llvm

Using the llvm engine means that integration tests contribute to coverage.

Acknowledgements

nbwipers relies on inspiration and code from several projects. For the projects whose code was used, please see LICENSE for the third-party notices.

nbstripout

strip output from Jupyter and IPython notebooks

nbstripout is an invaluable tool for working with Jupyter Notebooks in the context of version control. This project forms the basis of the interface and logic of this project and is also the source of the testing examples.

nbstripout-fast

A much faster version of nbstripout by writing it in rust (of course).

nbstripout-fast, like this project, implements the functionality of nbstripout in Rust, while also allowing repo-level configuration in a YAML file.

With nbwipers I hoped to recreate the idea of nbstripout-fast, but with the ability to install as a git filter, and configuration via pyproject.toml.

ruff

An extremely fast Python linter and code formatter, written in Rust.

Ruff is quickly becoming the linter for Python code, thanks to its performance, extensive set of rules and ease of use. It was a definite source of knowledge for the organisation of the configuration and the file discovery. The schema for Jupyter Notebooks and some of the file discovery code were adapted from Ruff.

pre-commit

A framework for managing and maintaining multi-language pre-commit hooks.

This repo contains a version of the check-large-files hook that will not flag notebook files whose clean size is less than the threshold, even if the size on disk including outputs is greater than the threshold. The logic and interface of the hook were adapted from the pre-commit-hooks repository.
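The difference between on-disk size and clean size can be sketched in Python as follows. This is an assumption-laden illustration rather than the hook's code: the function names clean_size_bytes and exceeds_threshold are hypothetical, the threshold is assumed to be given in kilobytes, and sizes are measured on a compact JSON serialization instead of the file as stored.

```python
import json


def clean_size_bytes(nb: dict) -> int:
    """Size of the notebook's JSON once outputs and execution counts are dropped."""
    cells = [
        {**cell, "outputs": [], "execution_count": None}
        if cell.get("cell_type") == "code"
        else cell
        for cell in nb.get("cells", [])
    ]
    return len(json.dumps({**nb, "cells": cells}).encode("utf-8"))


def exceeds_threshold(nb: dict, max_kb: int) -> bool:
    """True if the notebook is still larger than max_kb after cleaning."""
    return clean_size_bytes(nb) > max_kb * 1024


# A tiny notebook whose single output embeds a large (fake) image payload.
notebook = {
    "cells": [{
        "cell_type": "code", "execution_count": 3, "metadata": {},
        "source": ["plot()"],
        "outputs": [{"output_type": "display_data",
                     "data": {"image/png": "A" * 200_000}, "metadata": {}}],
    }],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}

raw_size = len(json.dumps(notebook).encode("utf-8"))
print(raw_size > 100 * 1024)             # True
print(exceeds_threshold(notebook, 100))  # False
```

The notebook exceeds 100 kB with its output included, yet it would not be flagged, because the content that git actually stores after filtering is small.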
