Wipe clean your Jupyter Notebooks!

nbwipers

nbwipers is a command line tool to wipe clean Jupyter notebooks, written in Rust.

The interface and functionality are based on nbstripout, and the idea to implement it in Rust comes from nbstripout-fast.

Usage

nbwipers has a few subcommands that provide functionality related to cleaning Jupyter notebooks.

  • clean: clean a single notebook. This is more or less equivalent to nbstripout.
  • check: check notebooks in a given path for elements that would be removed by clean. This could be used in a CI context to enforce clean notebooks.
  • clean-all: clean all notebooks in a given path. Use this one carefully!
  • install: register nbwipers as a git filter for ipynb files. Equivalent to nbstripout --install.
  • uninstall: remove nbwipers as a git filter.
  • check-install: check that nbwipers or nbstripout is installed in the local repo. This is used in the pre-commit hook.

The full options can be found in CommandLineHelp.md.
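To make the idea behind check more concrete, here is a minimal Python sketch of what "elements that would be removed by clean" can look like in the notebook JSON. It is not nbwipers' implementation or output format: the function name find_unclean_elements and the report strings are hypothetical, and only outputs and execution counts are considered.

```python
import json


def find_unclean_elements(notebook: dict) -> list[str]:
    """Describe elements a cleaner would remove (simplified illustration)."""
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Only code cells carry outputs and execution counts.
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs"):
            findings.append(f"cell {index}: outputs")
        if cell.get("execution_count") is not None:
            findings.append(f"cell {index}: execution_count")
    return findings


example = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
     "execution_count": 3,
     "outputs": [{"output_type": "execute_result",
                  "data": {"text/plain": ["2"]},
                  "metadata": {}, "execution_count": 3}]}
  ],
  "metadata": {}, "nbformat": 4, "nbformat_minor": 5
}
""")

for finding in find_unclean_elements(example):
    print(finding)
```

A CI job built on this idea would fail whenever the list of findings is non-empty.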

Examples

To set up nbwipers as a git filter in your repository, use

nbwipers install local

If this step is performed on a pre-existing repo, you can touch your notebooks so that git can detect the changes. In bash:

git ls-files -z '*.ipynb' | while IFS= read -r -d '' f; do touch "$f"; done

To check the notebooks in your folder, you can run the following

nbwipers check .

pre-commit

You can add the following to your .pre-commit-config.yaml file to ensure that nbwipers or nbstripout is installed in your repo, in order to prevent Jupyter notebook outputs from being committed to version control.

  - repo: https://github.com/felixgwilliams/nbwipers-pre-commit
    rev: v0.3.4
    hooks:
      - id: nbwipers-check-install

Alternatively, you can use the URL for this repo in your config, but this will compile nbwipers from source, rather than retrieving the binary from PyPI, and is therefore not recommended.

If you run your pre-commit configuration as part of CI, you should set the environment variable NBWIPERS_CHECK_INSTALL_EXIT_ZERO, which forces this check to pass, since you do not need nbwipers configured as a git filter in your CI environment.

Configuration

Most of the command line options can be set per-project in a pyproject.toml, nbwipers.toml or .nbwipers.toml file. If you use pyproject.toml, you need to put the configuration under [tool.nbwipers]. If you use nbwipers.toml or .nbwipers.toml, the configuration needs to be at the top level.

For example, you can use strip-kernel-info to remove metadata on the Python version and the details of the Jupyter kernel.

You can also drop cell ids using id-action = "drop".

To enable these options, you can include the following in your pyproject.toml file:

[tool.nbwipers]
strip-kernel-info = true
id-action = "drop"

The equivalent for nbwipers.toml or .nbwipers.toml is just

strip-kernel-info = true
id-action = "drop"

Stripping kernel info and cell IDs can be useful when collaborating, as the precise Python version and the name assigned to the kernel are ephemeral and can change from person to person. Cell IDs are likewise generated by the tool you use and can change from person to person.
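As a rough illustration of what these two options target, the Python sketch below removes kernel details and cell IDs from a notebook dictionary. The function apply_options is hypothetical, and the exact keys are an assumption: kernelspec and language_info.version are the standard notebook metadata fields for the kernel name and Python version, but the set of keys nbwipers actually removes may differ.

```python
import copy


def apply_options(notebook: dict, strip_kernel_info: bool, drop_ids: bool) -> dict:
    """Return a copy with kernel details and/or cell IDs removed."""
    cleaned = copy.deepcopy(notebook)
    if strip_kernel_info:
        metadata = cleaned.get("metadata", {})
        # Assumed targets: kernel name/display name and the Python version.
        metadata.pop("kernelspec", None)
        metadata.get("language_info", {}).pop("version", None)
    if drop_ids:
        for cell in cleaned.get("cells", []):
            cell.pop("id", None)
    return cleaned


notebook = {
    "cells": [
        {"cell_type": "code", "id": "a1b2c3d4", "metadata": {},
         "source": ["1 + 1"], "execution_count": None, "outputs": []}
    ],
    "metadata": {
        "kernelspec": {"name": "my-env", "display_name": "Python 3 (my-env)",
                       "language": "python"},
        "language_info": {"name": "python", "version": "3.11.4"},
    },
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = apply_options(notebook, strip_kernel_info=True, drop_ids=True)
print(cleaned["metadata"])
print("id" in cleaned["cells"][0])
```

Two collaborators with different kernel names or Python patch versions would then produce the same cleaned metadata.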

Motivation

A working copy of a Jupyter notebook contains:

  1. Code written by the author.
  2. Notebook outputs: tables, logs, tracebacks, images, widgets and so on...
  3. Execution counts.
  4. Metadata, such as whether cells are collapsed, scrollable etc.

Of these categories of data, only the first — code written by the author — should definitely be tracked by version control, since it is the product of the author's intention and hard work. The other categories of data are subject to change outside of the explicit intentions of the author and are generally noisy from a version control perspective.

Moreover, including notebook outputs in version control

  • makes diffs harder to interpret, as they will contain lots of unintended changes.
  • increases the risk of a tricky merge conflict if different users run the same cell and get a slightly different result.
  • increases the amount of data committed, which can degrade repository performance.
  • risks leaking sensitive data.
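The effect on diffs and merges can be illustrated with a small Python sketch. Two working copies of the same notebook, run by different people, differ in outputs, execution counts and cell metadata, yet become identical once those parts are wiped. The function wipe is a simplified stand-in, not nbwipers' actual logic, and the sample data is made up for the example.

```python
import copy


def wipe(notebook: dict) -> dict:
    """Return a copy without outputs, execution counts and volatile cell metadata."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        metadata = cell.get("metadata", {})
        for key in ("collapsed", "scrolled"):
            metadata.pop(key, None)
    return cleaned


def make_copy(execution_count: int, printed: str, collapsed: bool) -> dict:
    """Build one person's working copy of the same one-cell notebook."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "metadata": {"collapsed": collapsed},
                "source": ["import time\n", "print(time.time())"],
                "execution_count": execution_count,
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": [printed]}
                ],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


alice = make_copy(execution_count=7, printed="1700000000.1\n", collapsed=False)
bob = make_copy(execution_count=2, printed="1700000321.9\n", collapsed=True)

print(alice == bob)              # the working copies differ
print(wipe(alice) == wipe(bob))  # the cleaned versions are identical
```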

An effective way to ensure you do not commit problematic parts of your notebooks is to use nbwipers or nbstripout as a git filter.

A git filter sits between your actual files and what git sees when you stage and commit your changes. This way, git only sees the transformed version of the file without the problematic elements. At the same time, you do not have to lose them from your local copy.

An exception is when you check out a branch or do a git pull that results in changes to the notebook. In this case, your local copy will be replaced by the clean version and you will lose your cell outputs.
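Mechanically, git runs a clean filter as a command that receives the working-tree file on standard input and writes the version to be staged on standard output. The Python sketch below imitates that contract with in-memory streams. The function clean_stream is hypothetical and far simpler than nbwipers; it only shows the shape of the transformation.

```python
import io
import json


def clean_stream(source, destination) -> None:
    """Read a notebook from `source` and write it without outputs to `destination`."""
    notebook = json.load(source)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    json.dump(notebook, destination, indent=1)
    destination.write("\n")


# Stand-ins for the pipes git would connect. A real filter would be called
# with sys.stdin and sys.stdout instead.
working_copy = io.StringIO(json.dumps({
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "execution_count": 5,
         "outputs": [{"output_type": "execute_result",
                      "data": {"text/plain": ["2"]},
                      "metadata": {}, "execution_count": 5}]}
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
}))
staged = io.StringIO()

clean_stream(working_copy, staged)
print(staged.getvalue())
```

The file on disk is never modified by the filter itself, which is why your local outputs survive staging and committing.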

Acknowledgements

nbwipers relies on inspiration and code from several projects. For the projects whose code was used, please see LICENSE for the third-party notices.

nbstripout

strip output from Jupyter and IPython notebooks

nbstripout is an invaluable tool for working with Jupyter Notebooks in the context of version control. It forms the basis of the interface and logic of nbwipers and is also the source of the testing examples.

nbstripout-fast

A much faster version of nbstripout by writing it in rust (of course).

nbstripout-fast, like this project, implements the functionality of nbstripout in Rust, while also allowing repo-level configuration in a YAML file.

With nbwipers I hoped to recreate the idea of nbstripout-fast, but with the ability to install as a git filter, and configuration via pyproject.toml.

ruff

An extremely fast Python linter and code formatter, written in Rust.

Ruff is quickly becoming the linter for Python code, thanks to its performance, extensive set of rules and ease of use. It was a definite source of knowledge for the organisation of the configuration and the file discovery. The schema for Jupyter Notebooks and some of the file discovery code were adapted from Ruff.

pre-commit

A framework for managing and maintaining multi-language pre-commit hooks.

This repo contains a version of the check-large-files hook that will not flag notebook files whose clean size is less than the threshold, even if the size on disk including outputs is greater than the threshold. The logic and interface of the hook were adapted from the pre-commit-hooks repository.
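The size comparison behind such a hook can be sketched in Python as follows. The names clean_size_bytes, is_too_large and max_kb are hypothetical, the cleaning step is simplified to dropping outputs and execution counts, and the 500 kB threshold is just an example value. The real hook's interface and serialisation may differ.

```python
import json


def clean_size_bytes(raw: str) -> int:
    """Size in bytes of the notebook after dropping outputs and execution counts."""
    notebook = json.loads(raw)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return len(json.dumps(notebook, indent=1).encode("utf-8"))


def is_too_large(raw: str, max_kb: int) -> bool:
    """Flag the notebook only if its clean size exceeds the threshold."""
    return clean_size_bytes(raw) > max_kb * 1024


# A notebook whose single output is about 600 kB of text.
raw = json.dumps({
    "cells": [
        {"cell_type": "code", "metadata": {},
         "source": ["print('x' * 600_000)"],
         "execution_count": 1,
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["x" * 600_000]}]}
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

size_on_disk = len(raw.encode("utf-8"))
print(size_on_disk > 500 * 1024)      # on disk, the file exceeds the threshold
print(is_too_large(raw, max_kb=500))  # its clean size does not
```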

Download files

Download the file for your platform.

Source Distribution

  • nbwipers-0.6.1.tar.gz (67.3 kB): Source

Built Distributions

  • nbwipers-0.6.1-py3-none-win_amd64.whl (1.9 MB): Python 3, Windows x86-64
  • nbwipers-0.6.1-py3-none-win32.whl (1.8 MB): Python 3, Windows x86
  • nbwipers-0.6.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB): Python 3, manylinux: glibc 2.17+ x86-64
  • nbwipers-0.6.1-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl (2.6 MB): Python 3, manylinux: glibc 2.17+ s390x
  • nbwipers-0.6.1-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ ppc64le
  • nbwipers-0.6.1-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (2.5 MB): Python 3, manylinux: glibc 2.17+ i686
  • nbwipers-0.6.1-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (2.2 MB): Python 3, manylinux: glibc 2.17+ ARMv7l
  • nbwipers-0.6.1-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.3 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • nbwipers-0.6.1-py3-none-macosx_11_0_arm64.whl (2.1 MB): Python 3, macOS 11.0+ ARM64
  • nbwipers-0.6.1-py3-none-macosx_10_12_x86_64.whl (2.2 MB): Python 3, macOS 10.12+ x86-64
