Wipe clean your Jupyter Notebooks!
nbwipers
nbwipers is a command line tool to wipe clean Jupyter notebooks, written in Rust.
The interface and functionality are based on nbstripout, and the idea to implement it in Rust comes from nbstripout-fast.
Usage
nbwipers has a few subcommands that provide functionality related to cleaning Jupyter notebooks.
- clean: clean a single notebook. This is more-or-less equivalent to nbstripout.
- check: check notebooks in a given path for elements that would be removed by clean. This could be used in a CI context to enforce clean notebooks.
- clean-all: clean all notebooks in a given path. This one should be used carefully!
- install: register nbwipers as a git filter for ipynb files. Equivalent to nbstripout --install.
- uninstall: remove nbwipers as a git filter.
- check-install: check that nbwipers or nbstripout is installed in the local repo. This is used in the pre-commit hook.
The full options can be found in CommandLineHelp.md.
Examples
To set up nbwipers as a git filter in your repository, use:
nbwipers install local
To check the notebooks in your folder, use:
nbwipers check .
pre-commit
You can add the following to your .pre-commit-config.yaml file to ensure that nbwipers or nbstripout is installed in your repo. This prevents Jupyter notebook outputs from being committed to version control.
- repo: https://github.com/felixgwilliams/nbwipers-pre-commit
  rev: v0.3.3
  hooks:
    - id: nbwipers-check-install
Alternatively, you can use the URL for this repo in your config, but this will compile nbwipers from source, rather than retrieving the binary from PyPI, and is therefore not recommended.
If you are using your pre-commit configuration as part of CI, you should set the environment variable NBWIPERS_CHECK_INSTALL_EXIT_ZERO, which forces this check to pass, since you do not need nbwipers configured in your CI environment.
Motivation
A working copy of a Jupyter notebook contains
- Code written by the author
- Notebook outputs: tables, logs, tracebacks, images, widgets and so on...
- Execution counts
- Metadata, such as whether cells are collapsed, scrollable etc.
Of these categories of data, only the first (code written by the author) should definitely be tracked by version control, since it is the product of the author's intention and hard work. The other categories of data are subject to change outside of the explicit intentions of the author, and are generally noisy from a version control perspective.
Moreover, including notebook outputs in version control
- makes diffs harder to interpret, as they will contain lots of unintended changes
- increases the risk of a tricky merge conflict if different users run the same cell and get a slightly different result
- increases the amount of data committed, which can degrade repository performance
- risks leaking sensitive data
By using nbwipers or nbstripout as a git filter, the problematic parts of the notebook are removed from the version of the file that git sees, while leaving your working copy intact.
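To make the categories above concrete, here is a minimal Python sketch of what "wiping" a notebook amounts to at the JSON level. It is an illustration only, not the nbwipers implementation: the function name wipe_notebook and the choice of metadata keys (collapsed, scrolled) are assumptions made for this example, and the real tool is written in Rust and configurable.

```python
import copy
import json

# Cell metadata keys treated as noise in this sketch.
# This short list is an assumption for illustration, not nbwipers' actual defaults.
NOISY_CELL_METADATA = ("collapsed", "scrolled")


def wipe_notebook(notebook: dict) -> dict:
    """Return a copy of a notebook with outputs, execution counts and noisy metadata removed."""
    cleaned = copy.deepcopy(notebook)  # the input (the "working copy") is left untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # tables, logs, tracebacks, images, ...
            cell["execution_count"] = None  # changes every time the cell is re-run
        metadata = cell.get("metadata", {})
        for key in NOISY_CELL_METADATA:
            metadata.pop(key, None)
    return cleaned


# A tiny hand-written notebook with one executed code cell.
example = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {"collapsed": True},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
            "source": ["print('hello')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = wipe_notebook(example)
print(json.dumps(cleaned["cells"][0], indent=1))
```

Only the source of the cell survives in the cleaned copy, which is the version a git clean filter would hand to git, while the original dictionary still holds its outputs and execution count.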
Testing Coverage
To test coverage, use the command:
cargo tarpaulin -o stdout -o html -o lcov --engine llvm
Using the llvm engine means that integration tests contribute to coverage.
Acknowledgements
nbwipers relies on inspiration and code from several projects.
For the projects whose code was used, please see LICENSE for the third-party notices.
nbstripout
strip output from Jupyter and IPython notebooks
nbstripout is an invaluable tool for working with Jupyter Notebooks in the context of version control. This project forms the basis of the interface and logic of this project and is also the source of the testing examples.
nbstripout-fast
A much faster version of nbstripout by writing it in rust (of course).
nbstripout-fast, like this project, implements the functionality of nbstripout in Rust, while also allowing repo-level configuration in a YAML file.
With nbwipers I hoped to recreate the idea of nbstripout-fast, but with the ability to install as a git filter, and configuration via pyproject.toml.
ruff
An extremely fast Python linter and code formatter, written in Rust.
Ruff is quickly becoming the linter for Python code, thanks to its performance, extensive set of rules and ease of use. It was a definite source of knowledge for the organisation of the configuration and the file discovery. The schema for Jupyter Notebooks and some of the file discovery code were adapted from Ruff.