
Strips outputs from Jupyter notebooks


nbstripout-fast


A much faster version of nbstripout, written in Rust (of course). It strips output and metadata from Jupyter notebooks. It is very useful as a git filter and is highly configurable.

Installation

pip install nbstripout-fast

Then replace nbstripout with nbstripout-fast anywhere you use it.

Key differences

  1. While we mirror most of nbstripout's API, we do not support every nbstripout option.
  2. There is no CLI option to install the filter in git for you.
  3. We support repository-level settings in a .git-nbconfig.yaml file. Check out our examples. At a high level, you can add a git filter at the site-wide or user level and then let each project enforce its own consistent settings.

Why Rust?

nbstripout is an excellent project, but Python's startup and import time make it painful to use at scale. Rewriting it in Rust means giving up nbconvert under the hood, and with it the check that the notebook is in the correct format, but it makes things up to 200x faster. This matters when you have a large number of files and git calls the filter, sometimes more than once per file. Let's look at the data:

Cells    nbstripout    nbstripout_fast
1        0m0.266s      0m0.003s
10       0m0.258s      0m0.003s
100      0m0.280s      0m0.004s
1000     0m0.372s      0m0.013s
10000    0m1.649s      0m0.133s

The table above shows a large fixed overhead per notebook (mostly Python startup time). When you have 100 or more notebooks, nbstripout takes more than 40s while nbstripout-fast takes only 1s!
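To make the per-invocation overhead concrete, here is a rough back-of-the-envelope sketch based on the single-cell timings in the table. The function name total_seconds and the calls_per_file parameter are illustrative assumptions, not part of either tool. Real timings depend on the machine and on how often git invokes the filter.

```python
# Per-invocation wall times (seconds) for a 1-cell notebook, from the table above.
NBSTRIPOUT_S = 0.266
NBSTRIPOUT_FAST_S = 0.003


def total_seconds(per_call_s: float, notebooks: int, calls_per_file: int = 1) -> float:
    """Estimate total filter time, assuming every call costs the same fixed amount."""
    return per_call_s * notebooks * calls_per_file


if __name__ == "__main__":
    # calls_per_file is a guess: the filter may run more than once per file.
    for calls in (1, 2):
        slow = total_seconds(NBSTRIPOUT_S, 100, calls)
        fast = total_seconds(NBSTRIPOUT_FAST_S, 100, calls)
        print(f"100 notebooks, {calls} call(s) per file: "
              f"nbstripout ~{slow:.1f}s, nbstripout-fast ~{fast:.1f}s")
```

The estimate scales linearly with the number of notebooks and filter calls, which is why a fixed startup cost dominates for small notebooks.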

Example

This example illustrates how nbstripout-fast can be used to automatically clean Jupyter notebooks using Git filters (see e.g. Git Attributes). This keeps your repository clean by removing unnecessary output and clutter, while preserving your local working version. The benefits are minimised diffs and reduced repository size.

  1. Install nbstripout-fast as described above.

  2. Configure nbstripout-fast

    Create a .git-nbconfig.yaml file at the root of your repository to configure nbstripout-fast, e.g.

    nbstripout_fast:
      keep_count: false
      keep_output: false
      drop_empty_cells: true
      extra_keys: []
      keep_keys: []
    
  3. Set Git Attributes

    Create a .gitattributes file at the root of your repository if it doesn't yet exist and add this line:

    *.ipynb filter=jupyter
    

    This instructs Git to use a custom filter named "jupyter" on all .ipynb files.

  4. Configure the jupyter Filter

    Run these commands in your terminal to configure the "jupyter" filter:

    git config filter.jupyter.clean nbstripout-fast
    git config filter.jupyter.smudge cat
    
  • clean: This filter runs nbstripout-fast when notebooks are staged (e.g. with git add), producing the clean version that is stored in the repository.
  • smudge: This filter runs cat when notebooks are checked out, so your local (smudged) version remains unmodified. Git applies filters when files are staged and when they are checked out.

  5. Reapply Cleaning to Existing Notebooks (Optional)

    If you already have Jupyter notebooks tracked by Git, you can reapply the cleaning process to them:

    git add --renormalize .
    git commit -m "Cleaned Jupyter notebooks"
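
As a rough illustration of what the clean filter does to a notebook under the settings from step 2, here is a minimal Python sketch. It is not the tool's implementation (which is written in Rust), and it assumes that keep_output, keep_count, and drop_empty_cells mean what their names suggest. The function strip_notebook and the SAMPLE notebook are hypothetical.

```python
import copy
import json

# Hypothetical two-cell notebook: one executed code cell and one empty cell.
SAMPLE = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        },
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": []},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}


def strip_notebook(nb, keep_output=False, keep_count=False, drop_empty_cells=True):
    """Return a stripped copy of a notebook dict (nbformat 4 layout assumed)."""
    nb = copy.deepcopy(nb)  # never mutate the caller's notebook
    cells = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows source to be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if drop_empty_cells and not text.strip():
            continue
        if cell.get("cell_type") == "code":
            if not keep_output:
                cell["outputs"] = []
            if not keep_count:
                cell["execution_count"] = None
        cells.append(cell)
    nb["cells"] = cells
    return nb


if __name__ == "__main__":
    print(json.dumps(strip_notebook(SAMPLE), indent=1))
```

With the defaults shown, the empty cell is dropped and the remaining cell loses its outputs and execution count, which is what keeps diffs small.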
    

Stripping specific cell outputs

To strip cell outputs that match a regular expression, the --strip-regex option can be used in combination with --keep-output. For example, to remove cell outputs that only contain a notebook widget:

nbstripout-fast --keep-output --strip-regex "^Output\(\)$"

or to remove completed tqdm progress bars:

nbstripout-fast --keep-output --strip-regex "100%.*"

See the documentation for regex for information about supported regex syntax.
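
To see which outputs the two patterns above would match, the following sketch tries them with Python's re module as a stand-in. It rests on two assumptions that this README does not confirm: that the tool's regex engine agrees with Python's re for these simple patterns, and that the pattern is searched against the text of each output. The helper matches and the sample strings are hypothetical.

```python
import re

WIDGET_PATTERN = r"^Output\(\)$"  # output consisting only of a widget placeholder
TQDM_PATTERN = r"100%.*"          # a progress bar that reached 100%


def matches(pattern: str, output_text: str) -> bool:
    """Return True if the pattern is found anywhere in the output text."""
    return re.search(pattern, output_text) is not None


if __name__ == "__main__":
    samples = [
        "Output()",
        "Output() and more text",
        "100%|##########| 50/50 [00:01<00:00]",
        " 42%|####      | 21/50",
    ]
    for text in samples:
        print(repr(text), matches(WIDGET_PATTERN, text), matches(TQDM_PATTERN, text))
```

The anchors ^ and $ in the widget pattern are what restrict it to outputs that contain nothing else, while the unanchored tqdm pattern matches anywhere in the text.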

Developing

You can use cargo to build and run the CLI in one step:

cargo run -- -t examples/example.ipynb

You can also build with cargo and run the script with the full path:

cargo build # dev build - ./target/debug/nbstripout-fast
cargo build --release # release build - ./target/release/nbstripout-fast

Running unit tests: by default, maturin builds this repo with pyo3 bindings, which also gives us a Python extension module. As of today, we can't ship both a binary and an extension, so the extension is used only for testing (issue).

pip install -e .
maturin develop
# -rP prints captured output for passing tests, so logs enabled with RUST_LOG=debug are visible
in-venv pytest -rP

Debugging

Use RUST_LOG=debug to debug the CLI, for example:

RUST_LOG=debug cargo run -- --extra-keys "metadata.bar cell.baz" -t foo.ipynb

Releasing

Manylinux, macOS, and Windows wheels and the sdist are built by GitHub workflows. Builds are triggered by opening a pull request, creating a new release, or a manual workflow dispatch. The wheels and sdist are uploaded to PyPI only when a new release is published. To create a new release:

  1. Create a commit updating the version in Cargo.toml and CHANGELOG.md, then create and push a git tag:

    git tag vX.Y.Z
    git push --tags

  2. Draft a new release on GitHub and select the tag that you just created.
  3. Once the new release is created, a GitHub workflow builds the wheels and sdist and uploads them to PyPI automatically, using the PYPI_API_TOKEN stored in the repository's GitHub secrets.

History

This plugin was contributed back to the community by the D. E. Shaw group.


License

This project is released under a BSD-3-Clause license.

We love contributions! Before you can contribute, please sign and submit this Contributor License Agreement (CLA). This CLA is in place to protect all users of this project.
