Strips outputs from Jupyter notebooks

Project description

nbstripout-fast

A much faster version of nbstripout, written in Rust (of course). It strips output and metadata from Jupyter notebooks, is very useful as a git filter, and is highly configurable.

Installation

pip install nbstripout-fast

Then use nbstripout-fast anywhere you currently use nbstripout.

Key differences

  1. While we mirrored most of nbstripout's API, we do not support every nbstripout option.
  2. There is no CLI option to install it into git for you; you configure the git filter yourself.
  3. We support repository-level settings in a .git-nbconfig.yaml file. Check out our examples. At a high level, you can add a git filter at the site-wide or user level and then let each project enforce consistent settings.

Why Rust?

nbstripout is an excellent project, but Python's startup and import time makes it painful to use at scale. Writing this in Rust means giving up nbconvert under the hood, and with it the check that the notebook is in the correct format, but it makes things up to 200x faster. This matters when you have a large number of files, since git sometimes calls the filter more than once per file. Let's look at the data:

Cells    nbstripout    nbstripout-fast
1        0.266 s       0.003 s
10       0.258 s       0.003 s
100      0.280 s       0.004 s
1000     0.372 s       0.013 s
10000    1.649 s       0.133 s

The table above shows a large fixed overhead per notebook (mostly Python startup time). When you have 100 or more notebooks, nbstripout takes more than 40 s while nbstripout-fast takes only about 1 s!
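The per-notebook overhead can be turned into a rough repository-wide estimate. The Python sketch below copies the timings from the table above and computes the speedup for each notebook size plus the total time for a given number of notebooks. It assumes every notebook has the same number of cells and ignores git's own overhead; calls_per_file is a hypothetical knob for cases where git runs the filter more than once per file.

```python
# Per-notebook timings in seconds, copied from the benchmark table above.
TIMINGS = {
    # cells: (nbstripout, nbstripout-fast)
    1: (0.266, 0.003),
    10: (0.258, 0.003),
    100: (0.280, 0.004),
    1000: (0.372, 0.013),
    10000: (1.649, 0.133),
}


def speedup(cells: int) -> float:
    """Ratio of nbstripout time to nbstripout-fast time for one notebook."""
    slow, fast = TIMINGS[cells]
    return slow / fast


def estimate_total(cells: int, notebooks: int, calls_per_file: int = 1) -> tuple[float, float]:
    """Rough total filter time in seconds for both tools.

    Assumes every notebook has `cells` cells and is filtered
    `calls_per_file` times (a hypothetical parameter).
    """
    slow, fast = TIMINGS[cells]
    runs = notebooks * calls_per_file
    return slow * runs, fast * runs


if __name__ == "__main__":
    for cells in TIMINGS:
        print(f"{cells:>6} cells: {speedup(cells):5.1f}x faster")
    slow_total, fast_total = estimate_total(cells=100, notebooks=100)
    print(f"100 notebooks x 100 cells: {slow_total:.1f} s vs {fast_total:.1f} s")
```

For one pass over 100 notebooks of 100 cells each, this gives 28.0 s versus 0.4 s; repeated filter calls or larger notebooks widen the absolute gap further.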

Example

This example illustrates how nbstripout-fast can be used to automatically clean Jupyter notebooks using Git filters (see e.g. Git Attributes). This keeps your repository clean by removing unnecessary output and clutter, while preserving your local working version. The benefits are minimised diffs and reduced repository size.

  1. Install nbstripout-fast as described above.

  2. Configure nbstripout-fast

    Create a .git-nbconfig.yaml file at the root of your repository to configure nbstripout-fast, e.g.

    nbstripout_fast:
      keep_count: false
      keep_output: false
      drop_empty_cells: true
      extra_keys: []
      keep_keys: []
    
  3. Set Git Attributes

    Create a .gitattributes file at the root of your repository if it doesn't yet exist and add this line:

    *.ipynb filter=jupyter
    

    This instructs Git to use a custom filter named "jupyter" on all .ipynb files.

  4. Configure the jupyter Filter

    Run these commands in your terminal to configure the "jupyter" filter:

    git config filter.jupyter.clean nbstripout-fast
    git config filter.jupyter.smudge cat
    
    • clean: Git runs nbstripout-fast whenever you stage a notebook (for example with git add), so the version stored in the repository is the clean one.
    • smudge: Git runs cat when checking out notebooks, so the stored content is written to your working tree unchanged.

    Git applies clean filters when files are staged and smudge filters when they are checked out. Staging does not rewrite your working copy, so your local notebooks keep their outputs.

  5. Reapply Cleaning to Existing Notebooks (Optional)

    If you already have Jupyter notebooks tracked by Git, you can reapply the cleaning process to them:

    git add --renormalize .
    git commit -m "Cleaned Jupyter notebooks"
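For reference, git passes the content being staged to the clean filter on standard input and stores whatever the filter writes to standard output. The Python sketch below mimics that contract for the three settings in the example .git-nbconfig.yaml above (keep_output: false, keep_count: false, drop_empty_cells: true). It is an illustration only, not nbstripout-fast's implementation: it hard-codes those settings instead of reading the YAML file, ignores extra_keys and keep_keys, and assumes an "empty" cell is one whose source is blank.

```python
import json


def is_empty(cell: dict) -> bool:
    """Assumed definition: a cell is empty if its source is blank."""
    source = cell.get("source", "")
    if isinstance(source, list):  # nbformat allows a string or a list of lines
        source = "".join(source)
    return source.strip() == ""


def strip_notebook(nb: dict, keep_output: bool = False, keep_count: bool = False,
                   drop_empty_cells: bool = True) -> dict:
    """Strip an nbformat-4 notebook dict in place and return it."""
    cells = nb.get("cells", [])
    if drop_empty_cells:
        cells = [cell for cell in cells if not is_empty(cell)]
    for cell in cells:
        if cell.get("cell_type") != "code":
            continue  # only code cells carry outputs and execution counts
        if not keep_output:
            cell["outputs"] = []
        if not keep_count:
            cell["execution_count"] = None
            # execute_result outputs carry their own execution_count
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None
    nb["cells"] = cells
    return nb


def clean(text: str) -> str:
    """Clean-filter body: notebook JSON from git's stdin in, stripped JSON for stdout out."""
    nb = json.loads(text)
    return json.dumps(strip_notebook(nb), indent=1, ensure_ascii=False) + "\n"
```

A script wrapping this would only need sys.stdout.write(clean(sys.stdin.read())); in practice the filter should be nbstripout-fast itself, configured as in step 4.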
    

Stripping specific cell outputs

To strip cell outputs that match a regular expression, the --strip-regex option can be used in combination with --keep-output. For example, to remove cell outputs that only contain a notebook widget:

nbstripout-fast --keep-output --strip-regex "^Output\(\)$"

or to remove completed tqdm progress bars:

nbstripout-fast --keep-output --strip-regex "100%.*"

See the documentation of the Rust regex crate for the supported regex syntax.
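Before adding a pattern to a filter, it can help to try it on sample output text. The Python sketch below does that with the standard re module. It only approximates the tool: nbstripout-fast uses Rust regex syntax, which is close to Python's re for simple patterns like these (one difference: Python's $ also matches just before a trailing newline), and the sketch assumes an output is stripped when the pattern matches anywhere in its text. The sample strings are made up.

```python
import re

# Patterns from the two examples above.
WIDGET = r"^Output\(\)$"
TQDM_DONE = r"100%.*"


def should_strip(output_text: str, pattern: str) -> bool:
    """Assumption: an output is stripped if the pattern matches anywhere in its text."""
    return re.search(pattern, output_text) is not None


if __name__ == "__main__":
    samples = [
        "Output()",
        "Output(layout=Layout(height='300px'))",
        "100%|##########| 10/10 [00:01<00:00,  9.87it/s]",
        " 50%|#####     | 5/10 [00:00<00:00,  9.87it/s]",
    ]
    for text in samples:
        print(should_strip(text, WIDGET), should_strip(text, TQDM_DONE), repr(text))
```

Only the bare widget repr matches the first pattern, and only the finished progress bar matches the second.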

Developing

You can use cargo to build and run the CLI in one step:

cargo run -- -t examples/example.ipynb

You can also build with cargo and then run the binary by its full path:

cargo build # dev build - ./target/debug/nbstripout-fast
cargo build --release # release build - ./target/release/nbstripout-fast

Running unit tests: by default, maturin builds this repo with pyo3 bindings, which also gives us a Python extension mode. Currently we can't have both a binary and an extension, so we use the extension only for testing (issue).

pip install -e .
maturin develop
# -rP also prints captured output for passing tests, so RUST_LOG=debug logs are visible
in-venv pytest -rP

Debugging

Set RUST_LOG=debug to enable debug logging, for example:

RUST_LOG=debug cargo run -- --extra-keys "metadata.bar cell.baz" -t foo.ipynb

Releasing

Manylinux, macOS, and Windows wheels and the sdist are built by GitHub workflows. Builds are triggered when a pull request is created, when a new release is created, or by a manual workflow dispatch. The wheels and sdist are uploaded to PyPI only when a new release is published. To create a new release:

  1. Create a commit updating the version in Cargo.toml and CHANGELOG.md, then create and push a git tag:

    git tag vX.Y.Z
    git push --tags

  2. Draft a new release in GitHub and select the tag you just created.
  3. Once the new release is published, a GitHub workflow builds the wheels and sdist and uploads them to PyPI automatically, using the PYPI_API_TOKEN in the repository's GitHub secrets.

History

This plugin was contributed back to the community by the D. E. Shaw group.

License

This project is released under a BSD-3-Clause license.

We love contributions! Before you can contribute, please sign and submit this Contributor License Agreement (CLA). This CLA is in place to protect all users of this project.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nbstripout_fast-1.1.0.tar.gz (28.7 kB)

Uploaded Source

Built Distributions

nbstripout_fast-1.1.0-py3-none-win_amd64.whl (1.1 MB)

Uploaded Python 3, Windows x86-64

nbstripout_fast-1.1.0-py3-none-win32.whl (1.0 MB)

Uploaded Python 3, Windows x86

nbstripout_fast-1.1.0-py3-none-musllinux_1_2_x86_64.whl (1.4 MB)

Uploaded Python 3, musllinux: musl 1.2+ x86-64

nbstripout_fast-1.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)

Uploaded Python 3, manylinux: glibc 2.17+ x86-64

nbstripout_fast-1.1.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl (1.4 MB)

Uploaded Python 3, manylinux: glibc 2.17+ i686

nbstripout_fast-1.1.0-py3-none-macosx_11_0_arm64.whl (1.2 MB)

Uploaded Python 3, macOS 11.0+ ARM64

File details

Details for the file nbstripout_fast-1.1.0.tar.gz.

File metadata

  • Download URL: nbstripout_fast-1.1.0.tar.gz
  • Upload date:
  • Size: 28.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for nbstripout_fast-1.1.0.tar.gz
Algorithm Hash digest
SHA256 29c98410d4a791944e1a23fa09dd92d0557c19f6ffc4633cdeb5e8f1f36ed575
MD5 0e9c6c10e700429973bacaf42661cb16
BLAKE2b-256 a0b5518fdee3170f25cb400cfd97ae79611bed547b55efe7384c5d0fe3bc65b0

See more details on using hashes here.
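The published digests can be checked after downloading a file. The Python sketch below uses only the standard hashlib module to compute a file's SHA256 digest and compare it with a published value.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large wheels need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify(Path("nbstripout_fast-1.1.0.tar.gz"), "29c98410d4a791944e1a23fa09dd92d0557c19f6ffc4633cdeb5e8f1f36ed575") should return True for an intact download of the sdist in the current directory. pip can also enforce digests at install time in hash-checking mode (--require-hashes, with --hash=sha256:<digest> entries in a requirements file).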

Provenance

The following attestation bundles were made for nbstripout_fast-1.1.0.tar.gz:

Publisher: wheels.yml on deshaw/nbstripout-fast

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nbstripout_fast-1.1.0-py3-none-win_amd64.whl.

File metadata

File hashes

Hashes for nbstripout_fast-1.1.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 52b18a019e040471d25c16c73e9824e313886fe079f15fe724852a5e34a5f725
MD5 aae9ddb049db9892f5f511db8843dfa0
BLAKE2b-256 4509e220bf29b76dddcf538ad0a4d8024a5e83c979d7f8fb59435548a1b8ea54

Provenance

The following attestation bundles were made for nbstripout_fast-1.1.0-py3-none-win_amd64.whl:

Publisher: wheels.yml on deshaw/nbstripout-fast

File details

Details for the file nbstripout_fast-1.1.0-py3-none-win32.whl.

File metadata

  • Download URL: nbstripout_fast-1.1.0-py3-none-win32.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: Python 3, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for nbstripout_fast-1.1.0-py3-none-win32.whl
Algorithm Hash digest
SHA256 ecaf7a5dab9754b630efffea4b970c74c77726eb9b58f1dbba1776cef89b3c35
MD5 e3db318d8ff1755f6408528b5aa6ea32
BLAKE2b-256 0465feffb1cb9f2c5352b4bbd76f64c5784aa9796f19cb69b5efb7fe3f0b9caa

Provenance

The following attestation bundles were made for nbstripout_fast-1.1.0-py3-none-win32.whl:

Publisher: wheels.yml on deshaw/nbstripout-fast

File details

Details for the file nbstripout_fast-1.1.0-py3-none-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for nbstripout_fast-1.1.0-py3-none-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 459fbd03b04dea06ca1ba0b28904ee72de17d40363f0a6b4992fd38b9bbb6814
MD5 072f2fcebb12e0cf0dd1a7b0a4e1f430
BLAKE2b-256 f2a461119f0229acce4af60d4205cdd7407c2284b814427d4322427fd7a22b8e

Provenance

The following attestation bundles were made for nbstripout_fast-1.1.0-py3-none-musllinux_1_2_x86_64.whl:

Publisher: wheels.yml on deshaw/nbstripout-fast

File details

Details for the file nbstripout_fast-1.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for nbstripout_fast-1.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6a4846d585803fcf83751632348f5bd22a9a58c20a2f2bb18bbcb85ba584021c
MD5 71d5389170977af6e3487e5de63d34c3
BLAKE2b-256 b0eee9cdaff10e2bebd368f23115b91f474e4f39f49943affcdbe34a0e9603c6

Provenance

The following attestation bundles were made for nbstripout_fast-1.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: wheels.yml on deshaw/nbstripout-fast

File details

Details for the file nbstripout_fast-1.1.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl.

File metadata

File hashes

Hashes for nbstripout_fast-1.1.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 f2237ee10c6ac79c7e33962a341133cfbb1bdc8d9e842a3deb91e36c56c015f0
MD5 38e69ae17d11834b6466eec0979ee1fe
BLAKE2b-256 90689b70500439fe628e1093785700353b1d6c57fec6b9db4fa83f8b0dac885d

Provenance

The following attestation bundles were made for nbstripout_fast-1.1.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl:

Publisher: wheels.yml on deshaw/nbstripout-fast

File details

Details for the file nbstripout_fast-1.1.0-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for nbstripout_fast-1.1.0-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 676ddeed5ccb4d2c5313f23fc82e041925407005117bd4767497a60a849ea809
MD5 ff901ebf52ec03385d0c15898aff43f8
BLAKE2b-256 eaa88c0ad8e2a400572a4e394d4e1ce82ba7210629dabb58b58518d523ebb3e3

Provenance

The following attestation bundles were made for nbstripout_fast-1.1.0-py3-none-macosx_11_0_arm64.whl:

Publisher: wheels.yml on deshaw/nbstripout-fast
