Strips outputs from Jupyter notebooks
nbstripout-fast
A much faster version of nbstripout, written in Rust (of course). It strips output and metadata from Jupyter notebooks, is very useful as a git filter, and is highly configurable.
Installation
pip install nbstripout-fast
Then replace nbstripout with nbstripout-fast anywhere you use it.
Key differences
- While we mirrored most of nbstripout's API, we do not support every nbstripout option.
- There is no CLI option to install the git filter for you.
- We support repository-level settings in a .git-nbconfig.yaml file. Check out our examples. At a high level, you can add a git filter at the site-wide or user level and then let each project enforce consistent settings.
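As a rough sketch of the user-level setup described above, the function below registers a clean filter and a diff driver with git and attaches them to notebooks through a user-level attributes file. It follows the manual filter setup used with nbstripout, with nbstripout-fast swapped in. The filter name `nbstripout`, the function name, the attributes file path, and the expectation that `nbstripout-fast` filters stdin to stdout when run without a file argument are assumptions, not taken from this README; check `nbstripout-fast --help` before relying on them.

```shell
#!/bin/sh
# Sketch: register nbstripout-fast as a user-level git filter.
# Assumptions (not from this README): the filter name "nbstripout", the
# attributes file location, and that nbstripout-fast, like nbstripout,
# filters stdin to stdout when run without a file argument.
setup_nbstripout_fast_filter() {
    # Hypothetical default path; pass another path as the first argument.
    attrs_file="${1:-$HOME/.gitattributes_global}"

    # Clean filter: run on notebook content when it is staged.
    git config --global filter.nbstripout.clean 'nbstripout-fast'
    # Smudge filter: leave notebook content untouched on checkout.
    git config --global filter.nbstripout.smudge cat
    # Diff driver: -t prints the stripped notebook for `git diff`.
    git config --global diff.ipynb.textconv 'nbstripout-fast -t'

    # Point git at a user-level attributes file (this overrides any
    # existing core.attributesfile setting) and attach the filter and
    # diff driver to every notebook.
    git config --global core.attributesfile "$attrs_file"
    printf '%s\n' '*.ipynb filter=nbstripout diff=ipynb' >> "$attrs_file"
}
```

Running `setup_nbstripout_fast_filter` once per user account would be enough under these assumptions; each project can then carry its own .git-nbconfig.yaml for project-specific settings.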
Why Rust?
nbstripout is an excellent project, but Python startup and import time make it painful to use at scale. Rewriting it in Rust means giving up nbconvert under the hood, and with it validation that the notebook is in the correct format, but it makes things up to 200x faster. This matters when you have a large number of files and git sometimes calls the filter more than once per file. Let's look at the data:
Cells | nbstripout | nbstripout_fast |
---|---|---|
1 | 0m0.266s | 0m0.003s |
10 | 0m0.258s | 0m0.003s |
100 | 0m0.280s | 0m0.004s |
1000 | 0m0.372s | 0m0.013s |
10000 | 0m1.649s | 0m0.133s |
The table above shows a large fixed overhead per notebook (mostly Python startup time). When you have 100 or more notebooks, nbstripout takes more than 40s while nbstripout-fast takes only 1s!
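To make the scaling concrete, the sketch below multiplies the 1-cell timings from the table by a notebook count and by the number of filter invocations per file. The helper name `estimate_seconds` and the figures of 100 notebooks and 2 invocations per file are illustrative assumptions, not measurements.

```shell
#!/bin/sh
# Estimate total filter time as notebooks * calls_per_file * seconds_per_call.
estimate_seconds() {
    awk -v n="$1" -v calls="$2" -v per_call="$3" \
        'BEGIN { printf "%.1f\n", n * calls * per_call }'
}

# Per-call timings for a 1-cell notebook, taken from the table above.
estimate_seconds 100 2 0.266   # nbstripout
estimate_seconds 100 2 0.003   # nbstripout-fast
```

Under these assumptions the script prints 53.2 and 0.6, that is, close to a minute of filter time for nbstripout against well under a second for nbstripout-fast.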
Developing
You can use cargo, which builds and runs the CLI:
cargo run -- -t examples/example.ipynb
You can also build with cargo and run the binary by its full path:
cargo build # dev build - ./target/debug/nbstripout-fast
cargo build --release # release build - ./target/release/nbstripout-fast
Running unit tests: maturin builds this repo with pyo3 bindings by default, which also lets us build it as a Python extension module. As of today, we can't have both a binary and an extension, so we use the extension only for testing (issue).
pip install -e .
maturin develop
# -rP shows the captured output of passing tests, so you can use RUST_LOG=debug
in-venv pytest -rP
Debugging
Set RUST_LOG=debug to enable debug logging, for example:
RUST_LOG=debug cargo run -- --extra-keys "metadata.bar cell.baz" -t foo.ipynb
Releasing
Manylinux, macOS, and Windows wheels and the sdist are built by GitHub workflows. Builds are triggered when a pull request is created, when a new release is created, or by a manual workflow dispatch. The wheels and sdist are only uploaded to PyPI when a new release is published. To create a new release:
- Create a commit updating the version in Cargo.toml and CHANGELOG.md, then create a git tag:
git tag vX.Y.Z
git push --tags
- Draft a new release in GitHub; select the tag that you just created.
- Once the new release is created, the wheels and sdist will be built by a GitHub workflow and then uploaded to PyPI automatically using the PYPI_API_TOKEN in the GitHub secrets for the repository.
History
This plugin was contributed back to the community by the D. E. Shaw group.
License
This project is released under a BSD-3-Clause license.
We love contributions! Before you can contribute, please sign and submit this Contributor License Agreement (CLA). This CLA is in place to protect all users of this project.