
Project description

nbdump

Dump files to Jupyter notebook. Restore by running the notebook. Add optional extra commands to run.

Installation

# user
pip install -U nbdump

# development
pip install -e .
pip install -r tests/requirements.txt
pytest

Usage

In this demo, we will use src_example/ as a fake repo that you want to convert into a notebook.

CLI

# see help
nbdump -h

# basic usage, this will dump entire `src_example/` to `nb1.ipynb`
nbdump src_example -o nb1.ipynb

# use shell expansion, this will come in handy later
nbdump src_example/**/*.py -o nb2.ipynb

# handle multiple files/dirs, will be deduplicated
nbdump src_example src_example/main.py -o nb3.ipynb

# append extra code cell, e.g. running the `src_example/main.py`
nbdump src_example -c '%run src_example/main.py' -o nb4.ipynb

# more than one extra cell can be added
nbdump src_example \
    -c '%run src_example/main.py' \
    -c '!git status' \
    -o nb5.ipynb

# use fd to skip ignored files and hidden files
nbdump $(fd -t f . src_example) -o nb6.ipynb

# clone metadata from another notebook
nbdump src_example/**/*.py -o nb7.ipynb -m tests/kaggle/modified/modified-notebook.ipynb

There is a catch: nbdump does not respect .gitignore, because its core functionality is simply converting a bunch of files into notebook cells. This means that with the first example (nb1.ipynb), nbdump will try to convert all files recursively, regardless of format. Problems arise when src_example/ contains binary files such as pictures, or even __pycache__/*.

Shell expansion can be used to select only the relevant files, as in the nb2.ipynb example (make sure globstar is enabled in bash to use **). Alternatively, a tool like fd can list files while respecting .gitignore and skipping hidden files automatically.
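If you prefer to stay in Python instead of relying on fd or bash globstar, a small pre-filter along these lines can select files before they are handed to nbdump. This helper is illustrative, not part of nbdump, and the extension list is an assumption you should adapt to your repo:

```python
from pathlib import Path

TEXT_EXTS = {".py", ".md", ".txt", ".cfg", ".toml"}  # adjust to your repo

def select_text_files(root):
    """Collect files under `root`, skipping hidden entries and __pycache__."""
    keep = []
    for p in sorted(Path(root).rglob("*")):
        if not p.is_file():
            continue
        parts = p.relative_to(root).parts
        if any(part.startswith(".") or part == "__pycache__" for part in parts):
            continue
        if p.suffix in TEXT_EXTS:
            keep.append(p)
    return keep
```

The resulting list can be passed as the target files to nbdump, just like the rglob result shown in the Library section below.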

Library

from pathlib import Path
import nbdump


target_files = list(Path("src_example").rglob("*.py"))
codes = ["!ls -lah", "!git log --oneline", "%run src_example/main.py"]
metadata_notebook = "tests/kaggle/modified/modified-notebook.ipynb"

# save to disk
with open("nb8.ipynb", "w") as f:
    nbdump.dump(f, target_files, codes, metadata_notebook)

# save as string
ipynb = nbdump.dumps(target_files, codes, metadata_notebook)
print(ipynb[:50])

Why?

Kaggle code-competition kernels with internet disabled cannot run git clone inside the notebook. nbdump lets you work in a standard development environment and export the final result to a single notebook, while still preserving the filesystem tree.

This differs from just zipping and unzipping: because the files are written with %%writefile, you can still view and edit them inside the notebook, even after it has been created.
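Conceptually, each dumped file becomes a code cell whose first line is a %%writefile magic, so running the notebook recreates the file tree. A minimal sketch of such a cell as an nbformat-4 style dict (nbdump's actual output may differ in its details):

```python
def file_to_cell(relpath, source):
    """Build a code cell that rewrites `relpath` with `source` when executed."""
    body = f"%%writefile {relpath}\n{source}"
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": body.splitlines(keepends=True),
    }

cell = file_to_cell("src_example/main.py", 'print("hello")\n')
print(cell["source"][0])  # the %%writefile magic line
```

Because the file contents sit in the cell source as plain text, they remain visible and editable in the notebook UI, unlike an opaque archive.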

Download files

Download the file for your platform.

Source Distribution

nbdump-0.0.3.tar.gz (6.0 kB)

Uploaded Source

Built Distribution

nbdump-0.0.3-py3-none-any.whl (6.4 kB)

Uploaded Python 3

File details

Details for the file nbdump-0.0.3.tar.gz.

File metadata

  • Download URL: nbdump-0.0.3.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for nbdump-0.0.3.tar.gz
Algorithm Hash digest
SHA256 29fe4fb6ea0038490cef09f62a1f3251b8c95d791779efb9e73ecb36cd68f911
MD5 f53927e03af1cd7c61819e4253b6fc8b
BLAKE2b-256 26744edc6a6de5447235facad758034acbb54a07f88bcd8de61ad628458a9113


File details

Details for the file nbdump-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: nbdump-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for nbdump-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a6be5e11d68621d4971946acb236dc3759428ef74b67afba415f472ac98ee938
MD5 482dc4dd5e9fc40eed79cbe45fbf803c
BLAKE2b-256 be07a1fad87fed92da3d4a794001909cbe8bea8c2fd2dad2c98587bf1dd94353

