A collection of tools for use inside Jupyter Notebooks

NBTools

Collection of tools for monitoring running Jupyter Notebooks and interacting with them.

The installation should be as easy as:

pip install py-nbtools

NBstat

The main tool of this package is the nbstat / nbwatch command-line utility. It is added at installation and shows detailed resource utilization for each process of each running Jupyter Notebook.

While in the watch mode, you can hit buttons to modify the displayed view:

  • tab: swaps views, from nbwatch to devicewatch and back.
  • b: toggles the bar representation for some of the resources: in addition to the value, a colored bar is shown.
  • m: toggles the moving average column for some of the resources: values are averaged across the latest iterations.
  • s: toggles table separators.

We also add the devicestat and devicewatch commands, which show a transposed view with the same information and parameters.

For more information, check out the full user documentation: it explains the different table views and command-line options, and provides ready-to-use snippets.

Troubleshooting: PID namespaces, user permissions and zombie processes

A known problem with NVIDIA drivers is that nvidia-smi reports the PIDs of processes on devices in the global namespace, not in the container namespace, which makes it impossible to match the PIDs of container processes to their device PIDs. There are a few workarounds:

  • pass the --pid=host flag to docker run.
  • patch the NVIDIA driver to handle PID namespaces correctly.
  • [Linux only] fall back on manually inspecting /proc/PID/ files to identify the host PID for each process inside the container.
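
To give a rough idea of what inspecting /proc files can look like, here is a minimal sketch. It is not the code nbstat uses. It assumes a Linux kernel of version 4.1 or newer, where /proc/PID/status contains an NSpid line listing the PID of a process in every PID namespace it belongs to, outermost first. It also assumes it is pointed at a procfs that belongs to the host namespace; a procfs mounted inside an isolated container only shows container PIDs. The function names parse_nspid and container_to_host_pids are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def parse_nspid(status_text: str) -> List[int]:
    """Return the PIDs from the NSpid line of a /proc/<pid>/status file.

    The list is ordered from the outermost namespace to the innermost one.
    An empty list means the line is missing (for example, on an old kernel).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def container_to_host_pids(proc_root: str = "/proc") -> Dict[int, int]:
    """Map innermost-namespace PID -> outermost PID for namespaced processes.

    Assumption: `proc_root` is a procfs mounted in the host PID namespace.
    With several containers running, inner PIDs can collide, so a real tool
    would also need to tell the containers apart.
    """
    mapping = {}
    for status_file in Path(proc_root).glob("[0-9]*/status"):
        try:
            pids = parse_nspid(status_file.read_text())
        except OSError:
            # The process exited in the meantime, or access was denied.
            continue
        if len(pids) > 1:
            mapping[pids[-1]] = pids[0]
    return mapping
```

Processes that live only in the host namespace have a single NSpid entry and are skipped, so the mapping contains only processes that run inside a nested namespace.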

While nbstat provides several fallbacks for Linux containers (and intends to support more environments over time), the most reliable fix is to pass the --pid=host option to docker run. Adding it resolves most of the issues immediately.

One more thing that sometimes happens with NVIDIA devices is zombie processes: if a GPU-using process is terminated incorrectly, device memory can remain held by a process that no longer exists. As far as I know, there is no way to kill them without rebooting, so nbstat just marks such processes in red.

To inspect certain properties of processes, we rely on all the necessary permissions being available when the command is run. nbstat has fallbacks for some attributes, and I am currently working on improving error handling for cases where access to files is denied.

Contribute

If you are interested in contributing, check out the developer/contributor page. It contains a detailed description of the inner workings of the library, my design choices and the motivation behind them, as well as a discussion of the complexities along the way.

pylint_notebook

Shamelessly taken from the pylint page:

A function that checks Jupyter Notebooks with Python code for errors, tries to enforce a coding standard, and looks for code smells. It can also look for certain type errors, suggest how particular blocks can be refactored, and report details about the code's complexity.

Using it is as easy as:

from nbtools import pylint_notebook
pylint_notebook(path_to_ipynb,             # If not provided, use path to the current notebook
                disable='invalid-name',    # Disable specified Pylint checks. Can be a list.
                enable='import-error')     # Enable  specified Pylint checks. Can be a list.

Under the hood, it converts the .ipynb notebook to a .py script, creates a custom .pylintrc configuration, runs pylint, and removes all temporary files. Learn more about its usage in the tutorial.
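
The conversion step can be illustrated with a simplified sketch. This is not the library's implementation (nbtools ships its own notebook_to_script for that). It assumes the standard nbformat 4 JSON layout, where code lives in cells with cell_type equal to "code". The name notebook_to_source is hypothetical, and magics and shell commands are handled naively by commenting out lines that start with % or !.

```python
import json


def notebook_to_source(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook (nbformat 4 JSON) into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Skip markdown and raw cells.
        source = cell.get("source", "")
        if isinstance(source, list):
            # nbformat may store the source as a list of lines.
            source = "".join(source)
        lines = []
        for line in source.splitlines():
            # IPython magics and shell escapes are not valid Python,
            # so this sketch simply comments them out.
            if line.lstrip().startswith(("%", "!")):
                line = "# " + line
            lines.append(line)
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"
```

The resulting string can be written to a temporary .py file and passed to a linter. A complete converter would also need to handle cell magics (%%) and magics that are the only statement in an indented block.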

run_notebook

Provides an eval-like interface for running Jupyter Notebooks programmatically. We use it for running interactive tests that are easy to work with: if anything fails, one can jump right into fixing it in an already set-up environment.

from nbtools import run_notebook
run_notebook(path_to_ipynb,                       # Which notebook to run
             out_path_ipynb,                      # Where to save result
             inputs={'learning_rate': 0.05,},     # Pass variables to notebook
             outputs=['accuracy'])                # Extract variables from notebook
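
The idea behind inputs and outputs can be shown with an in-process analogy. This is only a sketch of the eval-like pattern, not how run_notebook works: the real function executes an actual notebook and saves the result to out_path_ipynb, while the hypothetical run_cells below just runs code strings with exec in one shared namespace.

```python
def run_cells(cells, inputs=None, outputs=()):
    """Execute code strings in a shared namespace and return selected variables.

    cells   : iterable of Python source strings, run in order
    inputs  : variables injected into the namespace before execution
    outputs : names of variables to extract after execution
    """
    namespace = dict(inputs or {})
    for source in cells:
        exec(source, namespace)
    # Missing names are returned as None instead of raising an error.
    return {name: namespace.get(name) for name in outputs}


result = run_cells(
    cells=["accuracy = 0.9 if learning_rate < 0.1 else 0.5"],
    inputs={"learning_rate": 0.05},
    outputs=["accuracy"],
)
```

Here result holds the extracted variable, which a test can compare against an expected value.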

set_gpus

Select free device(s) and set the CUDA_VISIBLE_DEVICES environment variable so that the current process sees only them.

Eliminates an enormous number of bugs and unexpected behaviors related to GPU usage.

from nbtools import set_gpus
set_gpus(n=2,                              # Number of devices to set.
         min_free_memory=0.7,              # Minimum amount of free memory on device to consider it free.
         max_processes=3)                  # Maximum amount of  processes  on device to consider it free.
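
The selection logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the set_gpus implementation: the device statistics are passed in as plain dictionaries with hypothetical keys (index, memory_free, memory_total, n_processes) instead of being queried from the driver, and min_free_memory is interpreted as a fraction of total device memory.

```python
import os


def select_free_devices(devices, n=1, min_free_memory=0.7, max_processes=3):
    """Pick `n` free devices and expose only them through CUDA_VISIBLE_DEVICES.

    A device counts as free if its share of free memory is at least
    `min_free_memory` and it runs at most `max_processes` processes.
    """
    free = [
        device["index"]
        for device in devices
        if device["memory_free"] / device["memory_total"] >= min_free_memory
        and device["n_processes"] <= max_processes
    ]
    if len(free) < n:
        raise RuntimeError(f"Requested {n} free devices, found only {len(free)}")

    chosen = free[:n]
    # Must happen before the process initializes CUDA to take effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(index) for index in chosen)
    return chosen


# Hypothetical statistics for four devices (memory in GB).
devices = [
    {"index": 0, "memory_free": 2, "memory_total": 16, "n_processes": 1},
    {"index": 1, "memory_free": 15, "memory_total": 16, "n_processes": 0},
    {"index": 2, "memory_free": 16, "memory_total": 16, "n_processes": 5},
    {"index": 3, "memory_free": 12, "memory_total": 16, "n_processes": 2},
]
chosen = select_free_devices(devices, n=2)
```

In this example, device 0 is rejected for low free memory and device 2 for having too many processes, so devices 1 and 3 are selected.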

Other functions

from nbtools import (in_notebook,          # Return True if executed inside of Jupyter Notebook
                     get_notebook_path,    # If executed in Jupyter Notebook, return its absolute path
                     get_notebook_name,    # If executed in Jupyter Notebook, return its name
                     notebook_to_script)   # Convert Jupyter Notebook to an executable Python script.
                                           # Works well with magics and command line executions.

Goals

This library started as a container for tools that I came across or developed in my years as an ML researcher. As some of the functions survived multiple refactoring iterations, I decided to share the library so that it is easier to perfect them and test them in different environments.

Another goal of the project is to show how to communicate with the Jupyter API using real-world examples: instead of reading through a number of Stack Overflow threads, you can find the same information collected in one place and get a rough understanding of what is possible with it and what is not.

Acknowledgements

The nbstat module builds on the gpustat package. Using gpustat for years gave me ideas for possible improvements, which are implemented in this library. While the implementation is different, reading through the gpustat code was essential for development.

Animated GIFs are created with Terminalizer: aside from the usual problems with installation, the tool itself is amazing.
