
An error notification system for remote Jupyter notebooks


remote-notebook-error-collection

This weekend project is an alpha release. It may stay like this forever.

This aims to collect errors raised when other users run a notebook that was shared. The three classes presented here are successive steps in its construction.

In a Jupyter notebook, this is what is run:

!pip install notebook-error-reporter
from notebook_error_reporter import ErrorServer

es = ErrorServer(url='https://errors.matteoferla.com', notebook='test')
es.enable()

Then if an error is raised it gets logged (see lengthy privacy discussion below) and can be inspected:

es.retrieve_errors()


Install

pip install notebook-error-reporter

Examples in action

Fragmenstein boilerplate PyRosetta colabfold dimer PyRosetta ligand migration  PyRosetta Add missing loops by cannibilising AlphaFold2

Aims

I have a few notebooks that I have shared on Twitter, and I occasionally get an email telling me that a repo they use is broken or that some case causes an error. Similar in concept to Sentry.io, I would like to know when errors happen. Most users will not email about errors, so one sees only the tip of the iceberg. This is because:

  1. it is something silly they did
  2. they worry it may be something silly they did
  3. they deem the code crap

Point 1 implies there is a problem with user experience: it could have been clearer. The user is never wrong: they have simply been misled.

Points 2 and 3 are errors that need fixing. Point 2 in particular means that better error handling is needed. As for point 3: okay, the user is never wrong. However, instead of obfuscating the crappiness, one can document the issue.

I do not want any private or confidential data from the user or from user-given fields: someone's target protein might be confidential. The code therefore should not raise errors whose messages contain someone's password, credit card number or mutation.

I only want to receive

  • the error type
  • the error message
  • some traceback details (line number, function name and filename minus path)
  • the notebook name
  • the cell's first line
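The fields listed above can be gathered from an exception with the standard library alone. The following is a minimal sketch, not the package's actual implementation: the function name `summarise_exception` is hypothetical, the dictionary keys are borrowed from the message format shown further down, and the notebook name is left out because it is a constant chosen by the notebook author.

```python
import os
import traceback


def summarise_exception(exc: BaseException, first_line: str = "") -> dict:
    """Reduce an exception to the non-private fields listed above (hypothetical helper)."""
    frames = traceback.extract_tb(exc.__traceback__)
    return {
        "error_name": type(exc).__name__,
        "error_message": str(exc),
        "traceback": [
            {
                "filename": os.path.basename(frame.filename),  # drop the directory part
                "fun_name": frame.name,
                "lineno": frame.lineno,
            }
            for frame in frames
        ],
        "first_line": first_line,
    }


try:
    int("not a number")
except ValueError as error:
    summary = summarise_exception(error, first_line="# cell that parses input")
```

Note that in this sketch the error message is passed through verbatim, so whatever the raising code puts in the message is sent too.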

In a regular locally hosted notebook there is the issue that servers collect IP addresses, which point to a user's location. This is not quite GDPR data, but still. Not collecting IP addresses is a terrible idea as fail2ban etc. rely on IP addresses to block wannabe hackers.

In a Colab notebook this is rather straightforward, as the request comes from the IP of the server running the kernel, not from the browser (a JavaScript function would be required to pass that information over).

Data not sent is:

  • inputted values
  • (majorly) content of a mounted Google Drive

Store

An alternative option is storing the error details in the error_details attribute of an ErrorStore instance.

from notebook_error_reporter import ErrorStore
es = ErrorStore()
es.enable()
es.error_details
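Once errors have been stored, they can be summarised in the notebook itself. The sketch below assumes that the stored details are a list of dictionaries with an `error_name` key, as in the message format shown further down. That structure is an assumption, and both the sample data and `count_error_types` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample of what a store might hold after three failed cells;
# the exact structure of the real attribute is an assumption here.
error_details = [
    {"error_name": "ValueError", "error_message": "foo", "execution_count": 3},
    {"error_name": "KeyError", "error_message": "'bar'", "execution_count": 5},
    {"error_name": "ValueError", "error_message": "baz", "execution_count": 8},
]


def count_error_types(details: list) -> Counter:
    """Count how often each exception type appears in the stored details."""
    return Counter(entry["error_name"] for entry in details)


counts = count_error_types(error_details)
print(counts.most_common())
```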

Slack

The easiest way is to get a Slack message in a channel whenever an error occurs. A Slack webhook is easy to set up (just remember that the subdomain for doing so is api, not app).

import os
os.environ['SLACK_WEBHOOK'] = "https://hooks.slack.com/services/XXXXXXXX"

from notebook_error_reporter import ErrorSlack
es = ErrorSlack(os.environ['SLACK_WEBHOOK'])
es.enable()

A cell that runs successfully sends nothing, but one that raises an error sends a Slack message such as:

{"error_name": "ValueError", 
 "error_message": "foo", 
 "traceback": [{"filename": "foo.py",
                "fun_name": "run_code", 
                "lineno": 666}, 
                ...
               ], 
 "first_line": "# cell that does foo",
 "execution_count": 111}
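How such a message might reach Slack can be sketched with the standard library. Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field; everything else here (the function names `build_slack_request` and `send_to_slack`, the formatting of the text) is an illustrative assumption and not necessarily what ErrorSlack does internally.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, details: dict) -> urllib.request.Request:
    """Wrap error details in the {"text": ...} body that Slack incoming webhooks accept."""
    body = json.dumps({"text": json.dumps(details, indent=1)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_to_slack(webhook_url: str, details: dict, timeout: float = 5.0) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, details)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building the request does not touch the network; only send_to_slack does.
details = {"error_name": "ValueError", "error_message": "foo", "execution_count": 111}
request = build_slack_request("https://hooks.slack.com/services/XXXXXXXX", details)
```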

The 'filename' is stripped of the dist-packages path, because the dist-packages path in Colab may contain a username, which could be personally identifiable data.

If a Slack webhook is shared on GitHub, there are users who search GitHub for exposed webhooks and spam them with adverts for their cybersecurity courses. Also, a single prankster could make it really annoying. Therefore, ideally a server needs to be set up to collect the errors...
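One way to do this kind of stripping is sketched below. It is an assumption about the approach, not the package's code: `strip_install_path` is a hypothetical helper that keeps only what follows dist-packages (or site-packages) and falls back to the bare file name otherwise.

```python
from pathlib import PurePosixPath

PACKAGE_DIRS = ("dist-packages", "site-packages")


def strip_install_path(filename: str) -> str:
    """Keep only the part of a path after dist-packages/site-packages (hypothetical helper)."""
    parts = PurePosixPath(filename).parts
    for marker in PACKAGE_DIRS:
        if marker in parts:
            index = parts.index(marker)
            return "/".join(parts[index + 1:])
    # Not an installed package: fall back to the bare file name.
    return PurePosixPath(filename).name


print(strip_install_path("/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py"))
print(strip_install_path("/home/alice/secret_project/foo.py"))
```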

Server

For myself I have set up https://errors.matteoferla.com. This is an Intel NUC acting as my home server, connected to my router. It could even be a Raspberry Pi. (This is a weekend project, so it's outside of the University's network, but privacy & confidentiality are as valued!) If you like this project and want to replicate it or use this server, just drop me an email.

A FastAPI app to get the errors is also present. This needs to be set up on a hosting server exposed to the internet.

This has the largest risk of vandalism.

So the server host would run run_app.py, which contains this code:

import uvicorn
from fastapi import FastAPI
from notebook_error_reporter.serverside import create_db, create_app

create_db()
app: FastAPI = create_app(debug=False, max_transparency=True, colab_only=False)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info")

A user then activates logging in the notebook thus:

from notebook_error_reporter import ErrorServer

es = ErrorServer(url='http://127.0.0.1:8000', notebook='mine')
es.enable()

On error, a dictionary type-hinted as EventMessageType is sent:

from notebook_error_reporter import EventMessageType

EventMessageType.__annotations__
{'execution_count': int,
 'first_line': str,
 'error_name': str,
 'error_message': str,
 'traceback': typing.List[notebook_error_reporter.error_event._traceback.TracebackDetailsType]}

and TracebackDetailsType.__annotations__ is:

{'filename': str, 'fun_name': str, 'lineno': int}
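Annotations like these are what `typing.TypedDict` classes produce, so the two types can be reconstructed roughly as follows. This is a reconstruction from the output above, not a copy of the package source, and `has_expected_keys` is a hypothetical helper added to show how a receiving end might do a shallow sanity check.

```python
from typing import List, TypedDict


class TracebackDetailsType(TypedDict):
    filename: str
    fun_name: str
    lineno: int


class EventMessageType(TypedDict):
    execution_count: int
    first_line: str
    error_name: str
    error_message: str
    traceback: List[TracebackDetailsType]


def has_expected_keys(message: dict) -> bool:
    """Check (shallowly) that a message carries exactly the annotated keys."""
    if set(message) != set(EventMessageType.__annotations__):
        return False
    return all(
        set(frame) == set(TracebackDetailsType.__annotations__)
        for frame in message["traceback"]
    )


message = {
    "execution_count": 111,
    "first_line": "# cell that does foo",
    "error_name": "ValueError",
    "error_message": "foo",
    "traceback": [{"filename": "foo.py", "fun_name": "run_code", "lineno": 666}],
}
print(has_expected_keys(message))
```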

The server does keep track of IP addresses to prevent vandalism, but it's the IP address of the colab notebook. No JavaScript call is present to get the browser IP. (Annoyingly I'd love to do some JS calls to get some useful data, but best not obfuscate!) Therefore the IP will be in the range: 142.250.0.0 - 142.251.255.255.
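A check against the range quoted above can be written with the standard `ipaddress` module. This is only a sketch: whether the server's colab_only option works this way is not stated here, and `in_quoted_range` is a hypothetical name.

```python
import ipaddress

# The range quoted above, collapsed into CIDR networks.
COLAB_NETWORKS = list(
    ipaddress.summarize_address_range(
        ipaddress.IPv4Address("142.250.0.0"),
        ipaddress.IPv4Address("142.251.255.255"),
    )
)


def in_quoted_range(ip: str) -> bool:
    """Return True if the address falls inside the range quoted above."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in COLAB_NETWORKS)


print(COLAB_NETWORKS)
print(in_quoted_range("142.250.10.20"), in_quoted_range("8.8.8.8"))
```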

To see the errors sent:

es.retrieve_errors()

I am unsure whether to allow everyone to see the sessions and errors, hence the max_transparency argument. For an internal server this makes sense, but for a public one, revealing the session ids may result in vandals adding errors to sessions randomly.

Colab

Colab runs on an ancient version of IPython (5.5, cf. 8.2). As a result things are done a bit differently.

.enable calls either load_ipython_extension or monkeypatch_extension depending on the IPython version. The former adds an event callback function (shell.events.callbacks), which is all proper and good. The latter monkeypatches a decorating function around shell.showtraceback, which knows about the ErrorEvent/ErrorSlack/ErrorServer/ErrorStore instance because it was created in a factory method of that instance. As it does not have a result object, it knows neither the execution count nor the first line of the cell.
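The difference between the two routes can be illustrated without IPython installed. In the sketch below, `Reporter` and `DummyShell` are made-up stand-ins rather than the package's classes, and the result object is faked with `SimpleNamespace`. In a recent IPython the callback would be registered with `get_ipython().events.register('post_run_cell', reporter.post_run_cell)`, where the result object carries `error_in_exec`, `execution_count` and `info.raw_cell`.

```python
import sys
from types import SimpleNamespace


class Reporter:
    """Stand-in for an error reporter; it only records what it is given."""

    def __init__(self):
        self.error_details = []

    def post_run_cell(self, result):
        """Event-callback route: the result object carries the error and the cell info."""
        error = result.error_in_exec
        if error is not None:
            self.error_details.append({
                "error_name": type(error).__name__,
                "error_message": str(error),
                "execution_count": result.execution_count,
                "first_line": result.info.raw_cell.split("\n")[0],
            })

    def make_patched_showtraceback(self, original):
        """Factory route: the returned wrapper closes over this instance."""
        def patched(*args, **kwargs):
            error = sys.exc_info()[1]  # no result object is available here
            if error is not None:
                self.error_details.append({
                    "error_name": type(error).__name__,
                    "error_message": str(error),
                })
            return original(*args, **kwargs)
        return patched


class DummyShell:
    """Minimal stand-in for an IPython shell (not the real class)."""

    def showtraceback(self, *args, **kwargs):
        return None

    def run(self, code):
        try:
            exec(code, {})
        except Exception:
            self.showtraceback()  # called while the exception is being handled


reporter = Reporter()

# Monkeypatch route: wrap the shell's showtraceback.
shell = DummyShell()
shell.showtraceback = reporter.make_patched_showtraceback(shell.showtraceback)
shell.run("raise ValueError('Foo')")

# Event-callback route: a faked result object is passed to the callback.
result = SimpleNamespace(
    error_in_exec=KeyError("bar"),
    execution_count=7,
    info=SimpleNamespace(raw_cell="# cell that does bar\nraise KeyError('bar')"),
)
reporter.post_run_cell(result)
```

The first recorded entry lacks the execution count and first line, while the second has both, which mirrors the limitation described above.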

!pip install notebook-error-reporter
from notebook_error_reporter import ErrorServer

es = ErrorServer(url='https://errors.matteoferla.com', notebook='test')
es.enable()
# raise an error:
raise ValueError('Foo')

The error can be seen to have been sent successfully:

es.retrieve_errors()

However, as I am not a Seattle/Arlington multinational hell-bent on collecting data, I like to make it opt-in:

#@markdown Send error messages to errors.matteoferla.com for logging?
#@markdown See [notebook-error-reporter repo for more](https://github.com/matteoferla/notebook-error-reporter)
report_errors = False #@param {type:"boolean"}
if report_errors:
    !pip install notebook-error-reporter
    from notebook_error_reporter import ErrorServer

    es = ErrorServer(url='https://errors.matteoferla.com', notebook='fragmenstein')
    es.enable()

