An error notification system for remote Jupyter notebooks
remote-notebook-error-collection
This weekend project is an alpha release. It may stay like this forever.
It aims to collect errors generated by other users running a notebook that was shared. The three classes presented here are successive steps in its construction.
In a Jupyter notebook, this is what is run:
!pip install notebook-error-reporter
from notebook_error_reporter import ErrorServer
es = ErrorServer(url='https://errors.matteoferla.com', notebook='test')
es.enable()
Then if an error is raised it gets logged (see lengthy privacy discussion below) and can be inspected:
es.retrieve_errors()
Install
pip install notebook-error-reporter
Examples in action
Aims
I have a few notebooks that I have shared on Twitter, and I occasionally get an email telling me that a repo they use is broken or that some case causes an error. Similar in concept to Sentry.io, I would like to know when errors happen. Most users will not email about errors, so one sees only the tip of the iceberg. This is because:
- it is something silly they did
- they worry it may be something silly they did
- they deem the code crap
Point 1 implies there is a problem with user experience: it could have been clearer. The user is never wrong: they have simply been misled.
Points 2 and 3 are errors that need fixing. Point 2 in particular means that better error handling is needed. As for point 3: okay, the user is never wrong. However, instead of hiding the crappiness, one can document the issue.
I do not want any private or confidential data from the user or from user-given fields: someone's target protein might be confidential. The reported error details should therefore never reveal someone's password, credit card number or mutation.
I only want to receive:
- the error type
- the error message
- some traceback details (line number, function name and filename minus path)
- the notebook name
- the cell's first line
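As a rough illustration of how an exception could be reduced to the fields listed above, here is a minimal sketch using only the standard library. The function name summarise_error is hypothetical and is not part of notebook-error-reporter; the package's own extraction may well differ.

```python
import os
import traceback
from typing import Any, Dict


def summarise_error(error: BaseException) -> Dict[str, Any]:
    """Reduce an exception to its type, message and path-free traceback frames.

    Hypothetical helper for illustration only.
    """
    frames = traceback.extract_tb(error.__traceback__)
    return {
        "error_name": type(error).__name__,
        "error_message": str(error),
        "traceback": [
            {
                # keep the filename only: the directory may contain a username
                "filename": os.path.basename(frame.filename),
                "fun_name": frame.name,
                "lineno": frame.lineno,
            }
            for frame in frames
        ],
    }


def fail():
    raise ValueError("foo")


try:
    fail()
except ValueError as caught:
    summary = summarise_error(caught)
```

The notebook name and the cell's first line are not available from the exception itself, so they would have to come from elsewhere (the notebook argument and the executed cell).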
In a regular locally hosted notebook there is the issue that servers collect IP addresses, which point to a user's location. This is not quite GDPR data, but still. Not collecting IP addresses is a terrible idea, as fail2ban etc. rely on IP addresses to block wannabe hackers.
In a Colab notebook this is rather straightforward, as the IP of the request is that of the server running the kernel, not the browser (for that, a JavaScript function would be required to pass this info over).
Data not sent is:
- inputted values
- (majorly) content of a mounted Google Drive
Store
An alternative option is storing the error details in the error_details attribute.
from notebook_error_reporter import ErrorStore
es = ErrorStore()
es.enable()
es.error_details
Slack
The easiest way is to have errors posted to a Slack channel. A Slack webhook is easy to set up (just remember that the subdomain for doing so is api, not app).
import os
os.environ['SLACK_WEBHOOK'] = "https://hooks.slack.com/services/XXXXXXXX"
from notebook_error_reporter import ErrorSlack
es = ErrorSlack(os.environ['SLACK_WEBHOOK'])
es.enable()
A regular cell does nothing. But one that is not successful will send a Slack message.
{"error_name": "ValueError",
"error_message": "foo",
"traceback": [{"filename": "foo.py",
"fun_name": "run_code",
"lineno": 666},
...
],
"first_line": "# cell that does foo",
"execution_count": 111}
The 'filename' is stripped of the dist-packages path, because the dist-packages path in Colab may contain a username, which could be personally identifiable data.
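For reference, posting such a dictionary to a Slack incoming webhook amounts to a JSON POST with a "text" field. The sketch below only builds the request and does not send it; build_slack_request is a hypothetical helper, not the ErrorSlack implementation, and the webhook URL is the placeholder used above.

```python
import json
import urllib.request
from typing import Any, Dict


def build_slack_request(webhook_url: str, details: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for a Slack incoming webhook.

    Hypothetical helper for illustration only.
    """
    # incoming webhooks accept a JSON body whose "text" field is the message
    body = json.dumps({"text": json.dumps(details, indent=1)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


details = {
    "error_name": "ValueError",
    "error_message": "foo",
    "traceback": [{"filename": "foo.py", "fun_name": "run_code", "lineno": 666}],
    "first_line": "# cell that does foo",
    "execution_count": 111,
}
request = build_slack_request("https://hooks.slack.com/services/XXXXXXXX", details)
# urllib.request.urlopen(request) would actually send the message
```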
If a Slack webhook is shared on GitHub, there are users who search GitHub for exposed webhooks and spam them with adverts for their cybersecurity courses. A single prankster could also make it really annoying. Therefore, ideally a server needs to be set up to collect the errors...
Server
For myself I have set up https://errors.matteoferla.com. This is an Intel NUC acting as my home server connected to my router. It could even be a Raspberry Pi. (This is a weekend project so it's outside of the University's network, but privacy & confidentiality are just as valued!) If you like this project and want to replicate it or use this server, just drop me an email.
A FastAPI app to receive the errors is also present. This needs to be set up on a hosting server exposed to the internet.
This has the largest risk of vandalism.
So the server host would run run_app.py, which contains this code:
import uvicorn
from fastapi import FastAPI
from notebook_error_reporter.serverside import create_db, create_app
create_db()
app:FastAPI = create_app(debug=False, max_transparency=True, colab_only=False)
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info")
Meanwhile, a user activates logging in the notebook like so:
from notebook_error_reporter import ErrorServer
es = ErrorServer(url='http://127.0.0.1:8000', notebook='mine')
es.enable()
On error, a dictionary type-hinted as EventMessageType is sent:
from notebook_error_reporter import EventMessageType
EventMessageType.__annotations__
{'execution_count': int,
'first_line': str,
'error_name': str,
'error_message': str,
'traceback': typing.List[notebook_error_reporter.error_event._traceback.TracebackDetailsType]}
and TracebackDetailsType.__annotations__ is:
{'filename': str, 'fun_name': str, 'lineno': int}
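Read together, the two sets of annotations describe a nested dictionary. The sketch below mirrors them as standard-library TypedDict classes and adds a small key check that a receiving end could apply. The class names and missing_keys are hypothetical stand-ins written for illustration, not the package's own definitions.

```python
from typing import List, TypedDict


class TracebackDetails(TypedDict):
    """Hypothetical mirror of the annotations of TracebackDetailsType."""
    filename: str
    fun_name: str
    lineno: int


class EventMessage(TypedDict):
    """Hypothetical mirror of the annotations of EventMessageType."""
    execution_count: int
    first_line: str
    error_name: str
    error_message: str
    traceback: List[TracebackDetails]


def missing_keys(message: dict) -> set:
    """Return the expected top-level keys that are absent from a message."""
    return set(EventMessage.__annotations__) - set(message)


message: EventMessage = {
    "execution_count": 111,
    "first_line": "# cell that does foo",
    "error_name": "ValueError",
    "error_message": "foo",
    "traceback": [{"filename": "foo.py", "fun_name": "run_code", "lineno": 666}],
}
```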
The server does keep track of IP addresses to prevent vandalism, but it's the IP address of the colab notebook. No JavaScript call is present to get the browser IP. (Annoyingly I'd love to do some JS calls to get some useful data, but best not obfuscate!) Therefore the IP will be in the range: 142.250.0.0 - 142.251.255.255.
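The range quoted above corresponds to the single CIDR block 142.250.0.0/15, so membership can be checked with the standard ipaddress module. This is only a sketch of such a check; whether the colab_only option works this way is an assumption, and in_quoted_range is a hypothetical name.

```python
import ipaddress

# 142.250.0.0 - 142.251.255.255 expressed as one network
QUOTED_RANGE = ipaddress.ip_network("142.250.0.0/15")


def in_quoted_range(ip: str) -> bool:
    """Tell whether an IPv4 address falls within the range quoted above."""
    return ipaddress.ip_address(ip) in QUOTED_RANGE
```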
To see the errors sent:
es.retrieve_errors()
I am unsure whether to allow everyone to see the sessions and errors, hence the max_transparency argument.
For an internal server this makes sense, but for a public one, revealing the session ids may result in vandals adding errors to sessions randomly.
Colab
Colab runs on an ancient version of IPython (5.5, cf. 8.2). As a result things are done a bit differently.
.enable calls either load_ipython_extension or monkeypatch_extension, depending on the IPython version.
The former adds an event callback function (shell.events.callbacks), which is all proper and good.
The latter monkeypatches a decorating function around shell.showtraceback, which knows about the ErrorEvent/ErrorSlack/ErrorServer/ErrorStore instance because it was created in a factory method of that instance. As it does not have a result object, it knows neither the execution count nor the first line of the cell.
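The difference between the two routes can be sketched as follows. This is not the package's code: register_callback and monkeypatch_showtraceback are hypothetical names, and the sketch assumes that the post_run_cell event of newer IPython passes a result object carrying error_in_exec, execution_count and info.raw_cell, and that showtraceback is called while the exception is still being handled.

```python
import functools
import sys


def register_callback(shell, on_error):
    """Newer IPython: hook the post_run_cell event, which receives a result object."""
    def post_run_cell(result):
        error = result.error_in_exec
        if error is not None:
            # the result object knows the cell source and the execution count
            first_line = result.info.raw_cell.strip().split("\n")[0]
            on_error(error, result.execution_count, first_line)

    shell.events.register("post_run_cell", post_run_cell)
    return post_run_cell


def monkeypatch_showtraceback(shell, on_error):
    """Older IPython: wrap shell.showtraceback, where no result object is available."""
    original = shell.showtraceback

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # the exception currently being handled, if any
        error = sys.exc_info()[1]
        if error is not None:
            # execution count and first line are unknown on this route
            on_error(error, None, None)
        return original(*args, **kwargs)

    shell.showtraceback = wrapper
    return wrapper
```

In a live session, shell would be the object returned by get_ipython(), and on_error would be whatever forwards the details to the store, Slack or the server.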
!pip install notebook-error-reporter
from notebook_error_reporter import ErrorServer
es = ErrorServer(url='https://errors.matteoferla.com', notebook='test')
es.enable()
# raise an error:
raise ValueError('Foo')
This error can be seen to have been sent successfully:
es.retrieve_errors()
However, as I am not a Seattle/Arlington multinational hellbent on collecting data, I like to make it opt-in:
#@markdown Send error messages to errors.matteoferla.com for logging?
#@markdown See [notebook-error-reporter repo for more](https://github.com/matteoferla/notebook-error-reporter)
report_errors = False #@param {type:"boolean"}
if report_errors:
    !pip install notebook-error-reporter
    from notebook_error_reporter import ErrorServer
    es = ErrorServer(url='https://errors.matteoferla.com', notebook='fragmenstein')
    es.enable()