# Requestlog

Structured request logging in Python: a library for recording structured log data, with a focus on performance data.
## Why this library?
Most logging libraries emit lines of text, optionally enriched with a bit of
structured data. When multiple operations run in parallel, as in a web server,
the log lines from different requests interleave, and it can be hard to find
the information belonging to any particular user request.

In contrast, in `requestlog` you define the scope of a logging operation
(typically a scope corresponds to a server request), and all log entries are
collected in a single dictionary object, so that all information pertaining to
a single server request ends up together.
`requestlog` can store arbitrary keys and values (like `path`, `client_ip` and
`status_code`), but is particularly optimized for storing performance-related
information: timers and counters. This helps answer questions like "For this
particular web request, how many calls were made to the database?", "How long
did those calls take?", and "How many records were returned?".
## API
The easiest way to use `requestlog` is through the global API:

```python
import requestlog

# Begin and end a log record scope
requestlog.begin_global_log_record()
requestlog.end_global_log_record()

# You can also use it as a context manager
with requestlog.begin_global_log_record():
    # ... code to measure here ...
    pass
```
To log values into the log record, call these functions:
```python
import time

import requestlog

with requestlog.begin_global_log_record():
    # Log keys and values
    requestlog.log_value(status_code='200', path='/user')

    # Counters have the property that they add up. The two calls below
    # end up with 'records=7'.
    requestlog.log_counter('records', 5)
    requestlog.log_counter('records', 2)

    # You can also use log_counters, which takes keyword arguments
    requestlog.log_counters(records=5)
    requestlog.log_counters(records=2)

    # Timers are context managers. They log the keys '{name}_cnt' and
    # '{name}_ms': the number of times a timer with this name has been
    # logged, and the total time spent in all of them, respectively.
    with requestlog.log_time('database'):
        time.sleep(1)
```
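For illustration (the exact output format depends on the configured sink), the
record produced by the block above would end up containing entries along these
lines, in addition to the automatically logged fields described below:

```python
# Illustrative sketch only, not the library's literal output; assumes all
# four counter calls for 'records' add into the same key, per the
# semantics described above.
{
    'status_code': '200',
    'path': '/user',
    'records': 14,        # 5 + 2 from log_counter, plus 5 + 2 from log_counters
    'database_cnt': 1,    # the 'database' timer was entered once
    'database_ms': 1000,  # approximately; one time.sleep(1) inside the timer
    # ... plus the automatically logged fields described below
}
```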
## Decorator
The library contains a `@timed` decorator that automatically logs a timer for
every invocation of the function. For example:
```python
import time

import requestlog

@requestlog.timed
def expensive_function():
    # Do something that might take a long time
    time.sleep(1)

# In your regular code, just call the function
with requestlog.begin_global_log_record():
    expensive_function()
```
This will log a timer for every invocation of that function, i.e. log
`expensive_function_cnt` and `expensive_function_ms` to account for the time
spent in `expensive_function`.
By default, `@timed` logs under the function's own name. To change the name
under which the invocation is logged, use `@timed_as`:
```python
# Change the name this timer is logged under
@requestlog.timed_as('different_name')
def other_function_that_does_complex_things():
    time.sleep(1)
```
## Automatically logged data
`requestlog` logs a number of fields into every record by default:
Field | Contents
---|---
`start_time` | ISO timestamp when the log record was opened.
`end_time` | ISO timestamp when the log record was finished; this will correspond to the aggregation bucket.
`pid` | PID of the current process.
`loadavg` | (UNIX only) Load average when the record was opened.
`user_ms` | (UNIX only) User CPU time spent on the current thread during this operation.
`sys_ms` | (UNIX only) System CPU time spent on the current thread during this operation.
`max_rss` | (UNIX only) Maximum memory used by the Python process.
`inc_max_rss` | (UNIX only) Increase in maximum memory during this operation (use this to statistically find memory-hungry operations).
`fault` | 0 or 1, indicating whether an exception occurred in the block.
`error_message` | (On error) Error message of the exception.
`error_class` | (On error) Full class name of the exception.
`dyno` | (On Heroku) The identifier of the current dyno.
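As an illustration of the error fields, here is a minimal sketch (assuming, as
the `fault` field above suggests, that the context-manager form records an
exception before re-raising it):

```python
import requestlog

try:
    with requestlog.begin_global_log_record():
        raise ValueError('something went wrong')
except ValueError:
    pass

# The record for this block would then contain entries along the lines of:
#   fault=1
#   error_class='ValueError' (full class name)
#   error_message='something went wrong'
```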
## Example: Integration with Flask
Here is an example of how to integrate this with Flask, so that a log record is produced for every Flask request.
> [!INFO]
> Since `requestlog` uses thread-local storage by default, this assumes you are
> using threaded mode or running in a process-per-server model like `gunicorn`.
> If you are using greenlets, call `requestlog.globals.set_context_object()` to
> store the global log record elsewhere.
Add the following to your main Flask application:
```python
# app.py
from flask import Flask, request

import requestlog

app = Flask(__name__)

# Initialize logging, load any emergency records
requestlog.initialize(sink=...)

@app.before_request
def before_request_begin_logging():
    path = (str(request.path) + '?' + request.query_string.decode('utf-8')
            ) if request.query_string else str(request.path)
    requestlog.begin_global_log_record(
        path=path,
        method=request.method,
        remote_ip=request.headers.get('X-Forwarded-For', request.remote_addr),
        user_agent=request.headers.get('User-Agent'))

@app.after_request
def after_request_log_status(response):
    requestlog.log_value(http_code=response.status_code)
    return response

@app.teardown_request
def teardown_request_finish_logging(exc):
    requestlog.finish_global_log_record(exc)
```
If you are using `gunicorn`, add the following to `gunicorn.conf.py` to make
sure that on server shutdown, any untransmitted records are saved to disk.
They will be automatically loaded and re-enqueued when
`requestlog.initialize()` is called after the server is restarted.
```python
# gunicorn.conf.py
def worker_exit(server, worker):
    import requestlog
    requestlog.emergency_shutdown()
```
## Sinks
So far we have talked about producing records, but where do those records go?
By default, they are printed to `stderr`, along with a warning that you should
configure a sink.
A sink is a callable object, like a function or a class implementing the
`__call__` method, that receives a number of records and does something with
them, like sending them off to a logging server or cloud storage.
The signature of the sink function should be:

```python
def my_sink(timestamp: int, records: list[dict]):
    # Implementation here...
    pass
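```

Since a sink only needs to be callable, it can also be a class instance. Here
is a minimal sketch of a class-based sink (the `FileSink` name and its
append-to-file behavior are illustrative, not part of the library):

```python
import json

class FileSink:
    """A sink that appends each batch of records to a local JSON Lines file."""

    def __init__(self, filename):
        self.filename = filename

    def __call__(self, timestamp: int, records: list[dict]):
        # Write one JSON document per line, appending to the existing file
        with open(self.filename, 'a') as f:
            for record in records:
                f.write(json.dumps(record) + '\n')
```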
You configure the sink by calling `requestlog.initialize()`:

```python
import requestlog

requestlog.initialize(
    batch_window_s=300,
    sink=my_sink)
```
The `batch_window_s` parameter controls how often the sink function is
invoked: a background thread calls the sink function periodically, with all
the records accumulated during that period. The `timestamp` parameter
indicates the timestamp that marks the end of that particular window. If
`batch_window_s == 0`, the sink function is invoked synchronously whenever any
log record is written.
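For example, during local development it can be convenient to see every record
immediately; a sketch using the synchronous mode (`batch_window_s=0`)
described above:

```python
import sys

import requestlog

# With batch_window_s=0 the sink runs synchronously on every record,
# so each record is printed as soon as it is written.
requestlog.initialize(
    batch_window_s=0,
    sink=lambda timestamp, records: print(records, file=sys.stderr))
```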
## Example: Saving logs to S3
Here is an example sink that saves logs to S3 in JSON Lines format using
`boto3`:
```python
import datetime
import json
import threading

import boto3

# The 'boto3.client' method is not thread-safe: https://github.com/boto/boto3/issues/1592
BOTO3_LOCK = threading.Lock()

def s3_sink(timestamp, records):
    """Transmit log records to S3 with default config."""
    # No need to configure credentials; we've already confirmed they are in the environment.
    with BOTO3_LOCK:
        s3 = boto3.client("s3", region_name='us-east-1')

    isoformat = datetime.datetime.utcfromtimestamp(timestamp).isoformat() + 'Z'

    # Save every day in a different folder:
    # 's3://my-bucket/my-logs/2020-01-01/13:15:00Z.jsonl'
    bucket_name = 'my-bucket'
    key = (
        'my-logs/'
        + isoformat.replace("T", "/")
        + '.jsonl'
    )

    # Store in JSON Lines format
    body = '\n'.join(json.dumps(r) for r in records)
    s3.put_object(
        Bucket=bucket_name,
        Key=key,
        StorageClass="STANDARD_IA",
        Body=body,
    )
```
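To use this sink, pass it to `requestlog.initialize()` as shown in the
previous section; with a five-minute batch window, each invocation writes one
object per window:

```python
import requestlog

# One S3 object is written per five-minute batch window
requestlog.initialize(
    batch_window_s=300,
    sink=s3_sink)
```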