Structured logging to JSONL with support for asynchronous logging via asyncio

grammlog

GrammAcc's structured logging.

Full Documentation

Pure-Python package that provides structured logging to JSONL with no dependencies.

Supports asynchronous logging via asyncio. See: Asynchronous Logging

Provides wrappers around the standard debug, info, warning, error, and critical functions.

Each function accepts a Logger as its first argument, so you can provide a custom logger with your own handlers to write structured logs to any target.

This package also provides a make_logger convenience function that creates and configures a file-based logger with rotating log files and a size limit.

Installation

pip install grammlog

Basic Usage

>>> import os
>>> import grammlog
>>> logger = grammlog.make_logger("app", log_dir="logs", log_level=grammlog.Level.INFO)
>>> grammlog.info(logger, "application initialized")
>>> grammlog.info(
...     logger,
...     "env vars set",
...     {"env": os.environ},
... )
>>> grammlog.set_env(grammlog_dir="logs", default_grammlog_level=grammlog.Level.INFO)
>>> auth_log = grammlog.make_logger("auth")  # Use the log_dir and log_level from the env.
>>> try:
...     user_id = 42
...     get_user_if_logged_in(user_id)
... except NameError as e:
...     grammlog.error(
...         auth_log,
...         "User was not logged in",
...         {"user_id": user_id},
...         err=e,
...     )
>>> db_logger = grammlog.make_logger(
...     "db_queries", log_dir="logs/db", log_level=grammlog.Level.ERROR
... )
>>> try:
...     user_name = "invalid"
...     db.query(table="users", user_name=user_name)
... except NameError as e:
...     grammlog.error(
...         db_logger,
...         "Unknown error in db query",
...         {"queried_table": "users", "user_name": user_name},
...         err=e,
...     )

Structured Data

The logging functions all take an arbitrary logger as their first argument, which allows them to output structured logs to any handlers supported by the stdlib's logging module. They also accept a required msg string, an optional details dictionary, and an optional err exception.

These arguments are merged into a single JSON object, which is serialized and written as one line of JSONL output using the provided logger.

In addition to the data provided in the function arguments, the logged object will also include the following keys:

  • level: The logging level of the message. E.g. 'DEBUG', 'ERROR'.
  • timestamp: The timestamp of the logging event in UTC as given by datetime.datetime.timestamp()

If the err argument is not None, then the logged line will also contain the following keys:

  • err: The repr of the err object.
  • traceback: The formatted traceback of the exception.

The msg string will be merged into the object under the msg key, and all keys and values in the details dict will be merged into the resulting object as-is.

The keys in the details dict are assumed to be strings, but the values can be any type. If a value is not JSON-serializable, the logging function will force a string conversion by calling str(value) on it. This is applied recursively to any nested dictionaries as well.
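The described fallback behaves like json.dumps's default hook. A minimal stdlib emulation of the recursive str() conversion (an illustration, not grammlog's actual implementation):

```python
import json
import pathlib

# Emulate the described fallback: any value that isn't JSON-serializable
# is converted with str(). The `default` hook is applied to nested
# values too, matching the recursive behavior described above.
details = {
    "user_id": 42,
    "config": {"path": pathlib.Path("/tmp/app")},  # Path is not JSON-serializable
}
line = json.dumps(details, default=str)
print(line)  # {"user_id": 42, "config": {"path": "/tmp/app"}}
```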

The default of calling str on the values is fine for logging blobs of dictionary data, but it's usually best to explicitly convert an unsupported value to a JSON-serializable form before logging it, so that the logs contain all of the information expected. For example, when logging dates/datetimes, you may want a high-precision POSIX timestamp, or you may want a more human-readable ISO-formatted date string. Converting the value to the desired format before logging it is preferred.
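For example (plain stdlib, independent of grammlog), converting a datetime explicitly before putting it in details:

```python
from datetime import datetime, timezone

event_time = datetime(2024, 6, 1, 12, 30, tzinfo=timezone.utc)

# High-precision POSIX timestamp: convenient for numeric range queries.
as_timestamp = event_time.timestamp()

# ISO 8601 string: convenient when reading raw log lines.
as_iso = event_time.isoformat()

# Both forms are JSON-serializable as-is, so nothing is lost to str().
details = {"event_ts": as_timestamp, "event_time": as_iso}
print(details)
# {'event_ts': 1717245000.0, 'event_time': '2024-06-01T12:30:00+00:00'}
```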

Processing Structured Logs

Logging in a structured format like JSON is useful because we can query the logs based on keys and values instead of simply scanning through thousands of lines of text manually or with inaccurate search heuristics.

For example, using the shell tool jq, we can filter our logs to only the lines that have an err object logged.

cat debug.log | jq 'select(.err != null)'

Or we can get a list of all log entries from the last 42 minutes.

cat debug.log | jq 'select(.timestamp >= (now - 2520))'

Or we can count the number of log entries.

cat debug.log | jq --slurp 'length'

Or we can get an array of (msg, traceback) tuples for all of the ValueErrors that we've logged.

cat debug.log | jq 'select(.err != null) | select(.err | contains("ValueError")) | [.msg, .traceback]'

Or we can use any other combination of query-like filters to examine the exact messages we're concerned with.

This means that if we include queryable keys in our logging calls in the source code, it is easy to find the specific error messages we need to debug all the nasty issues our applications give us at 3am on a Saturday.
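When jq isn't available, the same filters are easy to express in plain Python. A sketch over log lines shaped like the ones described above:

```python
import json

# Two illustrative JSONL log lines in the shape described above.
raw_lines = [
    '{"level": "INFO", "timestamp": 1717245000.0, "msg": "application initialized"}',
    '{"level": "ERROR", "timestamp": 1717245060.0, "msg": "db query failed", '
    '"err": "ValueError(\'bad user_name\')", "traceback": "..."}',
]

entries = [json.loads(line) for line in raw_lines]

# Equivalent of: jq 'select(.err != null)'
errors = [e for e in entries if e.get("err") is not None]
print([e["msg"] for e in errors])  # ['db query failed']
```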

Asynchronous Logging

There are async versions of each of the logging functions as well as register_async_logger and deregister_async_logger functions:

  • async_debug
  • async_info
  • async_warning
  • async_error
  • async_critical
  • register_async_logger
  • deregister_async_logger

The async logging functions only work with a logger that has been registered. This is because the logging calls themselves are synchronous, and they need to be queued in order to run them concurrently with other tasks in the event loop. Registering a logger to be used asynchronously doesn't mutate the logger in any way, and async loggers are still the same logging.Logger objects. Calling register_async_logger simply creates an asyncio.Queue and a Task to run synchronous logging functions in the background.
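That mechanism can be pictured with plain asyncio. This is an illustrative sketch of the queue-and-task pattern, not grammlog's actual internals:

```python
import asyncio
import logging

written: list[str] = []

async def main() -> None:
    logger = logging.getLogger("sketch")
    queue: asyncio.Queue[str] = asyncio.Queue(maxsize=100)

    async def consumer() -> None:
        # Background task draining queued logging calls; the write itself
        # is synchronous, just like the stdlib logging call it wraps.
        while True:
            msg = await queue.get()
            logger.info(msg)
            written.append(msg)
            queue.task_done()

    task = asyncio.create_task(consumer())  # register_async_logger analogue

    await queue.put("hello")  # async_info analogue: enqueue and return
    await queue.put("world")

    await queue.join()  # flush pending messages
    task.cancel()       # deregister analogue

asyncio.run(main())
print(written)  # ['hello', 'world']
```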

The async queues are managed internally in this package and run the logging events on the event loop in the background. This means that a call like await async_info(logger, msg) doesn't actually wait until the message is logged; it puts an event into the queue to be logged in the background at the discretion of the event loop's task scheduler. It also means that deregister_async_logger needs to be called on any loggers registered as async before application shutdown in order to guarantee all log messages are flushed to their targets. Failing to deregister a logger will not raise an error, but it may result in pending log messages being lost.

Just as any logger with any handlers can be used with the sync functions for structured logging to any target, any logger can be registered as an async logger by passing it into the register_async_logger function. That does not mean that registering another library's logger will cause that library's logging events to run asynchronously: the asynchronous logging only works if the async_* functions are used. Registering a logger that you don't control only adds overhead, because the idle background task takes CPU cycles away from other work on the event loop.


Async Performance Considerations

Using async logging won't make your logging any faster. Because writing the actual log messages is synchronous, excessive logging will still cause a CPU-bound bottleneck in your application. However, if you are using asyncio already, using async logging should make your code more efficient by giving the event loop a chance to start other background tasks in between registering a logging event and actually logging the message. In other words, the main benefit is not to make logging more efficient but instead to make sure the event loop can keep as many concurrent tasks running as possible.

One thing to consider with respect to the async event loop is the size limit for the logging queues. The queues will not block the event loop from running other tasks regardless of the size limit, but there are tradeoffs to consider. Due to the way asyncio.Queue works, when the queue is full, it will continue to pass execution to other tasks until an item is removed from the queue. This means that in situations where the application is performing excessive logging due to some unforeseen usage pattern or a programming oversight, the size limit on the queue helps to throttle the CPU usage of the logging events: no more events are enqueued until the oldest one is consumed. This gives the event loop more chances to start other tasks, such as handling a request through the web framework or sending an async API response to the frontend. The logging events will still hog the CPU while they are running, but the size limit maximizes the chances the application has to start other IO-bound tasks in between logging events.

The flipside is that if the async logging call happens inside a handler for a request, or before sending a response to the client, that entire coroutine will wait until there is space in the queue to add another logging event. For this reason, some applications may want a large size limit for their logging queues, but it is very unlikely that the wait for a free slot in the queue would cause a more significant slowdown than the CPU load from the backlog of logging events that an unbounded queue would allow to accumulate.
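The backpressure described above can be demonstrated with a bounded asyncio.Queue (plain asyncio, independent of grammlog): a producer's put() suspends when the queue is full, which is exactly what gives other tasks a chance to run.

```python
import asyncio

async def main() -> list[str]:
    queue: asyncio.Queue[int] = asyncio.Queue(maxsize=1)
    events: list[str] = []

    async def producer() -> None:
        for i in range(3):
            await queue.put(i)  # suspends whenever the queue is full
            events.append(f"enqueued {i}")

    async def consumer() -> None:
        for _ in range(3):
            item = await queue.get()
            events.append(f"logged {item}")
            queue.task_done()

    await asyncio.gather(producer(), consumer())
    return events

events = asyncio.run(main())
# With maxsize=1, item 0 must be consumed before item 1 can be enqueued,
# so "logged 0" always appears before "enqueued 1".
print(events)
```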

When in doubt, profile.
