Logging/encoding/decoding using CLP's IR stream format
CLP Python Logging Library
This is a Python logging library meant to supplement CLP (Compressed Log Processor).
It operates by serializing and compressing log events using the CLP Intermediate Representation (IR)
format, achieving both data streaming capabilities and effective compression ratios. Log files
serialized using the IR format can be viewed using the YScope Log Viewer. They can also be deserialized
to their original plain-text format, or programmatically analyzed with the APIs provided by
clp-ffi-py. For further information, refer to the detailed explanation in this Uber blog.
Motivation
CLP buffers a substantial volume of log files before executing compression for a better compression
ratio. However, most individual log files are actively opened for appending over an extended
duration. In their raw-text format, these log files are not space-efficient and do not support
efficient querying through standard text-based tools like grep.
To address this problem, this logging library is designed to serialize log events directly in CLP's Intermediate Representation (IR) format. A log event created with a CLP logging handler will first be encoded into the IR format, and then appended to a compressed output stream. This approach not only minimizes storage resource consumption but also facilitates the execution of high-performance, early-stage analytics using the APIs from clp-ffi-py. These compressed CLP IR files can be further processed by CLP to achieve superior compression ratios and more extensive analytics capabilities.
For a detailed understanding of the CLP IR format, refer to README-protocol.md
Quick Start
The package is hosted on PyPI (https://pypi.org/project/clp-logging/), so it
can be installed with pip:
python3 -m pip install --upgrade clp-logging
Logging handlers
ClpKeyValuePairStreamHandler
⭐ New in v0.0.14
This handler enables applications to write structured log events directly into CLP's key-value pair (kv-pair) IR stream format. The handler accepts structured log events in the form of Python dictionaries, where each dictionary entry must abide by the requirements detailed below. The handler will also automatically include certain metadata (e.g., the log event's level) with each log event.
[!WARNING] This handler cannot be used with other logging handlers since it requires msg (the first argument passed to the logging method) to be a dictionary. In contrast, standard handlers typically treat msg as a format string. In the future, this handler may be moved or reworked to avoid confusion.
[!NOTE] Since this handler accepts structured log events, it doesn't support setting a Formatter (because the log events don't need to be formatted into a string).
[!WARNING] ClpKeyValuePairStreamHandler currently doesn't support CLPLogLevelTimeout. This feature will be added in a future release.
Key-value pair requirements
ClpKeyValuePairStreamHandler requires kv-pairs abide by the following rules:
- Keys must be of type str.
- Values must be one of the following types:
  - Primitives: int, float, str, bool, or None.
  - Arrays (list), where each array:
    - may contain primitive values, dictionaries, or nested arrays.
    - can be empty.
  - Dictionaries (dict), where each dictionary:
    - must adhere to the aforementioned rules for keys and values.
    - can be empty.
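The rules above can be checked before logging with a small recursive validator. This is a minimal sketch, not part of clp-logging: the function names is_valid_value and is_valid_kv_pairs are hypothetical, and the handler's own validation may differ in its details.

```python
from typing import Any


def is_valid_value(value: Any) -> bool:
    """Return True if `value` satisfies the value rules listed above."""
    # Primitives: int, float, str, bool, or None (bool is a subclass of int).
    if value is None or isinstance(value, (int, float, str)):
        return True
    # Arrays may be empty and may hold primitives, dictionaries, or nested arrays.
    if isinstance(value, list):
        return all(is_valid_value(item) for item in value)
    # Dictionaries may be empty and must follow the same key and value rules.
    if isinstance(value, dict):
        return is_valid_kv_pairs(value)
    return False


def is_valid_kv_pairs(kv_pairs: Any) -> bool:
    """Return True if `kv_pairs` is a dict with str keys and valid values."""
    if not isinstance(kv_pairs, dict):
        return False
    return all(
        isinstance(key, str) and is_valid_value(value)
        for key, value in kv_pairs.items()
    )


event = {"message": "hi", "machine_info": {"uid": 12345, "tags": [1, [2.5, None], {}]}}
print(is_valid_kv_pairs(event))          # nested lists and dicts are allowed
print(is_valid_kv_pairs({"k": (1, 2)}))  # tuples are not in the allowed types
```

Running such a check before calling the logger makes it easier to see which event would break the rules.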
Automatically generated kv-pairs
In addition to the kv-pairs explicitly logged by the application, the handler will add kv-pairs, like the log event's level, to each log event. We refer to the former as user-generated kv-pairs and the latter as auto-generated kv-pairs.
[!NOTE] The kv-pair IR stream format stores auto-generated kv-pairs separately from user-generated kv-pairs, so users don't need to worry about key collisions with the auto-generated keys.
The handler adds the following auto-generated kv-pairs to each log event:
| Key | Value type | Description |
|---|---|---|
| timestamp | dict | The log event's timestamp |
| - unix_millisecs | int | The timestamp in milliseconds since the Unix epoch |
| - utc_offset_secs | int | The timestamp's offset from UTC, in seconds |
| level | dict | The log event's level |
| - name | str | The level's name |
| - num | int | The level's numeric value |
| source_location | dict | The source location of the logging statement |
| - path | str | The source location's path |
| - line | int | The source location's line number |
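To make the shape of these auto-generated kv-pairs concrete, the sketch below builds an equivalent dictionary from a standard logging.LogRecord. It is an illustration only: sketch_auto_generated_kv_pairs is a hypothetical helper, and taking the UTC offset from the local time zone is an assumption rather than a description of the handler's actual code.

```python
import logging
import time
from typing import Any, Dict


def sketch_auto_generated_kv_pairs(record: logging.LogRecord) -> Dict[str, Any]:
    # Hypothetical helper that mirrors the table above; not the handler's code.
    # Assumption: the UTC offset is taken from the local time zone.
    utc_offset_secs = time.localtime(record.created).tm_gmtoff
    return {
        "timestamp": {
            "unix_millisecs": int(record.created * 1000),
            "utc_offset_secs": utc_offset_secs,
        },
        "level": {"name": record.levelname, "num": record.levelno},
        "source_location": {"path": record.pathname, "line": record.lineno},
    }


# Build a record by hand so the example runs without a configured logger.
record = logging.LogRecord(
    name="example",
    level=logging.INFO,
    pathname="/app/main.py",
    lineno=42,
    msg={"message": "hello"},
    args=None,
    exc_info=None,
)
auto_kv_pairs = sketch_auto_generated_kv_pairs(record)
print(auto_kv_pairs["level"])
print(auto_kv_pairs["source_location"])
```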
Example: ClpKeyValuePairStreamHandler
import logging
from pathlib import Path
from clp_logging.handlers import ClpKeyValuePairStreamHandler
clp_handler = ClpKeyValuePairStreamHandler(open(Path("example.clp.zst"), "wb"))
logger: logging.Logger = logging.getLogger(__name__)
logger.addHandler(clp_handler)
# Loggers inherit the root logger's WARNING level by default, so lower it for info() to be emitted
logger.setLevel(logging.INFO)
logger.info({
    "message": "This is an example message",
    "machine_info": {
        "uid": 12345,
        "ip": "127.0.0.1",
    },
})
Reading kv-pair IR streams
The following options are available for reading and deserializing kv-pair IR streams generated by this handler:
- clp-ffi-py: This library provides a Deserializer to access a kv-pair IR stream in Python. This example illustrates its usage.
- YScope Log Viewer: This UI can be used to view kv-pair IR streams.
CLPStreamHandler
- Writes encoded logs directly to a stream
CLPFileHandler
- Simple wrapper around CLPStreamHandler that calls open
Example: CLPFileHandler
import logging
from pathlib import Path
from clp_logging.handlers import CLPFileHandler
clp_handler = CLPFileHandler(Path("example.clp.zst"))
logger = logging.getLogger(__name__)
logger.addHandler(clp_handler)
logger.warning("example warning")
CLPSockHandler + CLPSockListener
This library also supports multiple processes writing to the same log file. In this case, all logging processes write to a listener server process through a TCP socket. The socket name is the log file path passed to CLPSockHandler with a ".sock" suffix.
A CLPSockListener can be explicitly created (and will run as a daemon) by calling:
CLPSockListener.fork(log_path, sock_path, timezone, timestamp_format).
Alternatively, a CLPSockHandler can transparently start an associated CLPSockListener
when it is created with create_listener=True.
CLPSockListener must be explicitly stopped once logging is completed. There are two ways to stop the listener process:
- Call stop_listener() from an existing handler, e.g., clp_handler.stop_listener(), or from a new handler with the same log path, e.g., CLPSockHandler(Path("example.clp.zst")).stop_listener()
- Kill the CLPSockListener process with SIGTERM
Example: CLPSockHandler + CLPSockListener
In the handler processes or threads:
import logging
from pathlib import Path
from clp_logging.handlers import CLPSockHandler
clp_handler = CLPSockHandler(Path("example.clp.zst"), create_listener=True)
logger = logging.getLogger(__name__)
logger.addHandler(clp_handler)
logger.warning("example warning")
In a single process or thread once logging is completed:
from pathlib import Path
from clp_logging.handlers import CLPSockHandler
CLPSockHandler(Path("example.clp.zst")).stop_listener()
Read IR streams
[!WARNING] All readers have been removed from this library as of v0.0.15. To read an IR stream, use clp-ffi-py instead.
Log level timeout feature: CLPLogLevelTimeout
All log handlers except ClpKeyValuePairStreamHandler support a configurable timeout feature. A user-configurable timeout is scheduled based on each log event's level (verbosity). When the timeout expires, the handler flushes the Zstandard frame and calls a user-supplied function, allowing users to run arbitrary code. This lets users automate log-related tasks, such as periodically uploading their logs for storage. Because the timeout is set in response to the log level, a task's responsiveness can be tuned to the severity of the log events seen. An additional timeout is always triggered when the logging handler is closed.
See the class documentation for specific details.
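The scheduling idea can be sketched with the standard library alone. The class below is a hypothetical illustration, not the CLPLogLevelTimeout implementation: the name LevelTimeoutSketch, its delays_secs mapping, and the rule that an earlier pending deadline wins are all assumptions made for the example, and it does not flush any compression frame.

```python
import logging
import threading
import time
from typing import Callable, Dict, List, Optional


class LevelTimeoutSketch:
    """Hypothetical level-based timeout scheduler (illustration only)."""

    def __init__(self, on_timeout: Callable[[], None], delays_secs: Dict[int, float]) -> None:
        self._on_timeout = on_timeout
        self._delays_secs = delays_secs  # log level -> delay before the timeout fires
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self._deadline: Optional[float] = None

    def _delay_for(self, levelno: int) -> Optional[float]:
        # Use the delay of the highest configured level that does not exceed levelno.
        eligible = [level for level in self._delays_secs if level <= levelno]
        return self._delays_secs[max(eligible)] if eligible else None

    def update(self, levelno: int) -> None:
        """Call once per log event to (re)schedule the timeout."""
        delay = self._delay_for(levelno)
        if delay is None:
            return
        deadline = time.monotonic() + delay
        with self._lock:
            if self._deadline is not None and self._deadline <= deadline:
                return  # an earlier timeout is already pending
            if self._timer is not None:
                self._timer.cancel()
            self._deadline = deadline
            self._timer = threading.Timer(delay, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            self._timer = None
            self._deadline = None
        self._on_timeout()

    def close(self) -> None:
        """Cancel any pending timer and trigger one final timeout."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = None
            self._deadline = None
        self._on_timeout()


calls: List[float] = []
timeout_sketch = LevelTimeoutSketch(
    on_timeout=lambda: calls.append(time.monotonic()),
    delays_secs={logging.INFO: 60.0, logging.ERROR: 0.05},
)
timeout_sketch.update(logging.INFO)   # schedules a timeout 60 s out
timeout_sketch.update(logging.ERROR)  # a more severe event pulls it in to 0.05 s
time.sleep(0.5)                       # give the short timer time to fire
timeout_sketch.close()                # closing triggers one more timeout
```

In this sketch, a more severe event shortens the wait before the task runs, while less severe events leave an earlier pending deadline untouched.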
Example code: CLPLogLevelTimeout
import logging
from pathlib import Path
from clp_logging.handlers import CLPLogLevelTimeout, CLPSockHandler
class LogTimeoutUploader:
    # Store relevant information/objects to carry out the timeout task
    def __init__(self, log_path: Path) -> None:
        self.log_path: Path = log_path
        return
    # Create any methods necessary to carry out the timeout task
    def upload_log(self) -> None:
        # upload the logs to the cloud
        return
    def timeout(self) -> None:
        self.upload_log()
log_path: Path = Path("example.clp.zst")
uploader = LogTimeoutUploader(log_path)
loglevel_timeout = CLPLogLevelTimeout(uploader.timeout)
clp_handler = CLPSockHandler(log_path, create_listener=True, loglevel_timeout=loglevel_timeout)
logging.getLogger(__name__).addHandler(clp_handler)
Compatibility
Tested on Python 3.7, 3.8, and 3.11 (should also work on newer versions). Built/packaged on Python 3.8 for convenience regarding type annotation.
Development
Setup environment
- Create and enter a virtual environment: python3.8 -m venv venv; . ./venv/bin/activate
- Install the project in editable mode, the development dependencies, and the test dependencies: pip install -e .[dev,test]
Note: you may need to upgrade pip first for -e to work. If so, run: pip install --upgrade pip
Packaging
To build a package for distribution, run:
python -m build
Testing
To run the unit tests, run:
python -m unittest -bv
Note: the baseline comparison logging handler and the CLP handler both get unique timestamps. It is possible for these timestamps to differ, which will result in a test reporting a false positive error.
Contributing
Before submitting a pull request, run the following error-checking and formatting tools (found in requirements-dev.txt):
- mypy: mypy src tests
  - mypy checks for typing errors. You should resolve all typing errors or, if an error cannot be resolved (e.g., it's due to a third-party library), you should add a comment # type: ignore to silence the error.
- docformatter: docformatter -i src tests
  - This formats docstrings. You should review and add any changes to your PR.
- Black: black src tests
  - This formats the code according to Black's code-style rules. You should review and add any changes to your PR.
- ruff: ruff check --fix src tests
  - This performs linting according to PEPs. You should review and add any changes to your PR.
Note that docformatter should be run before black to give Black the last word.