
Logging/encoding/decoding using CLP's IR stream format


CLP Python Logging Library

This is a Python logging library meant to supplement CLP (Compressed Log Processor). Logs are compressed in a streaming fashion into CLP's Internal Representation (IR) format before being written to disk. More details are described in Uber's blog post.

Logs compressed in IR format can be viewed in a log viewer or programmatically analyzed using APIs provided here. They can also be decompressed back into plain-text log files using CLP (in a future release).

To achieve the best compression ratio, CLP should be used to compress large batches of logs, one batch at a time. However, individual log files are generally small and are generated across a long period of time.

This logging library helps solve this problem by logging directly in CLP's Internal Representation (IR). A log created with a CLP logging handler is first parsed and then appended to a compressed output stream in IR form. See README-protocol.md for more details on the format of CLP IR.

These log files containing the compressed CLP IR streams can then all be ingested into CLP together at a later time.
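
The cost of compressing small files one at a time can be illustrated with a short, self-contained sketch. It uses Python's standard zlib module as a stand-in compressor (not CLP or its IR format) and synthetic log lines invented for the illustration, so the absolute numbers mean little; only the gap between the two totals matters.

```python
import zlib

# Synthetic stand-in data: 200 small "log files" of 5 similar lines each.
small_logs = [
    "".join(
        f"2024-01-01 00:{i % 60:02d}:{j:02d} INFO worker-{i} finished task {j}\n"
        for j in range(5)
    ).encode()
    for i in range(200)
]

# Compress every small file on its own, as if each were stored separately.
separate_total = sum(len(zlib.compress(log)) for log in small_logs)

# Compress the same data as a single large batch.
batch_total = len(zlib.compress(b"".join(small_logs)))

print(f"separate: {separate_total} bytes, batched: {batch_total} bytes")
```

The batched total comes out smaller because the compressor can reuse patterns across all the lines instead of starting from scratch for every file.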

Quick Start

The package is hosted on PyPI (https://pypi.org/project/clp-logging/), so it can be installed with pip:

python3 -m pip install --upgrade clp-logging

Logger handlers

CLPStreamHandler

  • Writes encoded logs directly to a stream

CLPFileHandler

  • Simple wrapper around CLPStreamHandler that calls open

Example: CLPFileHandler

import logging
from pathlib import Path
from clp_logging.handlers import CLPFileHandler

clp_handler = CLPFileHandler(Path("example.clp.zst"))
logger = logging.getLogger(__name__)
logger.addHandler(clp_handler)
logger.warning("example warning")

CLPSockHandler + CLPSockListener

This library also supports multiple processes writing to the same log file. In this case, all logging processes write to a listener server process through a TCP socket. The socket name is the log file path passed to CLPSockHandler with a ".sock" suffix.

A CLPSockListener can be explicitly created (and will run as a daemon) by calling: CLPSockListener.fork(log_path, sock_path, timezone, timestamp_format). Alternatively, a CLPSockHandler can transparently start an associated CLPSockListener when it is constructed with create_listener=True.

CLPSockListener must be explicitly stopped once logging is completed. There are two ways to stop the listener process:

  • Call stop_listener() from an existing handler, e.g., clp_handler.stop_listener(), or from a new handler with the same log path, e.g., CLPSockHandler(Path("example.clp.zst")).stop_listener()
  • Kill the CLPSockListener process with SIGTERM
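
For the second option, the signal can be sent from Python with the standard library. The sketch below is only an illustration for POSIX systems: a sleeping child process stands in for the listener, and listener_pid is a hypothetical variable that would hold the real listener's process ID, however it was obtained.

```python
import os
import signal
import subprocess
import sys

# Stand-in for the listener: a child process that would otherwise sleep for a minute.
stand_in = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
listener_pid = stand_in.pid  # hypothetical: the real listener's PID goes here

# Ask the process to terminate with SIGTERM.
os.kill(listener_pid, signal.SIGTERM)

# Reap the child; on POSIX a negative return code is the terminating signal number.
return_code = stand_in.wait(timeout=10)
```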

Example: CLPSockHandler + CLPSockListener

In the handler processes or threads:

import logging
from pathlib import Path
from clp_logging.handlers import CLPSockHandler

clp_handler = CLPSockHandler(Path("example.clp.zst"), create_listener=True)
logger = logging.getLogger(__name__)
logger.addHandler(clp_handler)
logger.warning("example warning")

In a single process or thread once logging is completed:

from pathlib import Path
from clp_logging.handlers import CLPSockHandler

CLPSockHandler(Path("example.clp.zst")).stop_listener()

CLP readers (decoders)

CLPStreamReader

  • Read/decode any arbitrary stream
  • Can be used as an iterator that returns each log message as an object
  • Can skip n logs: clp_reader.skip_nlogs(N)
  • Can skip to first log after given time (since unix epoch):
    • clp_reader.skip_to_time(TIME)

CLPFileReader

  • Simple wrapper around CLPStreamReader that calls open

Example code: CLPFileReader

from pathlib import Path
from typing import List

from clp_logging.readers import CLPFileReader, Log

# create a list of all Log objects
log_objects: List[Log] = []
with CLPFileReader(Path("example.clp.zst")) as clp_reader:
    for log in clp_reader:
        log_objects.append(log)

CLPSegmentStreaming

  • Classes that inherit from CLPBaseReader can only read a single CLP IR stream from start to finish. This is necessary because, to determine the timestamp of an individual log, the starting timestamp (from the IR stream preamble) and all timestamp deltas up to that log must be known. In scenarios where an IR stream is periodically uploaded in chunks, users would need to either continuously read the entire stream or re-read the entire stream from the start.
  • The CLPSegmentStreaming class has the ability to take an input IR stream and segment it, outputting multiple independent IR streams. This makes it possible to read arbitrary segments of the original input IR stream without needing to decode it from the start.
  • In technical terms, the segment streaming reader allows the read operation to start from a non-zero offset and streams the legally encoded logs from one stream to another.
  • Each read call will return encoded metadata that can be used to resume from the current call.
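
The timestamp dependency described above can be sketched with a simplified model. This is not the actual IR encoding (see README-protocol.md for that), and the function names are hypothetical; it only shows why a log's timestamp needs the starting timestamp plus every earlier delta, and how a segment becomes independently decodable once it carries its own starting timestamp.

```python
from typing import List, Tuple


def encode_deltas(timestamps: List[int]) -> Tuple[int, List[int]]:
    """Return (reference timestamp, delta of each log from the previous one)."""
    reference = timestamps[0]
    deltas = []
    previous = reference
    for ts in timestamps:
        deltas.append(ts - previous)
        previous = ts
    return reference, deltas


def decode_deltas(reference: int, deltas: List[int]) -> List[int]:
    """Rebuild absolute timestamps by accumulating every delta in order."""
    timestamps = []
    current = reference
    for delta in deltas:
        current += delta
        timestamps.append(current)
    return timestamps


def split_segment(reference: int, deltas: List[int], start: int) -> Tuple[int, List[int]]:
    """Make the logs from index `start` onward decodable on their own."""
    # The segment's reference is the absolute time of the log just before it.
    new_reference = reference + sum(deltas[:start])
    return new_reference, deltas[start:]


timestamps = [1700000000000, 1700000000005, 1700000000250, 1700000001000]
reference, deltas = encode_deltas(timestamps)
seg_reference, seg_deltas = split_segment(reference, deltas, start=2)
```

Decoding seg_deltas against seg_reference yields the last two timestamps without touching the first two deltas, which is the property that segmenting provides.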

Example code: CLPSegmentStreaming

from typing import Optional

from clp_logging.protocol import Metadata
from clp_logging.readers import CLPSegmentStreaming

segment_idx: int = 0
segment_max_size: int = 8192
offset: int = 0
metadata: Optional[Metadata] = None
while True:
    bytes_read: int
    # Each iteration writes the next segment to its own output file
    with open("example.clp", "rb") as fin, open(f"{segment_idx}.clp", "wb") as fout:
        bytes_read, metadata = CLPSegmentStreaming.read(
            fin,
            fout,
            offset=offset,
            max_bytes_to_write=segment_max_size,
            metadata=metadata,
        )
        segment_idx += 1
        offset += bytes_read
    # Stop once read() returns no metadata to resume from
    if metadata is None:
        break

In the example code provided, "example.clp" is streamed into segments named "0.clp", "1.clp", and so on. Each segment is smaller than 8192 bytes and can be decoded independently.

Log level timeout feature: CLPLogLevelTimeout

All log handlers support a configurable timeout feature. A user-configurable timeout is scheduled based on the logging level (verbosity) of the logs; when it fires, it flushes the Zstandard frame and lets users run arbitrary code. This allows users to automatically perform log-related tasks, such as periodically uploading their logs for storage. Because the timeout is set in response to the logging level, the responsiveness of a task can be adjusted to the severity of the levels seen. An additional timeout is always triggered when the logging handler is closed.

See the class documentation for specific details.
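
The idea of a deadline that tightens with log severity can be sketched in plain Python. This is a simplified model and not how CLPLogLevelTimeout is implemented: the LevelDeadline class, its method names, and the delay values are all hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
import logging
from typing import Callable, Dict, Optional


class LevelDeadline:
    """Hypothetical helper: tracks one deadline driven by log levels."""

    def __init__(self, on_timeout: Callable[[], None], deltas_ms: Dict[int, int]) -> None:
        self.on_timeout = on_timeout
        self.deltas_ms = deltas_ms  # delay allowed after a log at each level
        self.deadline_ms: Optional[int] = None

    def record(self, level: int, now_ms: int) -> None:
        """Note a log event; a shorter delay pulls the deadline closer."""
        candidate = now_ms + self.deltas_ms[level]
        if self.deadline_ms is None or candidate < self.deadline_ms:
            self.deadline_ms = candidate

    def poll(self, now_ms: int) -> bool:
        """Run the callback if the deadline has passed; report whether it ran."""
        if self.deadline_ms is None or now_ms < self.deadline_ms:
            return False
        self.deadline_ms = None
        self.on_timeout()
        return True


uploads = []
deadline = LevelDeadline(
    on_timeout=lambda: uploads.append("uploaded"),
    deltas_ms={logging.INFO: 60_000, logging.ERROR: 1_000},
)
deadline.record(logging.INFO, now_ms=0)     # deadline set to 60_000 ms
deadline.record(logging.ERROR, now_ms=500)  # deadline pulled in to 1_500 ms
fired_early = deadline.poll(now_ms=1_000)   # before the deadline: nothing runs
fired_late = deadline.poll(now_ms=1_500)    # deadline reached: callback runs
```

In this model an ERROR log shortens the wait from a minute to a second, which mirrors the goal of reacting faster when more severe logs appear.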

Example code: CLPLogLevelTimeout

import logging
from pathlib import Path
from clp_logging.handlers import CLPLogLevelTimeout, CLPSockHandler

class LogTimeoutUploader:
    # Store relevant information/objects to carry out the timeout task
    def __init__(self, log_path: Path) -> None:
        self.log_path: Path = log_path
        return

    # Create any methods necessary to carry out the timeout task
    def upload_log(self) -> None:
        # upload the logs to the cloud
        return

    def timeout(self) -> None:
        self.upload_log()

log_path: Path = Path("example.clp.zst")
uploader = LogTimeoutUploader(log_path)
loglevel_timeout = CLPLogLevelTimeout(uploader.timeout)
clp_handler = CLPSockHandler(log_path, create_listener=True, loglevel_timeout=loglevel_timeout)
logging.getLogger(__name__).addHandler(clp_handler)

Compatibility

Tested on Python 3.6, 3.8, and 3.11 (should also work on newer versions). Built/packaged on Python 3.8 for convenience regarding type annotation.

Development

Setup environment

  1. Create and enter a virtual environment: python3.8 -m venv venv; . ./venv/bin/activate
  2. Install the project in editable mode, the development dependencies, and the test dependencies: pip install -e .[dev,test]

Note: you may need to upgrade pip first for -e to work. If so, run: pip install --upgrade pip

Packaging

To build a package for distribution, run: python -m build

Testing

To run the unit tests, run: python -m unittest -bv

Note: the baseline comparison logging handler and the CLP handler both get unique timestamps. It is possible for these timestamps to differ, which will result in a test reporting a false positive error.

Contributing

Before submitting a pull request, run the following error-checking and formatting tools (found in requirements-dev.txt):

  • mypy: mypy src tests
    • mypy checks for typing errors. You should resolve all typing errors; if an error cannot be resolved (e.g., it's due to a third-party library), add a # type: ignore comment to silence it.
  • docformatter: docformatter -i src tests
    • This formats docstrings. You should review and add any changes to your PR.
  • Black: black src tests
    • This formats the code according to Black's code-style rules. You should review and add any changes to your PR.
  • ruff: ruff check --fix src tests
    • This performs linting according to PEPs. You should review and add any changes to your PR.

Note that docformatter should be run before black to give Black the last word.
