
Logprep allows you to collect, process and forward log messages from various data sources.


Logprep


Introduction

Logprep allows you to collect, process and forward log messages from various data sources. Log messages are read and written by so-called connectors. Currently, connectors exist for Kafka, Opensearch, S3, HTTP and JSON(L) files.

The log messages are processed serially by a pipeline of processors, where each processor modifies the event that is passed through it. The main idea is that each processor performs a single, simple task. Once the log message has passed through all processors in the pipeline, the resulting message is sent to a configured output connector.

Logprep is primarily designed to process log messages. More generally, it can handle any JSON message, which allows for applications beyond log handling.

This readme provides a basic overview of Logprep. More detailed information can be found in the Documentation.

About Logprep

Pipelines

Logprep processes incoming log messages with a configured pipeline that can be spawned multiple times via multiprocessing. The following chart shows a basic setup that represents this behaviour. The pipeline consists of three processors: the Dissector, the Geo-IP Enricher and the Dropper. Each pipeline runs concurrently and takes one event from its Input Connector. Once the log message is fully processed, the result is forwarded to the Output Connector, after which the pipeline takes the next message, repeating the processing cycle.

flowchart LR
A1[Input\nConnector] --> B
A2[Input\nConnector] --> C
A3[Input\nConnector] --> D
subgraph Pipeline 1
B[Dissector] --> E[Geo-IP Enricher]
E --> F[Dropper]
end
subgraph Pipeline 2
C[Dissector] --> G[Geo-IP Enricher]
G --> H[Dropper]
end
subgraph Pipeline n
D[Dissector] --> I[Geo-IP Enricher]
I --> J[Dropper]
end
F --> K1[Output\nConnector]
H --> K2[Output\nConnector]
J --> K3[Output\nConnector]
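
The number of parallel pipelines in this chart corresponds to the process_count setting of the general configuration shown further below. As a minimal sketch:

process_count: 3   # spawn three pipelines, each running in its own process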

Processors

Every processor has one simple task to fulfill. For example, the Dissector can split long message fields into multiple subfields to facilitate structural normalization, the Geo-IP Enricher takes an IP address and adds its geolocation to the log message based on a configured GeoIP database, and the Dropper deletes fields from the log message.

A detailed overview of all processors can be found in the processor documentation.

To influence the behaviour of these processors, each can be configured with a set of rules. These rules define two things: firstly, when the processor should process a log message, and secondly, how to process the message, for example which fields should be deleted or for which IP address the geolocation should be retrieved.
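
For illustration, the following sketch shows what such a rule could look like for the Geo-IP Enricher. The overall shape (a filter, a processor-specific section and a description) follows the dropper rule shown further below, but the concrete keys and field names used here (client.ip, source_fields, target_field, client.geo) are assumptions; the authoritative schema is given in the rule configuration documentation.

filter: client.ip                # "when": only events containing this (assumed) field are processed
geoip_enricher:                  # "how": processor-specific section
  source_fields: [client.ip]     # assumed key: field holding the IP address to look up
  target_field: client.geo      # assumed key: field the geolocation is written to
description: "Enriches events with the geolocation of the client IP"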

For performance reasons, all rules of a processor are aggregated on startup into a generic and a specific rule tree, respectively. Instead of evaluating all rules independently for each log message, the message is checked against the rule tree. Each node in the rule tree represents a condition that has to be met, while the leaves represent the changes that the processor should apply. If no condition is met, the processor simply passes the log event on to the next processor.

The following chart gives an example of such a rule tree:

flowchart TD
A[root]
A-->B[Condition 1]
A-->C[Condition 2]
A-->D[Condition 3]
B-->E[Condition 4]
B-->H(Rule 1)
C-->I(Rule 2)
D-->J(Rule 3)
E-->G(Rule 4)

To further improve performance, it is possible to prioritize specific nodes of the rule tree, such that broader conditions are placed higher up in the tree and more specific conditions are moved further down. The following JSON gives an example of such a rule tree configuration. This configuration leads to the prioritization of the fields category and message in the rule tree.

{
  "priority_dict": {
    "category": "01",
    "message": "02"
  },
  "tag_map": {
    "check_field_name": "check-tag"
  }
}

Instead of writing very specific rules that apply only to single log messages, it is also possible to define generic rules that apply to many messages. Each processor can be given a set of generic and a set of specific rules, resulting in the two rule trees mentioned above.

Connectors

Connectors are responsible for reading the input and writing the result to the desired output. The main connectors that are currently used and implemented are a Kafka input connector and a Kafka output connector, which receive messages from and write messages to a Kafka topic. Additionally, you can use the Opensearch output connector to ship the messages directly to Opensearch after processing.

The details regarding the connectors can be found in the input connector documentation and output connector documentation.

Configuration

To run Logprep, certain configurations have to be provided. Because Logprep is designed to run in containerized environments like Kubernetes, these configurations can be provided via the file system or via HTTP. Providing the configuration via HTTP makes it possible to control configuration changes through a flexible HTTP API, which enables Logprep to quickly adapt to changes in your environment.

A general configuration describes the pipeline and the connectors; in addition, the processors need rules in order to process messages correctly.

The following YAML shows an example configuration for the pipeline shown in the chart above:

process_count: 3
timeout: 0.1

pipeline:
  - dissector:
      type: dissector
      specific_rules:
        - https://your-api/dissector/
      generic_rules:
        - rules/01_dissector/generic/
  - geoip_enricher:
      type: geoip_enricher
      specific_rules:
        - https://your-api/geoip/
      generic_rules:
        - rules/02_geoip_enricher/generic/
      tree_config: artifacts/tree_config.json
      db_path: artifacts/GeoDB.mmdb
  - dropper:
      type: dropper
      specific_rules:
        - rules/03_dropper/specific/
      generic_rules:
        - rules/03_dropper/generic/

input:
  mykafka:
    type: confluentkafka_input
    bootstrapservers: [127.0.0.1:9092]
    topic: consumer
    group: cgroup
    auto_commit: true
    session_timeout: 6000
    offset_reset_policy: smallest
output:
  opensearch:
    type: opensearch_output
    hosts:
      - 127.0.0.1:9200
    default_index: default_index
    error_index: error_index
    message_backlog_size: 10000
    timeout: 10000
    max_retries:
    user: the username
    secret: the password
    cert: /path/to/cert.crt

The following YAML represents a dropper rule which, according to the previous configuration, should be placed in the rules/03_dropper/generic/ directory.

filter: "message"
drop:
  - message
description: "Drops the message field"

The condition of this rule checks whether the field message exists in the log. If it does, the dropper deletes this field from the log message.
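
Applied to a hypothetical input event (the field values are made up for illustration), the rule above behaves as follows.

Incoming event:

{"message": "connection closed", "host": "server-01"}

Outgoing event after the Dropper has applied the rule:

{"host": "server-01"}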

Details about the rule language and how to write rules for the processors can be found in the rule configuration documentation.

Getting Started

For installation instructions see: https://logprep.readthedocs.io/en/latest/installation.html
For execution instructions see: https://logprep.readthedocs.io/en/latest/user_manual/execution.html

Reload the Configuration

A config_refresh_interval can be set to periodically and automatically refresh the given configuration. This can be useful in containerized environments (such as Kubernetes), where pod volumes often change on the fly.
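
As a minimal sketch, such a top-level setting could look like this in the configuration file (the interval unit is assumed to be seconds here; see the configuration documentation for the authoritative description):

config_refresh_interval: 60   # re-read the configuration every 60 seconds (unit assumed)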

If the configuration does not pass a consistency check, an error message is logged and Logprep keeps running with the previous configuration. The configuration should then be checked and corrected on the basis of the error message.

Documentation

The documentation for Logprep is online at https://logprep.readthedocs.io/en/latest/ or it can be built locally via:

sudo apt install pandoc
pip install -e .[doc]
cd ./doc/
make html

The HTML documentation can then be found at doc/_build/html/index.html.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

logprep-14.0.0.tar.gz (3.2 MB): Source

Built Distributions

logprep-14.0.0-cp312-cp312-musllinux_1_2_x86_64.whl (638.9 kB): CPython 3.12, musllinux: musl 1.2+, x86-64

logprep-14.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (577.7 kB): CPython 3.12, manylinux: glibc 2.17+, x86-64

logprep-14.0.0-cp311-cp311-musllinux_1_2_x86_64.whl (639.2 kB): CPython 3.11, musllinux: musl 1.2+, x86-64

logprep-14.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (577.9 kB): CPython 3.11, manylinux: glibc 2.17+, x86-64

logprep-14.0.0-cp310-cp310-musllinux_1_2_x86_64.whl (639.4 kB): CPython 3.10, musllinux: musl 1.2+, x86-64

logprep-14.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (578.1 kB): CPython 3.10, manylinux: glibc 2.17+, x86-64

File details

logprep-14.0.0.tar.gz

  • Download URL: logprep-14.0.0.tar.gz
  • Upload date:
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7
  • SHA256: 70ed2a4d0458e28ebf2469ec1a3306b9d65f469a0bde57af8e887c2494b19240
  • MD5: bedf332871c73a4bbe065f26dd9656f4
  • BLAKE2b-256: 1520a4a0bd33791170c7c81a8b590637cb241358067059006b07015cac98ee9a

logprep-14.0.0-cp312-cp312-musllinux_1_2_x86_64.whl

  • SHA256: 0b46a33d92954e7fdb12b4c9193d23dbed775cbdcf6a5715d4a960cdf30d9e9c
  • MD5: 5099a4793d6bd54c7fbff12ebeecb6b9
  • BLAKE2b-256: 8eda60976698997c2f34b1e79ce651c1583a3c5b6bf05133a2ffc6db107e2c27

logprep-14.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

  • SHA256: d56049754db608f20834ac214afbede0b4728f6a54e86e57217377e597e55a3a
  • MD5: 0adc854713a8cc10ae9fddd8cc388d1f
  • BLAKE2b-256: 8b9d07e2dfcada2338912914aff312640eecf1297fe8dd5e7dc00a1e5d8cb2ab

logprep-14.0.0-cp311-cp311-musllinux_1_2_x86_64.whl

  • SHA256: 9605fdd66db4d8a2f95ea49242cd579fc6842567b118965ac50a4a2b575cb15e
  • MD5: 78cae60e3877da1c53ed76c05f48c8e7
  • BLAKE2b-256: 7e4b633b22497252347890394f69ec05b3cdab80adacae84881b1460c788d077

logprep-14.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

  • SHA256: d80112ba04aec630566c60f98219cbe6895b307c71acc3979d7644902353ff09
  • MD5: af65560a298292179a1a0f95c83d51c6
  • BLAKE2b-256: 6406762781c9183a4fefe5a50994237f716d056321916871db5e65feb26efd80

logprep-14.0.0-cp310-cp310-musllinux_1_2_x86_64.whl

  • SHA256: 09fb4eca386445eb577a16741c9ebb5332c945d6a5ec1a9924f7a7c27cf159d0
  • MD5: e282e0afc1059bed6e97b5dfb49fecb2
  • BLAKE2b-256: 726e040011d13a0e262ff5d164b24fcc5c549ad44f6ed905b7f76d1a9ff2df00

logprep-14.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

  • SHA256: 8b152105cf4398e974ec572e17b0ff80175ff9ed91dd024f018f4c46e674a0fa
  • MD5: 5de3956ea3cd2d56e9a16cbe7298b60b
  • BLAKE2b-256: 24535520e7afdecc0f99050e5f59fba5a8565f210924a9aa004d9191d4db576f
