
A machine learning toolkit for log parsing from LOGPAI


Logparser


Logparser provides a machine learning toolkit and benchmarks for automated log parsing, which is a crucial step for structured log analytics. By applying logparser, users can automatically extract event templates from unstructured logs and convert raw log messages into a sequence of structured events. The process of log parsing is also known as message template extraction, log key extraction, or log message clustering in the literature.


An example of log parsing
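As a rough illustration of what "extracting event templates" means, the toy sketch below masks every token that contains a digit with the wildcard `<*>` and numbers the resulting templates as structured events. The digit heuristic and the helper names `naive_template` and `to_structured_events` are our own assumptions for illustration only; none of the parsers listed below works this simply, and real parsers must also handle variables that contain no digits.

```python
import re

# ASSUMPTION (toy heuristic): any token containing a digit is a variable.
_HAS_DIGIT = re.compile(r"\d")


def naive_template(message: str) -> str:
    """Replace every digit-bearing token with the wildcard <*>."""
    return " ".join("<*>" if _HAS_DIGIT.search(tok) else tok for tok in message.split())


def to_structured_events(messages):
    """Map each raw message to (event_id, template), numbering templates by first appearance."""
    event_ids = {}
    events = []
    for msg in messages:
        template = naive_template(msg)
        # New templates get the next ID (E1, E2, ...); known templates reuse theirs.
        event_id = event_ids.setdefault(template, f"E{len(event_ids) + 1}")
        events.append((event_id, template))
    return events
```

For example, `Receiving block blk_123 src: /10.0.0.1:50010` and `Receiving block blk_456 src: /10.0.0.2:50010` both become event `E1` with template `Receiving block <*> src: <*>`.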

🌈 New updates

Log parsers available:

| Publication | Parser | Paper Reference | Benchmark |
| --- | --- | --- | --- |
| IPOM'03 | SLCT | A Data Clustering Algorithm for Mining Patterns from Event Logs, by Risto Vaarandi. | ↗ |
| QSIC'08 | AEL | Abstracting Execution Logs to Execution Events for Enterprise Applications, by Zhen Ming Jiang, Ahmed E. Hassan, Parminder Flora, Gilbert Hamann. | ↗ |
| KDD'09 | IPLoM | Clustering Event Logs Using Iterative Partitioning, by Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios. | ↗ |
| ICDM'09 | LKE | Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis, by Qiang Fu, Jian-Guang Lou, Yi Wang, Jiang Li. [Microsoft] | ↗ |
| MSR'10 | LFA | Abstracting Log Lines to Log Event Types for Mining Software System Logs, by Meiyappan Nagappan, Mladen A. Vouk. | ↗ |
| CIKM'11 | LogSig | LogSig: Generating System Events from Raw Textual Logs, by Liang Tang, Tao Li, Chang-Shing Perng. | ↗ |
| SCC'13 | SHISO | Incremental Mining of System Log Format, by Masayoshi Mizutani. | ↗ |
| CNSM'15 | LogCluster | LogCluster - A Data Clustering and Pattern Mining Algorithm for Event Logs, by Risto Vaarandi, Mauno Pihelgas. | ↗ |
| CNSM'15 | LenMa | Length Matters: Clustering System Log Messages using Length of Words, by Keiichi Shima. | ↗ |
| CIKM'16 | LogMine | LogMine: Fast Pattern Recognition for Log Analytics, by Hossein Hamooni, Biplob Debnath, Jianwu Xu, Hui Zhang, Geoff Jiang, Abdullah Mueen. [NEC] | ↗ |
| ICDM'16 | Spell | Spell: Streaming Parsing of System Event Logs, by Min Du, Feifei Li. | ↗ |
| ICWS'17 | Drain | Drain: An Online Log Parsing Approach with Fixed Depth Tree, by Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. | ↗ |
| ICPC'18 | MoLFI | A Search-based Approach for Accurate Identification of Log Message Formats, by Salma Messaoudi, Annibale Panichella, Domenico Bianculli, Lionel Briand, Raimondas Sasnauskas. | ↗ |
| TSE'20 | Logram | Logram: Efficient Log Parsing Using n-Gram Dictionaries, by Hetong Dai, Heng Li, Che-Shao Chen, Weiyi Shang, and Tse-Hsun (Peter) Chen. | ↗ |
| ECML-PKDD'20 | NuLog | Self-Supervised Log Parsing, by Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso, Odej Kao. | ↗ |
| ICSME'22 | ULP | An Effective Approach for Parsing Large Log Files, by Issam Sedki, Abdelwahab Hamou-Lhadj, Otmane Ait-Mohamed, Mohammed A. Shehab. | ↗ |
| TSC'23 | Brain | Brain: Log Parsing with Bidirectional Parallel Tree, by Siyu Yu, Pinjia He, Ningjiang Chen, Yifan Wu. | ↗ |

💡 You are welcome to submit a PR that adds your parser code to logparser and your paper to the table.

Installation

We recommend installing the logparser package and its requirements with pip:

pip install logparser3

In particular, the package depends on the following requirements.

  • Python 3.6+
  • regex 2022.3.2
  • numpy
  • pandas
  • scipy
  • scikit-learn
  • deap (if using logparser.MoLFI)
  • nltk (if using logparser.SHISO)
  • gcc (if using logparser.SLCT)
  • perl (if using logparser.LogCluster)

Note that regex matching in Python is brittle across library versions, so we recommend pinning the regex library to version 2022.3.2.
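Before running a parser, it can help to confirm that the pinned `regex` version is installed and that the optional dependencies listed above are available. The sketch below is a hypothetical preflight check (the helper names are ours, not part of logparser); it assumes Python 3.8+ for `importlib.metadata`, treats `deap`/`nltk` as importable modules and `gcc`/`perl` as executables on `PATH`.

```python
import importlib.util
import shutil
from importlib.metadata import PackageNotFoundError, version

RECOMMENDED_REGEX = "2022.3.2"  # version recommended above

# Optional extras from the requirements list: (name, kind, parser that needs it).
OPTIONAL_DEPENDENCIES = [
    ("deap", "module", "MoLFI"),
    ("nltk", "module", "SHISO"),
    ("gcc", "executable", "SLCT"),
    ("perl", "executable", "LogCluster"),
]


def regex_version_ok(expected: str = RECOMMENDED_REGEX) -> bool:
    """Return True only if the installed `regex` distribution is exactly `expected`."""
    try:
        return version("regex") == expected
    except PackageNotFoundError:
        return False


def missing_optional_dependencies():
    """Return the names of parsers whose optional dependency was not found."""
    missing = []
    for name, kind, parser in OPTIONAL_DEPENDENCIES:
        if kind == "module":
            found = importlib.util.find_spec(name) is not None
        else:
            found = shutil.which(name) is not None
        if not found:
            missing.append(parser)
    return missing
```

For instance, an empty result from `missing_optional_dependencies()` means every parser in the list above has what it needs.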

Get started

  1. Run the demo:

    For each log parser, we provide a demo to help you get started. Each demo shows the basic usage of the parser and the hyper-parameters to configure. For example, the following commands run the demo for Drain:

    cd logparser/Drain
    python demo.py
    

    After the demo finishes, the extracted event templates and the parsed structured logs are written to the result folder.

  2. Run the benchmark:

    For each log parser, we provide a benchmark script that runs log parsing on the loghub_2k datasets to evaluate parsing accuracy. You can also use other benchmark datasets for log parsing.

    cd logparser/Drain 
    python benchmark.py
    

    The benchmarking results can be found in the README of each parser, e.g., https://github.com/logpai/logparser/tree/main/logparser/Drain#benchmark.

  3. Parse your own logs:

    Applying logparser to your own log data is straightforward. First install the logparser3 package, then write a script along the lines of the following snippet to start log parsing.

    from logparser.Drain import LogParser
    
    input_dir = 'PATH_TO_LOGS/'  # Directory containing the input log file
    output_dir = 'result/'  # Directory for the parsing results
    log_file = 'unknown.log'  # Input log file name (placeholder)
    log_format = '<Date> <Time> <Level>:<Content>'  # Log format used to split each line into fields
    # Regular expression list for optional preprocessing (default: [])
    regex = [
        r'(/|)([0-9]+\.){3}[0-9]+(:[0-9]+|)(:|)' # IP
    ]
    st = 0.5  # Similarity threshold
    depth = 4  # Depth of all leaf nodes
    
    parser = LogParser(log_format, indir=input_dir, outdir=output_dir, depth=depth, st=st, rex=regex)
    parser.parse(log_file)
    

    The full example is available at example/parse_your_own_logs.py.
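Once the demo in step 1 has run, its outputs can be inspected with the standard library alone. This sketch assumes the result folder contains `<log_file>_structured.csv` (one row per message, with an `EventTemplate` column) and `<log_file>_templates.csv`; these file and column names reflect our understanding of the Drain implementation and should be checked against your actual output. `load_parsing_results`, `top_events` and `example.log` are hypothetical names.

```python
import csv
import os
from collections import Counter


def load_parsing_results(result_dir: str, log_file: str):
    """Load the structured log and the template table written for `log_file`.

    ASSUMPTION: outputs are named <log_file>_structured.csv and
    <log_file>_templates.csv; adjust if your parser version differs.
    """
    with open(os.path.join(result_dir, f"{log_file}_structured.csv"), newline="") as f:
        structured = list(csv.DictReader(f))
    with open(os.path.join(result_dir, f"{log_file}_templates.csv"), newline="") as f:
        templates = list(csv.DictReader(f))
    return structured, templates


def top_events(structured, n=5):
    """Return the n most frequent EventTemplate values (assumed column name)."""
    return Counter(row["EventTemplate"] for row in structured).most_common(n)
```

For example, `structured, templates = load_parsing_results("result/", "example.log")` followed by `top_events(structured)` lists the most common templates first.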
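The benchmark in step 2 reports parsing accuracy. One widely used definition in the log parsing literature is grouping accuracy: a message counts as correctly parsed only if the set of messages placed in its predicted group is exactly the set of messages sharing its ground-truth event. The sketch below computes that metric under this assumption; the `benchmark.py` scripts may compute it differently or report additional metrics, and `grouping_accuracy` is a hypothetical helper.

```python
from collections import defaultdict


def grouping_accuracy(ground_truth, predicted):
    """Fraction of messages whose predicted group has exactly the same members
    as their ground-truth group.

    Both arguments hold one event ID per log message, in the same order.
    The ID labels themselves need not match between the two sequences.
    """
    if len(ground_truth) != len(predicted):
        raise ValueError("ground_truth and predicted must have the same length")
    if not ground_truth:
        return 0.0
    truth_groups = defaultdict(set)
    pred_groups = defaultdict(set)
    for i, (t, p) in enumerate(zip(ground_truth, predicted)):
        truth_groups[t].add(i)
        pred_groups[p].add(i)
    correct = sum(
        1 for t, p in zip(ground_truth, predicted) if truth_groups[t] == pred_groups[p]
    )
    return correct / len(ground_truth)
```

With ground truth `["A", "A", "B", "C"]` and prediction `["x", "x", "y", "y"]`, the first two messages are grouped correctly but `B` and `C` were merged, so the score is 0.5.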
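To show roughly what the configuration in step 3 controls, the sketch below reimplements three ideas in plain Python: turning `log_format` into a regex with one named group per `<Field>`, masking matches of the `regex` list with `<*>`, and a Drain-style token similarity of the kind compared against `st`. These are simplified stand-ins written for illustration (all helper names are hypothetical), not logparser's internal code. The similarity follows the Drain paper's idea of counting identical tokens at the same positions; never counting wildcard positions as matches is our additional assumption.

```python
import re


def log_format_to_regex(log_format: str):
    """Turn e.g. '<Date> <Time> <Level>:<Content>' into (field names, compiled regex).

    Each <Field> becomes a lazy named group; runs of spaces match any whitespace.
    """
    headers = []
    pattern = ""
    for i, piece in enumerate(re.split(r"(<[^<>]+>)", log_format)):
        if i % 2 == 1:  # odd indices are the captured <Field> markers
            name = piece[1:-1]
            headers.append(name)
            pattern += f"(?P<{name}>.*?)"
        else:  # literal separators between fields
            pattern += r"\s+".join(re.escape(p) for p in re.split(r" +", piece))
    return headers, re.compile(f"^{pattern}$")


def mask_variables(content: str, regexes):
    """Replace every match of each preprocessing regex with the wildcard <*>."""
    for rx in regexes:
        content = re.sub(rx, "<*>", content)
    return content


def token_similarity(template_tokens, message_tokens):
    """Share of positions where the template token equals the message token.

    Wildcard (<*>) positions in the template never count as a match.
    """
    if len(template_tokens) != len(message_tokens) or not template_tokens:
        return 0.0
    same = sum(1 for t, m in zip(template_tokens, message_tokens) if t != "<*>" and t == m)
    return same / len(template_tokens)
```

With the example line `081109 203615 INFO:Received block blk_1 from /10.251.43.21:50010`, the `Content` field is `Received block blk_1 from /10.251.43.21:50010`, the IP regex reduces it to `Received block blk_1 from <*>`, and the template `Received block <*> from <*>` scores 3/5 = 0.6 against `Received block blk_2 from <*>`, which clears `st = 0.5`.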

Production use

logparser is primarily intended for research and benchmarking. Researchers can use logparser as a code base to develop new log parsers, while practitioners can assess the performance and scalability of current log parsing methods through our benchmarks. We strongly encourage practitioners to try logparser in their production environments, but be aware that the current implementation is far from ready for production use. Although we currently have no plans to make it production-ready, we have a few suggestions for developers who want to build an intelligent, production-level log parser.

  • Be aware of the licenses of the third-party libraries used in logparser. We suggest keeping only the parser you need, deleting the others, and rebuilding the package wheel; this does not break the use of logparser.
  • Improve the efficiency and scalability of logparser with multi-processing, and add failure recovery and persistence (to disk or to a message queue such as Kafka).
  • Drain3 is a good reference: it builds on Drain with practical enhancements for production scenarios.
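One of the suggestions above is multi-processing. A minimal sketch, assuming the simplest form of parallelism: one independent Drain run per log file in a process pool. `parse_with_drain` and `parse_many` are hypothetical helpers; the `LogParser` arguments are copied from the Get started example, with `PATH_TO_LOGS/` left as a placeholder. Because each process builds its own parser, templates learned from one file are not shared with the others, and failure recovery and persistence are not addressed here.

```python
import concurrent.futures


def parse_with_drain(log_file: str) -> str:
    """Worker: parse one file with Drain, using the settings from the example above.

    'PATH_TO_LOGS/' and 'result/' are placeholders. The import happens inside
    the worker so that each process loads logparser itself.
    """
    from logparser.Drain import LogParser

    parser = LogParser('<Date> <Time> <Level>:<Content>', indir='PATH_TO_LOGS/',
                       outdir='result/', depth=4, st=0.5, rex=[])
    parser.parse(log_file)
    return log_file


def parse_many(log_files, worker=parse_with_drain, max_workers=None,
               executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Run `worker` on each file in parallel; results come back in input order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(worker, log_files))
```

For example, `parse_many(["app1.log", "app2.log"])` (hypothetical file names) parses both files concurrently; on platforms that spawn worker processes, call it from under an `if __name__ == "__main__":` guard. Passing `executor_cls=concurrent.futures.ThreadPoolExecutor` keeps everything in one process, which is convenient for testing.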

Citation

👋 If you use our logparser tools or benchmarking results in your publication, please cite the following papers.

Discussion

You are welcome to join our WeChat group for questions and discussion. Alternatively, you can open an issue on GitHub.




Download files


Source Distribution

logparser3-1.0.4.tar.gz (94.7 kB)

Built Distribution

logparser3-1.0.4-py3-none-any.whl (151.8 kB)

File details

Details for the file logparser3-1.0.4.tar.gz.

File metadata

  • Download URL: logparser3-1.0.4.tar.gz
  • Upload date:
  • Size: 94.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.9.6 requests/2.22.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.10.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for logparser3-1.0.4.tar.gz
Algorithm Hash digest
SHA256 0eaf11429014320b0683e2d00a8c4f9633d46aeb89dd0bd8edb9c7c8d1673ac1
MD5 8138661437c036251f720ee73a8bb297
BLAKE2b-256 da9962649cb4e4c2c30494d45343f30788c91c928357d9fa659bb178fed559c3


File details

Details for the file logparser3-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: logparser3-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 151.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.9.6 requests/2.22.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.10.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for logparser3-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 62de09797436c9011b2b91a094428b84ea740ca5eda159e09fc9da557521906b
MD5 9d84dafb39ebd7ed1acfb5ffd06fe6a9
BLAKE2b-256 43f98ecfc2ff4da76ee3e7b848a3dcb96e3d1804e2d3ed117aafa58529697096

