Log file parser CLI and library (v.2)

Work with various DBMS in Python.
Module 2: Python. Task 5.

Preface

This project contains a solution to one of the tasks of the EPAM DevOps Initial Internal Training Course #7 in 2023. Detailed information about the course, as well as reports on each of the completed tasks (including this one) can be found here /^.
This project builds upon task #4/^ with some modifications specific to this task condition. Please review the documentation/^ to familiarize yourself with these changes. Below you will find additional information on the new features in task #5.

Conditions

Please review the main conditions/^ from task #4.

Additional conditions for this task:

  • Use a database to store the log.
  • Install the MySQL, PostgreSQL, and MongoDB databases.
  • Import log data from a file into the database.
  • Export log data from the database to a file in log format.
  • Implement 10 requests to the database (subtasks from task 4).
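
The import, export, and query conditions above can be sketched end to end. The example below is only an illustration: it uses the standard-library sqlite3 module as a stand-in for MySQL, PostgreSQL, or MongoDB so that it runs without a server, and the pattern `LINE_RE`, the `log` table layout, and the function names are all assumptions made for this sketch, not the package's actual schema or API.

```python
import re
import sqlite3

# Hypothetical, simplified access-log pattern: client IP, timestamp, request, status.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3})'
)


def import_log(conn, lines):
    """Parse each line and store the matching ones in the `log` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log (ip TEXT, ts TEXT, request TEXT, status INTEGER)"
    )
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines that do not match the pattern are skipped
            rows.append((m["ip"], m["ts"], m["request"], int(m["status"])))
    conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def export_log(conn):
    """Rebuild log-format lines from the stored rows, in insertion order."""
    cursor = conn.execute("SELECT ip, ts, request, status FROM log ORDER BY rowid")
    return [f'{ip} - - [{ts}] "{request}" {status}' for ip, ts, request, status in cursor]


def count_by_status(conn):
    """One example request to the database: number of log entries per status code."""
    cursor = conn.execute(
        "SELECT status, COUNT(*) FROM log GROUP BY status ORDER BY status"
    )
    return dict(cursor.fetchall())
```

With a real DBMS only the connection object and the SQL placeholder style would change; the parse, insert, select, and format steps stay the same.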

Implementation

Log_Parser2/^ is a Python package that can be added to your global or virtual environment with your preferred package manager (pip, pipenv, poetry, etc.). The project itself is managed and built with the Poetry library/^, so if you intend to clone this repo and modify it for your own purposes, please install Poetry/^ or migrate to your preferred package management tool.

Because the package is meant to be used both as a library and as a CLI, the code is split into a library for importing and a script for execution from the command line. The package also contains a showcase that demonstrates all use cases when run from the command line.

To enhance the command line's functionality and expand the showcase capabilities, the Questionary/^ library is used; it is installed automatically as a dependency of the package.

Structure

task5/
├── README.md (You are here now)
├── pyproject.toml # Poetry package management file
└── log_parser2/ 
    ├── __init__.py # library entry point
    ├── __main__.py # CLI entry point
    ├── __version__.py
    ├── db_provider.py # implementation of work with various databases
    ├── logging.py # logging implementation
    ├── log_parser2.py # library implementation
    ├── cli/
    │   ├── __init__.py
    │   ├── __main__.py
    │   └── cli.py # CLI code implementation
    └── showcase/
        ├── __init__.py 
        ├── __main__.py # showcase entry point when using python -m log_parser2.showcase
        ├── access.log # a sample log file
        └── showcase.py # showcase implementation

Installation

Install Log_Parser2 with the package manager you prefer.

Pip

To install the Log_Parser2 package into your environment with pip, run pip install taskp5.

$ pip install taskp5
Collecting taskp5
  Using cached taskp5-2.1.N-py3-none-any.whl (62 kB)
Collecting questionary<2.0.0,>=1.10.0 (from taskp5)
  Downloading questionary-1.10.0-py3-none-any.whl (31 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 31.1/31.1 kB N.N MB/s eta 0:00:00
Collecting prompt_toolkit<4.0,>=2.0 (from questionary<2.0.0,>=1.10.0->taskp5)
  Downloading prompt_toolkit-3.0.39-py3-none-any.whl (385 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 385.2/385.2 kB N.N MB/s eta 0:00:00
Collecting wcwidth (from prompt_toolkit<4.0,>=2.0->questionary<2.0.0,>=1.10.0->taskp5)
  Downloading wcwidth-0.2.6-py2.py3-none-any.whl (29 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29.2/29.2 kB N.N MB/s eta 0:00:00
Installing collected packages: wcwidth, prompt_toolkit, questionary, taskp5
Successfully installed prompt_toolkit-3.0.39 questionary-1.10.0 taskp5-2.1.N wcwidth-0.2.6

To uninstall Log Parser from your environment, run pip uninstall taskp5. Note that pip does not uninstall dependent packages; to remove them as well, run pip uninstall questionary prompt-toolkit wcwidth.

Poetry

To install the Log Parser package into your environment with Poetry, run poetry add taskp5.

$ poetry add taskp5
Using version ^2.1.N for taskp5

Updating dependencies
Resolving dependencies...

Package operations: 6 installs, 0 updates, 0 removals

   Installing wcwidth (0.2.6)
   Installing prompt-toolkit (3.0.39)
   Installing questionary (1.10.0)

Writing lock file

This adds a new dependency line to your pyproject.toml file.

[tool.poetry.dependencies]
taskp5 = "^2.1.N"

To uninstall Log Parser from your environment, run poetry remove taskp5. A benefit of Poetry is that this single command also removes the dependent packages.

Usage

As mentioned earlier, this package can be used in several ways:

  • Use it as a library: import it into your .py file and use the LogHandler class in your code.
  • Use the CLI from the command shell, either as a Python module or as a standalone command.
  • Use the CLI command in a pipeline: pass the stdout of other commands to the stdin of the log_parser2 command, write stdout and stderr to files, or pass them to subsequent commands.
  • Use the rich showcase command, which lets you test all the use cases and even run them in batches.

Library

Below is a code snippet that demonstrates how to use the log_parser2 library in your code.

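import re

from log_parser2.log_parser2 import logger, LogHandler
# Assumption: DBTypes is exported by the db_provider module listed in the structure above.
from log_parser2.db_provider import DBTypes

# Illustrative values only; replace them with your own expressions.
extractor = r'(?P<IP4>\S+)\ .*\ (?P<STATUS>\d{3})'
selection = r'\g<IP4> \g<STATUS>'
aggregation = r'MAX(\g<STATUS>,5)'
filter_str = r"RE('50\d', \g<STATUS>)"
provider_param = {}  # extra provider options, if any

with open('access.log', 'r') as f:
    db_config = dict(host='3.12.77.55',
                     user='awesome_user',
                     password='*********',
                     database='some_database_name')

    # Pick the database provider and connect with the credentials above.
    dbase = DBTypes.MYSQL.value(db_config, **provider_param)

    handler = LogHandler(log=f,
                         extractor=re.compile(extractor, flags=re.I | re.X),
                         selection=selection,
                         aggregation=aggregation,
                         filter_str=filter_str,
                         database=dbase)

print(handler.output)
dbase.close()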
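
The extractor and selection arguments above build on named groups and `\g<group_name>` templates, both of which the standard re module supports directly. The sketch below shows only that underlying mechanism with plain `re`; the pattern and the sample line are made up for illustration, and the package's own shorthand patterns and transformation functions are not reproduced here.

```python
import re

# Illustrative pattern only; the package ships its own shorthand patterns.
# With re.X, unescaped whitespace is ignored, so literal spaces are written as "\ ".
extractor = re.compile(
    r"""(?P<IP4>\d{1,3}(?:\.\d{1,3}){3})   # client address
        \ .*?\ "(?P<REQUEST>[^"]*)"        # quoted request line
        \ (?P<STATUS>\d{3})                # HTTP status code
    """,
    flags=re.I | re.X,
)

line = '192.168.0.7 - - [01/Jan/2023:10:00:00 +0000] "GET /health HTTP/1.1" 503'
match = extractor.match(line)

# Match.expand substitutes \g<name> references with the captured groups.
selected = match.expand(r"\g<STATUS> \g<IP4>")
print(selected)
```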

CLI

The CLI has a single command called "log_parser2". It can be invoked in two ways: python3 -m log_parser2 or simply log_parser2. The command accepts various arguments, which are described below.

log_parser2 [-h] [--version] -d  -c  [-C] -e  [-s] [-a] [-f] [-r [INDEXES ...]] [-v] [file_name]

positional arguments:
  file_name             An input filename

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -d , --database-type  Specify a database type that you intend to use (default: MYSQL). 
                        Possible values could be: MYSQL, PostgresSQL, MongoDB
  -c , --credentials    Specify a file with credentials used to connect to the database.
  -C, --create-db       Whether create a new database to store parsed log data or not.
  -e , --extractor      Specify a regular expression for extracting certain data from each log line. 
                        The following expression patterns can be used to make      
                        writing an expression easier and reduce its size. 
                        For instance, "IP4\ \((?P<IP4_LIST>(?:(?:IP4|.*?),\ )*)(?:IP4|.*?)\)\ -\ -\
                        \[DATE_TIME_SEC\]\ \"(?:REQUEST|-)\"" 
                        
                        IP4_OCTET, MASK_OCTET, IP4, IP4_CIDR, MASK, IP6, IP6_CIDR, MAC_PART, MAC, VERSION, AGENT_AGENT,   
                        AGENT_OS, AGENT_DEVICE, AGENT_BROWSER, USER_AGENT, DOMAIN, EMAIL_NAME_CHARSET, EMAIL, URI, URL, 
                        REQUEST_TYPES, REQUEST, DATE_TIME_SEC, UUID
  -s , --selection      Specify a regular expression template to output selected data. To add previously extracted data
                        via regular expression groups use template group naming like \g<group_name>. 
                        It is allowed to use various transformation functions for substitution, extracting and even  
                        splitting data.
                        For instance, 
                        "TO_MIN(\g<DATE_TIME>) \g<URL>" or "SPLIT(RE('(?<=\().+?(?=\))', \g<IP4>), ',\ ')" 
                        
                        The list of possible functions: MIN, MAX, SPLIT, SUB, INTERVAL, SUM, COUNT, RE
  -a , --aggregation    The aggregate to be calculated during log parsing. It is used for sorting or summation.
                        For instance, "MAX(\g<STATUS>,5)" or ..... 
                        
                        The list of possible functions: MIN, MAX, SPLIT, SUB, INTERVAL, SUM, COUNT, RE
  -f , --filter         Specify a regular expression for filtering each line. It is supposed to used RE functions
                        and group naming like \g<group_name>. If the result of using regular expression return None
                        the line will be omitted. 
                        
                        For instance, 
                        "RE('50\d', \g<STATUS>)" or "RE((?:/+[a-z\d\-._~%&\'()*+,;=:@{\'}{\'}]+){2}, \g<REQUEST_PATH>)"
  -r [INDEXES ...], --rows [INDEXES ...]
                        The row range from the log file to be parsed. You can pass values in the following formats:
                        particular indexes: index1 index2 ... indexN
                        range of indexes: index1-index2
                        from the beginning up to index: -index
                        from index to the end: index-
  -v                    Increase verbosity level (add more v)
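
The four formats accepted by --rows can be turned into line ranges with a few lines of code. This is a minimal sketch, not the package's implementation: the function names are hypothetical, and it assumes that indexes are 1-based and that ranges include both ends, neither of which the help text states.

```python
def parse_row_specs(specs):
    """Turn specs such as ['3', '10-20', '-5', '100-'] into (start, end) pairs.

    Assumptions for this sketch: indexes are 1-based, ranges are inclusive,
    and None as the end means "up to the last line".
    """
    ranges = []
    for spec in specs:
        if "-" not in spec:
            # A particular index selects exactly one row.
            index = int(spec)
            ranges.append((index, index))
            continue
        start, _, end = spec.partition("-")
        ranges.append((int(start) if start else 1, int(end) if end else None))
    return ranges


def row_selected(index, ranges):
    """Return True if the 1-based line index falls inside any range."""
    return any(
        start <= index and (end is None or index <= end) for start, end in ranges
    )
```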

A file with the credentials for connecting to the database is mandatory and must be passed via the -c flag. The database type can be chosen from MYSQL, PostgreSQL, and MongoDB.

To handle the arguments, the argparse/^ module is used. If you are already acquainted with it, you will have no difficulty in passing the arguments along with their values and comprehending their behavior.
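
For readers less familiar with argparse, here is a minimal sketch of how a subset of the flags listed above could be declared. It is an illustration, not the package's source: the `build_parser` function, the `verbosity` destination name, and the `db_credentials` file name are assumptions, and only some of the options are included.

```python
import argparse


def build_parser():
    """Illustrative parser covering a subset of the documented flags."""
    parser = argparse.ArgumentParser(prog="log_parser2")
    parser.add_argument("file_name", nargs="?", help="An input filename")
    parser.add_argument("-d", "--database-type", default="MYSQL")
    parser.add_argument("-c", "--credentials", required=True)
    parser.add_argument("-C", "--create-db", action="store_true")
    parser.add_argument("-e", "--extractor", required=True)
    parser.add_argument("-r", "--rows", nargs="*", metavar="INDEXES")
    # Each repeated "v" raises the verbosity level by one.
    parser.add_argument("-v", action="count", default=0, dest="verbosity")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the sketch testable.
args = build_parser().parse_args(
    ["-r", "1-10", "-c", "db_credentials", "-e", r"(?P<STATUS>\d{3})", "-C", "-vv", "access.log"]
)
print(args.database_type, args.create_db, args.verbosity, args.rows, args.file_name)
```

Note that argparse converts the dashes in long option names to underscores, so --database-type is read back as `args.database_type`.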

The showcase presents these examples in a more convenient form.

Pipes and files

The log_parser2 command can be used inside a pipeline of Bash commands in several ways:

  • receiving input from stdin
  • directing output to a file
  • directing logging to a file as well

Because of its table output format, the command isn't intended to feed further commands in a pipeline, although this is allowed.
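
A command that accepts either a file name or piped input usually falls back to stdin when no file name is given. The sketch below shows that general pattern under stated assumptions; `open_input` and `count_lines` are hypothetical helpers written for this example and are not part of the package.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_input(file_name=None):
    """Yield a readable stream: the named file, or stdin when no name is given."""
    if file_name is None:
        # Data arrives through a pipe, so stdin is used and left open afterwards.
        yield sys.stdin
    else:
        with open(file_name, "r", encoding="utf-8") as stream:
            yield stream


def count_lines(file_name=None):
    """Tiny consumer used to demonstrate both input paths."""
    with open_input(file_name) as stream:
        return sum(1 for _ in stream)
```

Writing results to stdout and diagnostics to stderr keeps the two streams separable, which is what makes redirecting output and logging to different files possible.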

Showcase

To showcase the behavior of the log_parser2 library, an interactive command called "log_parser2_showcase" has been created. It uses both the log_parser2 CLI and the log_parser2 library, and you can invoke it via log_parser2_showcase or python3 -m log_parser2.showcase. An optional flag lets you view all use cases at once without any interaction: log_parser2_showcase --all. A ready-made Apache Tomcat log file, access.log, is included. You can also use the showcase to test your own regular expressions and regular expression templates.


General provisions

All materials provided and/or made available contain EPAM’s proprietary and confidential information and must not be copied, reproduced or disclosed to any third party or to any other person, other than those persons who have a bona fide need to review it for the purpose of participation in the online courses being provided by EPAM. The intellectual property rights in all materials (including any trademarks) are owned by EPAM Systems Inc or its associated companies, and a limited license, terminable at the discretion of EPAM without notice, is hereby granted to you solely for the purpose of participating in the online courses being provided by EPAM. Neither you nor any other party shall acquire any intellectual property rights of any kind in such materials.
