Small and powerful real-time Python nginx access-log parser and analyzer with top-like report support

Inspired by ngxtop

nginxpla is a console nginx log parser and analyzer written in Python, with fully configurable reports and templates. Like ngxtop, it lets you build top-like custom reports from the metrics you choose. I have tried my best to make it customizable and extendable.

nginxpla is very powerful for troubleshooting and monitoring your nginx server here and now. It is not suitable for long-term monitoring, because under the hood it uses sqlite3, and performance can degrade when large amounts of data accumulate. So, you have been warned.

nginxpla is a config-based utility: on first run it creates a .nginxpla folder in your home directory with a config file in YAML format. When you run nginxpla, it loads that configuration, such as the log_format for the file you are trying to analyze and the templates with their modules. The configuration is flexible enough to analyze almost any line-by-line log that can be parsed by regular expressions. The structure is modular, with several modules included.

1. Installation

pip install nginxpla
nginxpla --install
nano ~/.nginxpla/nginxpla.yaml

2. Usage

    nginxpla <access-log-file> [options]
    nginxpla <access-log-file> [options] (print) <var> ...
    nginxpla (-h | --help)
    nginxpla --version

    -l <file>, --access-log <file>  access log file to parse.
    -f <format>, --log-format <format>  log format as specified in the log_format directive. [default: combined]
    -i <seconds>, --interval <seconds>  report interval when running in --top mode [default: 2.0]
    -t <template>, --template <template>  use template from config file [default: main]
    -m <modules>, --modules <modules>  comma separated module list [default: all]

    --info  print configuration info for access_log
    --top  watch for new lines as they are written to the access log.

    -g <var>, --group-by <var>  group by variable [default: ]
    -w <expr>, --having <expr>  having clause [default: 1]
    -o <var>, --order-by <var>  order of output for default query [default: count]
    -n <number>, --limit <number>  limit the number of records included in report [default: 10]
    -a <exp> ..., --a <exp> ...  add exp (must be aggregation exp: sum, avg, min, max, etc.) into output

    -v, --verbose  more verbose output
    -d, --debug  print every line and parsed record
    -h, --help  print this help message.
    --version  print version information.

    -c <file>, --config <file>  nginxpla config file path.
    -e <filter-expression>, --filter <filter-expression>  filter in, records satisfied given expression are processed.
    -p <filter-expression>, --pre-filter <filter-expression>  in-filter expression to check in pre-parsing phase.
    -s <sql-request>, --sql <sql-request>  raw SQL in SQLite syntax. The table holding the data is named log
    --fields <fields>  fields to import into the sqlite log table, for example, --fields user_agent,status

    Print statistics for default template
    $ nginxpla access_log

    Select all indexed data from the base
    $ nginxpla access_log --sql 'SELECT * FROM log'

    Group indexed records by user agent and status
    $ nginxpla access_log --sql 'SELECT user_agent, status, count(1) AS count FROM log GROUP BY user_agent, status ORDER BY count DESC LIMIT 100' --fields user_agent,status
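Since the parsed records live in an SQLite table named log, --sql queries behave like ordinary SQLite. The grouping query above can be reproduced against an in-memory database; the data rows below are made up for illustration:

```python
import sqlite3

# In-memory stand-in for nginxpla's internal database; the table is named "log".
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (user_agent TEXT, status INTEGER)")
con.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("curl/7.68.0", 200), ("curl/7.68.0", 200), ("Mozilla/5.0", 404)],
)

# Same shape as the --sql example above.
rows = con.execute(
    "SELECT user_agent, status, count(1) AS count FROM log "
    "GROUP BY user_agent, status ORDER BY count DESC LIMIT 100"
).fetchall()
print(rows)  # [('curl/7.68.0', 200, 2), ('Mozilla/5.0', 404, 1)]
```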


After install, configure the logs section, which matches log file paths to formats:

        - log_path_regexp: 'mydomain\.access\.log'
          format: "default"
        - log_path_regexp: 'second_domain_name\.access\.log'
          format: "custom"
        - log_path_regexp: '.*'
          format: "combined"

If you use a custom nginx log_format, or want to configure something different, you can define formats in the formats section:

    default: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_x_forwarded_for"'
    combined: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
    custom: '$http_x_forwarded_for - [$time_local] "$host" "$request" $status ($bytes_sent) "$http_referer" "$uri $args" "$http_user_agent" [$request_time] [$upstream_response_time]'

Important: after parsing, the $variables become columns in the database with the same names, and you can operate on them.

The regex_formats section does the same as formats. If you are a regex guru, you can speed up parsing with a hand-written regex. regex_formats takes precedence over the simple way: if a format and a regex_format are defined with the same name, the regex_format is used.
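A plain format string like the ones above can be translated mechanically into a regular expression with named capture groups, whose names then become the log table columns. A rough sketch of the idea (the helper function is illustrative, not nginxpla's actual API):

```python
import re

def format_to_regex(log_format: str) -> str:
    # Hypothetical helper: escape the literal parts of the format string,
    # then turn each escaped \$variable into a named, non-greedy capture group.
    escaped = re.escape(log_format)
    return re.sub(r"\\\$(\w+)", r"(?P<\1>.*?)", escaped) + "$"

combined = ('$remote_addr - $remote_user [$time_local] "$request" '
            '$status $body_bytes_sent "$http_referer" "$http_user_agent"')
line = ('127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"')

m = re.match(format_to_regex(combined), line)
print(m.group("remote_addr"), m.group("status"))  # 127.0.0.1 200
```

A hand-written regex_format skips the conversion and can use tighter patterns (e.g. `\d+` for $status) instead of generic non-greedy groups, which is where the speed-up comes from.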

SQL suffix

For better visualization, suffixes are supported. Just append one to a column name in your SQL and every row of that column will be formatted; the suffix itself is removed from the column name in the result table.

_human_size — size formatter, converts a number like 4399151 into 4,20Mb
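The formatter behind the suffix can be sketched roughly like this (an illustrative re-implementation, not nginxpla's actual code, assuming binary units, two decimals, and a comma as the decimal separator to match the 4,20Mb example):

```python
def human_size(num_bytes: float) -> str:
    # Walk up the binary units until the value fits below 1024,
    # then render with two decimals and a comma separator.
    for unit in ("b", "Kb", "Mb", "Gb", "Tb"):
        if num_bytes < 1024 or unit == "Tb":
            return f"{num_bytes:.2f}".replace(".", ",") + unit
        num_bytes /= 1024

print(human_size(4399151))  # 4,20Mb
```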


$ nginxpla access_log --fields request_path,body_bytes_sent query 'SELECT request_path, sum(body_bytes_sent) AS bytes_sent_human_size FROM log GROUP BY request_path ORDER BY bytes_sent_human_size DESC LIMIT 10'

Report Table Column Human Names

All column names from your SQL are transformed into strings of space-separated capitalized words in the report header. In the SQL itself, however, you must use the original column names.
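The transform is simple enough to sketch (an illustrative version, not nginxpla's actual code):

```python
def humanize_column(name: str) -> str:
    # Underscores become spaces, each word is capitalized:
    # "request_path_by_google_bot" -> "Request Path By Google Bot"
    return " ".join(word.capitalize() for word in name.split("_"))

print(humanize_column("request_path_by_google_bot"))  # Request Path By Google Bot
```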

$ nginxpla access_log --fields se,request_path --filter="se=='Google Bot'" query 'SELECT request_path as request_path_by_google_bot, count(1) as count FROM log GROUP BY request_path ORDER BY count DESC LIMIT 10'

| Request Path By Google Bot   |   Count |
| /c/202060826/new             |      68 |
| /c/202060826/discount        |      29 |
| /c/202001900                 |      28 |
| /c/202001107                 |      22 |
| /c/1000008746                |      17 |
| /c/202060845                 |      17 |
| /c/202000010                 |      16 |
| /c/202061131                 |      16 |
| /c/202062183/new             |      16 |
| /c/202061132                 |      15 |

running for 18 seconds, 33923 records processed: 1789.62 req/sec
