Skip to main content

Download cloudfront logs from S3 and merge them into a single log file for the day.

Project description

merges3logs

Download cloudfront logs from S3 and merge them into a single log file for the day.

Installation

pip install merges3logs

Usage

Set up the config file as documented below.

Run:

merges3logs path/to/config.ini

This will pick up the logs from yesterday (UTC). To run for a different date, run:

merges3logs path/to/config.ini --date YYYY-MM-DD

When to Run

Some log lines for a given date will show up in the log files dated the following day. This is because the last bit of one day doesn't get flushed until the following day, and the file written will have that days date. These dates are all in UTC. Depending on your logging configuration, it could take hours for all the last logs for a day to get flushed to S3.

So, you probably want to run this program at least a few hours after midnight, UTC.

Any logs you download for the following day are cached and not re-downloaded the next day.

Configuration

The bulk of the configuration is done via a ".ini"-style configuration file. Here is an example:

[AWS]
AccessKey = XXX
SecretKey = XXX

[S3]
BucketName = mylogsbucket
Prefix = path/to/logfileprefix_log-%%Y-%%m-%%d-
MaxWorkers = 10

[Local]
CacheDir = /path/to/mylogsbucketcache
DestDir = /path/to/merged-logs
DestFilename = webworkers-cloudfront-%%Y-%%m-%%d.gz

Details:

  • AWS specifies your access and secret keys.
  • S3.BucketName is the name of your bucket.
  • S3.Prefix is the "prefix" of your log file names for a certain date. An S3 prefix is everything after the bucket name up to and including the date, with the date encoded using "strftime()" format, however the "%"s need to be doubled (because INI format otherwise interprets them).
  • S3.MaxWorkers is the number of download jobs that will run to get logs. Depending on your logging configuration in Cloudfront and how widely your services are accessed, this can be tens or hundreds of thousands of log files a day. So running downloads in parallel can really speed it up.
  • Local.CacheDir is the path to a directory to store the downloaded log files. This directory will need to have a cleanup job set up to prevent it from blowing up.
  • Local.DestDir is the directory that the merged log files will be written to.
  • Local.DestFilename is the name of the file that will be written in the DestDir with "strftime()" format to specify the date.

Cleanup

merges3logs will download the log files into a cache directory, and then work from the files there. You will need a cleanup job to prevent that from growing, for example:

find /path/to/cachedir -type f -mtime +3 -exec rm {} +

You will also need to clean up the destination log directory, though it does grow much more slowly (logs are compressed and only one file per day). Something like using logrotate or:

find /path/to/merged-logs -type f -mtime +60 -exec rm {} +

Author

Written by Sean Reifschneider, Oct 2023.

License

CC0 1.0 Universal, see LICENSE file for more information.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

merges3logs-0.1.tar.gz (8.5 kB view details)

Uploaded Source

Built Distribution

merges3logs-0.1-py3-none-any.whl (8.5 kB view details)

Uploaded Python 3

File details

Details for the file merges3logs-0.1.tar.gz.

File metadata

  • Download URL: merges3logs-0.1.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for merges3logs-0.1.tar.gz
Algorithm Hash digest
SHA256 f299a0081480d63da5c430bf0bbd4d6fe0b707e70c734ca41880ce6f0e15cff6
MD5 e1276af9aafdf930eb220613a40ba8be
BLAKE2b-256 045ac2800d0b7c4e3df347ead4638a0a7acca35e5b2ff4a4a6f998278a84bfa0

See more details on using hashes here.

File details

Details for the file merges3logs-0.1-py3-none-any.whl.

File metadata

  • Download URL: merges3logs-0.1-py3-none-any.whl
  • Upload date:
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for merges3logs-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 94ee7b256e55c51ab84b889050b354e90033a768c7cd2afcade5bd0b98a2cafa
MD5 d18036af71be841a8dfd58c7f64c3280
BLAKE2b-256 dcf2176f9a4f0b6df14a192c21c0fa58a2c37d04902c9bf1dbbffb2181400686

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page