
merges3logs

Download cloudfront logs from S3 and merge them into a single log file for the day.

Installation

pip install merges3logs

Usage

Set up the config file as documented below.

Run:

merges3logs path/to/config.ini

This will pick up the logs from yesterday (UTC). To run for a different date, run:

merges3logs path/to/config.ini --date YYYY-MM-DD

When to Run

Some log lines for a given date will show up in the log files dated the following day. This is because the last of one day's logs doesn't get flushed until the following day, and the file is named with the date it was written. These dates are all in UTC. Depending on your logging configuration, it can take hours for the last logs of a day to be flushed to S3.

So, you probably want to run this program at least a few hours after midnight, UTC.
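For example, a crontab entry along these lines would do it (this is just a sketch, not something shipped with the package: adjust the config path, use the full path to the merges3logs executable if needed, and shift the hour if your machine's clock is not set to UTC):

0 4 * * * merges3logs /path/to/config.ini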

Any logs you download for the following day are cached and not re-downloaded the next day.

Configuration

The bulk of the configuration is done via a ".ini"-style configuration file. Here is an example:

[AWS]
AccessKey = XXX
SecretKey = XXX

[S3]
#  Bucket to download log files from
BucketName = mylogsbucket
#  The prefix of the logfiles to download.
#  The "%" must be doubled, strftime format specifiers may be used
Prefix = path/to/logfileprefix_log-%%Y-%%m-%%d-
#  Number of parallel downloads to do
MaxWorkers = 10

[Local]
#  Directory to write files downloaded from S3
CacheDir = /path/to/mylogsbucketcache
#  Directory to write merged logfiles to
DestDir = /path/to/merged-logs
#  .gz is added to this logfile name
#  The "%" must be doubled, strftime format specifiers may be used
DestFilename = webworkers-cloudfront-%%Y-%%m-%%d.log
#  Remove the day's cached logfiles after a successful run?
RemoveFiles = False

Details:

  • AWS specifies your access and secret keys.
  • S3.BucketName is the name of your bucket.
  • S3.Prefix is the "prefix" of your log file names for a given date. An S3 prefix is everything after the bucket name up to and including the date, with the date encoded in "strftime()" format; the "%" characters must be doubled, because the INI format would otherwise interpret them (see the example after this list).
  • S3.MaxWorkers is the number of download jobs that run in parallel to fetch logs. Depending on your CloudFront logging configuration and how widely your services are accessed, there can be tens or hundreds of thousands of log files per day, so downloading in parallel can speed things up considerably.
  • Local.CacheDir is the path to a directory to store the downloaded log files. This directory will need to have a cleanup job set up to prevent it from growing unbounded. See also "Local.RemoveFiles".
  • Local.DestDir is the directory that the merged log files will be written to.
  • Local.DestFilename is the name of the file that will be written in the DestDir with "strftime()" format to specify the date.
  • Local.RemoveFiles, if "True", will delete the day's files from the cache directory after a successful run. If "False", they are kept and you will need to set up a cron job or similar to delete them; keeping them is probably most useful for testing, so repeated runs don't have to re-download. Default is "True".
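
To illustrate the doubled "%": the INI reader turns "%%" into a literal "%", and the resulting pattern is then expanded with "strftime()" for the date being processed. As a sketch (using GNU date purely for illustration; merges3logs does not run this), the Prefix from the example config expands like this for 2023-10-15:

date -u -d "2023-10-15" +"path/to/logfileprefix_log-%Y-%m-%d-"

which prints "path/to/logfileprefix_log-2023-10-15-"; objects whose keys start with that prefix are that day's log files.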

Cleanup

merges3logs will download the log files into a cache directory and then work from the files there. You can use "Local.RemoveFiles" to delete them after the run, or set up a cron job, for example:

find /path/to/cachedir -type f -mtime +3 -exec rm {} +

It is probably worthwhile to set up cleaning of the cache anyway, as it can be large and may accumulate files if the program fails for any reason.

You will also need to clean up the destination log directory, though it grows much more slowly (logs are compressed and there is only one file per day). Use something like logrotate, or:

find /path/to/merged-logs -type f -mtime +60 -exec rm {} +

Author

Written by Sean Reifschneider, Oct 2023.

License

CC0 1.0 Universal, see LICENSE file for more information.
