Skip to main content
Python Software Foundation 20th Year Anniversary Fundraiser  Donate today!

Line stream s3 files into ~uniform lumps in S3

Project description

s3lncoll
========

Read files from S3 as defined by a key prefix and map them by lines to
a set of optionally gzip compressed output files in S3, with the
output files limited by (pre-compressed) file size. The string "{}"
in the output key will be substituted with the (zero-based) index of
the output files.

::

s3lncoll: Line stream s3 files into ~uniform lumps in S3

Usage: s3lncoll {{arguments}} {{options}}

Arguments:
from [text] S3 URL prefix to clump
to [text] S3 URL for target clump ('{}' will be the count)

Options:
-h, --help Show this help message and exit
-H, --HELP Help for all sub-commands
-D, --debug Enable debug logging
-d, --delete Delete source files/keys
-j, --json Validate each line as JSONM
-q, --quiet Be quiet, be vewy vewy quiet
-V, --version Report installed version
-z, --compress Ccompress (gzip) the target(s)
-b, --blocksize [int] Maximum size of pre-compressed output files in bytes. (default: 1048576)


Architecture
============

s3lncoll has a pipe and filter architecture which streams a set of keys as defined by a prefix
through a `LineStream`. `LineStream` reads the files under the keys and spits out a single line
via an iterator. `RotatingFileCtx` receives that stream of lines and aggregates them into chunked
files (of a maximum size or a single line, whichever is the larger), followed by flushing the
lines out to a provided S3 path.

+----------------------------------------------------------------------------------------------------+
| |
| +------------+ +-------------------------+ |
| | | | | |
| bucket Keys | LineStream | Lines | RotatingFileCtx | S3 Files |
| ------------> | | ------------> | | ------------> |
| | | | | |
| +------------+ +-------------------------+ |
| |
| +---------------------------+ |
| | | |
| | cmd.py: Scheduler | |
| | | |
| +---------------------------+ |
| |
| s3lncoll |
+----------------------------------------------------------------------------------------------------+

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for s3lncoll, version 0.1.post16
Filename, size File type Python version Upload date Hashes
Filename, size s3lncoll-0.1.post16.tar.gz (24.6 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page