Line stream s3 files into ~uniform lumps in S3

Project description

s3lncoll
========

Read the files in S3 selected by a key prefix and redistribute their
lines across a set of optionally gzip-compressed output files in S3.
Each output file is capped by its size before compression. The string
"{}" in the output key is replaced with the zero-based index of the
output file.
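
A minimal sketch of the "{}" substitution described above. The helper name `output_key` and the example bucket are hypothetical, and whether s3lncoll itself uses `str.replace` or `str.format` is an assumption; only the zero-based index behaviour comes from the description.

```python
def output_key(template: str, index: int) -> str:
    """Return the output key for a given zero-based output file index."""
    # A plain replace leaves any other braces in the key untouched.
    return template.replace("{}", str(index))


# Hypothetical target URL; three output files get the indexes 0, 1 and 2.
keys = [output_key("s3://example-bucket/clumped/part-{}.json", i) for i in range(3)]
```

With this template the first output file would be written to `s3://example-bucket/clumped/part-0.json`.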

::

    s3lncoll: Line stream s3 files into ~uniform lumps in S3

    Usage: s3lncoll {{arguments}} {{options}}

    Arguments:
      from [text]  S3 URL prefix to clump
      to [text]    S3 URL for target clump ('{}' will be the count)

    Options:
      -h, --help             Show this help message and exit
      -H, --HELP             Help for all sub-commands
      -D, --debug            Enable debug logging
      -d, --delete           Delete source files/keys
      -j, --json             Validate each line as JSON
      -q, --quiet            Be quiet, be vewy vewy quiet
      -V, --version          Report installed version
      -z, --compress         Compress (gzip) the target(s)
      -b, --blocksize [int]  Maximum size of pre-compressed output files in bytes. (default: 1048576)
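
The block size limit applies to the data before gzip is applied, so a compressed output file is usually much smaller than the limit. The sketch below illustrates that relationship with the standard `gzip` module; the sample lines are made up, and this is not s3lncoll's own code.

```python
import gzip

BLOCKSIZE = 1048576  # default of -b/--blocksize from the help text

# Made-up input: 1000 short JSON lines.
lines = [b'{"id": %d}\n' % i for i in range(1000)]
raw = b"".join(lines)

# The limit is compared against the uncompressed size ...
fits = len(raw) <= BLOCKSIZE

# ... and compression (the -z option) happens afterwards.
packed = gzip.compress(raw)
restored = gzip.decompress(packed)
```

Here `len(packed)` is smaller than `len(raw)`, and decompressing returns the original lines unchanged.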


Architecture
============

s3lncoll has a pipe-and-filter architecture. The keys selected by a prefix are streamed through a
`LineStream`, which reads the file under each key and yields its lines one at a time via an
iterator. `RotatingFileCtx` receives that stream of lines and aggregates them into chunked files,
each holding at most the maximum size or a single line, whichever is larger, and then flushes
each chunk out to the provided S3 path.

::

    +----------------------------------------------------------------------------------------------------+
    |                                                                                                    |
    |                      +------------+               +-------------------------+                      |
    |                      |            |               |                         |                      |
    |        bucket Keys   | LineStream |     Lines     |     RotatingFileCtx     | S3 Files             |
    |        ------------> |            | ------------> |                         | ------------>        |
    |                      |            |               |                         |                      |
    |                      +------------+               +-------------------------+                      |
    |                                                                                                    |
    |                                   +---------------------------+                                    |
    |                                   |                           |                                    |
    |                                   |     cmd.py: Scheduler     |                                    |
    |                                   |                           |                                    |
    |                                   +---------------------------+                                    |
    |                                                                                                    |
    |                                              s3lncoll                                              |
    +----------------------------------------------------------------------------------------------------+
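
The two stages can be sketched with plain Python generators. This is an illustration under stated assumptions, not the project's code: `line_stream` and `rotate` are hypothetical stand-ins for `LineStream` and `RotatingFileCtx`, the sources are in-memory lists rather than S3 objects, and the chunks are returned instead of being uploaded.

```python
from typing import Iterable, Iterator, List


def line_stream(sources: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield every line from every source, one at a time (the LineStream role)."""
    for source in sources:
        for line in source:
            yield line


def rotate(lines: Iterable[str], blocksize: int) -> Iterator[List[str]]:
    """Group lines into chunks of at most blocksize bytes (the RotatingFileCtx role).

    A line longer than blocksize still gets a chunk of its own.
    """
    chunk: List[str] = []
    size = 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if chunk and size + line_size > blocksize:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_size
    if chunk:
        yield chunk


# Two in-memory "files"; the long line exceeds the 6-byte limit on its own.
sources = [["a\n", "bb\n"], ["cccccccccc\n", "d\n"]]
chunks = list(rotate(line_stream(sources), blocksize=6))
```

With a 6-byte limit the four input lines end up in three chunks: the first two lines together, the oversized line alone, and the last line alone. Because both stages are lazy, only the chunk currently being filled needs to be held in memory.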
