Line stream s3 files into ~uniform lumps in S3
s3lncoll
========
Read files from S3 as selected by a key prefix and redistribute their
lines across a set of optionally gzip-compressed output files in S3.
Each output file is limited by its size before compression. The string
"{}" in the output key is replaced with the zero-based index of the
output file.
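
The size limit and the "{}" substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not s3lncoll's own code: the function names ``chunk_lines`` and ``output_key`` and the bucket name are hypothetical, and it assumes that a line which would push a file past the limit starts the next file and that "{}" is replaced literally.

```python
def chunk_lines(lines, blocksize=1048576):
    """Group lines so each group's encoded size stays within blocksize.

    A single line larger than blocksize still gets a group of its own.
    """
    chunk, size = [], 0
    for line in lines:
        n = len(line.encode("utf-8"))
        # Assumption: a line that would exceed the limit opens the next group.
        if chunk and size + n > blocksize:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += n
    if chunk:
        yield chunk


def output_key(template, index):
    # Assumption: "{}" is replaced literally with the zero-based index.
    return template.replace("{}", str(index))


lines = ["aaaa\n", "bbbb\n", "cccccccccccc\n", "dd\n"]
chunks = list(chunk_lines(lines, blocksize=10))
keys = [output_key("s3://example-bucket/out/part-{}.txt", i)
        for i in range(len(chunks))]
```

With a 10-byte limit the first two lines share a group, the 13-byte line is written alone even though it is over the limit, and the last line forms a third group.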
::

    s3lncoll: Line stream s3 files into ~uniform lumps in S3

    Usage: s3lncoll {{arguments}} {{options}}

    Arguments:
      from [text]  S3 URL prefix to clump
      to [text]    S3 URL for target clump ('{}' will be the count)

    Options:
      -h, --help             Show this help message and exit
      -H, --HELP             Help for all sub-commands
      -D, --debug            Enable debug logging
      -d, --delete           Delete source files/keys
      -j, --json             Validate each line as JSON
      -q, --quiet            Be quiet, be vewy vewy quiet
      -V, --version          Report installed version
      -z, --compress         Compress (gzip) the target(s)
      -b, --blocksize [int]  Maximum size of pre-compressed output files in bytes. (default: 1048576)

Architecture
============
s3lncoll has a pipe-and-filter architecture. The set of keys selected by a prefix is streamed
through a `LineStream`, which reads the file under each key and yields its contents one line at
a time via an iterator. `RotatingFileCtx` consumes that stream of lines and aggregates them into
chunked files, each holding up to the maximum size or a single line, whichever is larger, and
then flushes each file out to the provided S3 path.
::

    +------------------------------------------------------------------------------------+
    |                                                                                    |
    |                 +------------+                 +-----------------+                 |
    |                 |            |                 |                 |                 |
    |   bucket Keys   | LineStream |      Lines      | RotatingFileCtx |    S3 Files     |
    |  ------------>  |            |  ------------>  |                 |  ------------>  |
    |                 |            |                 |                 |                 |
    |                 +------------+                 +-----------------+                 |
    |                                                                                    |
    |                           +---------------------------+                            |
    |                           |                           |                            |
    |                           |     cmd.py: Scheduler     |                            |
    |                           |                           |                            |
    |                           +---------------------------+                            |
    |                                                                                    |
    |                                      s3lncoll                                      |
    +------------------------------------------------------------------------------------+

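
The pipeline can be approximated with a generator feeding a context manager. The sketch below is illustrative only and does not reproduce the real classes: ``line_stream`` and ``RotatingWriter`` are hypothetical stand-ins for `LineStream` and `RotatingFileCtx`, in-memory strings replace the S3 source objects, and a plain dict replaces the S3 target.

```python
import gzip


def line_stream(sources):
    """Yield every line of every source, one at a time (LineStream stand-in)."""
    for text in sources:
        # keepends=True keeps the newline so chunks can be joined unchanged.
        yield from text.splitlines(keepends=True)


class RotatingWriter:
    """Hypothetical stand-in for RotatingFileCtx; writes to a dict, not S3."""

    def __init__(self, sink, template, blocksize, compress=False):
        self.sink, self.template = sink, template
        self.blocksize, self.compress = blocksize, compress
        self.buf, self.size, self.index = [], 0, 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush the final partial chunk only if the block exited cleanly.
        if exc_type is None:
            self.flush()
        return False

    def write(self, line):
        data = line.encode("utf-8")
        # Rotate before the buffered (uncompressed) size would pass the limit.
        if self.buf and self.size + len(data) > self.blocksize:
            self.flush()
        self.buf.append(data)
        self.size += len(data)

    def flush(self):
        if not self.buf:
            return
        body = b"".join(self.buf)
        if self.compress:
            body = gzip.compress(body)
        self.sink[self.template.replace("{}", str(self.index))] = body
        self.buf, self.size, self.index = [], 0, self.index + 1


sources = ["a\nb\n", "c\nd\ne\n"]
sink = {}
with RotatingWriter(sink, "out/part-{}.gz", blocksize=4, compress=True) as writer:
    for line in line_stream(sources):
        writer.write(line)
```

Because the writer only sees a stream of lines, source boundaries disappear: the five lines from two inputs end up in three outputs, each at most 4 bytes before compression.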
Source Distribution
===================

- File: s3lncoll-0.1.post16.tar.gz
- Size: 24.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No

File hashes:

===========  ================================================================
Algorithm    Hash digest
===========  ================================================================
SHA256       78e56edecdf62d85e8f7e84ba9d3b0b7eb71ea6067421d3b2e20b4be07e604ff
MD5          a0dbfc1d10eefc716ee0c22ffe9145ff
BLAKE2b-256  cc491a4541f72f3f07a5e8d7808a5b8e0f6a9d9bb9728d98fded2fb6306adcba
===========  ================================================================