Stream and (de)serialize S3 objects with no local footprint
s3-streaming: handling (big) S3 files like regular files
Storing, retrieving, and using files in S3 is a regular activity, so it should be easy. It should also be possible to:
- stream the data
- have an API that feels like regular Python file I/O
- handle some of the deserialization and compression along the way, because why not
```
pip install s3-streaming
```
Streaming S3 objects like regular files
Opening and reading S3 objects is similar to regular Python I/O. The only difference is that you need to provide a
boto3.session.Session instance to handle the bucket access.
```python
import boto3
from s3streaming import s3_open

with s3_open('s3://bucket/key', boto_session=boto3.session.Session()) as f:
    for next_line in f:
        print(next_line)
```
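The session carries the credentials, so anything boto3 supports (profiles, regions, explicit keys) should work here as well. For example, using a named profile; `my-profile` is a placeholder for an entry in your AWS config:

```python
import boto3
from s3streaming import s3_open

# profile_name and region_name are standard boto3 Session parameters.
session = boto3.session.Session(profile_name='my-profile', region_name='us-east-1')

with s3_open('s3://bucket/key', boto_session=session) as f:
    for next_line in f:
        print(next_line)
```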
Injecting deserialization and compression handling in stream
Consider a file that is gzip-compressed and contains lines of JSON. There's some boilerplate in dealing with that, but why bother? Just handle it in stream.
```python
import boto3
from s3streaming import s3_open, deserialize, compression

reader_settings = dict(
    boto_session=boto3.session.Session(),
    deserializer=deserialize.json_lines,
    compression=compression.gzip,
)

with s3_open('s3://bucket/key.gzip', **reader_settings) as f:
    for next_line in f:
        print(next_line.keys())    # because the file was decompressed ...
        print(next_line.values())  # ... and the json is now a loaded dict!
```
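For a sense of the boilerplate this replaces, here is roughly the equivalent using plain boto3, gzip, and json (the bucket and key names are placeholders):

```python
import gzip
import json

import boto3

s3 = boto3.session.Session().client('s3')
response = s3.get_object(Bucket='bucket', Key='key.gzip')

# get_object returns a streaming body; wrap it for on-the-fly decompression.
with gzip.GzipFile(fileobj=response['Body']) as decompressed:
    for raw_line in decompressed:
        record = json.loads(raw_line)  # one JSON document per line
        print(record.keys())
```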
The `deserialize` and `compression` modules ship several ready-made options; `deserialize.json_lines` and `compression.gzip` are the ones used above.
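If the built-in options don't fit, the `deserializer` hook appears to accept any callable applied to each line; here is a minimal sketch under that assumption (the `pipe_fields` helper is hypothetical):

```python
import boto3
from s3streaming import s3_open

def pipe_fields(line):
    # Hypothetical custom deserializer: split a pipe-delimited line into fields.
    # Assumes lines arrive as decoded text; adjust if they arrive as bytes.
    return line.rstrip('\n').split('|')

with s3_open('s3://bucket/key',
             boto_session=boto3.session.Session(),
             deserializer=pipe_fields) as f:
    for fields in f:
        print(fields)
```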