Skip to main content

stream and (de)serialize s3 objects with no local footprint

Project description

License: MIT

s3-streaming: handling (big) S3 files like regular files

Storing, retrieving and using files in S3 is a regular activity so it should be easy. It should also ...

  • stream the data
  • have an api that is python file-io like
  • handle some of the desearization and compression stuff because why not

Install

pip install s3-streaming

Streaming S3 objects like regular files

The basics

Opening and reading S3 objects is similar to regular python io. The only difference is that you need to provide a boto3.session.Session instance to handle the bucket access.

import boto3
from s3streaming import s3_open


with s3_open('s3://bucket/key', boto_session=boto3.session.Session()) as f:
    for next_line in f:
        print(next_line)

Injecting deserialization and compression handling in stream

Consider a file that is gzip compressed and contains lines of json. There's some boilerplate in dealing with that, but why bother? Just handle that in stream.

from s3streaming import s3_open, deserialize, compression


reader_settings = dict(
  boto_session=boto3.session.Session(),
  deserializer=deserialize.json_lines, 
  compression=compression.gzip
)

with s3_open('s3://bucket/key.gzip', **reader_settings) as f:
    for next_line in f:
        print(next_line.keys())    # because the file was decompressed ...
        print(next_line.values())  #   ... and the json is now a loaded dict!

Other deserialize options include

  • csv
  • csv_as_dict
  • tsv
  • tsv_as_dict
  • string

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

s3_streaming-0.0.3-py3-none-any.whl (4.5 kB view details)

Uploaded Python 3

File details

Details for the file s3_streaming-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: s3_streaming-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 4.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.18.4 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/2.7.16

File hashes

Hashes for s3_streaming-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 4cb79c5271e58d973fd887c7b7332033cfd0b400e98923676fc25b42d85e91d3
MD5 53f47f1b9cd5595c2f460839cc690347
BLAKE2b-256 7622ea30383e94b45c333b26f3c1b4515c520466d7f76766134ac12905f08c9f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page