Welcome to s3iotools Documentation

Usage

import boto3
import pandas as pd
from s3iotools.io.dataframe import S3Dataframe

session = boto3.Session(profile_name="xxx")
s3 = session.resource("s3")
bucket_name = "my-bucket"
s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)  # assign any pandas DataFrame

s3df.to_csv(key="data.csv")  # write as plain CSV
s3df.to_csv(key="data.csv.gz", gzip_compressed=True)  # write gzip-compressed CSV

s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.csv")
s3df_new.read_csv()
s3df_new.df # access data

s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.csv.gz")
s3df_new.read_csv(gzip_compressed=True)
s3df_new.df # access data

JSON IO is similar:

s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_json(key="data.json.gz", gzip_compressed=True)
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.json.gz")
s3df_new.read_json(gzip_compressed=True)
s3df_new.df # access data

Parquet is a columnar storage format that is very efficient for OLAP queries. You can put the data on S3 and then query the Parquet files with AWS Athena. Parquet IO in s3iotools is just as easy:

s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_parquet(key="data.parquet", compression="gzip")
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.parquet")
s3df_new.read_parquet()
s3df_new.df # access data

s3iotools does not install pyarrow automatically; you can install it with pip install pyarrow.
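
Once the Parquet data is on S3, it can be queried with Athena after a table has been defined over the bucket prefix. Below is a minimal sketch using boto3's Athena client; the database, table, and result location names are hypothetical placeholders.

import boto3

session = boto3.Session(profile_name="xxx")
athena = session.client("athena")

# submit a query against a table assumed to be defined over the parquet files in s3://my-bucket/
response = athena.start_query_execution(
    QueryString="SELECT * FROM my_database.my_table LIMIT 10",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
query_execution_id = response["QueryExecutionId"]  # use this id to poll for and fetch results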

Install

s3iotools is released on PyPI, so all you need is:

$ pip install s3iotools

To upgrade to the latest version:

$ pip install --upgrade s3iotools
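
To confirm the installation, the package should import cleanly:

$ python -c "import s3iotools"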
