Makes reading and writing S3 file objects easier, with support for raw files, CSV, JSON, Parquet, and pandas.DataFrame.
Welcome to s3iotools Documentation
Usage
Copying a local file to S3 and downloading a file object from S3 back to local is easy:
from s3iotools import S3FileObject
s3obj = S3FileObject(bucket="my-bucket", key="hello.txt", path="hello.txt")
# start fresh: make sure the file exists neither locally nor on S3
if s3obj.path_obj.exists():
    s3obj.path_obj.remove()
assert s3obj.exists_on_local() is False
assert s3obj.exists_on_s3() is False
s3obj.path_obj.write_text("hello world", encoding="utf-8")
assert s3obj.exists_on_local() is True
s3obj.copy_to_s3()
assert s3obj.exists_on_s3() is True
s3obj.path_obj.remove()
assert s3obj.exists_on_local() is False
s3obj.copy_to_local()
assert s3obj.exists_on_local() is True
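To confirm the round trip, you can read the downloaded file back. A minimal sketch, assuming path_obj exposes the standard pathlib-style read_text, the counterpart of the write_text call used above:

# sanity check: the content survived the S3 round trip
# (read_text is assumed to mirror the pathlib API that write_text follows)
assert s3obj.path_obj.read_text(encoding="utf-8") == "hello world"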
You can manipulate an S3-backed pandas.DataFrame just as easily:
import boto3
import pandas as pd
from s3iotools import S3Dataframe
session = boto3.Session(profile_name="xxx")
s3 = session.resource("s3")
bucket_name = "my-bucket"
s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_csv(key="data.csv")
s3df.to_csv(key="data.csv.gz", gzip_compressed=True)
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.csv")
s3df_new.read_csv()
s3df_new.df # access data
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.csv.gz")
s3df_new.read_csv(gzip_compressed=True)
s3df_new.df # access data
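In the snippets above, pd.DataFrame(...) stands in for whatever data you want to store. For a concrete run, any small frame will do; the columns and values below are made up purely for illustration:

import pandas as pd

# hypothetical example data; substitute your own DataFrame
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
s3df.df = df
s3df.to_csv(key="data.csv")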
JSON IO is similar:
s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_json(key="data.json.gz", gzip_compressed=True)
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.json.gz")
s3df_new.read_json(gzip_compressed=True)
s3df_new.df # access data
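If you want to check that a round trip preserved your data, pandas ships a testing helper for frame equality. A minimal sketch; check_dtype is relaxed because JSON round trips do not preserve every dtype, and depending on the orient you may also need to sort the index first:

import pandas as pd

# compare the frame we uploaded with the one read back from S3
pd.testing.assert_frame_equal(s3df.df, s3df_new.df, check_dtype=False)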
Parquet is a columnar storage format that is very efficient for OLAP queries. You can simply put the data on S3 and then query the Parquet files with AWS Athena. Parquet IO in s3iotools is just as easy:
s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_parquet(key="data.parquet", compression="gzip")
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.parquet")
s3df_new.read_parquet()
s3df_new.df # access data
s3iotools does not install pyarrow automatically; you can install it yourself with pip install pyarrow.
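Once a Parquet file is on S3 and registered as a table (for example through an AWS Glue crawler), you can query it from Python with boto3's Athena client. This is a sketch of the plain boto3 API, not part of s3iotools; the database, table, and output-location names are placeholders:

import boto3

athena = boto3.client("athena")
# "my_database", "my_table", and the result location are hypothetical names
response = athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM my_table",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
query_execution_id = response["QueryExecutionId"]

Athena runs queries asynchronously, so you would poll get_query_execution with this id before fetching results.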
Install
s3iotools is released on PyPI, so all you need is:
$ pip install s3iotools
To upgrade to latest version:
$ pip install --upgrade s3iotools
Project details
Download files
Source Distribution: s3iotools-0.0.3.tar.gz
Built Distribution: s3iotools-0.0.3-py2.py3-none-any.whl
File details
Details for the file s3iotools-0.0.3.tar.gz.
File metadata
- Download URL: s3iotools-0.0.3.tar.gz
- Size: 27.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 689ba157b39fadc44e1884344474a57662d80e005238274d184f10b56af6019b
MD5 | b61c9dfebc1d4716f5e1dc775023a2aa
BLAKE2b-256 | f414f954cae8a80ce3708e53ffce76af826f2651aba55ef3c48e7947055152a6
File details
Details for the file s3iotools-0.0.3-py2.py3-none-any.whl.
File metadata
- Download URL: s3iotools-0.0.3-py2.py3-none-any.whl
- Size: 34.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 35f606a3284835dc1f13353d582d287be6a1fe6f322dc869cd074a0d4236e62d
MD5 | 4f1d86fb3ed351e3ded2adad133e2ad8
BLAKE2b-256 | e24756a8722cfb95009ff6b33cf10eb064005bedf898c49a32edd983ea1ebdb9