
Make S3 file object read/write easier; supports raw files, csv, parquet, and pandas.DataFrame.

Project description


Welcome to s3iotools Documentation

Usage

Copying a local file to S3 and downloading a file object from S3 back to local is easy:

from s3iotools import S3FileObject

s3obj = S3FileObject(bucket="my-bucket", key="hello.txt", path="hello.txt")

# to get started, the file exists neither locally nor on s3
if s3obj.path_obj.exists():
    s3obj.path_obj.remove()
assert s3obj.exists_on_local() is False
assert s3obj.exists_on_s3() is False

s3obj.path_obj.write_text("hello world", encoding="utf-8")
assert s3obj.exists_on_local() is True

s3obj.copy_to_s3()
assert s3obj.exists_on_s3() is True

s3obj.path_obj.remove()
assert s3obj.exists_on_local() is False

s3obj.copy_to_local()
assert s3obj.exists_on_local() is True
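
For reference, the same round trip with plain boto3 (without s3iotools) is roughly the sketch below; the bucket and file names are just the placeholders from the example above:

import boto3

s3_client = boto3.client("s3")

# upload the local file to s3
s3_client.upload_file("hello.txt", "my-bucket", "hello.txt")

# download the s3 object back to a local path
s3_client.download_file("my-bucket", "hello.txt", "hello.txt")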

You can manipulate an S3-backed pandas.DataFrame easily:

import boto3
import pandas as pd
from s3iotools import S3Dataframe

session = boto3.Session(profile_name="xxx")
s3 = session.resource("s3")
bucket_name = "my-bucket"
s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)

s3df.to_csv(key="data.csv")
s3df.to_csv(key="data.csv.gz", gzip_compressed=True)

s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.csv")
s3df_new.read_csv()
s3df_new.df # access data

s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.csv.gz")
s3df_new.read_csv(gzip_compressed=True)
s3df_new.df # access data
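
For context, doing the gzip-compressed CSV upload by hand with pandas and boto3 looks roughly like the sketch below. It is only an illustration of what gzip_compressed=True spares you, not necessarily how s3iotools implements it internally:

import gzip
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# compress the csv text in memory, then upload the raw bytes
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(df.to_csv(index=False).encode("utf-8"))

# reuse the s3 resource and bucket_name defined above
s3.Bucket(bucket_name).put_object(Key="data.csv.gz", Body=buf.getvalue())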

JSON IO is similar:

s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_json(key="data.json.gz", gzip_compressed=True)
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.json.gz")
s3df_new.read_json(gzip_compressed=True)
s3df_new.df # access data

Parquet is a columnar storage format that is very efficient for OLAP queries. You can simply put the data on S3, then use AWS Athena to query the Parquet files (see the sketch after the example below). Parquet IO in s3iotools is easy:

s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_parquet(key="data.parquet", compression="gzip")
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.parquet")
s3df_new.read_parquet()
s3df_new.df # access data
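
Once the Parquet file is on S3, you can query it with Athena. The sketch below uses boto3's Athena client; the database name, table name, and result location are placeholders, and it assumes an external table pointing at the S3 data has already been created (for example via a Glue crawler or a CREATE EXTERNAL TABLE statement):

athena = session.client("athena")  # reuse the boto3 session from above

response = athena.start_query_execution(
    QueryString="SELECT * FROM my_database.my_table LIMIT 10",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
query_execution_id = response["QueryExecutionId"]  # poll this id for results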

s3iotools doesn't install pyarrow automatically; to use the Parquet features, install it yourself:
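
$ pip install pyarrow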

Install

s3iotools is released on PyPI, so all you need is:

$ pip install s3iotools

To upgrade to the latest version:

$ pip install --upgrade s3iotools

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

s3iotools-0.0.3.tar.gz (27.0 kB)

Uploaded Source

Built Distribution

s3iotools-0.0.3-py2.py3-none-any.whl (34.5 kB)

Uploaded Python 2 Python 3

File details

Details for the file s3iotools-0.0.3.tar.gz.

File metadata

  • Download URL: s3iotools-0.0.3.tar.gz
  • Upload date:
  • Size: 27.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.13

File hashes

Hashes for s3iotools-0.0.3.tar.gz

  • SHA256: 689ba157b39fadc44e1884344474a57662d80e005238274d184f10b56af6019b
  • MD5: b61c9dfebc1d4716f5e1dc775023a2aa
  • BLAKE2b-256: f414f954cae8a80ce3708e53ffce76af826f2651aba55ef3c48e7947055152a6

See more details on using hashes here.

File details

Details for the file s3iotools-0.0.3-py2.py3-none-any.whl.

File metadata

  • Download URL: s3iotools-0.0.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 34.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.13

File hashes

Hashes for s3iotools-0.0.3-py2.py3-none-any.whl

  • SHA256: 35f606a3284835dc1f13353d582d287be6a1fe6f322dc869cd074a0d4236e62d
  • MD5: 4f1d86fb3ed351e3ded2adad133e2ad8
  • BLAKE2b-256: e24756a8722cfb95009ff6b33cf10eb064005bedf898c49a32edd983ea1ebdb9

See more details on using hashes here.
