Makes reading and writing S3 file objects easier, with support for raw files, CSV, JSON, Parquet, and pandas.DataFrame.
Welcome to s3iotools Documentation
Usage
Copying a local file to S3 and downloading a file object from S3 back to local is easy:
from s3iotools import S3FileObject
s3obj = S3FileObject(bucket="my-bucket", key="hello.txt", path="hello.txt")
# start fresh: make sure the file exists neither locally nor on S3
if s3obj.path_obj.exists():
    s3obj.path_obj.remove()
assert s3obj.exists_on_local() is False
assert s3obj.exists_on_s3() is False
s3obj.path_obj.write_text("hello world", encoding="utf-8")
assert s3obj.exists_on_local() is True
s3obj.copy_to_s3()
assert s3obj.exists_on_s3() is True
s3obj.path_obj.remove()
assert s3obj.exists_on_local() is False
s3obj.copy_to_local()
assert s3obj.exists_on_local() is True
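To confirm the round trip, you can read the downloaded file back. A minimal sketch, assuming path_obj exposes the standard pathlib-style read_text, the counterpart of the write_text call used above:

# sanity check: the content survived the S3 round trip
# (read_text is assumed to mirror the pathlib API that write_text follows)
assert s3obj.path_obj.read_text(encoding="utf-8") == "hello world"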
You can manipulate an S3-backed pandas.DataFrame just as easily:
import boto3
import pandas as pd
from s3iotools import S3Dataframe
session = boto3.Session(profile_name="xxx")
s3 = session.resource("s3")
bucket_name = "my-bucket"
s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_csv(key="data.csv")
s3df.to_csv(key="data.csv.gz", gzip_compressed=True)
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.csv")
s3df_new.read_csv()
s3df_new.df # access data
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.csv.gz")
s3df_new.read_csv(gzip_compressed=True)
s3df_new.df # access data
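In the snippets above, pd.DataFrame(...) stands in for whatever data you want to store. For a concrete run, any small frame will do; the columns and values below are made up purely for illustration:

import pandas as pd

# hypothetical example data; substitute your own DataFrame
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
s3df.df = df
s3df.to_csv(key="data.csv")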
JSON IO is similar:
s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_json(key="data.json.gz", gzip_compressed=True)
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.json.gz")
s3df_new.read_json(gzip_compressed=True)
s3df_new.df # access data
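If you want to check that a round trip preserved your data, pandas ships a testing helper for frame equality. A minimal sketch; check_dtype is relaxed because JSON round trips do not preserve every dtype, and depending on the orient you may also need to sort the index first:

import pandas as pd

# compare the frame we uploaded with the one read back from S3
pd.testing.assert_frame_equal(s3df.df, s3df_new.df, check_dtype=False)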
Parquet is a columnar storage format that is very efficient for OLAP queries. You can simply put the data on S3 and then query the Parquet files with AWS Athena. Parquet IO in s3iotools is just as easy:
s3df = S3Dataframe(s3_resource=s3, bucket_name=bucket_name)
s3df.df = pd.DataFrame(...)
s3df.to_parquet(key="data.parquet", compression="gzip")
s3df_new = S3Dataframe(s3_resource=s3, bucket_name=bucket_name, key="data.parquet")
s3df_new.read_parquet()
s3df_new.df # access data
s3iotools does not install pyarrow automatically; you can install it yourself with pip install pyarrow.
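Once a Parquet file is on S3 and registered as a table (for example through an AWS Glue crawler), you can query it from Python with boto3's Athena client. This is a sketch of the plain boto3 API, not part of s3iotools; the database, table, and output-location names are placeholders:

import boto3

athena = boto3.client("athena")
# "my_database", "my_table", and the result location are hypothetical names
response = athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM my_table",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
query_execution_id = response["QueryExecutionId"]

Athena runs queries asynchronously, so you would poll get_query_execution with this id before fetching results.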
Install
s3iotools is released on PyPI, so all you need is:
$ pip install s3iotools
To upgrade to latest version:
$ pip install --upgrade s3iotools
Project details
Download files
Source Distribution: s3iotools-0.0.3.tar.gz
Built Distribution: s3iotools-0.0.3-py2.py3-none-any.whl
File details
Details for the file s3iotools-0.0.3.tar.gz.
File metadata
- Download URL: s3iotools-0.0.3.tar.gz
- Size: 27.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 689ba157b39fadc44e1884344474a57662d80e005238274d184f10b56af6019b
MD5 | b61c9dfebc1d4716f5e1dc775023a2aa
BLAKE2b-256 | f414f954cae8a80ce3708e53ffce76af826f2651aba55ef3c48e7947055152a6
File details
Details for the file s3iotools-0.0.3-py2.py3-none-any.whl.
File metadata
- Download URL: s3iotools-0.0.3-py2.py3-none-any.whl
- Size: 34.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/2.7.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 35f606a3284835dc1f13353d582d287be6a1fe6f322dc869cd074a0d4236e62d
MD5 | 4f1d86fb3ed351e3ded2adad133e2ad8
BLAKE2b-256 | e24756a8722cfb95009ff6b33cf10eb064005bedf898c49a32edd983ea1ebdb9