For saving dictionaries to S3 with bz2 compression
S3Bz
Save and load dictionaries to S3 using bz2 compression.
Full docs: https://thanakijwanavit.github.io/s3bz/
Install
pip install s3bz
How to use
Create a bucket and make sure that it has transfer acceleration enabled
Create a bucket:
aws s3 mb s3://<bucketname>
Enable transfer acceleration:
aws s3api put-bucket-accelerate-configuration --bucket <bucketname> --accelerate-configuration Status=Enabled
First, import the package:
from importlib import reload
from s3bz.s3bz import S3
Set up dummy data:
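The examples below refer to key, bucket, USER, PW, and sampleDict; a minimal sketch of that setup might look like this (the bucket and key match values used later on this page, while USER and PW are assumed to be an AWS access key id and secret key):

# example setup; USER/PW are assumed to be AWS credentials, replace with your own
key = 'test.dict'
bucket = 'pybz-test'
USER = '<awsAccessKeyId>'
PW = '<awsSecretAccessKey>'
sampleDict = {'ib_prcode': '23238', 'ib_brcode': '1015',
              'ib_cf_qty': '703', 'new_ib_vs_stock_cv': '768'}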
BZ2 compression
Save an object using bz2 compression:
result = S3.save(key=key,
                 objectToSave=sampleDict,
                 bucket=bucket,
                 user=USER,
                 pw=PW,
                 accelerate=True)
print(('failed', 'success')[result])
success
Load an object with bz2 compression:
result = S3.load(key=key,
                 bucket=bucket,
                 user=USER,
                 pw=PW,
                 accelerate=True)
print(result[0])
{'ib_prcode': '23238', 'ib_brcode': '1015', 'ib_cf_qty': '703', 'new_ib_vs_stock_cv': '768'}
Other compressions
- Zl: zlib compression with JSON string encoding
- PklZl: zlib compression with pickle encoding
print(bucket)
%time S3.saveZl(key, sampleDict, bucket)
%time S3.loadZl(key, bucket)
%time S3.savePklZl(key, sampleDict, bucket)
%time result = S3.loadPklZl(key, bucket)
pybz-test
CPU times: user 23.9 ms, sys: 559 µs, total: 24.5 ms
Wall time: 155 ms
CPU times: user 28.3 ms, sys: 3.04 ms, total: 31.4 ms
Wall time: 154 ms
CPU times: user 21.6 ms, sys: 228 µs, total: 21.9 ms
Wall time: 151 ms
CPU times: user 31.6 ms, sys: 0 ns, total: 31.6 ms
Wall time: 114 ms
Bring your own compressor and encoder
import gzip, json
compressor=lambda x: gzip.compress(x)
encoder=lambda x: json.dumps(x).encode()
decompressor=lambda x: gzip.decompress(x)
decoder=lambda x: json.loads(x.decode())
%time S3.generalSave(key, sampleDict, bucket=bucket, compressor=compressor, encoder=encoder)
%time result = S3.generalLoad(key, bucket, decompressor=decompressor, decoder=decoder)
assert result == sampleDict, 'not the same as sample dict'
CPU times: user 31 ms, sys: 0 ns, total: 31 ms
Wall time: 155 ms
CPU times: user 32.5 ms, sys: 51 µs, total: 32.5 ms
Wall time: 115 ms
Check if an object exists
result = S3.exist('', bucket, user=USER, pw=PW, accelerate=True)
print(('doesnt exist', 'exist')[result])
exist
Presign a download URL for an object
url = S3.presign(key=key,
                 bucket=bucket,
                 expiry=1000,
                 user=USER,
                 pw=PW)
print(url)
https://pybz-test.s3-accelerate.amazonaws.com/test.dict?AWSAccessKeyId=AKIAVX4Z5TKDSNNNULGB&Signature=BR8Laz3uvkNKGh%2FBZ8x7IhRE3OU%3D&Expires=1616667887
Download using the signed link
from s3bz.s3bz import Requests
result = Requests.getContentFromUrl(url)
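The presigned link is a plain HTTPS URL, so it can also be fetched outside this package; a minimal sketch using the requests library directly (assuming requests is installed) follows:

import requests
# the presigned URL needs no further authentication; the response body is the raw stored object
resp = requests.get(url)
resp.raise_for_status()
rawBytes = resp.content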
File operations
Save without compression
inputPath = '/tmp/tmpFile.txt'
key = 'tmpFile'
downloadPath = '/tmp/downloadTmpFile.txt'
with open(inputPath, 'w') as f:
    f.write('hello world')
S3.saveFile(key=key, path=inputPath, bucket=bucket)
##test
S3.exist(key,bucket)
True
Load without compression
S3.loadFile(key=key, path=downloadPath, bucket=bucket)
##test
with open(downloadPath, 'r') as f:
    print(f.read())
hello world
Delete
result = S3.deleteFile(key, bucket)
## test
S3.exist(key,bucket)
False
Save and load a pandas DataFrame
### please install pandas first;
### it is not included in the requirements to minimize the package size
import pandas as pd
df = pd.DataFrame({'test': [1, 2, 3, 4, 5], 'test2': [2, 3, 4, 5, 6]})
S3.saveDataFrame(bucket, key, df)
S3.loadDataFrame(bucket, key)
| | Unnamed: 0 | test | test2 |
|---|---|---|---|
| 0 | 0 | 1 | 2 |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 3 | 4 |
| 3 | 3 | 4 | 5 |
| 4 | 4 | 5 | 6 |
Presign a POST upload with conditions
from s3bz.s3bz import ExtraArgs, S3
bucket = 'pybz-test'
key = 'test.dict'
fields = {**ExtraArgs.jpeg}
S3.presignUpload(bucket, key, fields=fields)
{'url': 'https://pybz-test.s3-accelerate.amazonaws.com/',
'fields': {'Content-Type': 'image/jpeg',
'key': 'test.dict',
'AWSAccessKeyId': 'AKIAVX4Z5TKDSNNNULGB',
'policy': 'eyJleHBpcmF0aW9uIjogIjIwMjEtMDMtMjVUMTA6MjQ6NTJaIiwgImNvbmRpdGlvbnMiOiBbeyJidWNrZXQiOiAicHliei10ZXN0In0sIHsia2V5IjogInRlc3QuZGljdCJ9XX0=',
'signature': 'hwC8kIjmjNPU0KT3BE54/TUQ/7w='}}
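The returned url and fields follow the standard S3 presigned-POST contract, so the upload itself could be sketched with the requests library as below; /tmp/picture.jpg is a hypothetical local file chosen to satisfy the image/jpeg condition:

import requests
presigned = S3.presignUpload(bucket, key, fields=fields)
with open('/tmp/picture.jpg', 'rb') as f:
    # the presigned policy fields go in the form data, the payload goes in the 'file' part
    r = requests.post(presigned['url'],
                      data=presigned['fields'],
                      files={'file': ('picture.jpg', f)})
print(r.status_code)  # S3 typically replies 204 No Content on a successful presigned POST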