# S3Bz

Save and load dictionaries to S3 using bz2 compression.

Full docs: https://thanakijwanavit.github.io/s3bz/

## Install

```
pip install s3bz
```
## How to use

Create a bucket and make sure it has transfer acceleration enabled.

Create a bucket:

```shell
aws s3 mb s3://<bucketname>
```

Enable transfer acceleration:

```shell
aws s3api put-bucket-accelerate-configuration --bucket <bucketname> --accelerate-configuration Status=Enabled
```
First, import the package:

```python
from s3bz.s3bz import S3
```

Set up some dummy data to save.
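The docs don't define `sampleDict`, `key`, `bucket`, or the credentials; a hypothetical setup consistent with the outputs shown below (every value here is an assumption, not part of the library):

```python
# Hypothetical setup; substitute your own bucket and credentials.
USER = 'AWS_ACCESS_KEY_ID'      # AWS access key id
PW = 'AWS_SECRET_ACCESS_KEY'    # AWS secret access key
bucket = 'pybz-test'
key = 'test.dict'
sampleDict = [{'ib_prcode': '23238', 'ib_brcode': '1015',
               'ib_cf_qty': '703', 'new_ib_vs_stock_cv': '768'}]
```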
### BZ2 compression

Save an object using bz2 compression:

```python
result = S3.save(key=key,
                 objectToSave=sampleDict,
                 bucket=bucket,
                 user=USER,
                 pw=PW,
                 accelerate=True)
print(('failed', 'success')[result])
```

Output:

```
success
```
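Conceptually, bz2 mode amounts to JSON-encoding the object and compressing the bytes with `bz2` before upload; a minimal stdlib sketch of that round trip (the exact wire format s3bz uses is an assumption):

```python
import bz2
import json

sampleDict = [{'ib_prcode': '23238'}]   # hypothetical payload

# encode -> compress: roughly what would be uploaded
blob = bz2.compress(json.dumps(sampleDict).encode())

# decompress -> decode: roughly what loading would reverse
restored = json.loads(bz2.decompress(blob).decode())
assert restored == sampleDict
```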
Load an object saved with bz2 compression:

```python
result = S3.load(key=key,
                 bucket=bucket,
                 user=USER,
                 pw=PW,
                 accelerate=True)
print(result[0])
```

Output:

```
{'ib_prcode': '23238', 'ib_brcode': '1015', 'ib_cf_qty': '703', 'new_ib_vs_stock_cv': '768'}
```
### Other compressions

- `Zl`: zlib compression with JSON string encoding
- `PklZl`: zlib compression with pickle encoding

```python
print(bucket)
%time S3.saveZl(key, sampleDict, bucket)
%time S3.loadZl(key, bucket)
%time S3.savePklZl(key, sampleDict, bucket)
%time result = S3.loadPklZl(key, bucket)
```

Output:

```
pybz-test
CPU times: user 23.9 ms, sys: 559 µs, total: 24.5 ms
Wall time: 155 ms
CPU times: user 28.3 ms, sys: 3.04 ms, total: 31.4 ms
Wall time: 154 ms
CPU times: user 21.6 ms, sys: 228 µs, total: 21.9 ms
Wall time: 151 ms
CPU times: user 31.6 ms, sys: 0 ns, total: 31.6 ms
Wall time: 114 ms
```
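The two zlib variants differ only in the encoder: `Zl` serializes with JSON, while `PklZl` serializes with pickle (which also handles non-JSON-serializable Python objects). A stdlib sketch of both encodings (the byte layouts are assumptions, not s3bz's exact output):

```python
import json
import pickle
import zlib

data = {'a': 1, 'b': [2, 3]}

# Zl-style: JSON string encoding + zlib compression
zl_blob = zlib.compress(json.dumps(data).encode())
assert json.loads(zlib.decompress(zl_blob).decode()) == data

# PklZl-style: pickle encoding + zlib compression
pkl_blob = zlib.compress(pickle.dumps(data))
assert pickle.loads(zlib.decompress(pkl_blob)) == data
```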
### Bring your own compressor and encoder

```python
import gzip, json

compressor = lambda x: gzip.compress(x)
encoder = lambda x: json.dumps(x).encode()
decompressor = lambda x: gzip.decompress(x)
decoder = lambda x: json.loads(x.decode())

%time S3.generalSave(key, sampleDict, bucket=bucket, compressor=compressor, encoder=encoder)
%time result = S3.generalLoad(key, bucket, decompressor=decompressor, decoder=decoder)
assert result == sampleDict, 'not the same as sample dict'
```

Output:

```
CPU times: user 31 ms, sys: 0 ns, total: 31 ms
Wall time: 155 ms
CPU times: user 32.5 ms, sys: 51 µs, total: 32.5 ms
Wall time: 115 ms
```
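The only contract a custom pair must satisfy is that decoding and decompressing reverses encoding and compressing; the gzip/JSON pair above can be checked locally before touching S3:

```python
import gzip
import json

compressor = lambda x: gzip.compress(x)
encoder = lambda x: json.dumps(x).encode()
decompressor = lambda x: gzip.decompress(x)
decoder = lambda x: json.loads(x.decode())

sampleDict = {'hello': 'world'}   # hypothetical payload

# Round trip: decoder(decompressor(compressor(encoder(x)))) == x
assert decoder(decompressor(compressor(encoder(sampleDict)))) == sampleDict
```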
### Check whether an object exists

```python
result = S3.exist('', bucket, user=USER, pw=PW, accelerate=True)
print(('doesnt exist', 'exist')[result])
```

Output:

```
exist
```
### Presign a download URL

```python
url = S3.presign(key=key,
                 bucket=bucket,
                 expiry=1000,
                 user=USER,
                 pw=PW)
print(url)
```

Output:

```
https://pybz-test.s3-accelerate.amazonaws.com/test.dict?AWSAccessKeyId=AKIAVX4Z5TKDSNNNULGB&Signature=BR8Laz3uvkNKGh%2FBZ8x7IhRE3OU%3D&Expires=1616667887
```
Download using the signed link:

```python
from s3bz.s3bz import Requests

result = Requests.getContentFromUrl(url)
```
## File operations

### Save without compression

```python
inputPath = '/tmp/tmpFile.txt'
key = 'tmpFile'
downloadPath = '/tmp/downloadTmpFile.txt'

with open(inputPath, 'w') as f:
    f.write('hello world')

S3.saveFile(key=key, path=inputPath, bucket=bucket)

## test
S3.exist(key, bucket)
```

Output:

```
True
```
### Load without compression

```python
S3.loadFile(key=key, path=downloadPath, bucket=bucket)

## test
with open(downloadPath, 'r') as f:
    print(f.read())
```

Output:

```
hello world
```
### Delete

```python
result = S3.deleteFile(key, bucket)

## test
S3.exist(key, bucket)
```

Output:

```
False
```
## Save and load a pandas DataFrame

```python
# pandas must be installed separately; it is not included in the
# requirements to minimize the size impact
import pandas as pd

df = pd.DataFrame({'test': [1, 2, 3, 4, 5], 'test2': [2, 3, 4, 5, 6]})
S3.saveDataFrame(bucket, key, df)
S3.loadDataFrame(bucket, key)
```
|   | Unnamed: 0 | test | test2 |
|---|---|---|---|
| 0 | 0 | 1 | 2 |
| 1 | 1 | 2 | 3 |
| 2 | 2 | 3 | 4 |
| 3 | 3 | 4 | 5 |
| 4 | 4 | 5 | 6 |
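The `Unnamed: 0` column in the output above is the DataFrame's index, written out alongside the data. Assuming a CSV-style round trip (the storage format s3bz uses internally is not documented here), the same effect reproduces locally with plain pandas:

```python
import io

import pandas as pd

df = pd.DataFrame({'test': [1, 2, 3, 4, 5], 'test2': [2, 3, 4, 5, 6]})

# The default to_csv(index=True) writes the index as an unnamed first
# column, which read_csv then reports as 'Unnamed: 0'
buf = io.StringIO(df.to_csv())
loaded = pd.read_csv(buf)
assert list(loaded.columns) == ['Unnamed: 0', 'test', 'test2']

# Re-reading with index_col=0 moves that column back into the index
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
assert list(restored.columns) == ['test', 'test2']
assert (restored.values == df.values).all()
```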
## Presign a POST with conditions

```python
from s3bz.s3bz import ExtraArgs, S3

bucket = 'pybz-test'
key = 'test.dict'
fields = {**ExtraArgs.jpeg}
S3.presignUpload(bucket, key, fields=fields)
```

Output:

```
{'url': 'https://pybz-test.s3-accelerate.amazonaws.com/',
 'fields': {'Content-Type': 'image/jpeg',
  'key': 'test.dict',
  'AWSAccessKeyId': 'AKIAVX4Z5TKDSNNNULGB',
  'policy': 'eyJleHBpcmF0aW9uIjogIjIwMjEtMDMtMjVUMTA6MjQ6NTJaIiwgImNvbmRpdGlvbnMiOiBbeyJidWNrZXQiOiAicHliei10ZXN0In0sIHsia2V5IjogInRlc3QuZGljdCJ9XX0=',
  'signature': 'hwC8kIjmjNPU0KT3BE54/TUQ/7w='}}
```
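The `policy` field is base64-encoded JSON listing the expiration and the signed conditions; decoding the value from the example output above shows exactly what was signed:

```python
import base64
import json

# Policy string taken verbatim from the presignUpload output above
policy_b64 = 'eyJleHBpcmF0aW9uIjogIjIwMjEtMDMtMjVUMTA6MjQ6NTJaIiwgImNvbmRpdGlvbnMiOiBbeyJidWNrZXQiOiAicHliei10ZXN0In0sIHsia2V5IjogInRlc3QuZGljdCJ9XX0='
policy = json.loads(base64.b64decode(policy_b64))
print(policy)
# {'expiration': '2021-03-25T10:24:52Z',
#  'conditions': [{'bucket': 'pybz-test'}, {'key': 'test.dict'}]}
```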
File details
Details for the file s3bz-0.1.27.tar.gz.
File metadata
- Download URL: s3bz-0.1.27.tar.gz
- Upload date:
- Size: 15.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/49.6.0.post20201009 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 527310b574d979a0725cc70d6c752e7db6ccee130ef31c9f880109062f39f115 |
| MD5 | 84d3d84970ffcf9ad0b84fa68931773b |
| BLAKE2b-256 | 737ef62e4155ab1260abe69f9ab922be13358b37202ca622516b45a3940b8050 |
File details
Details for the file s3bz-0.1.27-py3-none-any.whl.
File metadata
- Download URL: s3bz-0.1.27-py3-none-any.whl
- Upload date:
- Size: 12.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/49.6.0.post20201009 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3dd38675f1b44bac90b6328c9fdb7e4ea207b7c72a20a39198e72a4c8bef6263 |
| MD5 | f097ac55abaf572b87ce3ff70276a2d6 |
| BLAKE2b-256 | b563c3c9709ce515c827888be0137d045e36464db946896fd7f697d7ddd3e54d |