Skip to main content

for saving dictionaries using s3 with bz2 compression

Project description

S3Bz

save and load dictionary to s3 using bz compression

full docs here https://thanakijwanavit.github.io/s3bz/

Install

pip install s3bz

How to use

Create a bucket and make sure that it has transfer acceleration enabled

create a buket

aws s3 mb s3://<bucketname>

put transfer acceleration

aws s3api put-bucket-accelerate-configuration --bucket <bucketname> --accelerate-configuration Status=Enabled

First, import the s3 module

import package

from importlib import reload
from s3bz.s3bz import S3

set up dummy data

BZ2 compression

save object using bz2 compression

result = S3.save(key = key, 
       objectToSave = sampleDict,
       bucket = bucket,
       user=USER,
       pw = PW,
       accelerate = True)
print(('failed', 'success')[result])
success

load object with bz2 compression

result = S3.load(key = key,
       bucket = bucket,
       user = USER,
       pw = PW,
       accelerate = True)
print(result[0])
{'ib_prcode': '23238', 'ib_brcode': '1015', 'ib_cf_qty': '703', 'new_ib_vs_stock_cv': '768'}

other compressions

Zl : zlib compression with json string encoding pklzl : zlib compression with pickle encoding

print(bucket)
%time S3.saveZl(key,sampleDict,bucket)
%time S3.loadZl(key,bucket)
%time S3.savePklZl(key,sampleDict,bucket)
%time result =S3.loadPklZl(key,bucket)
pybz-test
CPU times: user 23.9 ms, sys: 559 µs, total: 24.5 ms
Wall time: 155 ms
CPU times: user 28.3 ms, sys: 3.04 ms, total: 31.4 ms
Wall time: 154 ms
CPU times: user 21.6 ms, sys: 228 µs, total: 21.9 ms
Wall time: 151 ms
CPU times: user 31.6 ms, sys: 0 ns, total: 31.6 ms
Wall time: 114 ms

Bring your own compressor and encoder

import gzip, json
compressor=lambda x: gzip.compress(x)
encoder=lambda x: json.dumps(x).encode()
decompressor=lambda x: gzip.decompress(x)
decoder=lambda x: json.loads(x.decode())

%time S3.generalSave(key, sampleDict, bucket = bucket, compressor=compressor, encoder=encoder )
%time result = S3.generalLoad(key, bucket , decompressor=decompressor, decoder=decoder)
assert result == sampleDict, 'not the same as sample dict'
CPU times: user 31 ms, sys: 0 ns, total: 31 ms
Wall time: 155 ms
CPU times: user 32.5 ms, sys: 51 µs, total: 32.5 ms
Wall time: 115 ms

check if an object exist

result = S3.exist('', bucket, user=USER, pw=PW, accelerate = True)
print(('doesnt exist', 'exist')[result])
exist

presign download object

url = S3.presign(key=key,
              bucket=bucket,
              expiry = 1000,
              user=USER,
              pw=PW)
print(url)
https://pybz-test.s3-accelerate.amazonaws.com/test.dict?AWSAccessKeyId=AKIAVX4Z5TKDSNNNULGB&Signature=BR8Laz3uvkNKGh%2FBZ8x7IhRE3OU%3D&Expires=1616667887

download using signed link

from s3bz.s3bz import Requests
result = Requests.getContentFromUrl(url)

File operations

save without compression

inputPath = '/tmp/tmpFile.txt'
key = 'tmpFile'
downloadPath = '/tmp/downloadTmpFile.txt'
with open(inputPath , 'w')as f:
  f.write('hello world')
S3.saveFile(key =key ,path = inputPath,bucket = bucket)
##test
S3.exist(key,bucket)
True

load without compression

S3.loadFile(key= key , path = downloadPath, bucket = bucket)
##test
with open(downloadPath, 'r') as f:
  print(f.read())
hello world

delete

result = S3.deleteFile(key, bucket)
## test
S3.exist(key,bucket)
False

save and load pandas dataframe

### please install in pandas, 
### this is not include in the requirements to minimize the size impact
import pandas as pd
df = pd.DataFrame({'test':[1,2,3,4,5],'test2':[2,3,4,5,6]})
S3.saveDataFrame(bucket,key,df)
S3.loadDataFrame(bucket,key)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Unnamed: 0 test test2
0 0 1 2
1 1 2 3
2 2 3 4
3 3 4 5
4 4 5 6

presign post with conditions

from s3bz.s3bz import ExtraArgs, S3
bucket = 'pybz-test'
key = 'test.dict'
fields = {**ExtraArgs.jpeg}
S3.presignUpload(bucket, key, fields=fields)
{'url': 'https://pybz-test.s3-accelerate.amazonaws.com/',
 'fields': {'Content-Type': 'image/jpeg',
  'key': 'test.dict',
  'AWSAccessKeyId': 'AKIAVX4Z5TKDSNNNULGB',
  'policy': 'eyJleHBpcmF0aW9uIjogIjIwMjEtMDMtMjVUMTA6MjQ6NTJaIiwgImNvbmRpdGlvbnMiOiBbeyJidWNrZXQiOiAicHliei10ZXN0In0sIHsia2V5IjogInRlc3QuZGljdCJ9XX0=',
  'signature': 'hwC8kIjmjNPU0KT3BE54/TUQ/7w='}}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

s3bz-0.1.27.tar.gz (15.7 kB view details)

Uploaded Source

Built Distribution

s3bz-0.1.27-py3-none-any.whl (12.8 kB view details)

Uploaded Python 3

File details

Details for the file s3bz-0.1.27.tar.gz.

File metadata

  • Download URL: s3bz-0.1.27.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/49.6.0.post20201009 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6

File hashes

Hashes for s3bz-0.1.27.tar.gz
Algorithm Hash digest
SHA256 527310b574d979a0725cc70d6c752e7db6ccee130ef31c9f880109062f39f115
MD5 84d3d84970ffcf9ad0b84fa68931773b
BLAKE2b-256 737ef62e4155ab1260abe69f9ab922be13358b37202ca622516b45a3940b8050

See more details on using hashes here.

File details

Details for the file s3bz-0.1.27-py3-none-any.whl.

File metadata

  • Download URL: s3bz-0.1.27-py3-none-any.whl
  • Upload date:
  • Size: 12.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.23.0 setuptools/49.6.0.post20201009 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6

File hashes

Hashes for s3bz-0.1.27-py3-none-any.whl
Algorithm Hash digest
SHA256 3dd38675f1b44bac90b6328c9fdb7e4ea207b7c72a20a39198e72a4c8bef6263
MD5 f097ac55abaf572b87ce3ff70276a2d6
BLAKE2b-256 b563c3c9709ce515c827888be0137d045e36464db946896fd7f697d7ddd3e54d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page