A simple, intuitive, flexible interface for Amazon S3

yes3

A library for intuitive reading, writing, listing, and caching with AWS S3 (Simple Storage Service).

This library wraps the boto3 S3 API boilerplate with a simple and intuitive interface, path flexibility, and powerful utilities for easily listing, reading, and writing data on/from/to S3.

Installation

Using a virtual environment is recommended. The simplest way to install is with pip:

pip install yes3

Alternatively, you can install the latest version from GitHub:

pip install git+https://github.com/eschombu/yes3.git

To run the tests and test scripts, or to contribute to yes3, clone the repository from https://github.com/eschombu/yes3.git and install the dev requirements:

git clone https://github.com/eschombu/yes3.git
cd yes3
# Optionally create a virtual environment:
# python3.1x -m venv .venv/yes3
# source .venv/yes3/bin/activate
pip install -e ".[dev]"
pytest

TODO

  1. Documentation
  2. Replace message printing with loggers

Usage

S3 Locations and Paths

The boto3 APIs for S3 typically identify an S3 object by its 'bucket' and 'key':

import boto3
s3_client = boto3.client('s3')
s3_client.download_file('my-bucket', 'key/to/object', 'path/to/local/file')

The AWS CLI uses S3 URIs:

aws s3 cp s3://my-bucket/key/to/object path/to/local/file

yes3 accepts either form. It interprets input arguments flexibly as S3 locations or local paths, and converts S3 locations into S3Location objects:

from yes3 import s3, S3Location

# The following download calls are equivalent
s3.download('s3://my-bucket/key/to/object', 'path/to/local/file')
s3.download('my-bucket', 'key/to/object', 'path/to/local/file')

s3_loc = S3Location('s3://my-bucket/key/to/object')
print(s3_loc.bucket)  # 'my-bucket'
print(s3_loc.key)  # 'key/to/object'
print(s3_loc.exists())  # True
print(s3_loc.is_bucket())  # False
print(s3_loc.is_dir())  # False
print(s3_loc.is_object())  # True
s3.download(s3_loc, 'path/to/local/file')
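
To illustrate how the two input forms above can map to the same location, here is a minimal standard-library sketch. It is not yes3's implementation: `BucketKey`, `split_s3_uri`, and `to_bucket_key` are hypothetical names used only for this example.

```python
from typing import NamedTuple


class BucketKey(NamedTuple):
    """Hypothetical stand-in for the bucket/key pair held by an S3Location."""
    bucket: str
    key: str


def split_s3_uri(uri: str) -> BucketKey:
    """Split 's3://bucket/key/parts' into its bucket and key."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the key.
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"S3 URI has no bucket: {uri!r}")
    return BucketKey(bucket, key)


def to_bucket_key(*args: str) -> BucketKey:
    """Accept either a single S3 URI or a separate (bucket, key) pair."""
    if len(args) == 1:
        return split_s3_uri(args[0])
    if len(args) == 2:
        return BucketKey(*args)
    raise TypeError("expected an S3 URI or a (bucket, key) pair")
```

A URI that names only a bucket, such as 's3://my-bucket', yields an empty key in this sketch.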

If the local path is a directory, the object is downloaded with its filename inferred from the S3 key. Recursive downloads are also supported:

s3_dir = S3Location('s3://my-bucket/path/to/dir')
print(s3_dir.is_dir())  # True
print(s3_dir.is_object())  # False
s3.download(s3_dir, 'local_dir/')  # raises ValueError because s3_dir is not a single S3 object
s3.download(s3_dir, 'local_dir/', recursive=True)  # downloads all objects to the `local_dir` directory (which is created if it does not already exist)
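
The filename inference described above can be pictured with the following sketch. It is an illustration only: `resolve_local_target` is a hypothetical helper, and the rule that a path counts as a directory when it already exists as one or ends with a slash is an assumption, not documented yes3 behavior.

```python
import posixpath
from pathlib import Path


def resolve_local_target(key: str, local_path: str) -> Path:
    """Return the file path a single-object download would be written to.

    Assumption: `local_path` is treated as a directory if it already exists
    as one or is written with a trailing slash.
    """
    target = Path(local_path)
    if local_path.endswith("/") or target.is_dir():
        # S3 keys use '/' separators on every platform, so posixpath
        # extracts the final key component reliably.
        return target / posixpath.basename(key)
    return target
```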

Direct read/write functions are also available: s3.read, s3.write_to_s3 (which writes through a local temporary file that is removed afterwards), and s3.touch.
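
As a rough picture of the temporary-file approach mentioned for s3.write_to_s3, the sketch below writes data to a temp file, hands the path to an uploader, and always removes the file afterwards. The function name `write_via_temp_file` and the `upload` callable are assumptions made for this example; yes3's actual signature and internals may differ.

```python
import os
import tempfile
from typing import Callable


def write_via_temp_file(data: bytes, upload: Callable[[str], None]) -> None:
    """Write `data` to a temporary file, pass its path to `upload`, then clean up."""
    # delete=False lets us close the file before `upload` reopens it by name,
    # which is required on Windows.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(data)
        tmp.close()
        upload(tmp.name)
    finally:
        tmp.close()  # harmless if the file is already closed
        os.remove(tmp.name)
```

With boto3, `upload` could be something like `lambda path: s3_client.upload_file(path, 'my-bucket', 'key/to/object')`.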

Convenient object and directory listing methods are available:

  • s3.list_objects: list all objects with the specified prefix
  • s3.list_dir: list objects and directories only up to the specified depth (default: 1). S3 has no actual directory structure, but this function behaves as if it does.
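
To show how a directory view can be derived from S3's flat key space, here is a small pure-Python sketch that works on a list of keys. `list_dir_entries` is a hypothetical helper for illustration and is not how yes3 is known to implement s3.list_dir.

```python
def list_dir_entries(keys, prefix="", depth=1):
    """Return the 'directories' and objects found within `depth` levels
    below `prefix`, given a flat collection of object keys."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split("/")
        # Each ancestor of the key within `depth` levels acts as a directory.
        for level in range(1, min(depth, len(parts) - 1) + 1):
            entries.add(prefix + "/".join(parts[:level]) + "/")
        # The object itself is listed only if it sits within `depth` levels.
        if len(parts) <= depth:
            entries.add(key)
    return sorted(entries)


keys = ["a/b/c.txt", "a/d.txt", "a/b/e/f.txt", "x/y.txt"]
print(list_dir_entries(keys, "a"))           # ['a/b/', 'a/d.txt']
print(list_dir_entries(keys, "a", depth=2))  # ['a/b/', 'a/b/c.txt', 'a/b/e/', 'a/d.txt']
```

Against the real service, a single level of this view corresponds to a ListObjectsV2 request with `Delimiter='/'`, which reports the grouped prefixes as CommonPrefixes.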

Easy key-based caching utilities, for local, S3, and multi-location caches

To cache data quickly and keep a cache synced across devices, this package provides Cache classes, including the LocalDiskCache and S3Cache subclasses, and a MultiCache that can use multiple cache locations. Caching is key-value based, with customizable serializers that can store objects with pickle or in alternative data/file formats.
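
The idea of a key-value cache with pluggable serializers can be sketched as follows. Every class here (`Serializer`, `PickleSketch`, `JsonSketch`, `TinyDiskCache`) is hypothetical and exists only to illustrate the concept; none of them are yes3's classes or reflect its on-disk layout.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Protocol


class Serializer(Protocol):
    """Anything with a file extension plus dumps/loads can act as a serializer."""
    extension: str

    def dumps(self, obj: Any) -> bytes: ...
    def loads(self, data: bytes) -> Any: ...


class PickleSketch:
    extension = "pkl"

    def dumps(self, obj: Any) -> bytes:
        return pickle.dumps(obj)

    def loads(self, data: bytes) -> Any:
        return pickle.loads(data)


class JsonSketch:
    extension = "json"

    def dumps(self, obj: Any) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def loads(self, data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


class TinyDiskCache:
    """Dict-like cache that stores one file per key in a directory."""

    def __init__(self, directory, serializer: Serializer = None):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.serializer = serializer or PickleSketch()

    def _path(self, key: str) -> Path:
        return self.directory / f"{key}.{self.serializer.extension}"

    def __contains__(self, key: str) -> bool:
        return self._path(key).is_file()

    def __getitem__(self, key: str) -> Any:
        path = self._path(key)
        if not path.is_file():
            raise KeyError(key)
        return self.serializer.loads(path.read_bytes())

    def __setitem__(self, key: str, value: Any) -> None:
        self._path(key).write_bytes(self.serializer.dumps(value))


# Usage: swap the serializer to change the stored file format.
with tempfile.TemporaryDirectory() as tmp_dir:
    cache = TinyDiskCache(tmp_dir, serializer=JsonSketch())
    cache["data"] = {"rows": 3}
    restored = cache["data"]
    stored_files = sorted(p.name for p in Path(tmp_dir).iterdir())
```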

A helper function, setup_cache, provides a simple interface for creating a Cache object that uses the default PickleSerializer:

# MultiCache is assumed to be importable from yes3.caching, like setup_cache
from yes3.caching import MultiCache, setup_cache

local_cache = setup_cache('path/to/cache/dir')
s3_cache = setup_cache('s3://my-bucket/cache/dir/prefix')

if 'data' in s3_cache:
    data = s3_cache['data']
else:
    data = expensive_data_processing(args)
if 'data' not in local_cache:
    local_cache['data'] = data

multi_cache = MultiCache([local_cache, s3_cache])
multi_cache.sync_now()  # Copy any data found in only one cache into the cache where it is missing
multi_cache.sync_always()  # Keep the caches synced moving forward

new_data = get_more_data()
multi_cache.put('new_data', new_data)
print('new_data' in local_cache)  # True
print('new_data' in s3_cache)  # True

from yes3 import s3
for loc in s3.list_objects(s3_cache.path):
    print(loc.s3_uri)
# 's3://my-bucket/cache/dir/prefix/data.meta'
# 's3://my-bucket/cache/dir/prefix/data.pkl'
# 's3://my-bucket/cache/dir/prefix/new_data.meta'
# 's3://my-bucket/cache/dir/prefix/new_data.pkl'
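
The sync behavior shown above can be approximated with plain dictionaries standing in for the caches. This is a conceptual sketch under that assumption: `sync_caches` and `WriteThrough` are hypothetical names, not the MultiCache implementation.

```python
from typing import Any, List, MutableMapping


def sync_caches(caches: List[MutableMapping[str, Any]]) -> None:
    """Copy every key present in at least one cache into the caches missing it.

    Keys that already exist in a cache are left untouched, so conflicting
    values are not reconciled in this sketch.
    """
    all_keys = set()
    for cache in caches:
        all_keys.update(cache.keys())
    for key in all_keys:
        source = next(c for c in caches if key in c)
        for cache in caches:
            if key not in cache:
                cache[key] = source[key]


class WriteThrough:
    """Keep caches in step going forward by writing each new item to all of them."""

    def __init__(self, caches: List[MutableMapping[str, Any]]):
        self.caches = list(caches)

    def put(self, key: str, value: Any) -> None:
        for cache in self.caches:
            cache[key] = value


local, remote = {"data": 1}, {"other": 2}
sync_caches([local, remote])           # one-off reconciliation
WriteThrough([local, remote]).put("new_data", 3)
```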
