
Store and cache things anywhere


License: GPL-3.0

anystore

Store anything anywhere. A wrapper around wrappers to avoid boilerplate code (because we are lazy).

anystore helps you transfer data to and from a wide range of sources (local filesystem, http, s3, redis, sql, ...) through a unified high-level interface. Its main use case is storing data pipeline outcomes in a distributed cache, so that different programs or coworkers can access intermediate results based on different settings (e.g. testing: use a local cache store; production: cache to an s3 bucket).
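As an illustration of the testing/production split, here is a minimal sketch that picks a cache location from an environment variable. The `APP_ENV` variable, the `cache_uri` helper and the mapping are hypothetical; the bucket name is the placeholder used throughout this README.

```python
import os

# Hypothetical mapping from deployment environment to cache location.
# "s3://mybucket/cache" is a placeholder bucket, not a real one.
CACHE_URIS = {
    "testing": "./.cache",
    "production": "s3://mybucket/cache",
}


def cache_uri(environ=os.environ) -> str:
    """Return the cache URI for the current environment.

    Reads the hypothetical APP_ENV variable and falls back to "testing".
    """
    env = environ.get("APP_ENV", "testing")
    return CACHE_URIS[env]
```

The returned URI could then be handed to `Store(uri=...)` or `@anycache(uri=...)` as in the `Store` and `anycache` examples of this README, so the same script writes locally in tests and to the bucket in production.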

Why?

In several of our data engineering projects, we kept writing boilerplate code that covered anystore's feature set, but never in a reusable way.

This library aims to be a thin and stable foundation for Python programs that wrangle data.

Overview

anystore is built on top of fsspec and provides an easy wrapper for reading and writing content from and to arbitrary locations:

command line

anystore -i ./local/foo.txt -o s3://mybucket/other.txt

echo "hello" | anystore -o sftp://user:password@host:/tmp/world.txt

anystore -i https://investigativedata.io > index.html
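The `-i`/`-o` examples above copy bytes from an input to an output, and the `echo ... |` and `> index.html` examples suggest that stdin and stdout are used when one side is omitted. The following is a hypothetical, stdlib-only sketch of those semantics for local paths; it is not anystore's CLI (anystore itself builds on fsspec to reach remote locations such as s3 or sftp).

```python
import argparse
import contextlib
import shutil
import sys


def main(argv=None):
    # Hypothetical re-implementation of the -i/-o copy semantics, local paths only.
    parser = argparse.ArgumentParser(description="Copy bytes from -i to -o (sketch).")
    parser.add_argument("-i", dest="input", default="-", help="input path; '-' reads stdin")
    parser.add_argument("-o", dest="output", default="-", help="output path; '-' writes stdout")
    args = parser.parse_args(argv)

    with contextlib.ExitStack() as stack:
        # Open files only when a path is given; stdin/stdout are never closed here.
        if args.input == "-":
            src = sys.stdin.buffer
        else:
            src = stack.enter_context(open(args.input, "rb"))
        if args.output == "-":
            dst = sys.stdout.buffer
        else:
            dst = stack.enter_context(open(args.output, "wb"))
        # Stream in chunks so large inputs are not loaded into memory at once.
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    main()
```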

python

from anystore.io import smart_read, smart_write

data = smart_read("s3://mybucket/data.txt")
smart_write(".local/data", data)
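To make the idea of a single read/write entry point concrete, here is a hypothetical sketch of scheme-based dispatch using only the standard library. `read_uri` and `write_uri` are illustrative names, not anystore functions; they cover plain local paths, `file://` and (for reading) `http(s)://`, and leave every other scheme to a real backend such as fsspec.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def _local_path(uri: str) -> Path:
    # "file:///tmp/x" -> "/tmp/x"; plain paths are used unchanged.
    parsed = urlparse(uri)
    return Path(parsed.path if parsed.scheme == "file" else uri)


def read_uri(uri: str) -> bytes:
    """Hypothetical stand-in for smart_read: dispatch on the URI scheme."""
    scheme = urlparse(uri).scheme
    if scheme in ("", "file"):
        return _local_path(uri).read_bytes()
    if scheme in ("http", "https"):
        with urlopen(uri) as response:
            return response.read()
    # s3://, sftp://, redis://, ... would need a dedicated backend.
    raise NotImplementedError(f"no backend for scheme {scheme!r}")


def write_uri(uri: str, data: bytes) -> None:
    """Hypothetical stand-in for smart_write: local paths only."""
    scheme = urlparse(uri).scheme
    if scheme not in ("", "file"):
        raise NotImplementedError(f"no backend for scheme {scheme!r}")
    path = _local_path(uri)
    # Create missing parent directories so nested targets work.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
```

Supporting a new location then means adding one branch (or one registered backend) instead of touching every caller, which is the boilerplate a unified interface is meant to remove.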

Simple key/value store

anystore can use a configurable store:

command line

anystore --store .cache put foo "bar"

anystore --store .cache get foo
# "bar"

python

from anystore import Store

# pass through `fsspec` configuration for specific storage backend:
store = Store(uri="s3://mybucket/data", backend_config={"client_kwargs":{
    "aws_access_key_id": "my-key",
    "aws_secret_access_key": "***",
    "endpoint_url": "https://s3.local"
}})

data = store.get("/2023/1.txt")
my_data = "some content"  # placeholder value to write
store.put("/2023/2.txt", my_data)
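Conceptually, a key/value store maps each key to an object below its base URI. The `LocalStore` class below is a hypothetical, filesystem-only sketch of that mapping, not anystore's `Store`; it strips the leading slash from keys like `/2023/1.txt` and rejects keys that would escape the store root.

```python
from pathlib import Path


class LocalStore:
    """Hypothetical file-backed key/value store (illustration only).

    Each key maps to a file below `root`, so "/2023/1.txt" is stored at
    <root>/2023/1.txt.
    """

    def __init__(self, root: str):
        self.root = Path(root).resolve()

    def _path(self, key: str) -> Path:
        # Strip the leading slash so keys stay relative to the store root.
        path = (self.root / key.lstrip("/")).resolve()
        # Reject keys such as "../x" that would point outside the root.
        if self.root not in path.parents:
            raise ValueError(f"key outside store: {key!r}")
        return path

    def put(self, key: str, value: str) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(value)

    def get(self, key: str) -> str:
        return self._path(key).read_text()
```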

Decorate your functions

When working on scripts, you sometimes just want a simple cache setup. Maybe it should be persistent, maybe even somewhere in the cloud so that a coworker can pick up where you left off. Maybe you want a different storage backend while testing your scripts. anystore handles all of this:

from urllib.request import urlopen

from anystore import anycache

# use decorator
@anycache(uri="s3://mybucket/cache")
def download_file(url):
    # a very time consuming task, e.g. fetching a remote file
    with urlopen(url) as response:
        return response.read().decode()

# first call: slow
res = download_file("https://example.com/foo.txt")

# second call: fast, as the result is now cached
res = download_file("https://example.com/foo.txt")
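The decorator pattern can be sketched with the standard library alone. `file_cache` is a hypothetical stand-in for `anycache` that writes to a local directory instead of an `s3://` URI; it assumes arguments and return values are JSON-serializable and derives the cache key from the function name and its arguments. anystore's actual key scheme and serialization may differ.

```python
import functools
import hashlib
import json
from pathlib import Path


def file_cache(cache_dir: str):
    """Hypothetical persistent cache decorator backed by a local directory."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Stable key from the function name and its (JSON-serializable) arguments.
            payload = json.dumps([func.__qualname__, args, kwargs], sort_keys=True)
            key = hashlib.sha256(payload.encode()).hexdigest()
            path = Path(cache_dir) / f"{key}.json"
            if path.exists():
                # Cache hit: return the stored result without calling func.
                return json.loads(path.read_text())
            # Cache miss: compute, persist, then return.
            result = func(*args, **kwargs)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(result))
            return result

        return wrapper

    return decorator
```

Because results round-trip through JSON, a tuple comes back as a list; a production cache would need a richer serializer and some form of expiry.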

Install

pip install anystore

development

This package uses poetry for packaging and dependency management, so install poetry first.

Clone this repository to a local destination.

Within the root directory, run

poetry install --with dev

This installs a few development dependencies, including pre-commit, which needs to be registered:

poetry run pre-commit install

Before each commit, this checks code formatting (isort, black) and runs a few other useful checks (see .pre-commit-config.yaml).

test

make test


Download files


Source distribution: anystore-0.1.2.tar.gz (23.7 kB)

Built distribution: anystore-0.1.2-py3-none-any.whl (26.1 kB, Python 3)
