NoSQL Abstraction Library

Basic CRUD and query support for NoSQL databases, allowing for portable cloud native applications

  • AWS DynamoDB
  • Azure Cosmos NoSQL

This library is not intended to create databases/tables; use Terraform/ARM/CloudFormation etc. for that

Why not just use the name 'nosql' or 'pynosql'? Because they already exist on PyPI :-)

Installation

pip install 'abnosql[dynamodb]'
pip install 'abnosql[cosmos]'

For optional client side field level envelope encryption

pip install 'abnosql[aws-kms]'
pip install 'abnosql[azure-kms]'

By default, abnosql does not include database dependencies. This makes it easier to package abnosql into AWS Lambda or Azure Functions (for example) without bloating the packages

Usage

from abnosql import table
import os

os.environ['ABNOSQL_DB'] = 'dynamodb'

item = {
    'hk': '1',
    'rk': 'a',
    'num': 5,
    'obj': {
        'foo': 'bar',
        'num': 5,
        'list': [1, 2, 3],
    },
    'list': [1, 2, 3],
    'str': 'str'
}

tb = table('mytable')

tb.put_item(item)
tb.put_items([item])

# note partition/hash key should be first kwarg
assert tb.get_item(hk='1', rk='a') == item

assert tb.query({'hk': '1'})['items'] == [item]

# scan
assert tb.query()['items'] == [item]

# be careful not to use cloud specific statements!
assert tb.query_sql(
    'SELECT * FROM mytable WHERE hk = @hk AND num > @num',
    {'@hk': '1', '@num': 4}
)['items'] == [item]

tb.delete_item({'hk': '1', 'rk': 'a'})

API Docs

See API Docs

Querying

query() performs DynamoDB Query using KeyConditionExpression (if key supplied) and exact match on FilterExpression if filters are supplied. For Cosmos, SQL is generated. This is the safest/most cloud agnostic way to query and probably OK for most use cases.
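As a rough illustration of the DynamoDB side of query(), the sketch below builds the kind of request parameters described above: exact-match conditions on the key go into KeyConditionExpression and exact-match conditions on the filters go into FilterExpression. The helper name `build_request` and its return shape are hypothetical and not part of abnosql; only the parameter names are the ones boto3 accepts for DynamoDB Query.

```python
def build_request(key=None, filters=None):
    """Hypothetical helper (not abnosql's code): map key/filters to
    DynamoDB Query parameters using exact-match conditions only."""
    key = key or {}
    filters = filters or {}
    names, values = {}, {}

    def conditions(attrs):
        # Use placeholders so reserved words like "num" or "list" are safe.
        # Assumes attribute names are plain identifiers.
        parts = []
        for name, value in attrs.items():
            names[f'#{name}'] = name
            values[f':{name}'] = value
            parts.append(f'#{name} = :{name}')
        return ' AND '.join(parts)

    params = {}
    if key:
        params['KeyConditionExpression'] = conditions(key)
    if filters:
        params['FilterExpression'] = conditions(filters)
    if names:
        params['ExpressionAttributeNames'] = names
        params['ExpressionAttributeValues'] = values
    # Without a key there is nothing to query on, so fall back to a scan
    operation = 'query' if key else 'scan'
    return operation, params


operation, params = build_request({'hk': '1'}, {'num': 5})
print(operation, params)
```

The resulting dictionary could be passed to a boto3 Table resource as `table.query(**params)`; how abnosql assembles its own request may differ.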

query_sql() performs DynamoDB ExecuteStatement passing in the supplied PartiQL statement. Cosmos uses the NoSQL SELECT syntax.

During mocked tests, SQLGlot is used to execute the statement, so results may differ...

Care should be taken with query_sql() not to use SQL features that are specific to one provider, as that defeats the purpose of using abnosql as an abstraction in the first place
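One concrete difference between the two dialects is parameter style: Cosmos SQL takes named parameters such as @hk, while DynamoDB's ExecuteStatement takes positional `?` placeholders with an ordered parameter list. The sketch below shows one way a portable statement could be translated. The function `to_positional` is hypothetical and is not taken from abnosql's source.

```python
import re


def to_positional(statement, parameters):
    """Hypothetical helper: rewrite @named parameters as positional '?'
    placeholders and return the values in the order they appear.

    Caveat: a naive regex also rewrites '@word' inside string literals.
    """
    ordered = []

    def replace(match):
        # KeyError here means the statement uses an undeclared parameter
        ordered.append(parameters[match.group(0)])
        return '?'

    return re.sub(r'@\w+', replace, statement), ordered


statement, values = to_positional(
    'SELECT * FROM mytable WHERE hk = @hk AND num > @num',
    {'@hk': '1', '@num': 4}
)
print(statement)
print(values)
```

The returned values would still need converting to whatever value format the DynamoDB client in use expects.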

Indexes

Beyond partition and range keys defined on the table, indexes are not currently supported, and these will likely differ between providers anyway (e.g. DynamoDB supports Secondary Indexes, whereas Cosmos has Range, Spatial and Composite indexes).

Partition Keys

A few methods such as get_item(), delete_item() and query() need to know the partition/hash key as defined on the table. To avoid having to configure this or look it up from the provider, the convention is that the first kwarg or dictionary item is the partition key and, if supplied, the second is the range/sort key.
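The convention relies on Python preserving the order of keyword arguments and dictionary items. A minimal sketch of the idea follows; `split_keys` is a hypothetical name used for illustration and is not abnosql's implementation.

```python
def split_keys(**kwargs):
    """Hypothetical helper: treat the first kwarg as the partition key
    and the optional second kwarg as the range/sort key."""
    names = list(kwargs)  # keyword argument order is preserved
    if not names:
        raise ValueError('at least a partition key is required')
    partition = (names[0], kwargs[names[0]])
    sort = (names[1], kwargs[names[1]]) if len(names) > 1 else None
    return partition, sort


print(split_keys(hk='1', rk='a'))
# Swapping the order changes which attribute is taken as the partition key
print(split_keys(rk='a', hk='1'))
```

This is why `tb.get_item(hk='1', rk='a')` and `tb.get_item(rk='a', hk='1')` should not be treated as interchangeable.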

Client Side Encryption

If configured in table config with kms attribute, abnosql will perform client side encryption using AWS KMS or Azure KeyVault

Each attribute value defined in the config is encrypted with a 256-bit AES-GCM data key generated for each attribute value:

  • aws uses AWS Encryption SDK for Python
  • azure uses python cryptography to generate an AES-GCM data key, encrypts the attribute value and then uses an RSA CMK in Azure Key Vault to wrap/unwrap (envelope encryption) the AES-GCM data key. The module uses the azure-keyvault-keys Python SDK for wrap/unwrap functionality of the generated data key (Azure doesn't support generating a data key as AWS does)

Both providers use a 256-bit AES-GCM generated data key with AAD/encryption context (Azure provider uses a 96-bit nonce). AES-GCM is an authenticated symmetric encryption scheme used by both AWS and Azure (and Hashicorp Vault)

See also AWS Encryption Best Practices

Example config:

{
    'kms': {
        'key_ids': ['https://foo.vault.azure.net/keys/bar/45e36a1024a04062bd489db0d9004d09'],
        'key_attrs': ['hk', 'rk'],
        'attrs': ['obj', 'str']
    }
}

Where:

  • key_ids: list of AWS KMS Key ARNs or Azure Key Vault identifiers (URL to RSA CMK). This can also be picked up from the ABNOSQL_KMS_KEYS env var as a comma separated list (NOTE: env var recommended to avoid provider specific code)
  • key_attrs: list of key attributes in the item from which the AAD/encryption context is set
  • attrs: list of attribute keys to encrypt
  • key_bytes: optional for azure, use your own AES-GCM key if specified, otherwise generate one

If the kms config attribute is present, abnosql will look for the ABNOSQL_KMS env var to load the appropriate provider KMS module (e.g. "aws" or "azure"), and if it is not set, use a default depending on the database (e.g. cosmos will use azure, dynamodb will use aws)
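The lookup order described above can be sketched as follows. The function `resolve_kms` and the `DEFAULT_KMS` mapping are hypothetical names used for illustration; only the ABNOSQL_KMS and ABNOSQL_KMS_KEYS variable names and the two defaults come from this document.

```python
import os

# Defaults as described above: each database maps to its own cloud's KMS
DEFAULT_KMS = {'dynamodb': 'aws', 'cosmos': 'azure'}


def resolve_kms(database, environ=None):
    """Hypothetical helper: pick the KMS provider and key ids from env vars."""
    environ = os.environ if environ is None else environ
    # An explicit ABNOSQL_KMS wins, otherwise fall back to the database default
    provider = environ.get('ABNOSQL_KMS') or DEFAULT_KMS.get(database)
    raw = environ.get('ABNOSQL_KMS_KEYS', '')
    key_ids = [key.strip() for key in raw.split(',') if key.strip()]
    return provider, key_ids


print(resolve_kms('cosmos', {'ABNOSQL_KMS_KEYS': 'key-one, key-two'}))
```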

In the example above, the key_attrs ['hk', 'rk'] define the encryption context / AAD used, and attrs ['obj', 'str'] define which attributes to encrypt/decrypt

With an item:

{
    'hk': '1',
    'rk': 'b',
    'obj': {'foo':'bar'},
    'str': 'foobar'
}

The encryption context / AAD is set to hk=1 and rk=b and obj and str values are encrypted
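To make the split concrete, this sketch derives the encryption context and the set of values to encrypt from the example config and item. `plan_encryption` is a hypothetical helper; it performs no encryption and does not reflect how abnosql serialises the context internally. Converting context values to strings is an assumption.

```python
def plan_encryption(item, kms_config):
    """Hypothetical helper: work out the AAD/encryption context and
    which attribute values would be encrypted. No crypto is done here."""
    # Assumption: context values are stringified
    context = {name: str(item[name]) for name in kms_config['key_attrs']}
    # Only attributes present in the item can be encrypted
    to_encrypt = {
        name: item[name] for name in kms_config['attrs'] if name in item
    }
    return context, to_encrypt


kms_config = {'key_attrs': ['hk', 'rk'], 'attrs': ['obj', 'str']}
item = {'hk': '1', 'rk': 'b', 'obj': {'foo': 'bar'}, 'str': 'foobar'}

context, to_encrypt = plan_encryption(item, kms_config)
print(context)
print(to_encrypt)
```

Because the context is bound to the ciphertext as AAD, an encrypted value copied onto an item with different hk/rk values would fail to decrypt.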

If you don't want to use any of these providers, then you can use put_item_pre and get_item_post hooks to perform your own client side encryption

See also AWS Multi-region encryption keys and set ABNOSQL_KMS_KEYS env var as comma list of ARNs

Pagination

query and query_sql accept limit and next optional kwargs and return next in response. Use these to paginate.

This works for AWS DynamoDB, however Azure Cosmos has a limitation with continuation tokens for cross-partition queries (see Python SDK documentation). For Cosmos, abnosql appends OFFSET and LIMIT to the SQL statement if not already present, and returns next. limit defaults to 100. See the tests for examples
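A pagination loop over limit and next might look like the sketch below. It assumes the response is a dict with 'items' and 'next' keys and that next is empty or missing on the last page, which this document does not state explicitly. `iter_items` and `fake_query` are hypothetical; `fake_query` stands in for a table's query method so the example runs without a database.

```python
def iter_items(query, *args, limit=100, **kwargs):
    """Hypothetical helper: yield every item by following 'next' tokens."""
    next_token = None
    while True:
        response = query(*args, limit=limit, next=next_token, **kwargs)
        yield from response['items']
        next_token = response.get('next')
        if not next_token:  # assumption: falsy 'next' means no more pages
            break


def fake_query(limit=100, next=None):
    """In-memory stand-in for tb.query, using an offset as the token."""
    data = list(range(5))
    start = int(next or 0)
    end = start + limit
    return {
        'items': data[start:end],
        'next': str(end) if end < len(data) else None
    }


print(list(iter_items(fake_query, limit=2)))
```

With a real table the call would be along the lines of `iter_items(tb.query, {'hk': '1'}, limit=50)`.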

Audit

put_item() and put_items() take an optional user kwarg. If supplied, abnosql will add the following to the item:

  • created_by - value of user, added if does not exist in item supplied to put_item()
  • created_date - UTC ISO timestamp string, added if does not exist
  • modified_by - value of user always added
  • modified_date - UTC ISO timestamp string, always added

Because abnosql doesn't first check whether the item already exists, and doesn't support update expressions, there is a risk of the created* values being overwritten if the existing item is not read first and then supplied to put_item(). It's up to application logic to do this, whether using this feature or not :-)
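The audit rules and the overwrite risk can be illustrated with a small stand-alone function. `add_audit` is a hypothetical re-implementation of the rules listed above, not abnosql's code, and the exact timestamp format abnosql writes may differ.

```python
from datetime import datetime, timezone


def add_audit(item, user, now=None):
    """Hypothetical helper: apply the created_*/modified_* rules above."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    audited = dict(item)  # leave the caller's item untouched
    # created_* are only added when missing from the supplied item
    audited.setdefault('created_by', user)
    audited.setdefault('created_date', stamp)
    # modified_* are always overwritten
    audited['modified_by'] = user
    audited['modified_date'] = stamp
    return audited


first = add_audit({'hk': '1', 'rk': 'a'}, 'alice')

# Safe update: start from the stored item so created_* survive
safe = add_audit(first, 'bob')
print(safe['created_by'], safe['modified_by'])

# Risky update: a fresh dict loses the original creator
risky = add_audit({'hk': '1', 'rk': 'a'}, 'bob')
print(risky['created_by'], risky['modified_by'])
```

In application code the safe path corresponds to calling get_item() first, changing the returned item, and passing that to put_item() with the user kwarg.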

Configuration

It is recommended to use environment variables where possible to avoid provider specific application code

AWS DynamoDB

Set the following environment variable and use the usual AWS environment variables that boto3 uses

  • ABNOSQL_DB = "dynamodb"

Or set the boto3 session in the config

from abnosql import table
import boto3

tb = table(
    'mytable',
    config={'session': boto3.Session()},
    database='dynamodb'
)

Azure Cosmos NoSQL

Set the following environment variables:

  • ABNOSQL_DB = "cosmos"
  • ABNOSQL_COSMOS_ACCOUNT = your database account
  • ABNOSQL_COSMOS_ENDPOINT = derived from ABNOSQL_COSMOS_ACCOUNT if not set
  • ABNOSQL_COSMOS_CREDENTIAL = your cosmos credential, use Azure Key Vault References if using Azure Functions
  • ABNOSQL_COSMOS_DATABASE = cosmos database

OR - use the connection string format:

  • ABNOSQL_DB = "cosmos://account@credential:database"
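For reference, the connection string format above can be pulled apart as in the sketch below. `parse_cosmos_url` is a hypothetical function and the splitting rules are an assumption: the account ends at the first '@' and the database starts after the last ':'. abnosql's own parsing may differ.

```python
def parse_cosmos_url(value):
    """Hypothetical parser for 'cosmos://account@credential:database'."""
    scheme, sep, rest = value.partition('://')
    if not sep or scheme != 'cosmos':
        raise ValueError('expected a cosmos:// connection string')
    account, _, rest = rest.partition('@')
    # Split on the last ':' in case the credential itself contains one
    credential, _, database = rest.rpartition(':')
    if not (account and credential and database):
        raise ValueError('expected cosmos://account@credential:database')
    return {
        'account': account,
        'credential': credential,
        'database': database
    }


print(parse_cosmos_url('cosmos://foo@someb64key:bar'))
```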

Or define in config (though ideally you want to use env vars to avoid application specific code).

from abnosql import table

tb = table(
    'mytable',
    config={'account': 'foo', 'credential': 'someb64key', 'database': 'bar'},
    database='cosmos'
)

Plugins and Hooks

abnosql uses pluggy and registers in the abnosql.table namespace

The following hooks are available

  • set_config - set config
  • get_item_post - called after get_item(), can return modified data
  • put_item_pre
  • put_item_post
  • put_items_post
  • delete_item_post

See the TableSpecs and example test_hooks()

Testing

AWS DynamoDB

Use moto package and abnosql.mocks.mock_dynamodbx

mock_dynamodbx is used for query_sql and only needed if/until moto provides better partiql support

Example:

from abnosql.mocks import mock_dynamodbx 
from moto import mock_dynamodb

@mock_dynamodb
@mock_dynamodbx  # needed for query_sql only
def test_something():
    ...

More examples in tests/test_dynamodb.py

Azure Cosmos NoSQL

Use responses package and abnosql.mocks.mock_cosmos

Example:

from abnosql.mocks import mock_cosmos
import responses

@mock_cosmos
@responses.activate
def test_something():
    ...

More examples in tests/test_cosmos.py

CLI

A small abnosql CLI is included, exposing a few of the commands above

Usage: abnosql [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  delete-item
  get-item
  put-item
  put-items
  query
  query-sql

To install dependencies

pip install 'abnosql[cli]'

Example querying table in Azure Cosmos, with cosmos.json config file containing endpoint, credential and database

$ abnosql query-sql mytable 'SELECT * FROM mytable' -d cosmos -c cosmos.json
partkey      id      num  obj                                          list       str
-----------  ----  -----  -------------------------------------------  ---------  -----
p1           p1.1      5  {'foo': 'bar', 'num': 5, 'list': [1, 2, 3]}  [1, 2, 3]  str
p2           p2.1      5  {'foo': 'bar', 'num': 5, 'list': [1, 2, 3]}  [1, 2, 3]  str
p2           p2.2      5  {'foo': 'bar', 'num': 5, 'list': [1, 2, 3]}  [1, 2, 3]  str

Future Enhancements / Ideas

  • client side encryption
  • test pagination & exception handling
  • Google Firestore support, ideally in the core library (though could be added outside via use of the plugin system). Would need something like FireSQL implemented for Python, maybe via sqlglot
  • Google Vault KMS support
  • Hashicorp Vault KMS support
  • Simple caching (maybe) using globals (used for AWS Lambda / Azure Functions)
  • PostgreSQL support using JSONB column (see here for example). Would be nice to avoid an ORM and having to define a model for each table...
  • blob storage backend? could use something similar to NoDB but maybe combined with smart_open and DuckDB's Hive Partitioning
  • Redis..
  • Hook implementations to write to ElasticSearch / OpenSearch for better searching. Useful when not able to use AWS Stream Processors, Azure Change Feed, or Elasticstore. Why? Because not all databases support stream processing, and if they do you don't want the hassle of using CDC

Download files

Source Distribution

abnosql-0.0.3.tar.gz (30.9 kB)

Uploaded Source

Built Distribution

abnosql-0.0.3-py3-none-any.whl (32.4 kB)

Uploaded Python 3
