NoSQL Abstraction Library
Basic CRUD and query support for NoSQL databases, allowing for portable cloud native applications
- AWS DynamoDB
- Azure Cosmos NoSQL
This library is not intended to create databases or tables; use Terraform, ARM, CloudFormation etc. for that
Why not just use the name 'nosql' or 'pynosql'? Because they already exist on PyPI :-)
- NoSQL Abstraction Library
- Usage
- Configuration
- Plugins and Hooks
- Testing
- CLI
- Future Enhancements / Ideas
Installation
pip install 'abnosql[dynamodb]'
pip install 'abnosql[cosmos]'
For optional client side field level envelope encryption
pip install 'abnosql[aws-kms]'
pip install 'abnosql[azure-kms]'
By default, abnosql does not include database dependencies. This is to facilitate packaging abnosql into AWS Lambda or Azure Functions (for example), without over-bloating the packages
Usage
from abnosql import table
import os
os.environ['ABNOSQL_DB'] = 'dynamodb'
os.environ['ABNOSQL_KEY_ATTRS'] = 'hk,rk'
item = {
    'hk': '1',
    'rk': 'a',
    'num': 5,
    'obj': {
        'foo': 'bar',
        'num': 5,
        'list': [1, 2, 3],
    },
    'list': [1, 2, 3],
    'str': 'str'
}
tb = table('mytable')
# create/replace
tb.put_item(item)
# update - using ABNOSQL_KEY_ATTRS
updated_item = tb.put_item(
    {'hk': '1', 'rk': 'a', 'str': 'STR'},
    update=True
)
assert updated_item['str'] == 'STR'
# bulk
tb.put_items([item])
# note partition/hash key should be first kwarg
assert tb.get_item(hk='1', rk='a') == item
assert tb.query({'hk': '1'})['items'] == [item]
# scan
assert tb.query()['items'] == [item]
# be careful not to use cloud specific statements!
assert tb.query_sql(
    'SELECT * FROM mytable WHERE mytable.hk = @hk AND mytable.num > @num',
    {'@hk': '1', '@num': 4}
)['items'] == [item]
tb.delete_item({'hk': '1', 'rk': 'a'})
API Docs
See API Docs
Querying
query() performs a DynamoDB Query using KeyConditionExpression (if a key is supplied) and an exact match on FilterExpression if filters are supplied. For Cosmos, SQL is generated. This is the safest and most cloud-agnostic way to query and is probably OK for most use cases.
query_sql() performs a DynamoDB ExecuteStatement, passing in the supplied PartiQL statement. Cosmos uses the NoSQL SELECT syntax.
During mocked tests, SQLGlot is used to execute the statement, so results may differ...
Take care with query_sql() not to use SQL features that are specific to one provider, since that breaks the abstraction that is the reason for using abnosql in the first place
Indexes
Beyond partition and range keys defined on the table, indexes currently have limited support within abnosql
- The DynamoDB implementation of query() allows a secondary index to be specified via the optional index kwarg
- Cosmos has Range, Spatial and Composite indexes, however the abnosql library does not yet do anything with the index kwarg in its query() implementation.
Updates
put_item() and put_items() support an update boolean attribute, which if supplied will do an update_item() on DynamoDB and a patch_item() on Cosmos. For this to work, however, you must specify the key attribute names, either via the ABNOSQL_KEY_ATTRS env var as a comma separated list (eg perhaps multiple tables all share a common partition/range key scheme), or as the key_attrs config item when instantiating the table, eg:
tb = table('mytable', {'key_attrs': ['hk', 'rk']})
If you don't need to do any updates and only need to do create/replace, then these key attribute names do not need to be supplied
All items being updated must already exist, otherwise an exception is raised
Partition Keys
A few methods such as get_item(), delete_item() and query() need to know the partition/hash keys as defined on the table. To avoid having to configure this or look it up from the provider, the convention used is that the first kwarg or dictionary item is the partition key and, if supplied, the second is the range/sort key.
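The convention relies on Python preserving the order of keyword arguments and dictionary items. A minimal sketch of how such an ordered key could be split follows; split_key is a hypothetical helper written for illustration and is not part of abnosql.

```python
def split_key(**key):
    """Split an ordered key into (partition, range) name/value pairs.

    Hypothetical helper: the first kwarg is treated as the partition key
    and the second, if present, as the range/sort key.
    """
    pairs = list(key.items())  # kwargs keep the order they were passed in
    if not pairs:
        raise ValueError('at least the partition key is required')
    partition = pairs[0]
    range_key = pairs[1] if len(pairs) > 1 else None
    return partition, range_key


partition, range_key = split_key(hk='1', rk='a')
print(partition, range_key)
```

Passing rk before hk would make rk the partition key under this convention, which is why the usage example notes that the partition/hash key should come first.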
Pagination
query() and query_sql() accept optional limit and next kwargs and return next in the response. Use these to paginate.
This works for AWS DynamoDB, however Azure Cosmos has a limitation with continuation tokens for cross-partition queries (see Python SDK documentation). For Cosmos, abnosql appends OFFSET and LIMIT to the SQL statement if not already present, and returns next. limit defaults to 100. See the tests for examples
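A pagination loop built on limit and next might look like the sketch below. It assumes that a missing or empty next value in the response marks the last page; iter_items and fake_query are hypothetical names, and fake_query is only an in-memory stand-in for a call such as tb.query(..., limit=..., next=...).

```python
def iter_items(fetch_page, limit=100):
    """Yield every item by following the 'next' value of each response.

    fetch_page is any callable accepting limit/next kwargs and returning a
    dict with 'items' and (optionally) 'next', as query() is described to do.
    """
    token = None
    while True:
        kwargs = {'limit': limit}
        if token is not None:
            kwargs['next'] = token
        response = fetch_page(**kwargs)
        yield from response.get('items', [])
        token = response.get('next')
        if not token:  # assumption: no 'next' means there are no more pages
            break


def fake_query(limit=100, next=None, _data=tuple(range(5))):
    """In-memory stand-in for a table query, used only to exercise the loop."""
    start = int(next or 0)
    end = start + limit
    page = {'items': list(_data[start:end])}
    if end < len(_data):
        page['next'] = str(end)
    return page


all_items = list(iter_items(fake_query, limit=2))
print(all_items)
```

With a real table, fetch_page could be a small lambda that forwards the kwargs to tb.query() or tb.query_sql().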
Audit
put_item() and put_items() take an optional audit_user kwarg. If supplied, abnosql will add the following to the item:
- createdBy - value of audit_user, added if it does not exist in the item supplied to put_item()
- createdDate - UTC ISO timestamp string, added if it does not exist
- modifiedBy - value of audit_user, always added
- modifiedDate - UTC ISO timestamp string, always added
NOTE: created* will only be added if update is not True in a put_item() operation
If you prefer snake_case over CamelCase, you can set env var ABNOSQL_CAMELCASE = FALSE
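The audit rules can be summarised in a short sketch. This is not abnosql's implementation: with_audit is a hypothetical helper, and the exact timestamp format the library writes is an assumption here (datetime.isoformat() in UTC).

```python
from datetime import datetime, timezone


def with_audit(item, audit_user, update=False):
    """Return a copy of item with audit attributes applied (illustrative only)."""
    now = datetime.now(timezone.utc).isoformat()  # assumed timestamp format
    audited = dict(item)
    if not update:
        # created* are only added on create/replace, and only if missing
        audited.setdefault('createdBy', audit_user)
        audited.setdefault('createdDate', now)
    # modified* are always overwritten
    audited['modifiedBy'] = audit_user
    audited['modifiedDate'] = now
    return audited


created = with_audit({'hk': '1', 'rk': 'a'}, 'alice')
updated = with_audit({'hk': '1', 'rk': 'a', 'str': 'STR'}, 'bob', update=True)
print(sorted(created), sorted(updated))
```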
Change Feed / Stream Support
AWS DynamoDB Streams allow Lambda functions to be triggered upon create, update and delete table operations. The event sent to the lambda (see aws docs) contains eventName and eventSourceARN, where:
- eventName - name of event, eg INSERT, MODIFY or REMOVE (see here)
- eventSourceARN - ARN of the event source, which includes the table name
This allows a single stream processor lambda to process events from multiple tables (eg for writing into ElasticSearch)
Like DynamoDB, Azure CosmosDB supports change feeds, however the event sent to the function (currently) omits the event source (table name), and delete event names are only available if a preview change feed mode is enabled, which requires explicit enablement.
Because both the eventName and eventSource are ideally needed (irrespective of preview mode or not), the abnosql library automatically adds changeMetadata to an item during create, update and delete, eg:
item = {
    "hk": "1",
    "rk": "a",
    "changeMetadata": {
        "eventName": "INSERT",
        "eventSource": "sometable"
    }
}
Because no REMOVE event is sent at all without the preview change feed mode above, abnosql must first update the item and then delete it. This is also needed for the eventSource / table name to be captured in the event, so unfortunately, until Cosmos supports both attributes, an update is needed before a delete. A 5 second synchronous sleep is added by default between the update and the delete to allow CosmosDB to send the update event (0 seconds results in no update event). This can be controlled with the ABNOSQL_COSMOS_CHANGE_META_SLEEPSECS env var (defaults to 5 seconds), and disabled by setting it to 0
This behaviour is enabled by default, however it can be disabled by setting the ABNOSQL_COSMOS_CHANGE_META env var to FALSE or cosmos_change_meta=False in the table config. The ABNOSQL_CAMELCASE = FALSE env var can also be used to change the attribute names to snake_case if needed
To write an Azure Function / AWS Lambda that is able to process both DynamoDB and Cosmos events, look for changeMetadata first and use it if present; otherwise look for eventName and eventSourceARN in the event payload, assuming it's DynamoDB
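A provider-agnostic lookup along these lines could be sketched as follows. describe_change is a hypothetical helper, not part of abnosql; it assumes the default camelCase attribute names and that the DynamoDB stream ARN has the usual arn:aws:dynamodb:<region>:<account>:table/<name>/stream/<label> shape.

```python
def describe_change(record):
    """Return (event_name, table_name) for a Cosmos item or a DynamoDB stream record."""
    meta = record.get('changeMetadata')
    if meta:
        # Cosmos change feed document written via abnosql
        return meta['eventName'], meta['eventSource']
    # Otherwise assume a DynamoDB stream record and take the table name
    # from the ARN segment that follows ':table/'
    arn = record['eventSourceARN']
    table_name = arn.split(':table/', 1)[1].split('/', 1)[0]
    return record['eventName'], table_name


cosmos_doc = {
    'hk': '1',
    'rk': 'a',
    'changeMetadata': {'eventName': 'INSERT', 'eventSource': 'sometable'},
}
dynamodb_record = {
    'eventName': 'REMOVE',
    'eventSourceARN': (
        'arn:aws:dynamodb:us-east-1:123456789012:'
        'table/mytable/stream/2024-01-01T00:00:00.000'
    ),
}
print(describe_change(cosmos_doc), describe_change(dynamodb_record))
```

In a Lambda handler the function would be applied to each entry of the event's Records list, while an Azure Function would apply it to each changed document.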
Client Side Encryption
If configured in the table config with the kms attribute, abnosql will perform client side encryption using AWS KMS or Azure KeyVault
Each attribute value defined in the config is encrypted with a 256-bit AES-GCM data key generated for each attribute value:
- aws uses AWS Encryption SDK for Python
- azure uses python cryptography to generate an AES-GCM data key, encrypt the attribute value and then uses an RSA CMK in Azure KeyVault to wrap/unwrap (envelope encryption) the AES-GCM data key. The module uses the azure-keyvault-keys python SDK for wrap/unwrap functionality of the generated data key (Azure doesn't support generate data key as AWS does)
Both providers use a 256-bit AES-GCM generated data key with AAD/encryption context (the Azure provider uses a 96-bit nonce). AES-GCM is an authenticated symmetric encryption scheme used by both AWS and Azure (and Hashicorp Vault)
See also AWS Encryption Best Practices
Example config:
{
    'kms': {
        'key_ids': ['https://foo.vault.azure.net/keys/bar/45e36a1024a04062bd489db0d9004d09'],
        'key_attrs': ['hk', 'rk'],
        'attrs': ['obj', 'str']
    }
}
Where:
- key_ids: list of AWS KMS Key ARNs or Azure KeyVault identifier (URL to RSA CMK). This is picked up via the ABNOSQL_KMS_KEYS env var as a comma separated list (NOTE: env var recommended to avoid provider specific code)
- key_attrs: list of key attributes in the item from which the AAD/encryption context is set. Taken from the ABNOSQL_KEY_ATTRS env var or table key_attrs if defined there
- attrs: list of attribute keys to encrypt
- key_bytes: optional for azure, use your own AESGCM key if specified, otherwise generate one
If the kms config attribute is present, abnosql will look for the ABNOSQL_KMS provider to load the appropriate provider KMS module (eg "aws" or "azure"), and if not present use a default depending on the database (eg cosmos will use azure, dynamodb will use aws)
In the example above, the key_attrs ['hk', 'rk'] are used to define the encryption context / AAD used, and attrs ['obj', 'str'] which attributes to encrypt/decrypt
With an item:
{
    'hk': '1',
    'rk': 'b',
    'obj': {'foo':'bar'},
    'str': 'foobar'
}
The encryption context / AAD is set to hk=1 and rk=b and obj and str values are encrypted
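How the kms config maps onto an item can be shown without any cryptography. The sketch below only derives the encryption context and the list of attributes that would be encrypted; plan_encryption is a hypothetical helper, and both the string conversion of context values and the skipping of attributes missing from the item are assumptions rather than documented abnosql behaviour.

```python
def plan_encryption(item, kms_config):
    """Return (encryption_context, attrs_to_encrypt) for an item (illustrative only)."""
    # AAD / encryption context is built from the key attributes
    context = {name: str(item[name]) for name in kms_config['key_attrs']}
    # Assumption: configured attributes that the item lacks are skipped
    to_encrypt = [name for name in kms_config['attrs'] if name in item]
    return context, to_encrypt


kms_config = {'key_attrs': ['hk', 'rk'], 'attrs': ['obj', 'str']}
item = {'hk': '1', 'rk': 'b', 'obj': {'foo': 'bar'}, 'str': 'foobar'}
context, to_encrypt = plan_encryption(item, kms_config)
print(context, to_encrypt)
```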
If you don't want to use any of these providers, then you can use the put_item_pre and get_item_post hooks to perform your own client side encryption
See also AWS Multi-region encryption keys and set the ABNOSQL_KMS_KEYS env var as a comma separated list of ARNs
Configuration
It is recommended to use environment variables where possible to avoid provider specific application code
If the ABNOSQL_DB env var is not set, abnosql will attempt to apply defaults based on available environment variables:
- AWS_DEFAULT_REGION - sets database to dynamodb (see aws docs)
- FUNCTIONS_WORKER_RUNTIME - sets database to cosmos (see azure docs)
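The fallback can be pictured as a small lookup over the environment. This is a sketch, not abnosql's code: guess_database is a hypothetical helper, and the order in which the two fallback variables are checked is an assumption, since the text does not say which wins when both are set.

```python
def guess_database(environ):
    """Return the database implied by the environment, or None (illustrative only)."""
    explicit = environ.get('ABNOSQL_DB')
    if explicit:
        return explicit  # an explicit setting always wins
    if 'AWS_DEFAULT_REGION' in environ:  # assumed to be checked first
        return 'dynamodb'
    if 'FUNCTIONS_WORKER_RUNTIME' in environ:
        return 'cosmos'
    return None


print(guess_database({'AWS_DEFAULT_REGION': 'us-east-1'}))
```

Passing os.environ instead of a literal dict applies the same check to the running process.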
AWS DynamoDB
Set the following environment variable and use the usual AWS environment variables that boto3 uses
- ABNOSQL_DB = "dynamodb"
Or set the boto3 session in the config
from abnosql import table
import boto3
tb = table(
    'mytable',
    config={'session': boto3.Session()},
    database='dynamodb'
)
Azure Cosmos NoSQL
Set the following environment variables:
- ABNOSQL_DB = "cosmos"
- ABNOSQL_COSMOS_ACCOUNT = your database account
- ABNOSQL_COSMOS_ENDPOINT = derived from ABNOSQL_COSMOS_ACCOUNT if not set
- ABNOSQL_COSMOS_CREDENTIAL = your cosmos credential, use Azure Key Vault References if using Azure Functions
- ABNOSQL_COSMOS_DATABASE = cosmos database
OR - use the connection string format:
- ABNOSQL_DB = "cosmos://account@credential:database"
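To make the connection string layout concrete, here is a sketch that splits it into the same account, credential and database keys used by the table config. parse_cosmos_url is a hypothetical helper, not abnosql's parser; it assumes the account contains no '@' and the database name contains no ':'.

```python
def parse_cosmos_url(value):
    """Split 'cosmos://account@credential:database' into its three parts."""
    prefix = 'cosmos://'
    if not value.startswith(prefix):
        raise ValueError('expected cosmos://account@credential:database')
    # account is everything before the first '@'
    account, at, rest = value[len(prefix):].partition('@')
    # database is everything after the last ':'
    credential, colon, database = rest.rpartition(':')
    if not (account and at and credential and colon and database):
        raise ValueError('expected cosmos://account@credential:database')
    return {'account': account, 'credential': credential, 'database': database}


parsed = parse_cosmos_url('cosmos://foo@someb64key==:bar')
print(parsed)
```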
Or define in config (though ideally you want to use env vars to avoid application specific code).
from abnosql import table
tb = table(
    'mytable',
    config={'account': 'foo', 'credential': 'someb64key', 'database': 'bar'},
    database='cosmos'
)
Plugins and Hooks
abnosql uses pluggy and registers in the abnosql.table namespace
The following hooks are available
- set_config - set config
- get_item_post - called after get_item(), can return modified data
- put_item_pre
- put_item_post
- put_items_post
- delete_item_post
See the TableSpecs and example test_hooks()
Testing
AWS DynamoDB
Use the moto package and abnosql.mocks.mock_dynamodbx
mock_dynamodbx is used for query_sql and only needed if/until moto provides full PartiQL support
Example:
from abnosql.mocks import mock_dynamodbx
from moto import mock_dynamodb

@mock_dynamodb
@mock_dynamodbx  # needed for query_sql only
def test_something():
    ...
More examples in tests/test_dynamodb.py
Azure Cosmos NoSQL
Use the responses package and abnosql.mocks.mock_cosmos
Example:
from abnosql.mocks import mock_cosmos
import responses

@mock_cosmos
@responses.activate
def test_something():
    ...
More examples in tests/test_cosmos.py
CLI
A small abnosql CLI is installed, providing a few of the commands above
Usage: abnosql [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  delete-item
  get-item
  put-item
  put-items
  query
  query-sql
To install dependencies
pip install 'abnosql[cli]'
Example querying a table in Azure Cosmos, with a cosmos.json config file containing endpoint, credential and database
$ abnosql query-sql mytable 'SELECT * FROM mytable' -d cosmos -c cosmos.json
partkey      id      num  obj                                          list       str
-----------  ----  -----  -------------------------------------------  ---------  -----
p1           p1.1      5  {'foo': 'bar', 'num': 5, 'list': [1, 2, 3]}  [1, 2, 3]  str
p2           p2.1      5  {'foo': 'bar', 'num': 5, 'list': [1, 2, 3]}  [1, 2, 3]  str
p2           p2.2      5  {'foo': 'bar', 'num': 5, 'list': [1, 2, 3]}  [1, 2, 3]  str
Future Enhancements / Ideas
- client side encryption
- test pagination & exception handling
- Google Firestore support, ideally in the core library (though could be added outside via use of the plugin system). Would need something like FireSQL implemented for Python, maybe via sqlglot
- Google Vault KMS support
- Hashicorp Vault KMS support
- Simple caching (maybe) using globals (used for AWS Lambda / Azure Functions)
- PostgreSQL support using JSONB column (see here for example). Would be nice to avoid an ORM and having to define a model for each table...
- blob storage backend? could use something similar to NoDB but maybe combined with smart_open and DuckDB's Hive Partitioning
- Redis..
- Hook implementations to write to ElasticSearch / OpenSearch for better searching. Useful when not able to use AWS Stream Processors, Azure Change Feed, or Elasticstore. Why? Because not all databases support stream processing, and if they do you don't want the hassle of using CDC