A wrapper around ArangoDB CRUD HTTP API
arango_crud
This repository wraps the basic CRUD operations on ArangoDB for practical use. This is not an official library; the official python library is pyArango. The main reason to use this over plain requests is authorization and back-off on server failure. The main reason to use this over pyArango is thread-safety, simpler interfaces, a narrower focus, PEP 8 naming conventions, and complete support for either environment-variable configuration or code-as-configuration.
The main reason to use pyArango over arango_crud is field validation and access to AQL. If you want to use ArangoDB as a database, use pyArango or similar. If you want to use ArangoDB as a disk-based cache, use arango_crud or similar.
Note: This package recommends time-to-live (TTL) semantics. To disable the TTL, set it to "-1" in environment variables or to None in code. A TTL index is only created when a collection is initialized, so if this library is used with TTL disabled and TTL is later enabled, the TTL indexes must be added manually. Beyond the standard TTL use cases, a TTL means that keys leaked by a since-patched bug won't stay around forever. It also means that a small amount of key leakage, such as through extremely unlikely race conditions that would be expensive to fix in either performance or developer time, does not harm the long-term health of the project. https://www.arangodb.com/arangodb-training-center/ttl-indexes/
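As a rough illustration of the TTL semantics described above, the sketch below maps an environment-variable TTL string to a Python value ("-1" disables expiry, just as None does in code) and computes an absolute expiry timestamp. The helpers `parse_ttl_env` and `expires_at` are hypothetical and not part of arango_crud; the library's actual parsing and storage may differ. For reference, the 31622400 seconds used throughout this README is exactly 366 days.

```python
from typing import Optional


def parse_ttl_env(raw: str) -> Optional[int]:
    """Hypothetical parser: "-1" disables the TTL (None), otherwise seconds."""
    value = int(raw)
    if value == -1:
        return None  # TTL disabled, mirrors ttl_seconds=None in code
    if value < 0:
        # Sketch-only choice: reject other negative values outright.
        raise ValueError(f'TTL must be -1 or non-negative, got {value}')
    return value


def expires_at(now_seconds: float, ttl_seconds: Optional[int]) -> Optional[float]:
    """Absolute expiry as a Unix timestamp, or None if it never expires."""
    if ttl_seconds is None:
        return None
    return now_seconds + ttl_seconds


ttl = parse_ttl_env('31622400')
print(ttl // 86400)  # 366 (days)
print(expires_at(1000.0, 30))  # 1030.0
print(parse_ttl_env('-1'))  # None
```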
Note: This is not intended to provide much configurability for creating databases or collections, providing only sane defaults for this particular use-case. It's recommended to use a migration structure where databases and collections are only initialized once and to call the appropriate HTTP endpoint directly: https://www.arangodb.com/docs/stable/http/collection-creating.html
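A minimal sketch of such a one-off migration step using only the standard library. It builds (but does not send) requests for the collection-creation endpoint linked above (`POST /_db/{database}/_api/collection`) and for adding a TTL index (`POST /_db/{database}/_api/index?collection=...` with `"type": "ttl"`), which is also how one would add TTL indexes by hand after enabling TTL. The helper `build_migration_requests`, the use of basic auth, and especially the field name `expireAt` are assumptions for illustration; check which attribute arango_crud actually stores expiry in before creating a TTL index manually.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import List


def _post(url: str, body: dict, username: str, password: str) -> urllib.request.Request:
    # HTTP Basic auth header; the request is only built here, never sent.
    token = base64.b64encode(f'{username}:{password}'.encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode('utf-8'),
        headers={'Content-Type': 'application/json',
                 'Authorization': f'Basic {token}'},
        method='POST',
    )


def build_migration_requests(base_url: str, database: str, collection: str,
                             username: str = 'root', password: str = '',
                             ttl_field: str = 'expireAt',  # hypothetical name
                             ) -> List[urllib.request.Request]:
    db_prefix = f'{base_url}/_db/{urllib.parse.quote(database)}'
    create_collection = _post(f'{db_prefix}/_api/collection',
                              {'name': collection}, username, password)
    query = urllib.parse.urlencode({'collection': collection})
    # expireAfter=0: the field itself holds the absolute expiry timestamp.
    create_ttl_index = _post(
        f'{db_prefix}/_api/index?{query}',
        {'type': 'ttl', 'fields': [ttl_field], 'expireAfter': 0},
        username, password,
    )
    return [create_collection, create_ttl_index]


reqs = build_migration_requests('http://localhost:8529', 'my_db', 'users')
for req in reqs:
    print(req.get_method(), req.full_url)
# To actually run the migration: urllib.request.urlopen(req, timeout=3)
```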
Usage
Installation
Supports Python 3.7 or higher.
pip install arango_crud
Initialize
Code-as-configuration BasicAuth
from arango_crud import (
Config, BasicAuth, RandomCluster, StepBackOffStrategy
)
config = Config(
cluster=RandomCluster(urls=['http://127.0.0.1:8529']), # see Cluster Styles
timeout_seconds=3,
back_off=StepBackOffStrategy([0.1, 0.5, 1, 1, 1]), # see Back Off Strategies
auth=BasicAuth(username='root', password=''),
ttl_seconds=31622400
)
Code-as-configuration JWT
from arango_crud import (
Config, JWTAuth, JWTDiskCache, RandomCluster, StepBackOffStrategy
)
config = Config(
cluster=RandomCluster(urls=['http://localhost:8529']),
timeout_seconds=3,
back_off=StepBackOffStrategy(steps=[0.1, 0.5, 1, 1, 1]),
ttl_seconds=31622400,
auth=JWTAuth(
username='root',
password='',
cache=JWTDiskCache( # See JWT Caches
lock_file='.arango_jwt.lock',
lock_time_seconds=10,
store_file='.arango_jwt'
)
)
)
# Encouraged for easier performance tracing, but not required; otherwise this
# happens on the first request. Fetches the JWT if it does not exist yet.
config.prepare()
Environment variables BasicAuth
test.py
from arango_crud import env_config
config = env_config()
config.prepare() # recommended, not required
run.sh
#!/usr/bin/env bash
# Cluster urls are separated by a comma
export ARANGO_CLUSTER=http://localhost:8529
export ARANGO_CLUSTER_STYLE=random
export ARANGO_TIMEOUT_SECONDS=3
export ARANGO_BACK_OFF=step
export ARANGO_BACK_OFF_STEPS=0.1,0.5,1,1,1
export ARANGO_TTL_SECONDS=31622400
export ARANGO_AUTH=basic
export ARANGO_AUTH_USERNAME=root
export ARANGO_AUTH_PASSWORD=
python test.py
Environment variables JWT
test.py
from arango_crud import env_config
config = env_config()
run.sh
#!/usr/bin/env bash
# Cluster urls are separated by a comma
export ARANGO_CLUSTER=http://localhost:8529
export ARANGO_CLUSTER_STYLE=random
export ARANGO_TIMEOUT_SECONDS=3
export ARANGO_BACK_OFF=step
export ARANGO_BACK_OFF_STEPS=0.1,0.5,1,1,1
export ARANGO_TTL_SECONDS=31622400
export ARANGO_AUTH=jwt
export ARANGO_AUTH_USERNAME=root
export ARANGO_AUTH_PASSWORD=
export ARANGO_AUTH_CACHE=disk
export ARANGO_AUTH_CACHE_LOCK_FILE=.arango_jwt.lock
export ARANGO_AUTH_CACHE_LOCK_TIME_SECONDS=10
export ARANGO_AUTH_CACHE_STORE_FILE=.arango_jwt
python test.py
CRUD
To make these examples runnable, the environment variables must be set and ArangoDB must be reachable. Here are the configurations for ArangoDB running locally on default development settings:
Windows:
SET ARANGO_CLUSTER=http://localhost:8529
SET ARANGO_CLUSTER_STYLE=random
SET ARANGO_TIMEOUT_SECONDS=3
SET ARANGO_BACK_OFF=step
SET ARANGO_BACK_OFF_STEPS=0.1,0.5,1,1,1
SET ARANGO_TTL_SECONDS=31622400
SET ARANGO_AUTH=basic
SET ARANGO_AUTH_USERNAME=root
SET ARANGO_AUTH_PASSWORD=
*Nix:
#!/usr/bin/env bash
export ARANGO_CLUSTER=http://localhost:8529
export ARANGO_CLUSTER_STYLE=random
export ARANGO_TIMEOUT_SECONDS=3
export ARANGO_BACK_OFF=step
export ARANGO_BACK_OFF_STEPS=0.1,0.5,1,1,1
export ARANGO_TTL_SECONDS=31622400
export ARANGO_AUTH=basic
export ARANGO_AUTH_USERNAME=root
export ARANGO_AUTH_PASSWORD=
from arango_crud import env_config
import time
config = env_config()
config.prepare()
db = config.database('my_db')
db.create_if_not_exists()
coll = db.collection('users')
coll.create_if_not_exists()
# The simplest interface
coll.create_or_overwrite_doc('tj', {'name': 'TJ'})
coll.read_doc('tj') # {'name': 'TJ'}
coll.force_delete_doc('tj') # True
# non-expiring
coll.create_or_overwrite_doc('tj', {'name': 'TJ'}, ttl=None)
coll.force_delete_doc('tj')
# custom expirations with touching. Note that touching a document is not
# a supported atomic operation on ArangoDB and is hence faked with
# read -> compare_and_swap. Presumably if the CAS fails the document was
# touched recently anyway.
coll.create_or_overwrite_doc('tj', {'name': 'TJ'}, ttl=30) # None
coll.touch_doc('tj', ttl=60) # True
coll.force_delete_doc('tj') # True
# Alternative interface. For anything except one-liners, usually nicer.
doc = coll.document('tj')
doc.body['name'] = 'TJ'
doc.create() # True
doc.body['note'] = 'Pretty cool'
doc.compare_and_swap() # True
# We may use etags to avoid redownloading an unchanged document, but be careful
# if you are modifying the body.
# Happy case:
doc2 = coll.document('tj')
doc2.read() # loads {'name': 'TJ', 'note': 'Pretty cool'} from network
doc.read_if_remote_newer() # 304 not modified, returns False
doc2.read_if_remote_newer() # 304 not modified, returns False
doc.body['note'] = 'bar'
doc.compare_and_swap()
doc.read_if_remote_newer() # 304 not modified, returns False
doc2.read_if_remote_newer() # loads {'name': 'TJ', 'note': 'bar'} from network, returns True
# Where it can get dangerous
doc.body['note'] = 'foo'
print(doc.body) # {'name': 'TJ', 'note': 'foo'}
doc.read() # always a complete download
print(doc.body) # {'name': 'TJ', 'note': 'bar'}
doc.read_if_remote_newer() # no changes on server since last read; 304 not modified, returns False
print(doc.body) # {'name': 'TJ', 'note': 'bar'}
doc.body['note'] = 'foo'
print(doc.body) # {'name': 'TJ', 'note': 'foo'}
doc.read_if_remote_newer() # no changes on server since last read; 304 not modified, returns False
print(doc.body) # {'name': 'TJ', 'note': 'foo'}
doc.read()
print(doc.body) # {'name': 'TJ', 'note': 'bar'}
doc.compare_and_delete() # True
# Simple caching
for i in range(2):
    doc = coll.document('tj')
    hit = doc.read()
    if hit:
        doc.compare_and_swap()  # refreshes TTL, usefulness depends
    else:
        # .... expensive computation ....
        doc.body = {'name': 'TJ', 'note': 'Pretty cool'}
        doc.create_or_overwrite()
    print(f'cached value (loop {i + 1}/2) (hit: {hit}): {doc.body}')
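To make the "dangerous" etag case above concrete, here is a self-contained toy model of conditional reads; it is not arango_crud's implementation. It assumes, as the comments above suggest, that `read_if_remote_newer` sends the last-seen revision (ArangoDB accepts it via the `If-None-Match` header and answers 304 when it is unchanged) and only replaces the local body when the server reports a newer revision. Because the comparison is on revisions rather than body contents, local edits survive a 304. `FakeServer` and `ToyDocument` are hypothetical names.

```python
import copy


class FakeServer:
    """Hypothetical in-memory stand-in for a single ArangoDB document."""

    def __init__(self, body):
        self.body = copy.deepcopy(body)
        self.rev = 1

    def write(self, body):
        self.body = copy.deepcopy(body)
        self.rev += 1  # every write produces a new revision

    def get(self, if_none_match=None):
        # Mimics HTTP semantics: 304 with no body when the revision matches.
        if if_none_match == self.rev:
            return 304, None, self.rev
        return 200, copy.deepcopy(self.body), self.rev


class ToyDocument:
    def __init__(self, server):
        self.server = server
        self.body = None
        self.rev = None

    def read(self):
        _, self.body, self.rev = self.server.get()  # always a full download

    def read_if_remote_newer(self):
        status, body, rev = self.server.get(if_none_match=self.rev)
        if status == 304:
            return False  # local body kept as-is, even if edited locally
        self.body, self.rev = body, rev
        return True


server = FakeServer({'name': 'TJ', 'note': 'bar'})
doc = ToyDocument(server)
doc.read()
doc.body['note'] = 'foo'  # local edit, never sent to the server
print(doc.read_if_remote_newer(), doc.body)  # False {'name': 'TJ', 'note': 'foo'}
doc.read()
print(doc.body)  # {'name': 'TJ', 'note': 'bar'}
server.write({'name': 'TJ', 'note': 'baz'})  # someone else updates it
print(doc.read_if_remote_newer(), doc.body)  # True {'name': 'TJ', 'note': 'baz'}
```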
The following is in a separate code block and is commented out to prevent it
from accidentally being copied and pasted somewhere it should not be. When
running tests it's helpful to clean up the collections and databases afterward.
If you do not need to delete collections and databases in production, it's
encouraged to disable these operations to help prevent developer error, which
is done by setting ARANGO_DISABLE_DATABASE_DELETE and
ARANGO_DISABLE_COLLECTION_DELETE to true. These environment variables are
treated as true unless explicitly set to false. This is not a substitute
for good backups and should not be considered a security feature.
# coll.force_delete()
# db.force_delete()
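The "true unless explicitly set to false" rule for ARANGO_DISABLE_DATABASE_DELETE and ARANGO_DISABLE_COLLECTION_DELETE can be expressed as a small helper. This is a hypothetical sketch of the semantics described above, not arango_crud's parser; in particular, which spellings the library accepts as false is an assumption here (this sketch treats only a case-insensitive "false" as false).

```python
import os
from typing import Mapping, Optional


def delete_disabled(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: unset, or anything but "false", means disabled."""
    source = os.environ if env is None else env
    return source.get(name, 'true').strip().lower() != 'false'


print(delete_disabled('ARANGO_DISABLE_DATABASE_DELETE', {}))  # True (unset)
print(delete_disabled('ARANGO_DISABLE_DATABASE_DELETE',
                      {'ARANGO_DISABLE_DATABASE_DELETE': 'false'}))  # False
```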
Contributing
This package adheres to PEP 8 guidelines unless an exception is listed in
.flake8. Comments are explicitly line-broken at 80 characters. Code
complexity measures (AbcComplexity, etc.) are not used. Code coverage is
measured, and a build with below 70% code coverage is considered failing. Note
that the term "unit test" is avoided: if it's possible to test a line of
code without mocking or accessing private variables, that is preferred. PRs
which reduce code coverage must include an explanation of why.
The examples directory should not contain non-functional lines of code, so instead of
bar = foo()
if bar is None:
    print('Foo gave none!')  # prints Foo gave none!
else:
    print('Something went wrong!')
it should use the easier-to-read assert variant, which plays better with automated testing that verifies the examples actually work:
bar = foo()
assert bar is None
Hence any PR where coverage of the examples directory is below 98%
when running coverage run --rcfile=.coveragerc_examples examples/run_all.py
will have changes requested. Any lines that do not run should be skipped only
due to random chance.
This repository is focused specifically on using ArangoDB as a disk-based cache. PRs adding functionality that doesn't support that use case will be closed with the recommendation to fork. So AQL or graph support would likely be closed, but (bulk) get/set operations or concurrency-safe patches will likely be merged.
Inheritance is to be avoided, preferring delegation which respects contracts.
Interfaces are exempt from this rule, where an interface is a class whose
functions all simply raise NotImplementedError and which has no constructor.
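For illustration, here is what that convention could look like. The classes `BackOffStrategy`, `StepBackOff`, and `RecordingBackOff` are hypothetical and are not arango_crud's own classes: the interface only raises NotImplementedError and defines no constructor, and `RecordingBackOff` adds behavior by delegating to another implementation instead of inheriting from it, leaving the contract unchanged.

```python
from typing import List, Optional


class BackOffStrategy:
    """Interface: every method raises NotImplementedError, no constructor."""

    def get_back_off(self, attempt: int) -> Optional[float]:
        """Seconds to wait before retry `attempt` (0-based), or None to stop."""
        raise NotImplementedError()


class StepBackOff(BackOffStrategy):
    def __init__(self, steps: List[float]):
        self.steps = list(steps)

    def get_back_off(self, attempt: int) -> Optional[float]:
        return self.steps[attempt] if attempt < len(self.steps) else None


class RecordingBackOff(BackOffStrategy):
    """Wraps any strategy via delegation rather than subclassing it."""

    def __init__(self, delegate: BackOffStrategy):
        self.delegate = delegate
        self.requested: List[int] = []

    def get_back_off(self, attempt: int) -> Optional[float]:
        self.requested.append(attempt)
        return self.delegate.get_back_off(attempt)  # same contract as delegate


strategy = RecordingBackOff(StepBackOff([0.1, 0.5, 1]))
print([strategy.get_back_off(i) for i in range(4)])  # [0.1, 0.5, 1, None]
print(strategy.requested)  # [0, 1, 2, 3]
```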
Setup Development (Windows)
Install ArangoDB on default development settings.
python -m venv venv
"venv/Scripts/activate.bat"
python -m pip install --upgrade pip
python -m pip install -r dev_requirements.txt
"scripts/windows_dev_env.bat"
coverage run -m unittest discover -s tests
coverage combine
coverage report
coverage run --rcfile=.coveragerc_examples examples/run_all.py
coverage report
Setup Development (*Nix)
docker pull arangodb/arangodb
docker run -e ARANGO_NO_AUTH=1 -p 8529:8529 arangodb/arangodb arangod --server-endpoint tcp://0.0.0.0:8529
python -m venv venv
. venv/bin/activate
python -m pip install --upgrade pip
. scripts/nix_dev_env.sh
python -m pip install -r dev_requirements.txt
# This pulls from .coveragerc to handle multiprocessing
coverage run -m unittest discover -s tests
coverage combine
coverage report
coverage run --rcfile=.coveragerc_examples examples/run_all.py
coverage report
Cluster Styles
When working with an ArangoDB cluster, it's important that the clients
distribute their requests amongst the various coordinators. The supported
request styles are random and weighted-random. Round-robin and similar styles
are avoided as they cannot be made thread-safe and performant without context.
Random
A random url is selected from the cluster for each request with equal probability among all urls.
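A minimal sketch of the random style, assuming nothing beyond the description above: each request independently picks one URL uniformly. `pick_random_url` is a hypothetical name, not arango_crud's `RandomCluster` implementation. Unlike round-robin, there is no shared cursor that threads would need to coordinate on.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def pick_random_url(urls: Sequence[str],
                    rng: Optional[random.Random] = None) -> str:
    """Hypothetical helper: every URL has probability 1 / len(urls)."""
    if not urls:
        raise ValueError('cluster must contain at least one url')
    chooser = random if rng is None else rng
    return chooser.choice(urls)


urls = ['http://localhost:8529', 'http://localhost:8530']
rng = random.Random(0)  # seeded only so the demo is reproducible
counts = Counter(pick_random_url(urls, rng) for _ in range(10_000))
print(counts)  # roughly 5000 each
```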
Weighted Random
A random URL is selected for each request, but different URLs may have different probabilities. This is useful if, for example, one of the coordinators is running on a larger server than the rest.
Example:
from arango_crud import WeightedRandomCluster
cluster = WeightedRandomCluster(
urls=['http://localhost:8529', 'http://localhost:8530', 'http://localhost:8531'],
weights=[1, 2, 1]
)
This will select port 8529 1/4 of the time, 8530 1/2 of the time, and 8531 1/4
of the time. If one prefers to set the exact percentages, just ensure the
weights sum to one (i.e., 0.25, 0.5, 0.25).
Example environment variables:
#!/usr/bin/env bash
export ARANGO_CLUSTER=http://localhost:8529,http://localhost:8530,http://localhost:8531
export ARANGO_CLUSTER_STYLE=weighted-random
export ARANGO_CLUSTER_WEIGHTS=1,2,1
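The weighted style maps naturally onto Python's `random.choices`, which accepts relative weights that need not sum to one; that is why weights of 1,2,1 and 0.25,0.5,0.25 describe the same distribution. This is a sketch of the selection logic only, with a hypothetical helper name, not arango_crud's `WeightedRandomCluster` implementation.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def pick_weighted_url(urls: Sequence[str], weights: Sequence[float],
                      rng: Optional[random.Random] = None) -> str:
    """Hypothetical helper: P(url i) = weights[i] / sum(weights)."""
    if not urls or len(urls) != len(weights):
        raise ValueError('urls and weights must be non-empty and equal length')
    chooser = random if rng is None else rng
    return chooser.choices(urls, weights=weights, k=1)[0]


urls = ['http://localhost:8529', 'http://localhost:8530',
        'http://localhost:8531']
rng = random.Random(42)  # seeded only so the demo is reproducible
counts = Counter(pick_weighted_url(urls, [1, 2, 1], rng) for _ in range(20_000))
for url in urls:
    print(url, round(counts[url] / 20_000, 2))  # roughly 0.25, 0.5, 0.25
```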
Alternatives to Environment Variables
Although environment variables are sometimes extremely convenient, they can
also be painful in other development environments. One can painlessly swap
them out for a preferred storage mechanism, since env_config accepts a
dictionary from which it loads variables. Note that env_config will
exclusively use that dictionary: it will not fall back to an environment
variable if something is missing.
The only caveat is that, for simplicity of development and to reuse the same
documentation, the keys need to be in screaming snake case (e.g.
ARANGO_CLUSTER_STYLE) and nesting is not used. If one prefers, they can
massage the data into this format after loading to get more conventional
looking configuration files. One can also simply massage the data into the
arguments for Config directly.
arango_config.json
{
"ARANGO_CLUSTER": "http://localhost:8529,http://localhost:8530,http://localhost:8531",
"ARANGO_CLUSTER_STYLE": "weighted-random",
"__comment": "... see src/arango_crud/env_config.py for complete argument docs ..."
}
Which allows loading as follows:
from arango_crud import env_config
import json
with open('arango_config.json') as fin:
    cfg = json.load(fin)
arango_config = env_config(cfg)
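To keep a more conventional nested configuration file, one option is to flatten it into the screaming-snake-case keys before calling env_config. The sketch below joins nested keys with underscores, upper-cases them, turns lists into comma-separated strings (matching the comma-separated ARANGO_CLUSTER convention), and stringifies everything else as environment variables would be. The nested layout and the helper `flatten_to_env_keys` are hypothetical.

```python
from typing import Any, Dict


def flatten_to_env_keys(data: Dict[str, Any], prefix: str = '') -> Dict[str, str]:
    """Hypothetical helper: {'arango': {'cluster_style': 'random'}}
    becomes {'ARANGO_CLUSTER_STYLE': 'random'}."""
    flat: Dict[str, str] = {}
    for key, value in data.items():
        name = f'{prefix}_{key}' if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_to_env_keys(value, name))  # recurse into nesting
        elif isinstance(value, (list, tuple)):
            flat[name.upper()] = ','.join(str(item) for item in value)
        elif isinstance(value, bool):
            flat[name.upper()] = 'true' if value else 'false'
        else:
            flat[name.upper()] = str(value)
    return flat


nested = {
    'arango': {
        'cluster': ['http://localhost:8529', 'http://localhost:8530',
                    'http://localhost:8531'],
        'cluster_style': 'weighted-random',
        'cluster_weights': [1, 2, 1],
        'timeout_seconds': 3,
    }
}
flat = flatten_to_env_keys(nested)
print(flat['ARANGO_CLUSTER_WEIGHTS'])  # 1,2,1
# arango_config = env_config(flat)  # needs arango_crud and the remaining keys
```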
Server Failures
When a request fails due to a server-side issue, it's usually desirable to try
again on a new coordinator. A small sleep is also helpful to avoid suddenly and
massively spiking traffic to the coordinators whenever they hiccup. Only a
step back-off policy is supported. If the steps are [0.1, 0.5, 1], then
on the first server error this waits 0.1 seconds then tries again. If that
also fails, this waits 0.5 seconds then tries again. If that fails, this waits
1 second then tries again. If that fails, an error is raised.
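The step policy above can be sketched as a retry loop. Following the description, each step is the sleep before the next attempt, so [0.1, 0.5, 1] allows up to four attempts in total, and a fresh URL is picked per attempt to mirror "try again on a new coordinator". `request_with_step_back_off`, `ServerError`, and the `send`/`pick_url` callables are hypothetical stand-ins, not arango_crud's API; `sleep` is injectable so the demo runs instantly.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar('T')


class ServerError(Exception):
    """Hypothetical: raised by `send` on server-side failures (e.g. 5xx)."""


def request_with_step_back_off(send: Callable[[str], T],
                               pick_url: Callable[[], str],
                               steps: Sequence[float],
                               sleep: Callable[[float], None] = time.sleep) -> T:
    for step in steps:
        try:
            return send(pick_url())
        except ServerError:
            sleep(step)  # wait, then retry (likely on another coordinator)
    return send(pick_url())  # final attempt; a ServerError here propagates


attempts = []
slept = []


def flaky_send(url):
    attempts.append(url)
    if len(attempts) < 3:  # first two attempts fail
        raise ServerError(f'503 from {url}')
    return {'ok': True, 'url': url}


urls = iter(['http://localhost:8529', 'http://localhost:8530',
             'http://localhost:8531', 'http://localhost:8529'])
result = request_with_step_back_off(flaky_send, lambda: next(urls),
                                    [0.1, 0.5, 1], sleep=slept.append)
print(result)  # {'ok': True, 'url': 'http://localhost:8531'}
print(slept)  # [0.1, 0.5]
```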
JWT Locking and Store
It's usually not a good idea to create a lot of new tokens when a client is misbehaving, as token generation is generally meant to be expensive in order to be secure. Hence JWT is necessarily stateful on the Config - rather than just being able to create network requests we first need to fetch the JWT. Furthermore, we may need to refresh the token on arbitrary requests.
The recommended way to handle the JWT cache is JWTDiskCache. A file will
contain the JWT and some metadata about it, and it is accessed in a way that is
safe even in highly concurrent environments. This means that every instance
running arango_crud on the same machine with the same config will share JWT
tokens and will only create/renew the token once per renewal period. The
overhead is extremely minor for non-concurrent environments.
If you're very confident that JWT generation is not going to be a significant
source of load and there is no multithreading, a naive approach can be enabled
with the cache style None. See the examples jwt_disk_example.py and
jwt_none_example.py.
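The file format and locking protocol of JWTDiskCache are not described here, so the following is only a sketch of the general technique under stated assumptions. A lock is taken by atomically creating `lock_file` (`os.O_CREAT | os.O_EXCL`), treated as stale after `lock_time_seconds` so a crashed holder cannot block others forever, and the token is written to `store_file` via a temporary file plus `os.replace` so readers never see a half-written token. The function names and the JSON layout are hypothetical, and the naive stale-lock breaking shown has a small race (two processes may both judge the lock stale).

```python
import json
import os
import tempfile
import time
from typing import Optional


def try_acquire_lock(lock_file: str, lock_time_seconds: float) -> bool:
    """Create lock_file atomically; break it if older than lock_time_seconds."""
    try:
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        try:
            age = time.time() - os.path.getmtime(lock_file)
        except FileNotFoundError:
            return False  # holder released it meanwhile; caller may retry
        if age > lock_time_seconds:
            try:
                os.remove(lock_file)  # naive stale-lock breaking (racy)
            except FileNotFoundError:
                pass
        return False
    os.close(fd)
    return True


def release_lock(lock_file: str) -> None:
    os.remove(lock_file)


def write_token(store_file: str, token: str, expires_at: float) -> None:
    directory = os.path.dirname(os.path.abspath(store_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, 'w') as f:
        json.dump({'token': token, 'expires_at': expires_at}, f)
    os.replace(tmp_path, store_file)  # atomic rename on the same filesystem


def read_token(store_file: str, now: float) -> Optional[str]:
    try:
        with open(store_file) as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data['token'] if data['expires_at'] > now else None


with tempfile.TemporaryDirectory() as d:
    lock = os.path.join(d, '.arango_jwt.lock')
    store = os.path.join(d, '.arango_jwt')
    first = try_acquire_lock(lock, 10)   # True: we now hold the lock
    second = try_acquire_lock(lock, 10)  # False: lock held and not stale
    write_token(store, 'example-token', time.time() + 3600)
    release_lock(lock)
    token = read_token(store, time.time())
print(first, second, token)  # True False example-token
```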
File details
Details for the file arango_crud-1.0.5.tar.gz.
File metadata
- Download URL: arango_crud-1.0.5.tar.gz
- Upload date:
- Size: 31.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.4.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 881b7c493884fec9fa31c0bcaa2e2f914357031f0a4d4a75de514a2ca54a8f1c
MD5 | 816cf8b2bcb2bf183b7d8c9d62d63225
BLAKE2b-256 | 4217bf375c04f7af5848f7d0fac519438be9d8d80b2aa6c48462517be3d1623c
File details
Details for the file arango_crud-1.0.5-py3-none-any.whl.
File metadata
- Download URL: arango_crud-1.0.5-py3-none-any.whl
- Upload date:
- Size: 30.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.4.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.9.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | c47a6ff1652e1c0918a64bc04ac3c9a505edf48c42f3c3acb5c8c54dac90d093
MD5 | 62555bc0b4763f3ece145464f5d41360
BLAKE2b-256 | 276f6c339a18d4120b6088b54e8f1bc1373ec26a461ffb278bb91a29c6a0b457