The little business intelligence engine that could.
Project description
Nerium
A lightweight Responder-based microservice that submits queries to a database and returns machine-readable serialized results (typically JSON). By analogy with static site generators, Nerium reads its queries and serialization formats from local files, stored on the filesystem. The idea is that report analysts should be able to write queries in their preferred local editor, and upload or mount them where Nerium can use them.
OAO uses Nerium to easily and quickly provide JSON APIs with report results from our PostgreSQL data warehouse.
Nerium features an extendable architecture, allowing support for multiple query types and output formats.
Currently supports SQL queries using the excellent Records library. In keeping with Records usage, query parameters can be specified in key=value
format, and (safely!) injected into your query in :key
format.
In theory, other query types can be added (under nerium/resultset
) for non-SQL query languages. This is a promising area for future development.
Default JSON output represents data
as an array of objects, one per result row, with database column names as keys. The default schema also provides top-level nodes for name
, metadata
, and params
(details below). A compact
JSON output format may also be requested, with separate column
(array of column names) and data
(array of row value arrays) nodes for compactness. Additional formats can be added by adding marshmallow schema definitions to format_files
.
Nerium supports any backend that SQLAlchemy can, but since none of these are hard dependencies, drivers aren't included in Pipfile, and the Dockerfile only supports PostgreSQL. If you want Nerium to work with other databases, you can install Python connectors with pip
, either in a virtualenv or by creating your own Dockerfile using FROM oaodev/nerium
. (To ease installation, options for nerium[mysql]
and nerium[pg]
are provided in setup.py
)
Install/Run
Using Docker
$ docker run -d --name=nerium \
--envfile=.env \
-v /local/path/to/query_files:/app/query_files \
-v /local/path/to/format_files:/app/format_files \
-p 5000:5000 oaodev/nerium
$ curl http://localhost:5000/v1/<query_name>?<params>
Local install
pipenv install nerium[pg] --pre
Then add a query_files
(and, optionally, format_files
) directory to your project, write your queries, and configure the app as described in the next section. The command nerium
starts a local uvicorn
server running the app, listening on port 5042.
Configuration
DATABASE_URL
for query connections must be set in the environment (or in a local .env
file). This is the simplest configuration option.
Because the web service uses Responder, it will also respect PORT
if set in the environment.
Script file paths
By default, Nerium looks for query and format schema files in query_files
and format_files
respectively, in the current working directory from which the service is launched. QUERY_PATH
and FORMAT_PATH
environment variables can optionally be set in order to use files from other locations on the filesystem.
Data Sources
If you want to query multiple databases from a single Nerium installation, any individual query file can define its own database_url
as a key in YAML front matter (see below).
Alternatively, to handle multiple files with the same connection, create a subdirectory for each database under the $QUERY_PATH
, place the related files under their respective directory, and include a separate db.yaml
file per subdirectory, which defines the database_url
key.
NOTE: A database_url
setting in front matter of a particular file will override subdirectory-level db.yaml
setting as well as DATABASE_URL
in the environment.
Usage
Query files
As indicated above, queries are simply text files placed in local query_files
directory, or another arbitrary filesystem location specified by QUERY_PATH
in the environment. The base name of the file (stem
in Python pathlib
parlance) will determine the {query_name}
portion of the matching API endpoint; the file extension (or suffix
) maps to the query type (literally the name of the nerium.resultset
module that will handle the query).
Note that Nerium loads the query files on server start; while adding additional query scripts does not require any code or config changes, the server needs to be restarted in order to use them.
Front matter
Query files can optionally include a YAML front matter block. The front matter goes at the top of the file, set off by triple-dashed lines, as in this example:
---
Author: Joelle van Dyne
Description: Returns all active usernames in the system
---
select username from user;
At present, the Nerium service doesn't do much with the front matter. As noted above, it can be used to specify a database connection for the query. For other keys, the default response format simply passes the keys and values along in a metadata
object. (The compact
formatter simply ignores the metadata.) This mechanism can theoretically be used to pass relevant information about the query along to any clients of the service: for example, the data types of the columns in the results or what have you. Possibilities include whatever a reporting service and front end developer want to coordinate on. Front matter could also be used in more detailed ways in formatters yet to be devised.
Jinja templating
Nerium supports Jinja templating syntax in queries. The most typical use case would be for adding optional filter clauses to a query so that the same SELECT
statement can be reused without having to be repeated in multiple files, for example:
select username
, user_id
, display_name
from user
{% if group %}
where user.group = :group
{% endif %}
Jinja filters and other logic can be applied to inputs, as well.
WARNING!
The Jinja template is rendered in a SandboxedEnvironment
, which should protect against server-side template injection and most SQL injection tactics. It should not be considered perfectly safe, however. Use this feature sparingly, stick with SQLAlchemy-style :key
named parameters for bind value substitutions, and test your specific queries carefully. It should almost go without saying that database permission grants to the user Nerium connects as should be well-restricted, whether one is using Jinja syntax or not.
One known dangerous case is if your entire query file just does a Jinja variable expansion and nothing else: {{ my_whole_query }}
. This will allow execution of arbitrary SQL and you should never make a template like this available.
Custom format files
For serialization formats besides the built-in default and compact
, schema definitions can be added to your format_files
directory, using the Python marshmallow library. Similarly to query files, the app will look for a format module name matching the {format}
specified in the endpoint URL. The app expects a marshmallow.Schema
subclass named ResultSchema
. Available attributes passed to this schema are all those in the original query
object with additional result
and params
attributes added. (See nerium/schema
for examples of how this is done by built-in formats.)
Query type extensions
A resultset
module is expected to have a result
method that takes a query
object and optional keyword argument (kwargs
) dictionary, connects to a data source, and returns tabular results as a serializable Python structure (most typically a list of dictionaries).
A Nerium query
object is a munchified dictionary, with elements found in get_query()
.
Query files to be passed to this module should be named with a file extension that matches the module name (for example, foo.sql
will be handed to the resultset/sql.py
module).
API
URLs
/v1/<string:query_name>?<query_params>
/v1/<string:query_name>/<string:format>?<query_params>
query_name
should match the name of a given query script file, minus the file extension. URL querystring parameters are passed to the invoked data source query, matched to any keys specified in the query file. If any parameters expected by the query are missing, an error will be returned. Extra/unrecognized parameters are silently ignored (this might seem surprising, but it's standard SQLAlchemy behavior for parameter substitution).
format
path may be included as an optional formatter name, and defaults to 'default'. Other supported formatter
options are described in Content section below.
Unknown values passed to query_extension
or format
will silently fall back to defaults.
Method
GET
Success Response
Code: 200
Content:
'default': {"name": "<query_name>", "data": [{<column_name>:<row_value>, etc..., }, {etc...}, ], "metadata": {<key>: <value>, etc..., }, "params": {<array of name-value pairs submitted to query with request>}}
'compact': {"columns": [<list of column names>], "data": [<array of row value arrays>]}
Of course, it is possible that a database query might return no results. In this case, Nerium will respond with an empty JSON array []
regardless of specified format. This is not considered an error, and clients should be prepared to handle it appropriately.
/v1/<string:query_name>/csv
Method
GET
Success Response
Code: 200
Content:
<csv formatted string (with \r\n newline)>
Error Responses
Code: 400
Content: {"error": <exception.repr from Python>}
Sketchy Roadmap/TODOs
(in no particular order)
- More detailed documentation, especially about usage
- Parameter discovery endpoint
- Report listing endpoint
Dynamic filteringImprove/mature plugin architectureSeparate base classes to a libraryImplementation subclasses incontrib
packageRefactor plugin approach to use modules with an interface standard, instead of abstract class inheritance
Configurable/flexible JSON output formatters ([Implemented via marshmallow schemas]AffixFormatter
could do with less hard-coding)- Static output file generator (and other caching)
- Swagger docs
Health check/default query endpoint(Own git commit hash report(?))Convert app.py to Responder
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for Nerium-0.5.4-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | be1a7124c37c0fed5eea329333fb42c83fdc2fd3dd9c95ebd96bc12f22fcb561 |
|
MD5 | 90d77912b71ae8e7f30486c99eecac4a |
|
BLAKE2b-256 | 3249357e6c75d3c47a7be82f2084369f5f12d4597351a4350d44e0f8968b77b1 |