A Python library for applying computations to a JSON object using a Subject-Verb-Object grammar.
Derived Attributes
What does this library do, and why is it useful?
Suppose you have a large, complex JSON (or JSON-like) object. Perhaps it represents one or more medical records, customer records, or financial records.
The object contains data that you want to work with, but not necessarily in its raw form.
It is common, in such a case, to pass the object through a processing layer or ETL job that parses the raw data and performs some operations on it to produce derived attributes, which are the data you actually care about.
For instance, if your JSON object contains a list of customer transactions, one useful derived value might be `average_order_value`.
This library provides a succinct way of defining and computing these derived attributes. (Essentially, the library becomes your processing layer.) The derived attributes you define can be stored and managed in a variety of formats: in CSV files, in a database table, or in the codebase itself.
Example
Suppose you have the following JSON-like object, which contains vendor expense data for multiple businesses:
    source = {
        "records": [
            {
                "business_name": "ABC Electronics",
                "vendors": [
                    {
                        "vendor_name": "Tech Solutions",
                        "has_contract": False,
                        "budget": 15000,
                        "expenses": 8000,
                    },
                    {
                        "vendor_name": "Office Supplies Inc.",
                        "has_contract": True,
                        "budget": 2000,
                        "expenses": 1500,
                    },
                ],
            },
            {
                "business_name": "XYZ Marketing",
                "vendors": [
                    {
                        "vendor_name": "AdvertiseNow",
                        "has_contract": True,
                        "budget": 10000,
                        "expenses": 9000,
                    },
                    {
                        "vendor_name": "Print House",
                        "has_contract": True,
                        "budget": 3000,
                        "expenses": 3000,
                    },
                ],
            },
        ]
    }
Suppose you would like to derive the following attributes based on this data:
- `total_vendor_count`: The number of vendors across all businesses.
- `max_budget_only_contract`: The highest budget for vendors with a contract.
- `median_used_budget`: The median percentage of the monthly budget that has been used.
One approach to computing these derived values might be to normalize the data, create two-dimensional representations via database tables or data frames, then query and aggregate the data using tools like SQL or Pandas.
Derived Attributes allows you to instead work with the data in its JSON form, specifying the computations using a Subject-Verb-Object grammar that accepts JSONPath syntax:
Attribute | Subject | Verb | Object |
---|---|---|---|
`total_vendor_count` | `source` | `parse_len` | `$.records[*].vendors[*]` |
`max_budget_only_contract` | `source` | `parse_max` | `$.records[*].vendors[?has_contract == true].budget` |
`_used_budget` | `source` | `parse_list` | `$.records[*].vendors[*].expenses / $.records[*].vendors[*].budget` |
`median_used_budget` | `_used_budget` | `parse_median` | |
When these S-V-O sentences are evaluated, they produce the following derived attributes:
    {
        "total_vendor_count": 4,
        "max_budget_only_contract": 10000,
        "median_used_budget": 0.825,
    }
Note: Attributes prefixed with an underscore are considered private and are useful for holding the results of intermediate calculations. They are not returned.
Subject-Verb-Object grammar
In the simple Subject-Verb-Object grammar this library uses:
- The Subject is a reference to a raw value (e.g. the source data), or to another derived attribute.
- The Verb is a unary or binary function to be performed against that value (e.g. an operator or aggregator).
- An optional Object value can be supplied as a second parameter to the Verb function.
Each S-V-O combination forms a simple sentence, the output of which is a Derived Attribute.
The grammar supports nested operations: each Derived Attribute can be used as an input to other sentences.
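To make the grammar concrete, here is a minimal sketch of how such sentences could be evaluated. It is not the library's implementation or API: the `VERBS` table, the `evaluate` function, and the tuple layout are all hypothetical, chosen only to show how a Subject can refer either to a raw value or to an earlier attribute.

```python
from statistics import median

# Hypothetical verb table: every verb takes (subject_value, object_value).
# Unary verbs simply ignore the object.
VERBS = {
    ">": lambda s, o: s > o,
    "<": lambda s, o: s < o,
    "len": lambda s, o: len(s),
    "sum": lambda s, o: sum(s),
    "max": lambda s, o: max(s),
    "median": lambda s, o: median(s),
}


def evaluate(sentences, **raw_values):
    """Evaluate (attribute, subject, verb, object) tuples in order.

    Each result is stored under its attribute name so that later
    sentences can use it as their Subject (nesting).
    """
    context = dict(raw_values)
    derived = {}
    for attribute, subject, verb, obj in sentences:
        value = VERBS[verb](context[subject], obj)
        context[attribute] = value
        # Underscore-prefixed attributes are private: usable, not returned.
        if not attribute.startswith("_"):
            derived[attribute] = value
    return derived


sentences = [
    ("_total", "amounts", "sum", None),
    ("order_count", "amounts", "len", None),
    ("is_big_spender", "_total", ">", 100),  # Subject is a derived attribute
]

result = evaluate(sentences, amounts=[40, 25, 60])
print(result)  # {'order_count': 3, 'is_big_spender': True}
```

Evaluating sentences in the order they are listed keeps the sketch short; a fuller version would need to resolve dependencies between attributes.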
Supported Verbs
Verb | Definition |
---|---|
> | Returns true if the Subject value is greater than the Object value; else false. |
< | Returns true if the Subject value is less than the Object value; else false. |
= | Returns true if the Subject value equals the Object value; else false. |
eq | Returns true if the (non-numeric) Subject value equals the Object value; else false. |
and | Returns true if the Subject value and Object value are both truthy; else false. |
or | Returns true if either the Subject or Object value is truthy; else false. |
len | Returns the length of a list provided as a Subject value. |
sum | Returns the sum of a list of numeric values provided as a Subject value. |
min | Returns the minimum number in a list of numeric values provided as a Subject value. |
max | Returns the maximum number in a list of numeric values provided as a Subject value. |
median | Returns the median of a list of numeric values provided as a Subject value. |
parse | Parse a JSONPath expression that matches a single scalar value, then return that value. |
parse_list | Parse a JSONPath expression that matches a list of values, then return that list. |
parse_len | Returns the number of values that match a JSONPath expression. |
parse_sum | Returns the sum of values that match a JSONPath expression. |
parse_min | Returns the minimum numeric value from all values that match a JSONPath expression. |
parse_max | Returns the maximum numeric value from all values that match a JSONPath expression. |
parse_median | Returns the median numeric value from all values that match a JSONPath expression. |
JSONPath syntax in Objects
By default, the Derived Attributes library uses jsonpath-ng to parse JSONPath expressions.
Please see that project's JSONPath Syntax section for more details about how to construct these expressions.
If you would prefer to use JSONata syntax to query your data, specify `parse_jsonata` as the Verb.
Derived Rules
This library also provides the ability to construct a simple rules engine using similar mechanics to Derived Attributes.
Derived Rules employ the same S-V-O sentence structure for defining rules, but instead of returning the attributes, they treat them as rules that evaluate to `True` or `False`.
This allows flexible implementations that employ built-in Python functions such as:
- `any()` if at least one of the specified rules needs to match
- `all()` if all of the specified rules need to match
- `sum()` for a scorecard approach, where the number of rules that evaluate to `True` needs to exceed some threshold
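The three approaches in the list above can be sketched with plain Python. The `rule_results` dictionary is a hypothetical stand-in for whatever the rules engine returns; the rule names and the threshold are made up for illustration.

```python
# Hypothetical rule outcomes, keyed by rule name (assumed shape).
rule_results = {
    "has_contract": True,
    "under_budget": False,
    "is_preferred_vendor": True,
}

outcomes = list(rule_results.values())

# At least one rule must match.
matches_any = any(outcomes)

# Every rule must match.
matches_all = all(outcomes)

# Scorecard: count the rules that evaluated to True (True counts as 1)
# and require the count to exceed a threshold.
threshold = 1
score = sum(outcomes)
passes_scorecard = score > threshold

print(matches_any, matches_all, score, passes_scorecard)  # True False 2 True
```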
Derived Triggers
When you want to trigger events based on the evaluated data, you can use Derived Triggers.
A trigger uses the same S-V-O sentence structure as Derived Attributes, but it also supports two additional inputs: an event name that should be sent to a supplied event handler, and a list of parameters that should be included in the event.
For example, consider the following list of triggers:
Attribute | Subject | Verb | Object | Action | Params |
---|---|---|---|---|---|
`_color` | `source` | `parse` | `$.color` | | |
`_id` | `source` | `parse` | `$.id` | | |
`is_red` | `_color` | `=` | `red` | `record_color` | `["_color", "_id"]` |
`is_blue` | `_color` | `=` | `blue` | `record_color` | `["_color", "_id"]` |
`is_green` | `_color` | `=` | `green` | `record_color` | `["_color", "_id"]` |
When a trigger evaluates to `True`, an action name and optional parameters are passed to an event handler, which can further process the event.
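The hand-off to an event handler might look roughly like the sketch below. The handler signature `handler(action, params)`, the `dispatch` helper, and the shape of `values` are assumptions made for illustration; the library's actual interface may differ.

```python
def dispatch(triggers, values, handler):
    """Call handler(action, params) for each trigger that evaluated to True.

    triggers: list of (attribute, action, param_names) tuples (assumed shape)
    values:   all evaluated attributes, including private ones
    """
    for attribute, action, param_names in triggers:
        if values.get(attribute) is True:
            params = {name: values[name] for name in param_names}
            handler(action, params)


# A trivial handler that records what it receives.
events = []


def record_event(action, params):
    events.append((action, params))


# Hypothetical evaluation results for a source of {"color": "red", "id": 42}.
values = {
    "_color": "red",
    "_id": 42,
    "is_red": True,
    "is_blue": False,
    "is_green": False,
}

triggers = [
    ("is_red", "record_color", ["_color", "_id"]),
    ("is_blue", "record_color", ["_color", "_id"]),
    ("is_green", "record_color", ["_color", "_id"]),
]

dispatch(triggers, values, record_event)
print(events)  # [('record_color', {'_color': 'red', '_id': 42})]
```

Only `is_red` fires here, so the handler is called once, with the private `_color` and `_id` values passed along as parameters.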