Derived Attributes
A Python library for applying computations to a JSON object using a Subject-Verb-Object grammar.
What does this library do, and why is it useful?
Suppose you have a large, complex JSON (or JSON-like) object. Perhaps it represents one or more medical records, customer records, or financial records.
The object contains data that you want to work with, but not necessarily in its raw form.
It is common, in such a case, to pass the object through a processing layer or ETL job that parses the raw data and performs some operations on it to produce derived attributes, which are the data you actually care about.
For instance, if your JSON object contains a list of customer transactions, one useful derived value might be average_order_value.
This library provides a succinct way of defining and computing these derived attributes. (Essentially, the library becomes your processing layer.) The sentences you define to generate these attributes can be stored and managed in a variety of formats: in CSV files, in a database table, or in the codebase itself.
Example
Suppose you have the following JSON-like object, which contains vendor expense data for multiple businesses:
    source = {
        "records": [
            {
                "business_name": "ABC Electronics",
                "vendors": [
                    {
                        "vendor_name": "Tech Solutions",
                        "has_contract": False,
                        "budget": 15000,
                        "expenses": 8000,
                    },
                    {
                        "vendor_name": "Office Supplies Inc.",
                        "has_contract": True,
                        "budget": 2000,
                        "expenses": 1500,
                    },
                ],
            },
            {
                "business_name": "XYZ Marketing",
                "vendors": [
                    {
                        "vendor_name": "AdvertiseNow",
                        "has_contract": True,
                        "budget": 10000,
                        "expenses": 9000,
                    },
                    {
                        "vendor_name": "Print House",
                        "has_contract": True,
                        "budget": 3000,
                        "expenses": 3000,
                    },
                ],
            },
        ]
    }
Suppose you would like to derive the following attributes based on this data:
- total_vendor_count: The number of vendors across all businesses.
- max_budget_only_contract: The highest budget for vendors with a contract.
- median_used_budget: The median percentage of the monthly budget that has been used.
One approach to computing these derived values might be to normalize the data, create two-dimensional representations via database tables or data frames, then query and aggregate the data using tools like SQL or Pandas.
Derived Attributes instead lets you work with the data in its JSON form (essentially a deeply nested dictionary) by specifying the computations using a Subject-Verb-Object grammar that accepts JSONPath syntax:
Attribute | Subject | Verb | Object |
---|---|---|---|
total_vendor_count | source | parse_len | $.records[*].vendors[*] |
max_budget_only_contract | source | parse_max | $.records[*].vendors[?has_contract == true].budget |
_used_budget | source | parse_list | $.records[*].vendors[*].expenses / $.records[*].vendors[*].budget |
median_used_budget | _used_budget | parse_median | |
When these S-V-O sentences are evaluated, they produce the following derived attributes:

    {
        "total_vendor_count": 4,
        "max_budget_only_contract": 10000,
        "median_used_budget": 0.825,
    }
Note: Attributes prefixed with an underscore are considered private and are useful for holding the results of intermediate calculations. They are not returned.
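To make the arithmetic behind these results concrete, the following plain-Python sketch computes the same three values without the library. It is only a cross-check of the expected output, not a description of how Derived Attributes is implemented; vendor and business names are omitted for brevity.

```python
from statistics import median

# Same data as the example above, reduced to the fields used here.
source = {
    "records": [
        {"vendors": [
            {"has_contract": False, "budget": 15000, "expenses": 8000},
            {"has_contract": True, "budget": 2000, "expenses": 1500},
        ]},
        {"vendors": [
            {"has_contract": True, "budget": 10000, "expenses": 9000},
            {"has_contract": True, "budget": 3000, "expenses": 3000},
        ]},
    ]
}

# Flatten every vendor across all businesses ($.records[*].vendors[*]).
vendors = [v for record in source["records"] for v in record["vendors"]]

total_vendor_count = len(vendors)
max_budget_only_contract = max(v["budget"] for v in vendors if v["has_contract"])

# Intermediate list, playing the role of the private _used_budget attribute.
used_budget = [v["expenses"] / v["budget"] for v in vendors]

derived = {
    "total_vendor_count": total_vendor_count,
    "max_budget_only_contract": max_budget_only_contract,
    "median_used_budget": round(median(used_budget), 3),
}
print(derived)
```

The median falls between the two middle ratios, 0.75 and 0.9, which is where 0.825 comes from.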
For another example of how to use Derived Attributes in a real-world scenario, see examples.
Subject-Verb-Object grammar
In the simple Subject-Verb-Object grammar this library uses:
- The Subject is a reference to a raw value (e.g. the source data), or to another derived attribute.
- The Verb is a unary or binary function to be performed against that value (e.g. an operator or aggregator).
- An optional Object value can be supplied as a second parameter to the Verb function.
Each S-V-O combination forms a simple sentence, the output of which is a Derived Attribute.
The grammar supports nested operations: each Derived Attribute can be used as an input to other sentences.
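As a rough illustration of how such sentences could be evaluated, here is a minimal sketch. It does not use the library; the `VERBS` table, the `evaluate` function, and the tuple layout are hypothetical names chosen for this example, and Object values are treated as plain literals.

```python
from statistics import median

# Hypothetical verb table: each verb receives the Subject value and an Object.
VERBS = {
    "len": lambda subject, obj=None: len(subject),
    "sum": lambda subject, obj=None: sum(subject),
    "median": lambda subject, obj=None: median(subject),
    ">": lambda subject, obj: subject > obj,
}


def evaluate(sentences, raw_values):
    """Evaluate (attribute, subject, verb, object) tuples in order."""
    values = dict(raw_values)  # every name a Subject may refer to
    for attribute, subject, verb, obj in sentences:
        # The Subject is looked up by name, so it can be a raw value
        # or an attribute derived by an earlier sentence.
        values[attribute] = VERBS[verb](values[subject], obj)
    # Return only public derived attributes: drop raw inputs and
    # underscore-prefixed (private) intermediates.
    return {
        name: value
        for name, value in values.items()
        if name not in raw_values and not name.startswith("_")
    }


sentences = [
    ("_order_total", "orders", "sum", None),
    ("order_count", "orders", "len", None),
    ("is_big_spender", "_order_total", ">", 100),  # nests on _order_total
]
result = evaluate(sentences, {"orders": [40, 25, 60]})
print(result)  # {'order_count': 3, 'is_big_spender': True}
```

Evaluating sentences in order is what makes nesting work in this sketch: a sentence can only use attributes defined before it.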
Supported Verbs
Verb | Definition |
---|---|
> | Returns true if the Subject value is greater than the Object value; else false. |
< | Returns true if the Subject value is less than the Object value; else false. |
= | Returns true if the Subject value equals the Object value; else false. |
eq | Returns true if the (non-numeric) Subject value equals the Object value; else false. |
and | Returns true if the Subject value and Object value are both truthy; else false. |
or | Returns true if either the Subject or Object value is truthy; else false. |
len | Returns the length of a list provided as a Subject value. |
sum | Returns the sum of a list of numeric values provided as a Subject value. |
min | Returns the minimum number in a list of numeric values provided as a Subject value. |
max | Returns the maximum number in a list of numeric values provided as a Subject value. |
median | Returns the median of a list of numeric values provided as a Subject value. |
parse | Parses a JSONPath expression that matches a single scalar value and returns that value. |
parse_list | Parses a JSONPath expression that matches a list of values and returns that list. |
parse_len | Returns the number of values that match a JSONPath expression. |
parse_sum | Returns the sum of values that match a JSONPath expression. |
parse_min | Returns the minimum numeric value from all values that match a JSONPath expression. |
parse_max | Returns the maximum numeric value from all values that match a JSONPath expression. |
parse_median | Returns the median numeric value from all values that match a JSONPath expression. |
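The table suggests that each parse_ aggregator verb behaves like parse_list followed by the matching aggregator. The sketch below illustrates that reading; it is an assumption, not confirmed library behavior. The resolver is hand-written, handles only a tiny JSONPath subset (`$`, `.key`, and `[*]`), and all function names are hypothetical.

```python
import re
from statistics import median


def parse_list(data, path):
    """Resolve a tiny JSONPath subset: '$', '.key', and '[*]' only."""
    nodes = [data]
    for key, _wildcard in re.findall(r"\.(\w+)|(\[\*\])", path):
        if key:
            # '.key' steps into the named member of each current node.
            nodes = [node[key] for node in nodes]
        else:
            # '[*]' fans out over every element of each current list.
            nodes = [item for node in nodes for item in node]
    return nodes


# Assumed composition: parse_<agg>(path) == <agg>(parse_list(path)).
AGGREGATORS = {"len": len, "sum": sum, "min": min, "max": max, "median": median}


def parse_and_aggregate(data, verb, path):
    """Handle verbs of the form 'parse_<aggregator>'."""
    aggregator = AGGREGATORS[verb.split("_", 1)[1]]
    return aggregator(parse_list(data, path))


data = {
    "records": [
        {"vendors": [{"budget": 15000}, {"budget": 2000}]},
        {"vendors": [{"budget": 10000}, {"budget": 3000}]},
    ]
}
path = "$.records[*].vendors[*].budget"

budgets = parse_list(data, path)
print(budgets)                                        # [15000, 2000, 10000, 3000]
print(parse_and_aggregate(data, "parse_sum", path))   # 30000
print(parse_and_aggregate(data, "parse_max", path))   # 15000
```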
JSONPath and JSONata syntaxes
By default, the Derived Attributes library uses jsonpath-ng to parse JSONPath expressions.
Please see that project's JSONPath Syntax section for more details about how to construct these expressions.
If you would prefer to use JSONata syntax to query your data, specify parse_jsonata as the Verb.
Because JSONata has its own function library, you can generate some derived attributes in a single step using JSONata syntax that might take multiple steps using JSONPath syntax.
Derived Rules
This library also provides the ability to construct a simple rules engine using similar mechanics to Derived Attributes.
Derived Rules employ the same S-V-O sentence structure for defining rules, but instead of returning the attributes themselves, they treat each sentence that evaluates to True or False as a rule.
This allows flexible implementations that employ built-in Python methods such as:
- any() if at least one of the specified rules needs to match
- all() if all of the specified rules need to match
- sum() for a scorecard approach, where the number of rules that evaluate to True needs to exceed some threshold
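The three combination strategies can be sketched as follows. The rule names, their boolean results, and the threshold are made up for illustration; in practice the booleans would come from evaluating your rule sentences.

```python
# Hypothetical results of evaluating three rule sentences.
rule_results = {
    "has_overdue_invoice": True,
    "exceeds_budget": False,
    "missing_contract": True,
}
outcomes = list(rule_results.values())

matches_any = any(outcomes)  # at least one rule matched
matches_all = all(outcomes)  # every rule matched

# Scorecard: True counts as 1, so sum() gives the number of matching rules.
score = sum(outcomes)
THRESHOLD = 1  # hypothetical cutoff; the score must exceed it
passes_scorecard = score > THRESHOLD

print(matches_any, matches_all, score, passes_scorecard)  # True False 2 True
```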
For an example of how to use Derived Rules in a real-world scenario, see examples.
Derived Triggers
When you want to trigger events based on the evaluated data, you can use Derived Triggers.
A trigger uses the same S-V-O sentence structure as Derived Attributes, but it also supports two additional inputs: an event name that should be sent to a supplied event handler, and a list of parameters that should be included in the event.
For example, consider the following list of triggers:
Attribute | Subject | Verb | Object | Action | Params |
---|---|---|---|---|---|
_color | source | parse | $.color | | |
_id | source | parse | $.id | | |
is_red | _color | = | red | record_color | ["_color", "_id"] |
is_blue | _color | = | blue | record_color | ["_color", "_id"] |
is_green | _color | = | green | record_color | ["_color", "_id"] |
When a trigger evaluates to True, an action name and optional parameters are passed to an event handler, which can further process the event.
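The dispatch step might look roughly like the sketch below. It assumes the sentences have already been evaluated into a dictionary of values; `run_triggers`, the trigger tuple layout, and the handler signature `handler(action, params)` are hypothetical and may differ from the library's actual interface.

```python
def run_triggers(triggers, values, handler):
    """Call handler(action, params) for each trigger that evaluated to True."""
    for attribute, action, param_names in triggers:
        if values.get(attribute) is True and action:
            # Look up each named parameter among the evaluated attributes.
            params = {name: values[name] for name in param_names}
            handler(action, params)


# Values as they might look after evaluating the table above
# against the source object {"color": "blue", "id": 7}.
values = {
    "_color": "blue",
    "_id": 7,
    "is_red": False,
    "is_blue": True,
    "is_green": False,
}
triggers = [
    ("is_red", "record_color", ["_color", "_id"]),
    ("is_blue", "record_color", ["_color", "_id"]),
    ("is_green", "record_color", ["_color", "_id"]),
]

events = []
run_triggers(triggers, values, lambda action, params: events.append((action, params)))
print(events)  # [('record_color', {'_color': 'blue', '_id': 7})]
```

Only is_blue evaluates to True here, so the handler receives a single record_color event.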
For an example of how to use Derived Triggers in a real-world scenario, see examples.
Transform Objects
In some cases, you don't need to derive new attributes based on a JSON-like source object; instead, you need to apply modifications directly to the source object (or a copy of that object).
Using a Subject-Verb-Object grammar similar to Derived Attributes, the TransformObject class allows you to specify which modifications to perform using JSONPath syntax.
For this class:
- The Subject is a JSONPath query that points to one or more nodes in the source object.
- The Verb is a function to be performed on the specified node(s).
- The Object, if required by the Verb, is a parameter supplied to the Verb function.
The class can be initialized with in_place=False, which will return a new object, or with in_place=True, which will directly modify the source object.
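The difference between the two modes can be sketched in plain Python. The `apply_transform` helper below is hypothetical and does not use TransformObject; in particular, the use of a deep copy for in_place=False is an assumption about how a new object would be produced.

```python
import copy


def apply_transform(source, transform, in_place=False):
    """Run transform on source itself or on a deep copy, then return the result."""
    target = source if in_place else copy.deepcopy(source)
    transform(target)
    return target


def mark_reviewed(obj):
    """Example modification: flip a nested flag."""
    obj["meta"]["reviewed"] = True


original = {"meta": {"reviewed": False}}

# in_place=False: the original is left untouched and a new object is returned.
copied = apply_transform(original, mark_reviewed, in_place=False)
print(original["meta"]["reviewed"], copied["meta"]["reviewed"])  # False True

# in_place=True: the original itself is modified and returned.
same = apply_transform(original, mark_reviewed, in_place=True)
print(same is original, original["meta"]["reviewed"])  # True True
```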
Supported Transform Verbs
Transform Verb | Definition |
---|---|
replace_vals | Replaces the value of the specified node(s) with the Object value. |
remove_nodes | Removes the specified node(s). No Object value required. |
add_node | Adds the specified node and assigns it the Object value. |
add_to_list | Appends the Object value to the list specified by the query. |
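In plain Python terms, the four verbs correspond roughly to the dictionary and list operations below. This is only an analogy on a hypothetical record; the library selects the target nodes with a JSONPath Subject, which is not shown here.

```python
record = {"name": "ABC Electronics", "tags": ["retail"], "legacy_id": 17}

# replace_vals: overwrite the value at an existing node.
record["name"] = "ABC Electronics Ltd."

# remove_nodes: delete a node entirely (no Object value needed).
del record["legacy_id"]

# add_node: create a new node and assign it the Object value.
record["region"] = "EMEA"

# add_to_list: append the Object value to an existing list node.
record["tags"].append("priority")

print(record)
# {'name': 'ABC Electronics Ltd.', 'tags': ['retail', 'priority'], 'region': 'EMEA'}
```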