A Python module that aids filtering, formatting, and transforming JSON-like objects
JTools
JTools is a robust library for interacting with JSON-like objects, focusing on providing an easy way to filter, format, and extract fields from JSON-like data.
A companion to the JavaScript version of this package: @blending_jake/jtools (https://www.npmjs.com/package/@blending_jake/jtools). The JavaScript version supports almost exactly the same specials, filters, and formatting specification, so that moving between accessing, filtering, and formatting in JavaScript and in Python is seamless. The goal is for the two versions to behave as identically as possible.
Changelog
- 1.0.6
  - Add === and !== to match the strict equality checking needed in the JS version. The methods seq and sne have been added to Key to correspond with the new filters. In the Python version, === is the same as == and !== is the same as !=.
  - Rename null -> !present and !null -> present. The corresponding methods have been renamed to not_present and present. This filter will catch values that are null or undefined.
  - Make membership filters (in, contains, !in, and !contains) work properly with strings, lists, dicts, and sets.
  - Remove $datetime. See below for replacement.
  - Add $call and $attr for calling a function and accessing an attribute. These can be used to replace the $datetime functionality.
  - Remove Formatter.format and add Formatter.single and Formatter.many to be consistent with the other classes and to support formatting arrays of items.
  - Add more tests to increase coverage and do basic performance testing.
- 1.0.5
  - Query strings can now start with specials to allow operations on the entire object being passed.
  - Bug fixes and more unit tests
- 1.0.4
  - Added new specials, mostly relating to time
    - $parse_timestamp
    - $datetime
    - $strptime
    - $strftime
  - Added not filtering and the interval and !interval operators
  - Made Filter consistent with Getter by removing .filter() and adding .single() and .many()
  - Added fallback to Getter
  - Added numerous unit tests
- 1.0.3
  - Rename Getter.get to Getter.single
  - Add Getter.many
  - Support getting multiple fields at once by changing Getter to allow Getter(<field>) and Getter([<field>, <field>, ...])
  - Change the behavior of Filter when there are no filters. Now, by default, all items will be returned unless Filter(..., empty_filters_response=False) is used.
Installation
pip install jtools
# import
from jtools import Getter, Filter, Key, Condition, Formatter
Getter
Getter on the surface is very simple: you give it a field query string (or several) and it returns the value (or values) at that path (or paths) from a given item or list of items. Example: Getter("name").single({"name": "John"}) will return "John". However, there are many more features, such as dot-notation, transforming values with specials, and drilling down into lists. Below is a fuller list of the features.
- .single(item) can be used to get field(s) from a single item, and .many(items) can be used to get field(s) from a list of items.
- Multiple fields can be retrieved at once by passing a list of query strings, like Getter(["name", "age"]). Resulting values from .single and .many will be lists of corresponding length.
- Dot-notation is supported and can be used to access nested values. For example, meta.id can be used to get the id field from the item {"meta": {"id": 1}}, resulting in the value 1.
- Integer paths can be used to index lists as long as Getter(..., convert_ints=True) is used; convert_ints is True by default. This allows paths like friends.0. However, convert_ints=False should be used when accessing fields whose keys are strings containing digits, like {"index": {"0": ...}}.
- Specials can be used to transform the queried value, and multiple specials can be used back to back, with the output of one being used in the next. Specials are included in the field path and prefixed with $. For example, if you have {"long_number": 3.1415926}, you can use long_number.$round to round it to 2 decimal places, returning 3.14.
- Arguments can be passed into these specials! For example, if you have {"email": "john_doe@gmail.com"} and you want to get just the email provider, then email.$split("@").$index(-1) can be used, which will return gmail.com. Equally, email.$split("@").1 could be used.
- Arguments can be anything that can be represented in JSON. Note: JSON requires strings to be double-quoted, so email.$split('@') would not work and email.$split("@") would have to be used instead.
- You don't have to use () at the end of a special if there aren't any arguments, or if the default arguments are acceptable.
- More specials can be added by calling .register_special() on the class, like so: Getter.register_special(<name>, <func>). The function should take at least one argument, which is the current value in the query string: lambda value, *args: ...
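The path rules in the list above (dot-notation plus optional integer conversion) can be illustrated with a small stand-alone sketch. The helper get_path below is hypothetical and is not part of JTools; it only mimics how a path such as friends.0 might be walked, it ignores specials entirely, and the way it treats all-digit segments is an assumption based on the description of convert_ints.

```python
from typing import Any


def get_path(item: Any, path: str, convert_ints: bool = True) -> Any:
    """Walk a dot-separated path through nested dicts and lists.

    Hypothetical helper for illustration only; not the JTools implementation.
    """
    current = item
    for part in path.split("."):
        # Assumption: with convert_ints enabled, every all-digit segment
        # becomes an int, which suits lists but not string keys like "0".
        key = int(part) if convert_ints and part.isdigit() else part
        current = current[key]
    return current


data = {"meta": {"id": 1}, "friends": ["Ann", "Bob"], "index": {"0": "zero"}}

print(get_path(data, "meta.id"))                       # 1
print(get_path(data, "friends.0"))                     # Ann
print(get_path(data, "index.0", convert_ints=False))   # zero
```

In this sketch, looking up index.0 with convert_ints left at True raises a KeyError, because the dictionary key is the string "0" and not the integer 0. That is the situation the convert_ints=False advice is meant to avoid.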
Specials
General
$length -> int
Maps
$keys -> list
$values -> list
$items -> List[tuple]
Type Conversions
$set -> set
$float -> float
$string -> str
$int -> int
$not -> bool: Returns !value, that is, the logical negation of the value
$fallback(fallback) -> value or fallback: If the value is None, then it will be replaced with fallback.
$ternary(if_true, if_false, strict=False) -> any: Return if_true if the value is truthy; otherwise, return if_false. Pass True for strict if the value must be True and not just truthy.
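The descriptions of $fallback and $ternary can be read as two small functions. The sketch below is an assumption about their behaviour drawn only from the wording above; the function names are hypothetical and the real specials may differ in detail. Both follow the value-first signature that register_special expects.

```python
from typing import Any


def fallback(value: Any, default: Any) -> Any:
    # Only None is replaced; other falsy values such as 0 are kept.
    return default if value is None else value


def ternary(value: Any, if_true: Any, if_false: Any, strict: bool = False) -> Any:
    # In strict mode only the literal True counts, not any truthy value.
    condition = (value is True) if strict else bool(value)
    return if_true if condition else if_false


print(fallback(None, "n/a"))                    # n/a
print(fallback(0, "n/a"))                       # 0
print(ternary(1, "yes", "no"))                  # yes
print(ternary(1, "yes", "no", strict=True))     # no
```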
Datetime
$parse_timestamp -> datetime: Take a Unix timestamp in seconds and return a corresponding datetime object
$strptime(fmt=None) -> datetime: Parse a datetime string and return a corresponding datetime object. If fmt=None, then common formats will be tried. Refer to https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior for formatting instructions
$timestamp -> float: Dump a datetime object to a UTC timestamp as a float
$strftime(fmt="%Y-%m-%dT%H:%M:%SZ") -> str: Format a datetime object as a string using fmt. Refer to https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior for formatting instructions
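For orientation, the datetime specials correspond closely to operations in Python's standard datetime module. The sketch below uses only the standard library and assumes UTC throughout; whether $parse_timestamp itself produces a UTC or a local datetime is not stated above, so treat the timezone handling here as an assumption.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"  # the default format listed for $strftime

ts = 1_600_000_000

# Roughly what $parse_timestamp describes: seconds since the epoch -> datetime.
moment = datetime.fromtimestamp(ts, tz=timezone.utc)

# Roughly what $strftime describes: datetime -> formatted string.
text = moment.strftime(FMT)
print(text)  # 2020-09-13T12:26:40Z

# Roughly what $strptime and $timestamp describe: string -> datetime -> float.
parsed = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
print(parsed.timestamp())  # 1600000000.0
```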
Math / Numbers
$add(num) -> Union[int, float]
$subtract(num) -> Union[int, float]
$multiply(num) -> Union[int, float]
$divide(num) -> float
$pow(num) -> Union[int, float]
$abs(num) -> Union[int, float]
$distance(other) -> float: Euclidean distance in N dimensions
$math(attr) -> any: Returns math.<attr>(value), which can be used for operations like floor, cos, sin, etc.
$round(n=2) -> float
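A few of the math specials map directly onto standard-library calls. The following sketch shows plain-Python equivalents; the helper names are hypothetical, and reading $distance as a Euclidean distance between two equal-length coordinate sequences is an assumption.

```python
import math
from typing import Sequence


def euclidean(value: Sequence[float], other: Sequence[float]) -> float:
    # Square root of the summed squared differences, for any number of dimensions.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(value, other)))


def math_special(value: float, attr: str):
    # Mirrors the documented behaviour: math.<attr>(value).
    return getattr(math, attr)(value)


print(round(3.1415926, 2))               # 3.14
print(math_special(3.1415926, "floor"))  # 3
print(euclidean((0, 0), (3, 4)))         # 5.0
```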
Strings
$prefix(prefix) -> str: Prefix the value with the specified string
$suffix(suffix) -> str: Concatenate a string to the end of the value
$strip -> str: Strip leading and trailing whitespace
$replace(old, new) -> str: Replace all occurrences of a string
$trim(length=50, suffix="...") -> str: Trim the length of a string
$split(on=" ") -> List[str]: Split a string
Lists
$sum -> Union[float, int]: Return the sum of the items in the value
$join(sep=", ") -> str: Join a list using the specified separator
$index(index) -> any: Index a list. Negative indices are allowed.
$range(start, end=None) -> list: Get a sublist. Defaults to value[start:], but an end value can be specified. Negative indices are allowed.
$map(special, *args) -> list: Apply special to every element in the value. Arguments can be passed through to the special being used.
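To make the behaviour of $range and $map concrete, here is a minimal stand-alone sketch. The SPECIALS dictionary and map_special function are hypothetical stand-ins for the library's internal registry, written only from the descriptions above.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: special name -> function taking the value first.
SPECIALS: Dict[str, Callable[..., Any]] = {
    "join": lambda value, sep=", ": sep.join(value),
    "index": lambda value, index: value[index],
    "range": lambda value, start, end=None: value[start:end],
    "sum": lambda value: sum(value),
}


def map_special(value: List[Any], special: str, *args: Any) -> List[Any]:
    # Look the special up by name and apply it to each element,
    # passing any extra arguments straight through.
    func = SPECIALS[special]
    return [func(element, *args) for element in value]


pairs = [["a", "b"], ["c", "d"]]
print(map_special(pairs, "join", "-"))           # ['a-b', 'c-d']
print(SPECIALS["range"]([1, 2, 3, 4], 1))        # [2, 3, 4]
print(SPECIALS["range"]([1, 2, 3, 4], -3, -1))   # [2, 3]
```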
Attributes
$call(func, *args) -> any: Call a function that is on the current value, implemented as getattr(value, func)(*args)
$attr(attr) -> any: Access an attribute of the given object, implemented as getattr(value, attr)
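Since both specials are described in terms of getattr, their effect can be reproduced in a few lines of plain Python. The wrapper names call and attr below are hypothetical; only the getattr expressions come from the descriptions above.

```python
from datetime import datetime
from typing import Any


def call(value: Any, func: str, *args: Any) -> Any:
    # Same expression as documented for $call.
    return getattr(value, func)(*args)


def attr(value: Any, name: str) -> Any:
    # Same expression as documented for $attr.
    return getattr(value, name)


moment = datetime(2020, 1, 15, 8, 30)
print(attr(moment, "year"))                   # 2020
print(call(moment, "strftime", "%Y-%m-%d"))   # 2020-01-15
print(call("a,b", "split", ","))              # ['a', 'b']
```

This is also why the pair can stand in for the removed $datetime special: any attribute or method of a datetime object is reachable through them.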
Formatter
Formatter allows fields to be taken from an object and then formatted into a string. The basic usage is Formatter(<spec>).single(<item>), although .many exists as well. Fields to be replaced should be wrapped in {{}} and any valid field query string for Getter can be used. For example, Formatter('Name: {{name}}').single({"name": "John Smith"}) results in Name: John Smith. Below are some specific details.
- The field specifications from Getter are valid here, so the above example could instead be 'First Name: {{name.$split(" ").0}}' to get First Name: John.
- Field paths can be nested. This allows values from one field to be passed as the arguments to another, allowing complex queries. For example, Formatter("Balance: ${{ balance.$subtract({{ pending_charges }}) }}").single({"balance": 1000, "pending_charges": 250}) results in Balance: $750.
- Whitespace is allowed inside of the curly braces before and after the field query string. {{ a }} is just as valid as {{a}}.
- IMPORTANT: Nested fields that return strings which are then used as arguments must be manually double-quoted. For example, let's say we want to replace the domain gmail with <domain> in item = {"email": "john_doe@gmail.com"}. We can determine the current domain with email.$split("@").1.$split(".").0, and then pass that as an argument into $replace. To do so, we need to surround the nested field with double quotes so it will be properly recognized as an argument to the $replace special: Formatter('Generic Email: {{ email.$replace("{{ email.$split("@").1.$split(".").0 }}", "<domain>") }}').single(item)
- IMPORTANT: Pay attention when using f-strings with Formatter, as f"{{field}}" becomes "{field}". If you have to use an f-string, then you'll need to escape the braces with another brace, so f"{{{{field}}}}" becomes "{{field}}".
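The f-string caveat is plain Python behaviour and can be checked without JTools. The sketch below only builds the spec strings that would be handed to Formatter; the variable names are illustrative.

```python
field = "name"

plain_spec = "Name: {{name}}"           # ordinary string, braces untouched
broken_spec = f"Name: {{name}}"         # f-string collapses {{ }} to { }
escaped_spec = f"Name: {{{{name}}}}"    # four braces survive as two
dynamic_spec = f"Name: {{{{{field}}}}}" # two escaped pairs around {field}

print(plain_spec)    # Name: {{name}}
print(broken_spec)   # Name: {name}
print(escaped_spec)  # Name: {{name}}
print(dynamic_spec)  # Name: {{name}}
```

When the field name is itself interpolated, the count is five braces on each side: two escaped pairs for the literal {{ }} plus one pair for the f-string placeholder.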
Example (flattening operations):
errors = {
    "errors": {
        "Process Error": "Could not communicate with the subprocess",
        "Connection Error": "Could not connect with the database instance"
    }
}
Formatter('{{errors.$items.$map("join", ": \\n\\t").$join("\\n")}}').single(errors)
# Process Error:
#     Could not communicate with the subprocess
# Connection Error:
#     Could not connect with the database instance
The above example shows a powerful use of flattening: errors is turned into its items, each item is joined so that the error name and its message sit on separate lines, and then all of the errors are joined together.
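For comparison, the same flattening can be written by hand in plain Python. This is only an equivalent of what the $items, $map("join", ...) and $join chain is described as doing, not the library's code.

```python
errors = {
    "errors": {
        "Process Error": "Could not communicate with the subprocess",
        "Connection Error": "Could not connect with the database instance",
    }
}

# $items: (name, message) pairs; $map("join", ": \n\t"): join each pair;
# $join("\n"): join the resulting strings into one report.
lines = [": \n\t".join(pair) for pair in errors["errors"].items()]
report = "\n".join(lines)
print(report)
```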
Example (nested replacement):
item = {
    "x1": 1,
    "y1": 1,
    "x2": 12,
    "y2": 54
}
Formatter(
    "Midpoint: [{{x2.$subtract({{x1}}).$divide(2)}}, {{y2.$subtract({{y1}}).$divide(2)}}]"
).single(item)
# Midpoint: [5.5, 26.5]
Additionally, formatting is fast. The above statement can be performed 10,000 times in around 0.75 seconds.
Filter
Filter takes the field querying capabilities of Getter and combines them with filtering conditions to allow lists of items to be filtered down to just those of interest. The basic usage is: Filter(<filters>).many(<list of items>), although .single can also be used to get a boolean answer of whether the item matches the filter or not. The filters can be manually built, or the Key and Condition classes can be used to simplify your code.
Filter Schema:
[
{"field": <field>, "operator": <op>, "value": <value>},
OR
{"or": <nested outer structure>},
OR
{"not": <nested outer structure>},
...
]
<field>: anything Getter accepts
<op>: See list below
<value>: Anything that makes sense for the operator
Note on or:
{"or": [
    [ {filter1}, {filter2} ],
    {filter3}
]}
is the same as (filter1 AND filter2) OR filter3. Nesting a list inside an or causes those filters to be AND'd together, and then everything at the top level of that or is OR'd.
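The AND/OR/NOT nesting rules of the schema can be expressed as a short recursive function. The sketch below is a hypothetical evaluator written from the schema description, not the library's Filter: it supports only six comparison operators and reads top-level fields directly instead of going through Getter.

```python
import operator
from typing import Any, Callable, Dict, Union

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}


def matches(item: dict, node: Union[dict, list]) -> bool:
    if isinstance(node, list):
        # A list of filters means every entry must match (AND).
        return all(matches(item, child) for child in node)
    if "or" in node:
        # Each entry of an "or" is an alternative (OR).
        return any(matches(item, child) for child in node["or"])
    if "not" in node:
        return not matches(item, node["not"])
    # Plain condition: compare the field's value using the named operator.
    return OPS[node["operator"]](item[node["field"]], node["value"])


filters = [
    {"or": [
        [
            {"field": "age", "operator": ">=", "value": 18},
            {"field": "state", "operator": "==", "value": "Texas"},
        ],
        {"field": "vip", "operator": "==", "value": True},
    ]}
]
people = [
    {"name": "Ann", "age": 20, "state": "Texas", "vip": False},
    {"name": "Bob", "age": 20, "state": "Ohio", "vip": False},
    {"name": "Cy", "age": 10, "state": "Ohio", "vip": True},
]
kept = [person["name"] for person in people if matches(person, filters)]
print(kept)  # ['Ann', 'Cy']
```

Here the filter reads as (age >= 18 AND state == "Texas") OR vip == True, which is the same shape as the (filter1 AND filter2) OR filter3 note.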
Operators:
>
<
>=
<=
==
!=
===: same as ==
!==: same as !=
in: <field> in <value>
!in
contains: <value> in <field>
!contains
interval: <field> in interval [value[0], value[1]] (closed/inclusive interval)
!interval: <field> not in interval [value[0], value[1]]
startswith
endswith
present
!present
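The direction of in versus contains, and the inclusive bounds of interval, are easy to mix up. The functions below are hypothetical plain-Python readings of the operator descriptions, with field standing for the value fetched from the item and value for the filter's value.

```python
from typing import Any, Sequence


def op_in(field: Any, value: Any) -> bool:
    # in: the field's value must be a member of the filter value.
    return field in value


def op_contains(field: Any, value: Any) -> bool:
    # contains: the filter value must be a member of the field's value.
    return value in field


def op_interval(field: Any, value: Sequence[Any]) -> bool:
    # interval: closed interval, so both end points match.
    low, high = value
    return low <= field <= high


print(op_in("TX", ["TX", "NY"]))             # True
print(op_contains(["admin", "dev"], "dev"))  # True
print(op_contains("john_doe", "doe"))        # True
print(op_interval(10, [1, 10]))              # True
print(op_interval(11, [1, 10]))              # False
```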
Key
Intended to simplify having to write {"field": <field>, "operator": <operator>, "value": <value>} a lot. The basic usage is: Key(<field>).<op>(<value>), or for the first six operators, the actual Python operators can be used, so Key(<field>) <op> <value>. For example: Key("meta.id").eq(12) is the same as Key("meta.id") == 12, which is the same as {"field": "meta.id", "operator": "==", "value": 12}.
Operators:
underlying operator | Key function | Python operator |
---|---|---|
> | gt | > |
< | lt | < |
<= | lte | <= |
>= | gte | >= |
== | eq | == |
!= | ne | != |
=== | seq | N/A |
!== | sne | N/A |
in | in_ | N/A |
!in | nin | N/A |
contains | contains | N/A |
!contains | not_contains | N/A |
interval | interval | N/A |
!interval | not_interval | N/A |
startswith | startswith | N/A |
endswith | endswith | N/A |
present | present | N/A |
!present | not_present | N/A |
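The Key shorthand relies on Python's operator overloading. The class below, KeySketch, is a hypothetical miniature that shows the idea for a handful of operators; it returns plain dictionaries, whereas the real Key presumably returns objects that can also be combined with Condition.

```python
from typing import Any, Dict


class KeySketch:
    """Hypothetical stand-in for Key; builds filter dictionaries."""

    def __init__(self, field: str) -> None:
        self.field = field

    def _build(self, op: str, value: Any) -> Dict[str, Any]:
        return {"field": self.field, "operator": op, "value": value}

    # Named methods, matching the "Key function" column.
    def eq(self, value: Any) -> Dict[str, Any]:
        return self._build("==", value)

    def gte(self, value: Any) -> Dict[str, Any]:
        return self._build(">=", value)

    def in_(self, value: Any) -> Dict[str, Any]:
        return self._build("in", value)

    # Overloaded Python operators delegate to the named methods.
    def __eq__(self, value: Any) -> Dict[str, Any]:  # type: ignore[override]
        return self.eq(value)

    def __ge__(self, value: Any) -> Dict[str, Any]:
        return self.gte(value)


print(KeySketch("meta.id").eq(12))
# {'field': 'meta.id', 'operator': '==', 'value': 12}
print(KeySketch("age") >= 18)
# {'field': 'age', 'operator': '>=', 'value': 18}
```

Operators such as in have no overloadable form that can return a dictionary, which is consistent with the N/A entries in the table.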
Condition
Intended to be used in combination with Key to make creating filters easier than manually creating the JSON. There are three conditions supported: and, or, and not. They can be manually accessed via and_(*args), or_(*args), and not_(), or the overloaded operators &, |, and ~ can be used, respectively.
Caution: & and | bind tighter than the comparison operators, and ~ binds the tightest.
Key("first_name") == "John" | Key("first_name") == "Bill"
is actually parsed as the chained comparison
Key("first_name") == ("John" | Key("first_name")) == "Bill"
, not
(Key("first_name") == "John") | (Key("first_name") == "Bill")
Wrap each comparison in parentheses when combining comparisons with & or |.
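The precedence pitfall can be confirmed with Python's own parser, without importing JTools. The sketch below uses the standard ast module to inspect how the two spellings are grouped; nothing is executed, so Key does not need to exist.

```python
import ast

risky = 'Key("first_name") == "John" | Key("first_name") == "Bill"'
safe = '(Key("first_name") == "John") | (Key("first_name") == "Bill")'

risky_tree = ast.parse(risky, mode="eval").body
safe_tree = ast.parse(safe, mode="eval").body

# Without parentheses the whole expression is one chained comparison
# whose middle operand is the bitwise-or of "John" and the second Key.
print(type(risky_tree).__name__)                 # Compare
print(len(risky_tree.ops))                       # 2
print(type(risky_tree.comparators[0]).__name__)  # BinOp

# With parentheses the top-level node is the | operator, as intended.
print(type(safe_tree).__name__)                  # BinOp
print(type(safe_tree.op).__name__)               # BitOr
```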
Examples
Key("state").eq("Texas") | Key("city").eq("New York")
(Key("gender") == "male") & (Key("age") >= 18) & (Key("selective_service") == False)
Key('creation_time.$parse_timestamp.$attr("year")').lt(2005).or_(
    Key('creation_time.$parse_timestamp.$attr("year")').gt(2015)
).and_(
    Key("product_id") == 15
)
# (year < 2005 OR year > 2015) AND product_id == 15
Performance
There are several ways to increase the performance of filtering and getting. The query strings within filters or those being passed directly to a Getter are parsed when the object is created. This means that using a Getter or Filter object multiple times will be faster than creating a new object every time. For example:
# slower
for item in items:
    f = Getter("timestamp.$parse_timestamp").single(item)
    # do other stuff
# faster
getter = Getter("timestamp.$parse_timestamp")
for item in items:
    f = getter.single(item)
    # do other stuff
Specifically, reusing a Getter can improve performance by 7-8x and reusing a Filter can improve it by 5-6x.
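The reason reuse helps is that parsing happens once per object rather than once per item. The sketch below makes that visible with a hypothetical CountingGetter class that counts how often a query string is parsed; it is a toy model of the parse-on-construction behaviour described above and says nothing about the actual speed-up figures.

```python
from typing import Any, List


class CountingGetter:
    """Toy stand-in for Getter that records how often parsing happens."""

    parse_calls = 0

    def __init__(self, query: str) -> None:
        # Parsing is done once, when the object is created.
        CountingGetter.parse_calls += 1
        self.parts: List[str] = query.split(".")

    def single(self, item: Any) -> Any:
        value = item
        for part in self.parts:
            value = value[part]
        return value


items = [{"meta": {"id": n}} for n in range(1000)]

# Slower pattern: a new object, and therefore a new parse, per item.
CountingGetter.parse_calls = 0
slow = [CountingGetter("meta.id").single(item) for item in items]
slow_parses = CountingGetter.parse_calls

# Faster pattern: parse once, reuse for every item.
CountingGetter.parse_calls = 0
getter = CountingGetter("meta.id")
fast = [getter.single(item) for item in items]
fast_parses = CountingGetter.parse_calls

print(slow_parses, fast_parses)  # 1000 1
```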