Container for finding Python objects by matching attributes. Stores objects by attribute value for fast lookup.
FilterBox
Container for finding Python objects by matching attributes.
Finds are very fast. Finding objects using FilterBox can be 5-10x faster than SQLite.
pip install filterbox
Usage:
Find which day will be good for flying a kite. It needs to be windy and sunny.
from filterbox import FilterBox
days = [
{'day': 'Saturday', 'wind_speed': 1, 'sky': 'sunny',},
{'day': 'Sunday', 'wind_speed': 3, 'sky': 'rainy'},
{'day': 'Monday', 'wind_speed': 7, 'sky': 'sunny'},
{'day': 'Tuesday', 'wind_speed': 9, 'sky': 'rainy'}
]
def is_windy(obj):
    return obj['wind_speed'] > 5
# make a FilterBox
fb = FilterBox(
    days,                 # add objects of any Python type
    on=[is_windy, 'sky']  # functions + attributes to find by
)
# find objects by function and / or attribute values
fb.find({is_windy: True, 'sky': 'sunny'})
# result: [{'day': 'Monday', 'wind_speed': 7, 'sky': 'sunny'}]
There are two classes available.
- FilterBox: can add() and remove() objects after creation.
- FrozenFilterBox: faster finds, lower memory usage, and immutable.
More Examples
Match and exclude multiple values
from filterbox import FilterBox
objects = [
{'item': 1, 'size': 10, 'flavor': 'melon'},
{'item': 2, 'size': 10, 'flavor': 'lychee'},
{'item': 3, 'size': 20, 'flavor': 'peach'},
{'item': 4, 'size': 30, 'flavor': 'apple'}
]
fb = FilterBox(objects, on=['size', 'flavor'])
fb.find(
match={'size': [10, 20]}, # match anything with size in [10, 20]
exclude={'flavor': ['lychee', 'peach']} # where flavor is not in ['lychee', 'peach']
)
# result: [{'item': 1, 'size': 10, 'flavor': 'melon'}]
Accessing nested data using functions
Use functions to get values from nested data structures.
from filterbox import FilterBox
objs = [
{'a': {'b': [1, 2, 3]}},
{'a': {'b': [4, 5, 6]}}
]
def get_nested(obj):
    return obj['a']['b'][0]
fb = FilterBox(objs, [get_nested])
fb.find({get_nested: 4})
# result: [{'a': {'b': [4, 5, 6]}}]
Greater than, less than
Suppose you need to find objects where x >= some number. If the number is constant, a function that returns obj.x >= constant will work.
Otherwise, FilterBox and FrozenFilterBox have a method get_values(attr) which gets the set of unique values for an attribute.
Here's how to use it to find objects having x >= 3.
from filterbox import FilterBox
data = [{'x': i} for i in [1, 1, 2, 3, 5]]
fb = FilterBox(data, ['x'])
vals = fb.get_values('x') # get the set of unique values: {1, 2, 3, 5}
big_vals = [x for x in vals if x >= 3] # big_vals is [3, 5]
fb.find({'x': big_vals}) # result: [{'x': 3}, {'x': 5}]
If x is a float or has many unique values, consider making a function on x that rounds it or puts it into a bin of similar values. Discretizing x in this way will make lookups faster.
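As a rough illustration of the binning idea, the sketch below defines a hypothetical helper, x_bin, which is not part of filterbox. It uses plain Python only and simply counts how many distinct keys remain once each x is mapped to the lower edge of a width-10 bin. The bin width and the sample data are assumptions made for the example.

```python
def x_bin(obj):
    """Hypothetical helper: map x to the lower edge of its width-10 bin."""
    return int(obj['x'] // 10) * 10

# 200 distinct float values between 0 and 73.63
data = [{'x': i * 0.37} for i in range(200)]

unique_raw = {obj['x'] for obj in data}        # one key per distinct float
unique_binned = {x_bin(obj) for obj in data}   # one key per bin

print(len(unique_raw))        # 200
print(sorted(unique_binned))  # [0, 10, 20, 30, 40, 50, 60, 70]
```

A function like x_bin can be listed in on in the same way as is_windy in the first example, so a range query only has to enumerate a handful of bin values instead of every distinct float.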
Handling missing attributes
Objects don't need to have every attribute.
- Objects that are missing an attribute will not be stored under that attribute. This saves lots of memory.
- To find all objects that have an attribute, match the special value ANY.
- To find objects missing the attribute, exclude ANY.
- In functions, raise MissingAttribute to tell FilterBox that the object is missing the attribute.
Example:
from filterbox import FilterBox, ANY
from filterbox.exceptions import MissingAttribute
def get_a(obj):
    try:
        return obj['a']
    except KeyError:
        raise MissingAttribute  # tell FilterBox this attribute is missing
objs = [{'a': 1}, {'a': 2}, {}]
fb = FilterBox(objs, ['a', get_a])
fb.find({'a': ANY}) # result: [{'a': 1}, {'a': 2}]
fb.find({get_a: ANY}) # result: [{'a': 1}, {'a': 2}]
fb.find(exclude={'a': ANY}) # result: [{}]
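One way to picture the behavior above is a toy index in plain Python. This is only a sketch under simplifying assumptions, not FilterBox's actual implementation: objects are tracked by list position, and the names index_a, has_a, and missing_a are made up for the example. Objects without 'a' get no entry in the index, matching ANY is the union of every stored set, and excluding ANY is everything else.

```python
objs = [{'a': 1}, {'a': 2}, {}]

# value -> positions of the objects that have that value for 'a'.
# Objects without 'a' are never stored, so they cost no index memory.
index_a = {}
for pos, obj in enumerate(objs):
    if 'a' in obj:
        index_a.setdefault(obj['a'], set()).add(pos)

has_a = set().union(*index_a.values())      # analogous to matching ANY
missing_a = set(range(len(objs))) - has_a   # analogous to excluding ANY

print([objs[i] for i in sorted(has_a)])      # [{'a': 1}, {'a': 2}]
print([objs[i] for i in sorted(missing_a)])  # [{}]
```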
Recipes
- Auto-updating - Keep FilterBox updated when attribute values change
- Wordle solver - Solve string matching problems faster than regex
- Collision detection - Find objects based on type and proximity (grid-based)
- Percentiles - Find by percentile (median, p99, etc.)
API documentation:
How it works
For every attribute in FilterBox, it holds a dict that maps each unique value to the set of objects with that value.
This is the rough idea of the FilterBox data structure:
FilterBox = {
    'attribute1': {val1: set(some_objs), val2: set(other_objs)},
    'attribute2': {val3: set(some_objs), val4: set(other_objs)}
}
During find(), the object sets matching each query value are retrieved. Then set operations like union, intersect, and difference are applied to get the final result.
That's a simplified version; for much more detail, see the "how it works" pages for FilterBox and FrozenFilterBox.
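The dict-of-sets idea can be sketched in a few lines of plain Python. ToyIndex below is a hypothetical class written for illustration and is not FilterBox's real code: it handles only dict objects and string attribute names, and it keys objects by id() because dicts are not hashable. It does show the three set operations at work: a union over the values allowed for one attribute, an intersection across attributes, and a difference for exclusions.

```python
from collections import defaultdict

class ToyIndex:
    """Toy inverted index: attribute -> value -> set of object ids."""

    def __init__(self, objs, on):
        self.objs = {id(o): o for o in objs}  # dicts are unhashable, so key by id()
        self.index = {attr: defaultdict(set) for attr in on}
        for obj_id, obj in self.objs.items():
            for attr in on:
                if attr in obj:  # objects missing the attribute are not stored
                    self.index[attr][obj[attr]].add(obj_id)

    def find(self, match=None, exclude=None):
        hits = set(self.objs)
        for attr, values in (match or {}).items():
            # union over the allowed values, then intersect with earlier hits
            allowed = set().union(*(self.index[attr].get(v, set()) for v in values))
            hits &= allowed
        for attr, values in (exclude or {}).items():
            for v in values:
                hits -= self.index[attr].get(v, set())  # set difference
        # return the matching objects in their original insertion order
        return [o for i, o in self.objs.items() if i in hits]

objects = [
    {'item': 1, 'size': 10, 'flavor': 'melon'},
    {'item': 2, 'size': 10, 'flavor': 'lychee'},
    {'item': 3, 'size': 20, 'flavor': 'peach'},
    {'item': 4, 'size': 30, 'flavor': 'apple'},
]
toy = ToyIndex(objects, on=['size', 'flavor'])
result = toy.find(match={'size': [10, 20]},
                  exclude={'flavor': ['lychee', 'peach']})
print(result)  # [{'item': 1, 'size': 10, 'flavor': 'melon'}]
```

Each lookup touches only the sets for the queried values, which is why finds stay fast as the collection grows.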
Related projects
FilterBox is a type of inverted index. It is optimized for its goal of finding in-memory Python objects.
Other Python inverted index implementations are aimed at things like vector search and finding documents by words. Outside of Python, ElasticSearch is a popular inverted index search tool. Each of these has goals outside of FilterBox's niche; there are no plans to expand FilterBox towards these functions.