Filter and aggregate an iterable of dictionaries using another dictionary as the query. Much faster than pandas.

Leopards

Leopards is a way to query a list of dictionaries or objects as if you were filtering in a DBMS. You can select the dicts/objects matched by OR, AND, NOT, or any combination of them. As the comparison with Pandas shows, it is much faster than Pandas.

Installation

pip install leopards

Usage

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"name__contains": "k", "age__lt": 20})
print(list(filtered))

output

[{'name': 'Mike', 'age': '19'}]

The same filter can also be written with keyword arguments:

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, name__contains="k", age__lt=20)

Notes:

Q returns an iterator, which can be converted to a list by calling list.
Even though age was a str in the dict, the value in the query dict was an int, so Leopards automatically converted the dict value to match the query's data type. This behaviour can be disabled by passing False to the convert_types parameter.
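
The type conversion described in the note can be pictured with a small pure-Python sketch. This is not Leopards' own code: the helper name matches_lt is hypothetical, and only the convert_types parameter name comes from the note above.

```python
def matches_lt(record, key, query_value, convert_types=True):
    """Return True if record[key] < query_value, optionally coercing types."""
    value = record[key]
    if convert_types and not isinstance(value, type(query_value)):
        # Coerce the stored value to the type of the query value, e.g. "19" -> 19.
        value = type(query_value)(value)
    return value < query_value


people = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]

# A lazy generator, analogous to the iterator that Q returns.
young = (p for p in people if matches_lt(p, "age", 20))
young_list = list(young)
print(young_list)
```

In this sketch, turning the conversion off makes plain Python compare a str with an int, which raises TypeError. How Leopards itself behaves with convert_types=False is not shown here.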

Supported filters

eq: equals; this is the default filter.
gt: greater than.
gte: greater than or equal.
lt: less than.
lte: less than or equal.
in: the value is in a list or a tuple.
- e.g. age__in=[10,20,30]
contains: contains a substring, as in the example.
icontains: case-insensitive contains.
startswith: checks if a value starts with the query string.
istartswith: case-insensitive startswith.
endswith: checks if a value ends with the query string.
iendswith: case-insensitive endswith.
isnull: checks if the value matches any of NULL_VALUES, which are ('', '.', None, "None", "null", "NULL")
- e.g. filter__isnull=True or filter__isnull=False

For eq, gt, gte, lt, lte, in, contains, icontains, startswith, istartswith, endswith and iendswith, you can add an n prefix to negate the result, e.g. nin, which is equivalent to not in.
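
One way to picture the filter names and the n prefix is a lookup table of predicates. The sketch below is an illustration in plain Python, not the library's implementation: FILTERS and check are hypothetical names, and only the filter names and NULL_VALUES come from the list above.

```python
import operator

NULL_VALUES = ("", ".", None, "None", "null", "NULL")

# Each filter takes (value from the record, value from the query).
FILTERS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, query: value in query,
    "contains": lambda value, query: query in value,
    "icontains": lambda value, query: query.lower() in value.lower(),
    "startswith": lambda value, query: value.startswith(query),
    "istartswith": lambda value, query: value.lower().startswith(query.lower()),
    "endswith": lambda value, query: value.endswith(query),
    "iendswith": lambda value, query: value.lower().endswith(query.lower()),
}


def check(value, lookup, query):
    """Evaluate one filter; an 'n' prefix negates any filter in FILTERS."""
    if lookup == "isnull":
        return (value in NULL_VALUES) == query
    if lookup in FILTERS:
        return FILTERS[lookup](value, query)
    if lookup.startswith("n") and lookup[1:] in FILTERS:
        return not FILTERS[lookup[1:]](value, query)
    raise ValueError(f"unknown filter: {lookup}")


print(check(25, "nin", [10, 20, 30]))      # negated "in"
print(check("Mike", "istartswith", "mi"))  # case-insensitive prefix
```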

Advanced examples

This section covers the use of OR, AND and NOT.

Usage of `OR`

OR or __or__ takes a list of query dictionaries and stops at the first one that evaluates to True.

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"OR": [{"name__contains": "k"}, {"age__gte": 21}]})
print(list(filtered))

output

[{'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]

Usage of `NOT`

NOT or __not__ takes a query dict and negates its result.

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"age__gt": 15, "NOT": {"age__eq": 19}})
print(list(filtered))

output

[{'name': 'John', 'age': '16'}, {'name': 'Sarah', 'age': '21'}]

Usage of `AND`

AND or __and__ takes a list of query dicts and stops at the first one that evaluates to False.

from leopards import Q

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
filtered = Q(l, {"__and__": [{"age__gte": 15}, {"age__lt": 21}]})
print(list(filtered))

output

[{'name': 'John', 'age': '16'}, {'name': 'Mike', 'age': '19'}]
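
The short-circuit behaviour of OR, AND and NOT maps naturally onto Python's any, all and not. The following is a minimal sketch under that assumption, written without Leopards: matches, select and OPS are hypothetical names, and only a handful of filters are included.

```python
import operator

OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "contains": lambda value, query: query in value,
}


def matches(record, query):
    """Return True if the record satisfies every entry of the query dict."""
    for key, expected in query.items():
        if key in ("OR", "__or__"):
            ok = any(matches(record, sub) for sub in expected)  # stops at first True
        elif key in ("AND", "__and__"):
            ok = all(matches(record, sub) for sub in expected)  # stops at first False
        elif key in ("NOT", "__not__"):
            ok = not matches(record, expected)
        else:
            field, _, op = key.partition("__")
            value = record[field]
            if not isinstance(value, type(expected)):
                value = type(expected)(value)  # e.g. "19" -> 19
            ok = OPS[op or "eq"](value, expected)
        if not ok:
            return False
    return True


def select(records, query):
    """Lazily yield the records that match the query."""
    return (r for r in records if matches(r, query))


l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}]
print(list(select(l, {"OR": [{"name__contains": "k"}, {"age__gte": 21}]})))
print(list(select(l, {"age__gt": 15, "NOT": {"age__eq": 19}})))
print(list(select(l, {"__and__": [{"age__gte": 15}, {"age__lt": 21}]})))
```

Run against the sample list, the sketch selects the same records as the three examples above.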

Aggregating Data

You can run the following aggregations

Count
Max
Min
Sum
Avg

Count

Count the records for each distinct value of the given aggregation columns.

from leopards import Count

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}, {"name": "John", "age": "19"}]
count = Count(l, ['age'])

output

[{"age":"16","count":1},{"age":"19","count":2}, {"age":"21","count":1}]
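
For comparison, the same grouping can be written with the standard library. This is a sketch of the idea only, not how Count is implemented; count_by is a hypothetical name.

```python
from collections import Counter


def count_by(records, columns):
    """Count records per distinct combination of the given columns."""
    counts = Counter(tuple(r[c] for c in columns) for r in records)
    # Counter keeps first-seen order, so groups appear in input order.
    return [dict(zip(columns, key), count=n) for key, n in counts.items()]


l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}, {"name": "John", "age": "19"}]
print(count_by(l, ["age"]))
```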

Max

Find the maximum value of a column within each group defined by the aggregation columns.

from leopards import Max

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}, {"name": "John", "age": "19"}]
m = Max(l, "age", ['name'], dtype=int)

output

[{'name': 'John', 'age': '19'}, {'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]

Notes:

If you don't pass the aggregation columns, the maximum is found across the whole dataset.
You can pass the data type of the column to convert it on the fly while evaluating.

from leopards import Max

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}, {"name": "John", "age": "19"}]
m = Max(l, "age", dtype=int)

output

[{'age': 21}]
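
The reason dtype matters is that strings compare lexicographically, so "9" sorts above "10". The sketch below shows a grouped maximum with on-the-fly conversion in plain Python. It is an assumption-laden illustration, not the library's code: max_by is a hypothetical name, and unlike the grouped example above it returns the converted values.

```python
def max_by(records, column, group_columns=None, dtype=str):
    """Return the maximum of `column` per group, converting values with dtype."""
    groups = tuple(group_columns or ())
    best = {}
    for r in records:
        key = tuple(r[c] for c in groups)  # () means one group for the whole dataset
        value = dtype(r[column])
        if key not in best or value > best[key]:
            best[key] = value
    return [dict(zip(groups, key), **{column: value}) for key, value in best.items()]


l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}, {"name": "John", "age": "19"}]
print(max_by(l, "age", dtype=int))
print(max_by(l, "age", ["name"], dtype=int))
```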

Min

Find the minimum value of a column within each group defined by the aggregation columns.

from leopards import Min

l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}, {"name": "John", "age": "19"}]
m = Min(l, "age", ['name'])

output

[{'name': 'John', 'age': '16'}, {'name': 'Mike', 'age': '19'}, {'name': 'Sarah', 'age': '21'}]

Notes:

If you don't pass the aggregation columns, the minimum is found across the whole dataset.
You can pass the data type of the column to convert it on the fly while evaluating.

Sum and Avg

These work like Min and Max, but only with integers and floats.
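
Since no example is given for Sum and Avg, here is a plain-Python sketch of a grouped sum and average. It does not call Leopards, and the function name sum_and_avg and the output keys "sum" and "avg" are assumptions made for illustration only.

```python
def sum_and_avg(records, column, group_columns=(), dtype=float):
    """Return the sum and average of `column` per group of `group_columns`."""
    totals = {}
    for r in records:
        key = tuple(r[c] for c in group_columns)
        total, n = totals.get(key, (0, 0))
        totals[key] = (total + dtype(r[column]), n + 1)
    return [
        dict(zip(group_columns, key), sum=total, avg=total / n)
        for key, (total, n) in totals.items()
    ]


l = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}, {"name": "Sarah", "age": "21"}, {"name": "John", "age": "19"}]
print(sum_and_avg(l, "age", dtype=int))
print(sum_and_avg(l, "age", ("name",), dtype=int))
```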

Comparison with Pandas

This was measured on Python 3.8 running on Ubuntu 22.04, with an 11th-generation i7 and 32 GB of RAM.

Comparison	Pandas	Leopards
Package Size (Lower is better)	29.8 MB	7.5 KB
import Time (Worst) (Lower is better)	146 ms	1.05 ms
load 10k CSV lines (Lower is better) ^[1]	0.295s	0.138s
get first matched record (Lower is better)	0.310s	0.017s
print all filtered records (10/10k) (Lower is better)	0.310s	0.137s
filter by integers (Lower is better)	0.316s	0.138s

^[1] This loaded the whole CSV into memory for the sake of a fair comparison. Nevertheless, Leopards can work with DictReader as an iterable, which executes in 0.014s and then handles the file line by line.
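
The line-by-line behaviour in the footnote comes from csv.DictReader being a lazy iterable. The sketch below demonstrates that laziness with a plain generator standing in for the filter. The CSV content and the stream_filter helper are made up for illustration; the footnote states that the reader can be handed to Leopards as the iterable.

```python
import csv
import io

CSV_TEXT = "name,age\nJohn,16\nMike,19\nSarah,21\n"


def stream_filter(rows, max_age):
    """Yield matching rows one at a time without loading the whole file."""
    for row in rows:
        if int(row["age"]) < max_age:
            yield row


reader = csv.DictReader(io.StringIO(CSV_TEXT))

# Only as many lines as needed are read to produce the first match.
first = next(stream_filter(reader, 20))
print(first)

# The rest of the file has not been consumed yet.
remaining = list(reader)
print(len(remaining))
```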

Thanks to Asma Tahir for the Pandas stats.

Contributors

saeedesmaili

Release history

This version

1.0.0

Jan 30, 2025

0.20.1

Apr 16, 2023

0.20.0

Nov 20, 2022

0.10.2

Nov 16, 2022

0.10.1

Nov 15, 2022

0.10.0

Nov 15, 2022

0.9.2

Nov 15, 2022

0.9.1

Nov 15, 2022

0.9.0

Jan 30, 2025

0.6.1

Nov 15, 2022

Download files

Source Distribution

leopards-1.0.0.tar.gz (10.2 kB)

Uploaded Jan 30, 2025 Source

File details

Details for the file leopards-1.0.0.tar.gz.

File metadata

Download URL: leopards-1.0.0.tar.gz
Upload date: Jan 30, 2025
Size: 10.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for leopards-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`788340cc723cb9b251906d6c122dcd09484fbbe28394c95f9629b1c7701183a6`
MD5	`bd6c32b6de8bb55f076db334a2396393`
BLAKE2b-256	`0ee4023f01193f0a7e56120507d0a14113f5b69197b625060f4383f5f1d1944e`
