Skip to main content

A simple library for performing Square Root Decomposition and Aggregation

Project description

Overview

Build Status Supported Versions license

This is pysquagg, a library containing a data structure intended for expediant computation of aggregations on a collection using Square Root Decomposition. The data structure is an extension of Mo's Algorithm. Please see more details in the associated blog post.

Motivation

The principles behind Mo's Algorithm is interesting and useful, but the implementation is a bit cumbersome. This library is intended to make it easier to use the algorithm in Python, plus introduce dynamic behavior, such that a collection can be modified after the data structure is created, and the corresponding blocks + aggregates are updated accordingly.

Details

The list supplied is split into $\left\lfloor \frac{n}{\left\lfloor \sqrt{n} \right\rfloor} \right\rfloor$ blocks of size $\sqrt{n}$, and the aggregate (based on the supplied aggregation function) is computed for each block. When thePySquagg object is queried with a valid start and end index:

  • Pre-computed aggregations for all blocks that are entirely within the range are collected
  • Elements not fully contained within a block are iterated over and the aggregation function is applied to them
  • The pre-computed aggregations and newly computed aggregations are combined and returned

The end result is a query method that is O($\sqrt{n}$), compared to O(n) for a naive implementation.

Additionally, the PySquagg object can be modified after creation (e.g; append, extend, pop), and the blocks are updated accordingly to always be of size $\sqrt{n}$. The aggregates are also computed on the updated blocks.

API & Usage

The API for using pysquagg is simple, as we're only providing a single class PySquagg:

from pysquagg.pysquagg import PySquagg

pysquagg_instance = PySquagg([1, 2, 3, 4, 5, 6], aggregator_function=sum)
pysquagg_instance.blocks # will print [[1, 2], [3, 4], [5, 6]]
pysquagg_instance.aggregated_values # will print [3, 7, 11]
pysquagg_instance.query(0, 5) # will print 21
pysquagg_instance += [7, 8]
pysquagg_instance.blocks # will print [[1, 2], [3, 4], [5, 6], [7, 8]]
pysquagg_instance.append(9)
pysquagg_instance.blocks # will print [[1, 2, 3], [4, 5, 6], [7, 8, 9]] - the block size has been recomputed from 2 -> 3
pysquagg_instance.aggregated_values # will print [6, 15, 24]
pysquagg_instance.query(0, 8) # will print 45
pysquagg_instance.pop()
pysquagg_instance[2] = -1
pysquagg_instance.blocks # will print [[1, 2], [-1, 4], [5, 6], [7, 8]] - block_size has dropped down from 3 -> 2

Parallel Option

If the aggregation function is computationally heavy, there is a parallel flag (default: False) which will use the ProcessPoolExecutor for Python versions below 3.13, and ThreadPoolExecutor for Python versions which are 3.13 and above and has free threading enabled.

from pysquagg.pysquagg import PySquagg

pysquagg_instance = PySquagg([1, 2, 3, 4, 5, 6], aggregator_function=sum, parallel=True)
# do all the same things as above, except now using the corresponding executor

Note: Extensive benchmarking has not been conducted.

Performance Characteristics

Complexity

Operation Average Case Time Complexity Worst Case Time Complexity
query O($\sqrt{n}$) O($\sqrt{n}$)
append O(1) O(n)
insert O(n) O(n)
pop O(n) O(n)
extend O(m) O(n + m)

The main reason for other operations being linear in the worst case is the fact that when the collection is modified, the blocks and aggregates need to be recomputed when the square root of the size of the collection changes. Furthermore, as PySquagg is a subclass of list, some of these performance characteristics are inherent.

Benchmarks

Some preliminary benchmarking can be conducted from scripts in the benchmarks directory. One highlight from comparing query to performing computations on the arbitrary slices (using sum as the aggregator function) is:

Operation PySquagg (s) Naive (s)
query 0.032 1.48

As derived from a 2023 Macbook Pro M2, 16GB RAM.

Constraints

The aggregator functions need to be associative and commutative, and the data structure is not thread-safe.

TODO

  • Identify if we can reduce the runtime of some operations to be sublinear
  • Perform more extensive benchmarking
  • Incorporate a mechanism for combining aggregator functions, if someone adds two PySquagg objects
  • Add a LoosePySquagg class that does not strictly enforce the sqrt(n), which may have some performance benefits for certain operations such as insert and pop which currently require recomputation of blocks and aggregates

💡 Interested in contributing? Check out the Local Development & Contributions Guide.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pysquagg-1.2.0.tar.gz (17.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pysquagg-1.2.0-py3-none-any.whl (6.4 kB view details)

Uploaded Python 3

File details

Details for the file pysquagg-1.2.0.tar.gz.

File metadata

  • Download URL: pysquagg-1.2.0.tar.gz
  • Upload date:
  • Size: 17.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for pysquagg-1.2.0.tar.gz
Algorithm Hash digest
SHA256 363072d5c146dd976d6c340182e72d6b30f452eec5e7c143aaf6f429dd153414
MD5 1d2a23d2d8e7af5130572084d4c514ff
BLAKE2b-256 990bf2015b7e7ea21215777ef5b5ac333bec7275c61f6e8ad3196bb8067453b3

See more details on using hashes here.

File details

Details for the file pysquagg-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: pysquagg-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for pysquagg-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 507847d158b65e34b607a2b5b8beae1161da3219dba03d1e08bd0319b91508ba
MD5 0ec7b4039e4f98bdd372e60f571bd527
BLAKE2b-256 cdda3d9a1288995117c31373d787405274ec81a45298bd31302c035349ecd343

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page