SQL aggregation tool


sql-agg

Basic usage

Assuming you have the following database table:

user   date        column_a  column_b
user1  2012-01-01  1         1
user1  2012-01-06  2         0
user2  2012-02-19  0         3

You can use sql-agg to extract aggregated data from the table as follows:

from datetime import date
from sqlagg import QueryContext
from sqlagg.columns import CountColumn, SimpleColumn, SumColumn
from sqlagg.filters import GT, LT

# create the columns
user = SimpleColumn("user")
i_a = SumColumn("column_a")
i_b = CountColumn("column_b")

# initialise the query context and add the columns to it
vc = QueryContext("table_name",
    filters=[GT('date', 'startdate'), LT('date', 'enddate')],
    group_by=["user"])
vc.append_column(user)
vc.append_column(i_a)
vc.append_column(i_b)

filter_values={
    "startdate": date(2012, 1, 1),
    "enddate": date(2012, 3, 1)
    }

# resolve the query context with the filter values (connection is an SQLAlchemy connection)
vc.resolve(connection, filter_values=filter_values)
data = vc.data

The resultant data variable will be a dictionary as follows:

{
    "user1": {
        "user": "user1",
        "column_a": 3,
        "column_b": 2
    },
    "user2": {
        "user": "user2",
        "column_a": 0,
        "column_b": 1
    }
}
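Under the hood this resolves to a single grouped SQL query. As an illustrative sketch of what that query does (using the stdlib sqlite3 module in place of a real database and SQLAlchemy; the exact SQL sqlagg emits may differ, and an inclusive start bound is used here so the 2012-01-01 sample row is included):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (user TEXT, date TEXT, column_a INT, column_b INT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [("user1", "2012-01-01", 1, 1),
     ("user1", "2012-01-06", 2, 0),
     ("user2", "2012-02-19", 0, 3)],
)

# One grouped query covers all three columns: SimpleColumn("user") becomes the
# group-by key, SumColumn -> SUM, CountColumn -> COUNT.
rows = conn.execute(
    "SELECT user, SUM(column_a), COUNT(column_b) FROM table_name "
    "WHERE date >= :startdate AND date < :enddate GROUP BY user",
    {"startdate": "2012-01-01", "enddate": "2012-03-01"},
).fetchall()

# sqlagg shapes the result as a dict keyed by the group-by value
data = {user: {"user": user, "column_a": a, "column_b": b} for user, a, b in rows}
```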

Multi-level grouping

Multi-level grouping can be done by adding multiple SimpleColumn instances to the QueryContext and listing the corresponding column names in the group_by parameter of the QueryContext.

region = SimpleColumn("region")
sub_region = SimpleColumn("sub_region")
column_a = SumColumn("column_a")

vc = QueryContext("table_name",
    filters=None,
    group_by=["region", "sub_region"])
vc.append_column(region)
vc.append_column(sub_region)
vc.append_column(column_a)

The resultant data would look as follows:

{
    ("region1", "sub_region1"): {
        "region": "region1",
        "sub_region": "sub_region1",
        "column_a": 1
    },
    ("region1", "sub_region2"): {
        "region": "region1",
        "sub_region": "sub_region2",
        "column_a": 3
    },
    ("region2", "sub_region3"): {
        "region": "region2",
        "sub_region": "sub_region3",
        "column_a": 2
    }
}
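The tuple keys above come from the multi-column GROUP BY. A minimal sketch of those semantics, again using stdlib sqlite3 and sample rows chosen to match the output shown (assumptions, not data from the library itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (region TEXT, sub_region TEXT, column_a INT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [("region1", "sub_region1", 1),
     ("region1", "sub_region2", 3),
     ("region2", "sub_region3", 2)],
)

rows = conn.execute(
    "SELECT region, sub_region, SUM(column_a) FROM table_name "
    "GROUP BY region, sub_region"
).fetchall()

# With more than one group-by column, sqlagg keys each result row by a tuple
# of the group-by values rather than a single value
data = {
    (region, sub): {"region": region, "sub_region": sub, "column_a": total}
    for region, sub, total in rows
}
```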

Columns in detail

For each column you can specify its own table, filters and group_by fields. Using these features you can supply different filters per column or select data from different tables.

Different filters

column_a = SumColumn("column_a")
column_b = SumColumn("column_b", filters=[LT('date', 'enddate')])

In this case column_a will get the filters supplied to the QueryContext while column_b will be resolved with its own filters. This will result in two queries being run on the database.
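A sketch of what "two queries" means in practice, using stdlib sqlite3 and the sample table from the basic-usage example (the merge-by-key step is an illustration of the behaviour, not sqlagg's actual implementation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (user TEXT, date TEXT, column_a INT, column_b INT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [("user1", "2012-01-01", 1, 1),
     ("user1", "2012-01-06", 2, 0),
     ("user2", "2012-02-19", 0, 3)],
)

# Query 1: column_a under the QueryContext-level filters
q1 = conn.execute(
    "SELECT user, SUM(column_a) FROM table_name "
    "WHERE date >= :startdate AND date < :enddate GROUP BY user",
    {"startdate": "2012-01-01", "enddate": "2012-03-01"},
)
# Query 2: column_b under its own filter (only the end-date bound)
q2 = conn.execute(
    "SELECT user, SUM(column_b) FROM table_name WHERE date < :enddate GROUP BY user",
    {"enddate": "2012-03-01"},
)

# The two result sets are merged on the shared group-by key
data = {user: {"user": user, "column_a": a} for user, a in q1}
for user, b in q2:
    data.setdefault(user, {"user": user})["column_b"] = b
```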

Different tables

It is possible to select data from different tables by providing columns with different table_names.

column_a = SumColumn("column_a")
column_b = SumColumn("column_b", table_name="table_b", group_by=["user"])

Here column_a will be selected from the table configured in the QueryContext while column_b will be selected from table_b and will be grouped by user. This will result in two queries being run on the database.

As Name

It is possible to use the same database column in multiple query columns by specifying the alias argument of each column.

sum_a = SumColumn("column_a", alias="sum_a")
count_a = CountColumn("column_a", alias="count_a")

The resulting data will use the alias keys to reference the values.

Conditional / Case columns

Simple

In the simple form, SumWhen matches the named column's value against the keys of whens and sums the mapped values:

num_wheels = SumWhen("vehicle", whens={"unicycle": 1, "bicycle": 2, "car": 4}, else_=0, alias="num_wheels")

Complex

In the complex form, the keys of whens are raw SQL conditions rather than column values:

num_children = SumWhen(whens={"users.age < 13": 1}, else_=0, alias="children")
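Both forms correspond to SUM over a SQL CASE expression. A sketch of the simple num_wheels example above, using stdlib sqlite3 and a hypothetical vehicles table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (vehicle TEXT)")
conn.executemany(
    "INSERT INTO vehicles VALUES (?)",
    [("unicycle",), ("bicycle",), ("car",), ("boat",)],
)

# SumWhen("vehicle", whens={...}, else_=0) maps each row's value through the
# CASE expression, with unmatched values ("boat") contributing the else_ value
(num_wheels,) = conn.execute(
    "SELECT SUM(CASE vehicle WHEN 'unicycle' THEN 1 WHEN 'bicycle' THEN 2 "
    "WHEN 'car' THEN 4 ELSE 0 END) FROM vehicles"
).fetchone()
```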

Alias and Aggregate columns

These are useful if you want to use a column more than once but don't want to re-calculate its value.

sum_a = SumColumn("column_a")

aggregate = AggregateColumn(lambda x, y: x / y,
                            AliasColumn("column_a"),
                            SumColumn("column_b"))
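A toy sketch of the semantics, with hypothetical helper functions standing in for the real classes and a made-up resolved row (in sqlagg the values come from the database query that SumColumn("column_a") already ran):

```python
# Hypothetical resolved values for one group-by row
resolved_row = {"column_a": 3, "column_b": 2}

def alias_column(key):
    # AliasColumn: re-use a value another column already computed; no new query
    return lambda row: row[key]

def aggregate_column(fn, *getters):
    # AggregateColumn: apply a Python function to the other columns' values
    return lambda row: fn(*(g(row) for g in getters))

ratio = aggregate_column(
    lambda x, y: x / y,
    alias_column("column_a"),
    alias_column("column_b"),
)(resolved_row)
```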

TODO: custom queries

Filtering

The QueryContext and most column classes accept a filters parameter, which must be an iterable of instances of sqlagg.filters.SqlFilter subclasses. The elements of this parameter are combined using the AND operator.

For example:

filters = [EQ('user', 'username'), EQ('role', 'admin')]

is equivalent to:

filters = [AND([
    EQ('user', 'username'), EQ('role', 'admin')
])]

Any filter expression can be expressed using a RawFilter:

RawFilter('"user" = :username AND "date" between :start and :end')

In this case the same filter could be expressed as follows:

AND([EQ('user', 'username'), BETWEEN('date', 'start', 'end')])

Development

To install dependencies, create and activate a virtualenv, then run:

pip install -e .[test]

Running Tests

First create an environment variable for the appropriate connection string:

export SQLAGG_TEST_CONNECTION_STRING='postgresql://user:pass@localhost:5432/sqlagg_test'

Then run the following:

pytest

Publishing to PyPI

To publish a new release on pypi.org:

  • Update __version__ in sqlagg/__init__.py.
  • Create and push a git tag with the new version number.
    git tag vX.Y.Z
    git push --tags
    
  • Wait for the pypi.yml workflow to build and publish the new release.

A dev release is published on test.pypi.org on each new push or merge to master. A dev release may also be published on-demand for any branch with workflow dispatch.
