# sql-agg

SQL aggregation tool.

## Basic usage
Assuming you have the following database table:
| user | date | column_a | column_b |
|---|---|---|---|
| user1 | 2012-01-01 | 1 | 1 |
| user1 | 2012-01-06 | 2 | 0 |
| user2 | 2012-02-19 | 0 | 3 |
You can use sql-agg to extract aggregated data from the table as follows:
```python
from datetime import date

from sqlagg import QueryContext
from sqlagg.columns import CountColumn, SimpleColumn, SumColumn
from sqlagg.filters import GT, LT

# create the columns
user = SimpleColumn("user")
i_a = SumColumn("column_a")
i_b = CountColumn("column_b")

# initialise the query context and add the columns to it
vc = QueryContext(
    "table_name",
    filters=[GT('date', 'startdate'), LT('date', 'enddate')],
    group_by=["user"],
)
vc.append_column(user)
vc.append_column(i_a)
vc.append_column(i_b)

filter_values = {
    "startdate": date(2012, 1, 1),
    "enddate": date(2012, 3, 1),
}

# resolve the query context with the filter values
# (connection is an SQLAlchemy connection)
vc.resolve(connection, filter_values=filter_values)
data = vc.data
```
The resulting `data` variable will be a dictionary as follows:

```python
{
    "user1": {
        "user": "user1",
        "column_a": 3,
        "column_b": 2
    },
    "user2": {
        "user": "user2",
        "column_a": 0,
        "column_b": 1
    }
}
```
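For reference, the query above corresponds roughly to the SQL below. This sketch runs it against an in-memory SQLite copy of the sample table (sql-agg itself builds its queries via SQLAlchemy, so the exact SQL it emits may differ); the filter values here are chosen so every sample row passes the strict comparisons:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (user TEXT, date TEXT, column_a INT, column_b INT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [("user1", "2012-01-01", 1, 1),
     ("user1", "2012-01-06", 2, 0),
     ("user2", "2012-02-19", 0, 3)],
)

# Rough SQL equivalent of the QueryContext above
rows = conn.execute(
    """
    SELECT user, SUM(column_a), COUNT(column_b)
    FROM table_name
    WHERE date > :startdate AND date < :enddate
    GROUP BY user
    """,
    {"startdate": "2011-12-31", "enddate": "2012-03-01"},
).fetchall()

# Re-key the grouped rows the way sql-agg presents them
data = {u: {"user": u, "column_a": a, "column_b": b} for u, a, b in rows}
```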
## Multi-level grouping

To group by multiple levels, add multiple `SimpleColumn`s to the `QueryContext` and list the corresponding column names in its `group_by` parameter.
```python
region = SimpleColumn("region")
sub_region = SimpleColumn("sub_region")
column_a = SumColumn("column_a")

vc = QueryContext(
    "table_name",
    filters=None,
    group_by=["region", "sub_region"],
)
vc.append_column(region)
vc.append_column(sub_region)
vc.append_column(column_a)
```
The resulting data would look as follows:

```python
{
    ("region1", "sub_region1"): {
        "region": "region1",
        "sub_region": "sub_region1",
        "column_a": 1
    },
    ("region1", "sub_region2"): {
        "region": "region1",
        "sub_region": "sub_region2",
        "column_a": 3
    },
    ("region2", "sub_region3"): {
        "region": "region2",
        "sub_region": "sub_region3",
        "column_a": 2
    }
}
```
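The tuple keys mirror the `group_by` columns. As a plain-Python sketch (independent of sql-agg, with illustrative row values) of how such a result is keyed:

```python
# Hypothetical grouped rows as (region, sub_region, sum_of_column_a)
rows = [
    ("region1", "sub_region1", 1),
    ("region1", "sub_region2", 3),
    ("region2", "sub_region3", 2),
]

# Key each result row by the tuple of its group-by values
data = {
    (region, sub): {"region": region, "sub_region": sub, "column_a": total}
    for region, sub, total in rows
}
```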
## Columns in detail

For each column you can specify the table, filters, and `group_by` fields. Using these features you can supply different filters per column or select data from different tables.
### Different filters

```python
column_a = SumColumn("column_a")
column_b = SumColumn("column_b", filters=[LT('date', 'enddate')])
```
In this case column_a will get the filters supplied to the QueryContext while column_b will be resolved with its own
filters. This will result in two queries being run on the database.
Different tables
It is possible to select data from different tables by providing columns with different table_names.
column_a = SumColumn("column_a")
column_b = SumColumn("column_b", table_name="table_b", group_by=["user"])
Here column_a will be selected from the table configured in the QueryContext while column_b will be selected from table_b and grouped by user. This will result in two queries being run on the database.
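Because the two columns read from different tables, the results of the two queries are merged by their shared group-by key. A plain-Python sketch of that merge step (row values are illustrative, not sql-agg internals):

```python
# Hypothetical results of the two per-table queries, keyed by the group-by value
table_a_rows = {"user1": {"column_a": 3}}
table_b_rows = {"user1": {"column_b": 2}, "user2": {"column_b": 1}}

# Merge the per-table results into one dict per key
data = {}
for rows in (table_a_rows, table_b_rows):
    for key, values in rows.items():
        data.setdefault(key, {}).update(values)
```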
### As Name

It is possible to use the same database column in multiple query columns by specifying the `alias` argument of each column.

```python
sum_a = SumColumn("column_a", alias="sum_a")
count_a = CountColumn("column_a", alias="count_a")
```

The resulting data will use the alias keys to reference the values.
### Conditional / Case columns

Simple:

```python
num_wheels = SumWhen("vehicle", whens={"unicycle": 1, "bicycle": 2, "car": 4}, else_=0, alias="num_wheels")
```

Complex:

```python
num_children = SumWhen(whens={"users.age < 13": 1}, else_=0, alias="children")
```
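A `SumWhen` column corresponds roughly to a SQL `CASE` expression inside a `SUM`. A sketch of the simple example above against SQLite (the table and its contents are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (vehicle TEXT)")
conn.executemany("INSERT INTO vehicles VALUES (?)",
                 [("unicycle",), ("bicycle",), ("car",)])

# Rough SQL equivalent of the num_wheels SumWhen column
(num_wheels,) = conn.execute(
    """
    SELECT SUM(CASE vehicle
               WHEN 'unicycle' THEN 1
               WHEN 'bicycle' THEN 2
               WHEN 'car' THEN 4
               ELSE 0 END) AS num_wheels
    FROM vehicles
    """
).fetchone()
# num_wheels == 7 for the three sample rows
```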
### Alias and Aggregate columns

These are useful if you want to use a column more than once but don't want to re-calculate its value.

```python
sum_a = SumColumn("column_a")
aggregate = AggregateColumn(
    lambda x, y: x / y,
    AliasColumn("column_a"),
    SumColumn("column_b"),
)
```
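An `AggregateColumn` applies its function to the resolved values of its argument columns. The combination step amounts to the following (plain Python; the resolved values are illustrative, and the `AliasColumn` reuses the value already computed for `column_a` rather than querying it again):

```python
# Hypothetical resolved values for one grouped row
resolved = {"column_a": 3, "column_b": 2}

# The function passed to AggregateColumn combines the resolved values
def combine(x, y):
    return x / y

ratio = combine(resolved["column_a"], resolved["column_b"])
# ratio == 1.5
```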
TODO: custom queries
## Filtering

The `QueryContext` and most column classes accept a `filters` parameter, which must be an iterable. Each element of this iterable must be an instance of a `sqlagg.filters.SqlFilter` subclass. The elements of this parameter are combined using the AND operator.
For example:

```python
filters = [EQ('user', 'username'), EQ('role', 'admin')]
```

is equivalent to:

```python
filters = [AND([
    EQ('user', 'username'), EQ('role', 'admin')
])]
```
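Either form renders to a single conjunction in the WHERE clause. A sketch of the resulting SQL against SQLite (table contents and bind-parameter names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "viewer")])

# Both filter lists above produce a WHERE clause of this shape
rows = conn.execute(
    "SELECT user FROM users WHERE user = :username AND role = :role",
    {"username": "alice", "role": "admin"},
).fetchall()
# only the row satisfying both conditions is returned
```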
Any filter expression can be expressed using a `RawFilter`:

```python
RawFilter('"user" = :username AND "date" between :start and :end')
```

In this case the same filter could be expressed as follows:

```python
AND([EQ('user', 'username'), BETWEEN('date', 'start', 'end')])
```
## Development

To install dependencies, create/activate a virtualenv and run:

```shell
pip install -e .[test]
```

### Running Tests

First create an environment variable with the appropriate connection string:

```shell
export SQLAGG_TEST_CONNECTION_STRING='postgresql://user:pass@localhost:5432/sqlagg_test'
```

Then run:

```shell
pytest
```
## Publishing to PyPI

To publish a new release on pypi.org:

- Update `__version__` in `sqlagg/__init__.py`.
- Create and push a git tag with the new version number:

  ```shell
  git tag vX.Y.Z
  git push --tags
  ```

- Wait for the pypi.yml workflow to build and publish the new release.

A dev release is published on test.pypi.org on each new push or merge to master. A dev release may also be published on demand for any branch via workflow dispatch.