A Python-to-SQL transpiler to work with relational databases
Grizzly is a transpiler from a Python-API to SQL to move computations from the client into a database system.
Grizzly implements its own
DataFrame structure that tracks operations, like projection, filter, joins, ...
Only when the result of the sequence of operations is needed, a SQL string is produced, resembling all those operations, and sent to a DBMS.
This way, you don't have to care about Out-of-Memory problems, un-optimized queries, and high CPU load.
Grizzly is available on PyPi: https://pypi.org/project/grizzly-sql
pip3 install --user grizzly-sql
As with any Python module, just import it
Connect to your database using an appropriate connection string:
import sqlite3 con = sqlite3.connect("grizzly.db")
Now, reference the table(s) you want to work with:
df = grizzly.read_table("events", con)
df is just a reference, it contains no data from your table.
To show its contents, use the
This will print the table's content on the screen.
Filter & Projection
Operations are similar to Pandas:
df[df["id" == 42]] # filter df = df[["actor1","actor2"]]
DataFrame can be joined with another
df1 = grizzly.read_table("table1") df2 = grizzly.read_table("table2") joined = df1.join(df2, on=["joinCol1", "joinCol2"], how="inner", comp='=')
on parameter, you specify the join columns. The first one is for the left input (
df1), the second one for the right input (
how parameter is used to select the join type:
left outer, etc. This value is directly placed into the generated query, and thus depends on
the dialect of the underlying DBMS. An additional
comp parameter lets you choose the comparison operator.
You sometimes want to join on multiple columns with different comparisons. For this, in Grizzly you define the expression as if it was for filters:
df1 = grizzly.read_table("t1") df2 = grizzly.read_table("t2") j = df1.join(df2, on = (df1['a'] == df2['b']) & (df1['c'] <= df2['d']), how="left outer")
This results in the following SQL code:
SELECT * FROM t1 LEFT OUTER JOIN (SELECT * FROM t2 ) IOBRD ON (t1.a = IOBRD.b) AND (t1.c <= IOBRD.d)
IOBRD is an autogenerated tuple variable needed to avoid disambiguation.
Grouping & Aggregation
You can also group the data on multiple columns and compute an aggregate over the groups:
df = grizzly.read_table("events") df = df[df['id'] == 42] g = df.groupby(["year","actor1"]) a = g.count("actor2")
If no aggregation function is used an
show() is called, only the grouping columns are selected.
You can apply aggregation functions on non-grouped
DataFrames of course. In this case the aggregates will be computed for the whole content.
a.sql() will give
SELECT year, actor1, count(actor2) FROM events WHERE id = 42 GROUP BY year, actor1
df.count() (i.e. before the grouping) for the above piece of code will return the single scalar value with the number of records in
SELECT count(*) FROM events WHERE id = 42
You can inspect the produced SQL string with
And the output will be
SELECT actor1, actor2 FROM events WHERE id = 42
- group by
- aggregation functions: min, max, mean (avg), count, sum
|Filename, size||File type||Python version||Upload date||Hashes|
|Filename, size grizzly_sql-0.1.2-py2.py3-none-any.whl (9.1 kB)||File type Wheel||Python version py2.py3||Upload date||Hashes View hashes|
|Filename, size grizzly-sql-0.1.2.tar.gz (7.1 kB)||File type Source||Python version None||Upload date||Hashes View hashes|
Hashes for grizzly_sql-0.1.2-py2.py3-none-any.whl