A package for quick, scrappy analyses with pandas and SQL

siuba

scrappy data analysis, with seamless support for pandas and SQL

siuba (小巴) is a port of dplyr and other R libraries. It supports a tabular data analysis workflow centered on 5 common actions:

select() - keep certain columns of data.
filter() - keep certain rows of data.
mutate() - create a new column or modify an existing one.
summarize() - reduce one or more columns down to a single value.
arrange() - reorder the rows of data.

These actions can be preceded by a group_by(), which causes them to be applied individually to each group of rows. Many SQL concepts, such as distinct(), count(), and joins, are also implemented. Inputs to these functions can be a pandas DataFrame or a table from a SQL connection (currently postgres, redshift, or sqlite).
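As a rough mental model of what each verb does, and of how group_by() changes summarize(), here is a toy pure-Python sketch operating on a list of row dictionaries. The helper names (including `filter_rows` and `group_summarize`) and the sample rows are hypothetical; the real siuba verbs operate on DataFrames and SQL tables and take siu expressions rather than plain functions.

```python
# Toy, pure-Python versions of the verbs, operating on a list of row dicts.
# These helper names are hypothetical; they are not siuba's API or internals.
from collections import defaultdict
from statistics import mean

rows = [
    {"name": "a", "cyl": 4, "hp": 93},
    {"name": "b", "cyl": 6, "hp": 110},
    {"name": "c", "cyl": 8, "hp": 175},
    {"name": "d", "cyl": 4, "hp": 62},
]

def select(rows, *cols):
    """Keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]

def filter_rows(rows, pred):
    """Keep rows for which pred(row) is true (named to avoid shadowing filter)."""
    return [r for r in rows if pred(r)]

def mutate(rows, **new_cols):
    """Add new columns, or overwrite existing ones, computed from each row."""
    return [{**r, **{k: f(r) for k, f in new_cols.items()}} for r in rows]

def summarize(rows, **aggs):
    """Reduce all rows to a single row of values."""
    return {k: f(rows) for k, f in aggs.items()}

def arrange(rows, col, descending=False):
    """Reorder rows by one column."""
    return sorted(rows, key=lambda r: r[col], reverse=descending)

def group_summarize(rows, key, **aggs):
    """Split rows by `key`, then apply summarize to each group separately."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return [{key: k, **summarize(g, **aggs)} for k, g in sorted(groups.items())]

def avg_hp(rs):
    """Aggregate: mean horsepower of a collection of rows."""
    return mean(r["hp"] for r in rs)

print(summarize(rows, avg_hp=avg_hp))              # {'avg_hp': 110}
print(group_summarize(rows, "cyl", avg_hp=avg_hp))
# [{'cyl': 4, 'avg_hp': 77.5}, {'cyl': 6, 'avg_hp': 110}, {'cyl': 8, 'avg_hp': 175}]
```

The only difference between the two calls at the end is the grouping step: the same aggregate runs once over all rows, or once per group.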

For more on the rationale behind tools like dplyr, see this tidyverse paper. For examples of siuba in action, see the siuba guide.

Installation

pip install siuba

Examples

See the siuba guide or this live analysis for a full introduction.

Basic use

The code below uses the example DataFrame mtcars to get the average horsepower (hp) for each number of cylinders (cyl).

from siuba import group_by, summarize, _
from siuba.data import mtcars

(mtcars
  >> group_by(_.cyl)
  >> summarize(avg_hp = _.hp.mean())
  )

Out[1]: 
   cyl      avg_hp
0    4   82.636364
1    6  122.285714
2    8  209.214286

There are three key concepts in this example:

concept	example	meaning
verb	`group_by(...)`	a function that operates on a table, like a DataFrame or SQL table
siu expression	`_.hp.mean()`	an expression created with `siuba._`, that represents actions you want to perform
pipe	`mtcars >> group_by(...)`	a syntax that allows you to chain verbs with the `>>` operator
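To make the pipe row of the table more concrete, here is a toy sketch of how a `>>` pipe can be built in plain Python with the reflected operator `__rrshift__`. The `Verb` class and the `verb`, `keep`, and `take` names are hypothetical and only illustrate the operator protocol; siuba's own implementation may work differently.

```python
class Verb:
    """Wraps a function plus arguments so that `data >> verb` applies it."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Python falls back to the right operand's __rrshift__ when the left
        # operand (here, a plain list) does not implement >> for a Verb.
        return self.func(data, *self.args, **self.kwargs)


def verb(func):
    """Decorator: calling the decorated function returns a Verb, not a result."""
    def build(*args, **kwargs):
        return Verb(func, *args, **kwargs)
    return build


@verb
def keep(rows, pred):
    """Keep items for which pred(item) is true."""
    return [r for r in rows if pred(r)]


@verb
def take(rows, n):
    """Keep the first n items."""
    return rows[:n]


# >> is left-associative, so this runs keep(...) first, then take(2).
result = [5, 1, 8, 3, 9] >> keep(lambda x: x > 2) >> take(2)
print(result)  # [5, 8]
```

Because calling `keep(...)` only builds a `Verb`, the data does not need to exist until the pipe is evaluated.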

See the siuba guide overview for a full introduction.

What is a siu expression (e.g. `_.cyl == 4`)?

A siu expression is a way of specifying what action you want to perform. This allows siuba verbs to decide how to execute the action, depending on whether your data is a local DataFrame or remote table.

from siuba import _

_.cyl == 4

Out[2]:
█─==
├─█─.
│ ├─_
│ └─'cyl'
└─4

You can also think of siu expressions as a shorthand for a lambda function.

from siuba import _
from siuba.data import mtcars

# lambda approach
mtcars[lambda _: _.cyl == 4]

# siu expression approach
mtcars[_.cyl == 4]

Out[3]: 
     mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
2   22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
7   24.4    4  146.7   62  3.69  3.190  20.00   1   0     4     2
..   ...  ...    ...  ...   ...    ...    ...  ..  ..   ...   ...
27  30.4    4   95.1  113  3.77  1.513  16.90   1   1     5     2
31  21.4    4  121.0  109  4.11  2.780  18.60   1   1     4     2

[11 rows x 11 columns]

See the siuba guide or read more about lazy expressions.

Using with a SQL database

A killer feature of siuba is that the same analysis code can run on either a local DataFrame or a SQL source.

In the code below, we set up an example database.

# Setup example data ----
from sqlalchemy import create_engine
from siuba.data import mtcars

# copy pandas DataFrame to sqlite
engine = create_engine("sqlite:///:memory:")
mtcars.to_sql("mtcars", engine, if_exists = "replace")

Next, we run the code from the first example, except that it now executes against a SQL table.

# Demo SQL analysis with siuba ----
from siuba import _, tbl, group_by, summarize, filter

# connect with siuba
tbl_mtcars = tbl(engine, "mtcars")

(tbl_mtcars
  >> group_by(_.cyl)
  >> summarize(avg_hp = _.hp.mean())
  )

Out[4]: 
# Source: lazy query
# DB Conn: Engine(sqlite:///:memory:)
# Preview:
   cyl      avg_hp
0    4   82.636364
1    6  122.285714
2    8  209.214286
# .. may have more rows
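The `# Source: lazy query` header and the `# .. may have more rows` footer indicate that the pipeline above is not run eagerly. The toy sketch below shows one way such laziness can work with the standard-library `sqlite3` module: each verb returns a new query object, SQL is only generated on request, and a preview fetches a limited number of rows. `LazyQuery`, its methods, and the `cars` table are hypothetical; siuba connects through a SQLAlchemy engine, as in the setup code above, and the SQL it generates will differ.

```python
import sqlite3


class LazyQuery:
    """Toy lazy query: verbs record steps; SQL runs only in preview()."""

    def __init__(self, con, table, group_col=None, aggs=None):
        self.con = con
        self.table = table
        self.group_col = group_col
        self.aggs = aggs or {}

    def group_by(self, col):
        # Return a new query; nothing is sent to the database yet.
        return LazyQuery(self.con, self.table, col, self.aggs)

    def summarize(self, **aggs):
        # aggs maps output names to SQL aggregate expressions, e.g. "AVG(hp)".
        return LazyQuery(self.con, self.table, self.group_col, aggs)

    def to_sql(self):
        cols = [f"{expr} AS {name}" for name, expr in self.aggs.items()]
        if self.group_col:
            cols.insert(0, self.group_col)
        sql = f"SELECT {', '.join(cols) or '*'} FROM {self.table}"
        if self.group_col:
            sql += f" GROUP BY {self.group_col} ORDER BY {self.group_col}"
        return sql

    def preview(self, n=5):
        # Only now is the query executed, and at most n rows are fetched.
        return self.con.execute(f"{self.to_sql()} LIMIT {n}").fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (cyl INTEGER, hp INTEGER)")
con.executemany("INSERT INTO cars VALUES (?, ?)", [(4, 93), (6, 110), (8, 175), (4, 62)])

q = LazyQuery(con, "cars").group_by("cyl").summarize(avg_hp="AVG(hp)")
print(q.to_sql())   # SELECT cyl, AVG(hp) AS avg_hp FROM cars GROUP BY cyl ORDER BY cyl
print(q.preview())  # [(4, 77.5), (6, 110.0), (8, 175.0)]
```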

See the querying SQL introduction here.
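The other SQL concepts mentioned earlier, distinct(), count(), and joins, map roughly onto familiar SQL. The hand-written queries below, run with the standard-library `sqlite3` module on hypothetical `cars` and `engines` tables, are a sketch of those correspondences, not the SQL that siuba emits.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (name TEXT, cyl INTEGER)")
con.executemany("INSERT INTO cars VALUES (?, ?)", [("a", 4), ("b", 6), ("c", 4)])
con.execute("CREATE TABLE engines (cyl INTEGER, label TEXT)")
con.executemany("INSERT INTO engines VALUES (?, ?)", [(4, "small"), (6, "medium")])

# Roughly what a count of cyl asks for: the number of rows per cyl value.
counts = con.execute(
    "SELECT cyl, COUNT(*) AS n FROM cars GROUP BY cyl ORDER BY cyl"
).fetchall()

# Roughly what a distinct on cyl asks for: the unique cyl values.
unique_cyl = con.execute("SELECT DISTINCT cyl FROM cars ORDER BY cyl").fetchall()

# An inner join on the shared cyl column.
joined = con.execute(
    "SELECT cars.name, engines.label FROM cars "
    "JOIN engines ON cars.cyl = engines.cyl ORDER BY cars.name"
).fetchall()

print(counts)      # [(4, 2), (6, 1)]
print(unique_cyl)  # [(4,), (6,)]
print(joined)      # [('a', 'small'), ('b', 'medium'), ('c', 'small')]
```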

Example notebooks

Below are some examples I've kept as I've worked on siuba. For the most up-to-date explanations, see the siuba guide.

siu expressions
dplyr style pandas
- select verb case study
sql using dplyr style
- simple sql statements
- the kitchen sink with postgres
tidytuesday examples
- tidytuesday is a weekly R data analysis project. To kick the tires on siuba, I've been using it to complete the assignments; more specifically, I've been porting Dave Robinson's tidytuesday analyses to siuba.

Testing

Tests are run with pytest. Start the postgres database first, then run the test suite:

# start postgres db (in the background)
docker-compose up -d
pytest siuba

Project details

Release history

version	date	status
1.0.0a3	Jan 12, 2022	pre-release; yanked (this version renamed to 0.1.*)
1.0.0a2	Dec 19, 2021	pre-release; yanked (this version renamed to 0.1.*)
1.0.0a1	Dec 18, 2021	pre-release; yanked (this version renamed to 0.1.*)
0.4.5.dev1	Sep 24, 2025	pre-release
0.4.4	Sep 19, 2023	release
0.4.3	Sep 18, 2023	release
0.4.2	Nov 16, 2022	release
0.4.1	Oct 26, 2022	release
0.4.0	Oct 18, 2022	release
0.4.0rc1	Oct 13, 2022	pre-release
0.3.0	May 31, 2022	release
0.2.3	Apr 30, 2022	release
0.2.2	Apr 30, 2022	release
0.2.1	Mar 29, 2022	release
0.2.0	Mar 29, 2022	release
0.2.0.dev3	Mar 20, 2022	pre-release
0.2.0.dev2	Mar 20, 2022	pre-release
0.2.0.dev1	Mar 20, 2022	pre-release
0.1.2	Jan 20, 2022	release
0.1.1	Jan 19, 2022	release
0.1.0	Jan 19, 2022	release
0.0.25	Jun 21, 2021	release
0.0.24	Aug 30, 2020	release
0.0.23	Aug 15, 2020	release
0.0.22	Aug 7, 2020	release
0.0.21	May 20, 2020	release
0.0.20	May 12, 2020	release
0.0.19	May 6, 2020	release
0.0.18	Apr 25, 2020	release
0.0.17	Feb 17, 2020	release
0.0.16	Feb 11, 2020	release
0.0.15	Feb 8, 2020	release
0.0.14	Jan 25, 2020	release
0.0.13	Oct 29, 2019	release
0.0.12	Oct 29, 2019	release
0.0.11	Aug 8, 2019	release
0.0.10	Aug 6, 2019	release
0.0.9	Aug 2, 2019	release
0.0.8	Jun 17, 2019	release
0.0.7	Jun 1, 2019	release
0.0.6	Apr 30, 2019	release
0.0.5	Apr 26, 2019	release
0.0.4	Apr 26, 2019	release
0.0.3	Mar 29, 2019	release
0.0.2	Feb 20, 2019	release
0.0.1	Feb 11, 2019	release

Download files


Source Distribution

siuba-0.4.4.tar.gz (170.5 kB)

Uploaded Sep 19, 2023 Source

Built Distribution


siuba-0.4.4-py3-none-any.whl (208.6 kB)

Uploaded Sep 19, 2023 Python 3

File details

Details for the file siuba-0.4.4.tar.gz.

File metadata

Download URL: siuba-0.4.4.tar.gz
Upload date: Sep 19, 2023
Size: 170.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for siuba-0.4.4.tar.gz
Algorithm	Hash digest
SHA256	`73fb5e3934e45f2083cf0cc362c761fac2f1ae6f918746442edd8f493f009387`
MD5	`6d2b4d9aa9922c0ef7decc8594d89a6b`
BLAKE2b-256	`4cc35203adb162baea4eebe869bffde230f806b612e36a6e3bf4049c3452c786`
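To check a downloaded file against the published SHA256 digest, a short standard-library sketch such as the one below streams the file through `hashlib.sha256` in chunks and compares hex digests. The function and constant names are hypothetical, and `matches_published_digest` assumes the archive has already been downloaded; pip can also enforce hashes itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).

```python
import hashlib
import io

# Published SHA256 digest for siuba-0.4.4.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "73fb5e3934e45f2083cf0cc362c761fac2f1ae6f918746442edd8f493f009387"


def sha256_of_file(fileobj, chunk_size=64 * 1024):
    """Return the hex SHA256 digest of a binary file object, read in chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file at `path` hashes to the expected digest."""
    with open(path, "rb") as f:
        return sha256_of_file(f) == expected


# Self-check on in-memory bytes: SHA256 of b"abc" is a well-known test vector.
print(sha256_of_file(io.BytesIO(b"abc")))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Reading in fixed-size chunks keeps memory use flat regardless of file size.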


File details

Details for the file siuba-0.4.4-py3-none-any.whl.

File metadata

Download URL: siuba-0.4.4-py3-none-any.whl
Upload date: Sep 19, 2023
Size: 208.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for siuba-0.4.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07a6b2a02f39e53a8fdb1f1a3a7c49bce182346901e99df1d073e5427cf9e9dc`
MD5	`0a05413635f481da7e39a5e1be482fbb`
BLAKE2b-256	`bea959a1d3ae43ce39b3b1addf670c304312b732fd5e29c6f96132c66019c801`
