Tidy interface to polars
tidypolars
tidypolars is a data frame library built on top of the blazingly fast polars library that gives access to methods and functions familiar to R tidyverse users.
Installation
You can install tidypolars with pip:
$ pip install tidypolars
Or through conda:
$ conda install -c conda-forge tidypolars
General syntax
tidypolars methods are designed to work like tidyverse functions:
import tidypolars as tp
from tidypolars import col, desc
df = tp.tibble(x = range(3), y = range(3, 6), z = ['a', 'a', 'b'])
(
    df
    .select('x', 'y', 'z')
    .filter(col('x') < 4, col('y') > 1)
    .arrange(desc('z'), 'x')
    .mutate(double_x = col('x') * 2,
            x_plus_y = col('x') + col('y'))
)
┌─────┬─────┬─────┬──────────┬──────────┐
│ x ┆ y ┆ z ┆ double_x ┆ x_plus_y │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪══════════╪══════════╡
│ 2 ┆ 5 ┆ b ┆ 4 ┆ 7 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 0 ┆ 3 ┆ a ┆ 0 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 4 ┆ a ┆ 2 ┆ 5 │
└─────┴─────┴─────┴──────────┴──────────┘
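To see why the rows come out in this order, here is a plain-Python sketch of the same pipeline on a list of dicts. It does not use tidypolars and only mirrors the semantics, assuming (as in dplyr) that multiple .filter() conditions are combined with AND and that .arrange() sorts by desc('z') first and then by 'x'.

```python
# Plain-Python model of the pipeline above; tidypolars itself is not used here.
rows = [
    {"x": x, "y": y, "z": z}
    for x, y, z in zip(range(3), range(3, 6), ["a", "a", "b"])
]

# .filter(col('x') < 4, col('y') > 1): assumed to keep rows where ALL conditions hold
rows = [r for r in rows if r["x"] < 4 and r["y"] > 1]

# .arrange(desc('z'), 'x'): sort by the secondary key first, then by the primary key.
# Python's sort is stable (also with reverse=True), so ties on z keep their x order.
rows = sorted(rows, key=lambda r: r["x"])
rows = sorted(rows, key=lambda r: r["z"], reverse=True)

# .mutate(...): add derived columns to every row
rows = [{**r, "double_x": r["x"] * 2, "x_plus_y": r["x"] + r["y"]} for r in rows]

for r in rows:
    print(r)
```

Sorting by the secondary key first and then stably by the primary key is the usual way to get a multi-key sort with mixed directions in plain Python.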
The key difference from R is that column names must be wrapped in col() in the following methods:
- .filter()
- .mutate()
- .summarize()
The general idea: when doing calculations on a column, you need to wrap it in col(). When doing simple column selections (like in .select()), you can pass the column names as strings.
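Why can't a bare string be used in a calculation? Python evaluates a string immediately, so 'x' * 2 is simply the string 'xx', not "column x times two". col() instead returns an expression object that is evaluated later against the data. The toy model below sketches that idea with operator overloading; the names ToyExpr and toy_col are hypothetical, and this is not how tidypolars or polars actually implement expressions.

```python
import operator
from typing import Callable, Dict, List

# A "table" here is just a dict mapping column names to lists of values.
Table = Dict[str, List]


class ToyExpr:
    """Hypothetical deferred column expression (a toy, not tidypolars' implementation)."""

    def __init__(self, evaluate: Callable[[Table], List]):
        self.evaluate = evaluate

    def _binary(self, other, op):
        # Combine element-wise with another expression, or broadcast a scalar.
        if isinstance(other, ToyExpr):
            return ToyExpr(lambda t: [op(a, b) for a, b in zip(self.evaluate(t), other.evaluate(t))])
        return ToyExpr(lambda t: [op(a, other) for a in self.evaluate(t)])

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def __lt__(self, other):
        return self._binary(other, operator.lt)


def toy_col(name: str) -> ToyExpr:
    # Refers to a column by name; nothing is computed until evaluate() is called.
    return ToyExpr(lambda t: list(t[name]))


table: Table = {"x": [0, 1, 2], "y": [3, 4, 5]}

print("x" * 2)                                        # a bare string is just text
print((toy_col("x") * 2).evaluate(table))             # element-wise on column x
print((toy_col("x") + toy_col("y")).evaluate(table))  # column + column
print((toy_col("x") < 2).evaluate(table))             # a boolean mask, as used by filter
```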
A full list of functions can be found here.
Group by syntax
Methods operate by group when you pass the _by argument:
- A single column can be passed with _by = 'z'
- Multiple columns can be passed with _by = ['y', 'z']
(
    df
    .summarize(avg_x = tp.mean(col('x')),
               _by = 'z')
)
┌─────┬───────┐
│ z ┆ avg_x │
│ --- ┆ --- │
│ str ┆ f64 │
╞═════╪═══════╡
│ a ┆ 0.5 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ b ┆ 2 │
└─────┴───────┘
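The avg_x values above are ordinary per-group means: the rows with z == 'a' have x values 0 and 1 (mean 0.5), and the single 'b' row has x = 2. A plain-Python sketch of that split-apply-combine step, using a hypothetical group_mean helper (not part of tidypolars), shows how one or several _by columns form the group key:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Same columns as the tibble in the General syntax example
x = [0, 1, 2]
y = [3, 4, 5]
z = ["a", "a", "b"]


def group_mean(values: Sequence[float], *keys: Sequence) -> Dict[Tuple, float]:
    """Mean of `values` within each group; the group key is a tuple over all key columns."""
    groups: Dict[Tuple, List[float]] = defaultdict(list)
    for i, v in enumerate(values):
        groups[tuple(k[i] for k in keys)].append(v)  # split
    return {key: sum(vs) / len(vs) for key, vs in groups.items()}  # apply + combine


by_z = group_mean(x, z)        # like _by = 'z'
by_y_z = group_mean(x, y, z)   # like _by = ['y', 'z']
print(by_z)
print(by_y_z)
```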
Selecting/dropping columns
tidyselect functions can be mixed with normal selection when selecting columns:
df = tp.tibble(x1 = range(3), x2 = range(3), y = range(3), z = range(3))
df.select(tp.starts_with('x'), 'z')
┌─────┬─────┬─────┐
│ x1 ┆ x2 ┆ z │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0 ┆ 0 ┆ 0 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1 ┆ 1 ┆ 1 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 2 ┆ 2 │
└─────┴─────┴─────┘
To drop columns, use the .drop() method:
df.drop(tp.starts_with('x'), 'z')
┌─────┐
│ y │
│ --- │
│ i64 │
╞═════╡
│ 0 │
├╌╌╌╌╌┤
│ 1 │
├╌╌╌╌╌┤
│ 2 │
└─────┘
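Conceptually, a tidyselect helper such as tp.starts_with('x') is expanded into the matching column names and combined with any plain strings; .drop() then keeps the remaining columns. The sketch below models that in plain Python. starts_with_sketch and resolve are hypothetical names, and the ordering and de-duplication rules are assumptions of the sketch rather than documented tidypolars behavior.

```python
from typing import Callable, List, Union

# A selector is either a literal column name or a predicate on column names.
Selector = Union[str, Callable[[str], bool]]


def starts_with_sketch(prefix: str) -> Callable[[str], bool]:
    # Hypothetical helper mirroring the idea of tp.starts_with(); not the library's code.
    return lambda name: name.startswith(prefix)


def resolve(columns: List[str], *selectors: Selector) -> List[str]:
    """Expand selectors into concrete column names, in selector order, without duplicates."""
    chosen: List[str] = []
    for sel in selectors:
        if callable(sel):
            matches = [c for c in columns if sel(c)]
        elif sel in columns:
            matches = [sel]
        else:
            raise KeyError(f"unknown column: {sel!r}")
        for name in matches:
            if name not in chosen:
                chosen.append(name)
    return chosen


columns = ["x1", "x2", "y", "z"]
selected = resolve(columns, starts_with_sketch("x"), "z")  # like df.select(...)
dropped = [c for c in columns if c not in selected]        # like df.drop(...)

print(selected)
print(dropped)
```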
Converting to/from pandas data frames
If you need to use a package that requires pandas data frames, you can convert from a tidypolars tibble to a pandas DataFrame.
To do this, you'll first need to install pyarrow:
$ pip install pyarrow
To convert to a pandas DataFrame:
df = df.as_pandas()
To convert from a pandas DataFrame to a tidypolars tibble:
df = tp.as_tibble(df)
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
File details
Details for the file tidypolars-0.3.2.tar.gz.
File metadata
- Download URL: tidypolars-0.3.2.tar.gz
- Upload date:
- Size: 17.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.14 Darwin/24.0.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 009cb92d3211fb20a375adbfcf043703b04411aa22c05351addd13addff84f67
MD5 | 3336e55c01d1b0111bbc96d4daa18d53
BLAKE2b-256 | 06b498c9f4f6c87d0630c4745777140c889d3b1a70db5a292204020839f0deae
File details
Details for the file tidypolars-0.3.2-py3-none-any.whl.
File metadata
- Download URL: tidypolars-0.3.2-py3-none-any.whl
- Upload date:
- Size: 18.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.10.14 Darwin/24.0.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 73ca614730c38abc63e30db832e599b098e95ea7094d9cff60fe24733b0a5293
MD5 | 18f38a0c23c9259b993f295658da67b1
BLAKE2b-256 | 4bc5fe58f5ff266d15d22c3eb214c04d035c606febd317ea51cac85e212e74de