Tidy interface to polars

tidypolars

tidypolars is a data frame library built on top of the blazingly fast polars library that gives access to methods and functions familiar to R tidyverse users.

Installation

You can install tidypolars with pip:

$ pip install tidypolars

Or through conda:

$ conda install -c conda-forge tidypolars

General syntax

tidypolars methods are designed to work like tidyverse functions:

import tidypolars as tp
from tidypolars import col, desc

df = tp.tibble(x = range(3), y = range(3, 6), z = ['a', 'a', 'b'])

(
    df
    .select('x', 'y', 'z')
    .filter(col('x') < 4, col('y') > 1)
    .arrange(desc('z'), 'x')
    .mutate(double_x = col('x') * 2,
            x_plus_y = col('x') + col('y'))
)
┌─────┬─────┬─────┬──────────┬──────────┐
│ x   ┆ y   ┆ z   ┆ double_x ┆ x_plus_y │
│ --- ┆ --- ┆ --- ┆ ---      ┆ ---      │
│ i64 ┆ i64 ┆ str ┆ i64      ┆ i64      │
╞═════╪═════╪═════╪══════════╪══════════╡
│ 2   ┆ 5   ┆ b   ┆ 4        ┆ 7        │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 0   ┆ 3   ┆ a   ┆ 0        ┆ 3        │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1   ┆ 4   ┆ a   ┆ 2        ┆ 5        │
└─────┴─────┴─────┴──────────┴──────────┘

The key difference from R is that column names must be wrapped in col() in the following methods:

  • .filter()
  • .mutate()
  • .summarize()

The general idea: when doing calculations on a column, you need to wrap it in col(). For simple column selections (as in .select()), you can pass the column names as strings.
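To see why the wrapper matters, here is a toy sketch in plain Python. It is not how tidypolars or polars is implemented; `Expr` and `toy_col` are hypothetical names invented for this illustration. The point is only that a bare string has no column-aware arithmetic, while a wrapper object can overload operators and defer the calculation until it is given data.

```python
import operator


class Expr:
    """Toy deferred expression (hypothetical, for illustration only)."""

    def __init__(self, fn, label):
        self._fn = fn        # function that computes a value from one row dict
        self.label = label   # human-readable description of the expression

    def evaluate(self, row):
        return self._fn(row)

    def _binary(self, other, op, symbol):
        # Lift plain values to constant expressions so toy_col('x') * 2 works.
        if not isinstance(other, Expr):
            other = Expr(lambda row, v=other: v, repr(other))
        return Expr(
            lambda row: op(self.evaluate(row), other.evaluate(row)),
            f"({self.label} {symbol} {other.label})",
        )

    def __add__(self, other):
        return self._binary(other, operator.add, "+")

    def __mul__(self, other):
        return self._binary(other, operator.mul, "*")

    def __lt__(self, other):
        return self._binary(other, operator.lt, "<")

    def __gt__(self, other):
        return self._binary(other, operator.gt, ">")


def toy_col(name):
    """Hypothetical stand-in for col(): look the column up in a row later."""
    return Expr(lambda row: row[name], name)


rows = [{"x": 0, "y": 3}, {"x": 1, "y": 4}, {"x": 2, "y": 5}]

double_x = toy_col("x") * 2
x_plus_y = toy_col("x") + toy_col("y")

double_x_values = [double_x.evaluate(r) for r in rows]
x_plus_y_values = [x_plus_y.evaluate(r) for r in rows]

# A bare string has no notion of columns: Python just repeats the text.
string_result = "x" * 2
```

In this sketch `toy_col("x") * 2` builds a description of a calculation, whereas `"x" * 2` is ordinary string repetition, which is why names used in calculations need the wrapper and names used for plain selection do not.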

A full list of functions can be found in the tidypolars API reference.

Group by syntax

Methods operate by group when you pass the _by arg.

  • A single column can be passed with _by = 'z'
  • Multiple columns can be passed with _by = ['y', 'z']

(
    df
    .summarize(avg_x = tp.mean(col('x')),
               _by = 'z')
)
┌─────┬───────┐
│ z   ┆ avg_x │
│ --- ┆ ---   │
│ str ┆ f64   │
╞═════╪═══════╡
│ a   ┆ 0.5   │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ b   ┆ 2     │
└─────┴───────┘
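For readers new to grouped summaries, the following plain-Python sketch shows what a grouped mean computes for the data above. It does not call tidypolars; `summarize_mean` is a hypothetical helper written for this illustration, and its `by` parameter accepts either one column name or a list of names, mirroring the two forms listed above.

```python
from collections import defaultdict


def summarize_mean(rows, value, by):
    """Hypothetical helper: mean of `value` within each group defined by `by`."""
    # Accept a single column name or a list of column names.
    keys = [by] if isinstance(by, str) else list(by)

    # Collect the values of each group, keyed by the tuple of grouping values.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])

    # One output row per group: the grouping columns plus the average.
    result = []
    for group_key, values in groups.items():
        out = dict(zip(keys, group_key))
        out[f"avg_{value}"] = sum(values) / len(values)
        result.append(out)
    return result


rows = [
    {"x": 0, "y": 3, "z": "a"},
    {"x": 1, "y": 4, "z": "a"},
    {"x": 2, "y": 5, "z": "b"},
]

by_z = summarize_mean(rows, "x", by="z")
by_y_z = summarize_mean(rows, "x", by=["y", "z"])
```

Grouping by z gives one row per distinct z value (an average of 0.5 for a and 2.0 for b, matching the table above). Grouping by both y and z gives one row per distinct (y, z) pair, which for this data means three groups of one row each.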

Selecting/dropping columns

tidyselect functions can be mixed with normal selection when selecting columns:

df = tp.tibble(x1 = range(3), x2 = range(3), y = range(3), z = range(3))

df.select(tp.starts_with('x'), 'z')
┌─────┬─────┬─────┐
│ x1  ┆ x2  ┆ z   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ 0   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ 1   ┆ 1   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 2   ┆ 2   │
└─────┴─────┴─────┘

To drop columns use the .drop() method:

df.drop(tp.starts_with('x'), 'z')
┌─────┐
│ y   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
├╌╌╌╌╌┤
│ 1   │
├╌╌╌╌╌┤
│ 2   │
└─────┘
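As a rough mental model, mixing a tidyselect-style helper with plain names can be thought of as resolving every selector to a list of column names and then keeping (select) or removing (drop) those names. The sketch below is plain Python and makes no claim about tidypolars internals; `toy_starts_with` and `resolve` are hypothetical names used only here.

```python
def toy_starts_with(prefix):
    """Hypothetical selector: matches column names that begin with `prefix`."""
    return lambda name: name.startswith(prefix)


def resolve(columns, *selectors):
    """Turn a mix of selector functions and plain names into column names."""
    chosen = []
    for selector in selectors:
        if callable(selector):
            # A helper selects every column name it matches, in table order.
            matches = [c for c in columns if selector(c)]
        else:
            # A plain string selects exactly that column.
            matches = [selector]
        for name in matches:
            if name not in columns:
                raise KeyError(name)
            if name not in chosen:  # avoid duplicates when selectors overlap
                chosen.append(name)
    return chosen


columns = ["x1", "x2", "y", "z"]

# Same selectors as the .select() and .drop() calls above.
selected = resolve(columns, toy_starts_with("x"), "z")
remaining = [c for c in columns if c not in selected]
```

Here `selected` holds the columns kept by the select example (x1, x2, z) and `remaining` holds the single column left after the drop example (y).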

Converting to/from pandas data frames

If you need to use a package that requires pandas data frames, you can convert from a tidypolars tibble to a pandas DataFrame.

To do this you'll first need to install pyarrow:

$ pip install pyarrow

To convert to a pandas DataFrame:

df = df.as_pandas()

To convert from a pandas DataFrame to a tidypolars tibble:

df = tp.as_tibble(df)

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

