TidyBear

A tidier approach to pandas.

This package began as a collection of functions, routines, and processes that I found myself repeating often. It has since grown into an effort to work my way through the tidyverse and reimplement my favorite tidy features in Python. The project does not aim to provide a better experience for every pandas task, only a different one that sometimes feels more natural to me. I hope something here is useful to you.

Installation

pip install tidybear

Usage

import pandas as pd
import tidybear as tb

Verbs

# rename columns
tb.rename(data, old="new")

# select columns
tb.select(data, ["col1", "col2"])

# count the rows for each combination of values in multiple columns
tb.count(data, ["col1", "col2"])

# pivot long to wide or wide to long
tb.pivot_longer(data, ["val1", "val2"], names_to="val_type")
tb.pivot_wider(data, names_from="val_type", values_from="value")

# slice rows
tb.slice_max(data, order_by="val1", n=10)
tb.slice_min(data, order_by="val1", n=10, groupby="col1")

# join dataframes
tb.left_join(data1, data2, "colA")  # use "colA" as the key in both dataframes
tb.right_join(data1, data2, col1A="col1B")  # match "col1A" from the left with "col1B" from the right

tb.cross_join(data1, data2)
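
To make the reshaping done by pivot_longer concrete, here is a dependency-free sketch of the same idea on a list of dicts. It is not tidybear's implementation: the function below is a stand-in written for illustration, and the default output column name "value" is an assumption inferred from the pivot_wider call above. Row order in tidybear's actual output may differ.

```python
def pivot_longer(rows, cols, names_to="name", values_to="value"):
    """Illustrative stand-in: reshape wide rows (a list of dicts) to long form."""
    long_rows = []
    for row in rows:
        # Columns that are not pivoted are carried along as identifiers.
        id_part = {k: v for k, v in row.items() if k not in cols}
        for col in cols:
            # Each pivoted column becomes one output row: its name goes into
            # `names_to` and its cell value goes into `values_to`.
            long_rows.append({**id_part, names_to: col, values_to: row[col]})
    return long_rows


wide = [
    {"id": 1, "val1": 10, "val2": 20},
    {"id": 2, "val1": 30, "val2": 40},
]

long = pivot_longer(wide, ["val1", "val2"], names_to="val_type")
for row in long:
    print(row)
```

Two wide rows with two value columns become four long rows, each holding the id, the original column name under "val_type", and the cell under "value".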

Groupby and Summarise API

with tb.GroupBy(df, "group_var") as g:
    g.n()
    g.sum("value", name="total_value")
    g.n_distinct("ids", name="n_unique_ids")

    summary = g.summarise()
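
As a rough guide to what such a summary contains, the sketch below computes the same three statistics per group in plain Python. The sample rows are made up, and the output column name "n" for the row count is an assumption borrowed from dplyr; tidybear itself returns a pandas DataFrame rather than a list of dicts.

```python
from collections import defaultdict

# Hypothetical input data, one dict per row.
rows = [
    {"group_var": "a", "value": 1, "ids": "x"},
    {"group_var": "a", "value": 2, "ids": "x"},
    {"group_var": "b", "value": 5, "ids": "y"},
]

# Bucket the rows by the grouping column.
groups = defaultdict(list)
for row in rows:
    groups[row["group_var"]].append(row)

# One output row per group, one column per requested statistic.
summary = [
    {
        "group_var": key,
        "n": len(members),                                 # like g.n()
        "total_value": sum(r["value"] for r in members),   # like g.sum("value", ...)
        "n_unique_ids": len({r["ids"] for r in members}),  # like g.n_distinct("ids", ...)
    }
    for key, members in groups.items()
]

for row in summary:
    print(row)
```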

TidySelectors

  • everything() - Select all columns
  • last_col - Select the last column
  • first_col - Select the first column
  • contains(pattern) - Select columns whose names contain the literal string
  • matches(pattern) - Select columns whose names match the regular expression pattern
  • starts_with(pattern) - Select columns whose names start with the literal string
  • ends_with - Select columns whose names end with the literal string
  • num_range - Select columns whose names match a numeric range like x01, x02, x03

These selectors can be used with a variety of tidybear verbs:

from tidybear.selectors import contains, everything

# select all columns that contain "foo"
tb.select(data, contains("foo"))

# pivot all columns to long format
tb.pivot_longer(data, everything())
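
The difference between literal and regular-expression selectors is easiest to see on a plain list of column names. The functions below are toy re-implementations for illustration, not tidybear's code, and the num_range signature (prefix, numbers, width) is an assumption modelled on tidyselect.

```python
import re

# Hypothetical column names.
columns = ["id", "foo_total", "x01", "x02", "x03", "bar_foo"]


def contains(literal):
    # Literal substring test; regex metacharacters have no special meaning.
    return lambda cols: [c for c in cols if literal in c]


def matches(pattern):
    # Regular-expression search anywhere in the column name.
    return lambda cols: [c for c in cols if re.search(pattern, c)]


def starts_with(prefix):
    return lambda cols: [c for c in cols if c.startswith(prefix)]


def ends_with(suffix):
    return lambda cols: [c for c in cols if c.endswith(suffix)]


def num_range(prefix, numbers, width=0):
    # Build the expected names (e.g. x01, x02) and keep those that exist.
    wanted = {f"{prefix}{str(n).zfill(width)}" for n in numbers}
    return lambda cols: [c for c in cols if c in wanted]


print(contains("foo")(columns))
print(matches(r"^x\d+$")(columns))
print(num_range("x", range(1, 3), width=2)(columns))
```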

You can also negate a selector. For example, to select every column except the last one:

from tidybear.selectors import last_col

tb.select(data, -last_col())
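
One way a selector can support the unary minus is by implementing Python's __neg__ method. The sketch below shows that pattern under stated assumptions: the Selector class and its resolve method are hypothetical names invented for this example, and tidybear's internals may work differently.

```python
class Selector:
    """Hypothetical selector: wraps a predicate over (name, position, n_columns)."""

    def __init__(self, predicate, negated=False):
        self.predicate = predicate
        self.negated = negated

    def __neg__(self):
        # -selector returns a new selector with the match inverted.
        return Selector(self.predicate, not self.negated)

    def resolve(self, columns):
        n = len(columns)
        # Keep a column when the predicate result differs from the negation flag.
        return [
            c for i, c in enumerate(columns)
            if self.predicate(c, i, n) != self.negated
        ]


def last_col():
    # Matches only the column in the final position.
    return Selector(lambda name, i, n: i == n - 1)


columns = ["id", "val1", "val2"]
print(last_col().resolve(columns))
print((-last_col()).resolve(columns))
```

Returning a new object from __neg__ instead of mutating the original keeps a selector reusable after it has been negated once.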

Coming Soon (maybe)

  • Method chaining
  • Tribbles!

Download files

Source Distribution: tidybear-0.0.5.tar.gz (10.7 kB)

Built Distribution: tidybear-0.0.5-py3-none-any.whl (12.6 kB, Python 3)
