Project description


redframes (rectangular data frames) is a data manipulation library for ML and visualization. It is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib!

redframes prioritizes syntax over flexibility and scope, and minimizes the number-of-googles-per-lines-of-code™ so that you can focus on the work that matters most.

"What is redframes?" would be the answer to the Jeopardy! clue "A pythonic dplyr".

Install & Import

pip install redframes
import redframes as rf


Copy-and-paste this:

import redframes as rf

df = rf.DataFrame({
    "foo": ["A", "A", "B", None, "B", "A", "A", "C"],
    "bar": [1, 4, 2, -4, 5, 6, 6, -2],
    "baz": [0.99, None, 0.25, 0.75, 0.66, 0.47, 0.48, None]
})

df["foo"]
# ['A', 'A', 'B', None, 'B', 'A', 'A', 'C']
df.columns
# ['foo', 'bar', 'baz']
df.dimensions
# {'rows': 8, 'columns': 3}
df.empty
# False
df.types
# {'foo': str, 'bar': int, 'baz': float}

(
    df
    .mutate({"bar100": lambda row: row["bar"] * 100})
    .select(["foo", "baz", "bar100"])
    .filter(lambda row:
        (row["foo"].isin(["A", "B"])) & (row["bar100"] > 0)
    )
    .summarise({
        "bar_mean": ("bar100", rf.stat.mean),
        "baz_sum": ("baz", rf.stat.sum)
    })
    .gather(["bar_mean", "baz_sum"])
)
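
For readers coming from pandas, here is a rough sketch of how the mutate, select, filter, summary, and gather steps in the chain above could be written in plain pandas, using the analogs from the verb table. It assumes pandas is installed. The names pdf, mutated, selected, filtered, summary, and long are illustrative only, and this is one possible translation, not a description of what redframes does internally.

```python
import pandas as pd

pdf = pd.DataFrame({
    "foo": ["A", "A", "B", None, "B", "A", "A", "C"],
    "bar": [1, 4, 2, -4, 5, 6, 6, -2],
    "baz": [0.99, None, 0.25, 0.75, 0.66, 0.47, 0.48, None],
})

# ~ .mutate: add a derived column without touching the original frame
mutated = pdf.assign(bar100=pdf["bar"] * 100)

# ~ .select: keep only the listed columns
selected = mutated[["foo", "baz", "bar100"]]

# ~ .filter: boolean mask over rows
mask = selected["foo"].isin(["A", "B"]) & (selected["bar100"] > 0)
filtered = selected[mask]

# ~ .summarise: one row of aggregates (missing values are skipped)
summary = pd.DataFrame({
    "bar_mean": [filtered["bar100"].mean()],
    "baz_sum": [filtered["baz"].sum()],
})

# ~ .gather: wide-to-long reshape into "variable" / "value" columns
long = summary.melt(value_vars=["bar_mean", "baz_sum"])
print(long)
```

With this data, bar_mean comes out as 400.0 and baz_sum as roughly 2.85.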


Save, load, and convert rf.DataFrame objects:

import redframes as rf
import pandas as pd

df = rf.DataFrame({"foo": [1, 2], "bar": ["A", "B"]})

# save/load
rf.save(df, "example.csv")
df = rf.load("example.csv")

# to/from pandas
pandf = rf.unwrap(df)
reddf = rf.wrap(pandf)


There are 23 core "verbs" that make up rf.DataFrame objects. Each verb is pure, "chain-able", and has an analog in pandas/dplyr (see docstrings for more info/examples):

| verb | pandas | dplyr |
| --- | --- | --- |
| .accumulate | cumsum | mutate(... = cumsum(...)) |
| .append | concat | bind_rows |
| .combine | + | unite |
| .dedupe | drop_duplicates | distinct |
| .denix | dropna | drop_na |
| .drop | drop(..., axis=1) | select(- ...) |
| .fill | fillna | fill, replace_na |
| .filter | df[df[col] == condition] | filter |
| .gather | melt | gather, pivot_longer |
| .group | groupby | group_by |
| .join | merge | *_join |
| .mutate | apply, astype | mutate |
| .rank | rank("dense") | dense_rank |
| .rename | rename | rename |
| .replace | replace | mutate(... = case_when(...)) |
| .sample | sample(n, frac) | sample_n, sample_frac |
| .select | select | select |
| .shuffle | sample(frac=1) | sample_frac(..., 1) |
| .sort | sort_values | arrange |
| .split | df[col].str.split() | separate |
| .spread | pivot_table | spread, pivot_wider |
| .summarise | agg | summarise |
| .take | head, tail | slice_head, slice_tail |
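
To make "pure" and "chain-able" concrete, here is a minimal standard-library sketch. TinyFrame is a hypothetical toy class, not part of redframes, and its filter works row by row for simplicity. The point is only the pattern: every verb returns a new object and leaves its input untouched, so calls can be chained safely.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TinyFrame:
    """Hypothetical toy frame (not redframes): a tuple of row dicts."""
    rows: tuple

    def mutate(self, over: dict) -> "TinyFrame":
        new_rows = []
        for row in self.rows:
            row = dict(row)  # copy, so the original rows are never modified
            for column, fn in over.items():
                row[column] = fn(row)
            new_rows.append(row)
        return TinyFrame(tuple(new_rows))

    def filter(self, func: Callable[[dict], Any]) -> "TinyFrame":
        # keep only the rows for which func returns a truthy value
        return TinyFrame(tuple(row for row in self.rows if func(row)))

    def select(self, columns: list) -> "TinyFrame":
        # build new row dicts containing only the requested columns
        return TinyFrame(tuple({c: row[c] for c in columns} for row in self.rows))


original = TinyFrame(({"bar": 1}, {"bar": -4}, {"bar": 5}))

result = (
    original
    .mutate({"bar100": lambda row: row["bar"] * 100})
    .filter(lambda row: row["bar100"] > 0)
    .select(["bar100"])
)

print(result.rows)    # ({'bar100': 100}, {'bar100': 500})
print(original.rows)  # ({'bar': 1}, {'bar': -4}, {'bar': 5})
```

Because each step returns a fresh object, the order of verbs in a chain reads top to bottom and no intermediate variable can be changed by a later step.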


rf.DataFrame objects integrate seamlessly with matplotlib:

import redframes as rf
import matplotlib.pyplot as plt

df = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})

df = (
    df
    .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
    .replace({"color": {False: "orange", True: "red"}})
)

plt.barh(df["position"], df["avp"], color=df["color"]);


rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:

import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})

target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527

print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})

X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])

