[re]ctangular [d]ata [frames]

redframes
redframes (rectangular data frames) is a data manipulation library for ML and visualization. It is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib!

redframes prioritizes syntax over flexibility and scope, and minimizes the number-of-googles-per-lines-of-code™ so that you can focus on the work that matters most.

"What is redframes?" would be the answer to the Jeopardy! clue "A pythonic dplyr".

Install & Import

pip install redframes
import redframes as rf

Quickstart

Copy-and-paste this:

import redframes as rf

df = rf.DataFrame({
    "foo": ["A", "A", "B", None, "B", "A", "A", "C"],
    "bar": [1, 4, 2, -4, 5, 6, 6, -2], 
    "baz": [0.99, None, 0.25, 0.75, 0.66, 0.47, 0.48, None]
})

# | foo   |   bar |    baz |
# |:------|------:|-------:|
# | A     |     1 |   0.99 |
# | A     |     4 |        |
# | B     |     2 |   0.25 |
# |       |    -4 |   0.75 |
# | B     |     5 |   0.66 |
# | A     |     6 |   0.47 |
# | A     |     6 |   0.48 |
# | C     |    -2 |        |

(
    df
    .mutate({"bar100": lambda row: row["bar"] * 100})
    .select(["foo", "baz", "bar100"])
    .filter(lambda row: (row["foo"].isin(["A", "B"])) & (row["bar100"] > 0))
    .denix("baz")
    .group("foo")
    .rollup({
        "bar_mean": ("bar100", rf.stat.mean), 
        "baz_sum": ("baz", rf.stat.sum)
    })
    .gather(["bar_mean", "baz_sum"], into=("variable", "value"))
    .sort("value")
)

# | foo   | variable   |   value |
# |:------|:-----------|--------:|
# | B     | baz_sum    |   0.91  |
# | A     | baz_sum    |   1.94  |
# | B     | bar_mean   | 350     |
# | A     | bar_mean   | 433.333 |

IO

Save, load, and convert rf.DataFrame objects:

import redframes as rf
import pandas as pd

df = rf.DataFrame({"foo": [1, 2], "bar": ["A", "B"]})

# save/load
rf.save(df, "example.csv")
df = rf.load("example.csv")

# to/from pandas
pandf = rf.unwrap(df)
reddf = rf.wrap(pandf)

Verbs

There are 24 core "verbs" that make up rf.DataFrame objects. Each verb is pure, "chain-able", and has an analog in pandas/tidyverse (see docstrings for more info/examples):

| verb        | pandas                      | tidyverse                           |
|:------------|:----------------------------|:------------------------------------|
| .accumulate | cumsum                      | mutate(... = cumsum(...))           |
| .append     | concat                      | bind_rows                           |
| .combine    | +                           | unite                               |
| .cross      | merge(..., how="cross")     | full_join(..., by = character())    |
| .dedupe     | drop_duplicates             | distinct                            |
| .denix      | dropna                      | drop_na                             |
| .drop       | drop(..., axis=1)           | select(-...)                        |
| .fill       | fillna                      | fill, replace_na                    |
| .filter     | df[df[col] == condition]    | filter                              |
| .gather     | melt                        | gather, pivot_longer                |
| .group      | groupby                     | group_by                            |
| .join       | merge                       | *_join                              |
| .mutate     | apply, astype               | mutate                              |
| .rank       | rank("dense")               | dense_rank                          |
| .rename     | rename                      | rename                              |
| .replace    | replace                     | mutate(... = case_when(...))        |
| .rollup     | agg                         | summarize                           |
| .sample     | sample(n, frac)             | sample_n, sample_frac               |
| .select     | select                      | select                              |
| .shuffle    | sample(frac=1)              | sample_frac(..., 1)                 |
| .sort       | sort_values                 | arrange                             |
| .split      | df[col].str.split()         | separate                            |
| .spread     | pivot_table                 | spread, pivot_wider                 |
| .take       | head, tail                  | slice_head, slice_tail              |
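
To make the pandas column of the table concrete, here is a rough sketch of the Quickstart pipeline written in plain pandas. It is only an illustration of the analogs listed above, not how redframes is implemented; the names `pdf` and `result` are made up for this example, and the trailing `reset_index` is an extra step added to get a clean 0..n index.

```python
import pandas as pd

# Same data as the Quickstart, but as a plain pandas DataFrame
pdf = pd.DataFrame({
    "foo": ["A", "A", "B", None, "B", "A", "A", "C"],
    "bar": [1, 4, 2, -4, 5, 6, 6, -2],
    "baz": [0.99, None, 0.25, 0.75, 0.66, 0.47, 0.48, None],
})

result = (
    pdf
    .assign(bar100=lambda d: d["bar"] * 100)                        # ~ .mutate
    .loc[:, ["foo", "baz", "bar100"]]                               # ~ .select
    .loc[lambda d: d["foo"].isin(["A", "B"]) & (d["bar100"] > 0)]   # ~ .filter
    .dropna(subset=["baz"])                                         # ~ .denix
    .groupby("foo", as_index=False)                                 # ~ .group
    .agg(bar_mean=("bar100", "mean"), baz_sum=("baz", "sum"))       # ~ .rollup
    .melt(
        id_vars="foo",
        value_vars=["bar_mean", "baz_sum"],
        var_name="variable",
        value_name="value",
    )                                                               # ~ .gather
    .sort_values("value")                                           # ~ .sort
    .reset_index(drop=True)                                         # tidy the index (extra step)
)

print(result)
```

The rows should come out in the same order as the Quickstart result (B/baz_sum, A/baz_sum, B/bar_mean, A/bar_mean), which gives a feel for how much index and keyword bookkeeping the verbs are meant to hide.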

Properties

In addition to the verbs, each DataFrame supports column access and exposes several properties:

df["foo"] 
# ['A', 'A', 'B', None, 'B', 'A', 'A', 'C']

df.columns 
# ['foo', 'bar', 'baz']

df.dimensions
# {'rows': 8, 'columns': 3}

df.empty
# False

df.memory
# '686 B'

df.types
# {'foo': object, 'bar': int, 'baz': float}
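
For readers coming from pandas, the sketch below shows approximate pandas counterparts for a few of these properties. The mapping is an assumption made for orientation only; the source does not say how redframes computes them, and `pdf`, `columns`, `dimensions`, `empty`, and `bar_values` are illustrative names.

```python
import pandas as pd

# Same data as the Quickstart, as a plain pandas DataFrame
pdf = pd.DataFrame({
    "foo": ["A", "A", "B", None, "B", "A", "A", "C"],
    "bar": [1, 4, 2, -4, 5, 6, 6, -2],
    "baz": [0.99, None, 0.25, 0.75, 0.66, 0.47, 0.48, None],
})

# ~ df.columns: pandas returns an Index, so convert it to a plain list
columns = list(pdf.columns)

# ~ df.dimensions: pandas exposes a (rows, columns) tuple via .shape
dimensions = {"rows": pdf.shape[0], "columns": pdf.shape[1]}

# ~ df.empty: same property name in pandas
empty = pdf.empty

# ~ df["bar"]: pandas returns a Series, so convert it to a plain list
bar_values = pdf["bar"].tolist()

print(columns, dimensions, empty, bar_values)
```

The main difference this is meant to highlight is the return type: the redframes examples above show plain lists and dicts, while pandas hands back its own Index, tuple, and Series objects.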

matplotlib

rf.DataFrame objects integrate seamlessly with matplotlib:

import redframes as rf
import matplotlib.pyplot as plt

df = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})

df = (
    df
    .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
    .replace({"color": {False: "orange", True: "red"}})
)

plt.barh(df["position"], df["avp"], color=df["color"]);

scikit-learn

rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:

import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})

target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527

print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})

X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])
