
General Purpose Data Manipulation Library


redframes

About

redframes (rectangular data frames) is a general purpose data manipulation library that prioritizes syntax, simplicity, and speed (to a solution). Importantly, the library is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib.

Install & Import

pip install redframes
import redframes as rf

Quickstart

Copy-and-paste this to get started:

import redframes as rf

df = rf.DataFrame({
    'bear': ['Brown bear', 'Polar bear', 'Asian black bear', 'American black bear', 'Sun bear', 'Sloth bear', 'Spectacled bear', 'Giant panda'],
    'genus': ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda'],
    'weight (male, lbs)': ['300-860', '880-1320', '220-440', '125-500', '60-150', '175-310', '220-340', '190-275'],
    'weight (female, lbs)': ['205-455', '330-550', '110-275', '90-300', '45-90', '120-210', '140-180', '155-220']
})

# | bear                | genus      | weight (male, lbs)   | weight (female, lbs)   |
# |:--------------------|:-----------|:---------------------|:-----------------------|
# | Brown bear          | Ursus      | 300-860              | 205-455                |
# | Polar bear          | Ursus      | 880-1320             | 330-550                |
# | Asian black bear    | Ursus      | 220-440              | 110-275                |
# | American black bear | Ursus      | 125-500              | 90-300                 |
# | Sun bear            | Helarctos  | 60-150               | 45-90                  |
# | Sloth bear          | Melursus   | 175-310              | 120-210                |
# | Spectacled bear     | Tremarctos | 220-340              | 140-180                |
# | Giant panda         | Ailuropoda | 190-275              | 155-220                |

(
    df
        .rename({"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
        .gather(["male", "female"], into=("sex", "weight"))
        .split("weight", into=["min", "max"], sep="-")
        .gather(["min", "max"], into=("stat", "weight"))
        .mutate({"weight": lambda row: float(row["weight"])})
        .group(["genus", "sex"])
        .rollup({"weight": ("weight", rf.stat.mean)})
        .spread("sex", using="weight")
        .mutate({"dimorphism": lambda row: round(row["male"] / row["female"], 2)})
        .drop(["male", "female"])
        .sort("dimorphism", descending=True)
)

# | genus      |   dimorphism |
# |:-----------|-------------:|
# | Ursus      |         2.01 |
# | Tremarctos |         1.75 |
# | Helarctos  |         1.56 |
# | Melursus   |         1.47 |
# | Ailuropoda |         1.24 |
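To see what each step contributes, here is a rough standard-library sketch of the same computation on plain dicts and lists. It illustrates the arithmetic only and is not how redframes works internally. Every weight range is split into its two endpoints, the endpoints are averaged per genus and sex, and the male mean is divided by the female mean. The comments name the redframes verb each step corresponds to.

```python
from collections import defaultdict
from statistics import mean

# The Quickstart data, reduced to the columns the calculation needs
bears = {
    "genus": ["Ursus", "Ursus", "Ursus", "Ursus", "Helarctos", "Melursus", "Tremarctos", "Ailuropoda"],
    "male": ["300-860", "880-1320", "220-440", "125-500", "60-150", "175-310", "220-340", "190-275"],
    "female": ["205-455", "330-550", "110-275", "90-300", "45-90", "120-210", "140-180", "155-220"],
}

# gather + split + gather + mutate: collect every range endpoint, as a float, per (genus, sex)
weights = defaultdict(list)
for i, genus in enumerate(bears["genus"]):
    for sex in ("male", "female"):
        low, high = bears[sex][i].split("-")
        weights[(genus, sex)] += [float(low), float(high)]

# group + rollup + spread + mutate: mean male weight / mean female weight per genus
dimorphism = {
    genus: round(mean(weights[(genus, "male")]) / mean(weights[(genus, "female")]), 2)
    for genus in dict.fromkeys(bears["genus"])  # unique genera in first-seen order
}

# sort: largest dimorphism first
ranked = sorted(dimorphism.items(), key=lambda item: item[1], reverse=True)
print(ranked)
# [('Ursus', 2.01), ('Tremarctos', 1.75), ('Helarctos', 1.56), ('Melursus', 1.47), ('Ailuropoda', 1.24)]
```

For Ursus, for example, that is 580.625 / 289.375, about 2.006, which rounds to 2.01.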

For comparison, here's the equivalent pandas:

import pandas as pd

# df = pd.DataFrame({...})

df = df.rename(columns={"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
df = pd.melt(df, id_vars=['bear', 'genus'], value_vars=['male', 'female'], var_name='sex', value_name='weight')
df[["min", "max"]] = df["weight"].str.split("-", expand=True)
df = df.drop("weight", axis=1)
df = pd.melt(df, id_vars=['bear', 'genus', 'sex'], value_vars=['min', 'max'], var_name='stat', value_name='weight')
df['weight'] = df["weight"].astype('float')
df = df.groupby(["genus", "sex"])["weight"].mean()
df = df.reset_index()
df = pd.pivot_table(df, index=['genus'], columns=['sex'], values='weight')
df = df.reset_index()
df = df.rename_axis(None, axis=1)
df["dimorphism"] = round(df["male"] / df["female"], 2)
df = df.drop(["female", "male"], axis=1)
df = df.sort_values("dimorphism", ascending=False)
df = df.reset_index(drop=True)

# 🤮

IO

Save, load, and convert rf.DataFrame objects:

# save .csv
rf.save(df, "bears.csv")

# load .csv
df = rf.load("bears.csv")

# convert redframes → pandas
pandas_df = rf.unwrap(df)

# convert pandas → redframes
df = rf.wrap(pandas_df)
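CSV files store only text, so any loader has to re-infer column types. The sketch below uses only Python's csv module and a plain dict of columns; to_csv and from_csv are hypothetical helpers, not redframes functions. It shows that a naive round trip returns every value as a string. We assume rf.load re-infers numeric types for you, as pandas.read_csv does, but check its docstring to confirm.

```python
import csv
import io

# A column-oriented table: column name -> list of values (an illustrative stand-in, not rf.DataFrame)
table = {"genus": ["Ursus", "Helarctos"], "weight": [580.625, 105.0]}

def to_csv(columns: dict) -> str:
    """Write a column-oriented table to CSV text: a header row, then one line per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns.keys())
    writer.writerows(zip(*columns.values()))  # transpose columns into rows
    return buffer.getvalue()

def from_csv(text: str) -> dict:
    """Read CSV text back into a column-oriented table; no type inference is attempted."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    return {name: [row[i] for row in rows] for i, name in enumerate(header)}

restored = from_csv(to_csv(table))
print(restored)
# {'genus': ['Ursus', 'Helarctos'], 'weight': ['580.625', '105.0']}
```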

Verbs

Verbs are pure, "chain-able" methods that manipulate rf.DataFrame objects: each verb returns a new DataFrame instead of modifying the original in place. Here is the complete list (see the docstrings for examples and more details):

| Verb       | Description                                                                   |
|:-----------|:------------------------------------------------------------------------------|
| accumulate | Run a cumulative sum over a column                                            |
| append     | Append rows from another DataFrame                                            |
| combine    | Combine multiple columns into a single column (opposite of split)             |
| cross      | Cross join columns from another DataFrame                                     |
| dedupe     | Remove duplicate rows                                                         |
| denix      | Remove rows with missing values                                               |
| drop       | Drop entire columns (opposite of select)                                      |
| fill       | Fill missing values "down", "up", or with a constant                          |
| filter     | Keep rows matching specific conditions                                        |
| gather     | Gather columns into rows (opposite of spread)                                 |
| group      | Prepare groups for compatible verbs                                           |
| join       | Join columns from another DataFrame                                           |
| mutate     | Create a new column or overwrite an existing one                              |
| pack       | Collate and concatenate row values for a target column (opposite of unpack)   |
| rank       | Rank-order values in a column                                                 |
| rename     | Rename column keys                                                            |
| replace    | Replace matching values within columns                                        |
| rollup     | Apply summary functions and/or statistics to target columns                   |
| sample     | Randomly sample any number of rows                                            |
| select     | Select specific columns (opposite of drop)                                    |
| shuffle    | Shuffle the order of all rows                                                 |
| sort       | Sort rows by specific columns                                                 |
| split      | Split a single column into multiple columns (opposite of combine)             |
| spread     | Spread rows into columns (opposite of gather)                                 |
| take       | Take any number of rows (from the top/bottom)                                 |
| unpack     | "Explode" concatenated row values into multiple rows (opposite of pack)       |
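Several verbs in the table come in inverse pairs. The following sketch illustrates the gather/spread pair on a plain dict of columns. The gather and spread functions here are hypothetical helpers that borrow the parameter names from the Quickstart (into=, using=); they are not redframes' implementation. Like the verbs, they return a new table and leave their input untouched.

```python
# Hypothetical helpers over a column-oriented table (a dict of equal-length lists).
# They mimic the idea behind gather/spread; they are not redframes' implementation.

def gather(table, columns, into):
    """Stack `columns` into two new columns: one naming the source column, one holding its value."""
    key, value = into
    keep = [name for name in table if name not in columns]
    out = {name: [] for name in keep + [key, value]}
    for i in range(len(table[columns[0]])):
        for column in columns:
            for name in keep:
                out[name].append(table[name][i])
            out[key].append(column)
            out[value].append(table[column][i])
    return out  # a new dict; `table` itself is never modified

def spread(table, column, using):
    """Turn each distinct value of `column` into its own column, filled from `using`."""
    ids = [name for name in table if name not in (column, using)]
    rows = {}  # identifying values -> {new column name: value}
    for i in range(len(table[column])):
        ident = tuple(table[name][i] for name in ids)
        rows.setdefault(ident, {})[table[column][i]] = table[using][i]
    out = {name: [ident[j] for ident in rows] for j, name in enumerate(ids)}
    for name in dict.fromkeys(table[column]):  # distinct values, first-seen order
        out[name] = [values.get(name) for values in rows.values()]
    return out

wide = {"genus": ["Ursus", "Helarctos"], "male": [580.625, 105.0], "female": [289.375, 67.5]}
long_form = gather(wide, ["male", "female"], into=("sex", "weight"))
print(long_form["sex"])  # ['male', 'female', 'male', 'female']
print(spread(long_form, "sex", using="weight") == wide)  # True: spread undoes gather here
```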

Properties

In addition to the verbs, every DataFrame object exposes several properties:

df["genus"] 
# ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda']

df.columns 
# ['bear', 'genus', 'weight (male, lbs)', 'weight (female, lbs)']

df.dimensions
# {'rows': 8, 'columns': 4}

df.empty
# False

df.memory
# '2 KB'

df.types
# {'bear': object, 'genus': object, 'weight (male, lbs)': object, 'weight (female, lbs)': object}
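Most of these properties are simple summaries of the underlying columns. As a hypothetical illustration (ColumnTable is not part of redframes, and memory and types are left out), here is how column access, columns, dimensions, and empty could be derived from a dict of equal-length lists.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnTable:
    """Hypothetical stand-in for a data frame: column name -> list of values (not rf.DataFrame)."""
    data: dict = field(default_factory=dict)

    def __getitem__(self, key):
        # Column access returns the column's values as a plain (copied) list
        return list(self.data[key])

    @property
    def columns(self):
        return list(self.data)

    @property
    def dimensions(self):
        # All columns share one length, so the first column gives the row count (0 if none)
        rows = len(next(iter(self.data.values()), []))
        return {"rows": rows, "columns": len(self.data)}

    @property
    def empty(self):
        return self.dimensions["rows"] == 0

bears = ColumnTable({"bear": ["Brown bear", "Sun bear"], "genus": ["Ursus", "Helarctos"]})
print(bears["genus"])     # ['Ursus', 'Helarctos']
print(bears.columns)      # ['bear', 'genus']
print(bears.dimensions)   # {'rows': 2, 'columns': 2}
print(bears.empty)        # False
```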

matplotlib

rf.DataFrame objects integrate seamlessly with matplotlib:

import redframes as rf
import matplotlib.pyplot as plt

football = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})

df = (
    football
        .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
        .replace({"color": {False: "orange", True: "red"}})
)

plt.barh(df["position"], df["avp"], color=df["color"]);
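The mutate and replace steps simply build a per-row list of color names, which matplotlib's color= argument accepts. A plain-Python equivalent, for illustration only (it does not use redframes):

```python
# The football positions from the example above
positions = ["TE", "K", "RB", "WR", "QB"]

# mutate: True for the positions to highlight
highlight = [position in ["WR", "RB"] for position in positions]

# replace: swap each boolean for a matplotlib color name
palette = {False: "orange", True: "red"}
colors = [palette[flag] for flag in highlight]
print(colors)  # ['orange', 'orange', 'red', 'red', 'orange']
```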

scikit-learn

rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:

import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})

target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527

print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})

X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])


Download files

Source Distribution

redframes-1.4.tar.gz (29.8 kB)

Built Distribution

redframes-1.4-py3-none-any.whl (41.0 kB, Python 3)

File details

Details for the file redframes-1.4.tar.gz.

File metadata

  • Download URL: redframes-1.4.tar.gz
  • Upload date:
  • Size: 29.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for redframes-1.4.tar.gz
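| Algorithm   | Hash digest                                                        |
|:------------|:-------------------------------------------------------------------|
| SHA256      | cdd1b1d1f7a6cc6422a38767bf17c2b61526ba3200865dce6321f95a99854434   |
| MD5         | 1106bc50a47ac08cc0005af8cd8f22ce                                   |
| BLAKE2b-256 | 6404fed7519c50bb6589799b4ffa325f106486f0fa8d69e0b598594df0163326   |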


File details

Details for the file redframes-1.4-py3-none-any.whl.

File metadata

  • Download URL: redframes-1.4-py3-none-any.whl
  • Upload date:
  • Size: 41.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for redframes-1.4-py3-none-any.whl
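| Algorithm   | Hash digest                                                        |
|:------------|:-------------------------------------------------------------------|
| SHA256      | 3b708d7a89f72fd9dd91160a406f7efdf6a1f9cdae2807251a1eab4059d33733   |
| MD5         | 400b2d6d0b504dc3948984852777db78                                   |
| BLAKE2b-256 | 0ce2914573520a2152730186e4de69f1393fab40c75e5d626eae6fe4690df2b5   |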

