Skip to main content

General Purpose Data Manipulation Library

Project description

redframes
Pandas Version PyPI Downloads

About

redframes (rectangular data frames) is a general purpose data manipulation library that prioritizes syntax, simplicity, and speed (to a solution). Importantly, the library is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib.

Install & Import

pip install redframes
import redframes as rf

Quickstart

Copy-and-paste this to get started:

import redframes as rf

df = rf.DataFrame({
    'bear': ['Brown bear', 'Polar bear', 'Asian black bear', 'American black bear', 'Sun bear', 'Sloth bear', 'Spectacled bear', 'Giant panda'],
    'genus': ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda'],
    'weight (male, lbs)': ['300-860', '880-1320', '220-440', '125-500', '60-150', '175-310', '220-340', '190-275'],
    'weight (female, lbs)': ['205-455', '330-550', '110-275', '90-300', '45-90', '120-210', '140-180', '155-220']
})

# | bear                | genus      | weight (male, lbs)   | weight (female, lbs)   |
# |:--------------------|:-----------|:---------------------|:-----------------------|
# | Brown bear          | Ursus      | 300-860              | 205-455                |
# | Polar bear          | Ursus      | 880-1320             | 330-550                |
# | Asian black bear    | Ursus      | 220-440              | 110-275                |
# | American black bear | Ursus      | 125-500              | 90-300                 |
# | Sun bear            | Helarctos  | 60-150               | 45-90                  |
# | Sloth bear          | Melursus   | 175-310              | 120-210                |
# | Spectacled bear     | Tremarctos | 220-340              | 140-180                |
# | Giant panda         | Ailuropoda | 190-275              | 155-220                |

(
    df
        .rename({"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
        .gather(["male", "female"], into=("sex", "weight"))
        .split("weight", into=["min", "max"], sep="-")
        .gather(["min", "max"], into=("stat", "weight"))
        .mutate({"weight": lambda row: float(row["weight"])})
        .group(["genus", "sex"])
        .rollup({"weight": ("weight", rf.stat.mean)})
        .spread("sex", using="weight")
        .mutate({"dimorphism": lambda row: round(row["male"] / row["female"], 2)})
        .drop(["male", "female"])
        .sort("dimorphism", descending=True)
)

# | genus      |   dimorphism |
# |:-----------|-------------:|
# | Ursus      |         2.01 |
# | Tremarctos |         1.75 |
# | Helarctos  |         1.56 |
# | Melursus   |         1.47 |
# | Ailuropoda |         1.24 |

For comparison, here's the equivalent pandas:

import pandas as pd

# df = pd.DataFrame({...})

df = df.rename(columns={"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
df = pd.melt(df, id_vars=['bear', 'genus'], value_vars=['male', 'female'], var_name='sex', value_name='weight')
df[["min", "max"]] = df["weight"].str.split("-", expand=True)
df = df.drop("weight", axis=1)
df = pd.melt(df, id_vars=['bear', 'genus', 'sex'], value_vars=['min', 'max'], var_name='stat', value_name='weight')
df['weight'] = df["weight"].astype('float')
df = df.groupby(["genus", "sex"])["weight"].mean()
df = df.reset_index()
df = pd.pivot_table(df, index=['genus'], columns=['sex'], values='weight')
df = df.reset_index()
df = df.rename_axis(None, axis=1)
df["dimorphism"] = round(df["male"] / df["female"], 2)
df = df.drop(["female", "male"], axis=1)
df = df.sort_values("dimorphism", ascending=False)
df = df.reset_index(drop=True)

# 🤮

IO

Save, load, and convert rf.DataFrame objects:

# save .csv
rf.save(df, "bears.csv")

# load .csv
df = rf.load("bears.csv")

# convert redframes → pandas
pandas_df = rf.unwrap(df)

# convert pandas → redframes
df = rf.wrap(pandas_df)

Verbs

Verbs are pure and "chain-able" methods that manipulate rf.DataFrame objects. Here is the complete list (see docstrings for examples and more details):

Verb Description
accumulate Run a cumulative sum over a column
append Append rows from another DataFrame
combine Combine multiple columns into a single column (opposite of split)
cross Cross join columns from another DataFrame
dedupe Remove duplicate rows
denix Remove rows with missing values
drop Drop entire columns (opposite of select)
fill Fill missing values "down", "up", or with a constant
filter Keep rows matching specific conditions
gather Gather columns into rows (opposite of spread)
group Prepare groups for compatible verbs
join Join columns from another DataFrame
mutate Create a new, or overwrite an existing column
pack Collate and concatenate row values for a target column (opposite of unpack)
rank Rank order values in a column
rename Rename column keys
replace Replace matching values within columns
rollup Apply summary functions and/or statistics to target columns
sample Randomly sample any number of rows
select Select specific columns (opposite of drop)
shuffle Shuffle the order of all rows
sort Sort rows by specific columns
split Split a single column into multiple columns (opposite of combine)
spread Spread rows into columns (opposite of gather)
take Take any number of rows (from the top/bottom)
unpack "Explode" concatenated row values into multiple rows (opposite of pack)

Properties

In addition to all of the verbs there are several properties attached to each DataFrame object:

df["genus"] 
# ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda']

df.columns 
# ['bear', 'genus', 'weight (male, lbs)', 'weight (female, lbs)']

df.dimensions
# {'rows': 8, 'columns': 4}

df.empty
# False

df.memory
# '2 KB'

df.types
# {'bear': object, 'genus': object, 'weight (male, lbs)': object, 'weight (female, lbs)': object}

matplotlib

rf.DataFrame objects integrate seamlessly with matplotlib:

import redframes as rf
import matplotlib.pyplot as plt

football = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})

df = (
    football
        .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
        .replace({"color": {False: "orange", True: "red"}})
)

plt.barh(df["position"], df["avp"], color=df["color"]);
redframes

scikit-learn

rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:

import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})

target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527

print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})

X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

redframes-1.4b1.tar.gz (29.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

redframes-1.4b1-py3-none-any.whl (41.0 kB view details)

Uploaded Python 3

File details

Details for the file redframes-1.4b1.tar.gz.

File metadata

  • Download URL: redframes-1.4b1.tar.gz
  • Upload date:
  • Size: 29.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for redframes-1.4b1.tar.gz
Algorithm Hash digest
SHA256 cfd075c5e921d645ef628e04615d0a36429655b6ff71fac253e9ea06b3536bdc
MD5 b4c384ec056262af39f5b8ffdfcc9fe7
BLAKE2b-256 cb661da363b0c60f9d1a43e1e1cbe12b9d0b4903ea9d11b14e65f6489073f42d

See more details on using hashes here.

File details

Details for the file redframes-1.4b1-py3-none-any.whl.

File metadata

  • Download URL: redframes-1.4b1-py3-none-any.whl
  • Upload date:
  • Size: 41.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for redframes-1.4b1-py3-none-any.whl
Algorithm Hash digest
SHA256 bcf52f3464871c256a8b53dd73c3562d12e168ead545bcf96ac1b64b74e9798c
MD5 0a607e8e4ffd446a920afbf87e90f065
BLAKE2b-256 31fdb9d79ffc12f73caf00f1475e8c1b0a39b075f69277101d08801696240c3a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page