General Purpose Data Manipulation Library
About
redframes (rectangular data frames) is a general purpose data manipulation library that prioritizes syntax, simplicity, and speed (to a solution). Importantly, the library is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib.
Install & Import
pip install redframes
import redframes as rf
Quickstart
Copy-and-paste this to get started:
import redframes as rf
df = rf.DataFrame({
    'bear': ['Brown bear', 'Polar bear', 'Asian black bear', 'American black bear', 'Sun bear', 'Sloth bear', 'Spectacled bear', 'Giant panda'],
    'genus': ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda'],
    'weight (male, lbs)': ['300-860', '880-1320', '220-440', '125-500', '60-150', '175-310', '220-340', '190-275'],
    'weight (female, lbs)': ['205-455', '330-550', '110-275', '90-300', '45-90', '120-210', '140-180', '155-220']
})
# | bear | genus | weight (male, lbs) | weight (female, lbs) |
# |:--------------------|:-----------|:---------------------|:-----------------------|
# | Brown bear | Ursus | 300-860 | 205-455 |
# | Polar bear | Ursus | 880-1320 | 330-550 |
# | Asian black bear | Ursus | 220-440 | 110-275 |
# | American black bear | Ursus | 125-500 | 90-300 |
# | Sun bear | Helarctos | 60-150 | 45-90 |
# | Sloth bear | Melursus | 175-310 | 120-210 |
# | Spectacled bear | Tremarctos | 220-340 | 140-180 |
# | Giant panda | Ailuropoda | 190-275 | 155-220 |
(
    df
    .rename({"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
    .gather(["male", "female"], into=("sex", "weight"))
    .split("weight", into=["min", "max"], sep="-")
    .gather(["min", "max"], into=("stat", "weight"))
    .mutate({"weight": lambda row: float(row["weight"])})
    .group(["genus", "sex"])
    .rollup({"weight": ("weight", rf.stat.mean)})
    .spread("sex", using="weight")
    .mutate({"dimorphism": lambda row: round(row["male"] / row["female"], 2)})
    .drop(["male", "female"])
    .sort("dimorphism", descending=True)
)
# | genus | dimorphism |
# |:-----------|-------------:|
# | Ursus | 2.01 |
# | Tremarctos | 1.75 |
# | Helarctos | 1.56 |
# | Melursus | 1.47 |
# | Ailuropoda | 1.24 |
For comparison, here's the equivalent pandas:
import pandas as pd
# df = pd.DataFrame({...})
df = df.rename(columns={"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
df = pd.melt(df, id_vars=['bear', 'genus'], value_vars=['male', 'female'], var_name='sex', value_name='weight')
df[["min", "max"]] = df["weight"].str.split("-", expand=True)
df = df.drop("weight", axis=1)
df = pd.melt(df, id_vars=['bear', 'genus', 'sex'], value_vars=['min', 'max'], var_name='stat', value_name='weight')
df['weight'] = df["weight"].astype('float')
df = df.groupby(["genus", "sex"])["weight"].mean()
df = df.reset_index()
df = pd.pivot_table(df, index=['genus'], columns=['sex'], values='weight')
df = df.reset_index()
df = df.rename_axis(None, axis=1)
df["dimorphism"] = round(df["male"] / df["female"], 2)
df = df.drop(["female", "male"], axis=1)
df = df.sort_values("dimorphism", ascending=False)
df = df.reset_index(drop=True)
# 🤮
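Both `gather` and `pd.melt` perform the same wide-to-long reshape. To make that step concrete, here is a minimal plain-Python sketch of it; `gather_rows` is a hypothetical helper written for illustration and is not part of redframes or pandas. Rows come out in the same column-by-column order that `pd.melt` uses.

```python
# Plain-Python sketch of a wide-to-long "gather".
# `gather_rows` is a hypothetical helper, not a redframes or pandas function.
def gather_rows(rows, columns, into):
    """Turn each listed column into (key, value) rows, keeping other fields."""
    key_name, value_name = into
    gathered = []
    for column in columns:  # one pass per gathered column
        for row in rows:
            kept = {k: v for k, v in row.items() if k not in columns}
            kept[key_name] = column  # the former column name becomes a value
            kept[value_name] = row[column]
            gathered.append(kept)
    return gathered


# Two rows taken from the bears table, after the rename step.
bears = [
    {"bear": "Sun bear", "male": "60-150", "female": "45-90"},
    {"bear": "Giant panda", "male": "190-275", "female": "155-220"},
]

long_rows = gather_rows(bears, ["male", "female"], into=("sex", "weight"))
for row in long_rows:
    print(row)
```

Two input rows with two gathered columns yield four output rows, each carrying the untouched `bear` field plus the new `sex` and `weight` fields.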
IO
Save, load, and convert rf.DataFrame objects:
# save .csv
rf.save(df, "bears.csv")
# load .csv
df = rf.load("bears.csv")
# convert redframes → pandas
pandas_df = rf.unwrap(df)
# convert pandas → redframes
df = rf.wrap(pandas_df)
Verbs
Verbs are pure and "chain-able" methods that manipulate rf.DataFrame objects. Here is the complete list (see docstrings for examples and more details):
Verb | Description |
---|---|
accumulate ‡ | Run a cumulative sum over a column |
append | Append rows from another DataFrame |
combine | Combine multiple columns into a single column (opposite of split) |
cross | Cross join columns from another DataFrame |
dedupe | Remove duplicate rows |
denix | Remove rows with missing values |
drop | Drop entire columns (opposite of select) |
fill | Fill missing values "down", "up", or with a constant |
filter | Keep rows matching specific conditions |
gather ‡ | Gather columns into rows (opposite of spread) |
group | Prepare groups for compatible verbs‡ |
join | Join columns from another DataFrame |
mutate | Create a new column, or overwrite an existing one |
pack ‡ | Collate and concatenate row values for a target column (opposite of unpack) |
rank ‡ | Rank order values in a column |
rename | Rename column keys |
replace | Replace matching values within columns |
rollup ‡ | Apply summary functions and/or statistics to target columns |
sample | Randomly sample any number of rows |
select | Select specific columns (opposite of drop) |
shuffle | Shuffle the order of all rows |
sort | Sort rows by specific columns |
split | Split a single column into multiple columns (opposite of combine) |
spread | Spread rows into columns (opposite of gather) |
take ‡ | Take any number of rows (from the top/bottom) |
unpack | "Explode" concatenated row values into multiple rows (opposite of pack) |
Properties
In addition to all of the verbs, there are several properties attached to each DataFrame object:
df["genus"]
# ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda']
df.columns
# ['bear', 'genus', 'weight (male, lbs)', 'weight (female, lbs)']
df.dimensions
# {'rows': 8, 'columns': 4}
df.empty
# False
df.memory
# '2 KB'
df.types
# {'bear': object, 'genus': object, 'weight (male, lbs)': object, 'weight (female, lbs)': object}
matplotlib
rf.DataFrame objects integrate seamlessly with matplotlib:
import redframes as rf
import matplotlib.pyplot as plt
football = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})
df = (
    football
    .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
    .replace({"color": {False: "orange", True: "red"}})
)
plt.barh(df["position"], df["avp"], color=df["color"]);
scikit-learn
rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:
import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})
target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527
print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})
X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])