[re]ctangular[d]ata[frames]
redframes (rectangular data frames) is a data manipulation library for ML and visualization. It is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib!
redframes prioritizes syntax over flexibility and scope, and minimizes the number-of-googles-per-lines-of-code™ so that you can focus on the work that matters most.
"What is redframes?" would be the answer to the Jeopardy! clue "A pythonic dplyr".
Install & Import
pip install redframes
import redframes as rf
Quickstart
Copy-and-paste this:
import redframes as rf
df = rf.DataFrame({
    "foo": ["A", "A", "B", None, "B", "A", "A", "C"],
    "bar": [1, 4, 2, -4, 5, 6, 6, -2],
    "baz": [0.99, None, 0.25, 0.75, 0.66, 0.47, 0.48, None]
})
df["foo"]
# ['A', 'A', 'B', None, 'B', 'A', 'A', 'C']
df.columns
# ['foo', 'bar', 'baz']
df.dimensions
# {'rows': 8, 'columns': 3}
df.empty
# False
df.types
# {'foo': object, 'bar': int, 'baz': float}
(
    df
    .mutate({"bar100": lambda row: row["bar"] * 100})
    .select(["foo", "baz", "bar100"])
    .filter(lambda row:
        (row["foo"].isin(["A", "B"])) & (row["bar100"] > 0)
    )
    .denix("baz")
    .group("foo")
    .rollup({
        "bar_mean": ("bar100", rf.stat.mean),
        "baz_sum": ("baz", rf.stat.sum)
    })
    .gather(["bar_mean", "baz_sum"])
    .sort("value")
)
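To make each step of the chain above concrete, here is a plain-Python trace of the same computation on the same data. It is only a sketch of what the pipeline is expected to compute, assuming each verb behaves like the pandas analog listed in the Verbs table; it does not use redframes and says nothing about how the library is implemented. The intermediate names (`rows`, `groups`, `summary`, `long_rows`, `result`) and the rounding to two decimals are illustrative choices.

```python
from statistics import mean

# Same data as the quickstart, as a plain list of row dicts (no redframes needed).
foo = ["A", "A", "B", None, "B", "A", "A", "C"]
bar = [1, 4, 2, -4, 5, 6, 6, -2]
baz = [0.99, None, 0.25, 0.75, 0.66, 0.47, 0.48, None]
rows = [{"foo": f, "bar": b, "baz": z} for f, b, z in zip(foo, bar, baz)]

# mutate + select: add bar100, then keep only foo, baz, bar100
rows = [{"foo": r["foo"], "baz": r["baz"], "bar100": r["bar"] * 100} for r in rows]

# filter: foo is "A" or "B", and bar100 is positive
rows = [r for r in rows if r["foo"] in ("A", "B") and r["bar100"] > 0]

# denix("baz"): drop rows whose baz is missing
rows = [r for r in rows if r["baz"] is not None]

# group("foo") + rollup: one summary row per group
groups = {}
for r in rows:
    groups.setdefault(r["foo"], []).append(r)
summary = [
    {
        "foo": key,
        "bar_mean": round(mean(r["bar100"] for r in members), 2),
        "baz_sum": round(sum(r["baz"] for r in members), 2),
    }
    for key, members in groups.items()
]

# gather: wide to long, one (variable, value) pair per row
long_rows = [
    {"foo": s["foo"], "variable": name, "value": s[name]}
    for name in ("bar_mean", "baz_sum")
    for s in summary
]

# sort("value"): ascending by the gathered value
result = sorted(long_rows, key=lambda r: r["value"])
for r in result:
    print(r)
```

The trace ends with four rows: the two `baz_sum` values (0.91 for B, 1.94 for A) followed by the two `bar_mean` values (350 for B, 433.33 for A).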
IO
Save, load, and convert rf.DataFrame objects:
import redframes as rf
import pandas as pd
df = rf.DataFrame({"foo": [1, 2], "bar": ["A", "B"]})
# save/load
rf.save(df, "example.csv")
df = rf.load("example.csv")
# to/from pandas
pandf = rf.unwrap(df)
reddf = rf.wrap(pandf)
Verbs
There are 23 core "verbs" that make up rf.DataFrame objects. Each verb is pure, "chain-able", and has an analog in pandas/dplyr (see docstrings for more info/examples):
redframes | pandas | dplyr |
---|---|---|
.accumulate | cumsum | mutate(... = cumsum(...)) |
.append | concat | bind_rows |
.combine | + | unite |
.dedupe | drop_duplicates | distinct |
.denix | dropna | drop_na |
.drop | drop(..., axis=1) | select(- ...) |
.fill | fillna | fill, replace_na |
.filter | df[df[col] == condition] | filter |
.gather | melt | gather, pivot_longer |
.group | groupby | group_by |
.join | merge | *_join |
.mutate | apply, astype | mutate |
.rank | rank("dense") | dense_rank |
.rename | rename | rename |
.replace | replace | mutate(... = case_when(...)) |
.rollup | agg | summarize |
.sample | sample(n, frac) | sample_n, sample_frac |
.select | select | select |
.shuffle | sample(frac=1) | sample_frac(..., 1) |
.sort | sort_values | arrange |
.split | df[col].str.split() | separate |
.spread | pivot_table | spread, pivot_wider |
.take | head, tail | slice_head, slice_tail |
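What "pure" and "chain-able" mean in practice can be sketched with a toy class. `MiniFrame` below is hypothetical and is not how redframes is implemented; it only illustrates the contract, assuming "pure" means a verb never modifies the frame it is called on, and "chain-able" means every verb returns a new frame. Unlike the quickstart's filter, which uses a column-style `.isin(...)`, this toy version evaluates its predicate one row at a time.

```python
from __future__ import annotations

from typing import Any, Callable


class MiniFrame:
    """Hypothetical toy frame used for illustration only (not redframes code)."""

    def __init__(self, rows: list[dict[str, Any]]):
        # Copy every row on the way in so the caller's data can't be changed by us.
        self._rows = tuple(dict(r) for r in rows)

    def mutate(self, over: dict[str, Callable[[dict], Any]]) -> MiniFrame:
        # Build new rows with the extra columns; self._rows is left untouched.
        out = []
        for r in self._rows:
            new = dict(r)
            for column, fn in over.items():
                new[column] = fn(new)
            out.append(new)
        return MiniFrame(out)

    def filter(self, keep: Callable[[dict], bool]) -> MiniFrame:
        # Return a new frame holding only the rows the predicate accepts.
        return MiniFrame([r for r in self._rows if keep(r)])

    def sort(self, column: str, descending: bool = False) -> MiniFrame:
        # sorted() returns a new list, so the original row order is preserved.
        ordered = sorted(self._rows, key=lambda r: r[column], reverse=descending)
        return MiniFrame(ordered)

    def rows(self) -> list[dict[str, Any]]:
        # Hand back copies so callers can't reach into the frame's state.
        return [dict(r) for r in self._rows]


df = MiniFrame([
    {"foo": "A", "bar": 1},
    {"foo": "B", "bar": 2},
    {"foo": "A", "bar": 6},
])

# Because each verb returns a new MiniFrame, calls can be chained.
out = (
    df
    .mutate({"bar100": lambda row: row["bar"] * 100})
    .filter(lambda row: row["foo"] == "A")
    .sort("bar100", descending=True)
)

print(out.rows())
print(df.rows())  # the starting frame still has its original three rows and two columns
```

Because no verb mutates its receiver, any intermediate frame in a chain can be kept and reused without worrying that a later step changed it.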
matplotlib
rf.DataFrame objects integrate seamlessly with matplotlib:
import redframes as rf
import matplotlib.pyplot as plt
df = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})
df = (
    df
    .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
    .replace({"color": {False: "orange", True: "red"}})
)
plt.barh(df["position"], df["avp"], color=df["color"]);
scikit-learn
rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:
import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})
target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527
print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})
X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])