Broadcast Dictionary
Python dictionary with broadcast support.
Installation
pip install bcdict
Usage
>>> from bcdict import BCDict
>>> d = BCDict({"a": "hello", "b": "world!"})
>>> d
{'a': 'hello', 'b': 'world!'}
Regular element access:
>>> d['a']
'hello'
Regular element assignment:
>>> d['a'] = "Hello"
>>> d
{'a': 'Hello', 'b': 'world!'}
Calling functions:
>>> d.upper()
{'a': 'HELLO', 'b': 'WORLD!'}
Slicing:
>>> d[1:3]
{'a': 'el', 'b': 'or'}
Applying functions:
>>> d.pipe(len)
{'a': 5, 'b': 6}
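For readers new to broadcasting, the three calls above can be reproduced on a plain dict with comprehensions. This is only a sketch of the equivalent behavior in standard Python, not how BCDict is implemented; the variable names are illustrative.

```python
# Plain-dict equivalents of the broadcast operations shown above.
# Illustration only: this does not use or reproduce BCDict internals.
d = {"a": "Hello", "b": "world!"}

# d.upper(): call the method on every value
upper = {key: value.upper() for key, value in d.items()}

# d[1:3]: apply the slice to every value
sliced = {key: value[1:3] for key, value in d.items()}

# d.pipe(len): pass every value through a function
lengths = {key: len(value) for key, value in d.items()}

print(upper)    # {'a': 'HELLO', 'b': 'WORLD!'}
print(sliced)   # {'a': 'el', 'b': 'or'}
print(lengths)  # {'a': 5, 'b': 6}
```

BCDict lets you write each of these as a single expression on the dictionary itself.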
When an attribute of the values conflicts with an attribute of BCDict itself, use the attribute accessor .a explicitly:
>>> d.a.upper()
{'a': 'HELLO', 'b': 'WORLD!'}
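Slicing with conflicting keys: when the index is also a key of the dictionary, a regular key lookup takes precedence, so use the attribute accessor to broadcast:
>>> n = BCDict({1: "hello", 2: "world"})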
>>> n[1]
'hello'
>>> # Using the attribute accessor:
>>> n.a[1]
{1: 'e', 2: 'o'}
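To make the distinction between key lookup and the accessor concrete, here is a minimal sketch of how such an accessor could be built on top of a plain dict. The names TinyBroadcastDict and _ValueAccessor are hypothetical and exist only for this illustration; the sketch returns plain dicts, treats every attribute as a method, and is not the implementation used by bcdict.

```python
class _ValueAccessor:
    """Hypothetical helper: forwards method calls and indexing to every value."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only reached for names not found on the accessor itself.
        if name.startswith("_"):
            raise AttributeError(name)

        def call_on_every_value(*args, **kwargs):
            return {
                key: getattr(value, name)(*args, **kwargs)
                for key, value in self._data.items()
            }

        return call_on_every_value

    def __getitem__(self, item):
        # Index into every value instead of looking up a key.
        return {key: value[item] for key, value in self._data.items()}


class TinyBroadcastDict(dict):
    """Hypothetical stand-in for illustration; not part of bcdict."""

    @property
    def a(self):
        return _ValueAccessor(self)


n = TinyBroadcastDict({1: "hello", 2: "world"})
print(n[1])         # 'hello': ordinary key lookup
print(n.a[1])       # {1: 'e', 2: 'o'}: index applied to every value
print(n.a.upper())  # {1: 'HELLO', 2: 'WORLD'}
```

Routing everything through a dedicated accessor keeps the broadcast path unambiguous, whatever the keys or the dictionary's own attributes happen to be.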
Full example
Here we create a dictionary with 3 datasets and then train, apply and validate a linear regression on all 3 datasets without a single for loop or dictionary comprehension.
from collections.abc import Collection
from pprint import pprint

import numpy as np
import pandas as pd
from bcdict import BCDict
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
def get_random_data(datasets: Collection) -> dict[str, pd.DataFrame]:
    """Just create some random data."""
    columns = list("ABCD") + ["target"]
    dfs = {}
    for name in datasets:
        dfs[name] = pd.DataFrame(
            np.random.random((10, len(columns))), columns=columns
        )
    return dfs
datasets = ["noord", "brabant", "limburg"]
# make dict with three dataframes, one for each grid:
train_dfs = BCDict(get_random_data(datasets))
test_dfs = BCDict(get_random_data(datasets))
features = list("ABCD")
target = "target"
# get X, y *for all 3 grids at once*:
X_train = train_dfs[features]
y_train = train_dfs[target]
# get the test X, y *for all 3 grids at once*:
X_test = test_dfs[features]
y_test = test_dfs[target]
# creates models for all 3 grids at once:
# we call the `train` function on each dataframe in X_train, and pass the
# corresponding y_train series into the function.
def train(X: pd.DataFrame, y: pd.Series) -> LinearRegression:
    """We use this function to train a model."""
    model = LinearRegression()
    model.fit(X, y)
    return model
models = X_train.pipe(train, y_train)
# Apply each model to the correct grid.
# `models` is a BCDict.
# When calling the `predict` function, it knows that `test_dfs` is a dict with
# the same keys as `models`. When calling predict on each model, the corresponding
# dataframe from `test_dfs` is passed to the function.
preds = models.predict(X_test)
# now we pipe the true targets and the corresponding predictions into r2_score
scores = y_test.pipe(r2_score, preds)
pprint(scores)
# {'brabant': -2.2075573154836925,
# 'limburg': -1.3066288799673251,
# 'noord': -0.8467452520467658}
assert list(scores.keys()) == datasets
assert all(isinstance(v, float) for v in scores.values())
# Conclusion: not a single for loop or dict comprehension was needed to train
# 3 models, predict with them and evaluate them on 3 data sets :)
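The comments in the example describe the key idea: when an argument is itself a dictionary with the same keys, each call receives the entry that belongs to its key. Below is a rough standard-library sketch of that matching rule. The helper pipe_sketch and the toy function total_abs_error are hypothetical names made up for this illustration, and the exact matching rule (identical key sets) is an assumption based on those comments, not a description of bcdict's internals.

```python
def pipe_sketch(data, func, *args, **kwargs):
    """Call func once per key; dict arguments with the same keys are split per key."""

    def pick(arg, key):
        # Assumption: broadcast only dicts whose keys match exactly.
        if isinstance(arg, dict) and arg.keys() == data.keys():
            return arg[key]
        return arg  # anything else is passed through unchanged

    return {
        key: func(
            value,
            *(pick(arg, key) for arg in args),
            **{name: pick(arg, key) for name, arg in kwargs.items()},
        )
        for key, value in data.items()
    }


def total_abs_error(truth, pred, scale=1):
    """Toy metric standing in for r2_score."""
    return scale * sum(abs(t - p) for t, p in zip(truth, pred))


y_true = {"noord": [1, 2], "brabant": [3, 4]}
y_pred = {"noord": [1, 3], "brabant": [5, 4]}

print(pipe_sketch(y_true, total_abs_error, y_pred))
# {'noord': 1, 'brabant': 2}
print(pipe_sketch(y_true, total_abs_error, y_pred, scale=10))
# {'noord': 10, 'brabant': 20}
```

Here y_pred is split per key because its keys match those of y_true, while the scalar scale is handed to every call as-is.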
Next steps
Check out the full documentation and the examples on bcdict.readthedocs.io
Changelog
v0.4.0
- new functions eq() and ne() for equality/inequality with broadcast support
v0.3.0
- new functions in bcdict package: apply(), broadcast(), broadcast_arg(), broadcast_kwarg()
- docs: write some documentation and host it on readthedocs
v0.2.0
- remove item() function. Use .a[] instead.
v0.1.0
- initial release
Original repository: https://github.com/mariushelf/bcdict
Author: Marius Helf (helfsmarius@gmail.com)