RandomizedSearchCV/GridSearchCV with pandas.DataFrame interface
Project description
sklearn-cv-pandas
RandomizedSearchCV/GridSearchCV with pandas.DataFrame interface
Why do I want this?
- I usually prepare features as pandas.DataFrame
- Scikit learn input should be
array-like. https://scikit-learn.org/stable/glossary.html#term-array-like. - Although it includes pandas.DataFrame, there are some issues;
- It does not support
Int64data type - Output model does not remember which columns should be used
- It does not support
Solution
- Provide GridSearchCV / RandomizedSearchCV with pandas.DataFrame interface
- Internally preprocess DataFrame to be applicable for sklearn
- Output of
fitcommand is now originalModelobject, which- stores column name information
- provides pandas.DataFrame interface for prediction
Installation
pip install sklearn_cv_pandas
Usage
Configure CV object
Instantiate CV in the same manner as original ones.
from scipy import stats
from sklearn import linear_model
from sklearn_cv_pandas import RandomizedSearchCV
estimator = linear_model.Lasso()
param_dist = dict(alpha=stats.loguniform(1e-5, 10))
cv = RandomizedSearchCV(estimator, param_dist, scoring="mean_absolute_error")
fit with pandas.DataFrame
Our CV object has new methods fit_holdout_pandas and fit_cv_pandas.
Original ones requires x and y as numpy.array.
Instead of numpy array, you can specify one pandas.DataFrame
and column names for x (feature_columns), and column name of y (target_column).
model = cv.fit_cv_pandas(
df, target_column="y", feature_columns=["x{}".format(i) for i in range(100)], n_fold=5
)
predict with pandas.DataFrame
You can run prediction with pandas.DataFrame interface as well.
Output of fit_holdout_pandas and fit_cv_pandas stores feature_columns and target_column.
You can just input pandas.DataFrame for prediction into the method predict.
model.predict(df)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sklearn_cv_pandas-0.0.12.tar.gz.
File metadata
- Download URL: sklearn_cv_pandas-0.0.12.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2332de99179047c65265b49a991e51324982a96e992050fe7ec2be8b30ece363
|
|
| MD5 |
c61543c5f9d56c38a5d66b52ca1f625c
|
|
| BLAKE2b-256 |
052cb1b023e8f187f264bd2cef41e91c0e9f5dbd88b9ecee7216263e91de1666
|
File details
Details for the file sklearn_cv_pandas-0.0.12-py3-none-any.whl.
File metadata
- Download URL: sklearn_cv_pandas-0.0.12-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1ce37900b15b9faf98c11b6e415ab13e5b0fcfe27557e7471a56df1aea144b91
|
|
| MD5 |
c476a8ac43aaeb3e26f51e38cec76090
|
|
| BLAKE2b-256 |
10999378c1b940380eef4f459f9dd5902496a16bfd5e2236d5e514ee841ac89a
|