Peer into the future of a data science project
Project description
Foreshadow is an automatic pipeline generation tool that makes creating, iterating on, and evaluating machine learning pipelines fast and intuitive, so data scientists can spend more time on data science and less time on code.
Key Features
- Scikit-Learn compatible
- Automatic column intent inference
  - Numerical
  - Categorical
  - Text
  - Droppable (all values in a column are either identical or all distinct)
- Allow user override on column intent and transformation functions
- Automatic feature preprocessing depending on the column intent type
  - Numerical: imputation followed by scaling
  - Categorical: a variety of categorical encodings
  - Text: TF-IDF followed by SVD
- Automatic model selection
- Rapid pipeline development / iteration
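To make the intent and preprocessing features above concrete, here is a minimal, standard-library-only sketch. It is not Foreshadow's implementation: the function names (`infer_intent`, `impute_and_scale`, `one_hot`, `preprocess`), the `max_categories` threshold, and the heuristics themselves are all assumptions chosen for illustration. The text branch (TF-IDF followed by SVD) is left out to keep the sketch short.

```python
import statistics


def infer_intent(values, max_categories=10):
    """Guess a column intent (hypothetical heuristic, not Foreshadow's own)."""
    present = [v for v in values if v is not None]
    distinct = set(present)
    if len(distinct) <= 1:
        return "droppable"  # constant column carries no signal
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numerical"
    if len(distinct) <= max_categories and len(distinct) < len(present):
        return "categorical"  # a small set of repeated labels
    if any(" " in str(v) for v in present):
        return "text"  # multi-word strings are treated as free text
    if len(distinct) == len(present):
        return "droppable"  # all-unique labels, e.g. an ID column
    return "categorical"


def impute_and_scale(values):
    """Numerical intent: median imputation followed by standard scaling."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]
    mean = statistics.mean(filled)
    std = statistics.pstdev(filled) or 1.0  # guard against zero variance
    return [(v - mean) / std for v in filled]


def one_hot(values):
    """Categorical intent: one-hot encoding, one of many possible encodings."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def preprocess(columns):
    """Dispatch each column to a transformation based on its inferred intent."""
    out = {}
    for name, values in columns.items():
        intent = infer_intent(values)
        if intent == "numerical":
            out[name] = impute_and_scale(values)
        elif intent == "categorical":
            out[name] = one_hot(values)
        elif intent == "text":
            out[name] = values  # TF-IDF + SVD omitted from this sketch
        # droppable columns are simply left out
    return out


# Made-up example data, for illustration only
columns = {
    "price": [1.0, None, 3.0, 5.0],
    "color": ["a", "b", "a", "c"],
    "review": ["good fit", "bad fit", "ok", "runs large"],
    "constant": [3, 3, 3, 3],
}
result = preprocess(columns)
print({name: infer_intent(values) for name, values in columns.items()})
```

A real pipeline would fit the imputation and scaling statistics on training data only and reuse them at prediction time; the sketch skips that to stay short.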
Features in the road map
- Automatic feature engineering
- Automatic parameter optimization
Foreshadow supports Python 3.6+.
Installing Foreshadow
$ pip install foreshadow
Read the documentation to set up the project from source.
Getting Started
To get started with Foreshadow, install the package with pip, which also installs its dependencies. Then create a simple Python script that uses Foreshadow with all the defaults.
First import foreshadow
from foreshadow.foreshadow import Foreshadow
from foreshadow.estimators import AutoEstimator
from foreshadow.utils import ProblemType
Also import pandas and the scikit-learn helpers used in the demo
import pandas as pd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release
from sklearn.model_selection import train_test_split
Now load the Boston housing dataset from scikit-learn into pandas DataFrames. It is a common dataset for testing machine learning models and is bundled with scikit-learn (versions before 1.2).
boston = load_boston()
bostonX_df = pd.DataFrame(boston.data, columns=boston.feature_names)
bostony_df = pd.DataFrame(boston.target, columns=['target'])
Next, exactly as with an sklearn estimator, perform a train/test split on the data and pass the training data to the fit function of a new Foreshadow object.
X_train, X_test, y_train, y_test = train_test_split(
    bostonX_df, bostony_df, test_size=0.2
)
problem_type = ProblemType.REGRESSION
estimator = AutoEstimator(
    problem_type=problem_type,
    auto="tpot",
    estimator_kwargs={"max_time_mins": 1},
)
shadow = Foreshadow(estimator=estimator, problem_type=problem_type)
shadow.fit(X_train, y_train)
Now shadow is a fitted Foreshadow object: all feature engineering has been performed and the estimator has been trained and optimized. It can be used exactly like a fitted sklearn estimator to make predictions.
shadow.score(X_test, y_test)
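The fit and score calls above follow the scikit-learn estimator convention. As a rough illustration of that convention, here is a toy regressor in plain Python. `MeanBaseline` is a hypothetical class written for this sketch, not part of Foreshadow or scikit-learn, and its score is the R² coefficient that scikit-learn regressors report by default. Whether Foreshadow's own score returns R² for regression problems is an assumption here, so check the documentation.

```python
class MeanBaseline:
    """Toy regressor following the scikit-learn fit/predict/score convention."""

    def fit(self, X, y):
        # "Training" just memorizes the mean of the targets.
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows chaining, as in scikit-learn

    def predict(self, X):
        # Predict the memorized mean for every row.
        return [self.mean_ for _ in X]

    def score(self, X, y):
        # R^2 = 1 - SS_res / SS_tot
        preds = self.predict(X)
        y_mean = sum(y) / len(y)
        ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
        ss_tot = sum((t - y_mean) ** 2 for t in y)
        return 1.0 - ss_res / ss_tot


# Made-up data, for illustration only
X_train, y_train = [[1], [2], [3], [4]], [2, 4, 6, 8]
X_test, y_test = [[5], [6]], [10, 12]

model = MeanBaseline().fit(X_train, y_train)
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
print(train_score, test_score)
```

An R² of 1.0 is a perfect fit, 0.0 matches always predicting the mean of the scored targets, and negative values are worse than that baseline.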
Great, you now have a working Foreshadow installation! Keep reading to learn how to export, modify, and construct pipelines of your own.
Tutorial
A Jupyter notebook tutorial in the examples folder goes through more details.
Documentation
Download files
File details
Details for the file foreshadow-1.0.1.tar.gz.
File metadata
- Download URL: foreshadow-1.0.1.tar.gz
- Upload date:
- Size: 3.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.15 CPython/3.6.8 Darwin/17.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | c6044f6f0131c981fa5354bdee2ce96e5e29645cd8c006b0c9b441cbb06ea69b
MD5 | 67c155ee21b8601f3ed30d8176a92e41
BLAKE2b-256 | e3802e2daa963093f3c738be7ecc7b8a174900d36ebd15ceaa9be50dde49bb90
File details
Details for the file foreshadow-1.0.1-py3-none-any.whl.
File metadata
- Download URL: foreshadow-1.0.1-py3-none-any.whl
- Upload date:
- Size: 4.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.15 CPython/3.6.8 Darwin/17.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 06646e3bb146ef412f7b11d0b080b10a24ad0e39c0b63d65902b50d396f774ec
MD5 | 215e04e6694691dd901c7ace915b6eb7
BLAKE2b-256 | 07998a392e155e3d5ab363fb0f2f533a5023d2b0a5fc8634d20374f1e9312763