Peer into the future of a data science project

Project description

Foreshadow is an automatic pipeline generation tool that makes creating, iterating on, and evaluating machine learning pipelines fast and intuitive, so data scientists can spend more time on data science and less time on code.

Key Features

  • Scikit-Learn compatible

  • Automatic column intent inference
    • Numerical

    • Categorical

    • Text

    • Droppable (all values in a column are either identical or all distinct)

  • Allow user override on column intent and transformation functions

  • Automatic feature preprocessing depending on the column intent type
    • Numerical: imputation followed by scaling

    • Categorical: a variety of categorical encodings

    • Text: TFIDF followed by SVD

  • Automatic model selection

  • Rapid pipeline development / iteration
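
To make the per-intent preprocessing above concrete, here is a minimal sketch written with plain scikit-learn. It is not Foreshadow's internal implementation: the toy dataframe, the column names, the mean imputation strategy, and the choice of one-hot encoding for the categorical column are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with one column per intent (hypothetical example values).
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
    "review": [
        "great value",
        "poor build quality",
        "great build",
        "poor value overall",
    ],
})

preprocessor = ColumnTransformer([
    # Numerical: imputation followed by scaling.
    ("numerical",
     make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()),
     ["age"]),
    # Categorical: one of many possible encodings (one-hot is an assumption).
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    # Text: TF-IDF followed by SVD. The column is named with a plain string
    # so the vectorizer receives a 1-D sequence of documents.
    ("text",
     make_pipeline(TfidfVectorizer(),
                   TruncatedSVD(n_components=2, random_state=0)),
     "review"),
])

features = preprocessor.fit_transform(df)
print(features.shape)  # 1 scaled numeric + 3 one-hot + 2 SVD columns
```

Foreshadow chooses these steps automatically from the inferred intent; the sketch only shows what an equivalent hand-written pipeline might look like.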

Features in the road map

  • Automatic feature engineering

  • Automatic parameter optimization

Foreshadow supports Python 3.6+

Installing Foreshadow

$ pip install foreshadow

Read the documentation to set up the project from source.

Getting Started

To get started with Foreshadow, install the package with pip install, which also installs its dependencies. Then create a simple Python script that uses Foreshadow with all the defaults.

First import foreshadow

from foreshadow.foreshadow import Foreshadow
from foreshadow.estimators import AutoEstimator
from foreshadow.utils import ProblemType

Also import pandas and sklearn for the demo

import pandas as pd

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

Now load in the boston housing dataset from sklearn into pandas dataframes. This is a common dataset for testing machine learning models and comes built in to scikit-learn.

boston = load_boston()
bostonX_df = pd.DataFrame(boston.data, columns=boston.feature_names)
bostony_df = pd.DataFrame(boston.target, columns=['target'])
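
Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the loading step above only runs on older releases. As a sketch for newer environments, assuming any small regression dataset is acceptable for the demo, the diabetes dataset bundled with scikit-learn can be loaded into the same kind of dataframes. The names `X_df` and `y_df` are placeholders and not part of the original walkthrough, and whether Foreshadow 1.0.1 itself runs against such a release is not verified here.

```python
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects instead of NumPy arrays.
diabetes = load_diabetes(as_frame=True)

X_df = diabetes.data                             # feature DataFrame
y_df = diabetes.target.to_frame(name="target")   # single-column DataFrame

print(X_df.shape, y_df.shape)
```

These two dataframes can then take the place of bostonX_df and bostony_df in the steps that follow.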

Next, exactly as if working with an sklearn estimator, perform a train-test split on the data and pass the training data to the fit method of a new Foreshadow object.

X_train, X_test, y_train, y_test = train_test_split(bostonX_df,
   bostony_df, test_size=0.2)

problem_type = ProblemType.REGRESSION

estimator = AutoEstimator(
    problem_type=problem_type,
    auto="tpot",
    estimator_kwargs={"max_time_mins": 1},
)
shadow = Foreshadow(estimator=estimator, problem_type=problem_type)
shadow.fit(X_train, y_train)

Now shadow is a fitted Foreshadow object for which all feature engineering has been performed and the estimator has been trained and optimized. It can be used exactly like a fitted sklearn estimator to make predictions.

shadow.score(X_test, y_test)
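
Since a fitted Foreshadow object is described as behaving like a fitted sklearn estimator, the usual scikit-learn conventions should carry over: predict returns one value per row, and for regressors score reports the coefficient of determination (R²). The sketch below uses a plain `LinearRegression` on synthetic data as a stand-in, so it assumes nothing about Foreshadow's internals; swapping in `shadow` is expected, but not verified here, to follow the same contract.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: a linear signal plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# predict() returns one prediction per test row.
predictions = model.predict(X_test)

# For regressors, score() is R^2 on the given data, which matches
# calling r2_score on the predictions directly.
score = model.score(X_test, y_test)
print(np.isclose(score, r2_score(y_test, predictions)))
```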

Great, you now have a working Foreshadow installation! Keep reading to learn how to export, modify, and construct pipelines of your own.

Tutorial

We also have a Jupyter notebook tutorial in the examples folder that goes through more details.

Documentation

Read the docs!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

foreshadow-1.0.1.tar.gz (3.9 MB)

Uploaded Source

Built Distribution

foreshadow-1.0.1-py3-none-any.whl (4.1 MB)

Uploaded Python 3

File details

Details for the file foreshadow-1.0.1.tar.gz.

File metadata

  • Download URL: foreshadow-1.0.1.tar.gz
  • Upload date:
  • Size: 3.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.15 CPython/3.6.8 Darwin/17.7.0

File hashes

Hashes for foreshadow-1.0.1.tar.gz
Algorithm Hash digest
SHA256 c6044f6f0131c981fa5354bdee2ce96e5e29645cd8c006b0c9b441cbb06ea69b
MD5 67c155ee21b8601f3ed30d8176a92e41
BLAKE2b-256 e3802e2daa963093f3c738be7ecc7b8a174900d36ebd15ceaa9be50dde49bb90

File details

Details for the file foreshadow-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: foreshadow-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 4.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.15 CPython/3.6.8 Darwin/17.7.0

File hashes

Hashes for foreshadow-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 06646e3bb146ef412f7b11d0b080b10a24ad0e39c0b63d65902b50d396f774ec
MD5 215e04e6694691dd901c7ace915b6eb7
BLAKE2b-256 07998a392e155e3d5ab363fb0f2f533a5023d2b0a5fc8634d20374f1e9312763
