AutoML, Forecasting, NLP, Image Classification, Feature Engineering, Model Evaluation, Model Interpretation, Fast Processing.
Installation
# Most up-to-date
pip install git+https://github.com/AdrianAntico/RetroFit.git#egg=retrofit
# From pypi
pip install retrofit==0.0.2
# Check out R package RemixAutoML
https://github.com/AdrianAntico/RemixAutoML
Feature Engineering
Feature Engineering - Some of the feature engineering functions in this package are not available elsewhere. I believe feature engineering is your best bet for improving model performance, so the package includes functions for every feature type: numeric, categorical, text, and date data. All of them are designed to generate features for both training and scoring pipelines, and they are built to run fast with low memory use. Every feature engineering and data wrangling function runs on datatable or polars (your choice), so you only need to reach for big data tools when absolutely necessary.
Machine Learning
Machine Learning Training -
Machine Learning Scoring -
Machine Learning Evaluation -
Machine Learning Interpretation -
Feature Engineering
Feature Engineering: Numeric Feature Engineering
Coming Soon
Feature Engineering: Categorical Feature Engineering
Coming Soon
Feature Engineering: Module TimeSeriesFeatures
AutoCalendarVariables()
Code Example
# Test Function
import datatable as dt
import retrofit
from retrofit import TimeSeriesFeatures as ts
# Data can be created using the R package RemixAutoML and function FakeDataGenerator
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
data = ts.AutoCalendarVariables(
    data=data,
    ArgsList=None,
    DateColumnNames='CalendarDateColumn',
    CalendarVariables=['wday','mday','wom','month','quarter','year'],
    Processing='datatable',
    InputFrame='datatable',
    OutputFrame='datatable')
# Check
data.names
Function Description
AutoCalendarVariables()
Automatically generate calendar variables from your datatable.
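To make the calendar variable names concrete, here is a minimal standard-library sketch of what values such as 'wday', 'mday', 'wom', 'month', 'quarter', and 'year' could look like for a single date. It is not retrofit's implementation: the helper name `calendar_features`, the Monday-based weekday numbering, and the 7-day-block definition of week of month are all assumptions made for illustration.

```python
from datetime import date

def calendar_features(d: date) -> dict:
    """Derive simple calendar variables from one date (illustrative only)."""
    return {
        "wday": d.isoweekday(),             # assumption: 1 = Monday ... 7 = Sunday
        "mday": d.day,                      # day of the month
        "wom": (d.day - 1) // 7 + 1,        # assumption: week of month in 7-day blocks
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,  # calendar quarter, 1 to 4
        "year": d.year,
    }

features = calendar_features(date(2021, 3, 17))
print(features)
```

The package may number weekdays or weeks differently, so check the generated columns against a few known dates before relying on them.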
AutoLags()
Code Example
# Test Function
import datatable as dt
import retrofit
from retrofit import TimeSeriesFeatures as ts
# Data can be created using the R package RemixAutoML and function FakeDataGenerator
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
## No Group, Multiple Periods Example:
data = ts.AutoLags(data=data, LagPeriods=[1,3,5,7], LagColumnNames='Leads', DateColumnName='CalendarDateColumn', ByVariables=None, ImputeValue=-1, Sort=True)
print(data.names)
## Group and Multiple Periods and LagColumnNames:
data = ts.AutoLags(data=data, LagPeriods=[1,3,5], LagColumnNames=['Leads','XREGS1'], DateColumnName='CalendarDateColumn', ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'], ImputeValue=-1, Sort=True)
print(data.names)
## No Group Example:
data = ts.AutoLags(data=data, LagPeriods=1, LagColumnNames='Leads', DateColumnName='CalendarDateColumn', ByVariables=None, ImputeValue=-1, Sort=True)
print(data.names)
Function Description
AutoLags()
Automatically generate any number of lags, for any number of columns, by any number of By-Variables, using datatable.
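The idea behind grouped lags can be sketched in plain Python, independent of datatable. The helper `add_lags`, the `Lag_{period}_{column}` naming, and the sample rows below are hypothetical and only illustrate the concept: within each group, sort by date, look back a fixed number of rows, and fill missing history with an impute value.

```python
from collections import defaultdict

def add_lags(rows, value_key, date_key, by_keys, periods, impute=-1):
    """Return copies of rows with lag columns added, ordered by group then date."""
    groups = defaultdict(list)
    for row in rows:
        # An empty by_keys list puts every row into one group (the no-group case).
        groups[tuple(row[k] for k in by_keys)].append(row)

    out = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda r: r[date_key])
        for i, row in enumerate(ordered):
            new_row = dict(row)
            for p in periods:
                name = f"Lag_{p}_{value_key}"  # hypothetical column name
                new_row[name] = ordered[i - p][value_key] if i >= p else impute
            out.append(new_row)
    return out

rows = [
    {"Segment": "A", "Date": "2021-01-01", "Leads": 10},
    {"Segment": "A", "Date": "2021-01-02", "Leads": 12},
    {"Segment": "A", "Date": "2021-01-03", "Leads": 15},
    {"Segment": "B", "Date": "2021-01-01", "Leads": 7},
    {"Segment": "B", "Date": "2021-01-02", "Leads": 9},
]
lagged = add_lags(rows, "Leads", "Date", ["Segment"], periods=[1, 2])
for r in lagged:
    print(r)
```

Because lags are computed per group, the first row of segment B gets the impute value rather than the last value of segment A.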
AutoRollStats()
Code Example
# Test Function
import datatable as dt
from datatable import sort, f, by
import retrofit
from retrofit import TimeSeriesFeatures as ts
## No Group, Multiple Periods Example:
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
data = ts.AutoRollStats(data=data, RollColumnNames='Leads', DateColumnName='CalendarDateColumn', ByVariables=None, MovingAvg_Periods=[3,5,7], MovingSD_Periods=[3,5,7], MovingMin_Periods=[3,5,7], MovingMax_Periods=[3,5,7], ImputeValue=-1, Sort=True)
print(data.names)
## Group and Multiple Periods and RollColumnNames:
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
data = ts.AutoRollStats(data=data, RollColumnNames=['Leads','XREGS1'], DateColumnName='CalendarDateColumn', ByVariables=['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'], MovingAvg_Periods=[3,5,7], MovingSD_Periods=[3,5,7], MovingMin_Periods=[3,5,7], MovingMax_Periods=[3,5,7], ImputeValue=-1, Sort=True)
print(data.names)
## No Group Example:
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
data = ts.AutoRollStats(data=data, RollColumnNames='Leads', DateColumnName='CalendarDateColumn', ByVariables=None, MovingAvg_Periods=[3,5,7], MovingSD_Periods=[3,5,7], MovingMin_Periods=[3,5,7], MovingMax_Periods=[3,5,7], ImputeValue=-1, Sort=True)
print(data.names)
Function Description
AutoRollStats()
Automatically generate any number of moving averages, moving standard deviations, moving minimums, and moving maximums from any number of source columns, by any number of By-Variables, using datatable.
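As a rough illustration of what these rolling statistics represent, the following standard-library sketch computes a trailing-window mean, standard deviation, minimum, and maximum over one already-sorted series. The helper `rolling_stats` is hypothetical, and the choice to impute rows that lack a full window (rather than use a partial window) is an assumption; retrofit may handle the leading rows differently.

```python
from statistics import mean, stdev

def rolling_stats(values, window, impute=-1):
    """Trailing-window statistics; rows without a full window get the impute value."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append({"mean": impute, "sd": impute, "min": impute, "max": impute})
            continue
        w = values[i + 1 - window : i + 1]  # current row plus the window-1 rows before it
        out.append({"mean": mean(w), "sd": stdev(w), "min": min(w), "max": max(w)})
    return out

stats = rolling_stats([10, 12, 14, 12, 16], window=3)
for row in stats:
    print(row)
```

For grouped data the same function would be applied to each group's series separately, after sorting by date.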
AutoDiff()
Code Example
# Test Function
import datatable as dt
import retrofit
from retrofit import TimeSeriesFeatures as ts
## Group Example:
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
data = ts.AutoDiff(data=data, DateColumnName = 'CalendarDateColumn', ByVariables = ['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'], DiffNumericVariables = 'Leads', DiffDateVariables = 'CalendarDateColumn', DiffGroupVariables = None, NLag1 = 0, NLag2 = 1, Sort=True, InputFrame = 'datatable', OutputFrame = 'datatable')
print(data.names)
## Group and Multiple Periods and RollColumnNames:
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
data = ts.AutoDiff(data=data, DateColumnName = 'CalendarDateColumn', ByVariables = ['MarketingSegments', 'MarketingSegments2', 'MarketingSegments3', 'Label'], DiffNumericVariables = 'Leads', DiffDateVariables = 'CalendarDateColumn', DiffGroupVariables = None, NLag1 = 0, NLag2 = 1, Sort=True, InputFrame = 'datatable', OutputFrame = 'datatable')
print(data.names)
## No Group Example:
data = dt.fread("C:/Users/Bizon/Documents/GitHub/BenchmarkData.csv")
data = ts.AutoDiff(data=data, DateColumnName = 'CalendarDateColumn', ByVariables = None, DiffNumericVariables = 'Leads', DiffDateVariables = 'CalendarDateColumn', DiffGroupVariables = None, NLag1 = 0, NLag2 = 1, Sort=True, InputFrame = 'datatable', OutputFrame = 'datatable')
print(data.names)
Function Description
AutoDiff()
Automatically generate any number of differences from any number of source columns, for numeric, character, and date columns, by any number of By-Variables, using datatable.
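The sketch below shows one plausible reading of differencing with two lag offsets, using only the standard library. It assumes NLag1 and NLag2 are the two lags being subtracted (lag 0 being the current row), which the description above does not spell out; the helper name `lag_diff` and its parameters are hypothetical.

```python
from datetime import date

def lag_diff(values, n_lag1=0, n_lag2=1, impute=None):
    """Difference between the value n_lag1 rows back and the value n_lag2 rows back."""
    out = []
    for i in range(len(values)):
        if i - n_lag1 < 0 or i - n_lag2 < 0:
            out.append(impute)  # not enough history for this row
        else:
            out.append(values[i - n_lag1] - values[i - n_lag2])
    return out

# Numeric column: change in Leads from the previous row
leads_diff = lag_diff([10, 12, 15, 11])
print(leads_diff)

# Date column: subtracting dates gives timedeltas, converted here to days
dates = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 5)]
day_gaps = [d.days if d is not None else None for d in lag_diff(dates)]
print(day_gaps)
```

With the defaults this reduces to a first difference; setting the second offset to 7 on daily data, for example, would give a week-over-week change under the same assumption.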
Feature Engineering: Data Set Feature Engineering
Coming Soon
Feature Engineering: Model-Based Feature Engineering
Coming Soon
Machine Learning Training
Coming Soon
Machine Learning Scoring
Coming Soon
Machine Learning Evaluation
Coming Soon
Machine Learning Interpretation
Coming Soon