A machine learning pipeline for preprocessing, model selection, and evaluation.
Project description
🧠 Brains Build Code - Automated Machine Learning Pipeline
Brains Build Code is an automated machine learning pipeline designed to simplify the end-to-end machine learning workflow. It handles:
- Data preprocessing
- Feature engineering
- Model selection
- Hyperparameter tuning
- Model evaluation
Built to save you time, reduce boilerplate, and accelerate experimentation.
Installation
From PyPI
pip install brainsbuildcode
Directly from GitHub
pip install git+https://github.com/achelousace/brainsbuildcode.git
📖 Usage
Fast Build Example
from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer
import pandas as pd
# Load dataset
data = load_breast_cancer(as_frame=True)
df = data.frame
# Instantiate and build the model
best_model = Brain(df, target='target', model_name='RFC', grid_search=None)
best_model.build()
Alternative (Chainable Call)
from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer(as_frame=True)
df = data.frame
# Instantiate and immediately build
best_model = Brain(df, target='target', model_name='RFC', grid_search='cv').build()
Full Pipeline Example
from sklearn.datasets import load_iris
import pandas as pd
from brainsbuildcode import Convert, Brain
# Load dataset
data = load_iris(as_frame=True)
df = data.frame
df['target'] = data.target
# Full pipeline of the Brain class
brain = Brain(df=df,
target='target', # Target column name
model_name='RFC', # Model name ('RFC', 'XGBC', etc.)
task='classification', # Task type: 'classification' or 'regression'
# Scaling and PCA
scale=True, # Apply scaling to numerical columns (True) or disable scaling (False)
pca=False, # Apply PCA (True) or not (False)
pca_comp=0.9, # PCA components to retain (if pca=True), default 90% variance
# Data splitting and CV
test_size=0.2, # Test set size (0.2 = 20% test)
cv=5, # Number of cross-validation folds
# Missing values handling
miss=True, # Show missing values summary (True) or skip (False)
ynan=False, # Drop rows with NaN in target (True) or keep (False)
# Columns to process
numerical_cols=[], # Manually specify numerical columns (empty = auto-detect)
categorical_cols=[], # Manually specify categorical columns (empty = auto-detect)
ordinal_cols={}, # Dict of ordinal columns: {column: [order]}
drop_cols=(), # Columns to drop before processing
# Encoding and Imputation
categorical_encoding='onehot', # 'onehot' or 'label' encoding for categorical features
ordinal_encoding=False, # Apply ordinal encoding (True) or skip (False)
numerical_impute_strategy='mean', # Imputation strategy for numerical columns
categorical_impute_strategy='most_frequent', # Imputation strategy for categorical
ordinal_impute_strategy='most_frequent', # Imputation strategy for ordinal
categorical_fill_value=None, # Fill value for categorical imputation
ordinal_fill_value=None, # Fill value for ordinal imputation
# Display options
showx=True, # Display processed X_train/X_test (True) or skip (False)
summary=True, # Display data summary (True) or skip (False)
objvalue=True, # Display value counts of categorical columns (True/False)
xtype=True, # Display column data types (True/False)
# Duplicate handling
drop_duplicates=False, # False = keep duplicates, True = drop first occurrence, 'all' = drop all duplicates
# Target encoding
yencode=None, # 'encode' = LabelEncode target, 'bin' = Binarize, None = no encoding
# Grid Search / Hyperparameter tuning
grid_search=None, # None = no tuning, 'cv' = GridSearchCV, 'rand' = RandomizedSearchCV
voting='soft', # Voting method if using ensemble voting ('soft' or 'hard')
voteclass=[], # List of classifiers for voting model
pa=0, # Grid search plot: index of hyperparameter to visualize (0 = first param)
# Column value filtering
dropc=False, # Drop rows based on column values (True/False)
column_value={}, # Dict of {column_name: values_to_drop} if dropc=True
# Ordinal detection
ord_threshold=None, # Auto-detect ordinal columns if unique values <= threshold (None disables)
ordname=(), # Tuple of ordinal column names to treat manually
# Conversion thresholds for object columns
typeval=80, # % threshold: convert object column to numeric if >= this %
convrate=80, # % threshold: if numeric, convert to int if >= this %, else float
preprocessed_out=False # Return preprocessed dataset (True) or train model (False)
)
# Build, preprocess, and train the model
brain.build()
Convert Class (Manual Data Preprocessing)
from brainsbuildcode import Convert
import seaborn as sns
# Load dataset
df = sns.load_dataset("titanic")
# Define target
target = 'survived'
# Step 1: Apply conversion
converter = Convert(df, target)
X, y, ncol, ocol, ordinal_cols = converter.apply()
# Now you can process `X`, `y`, `ncol`, `ocol`, `ordinal_cols` manually, or pass them back to `Brain`
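The column lists returned by `Convert` can feed a manual preprocessing pipeline. Below is a minimal sketch using plain scikit-learn, with a toy DataFrame and hand-written `ncol`/`ocol` values standing in for `Convert`'s output (the actual lists come from `converter.apply()` as shown above):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data standing in for the Titanic features.
X = pd.DataFrame({
    "age": [22.0, None, 35.0, 58.0],
    "fare": [7.25, 71.28, 8.05, None],
    "embark_town": ["Southampton", "Cherbourg", np.nan, "Southampton"],
})
ncol = ["age", "fare"]     # numerical columns, as Convert would report them
ocol = ["embark_town"]     # categorical columns, as Convert would report them

# Impute + scale numerics, impute + one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), ncol),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), ocol),
])
Xt = preprocess.fit_transform(X)
print(Xt.shape)  # (4, 4): 2 scaled numeric columns + 2 one-hot columns
```

This mirrors the default strategies listed in the full pipeline example (`numerical_impute_strategy='mean'`, `categorical_impute_strategy='most_frequent'`, `categorical_encoding='onehot'`).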
Pass Convert Output to Brain Manually
from brainsbuildcode import Convert, Brain
import seaborn as sns
# Load dataset
df = sns.load_dataset("titanic")
# Define target
target = 'survived'
# Step 1: Apply conversion
converter = Convert(df, target)
X, y, ncol, ocol, ordinal_cols = converter.apply()
# Step 2: Instantiate Brain with converted column info
best_model = Brain(
df=df,
target=target,
model_name='RFC',
grid_search=None,
drop_duplicates=True,
numerical_cols=ncol, # Pass numerical columns from Convert
categorical_cols=ocol, # Pass categorical columns from Convert
ordinal_cols=ordinal_cols # Pass ordinal columns from Convert
)
# Step 3: Build the model
best_model.build()
Model Names
| Model_Name | Model Class | Task |
|---|---|---|
| LR | LogisticRegression | classification |
| RFC | RandomForestClassifier | classification |
| XGBC | XGBClassifier | classification |
| KNNC | KNeighborsClassifier | classification |
| DTC | DecisionTreeClassifier | classification |
| SVC | SVC | classification |
| MLPC | MLPClassifier | classification |
| ADAC | AdaBoostClassifier | classification |
| GBC | GradientBoostingClassifier | classification |
| BC | BaggingClassifier | classification |
| NBC | BernoulliNB | classification |
| Linear | LinearRegression | regression |
| RFR | RandomForestRegressor | regression |
| XGBR | XGBRegressor | regression |
| KNNR | KNeighborsRegressor | regression |
| DTR | DecisionTreeRegressor | regression |
| SVR | SVR | regression |
| MLPR | MLPRegressor | regression |
| ADAR | AdaBoostRegressor | regression |
| GBR | GradientBoostingRegressor | regression |
| BR | BaggingRegressor | regression |
| NBR | BayesianRidge | regression |
| LRmulti | LogisticRegression | multi-class classification |
| RFmulti | RandomForestClassifier | multi-class classification |
| XGBmulti | XGBClassifier | multi-class classification |
| KNNmulti | KNeighborsClassifier | multi-class classification |
| DTmulti | DecisionTreeClassifier | multi-class classification |
| SVCmulti | SVC | multi-class classification |
| MLPCmulti | MLPClassifier | multi-class classification |
| ADAmulti | AdaBoostClassifier | multi-class classification |
| GBmulti | GradientBoostingClassifier | multi-class classification |
| BCmulti | BaggingClassifier | multi-class classification |
| NBmulti | ComplementNB | multi-class classification |
| vote | VotingClassifier | ensemble classification |
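The abbreviations in the table map onto familiar scikit-learn (and XGBoost) estimators. As a rough illustration of the idea (a hand-written lookup for a few entries, not Brain's actual internal registry):

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression

# Illustrative name -> estimator mapping; Brain's internals may differ.
MODEL_REGISTRY = {
    "LR": LogisticRegression,
    "RFC": RandomForestClassifier,
    "Linear": LinearRegression,
    "RFR": RandomForestRegressor,
}

model = MODEL_REGISTRY["RFC"]()
print(type(model).__name__)  # RandomForestClassifier
```

In practice you only pass the abbreviation string, e.g. `Brain(df, target='target', model_name='RFR', task='regression')`.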
💡 Key Features
- Automatic detection of numerical, categorical, and ordinal features.
- Missing value handling with customizable strategies.
- Feature scaling and PCA support.
- Flexible encoding: one-hot, label, ordinal.
- Multiple supported models: Random Forest, XGBoost, Logistic Regression, SVC, and more.
- Voting classifiers and ensemble models.
- Hyperparameter optimization via grid search and randomized search.
- Detailed evaluation metrics and visualizations.
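The `grid_search='rand'` option presumably delegates to scikit-learn's `RandomizedSearchCV`. A standalone sketch of that mechanism (the parameter grid and search settings here are illustrative, not Brain's built-in ones):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Illustrative search space; Brain ships its own grids per model.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=4,       # sample 4 of the 9 combinations
    cv=3,           # 3-fold cross-validation
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

With Brain, the equivalent one-liner is `Brain(df, target='target', model_name='RFC', grid_search='rand').build()`.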
🔗 License
This project is licensed under the MIT License.
🛠️ Contribution
Feel free to contribute, suggest features, or report issues via pull requests and the issues section!
Project details
Download files
File details
Details for the file brainsbuildcode-1.0.3.tar.gz.
File metadata
- Download URL: brainsbuildcode-1.0.3.tar.gz
- Upload date:
- Size: 18.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 78dbf799e27606b078cc67d865bef1a2ac2b4a16c1d8ca6b54c045e643251510 |
| MD5 | b0fc0b4dc836d034c8d5857183244456 |
| BLAKE2b-256 | c429b6562e0371f70c7e9c0a9664d3afd957a3386944d3de8906bbf58db09b32 |
File details
Details for the file brainsbuildcode-1.0.3-py3-none-any.whl.
File metadata
- Download URL: brainsbuildcode-1.0.3-py3-none-any.whl
- Upload date:
- Size: 16.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b082175aa751306183dae7bbad9163948d086188c06e4d325881bf8ecf386edd |
| MD5 | 75c0fb36b06cbbf67a2e4161080691a6 |
| BLAKE2b-256 | 5bfbce5c3be317b7d71aa2074e42395dd762957e1de4d2342f722a34974f1296 |