
A machine learning pipeline for preprocessing, model selection, and evaluation.


🧠 Brains Build Code - Automated Machine Learning Pipeline

Brains Build Code is an automated machine learning pipeline designed to simplify the end-to-end machine learning workflow. It handles:

  • Data preprocessing
  • Feature engineering
  • Model selection
  • Hyperparameter tuning
  • Model evaluation

Built to save you time, reduce boilerplate, and accelerate experimentation.

Installation

From PyPI

pip install brainsbuildcode

Directly from GitHub

pip install git+https://github.com/achelousace/brainsbuildcode.git

📖 Usage

Fast Build Example

from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer
import pandas as pd

# Load dataset
data = load_breast_cancer(as_frame=True)
df = data.frame

# Instantiate and build the model
best_model = Brain(df, target='target', model_name='RFC', grid_search=None)
best_model.build()

Alternative (Chainable Call)

from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame

# Instantiate and immediately build
best_model = Brain(df, target='target', model_name='RFC', grid_search='cv').build()
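
The same quick-start pattern should carry over to regression; a minimal sketch, assuming the regression model codes from the Model Names table below (e.g. 'RFR') and the task parameter behave as documented:

from brainsbuildcode import Brain
from sklearn.datasets import load_diabetes

# Load a regression dataset as a DataFrame (the target column is named 'target')
data = load_diabetes(as_frame=True)
df = data.frame

# Hypothetical regression quick start: RandomForestRegressor via the 'RFR' code
best_model = Brain(df, target='target', model_name='RFR', task='regression', grid_search=None).build()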

Full Pipeline Example

from sklearn.datasets import load_iris
from brainsbuildcode import Brain

# Load dataset
data = load_iris(as_frame=True)
df = data.frame
df['target'] = data.target


# Full pipeline configuration of the Brain class
brain = Brain(df=df,
              target='target',                        # Target column name
              model_name='RFC',                       # Model name ('RFC', 'XGBC', etc.)
              task='classification',                  # Task type: 'classification' or 'regression'
              
              # Scaling and PCA
              scale=True,                             # Apply scaling to numerical columns (True) or disable scaling (False)
              pca=False,                              # Apply PCA (True) or not (False)
              pca_comp=0.9,                           # PCA components to retain (if pca=True), default 90% variance

              # Data splitting and CV
              test_size=0.2,                          # Test set size (0.2 = 20% test)
              cv=5,                                   # Number of cross-validation folds
              
              # Missing values handling
              miss=True,                              # Show missing values summary (True) or skip (False)
              ynan=False,                             # Drop rows with NaN in target (True) or keep (False)
              
              # Columns to process
              numerical_cols=[],                      # Manually specify numerical columns (empty = auto-detect)
              categorical_cols=[],                    # Manually specify categorical columns (empty = auto-detect)
              ordinal_cols={},                        # Dict of ordinal columns: {column: [order]}
              drop_cols=(),                           # Columns to drop before processing
              
              # Encoding and Imputation
              categorical_encoding='onehot',          # 'onehot' or 'label' encoding for categorical features
              ordinal_encoding=False,                 # Apply ordinal encoding (True) or skip (False)
              numerical_impute_strategy='mean',       # Imputation strategy for numerical columns
              categorical_impute_strategy='most_frequent',  # Imputation strategy for categorical
              ordinal_impute_strategy='most_frequent',      # Imputation strategy for ordinal
              categorical_fill_value=None,            # Fill value for categorical imputation
              ordinal_fill_value=None,                # Fill value for ordinal imputation
              
              # Display options
              showx=True,                            # Display processed X_train/X_test (True) or skip (False)
              summary=True,                          # Display data summary (True) or skip (False)
              objvalue=True,                         # Display value counts of categorical columns (True/False)
              xtype=True,                            # Display column data types (True/False)
              
              # Duplicate handling
              drop_duplicates=False,    # False = keep duplicates, True = drop first occurrence, 'all' = drop all duplicates
              
              # Target encoding
              yencode=None,                           # 'encode' = LabelEncode target, 'bin' = Binarize, None = no encoding
              
              # Grid Search / Hyperparameter tuning
              grid_search=None,                       # None = no tuning, 'cv' = GridSearchCV, 'rand' = RandomizedSearchCV
              voting='soft',                          # Voting method if using ensemble voting ('soft' or 'hard')
              voteclass=[],                           # List of classifiers for voting model
              pa=0,                                   # Grid search plot: index of hyperparameter to visualize (0 = first param)

              # Column value filtering
              dropc=False,                            # Drop rows based on column values (True/False)
              column_value={},                        # Dict of {column_name: values_to_drop} if dropc=True

              # Ordinal detection
              ord_threshold=None,                     # Auto-detect ordinal columns if unique values <= threshold (None disables)
              ordname=(),                             # Tuple of ordinal column names to treat manually
              
              # Conversion thresholds for object columns
              typeval=80,                             # % threshold: convert object column to numeric if >= this %
              convrate=80,                            # % threshold: if numeric, convert to int if >= this %, else float

              preprocessed_out=False                  # Return preprocessed dataset (True) or train model (False)
             )

# Build, preprocess, and train the model
brain.build()
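
The preprocessed_out flag above suggests the pipeline can also hand back the transformed data instead of a fitted model; a minimal sketch, assuming build() returns the preprocessed dataset when preprocessed_out=True (the exact return shape is not documented here):

# Hypothetical: retrieve the preprocessed data rather than training a model
prep = Brain(df=df, target='target', model_name='RFC', preprocessed_out=True)
preprocessed_data = prep.build()   # assumed to return the transformed features/target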

Convert Function (Manual Data Preprocessing)

from brainsbuildcode import Convert
import seaborn as sns

# Load dataset
df = sns.load_dataset("titanic")

# Define target
target = 'survived'

# Step 1: Apply conversion
converter = Convert(df, target)
X, y, ncol, ocol, ordinal_cols = converter.apply()

# Now you can process `X`, `y`, `ncol`, `ocol`, `ordinal_cols` manually, or pass them back to `Brain`
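
As one option for the manual route, the detected column lists can be inspected and the split handled with plain scikit-learn; a minimal sketch using only standard scikit-learn calls on the objects returned above:

from sklearn.model_selection import train_test_split

# Inspect what Convert detected
print("Numerical columns:", ncol)
print("Categorical columns:", ocol)
print("Ordinal columns:", ordinal_cols)

# Split the converted data by hand before any further processing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)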

Pass Convert to Brain Manually

from brainsbuildcode import Convert, Brain
import seaborn as sns

# Load dataset
df = sns.load_dataset("titanic")

# Define target
target = 'survived'

# Step 1: Apply conversion
converter = Convert(df, target)
X, y, ncol, ocol, ordinal_cols = converter.apply()

# Step 2: Instantiate Brain with converted column info
best_model = Brain(
    df=df,
    target=target,
    model_name='RFC',
    grid_search=None,
    drop_duplicates=True,
    numerical_cols=ncol,        # Pass numerical columns from Convert
    categorical_cols=ocol,      # Pass categorical columns from Convert
    ordinal_cols=ordinal_cols   # Pass ordinal columns from Convert
)

# Step 3: Build the model
best_model.build()
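
ordinal_cols can also be written out by hand in the {column: [order]} format listed in the full pipeline example; a minimal sketch for the Titanic data (reusing df from the example above), where the category order, lowest to highest, is an assumption:

# Hypothetical manual ordinal specification: {column: [ordered category values]}
manual_ordinal = {'class': ['Third', 'Second', 'First']}

best_model = Brain(
    df=df,
    target='survived',
    model_name='RFC',
    ordinal_encoding=True,        # documented flag: apply ordinal encoding
    ordinal_cols=manual_ordinal   # manually specified ordinal column and order
).build()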

Model Names

model_name   Estimator                     Task
LR           LogisticRegression            classification
RFC          RandomForestClassifier        classification
XGBC         XGBClassifier                 classification
KNNC         KNeighborsClassifier          classification
DTC          DecisionTreeClassifier        classification
SVC          SVC                           classification
MLPC         MLPClassifier                 classification
ADAC         AdaBoostClassifier            classification
GBC          GradientBoostingClassifier    classification
BC           BaggingClassifier             classification
NBC          BernoulliNB                   classification
Linear       LinearRegression              regression
RFR          RandomForestRegressor         regression
XGBR         XGBRegressor                  regression
KNNR         KNeighborsRegressor           regression
DTR          DecisionTreeRegressor         regression
SVR          SVR                           regression
MLPR         MLPRegressor                  regression
ADAR         AdaBoostRegressor             regression
GBR          GradientBoostingRegressor     regression
BR           BaggingRegressor              regression
NBR          BayesianRidge                 regression
LRmulti      LogisticRegression            multi-class classification
RFmulti      RandomForestClassifier        multi-class classification
XGBmulti     XGBClassifier                 multi-class classification
KNNmulti     KNeighborsClassifier          multi-class classification
DTmulti      DecisionTreeClassifier        multi-class classification
SVCmulti     SVC                           multi-class classification
MLPCmulti    MLPClassifier                 multi-class classification
ADAmulti     AdaBoostClassifier            multi-class classification
GBmulti      GradientBoostingClassifier    multi-class classification
BCmulti      BaggingClassifier             multi-class classification
NBmulti      ComplementNB                  multi-class classification
vote         VotingClassifier              ensemble
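
The 'vote' code wires up a VotingClassifier; a minimal sketch, assuming voteclass accepts the short model codes from the table above (how voteclass expects its members is not spelled out here, so treat that as an assumption):

from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame

# Hypothetical soft-voting ensemble over three of the classifiers listed above
ensemble = Brain(df,
                 target='target',
                 model_name='vote',
                 voting='soft',                     # documented options: 'soft' or 'hard'
                 voteclass=['RFC', 'XGBC', 'LR'],   # assumed to take the short model codes
                 grid_search=None).build()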

💡 Key Features

  • Automatic Detection of numerical, categorical, and ordinal features.

  • Missing Value Handling with customizable strategies.

  • Feature Scaling & PCA Support.

  • Flexible Encoding: One-hot, label, ordinal.

  • Multiple Models Supported: Random Forest, XGBoost, Logistic Regression, SVC, etc.

  • Voting Classifiers & Ensemble Models.

  • Hyperparameter Optimization: Grid Search & Randomized Search.

  • Detailed Evaluation Metrics & Visualizations.
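
For the hyperparameter optimization feature, the grid_search, cv, and pa parameters from the full pipeline example suggest the following usage; a minimal sketch, assuming 'rand' triggers RandomizedSearchCV over the package's built-in parameter grids (the grids themselves are not documented here):

from brainsbuildcode import Brain
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame

# Hypothetical randomized-search run; pa=0 is documented as the index of the
# hyperparameter to visualize in the grid-search plot
tuned = Brain(df,
              target='target',
              model_name='RFC',
              grid_search='rand',   # documented: 'cv' = GridSearchCV, 'rand' = RandomizedSearchCV
              cv=5,
              pa=0).build()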

🔗 License

This project is licensed under the MIT License.

🛠️ Contribution

Feel free to contribute, suggest features, or report issues via pull requests and the issues section!
