AutoHPSearch
A Python package for automatic hyperparameter tuning of machine learning models for cross-sectional data. AutoHPSearch simplifies the process of hyperparameter optimization for various machine learning models by providing a unified interface to tune hyperparameters across multiple model types.
AutoHPSearch also contains functionality for full end-to-end pipelines that include cleaning, parameter search, model evaluation, and automated production of data reports in markdown format (example here).
The search space can be explored with grid, random, or Bayesian search. Random search is faster but covers the search space less comprehensively. CUDA-enabled computation is supported for the neural network implementations.
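The trade-off between grid and random search can be sketched with a toy example using only the standard library (the parameter names below are illustrative, not AutoHPSearch's actual grid keys): grid search enumerates every combination, while random search evaluates only a fixed budget of sampled candidates.

```python
import itertools
import random

# Hypothetical toy grid; keys and values are illustrative only
grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5],
}

# Grid search enumerates every combination exhaustively
keys = list(grid)
all_combos = [dict(zip(keys, values)) for values in itertools.product(*grid.values())]

# Random search draws a fixed budget of combinations instead
rng = random.Random(42)
budget = 5
sampled = [{k: rng.choice(v) for k, v in grid.items()} for _ in range(budget)]

print(len(all_combos))  # 3 * 4 * 2 = 24 candidates for grid search
print(len(sampled))     # only 5 candidates for random search
```

With larger grids the gap widens quickly, which is why random search is the faster option when the budget matters more than exhaustive coverage.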
Installation

```bash
pip install autohpsearch
```

Or install directly from the repository:

```bash
git clone https://github.com/rudyvdbrink/autohpsearch.git
cd autohpsearch
pip install -e .
```
To enable CUDA, you need to manually install the version of torch+cuda that matches your GPU and system.
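As an example only, on a system with CUDA 12.1 the conventional PyTorch install command looks like the following; the correct wheel index URL depends on your CUDA version, so check the selector at pytorch.org for the command that matches your setup:

```shell
# Example for CUDA 12.1; substitute the wheel index that matches your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu121
```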
Usage
Example Scripts
- Classification - Demonstrates simple binary classification
- Regression - Simple regression example
- Neural Network Usage - Syntax examples for using scikit-learn compatible neural networks
- Iris Example - Examples of both classification and regression solving using real data
- Pipeline Example - An example of a full automated end-to-end pipeline
Basic Example

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from autohpsearch.search.hptuing import tune_hyperparameters, generate_hypergrid

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Generate hyperparameter grid for multiple models
hypergrid = generate_hypergrid(['logistic_regression', 'random_forest_clf', 'xgboost_clf'])

# Tune hyperparameters
results = tune_hyperparameters(
    X_train, y_train,
    X_test, y_test,
    hypergrid=hypergrid,
    scoring='balanced_accuracy',
    search_type='random',
    cv=5
)

# Access best model and results
best_model = results['best_model']          # The winning model
optimal_params = results['optimal_params']  # Best parameters for each model
performance_results = results['results']    # Cross-validation and test score table

print(f"Best model: {type(best_model).__name__}")
print(f"Optimal parameters: {optimal_params}")
print(f"Results summary:\n{performance_results}")
```
Using Neural Network Models

```python
from sklearn.preprocessing import StandardScaler

from autohpsearch.models.nn import AutoHPSearchClassifier

# Scale the features first; neural networks train best on standardized inputs
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create a neural network classifier with custom parameters
nn_clf = AutoHPSearchClassifier(
    hidden_layers=(64, 32),
    activation='relu',
    dropout_rate=0.2,
    learning_rate=0.001,
    optimizer='adam',
    batch_size=32,
    epochs=100
)

# Train the model
nn_clf.fit(X_train_scaled, y_train)

# Make predictions
y_pred = nn_clf.predict(X_test_scaled)
```
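As a rough sketch of what a `hidden_layers` tuple like `(64, 32)` conventionally means in a multilayer perceptron (this illustrates the standard layout, not AutoHPSearch's internal implementation): each entry is the width of one hidden layer, sandwiched between the input and output dimensions.

```python
# Illustrative only: map a hidden_layers tuple to the (in, out) shape of
# each linear layer in a standard MLP.
def layer_dims(n_features, hidden_layers, n_outputs):
    sizes = [n_features, *hidden_layers, n_outputs]
    # Consecutive pairs give the (in_features, out_features) of each layer
    return list(zip(sizes[:-1], sizes[1:]))

dims = layer_dims(n_features=10, hidden_layers=(64, 32), n_outputs=2)
print(dims)  # [(10, 64), (64, 32), (32, 2)]
```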
Creating and Fitting a Full End-To-End Automatic Pipeline

```python
# Import requirements
from autohpsearch.datasets.dataloaders import fetch_housing
from autohpsearch.pipeline.pipeline import AutoMLPipeline

# Load an example dataset
X_train, X_test, y_train, y_test = fetch_housing()

# Fit the pipeline: this will clean the data, run the hyperparameter search,
# train the model, and evaluate it
pipeline = AutoMLPipeline(task_type='regression')
pipeline.fit(X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test)
```
Automated Reports on Data Distributions and Model Performance

AutoHPSearch can generate a report on the data that includes plots of feature distributions before and after cleaning, along with statistics on requested properties of the data, such as the number of outliers. The report also includes plots that examine the best-performing model's performance on the test set. You can find an example report here. To create a report, simply run:

```python
# Write a report in markdown format
pipeline.generate_data_report()
```
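As an illustration of the kind of statistic such a report can contain (a standalone sketch, not AutoHPSearch's actual reporting code), outliers per feature are commonly counted with the 1.5 × IQR rule and can be written out as markdown table rows:

```python
import statistics

# Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for one feature
def outlier_count(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for v in values if v < lo or v > hi)

# Hypothetical toy features
features = {"age": [22, 25, 27, 30, 31, 29, 95], "income": [40, 42, 41, 43, 44, 45, 46]}

# Emit a small markdown table summarizing outlier counts per feature
lines = ["| Feature | Outliers |", "|---|---|"]
lines += [f"| {name} | {outlier_count(vals)} |" for name, vals in features.items()]
print("\n".join(lines))
```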
Available Models
AutoHPSearch supports the following model types:
Classification Models
- logistic_regression: Logistic regression classifier (including L1 / L2 / elastic net regularization)
- random_forest_clf: Random forest classifier
- gradient_boosting_clf: Gradient boosting classifier
- svm_clf: Support vector machine classifier
- knn_clf: K-nearest neighbors classifier
- xgboost_clf: XGBoost classifier
- dnn_clf: Deep neural network classifier
Regression Models
- linear_regression: Linear regression
- ridge: Ridge regression
- lasso: Lasso regression
- elastic_net: Elastic Net regression
- random_forest_reg: Random forest regressor
- gradient_boosting_reg: Gradient boosting regressor
- svr: Support vector regression
- knn_reg: K-nearest neighbors regressor
- xgboost_reg: XGBoost regressor
- dnn_reg: Deep neural network regressor
Hyperparameter Tuning
The generate_hypergrid() function creates a comprehensive grid of hyperparameters for each model type. You can:

- Generate grids for all supported models: `generate_hypergrid(task_type='classification')`
- Generate a grid for a specific model: `generate_hypergrid('random_forest_clf')` or `generate_hypergrid('random_forest_reg', task_type='regression')`
- Generate grids for multiple models: `generate_hypergrid(['logistic_regression', 'xgboost_clf'])`
The tune_hyperparameters() function performs grid search cross-validation on the specified models and returns:
- The best overall model
- Optimal parameters for each model
- Performance metrics for each model
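Conceptually, the best overall model is the one that scores highest across the candidates. With a hypothetical mapping of model names to cross-validation scores (not the actual structure returned by tune_hyperparameters), the selection step could be sketched as:

```python
# Hypothetical per-model cross-validation scores
cv_scores = {
    "logistic_regression": 0.87,
    "random_forest_clf": 0.91,
    "xgboost_clf": 0.90,
}

# Pick the model name with the highest score
best_name = max(cv_scores, key=cv_scores.get)
print(best_name)  # random_forest_clf
```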
Neural Network Models
AutoHPSearch includes custom neural network implementations that are compatible with scikit-learn:
- AutoHPSearchClassifier: for classification tasks
- AutoHPSearchRegressor: for regression tasks
These models provide flexibility in architecture design and training configuration while maintaining the familiar scikit-learn API.
Author
Rudy van den Brink
File details
Details for the file autohpsearch-0.4.0.tar.gz.
File metadata
- Download URL: autohpsearch-0.4.0.tar.gz
- Upload date:
- Size: 49.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a8eabeec96a2b1dd46b324cd2339c8c59751834408a4030a16612a8b88ad327 |
| MD5 | 261a4ee581daff2899923b8538c22782 |
| BLAKE2b-256 | 545d06265ccd0aed54195ce0c50f72fc46dd1f87c2d0f939e772edc1fa5a4f82 |
File details
Details for the file autohpsearch-0.4.0-py3-none-any.whl.
File metadata
- Download URL: autohpsearch-0.4.0-py3-none-any.whl
- Upload date:
- Size: 51.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 56bb12cecc26950ce27600768482ab15af344ffe3eba6b81a6482f208a18cafa |
| MD5 | cea31fc5e9e7875c5b4a93c85068748a |
| BLAKE2b-256 | 528878a0e34b967375abf96c5f085170b6ebad4d2737e02950754bd7f9717fa3 |