A simple AutoML tool for small datasets with useful helper functions

Development Status
- 3 - Alpha
Intended Audience
- Developers
- Education
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
- Python :: 3.11
Topic
- Software Development :: Libraries

Project description

ZipML

ZipML is a lightweight AutoML library designed for small datasets, offering essential helper functions like train-test splitting, model comparison, and confusion matrix generation.

Features

Automated Model Training: Automatically train and compare machine learning models on your dataset.
Helper Functions:
- Train-test split functionality for easy data management.
- Confusion matrix generation and the ability to save it as a PNG.
- Custom logging features for better tracking of your model's performance.
Model Comparison: Compare the performance of different models using metrics and visual feedback.
CLI Support: Run machine learning tasks directly from the command line.
Extensible: Add your own models and customize workflows as needed.
Visualization Tools: Includes tools for visualizing model performance metrics, helping to understand model behavior better.
Hyperparameter Tuning: Support for hyperparameter tuning to optimize model performance.
Data Preprocessing: Built-in data preprocessing steps to handle missing values, scaling, and encoding.
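
To make the preprocessing and tuning features above more concrete, here is a minimal sketch written with plain scikit-learn. It is not ZipML's own API or implementation. The toy columns (age, city), the parameter grid, and the choice of LogisticRegression are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric column with a missing value,
# one categorical column, and a binary target.
df = pd.DataFrame({
    "age": [22, 35, np.nan, 41, 29, 53, 38, 47],
    "city": ["a", "b", "a", "c", "b", "a", "c", "b"],
    "target": [0, 1, 0, 1, 0, 1, 1, 0],
})
X, y = df[["age", "city"]], df["target"]

# Numeric columns: fill missing values with the median, then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode, ignoring categories unseen in training.
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune the regularization strength with a small cross-validated grid search.
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping preprocessing inside the pipeline means the imputer, scaler, and encoder are refit on each training fold, so no information from the validation fold leaks into them.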

Installation

Install the package via pip:

pip install zipml

Alternatively, clone the repository:

git clone https://github.com/abdozmantar/zipml.git
cd zipml
pip install .

Usage

Example Usage with Code

Here's a practical example of how to use ZipML:

import pandas as pd
from zipml.model import analyze_model_predictions, calculate_model_results
from zipml.visualization import save_and_plot_confusion_matrix
from zipml.data import split_data
from zipml import compare_models
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression


# Sample dataset
data = {
    'feature_1': [0.517, 0.648, 0.105, 0.331, 0.781, 0.026, 0.048],
    'feature_2': [0.202, 0.425, 0.643, 0.721, 0.646, 0.827, 0.303],
    'feature_3': [0.897, 0.579, 0.014, 0.167, 0.015, 0.358, 0.744],
    'feature_4': [0.457, 0.856, 0.376, 0.527, 0.648, 0.534, 0.047],
    'feature_5': [0.046, 0.118, 0.222, 0.001, 0.969, 0.239, 0.203],
    'target': [0, 1, 1, 1, 1, 1, 0]
}

# Creating DataFrame
df = pd.DataFrame(data)

# Splitting data into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = split_data(X, y)

# Define models
models = [
    RandomForestClassifier(),
    LogisticRegression(),
    GradientBoostingClassifier()
]

# Compare models and select the best one
best_model, performance = compare_models(models, X_train, X_test, y_train, y_test)
print(f"Best model: {best_model} with performance: {performance}")

# Calculate performance metrics for the best model
best_model_metrics = calculate_model_results(y_test, best_model.predict(X_test))

# Analyze model predictions
val_df, most_wrong = analyze_model_predictions(best_model, X_test, y_test)

# Save and plot confusion matrix
save_and_plot_confusion_matrix(y_test, best_model.predict(X_test), save_path="confusion_matrix.png")
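
For readers who want to see what a comparison step like the one above involves, the following sketch fits each candidate and ranks them by test accuracy using only scikit-learn. It is an approximation under stated assumptions, not the implementation of compare_models: the helper name pick_best, the synthetic dataset, and the use of accuracy as the ranking metric are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)


def pick_best(models, X_train, X_test, y_train, y_test):
    """Fit every model and return (best model name, all test accuracies)."""
    scores = {}
    for model in models:
        model.fit(X_train, y_train)
        scores[type(model).__name__] = accuracy_score(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)
    return best_name, scores


best_name, scores = pick_best(
    [RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000)],
    X_train, X_test, y_train, y_test,
)
print(best_name, scores)
```

On very small datasets a single train-test split gives noisy rankings, so cross-validated scores may be a more stable basis for choosing a model.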

CLI Usage

You can run ZipML from the command line using the following commands:

Train a Single Model

zipml --train train.csv --test test.csv --model randomforest --result results.json

--train: Path to the training dataset CSV file.
--test: Path to the testing dataset CSV file.
--model: Name of the model to be trained (e.g., randomforest, logisticregression, gradientboosting).
--result: Path to the JSON file where results will be saved.

Compare Multiple Models

zipml --train train.csv --test test.csv --compare --compare_models randomforest svc knn --result results.json

--compare: A flag to indicate multiple model comparison.
--compare_models: A space-separated list of models to compare (e.g., randomforest svc knn).
--result: Path to the JSON file where comparison results will be saved.
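
The flags documented above could be parsed along the following lines. This is a hypothetical sketch built on Python's argparse module, not ZipML's actual command-line code; only the flag names come from this page, while the program name zipml-sketch and the defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags documented on this page."""
    parser = argparse.ArgumentParser(prog="zipml-sketch")
    parser.add_argument("--train", help="Path to the training CSV file")
    parser.add_argument("--test", help="Path to the testing CSV file")
    parser.add_argument("--model", help="Name of a single model to train")
    parser.add_argument("--compare", action="store_true",
                        help="Compare several models instead of training one")
    # nargs="+" collects one or more space-separated model names.
    parser.add_argument("--compare_models", nargs="+", default=[],
                        help="Names of the models to compare")
    parser.add_argument("--result", help="Path to the JSON results file")
    return parser


# Parse the comparison command shown above, passed as an explicit argument list.
args = build_parser().parse_args(
    "--train train.csv --test test.csv --compare "
    "--compare_models randomforest svc knn --result results.json".split()
)
print(args.compare, args.compare_models, args.result)
```

Because --compare_models uses nargs="+", argparse stops collecting model names when it reaches the next option, which is why --result can follow the list directly.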

Load a Pre-trained Model and Make Predictions

zipml --load_model trained_model.pkl --test test.csv --result predictions.json

--load_model: Path to the saved model file.
--test: Path to the testing dataset CSV file.
--result: Path to the JSON file where predictions will be saved.

Save the Trained Model

To save the trained model after training:

zipml --train train.csv --test test.csv --model randomforest --save_model trained_model.pkl

--save_model: Path to the file where the trained model will be saved.
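
Saving and reloading a model can be illustrated with a short round trip. The sketch below assumes the .pkl file is a standard Python pickle of a scikit-learn estimator; this page does not state which serialization format ZipML uses, so treat that as an assumption.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small model on synthetic data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "trained_model.pkl"

    # Save the fitted model to disk.
    with path.open("wb") as fh:
        pickle.dump(model, fh)

    # Load it back, as a later prediction run would.
    with path.open("rb") as fh:
        restored = pickle.load(fh)

# The restored model should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Pickle files can execute arbitrary code when loaded, so only load model files from sources you trust.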

Output

Training and comparison commands report performance metrics such as accuracy, precision, recall, and F1 score.
Results are saved in JSON format, which makes them easy to review and process further.
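
As an illustration of the metrics named above, the following sketch computes them with scikit-learn and serializes them to JSON. The key names and the example predictions are assumptions; the actual layout of ZipML's results file is not documented on this page.

```python
import json

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true labels and predictions for a binary task.
y_true = [0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 1, 1]

# Precision, recall, and F1 for the positive class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)

results = {
    "accuracy": float(accuracy_score(y_true, y_pred)),
    "precision": float(precision),
    "recall": float(recall),
    "f1": float(f1),
}

# Serialize to a JSON string; writing it to a file works the same way.
payload = json.dumps(results, indent=2)
print(payload)
```

Here 5 of the 7 predictions are correct, and the positive class has 4 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 0.8.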

Dependencies

Python 3.6+
Pandas
Scikit-learn
Matplotlib
Seaborn

Contributing

1. Fork the repository.
2. Create your feature branch (git checkout -b feature/foo).
3. Commit your changes (git commit -am 'Add some foo').
4. Push to the branch (git push origin feature/foo).
5. Open a pull request.

Author

Abdullah OZMANTAR (GitHub: @abdozmantar)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

- 0.2.5 (this version): Sep 28, 2024
- 0.2.4: Sep 27, 2024
- 0.2.3: Sep 25, 2024
- 0.2.2: Sep 25, 2024
- 0.2.1: Sep 25, 2024

Download files

Source Distribution

zipml-0.2.5.tar.gz (17.8 kB)

Uploaded Sep 28, 2024 Source

Built Distribution

zipml-0.2.5-py3-none-any.whl (26.3 kB)

Uploaded Sep 28, 2024 Python 3

File details

Details for the file zipml-0.2.5.tar.gz.

File metadata

Download URL: zipml-0.2.5.tar.gz
Upload date: Sep 28, 2024
Size: 17.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for zipml-0.2.5.tar.gz
Algorithm	Hash digest
SHA256	`daa33a7291173b5eb800dc520daf6f9cbda76ad61f53d6c0f3d822559d9ff418`
MD5	`2f21e308112502bae6cbea69982c7251`
BLAKE2b-256	`18736244f1991e56fe6555fe17d40c95fa2260abbf829df35016eb4fb5992a68`

File details

Details for the file zipml-0.2.5-py3-none-any.whl.

File metadata

Download URL: zipml-0.2.5-py3-none-any.whl
Upload date: Sep 28, 2024
Size: 26.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for zipml-0.2.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`946f556ed3c7afa0aba1cb66fce20acf6c22b44558f81f40109f87e910eba648`
MD5	`002c6e3975d3cc4cdd2f90ee5863f129`
BLAKE2b-256	`331099de45ff1294047fac28f873207029fa7d63ecaae746791d0ad1b60342b6`
