
A package for automated machine learning, based on scikit-learn and Sklong, to tackle longitudinal machine learning classification tasks.

Project description


Auto-Sklong

A specialised Python library for Automated Machine Learning (AutoML) of longitudinal machine learning classification tasks, built upon GAMA.

📰 Latest News

  • Bye Bye PDM!: We are now leveraging UV from Astral (alongside Ruff)!

  • Documentation: For a deep dive into Auto-Sklong, check out our official docs.

  • PyPI: The library's latest version is published on PyPI here.

💡 About The Project

Auto-Scikit-Longitudinal, also called Auto-Sklong, is an automated machine learning (AutoML) library designed to analyse longitudinal data (currently focused on classification tasks) using various search methods, namely Bayesian Optimisation via SMAC3, Asynchronous Successive Halving, Evolutionary Algorithms, and Random Search, all provided via the General Automated Machine Learning Assistant (GAMA).
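To give an intuition for one of these strategies, here is a simplified, synchronous successive-halving loop. It is an illustration only, not GAMA's asynchronous implementation: `evaluate(candidate, budget)` is a hypothetical callback that scores a candidate with a given resource budget (higher is better), and each rung keeps the best `1/eta` of the survivors while multiplying the budget by `eta`.

```python
from typing import Callable, Dict, List, Tuple


def successive_halving(
    candidates: List[str],
    evaluate: Callable[[str, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> Tuple[str, float]:
    """Simplified synchronous successive halving (higher score is better).

    Each rung evaluates all surviving candidates with the current budget,
    keeps the best 1/eta of them (at least one), and multiplies the budget by eta.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        scores: Dict[str, float] = {c: evaluate(c, budget) for c in survivors}
        survivors.sort(key=lambda c: scores[c], reverse=True)
        survivors = survivors[: max(1, len(survivors) // eta)]
        budget *= eta
    best = survivors[0]
    # Final evaluation of the winner with the largest budget reached.
    return best, evaluate(best, budget)


if __name__ == "__main__":
    # Toy scorer: a hypothetical "true quality" that is only partly revealed at low budgets.
    quality = {"a": 0.6, "b": 0.9, "c": 0.7, "d": 0.8}
    print(successive_halving(list(quality), lambda c, b: quality[c] * b / (b + 1)))
```

The asynchronous variant used by GAMA promotes candidates as soon as enough results exist on a rung instead of waiting for the whole rung, which keeps workers busy.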

Built upon GAMA, Auto-Sklong offers a brand-new search space to tackle longitudinal machine learning classification problems, with a user-friendly interface similar to the popular Scikit-learn paradigm.

For further information, please visit the official documentation.

🛠️ Installation

To install Auto-Sklong, take these two easy steps:

  1. Install the latest version of Auto-Sklong:
pip install Auto-Sklong

You can also install a specific version of the library by specifying its version number, e.g. pip install Auto-Sklong==0.0.1. Refer to the Release Notes for the available versions.

  2. 📦 [MANDATORY] Update the required dependencies (Why? See here)

Through Sklong, Auto-Sklong incorporates a modified version of Scikit-Learn called Scikit-Lexicographical-Trees, which can be found at this PyPI link.

This revised version guarantees compatibility with the unique features of Scikit-Longitudinal. Nevertheless, conflicts may occur with other dependencies in Auto-Sklong that also require Scikit-Learn. Follow these steps to prevent any issues when running your project.

🫵 Simple Setup: Command Line Installation

Say you want to try Auto-Sklong in a very simple environment, such as one without a proper pyproject.toml file (Poetry, PDM, etc.). Run the following command:

pip uninstall scikit-learn scikit-lexicographical-trees && pip install scikit-lexicographical-trees
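After running the command above, you may want to confirm which scikit-learn flavour ended up in your environment. The sketch below is an assumption-based helper (not part of Auto-Sklong) that uses the standard-library `importlib.metadata` to report the installed versions of `scikit-learn` and `scikit-lexicographical-trees`; the `lookup` parameter exists only so the function can be tested without touching your real environment.

```python
from importlib import metadata
from typing import Callable, Dict, Optional


def installed_version(name: str, lookup: Callable[[str], str] = metadata.version) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return lookup(name)
    except metadata.PackageNotFoundError:
        return None


def check_sklearn_swap(lookup: Callable[[str], str] = metadata.version) -> Dict[str, Optional[str]]:
    """Report which of the two scikit-learn flavours are installed."""
    return {
        name: installed_version(name, lookup)
        for name in ("scikit-learn", "scikit-lexicographical-trees")
    }


if __name__ == "__main__":
    report = check_sklearn_swap()
    if report["scikit-learn"] is not None:
        print("Warning: upstream scikit-learn is still installed:", report["scikit-learn"])
    if report["scikit-lexicographical-trees"] is None:
        print("scikit-lexicographical-trees is not installed.")
    else:
        print("OK: scikit-lexicographical-trees", report["scikit-lexicographical-trees"])
```

Ideally, `scikit-learn` reports `None` and `scikit-lexicographical-trees` reports a version.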

🫵 Project Setup: Using `UV`

Imagine you are managing your project with UV, a powerful and flexible project management tool. Below is an example configuration for integrating UV in your pyproject.toml file.

To ensure smooth operation and avoid dependency conflicts, you can override specific dependencies like Scikit-Learn. Add the following configuration to your pyproject.toml:

[tool.uv]
package = true
override-dependencies = [
    "scikit-learn ; sys_platform == 'never'",
]

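Because no platform is called 'never', the environment marker never matches, so this override effectively removes Scikit-Learn from the resolution. This lets UV manage your project's dependencies while avoiding conflicts with Scikit-Learn.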

🫵 Project Setup: Using `PDM`

Imagine you have a project managed by PDM or another package manager. The example below uses PDM; the process is similar for Poetry.

To prevent dependency conflicts, exclude Scikit-Learn by adding the following configuration to your pyproject.toml file:

[tool.pdm.resolution]
excludes = ["scikit-learn"]

This exclusion ensures Scikit-Lexicographical-Trees (used as Scikit-learn) is used seamlessly within your project.

🚀 What's New Compared to GAMA?

We enhanced @PGijsbers' open-source GAMA initiative by introducing a brand-new search space designed specifically for tackling longitudinal classification problems. This search space is powered by our custom library, Scikit-Longitudinal (Sklong), enabling Combined Algorithm Selection and Hyperparameter Optimization (CASH Optimization).

Unlike GAMA or other existing AutoML libraries, Auto-Sklong offers out-of-the-box support for longitudinal classification tasks, a capability not previously available.
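For reference, the CASH Optimization mentioned above is commonly formalised in the AutoML literature as follows: given candidate algorithms $\mathcal{A} = \{A^{(1)}, \dots, A^{(R)}\}$, each with a hyperparameter space $\Lambda^{(j)}$, and $k$ train/validation splits of the data, pick the algorithm and hyperparameters that minimise the average validation loss $\mathcal{L}$. Reading each $A^{(j)}$ as a whole pipeline candidate from the search space is our interpretation, not a statement of Auto-Sklong's exact objective.

```latex
A^{*}_{\lambda^{*}} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}}
  \frac{1}{k} \sum_{i=1}^{k}
  \mathcal{L}\left(A^{(j)}_{\lambda},\ \mathcal{D}^{(i)}_{\text{train}},\ \mathcal{D}^{(i)}_{\text{valid}}\right)
```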

Search Space Viz.:

To better understand our proposed search space, refer to the visualisation below (read from left to right, each step being one new component to a final pipeline candidate configuration):

Search Space Visualization
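To make the left-to-right reading concrete, here is a heavily simplified sketch of a sequential search space and a plain random search over it. Every step and component name below is a hypothetical placeholder, not Auto-Sklong's actual search space; the point is only that a candidate is built by choosing one component per step, then one value per hyperparameter.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical search space: ordered steps (read left to right), each offering
# alternative components with candidate hyperparameter values. Placeholder names only.
Step = Tuple[str, Dict[str, Dict[str, list]]]
Candidate = List[Tuple[str, str, dict]]

SPACE: List[Step] = [
    ("data_preparation", {"prep_a": {}, "prep_b": {"option": [1, 2]}}),
    ("preprocessor", {"none": {}, "selector_a": {"k": [5, 10, 20]}}),
    ("classifier", {"clf_a": {"max_depth": [3, 5, None]}, "clf_b": {"n_estimators": [50, 100]}}),
]


def sample_candidate(space: List[Step], rng: random.Random) -> Candidate:
    """Pick one component per step, then one value per hyperparameter."""
    candidate: Candidate = []
    for step, components in space:
        name = rng.choice(sorted(components))
        params = {hp: rng.choice(values) for hp, values in components[name].items()}
        candidate.append((step, name, params))
    return candidate


def random_search(
    space: List[Step],
    score: Callable[[Candidate], float],
    n_iter: int = 20,
    seed: int = 0,
) -> Tuple[Optional[Candidate], float]:
    """Sample n_iter candidates and keep the best-scoring one (higher is better)."""
    rng = random.Random(seed)
    best: Optional[Candidate] = None
    best_score = float("-inf")
    for _ in range(n_iter):
        candidate = sample_candidate(space, rng)
        value = score(candidate)
        if value > best_score:
            best, best_score = candidate, value
    return best, best_score
```

In practice, `score` would cross-validate the assembled pipeline on your data; the more advanced strategies (evolutionary search, successive halving, Bayesian optimisation) differ mainly in how they choose the next candidate to try.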

While GAMA offers some configurability for search spaces, we improved its functionality to better suit our needs. You can find the details of our contributions in the following pull requests:

💻 Developer Notes

For developers looking to contribute, please refer to the Contributing section of GAMA here and Scikit-Longitudinal here.

🛠️ Supported Operating Systems

Auto-Sklong is compatible with the following operating systems:

  • macOS (Careful: if you have an M-series chip, you may need to force your settings to Intel x86_64 rather than Apple Silicon)
  • Linux 🐧
  • On Windows 🪟, you are recommended to run the library within a Docker container under a Linux distribution.
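The notes above can be turned into a small pre-flight check. This is a hypothetical helper (not part of Auto-Sklong) built on the standard-library `platform` module; it assumes that `platform.machine()` reports `arm64` on Apple Silicon and `x86_64` when Python runs under Intel emulation such as Rosetta.

```python
import platform
from typing import Optional


def platform_advice(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return a short hint based on the OS notes above; defaults to the current machine."""
    system = system or platform.system()     # e.g. "Darwin", "Linux", "Windows"
    machine = machine or platform.machine()  # e.g. "arm64", "x86_64", "AMD64"
    if system == "Linux":
        return "Linux: supported."
    if system == "Darwin":
        if machine == "arm64":
            return "macOS on Apple Silicon: consider running under Intel x86_64 (e.g. via Rosetta)."
        return "macOS on Intel: supported."
    if system == "Windows":
        return "Windows: consider running inside a Linux Docker container."
    return f"{system}: not listed as supported."


if __name__ == "__main__":
    print(platform_advice())
```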

🚀 Getting Started

To perform AutoML on your longitudinal analysis with Auto-Sklong, follow these two easy steps:

  • First, load and prepare your dataset using the LongitudinalDataset class of Sklong.

  • Second, use the GamaLongitudinalClassifier class of Auto-Sklong. After instantiating it, either set its hyperparameters or keep the defaults; you can then apply the popular fit, predict, and predict_proba methods just as in Scikit-learn, as shown in the example below. It will automatically search for the best model and hyperparameters for your dataset.

Refer to the documentation for more information on the GamaLongitudinalClassifier class.

from sklearn.metrics import classification_report
from scikit_longitudinal.data_preparation import LongitudinalDataset
from gama.GamaLongitudinalClassifier import GamaLongitudinalClassifier

# Load your longitudinal dataset
dataset = LongitudinalDataset('./stroke.csv')
dataset.load_data_target_train_test_split(
    target_column="class_stroke_wave_4",
)

# Pre-set or manually set your temporal dependencies 
dataset.setup_features_group(input_data="elsa")

# Instantiate the AutoML system
automl = GamaLongitudinalClassifier(
    features_group=dataset.features_group(),
    non_longitudinal_features=dataset.non_longitudinal_features(),
    feature_list_names=dataset.data.columns.tolist(),
)

# Run the AutoML system to find the best model and hyperparameters
automl.fit(dataset.X_train, dataset.y_train)

# Predictions and prediction probabilities
label_predictions = automl.predict(dataset.X_test)
probability_predictions = automl.predict_proba(dataset.X_test)

# Classification report
print(classification_report(dataset.y_test, label_predictions))

# Export a reproducible script of the champion model
automl.export_script() 
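The example relies on the "elsa" preset to set up temporal dependencies. If you need to set them manually, the sketch below builds them from column names. It assumes (please check the Sklong documentation) that `features_group` is a list of lists of column indices, one inner list per longitudinal feature ordered by wave, and that `non_longitudinal_features` is a list of column indices. The `<feature>_w<wave>` naming convention and the `build_feature_groups` helper are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical naming convention: longitudinal columns end in "_w<wave number>".
WAVE_SUFFIX = re.compile(r"^(?P<base>.+)_w(?P<wave>\d+)$")


def build_feature_groups(columns: List[str]) -> Tuple[List[List[int]], List[int]]:
    """Group column indices of the same feature across waves (ordered by wave)."""
    waves: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    non_longitudinal: List[int] = []
    for idx, col in enumerate(columns):
        match = WAVE_SUFFIX.match(col)
        if match:
            waves[match["base"]].append((int(match["wave"]), idx))
        else:
            non_longitudinal.append(idx)
    groups: List[List[int]] = []
    for entries in waves.values():
        if len(entries) < 2:
            # A feature observed in a single wave carries no temporal dependency.
            non_longitudinal.extend(idx for _, idx in entries)
            continue
        groups.append([idx for _, idx in sorted(entries)])
    return groups, sorted(non_longitudinal)
```

The column list should describe the feature matrix only (without the target column), so that the indices line up with `X_train`.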

📝 How to Cite?

The Auto-Sklong paper has been accepted to the 2024 edition of the International Conference on Bioinformatics and Biomedicine (BIBM); the proceedings have not been released yet. In the meantime, to cite the repository, use the "How to cite?" button in the top right corner of the repository, or open the following citation file: CITATION.cff.

🔐 License

MIT License



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

auto_sklong-0.0.4.tar.gz (6.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

Auto_Sklong-0.0.4-py3-none-any.whl (5.7 kB view details)

Uploaded Python 3

File details

Details for the file auto_sklong-0.0.4.tar.gz.

File metadata

  • Download URL: auto_sklong-0.0.4.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.11

File hashes

Hashes for auto_sklong-0.0.4.tar.gz
Algorithm Hash digest
SHA256 b236dea8e5d7f5e6cc8ebceb1f4f460033e9ee51e22605f614cca9914132d23c
MD5 6f38f0344089a6afc669e021cc229b0d
BLAKE2b-256 8dbaf6d6ec1cd98193d8717616dfc8ea4556aa70440ada1c51f148c05b06b932

See more details on using hashes here.

File details

Details for the file Auto_Sklong-0.0.4-py3-none-any.whl.

File metadata

File hashes

Hashes for Auto_Sklong-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 ece9ff7ecccf309ec1d56f61af6b91a2dcf30a54516d49862e6bb281e134e386
MD5 83fd76b0a1eba80f8dc370c2ab83f834
BLAKE2b-256 40c8420cafb1786ec44f301ffd4aa559e87170f343620a53a08a6f135ae318d3

See more details on using hashes here.
