Project for Automated Learning MAchine

This library provides tools for an automated machine learning (AutoML) approach. Because many tools already exist for individual components of an AutoML workflow, the library aims to provide a structure rather than to implement a complete service. It uses a broad definition of AutoML that covers hyperparameter optimization, model historization (keeping a record of trained models), performance analysis, and so on: in short, any element that can be replicated and that should, in most cases, be included in the analysis of a model's results. The library is built from components, which makes it modular and lets users add their own analyses.
It therefore contains the following elements:

  1. A vanilla approach, described below (in the Basic usage section) and in the classification and regression notebooks. In this approach, the user defines a Project, which can then be passed either to a ModelSelector to find the best model for this project, or to a ModelEvaluation to study in more depth the behavior of a given model on this project.

  2. A collection of components that can be added to enrich analysis.

Install it with

python -m pip install palma

Documentation

Access the full documentation here.

Basic usage

  1. Start your project

To start using the library, use the Project class:

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from palma import Project

X, y = make_classification(n_informative=2, n_features=100)
X, y = pd.DataFrame(X), pd.Series(y).astype(bool)

project = Project(problem="classification", project_name="default")

project.start(
    X, y,
    splitter=ShuffleSplit(n_splits=10, random_state=42),
)

Instantiating Project defines the type of problem, and the start method sets up what is needed to carry out an ML project:

  • A testing strategy (argument splitter), which defines the train and test instances. A scikit-learn cross-validator is used for this. Hyper-parameter optimisation operates on a single train/test split, namely the first split produced by the splitter. This means, for instance, that if you want a shuffled 80/20 split, you should set test_size explicitly (ShuffleSplit holds out 10% by default):
splitter = model_selection.ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
  • The training data X and the target y

This initialization is done in two steps so that the user can add optional Components to the project before it starts.
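To make the role of the splitter argument more concrete, the sketch below uses only scikit-learn to show what a cross-validator yields: a sequence of (train, test) index pairs, the first of which would serve as the single train/test split. The toy array and the test_size=0.2 value are assumptions chosen for illustration, and how palma consumes these indices internally is not shown here.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data: 100 samples, 2 features (illustrative only)
X = np.arange(200).reshape(100, 2)

# test_size=0.2 gives an 80/20 split; when neither test_size nor
# train_size is given, ShuffleSplit holds out 10% of the samples
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# Each split is a pair of integer index arrays (train, test)
splits = list(splitter.split(X))
train_idx, test_idx = splits[0]  # the first split

print(len(splits), len(train_idx), len(test_idx))  # prints: 5 80 20
```

Fixing random_state makes the splits reproducible from one run to the next.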

  2. Run hyper-optimisation

The hyper-optimisation process looks for the best model in a pool of models that tend to perform well on a variety of problems. For this task, the library relies on the FLAML module. Once hyper-parameter optimisation is done, the metric to track can be computed.

from palma import ModelSelector

ms = ModelSelector(engine="FlamlOptimizer",
                   engine_parameters=dict(time_budget=30))
ms.start(project)
print(ms.best_model_)
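
As an intuition for what "looking for the best model in a pool of models" means, here is a deliberately simplified sketch that uses only scikit-learn. It is not how FLAML or palma performs the search (FLAML also tunes hyper-parameters within a time budget). The candidate pool, the accuracy metric and the synthetic data are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = make_classification(
    n_samples=200, n_informative=2, n_features=20, random_state=0
)
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# Hypothetical pool of candidate models with fixed hyper-parameters
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated accuracy of each candidate
scores = {
    name: cross_val_score(est, X, y, cv=splitter, scoring="accuracy").mean()
    for name, est in candidates.items()
}

# Keep the candidate with the highest mean score
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```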

  3. Tailoring and analysing your estimator

from palma import ModelEvaluation
from sklearn.ensemble import RandomForestClassifier

# Use your own
model = ModelEvaluation(estimator=RandomForestClassifier())
model.fit(project)

# Get the optimized estimator
model = ModelEvaluation(estimator=ms.best_model_)
model.fit(project)
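
To illustrate the kind of per-split analysis that an evaluation step makes possible, the sketch below uses scikit-learn's cross_validate directly to compare train and test scores on every split. This is an assumption about what such an analysis could look like, not palma's ModelEvaluation implementation. The metrics and the data are chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit, cross_validate

X, y = make_classification(
    n_samples=200, n_informative=2, n_features=20, random_state=0
)
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# Fit the estimator on every split and score it on both the
# train and the test part of that split
results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y,
    cv=splitter,
    scoring=["accuracy", "roc_auc"],
    return_train_score=True,
)

# One line per split: a large train/test gap hints at overfitting
for i in range(splitter.get_n_splits()):
    print(
        f"split {i}: "
        f"train acc={results['train_accuracy'][i]:.3f}, "
        f"test acc={results['test_accuracy'][i]:.3f}, "
        f"test AUC={results['test_roc_auc'][i]:.3f}"
    )
```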

Contributing

You are very welcome to contribute to the project by requesting features, pointing out new tools that could be added as components, reporting issues, and creating new features. Development guidelines will be detailed in the near future.

  • Fork the repository
  • Clone your forked repository: git clone https://github.com/$USER/palma.git
  • Test using pytest: pip install pytest; pytest tests/
  • Submit your work with a pull request.

Authors

Eurobios Mews Labs

