Project description
Project for Automated Learning MAchine
This library aims at providing tools for an automatic machine learning approach.
As many tools already exist to establish one or the other component of an AutoML
approach, the idea of this library is to provide a structure rather than to
implement a complete service.
In this library, a broad definition of AutoML is used: it covers the
optimization of hyperparameters, the historization of models, the analysis
of performance, etc. In short, it covers any element that can be replicated and that must,
in most cases, be included in the analysis results of the models.
Thanks to the use of components, this
library is also designed to be modular and allows users to add their own
analyses.
It therefore contains the following elements:
- A vanilla approach described below (in the Basic usage section) and in the classification and regression notebooks. In this approach, users define a Project, which can then be passed either to a ModelSelector to find the best model for this project, or to a ModelEvaluation to study in more depth the behavior of a given model on this project.
- A collection of components that can be added to enrich the analysis.
Install it with
python -m pip install palma
Documentation
Access the full documentation here.
Basic usage
- Start your project
To start using the library, use the Project class:
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from palma import Project
X, y = make_classification(n_informative=2, n_features=100)
X, y = pd.DataFrame(X), pd.Series(y).astype(bool)
project = Project(problem="classification", project_name="default")
project.start(
    X, y,
    splitter=ShuffleSplit(n_splits=10, random_state=42),
)
The instantiation defines the type of problem, and the start
method sets up what is needed to carry out an ML project:
- A testing strategy (argument splitter) that defines the train and test instances. A cross-validator from sklearn is used for this. During hyper-parameter optimisation a single train/test split is needed, so only the first split is used. This implies, for instance, that if you want an 80/20 split that shuffles the dataset, you should set test_size explicitly (ShuffleSplit holds out 10% by default):
splitter = model_selection.ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
- Training data X and target y
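The split sizes discussed for the splitter argument can be checked with scikit-learn alone. The following is a minimal sketch on a toy array (not palma code): it compares a ShuffleSplit with an explicit test_size=0.2 against one that relies on the default, and takes only the first split, as the text says is done during hyper-parameter optimisation.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy data, assumed for illustration: 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)

# Explicit 80/20 split versus the ShuffleSplit default hold-out size.
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
default_splitter = ShuffleSplit(n_splits=5, random_state=42)

# Only the first split of each splitter is consumed here.
train_idx, test_idx = next(splitter.split(X))
default_train_idx, default_test_idx = next(default_splitter.split(X))

print(len(train_idx), len(test_idx))
print(len(default_train_idx), len(default_test_idx))
```

Note that n_splits only controls how many resampled splits are generated; it does not change the train/test proportion.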
This initialization is done in two steps to allow the user to add optional Components to the project before its start.
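To illustrate why a two-step initialization is convenient, here is a small, self-contained sketch of the pattern. ToyProject, its add method and ClassBalance are hypothetical names invented for this example; they are not palma's classes and the real API may differ.

```python
class ToyProject:
    """Hypothetical stand-in for illustration; not palma's Project class."""

    def __init__(self, problem):
        self.problem = problem
        self.components = []
        self.started = False

    def add(self, component):
        # Components may only be registered before the project is started.
        if self.started:
            raise RuntimeError("components must be added before start()")
        self.components.append(component)

    def start(self, X, y):
        self.X, self.y = X, y
        self.started = True
        # Each registered component is called once with the started project.
        return {type(c).__name__: c(self) for c in self.components}


class ClassBalance:
    """Hypothetical component: fraction of positive targets."""

    def __call__(self, project):
        return sum(project.y) / len(project.y)


project = ToyProject(problem="classification")
project.add(ClassBalance())
report = project.start(X=[[0], [1], [2], [3]], y=[True, False, False, False])
print(report)
```

Separating construction from start gives the caller a window in which to register analyses, so that every component sees the same data once the project is started.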
- Run hyper-optimisation
The hyper-optimisation process looks for the best model in a pool of models that tend to perform well on a variety of problems. For this task we use the FLAML module. After hyper-parameter optimisation, the metric to track can be computed.
from palma import ModelSelector
ms = ModelSelector(engine="FlamlOptimizer",
                   engine_parameters=dict(time_budget=30))
ms.start(project)
print(ms.best_model_)
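The idea of picking the best model from a pool can be sketched with scikit-learn only. This is a deliberately simplified illustration, not what FLAML or ModelSelector do internally: the candidate pool, the data set sizes and the accuracy metric are assumptions made for the example, and each candidate is scored on the first split only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=200, n_features=20, n_informative=2, random_state=0
)
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))  # first split only

# Hypothetical candidate pool; a real AutoML engine searches a richer space.
pool = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "shallow_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "deep_tree": DecisionTreeClassifier(max_depth=None, random_state=0),
}

scores = {}
for name, estimator in pool.items():
    estimator.fit(X[train_idx], y[train_idx])
    scores[name] = estimator.score(X[test_idx], y[test_idx])  # accuracy

best_name = max(scores, key=scores.get)
best_model = pool[best_name]
print(best_name, scores[best_name])
```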
- Tailoring and analysing your estimator
from palma import ModelEvaluation
from sklearn.ensemble import RandomForestClassifier
# Use your own
model = ModelEvaluation(estimator=RandomForestClassifier())
model.fit(project)
# Get the optimized estimator
model = ModelEvaluation(estimator=ms.best_model_)
model.fit(project)
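As a rough picture of what studying an estimator over every split of the testing strategy can involve, the sketch below uses scikit-learn's cross_validate. It is an assumption-laden illustration and does not reproduce ModelEvaluation: the metrics, the forest size and the data are chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit, cross_validate

X, y = make_classification(
    n_samples=200, n_features=20, n_informative=2, random_state=0
)
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)

# Fit a fresh clone of the estimator on each split and score it.
results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y,
    cv=splitter,
    scoring=["accuracy", "roc_auc"],
    return_train_score=True,
)

test_accuracy = results["test_accuracy"]
# Mean and spread across splits, then the train/test gap as an overfitting hint.
print(test_accuracy.mean(), test_accuracy.std())
print(results["train_accuracy"].mean() - test_accuracy.mean())
```

Looking at the spread across splits and at the gap between train and test scores gives a first indication of stability and overfitting before any deeper analysis.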
Contributing
You are very welcome to contribute to the project by requesting features, pointing out new tools that can be added as components, identifying issues, and creating new features. Development guidelines will be detailed in the near future.
- Fork the repository
- Clone your forked repository
git clone https://github.com/$USER/palma.git
- Test using pytest
pip install pytest; pytest tests/
- Submit your work with a pull request.
Authors
Eurobios Mews Labs