Framework for creating, running, and validating ML models on tabular data
DreamML - Self Machine Learning ❤️
The next stage in the evolution of DS-Template
About DreamML
DreamML is a machine learning framework aimed at industrial use. Its main task is to choose a simple model, balancing complexity, quality, and metrics. We also suggest reviewing model quality in the dedicated development reports and, for some tasks, in a validation report created using the central bank's methodology.
*This is the first open-source release of the project. We plan to publish more materials and improve the framework.
Installation
Step 1: Install Anaconda or Python 3.8
Step 2: Create environment
- Anaconda
conda create --name dreamml_env python=3.8
- Python 3.8
python -m venv dreamml_env
Step 3: Activate environment
- Anaconda
conda activate dreamml_env
- Python
source dreamml_env/bin/activate
Step 4: Go to the dreamml root folder
Step 5: Install dreamml in your environment
pip install -e .
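After the steps above, a quick sanity check can confirm that the active interpreter and the editable install look as expected. The following is a minimal sketch using only the standard library; the helper name `check_environment` is hypothetical and is not part of dreamml.

```python
import importlib.util
import os
import sys


def check_environment(package="dreamml", required=(3, 8)):
    """Summarize whether the interpreter and package match the install steps.

    `check_environment` is an illustrative helper, not a dreamml API.
    """
    return {
        # Major.minor version of the running interpreter, e.g. "3.8".
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # True when the interpreter is exactly the required major.minor.
        "python_matches": tuple(sys.version_info[:2]) == tuple(required),
        # True when the top-level package can be located for import.
        "package_found": importlib.util.find_spec(package) is not None,
        # True inside a venv, or when any conda environment is active.
        "in_environment": sys.prefix != sys.base_prefix
        or bool(os.environ.get("CONDA_PREFIX")),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `package_found` is false, the most likely cause is that `pip install -e .` was run outside the activated environment or outside the dreamml root folder.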
Get started
To develop a model, use the notebooks in notebooks/1. Model Development and select the one that matches your task type.
To validate models, use the notebooks in notebooks/2. Validate Model
To calibrate models, use the notebooks in notebooks/3. Calibration
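To see which notebooks are available for each stage, the three folders named above can be listed programmatically. This is a small sketch that assumes it is run from the dreamml root folder; the helper `list_notebooks` and the stage keys are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Folder names are taken from this README; the dictionary keys are
# illustrative labels, not dreamml identifiers.
STAGE_DIRS = {
    "develop": "1. Model Development",
    "validate": "2. Validate Model",
    "calibrate": "3. Calibration",
}


def list_notebooks(stage, notebooks_root="notebooks"):
    """Return the sorted .ipynb file names found for a given stage."""
    folder = Path(notebooks_root) / STAGE_DIRS[stage]
    if not folder.is_dir():
        # Missing folder: report no notebooks instead of raising.
        return []
    return sorted(p.name for p in folder.glob("*.ipynb"))


if __name__ == "__main__":
    for stage in STAGE_DIRS:
        print(stage, list_notebooks(stage))
```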
How to Use
Information on the development notebooks in notebooks/1. Model Development
- First, you need to determine the pipeline configuration
  - For regression, binary, multiclass, and multilabel tasks you can refer to this document: docs/1_Model_Development_doc.md
  - For the topic_modeling task you can refer to this document: docs/1_Topic_Modeling_doc.md
  - For the timeseries (boosting) task you can refer to this document: docs/1_TimeSeries_doc.md
  - For the amts (Prophet) task you can refer to this document: docs/1_AltModeTimeSeries_forecast.md
  - If your dataset contains text features you should refer to this document: docs/1_NLP_text_classification_doc.md
  - If you would like to learn more about quality metrics and loss functions, we recommend that you refer to the document docs/Binary_Classification_Metrics_doc.md
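The mapping between task types and configuration documents listed above can be captured in a small lookup table. This is only a convenience sketch: the task names and document paths come from this README, while the helper `docs_for` and its `has_text_features` parameter are hypothetical and not part of dreamml.

```python
# Task type -> configuration document, as listed in this README.
TASK_DOCS = {
    "regression": "docs/1_Model_Development_doc.md",
    "binary": "docs/1_Model_Development_doc.md",
    "multiclass": "docs/1_Model_Development_doc.md",
    "multilabel": "docs/1_Model_Development_doc.md",
    "topic_modeling": "docs/1_Topic_Modeling_doc.md",
    "timeseries": "docs/1_TimeSeries_doc.md",
    "amts": "docs/1_AltModeTimeSeries_forecast.md",
}
TEXT_FEATURES_DOC = "docs/1_NLP_text_classification_doc.md"


def docs_for(task, has_text_features=False):
    """Return the documents to read before configuring a given task."""
    if task not in TASK_DOCS:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(TASK_DOCS)}"
        )
    docs = [TASK_DOCS[task]]
    if has_text_features:
        # Datasets with text features need the NLP document as well.
        docs.append(TEXT_FEATURES_DOC)
    return docs


if __name__ == "__main__":
    print(docs_for("binary", has_text_features=True))
```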
- Then, build the configuration and prepare the data for modeling
config_storage = ConfigStorage(config=config)
transformer = DataTransformer(config_storage)
data_storage = transformer.transform()
- Next, run the modeling pipeline
pipeline = MainPipeline(config_storage=config_storage, data_storage=data_storage)
pipeline.transform()
- For some tasks, you can also use LightAutoML as a model and calculate the out-of-time (OOT) potential
lama = add_lama_model(data_storage.get_eval_set(), config_storage)
oot_potential = calculate_oot_metrics(data_storage.get_eval_set(), config_storage)
- You can also save the modeling artifacts if you need them
saver = pipeline.artifact_saver
models = pipeline.prepared_model_dict
# lama and oot_potential come from the optional LightAutoML step above
pipeline.oot_potential = oot_potential
models.update(lama)
nb_name = saver.get_notebook_path_and_save()
saver.save_artifacts(
    models=models,
    other_models=pipeline.other_model_dict,
    encoder=transformer.cat_transformer,
    ipynb_name=nb_name,
    feature_threshold=config_storage.feature_threshold,
)
saver.save_data(data=data_storage.get_eval_set(), dropped_data=data_storage.get_dropped_data())
- Finally, you can generate a development report. By default, it is saved to the dreamml/results folder.
get_report(pipeline=pipeline, config_storage=config_storage, data_storage=data_storage, encoder=transformer.cat_transformer)
Authors
| Author | Email |
|---|---|
| Nikita Buts | nikitabuts2000@gmail.com |
| Alexander Izyurov | halfbrick845@gmail.com |
| Ivan Plotnikov | com.gateway.api@gmail.com |
| Maidari Tsydenov | maidaritsydenov@gmail.com |
| Evgeny Tkachenko | e_t@inbox.ru |
| Ilya Ivanov | morwes4@gmail.com |
| Nikita Varganov | - |
LICENSE
This project is licensed under the Apache License, Version 2.0. See LICENSE for details.
File details
Details for the file dreamml-3.5.3.tar.gz.
File metadata
- Download URL: dreamml-3.5.3.tar.gz
- Upload date:
- Size: 302.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a0967290ac04022c855a38d4d83d1a957e6011dbcf37939e145765ab98cfdb4 |
| MD5 | 8c54616d90a06df5895328ea18a774f5 |
| BLAKE2b-256 | bc5a48a7c0806dd3d5073295c88dc8ffeb6791a502d941d7e330026babf95340 |
File details
Details for the file dreamml-3.5.3-py3-none-any.whl.
File metadata
- Download URL: dreamml-3.5.3-py3-none-any.whl
- Upload date:
- Size: 425.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2c55380ba4cbb666613d0267e2b32c8a511e674eeb05bab09659a6570058e06 |
| MD5 | 6027492bb09598c506e2703e40274bd9 |
| BLAKE2b-256 | 90bfd312730d99fc7718ce309246f75ba6d89db2b8dba5fa5f18d5677bc8ee19 |
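The published digests can be used to check a downloaded file before installing it. The sketch below uses only the standard `hashlib` module; the helper names `file_digests` and `matches_published` are hypothetical, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Compute SHA256 and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "sha256": hashlib.sha256(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte digest.
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, expected_sha256):
    """Return True when the file's SHA256 equals the published digest."""
    return file_digests(path)["sha256"] == expected_sha256.strip().lower()


# Usage (assumes the source distribution was saved in the current folder):
# matches_published(
#     "dreamml-3.5.3.tar.gz",
#     "7a0967290ac04022c855a38d4d83d1a957e6011dbcf37939e145765ab98cfdb4",
# )
```

A mismatch means the file differs from the one that was uploaded, so it should be downloaded again rather than installed.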