Framework for creating, running, and validating ML models on tabular data

Project description

DreamML - Self Machine Learning ❤️

The next stage in the evolution of DS-Template

About DreamML


DreamML is a machine learning framework aimed at industrial use. Its main task is to select a simple model that balances complexity against quality metrics. We also suggest reviewing model quality in the dedicated development reports and, for some tasks, in a validation report created using the central bank's methodology.

*This is the project's first open-source release cycle; we plan to publish more materials and improve the framework in later releases.
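As an illustration of the complexity/quality trade-off described in this section, here is a minimal sketch of one possible selection rule: take the model with the fewest features whose validation metric stays within a tolerance of the best one. This is not DreamML's actual selection logic. The `Candidate` type, the `pick_simplest` function, the `tolerance` parameter, and the sample numbers are all hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A trained model summarized by size and validation quality (hypothetical type)."""
    name: str
    n_features: int  # used here as a proxy for model complexity
    metric: float    # validation metric, higher is better


def pick_simplest(candidates: List[Candidate], tolerance: float = 0.01) -> Candidate:
    """Return the candidate with the fewest features whose metric is
    within `tolerance` of the best metric among all candidates."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best = max(c.metric for c in candidates)
    good_enough = [c for c in candidates if best - c.metric <= tolerance]
    # Fewest features first; ties are broken by the higher metric.
    return min(good_enough, key=lambda c: (c.n_features, -c.metric))


# Made-up numbers, for illustration only.
candidates = [
    Candidate("all_features", 120, 0.812),
    Candidate("top_40", 40, 0.809),
    Candidate("top_10", 10, 0.780),
]
chosen = pick_simplest(candidates, tolerance=0.01)
print(chosen.name)
```

With these sample values the rule prefers the 40-feature model: it loses 0.003 of the metric against the full model, while the 10-feature model falls outside the tolerance.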

Installation


Step 1: Install Anaconda or Python 3.8

Step 2: Create environment

  • Anaconda: conda create --name dreamml_env python=3.8
  • Python 3.8: python -m venv dreamml_env

Step 3: Activate environment

  • Anaconda: conda activate dreamml_env
  • Python: source dreamml_env/bin/activate

Step 4: Go to the dreamml root folder

Step 5: Install dreamml in your environment: pip install -e .
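After the steps above, a quick check from inside the activated environment can confirm that the interpreter matches the version named in Step 1 and that the package is visible. The sketch below uses only the standard library. The import name `dreamml` is an assumption based on the folder and distribution name, and `check_environment` is a hypothetical helper, not part of the framework.

```python
import importlib.util
import sys


def check_environment(package: str = "dreamml", required=(3, 8)) -> dict:
    """Report whether the interpreter matches `required` (major, minor)
    and whether `package` can be found on the import path."""
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "python_ok": tuple(sys.version_info[:2]) == tuple(required),
        # find_spec returns None when a top-level package is not installed.
        "package_found": importlib.util.find_spec(package) is not None,
    }


print(check_environment())
```

If `package_found` is False, the usual causes are that the environment from Step 3 is not active or that Step 5 was run outside the dreamml root folder.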

Get started


To develop a model, use the notebooks in notebooks/1. Model Development and select the one that matches your task type.

To validate models, use the notebooks in notebooks/2. Validate Model.

To calibrate models, use the notebooks in notebooks/3. Calibration.
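To see which notebooks are available for each of the three stages above, a short script can list them. This is a minimal sketch that assumes it runs from the dreamml root folder and that the notebooks are `.ipynb` files directly inside the folders named above. `list_notebooks` is a hypothetical helper, not part of the framework.

```python
from pathlib import Path
from typing import Dict, List

# Folder names as given in this README; the repository root is an assumption.
NOTEBOOK_DIRS = [
    "1. Model Development",
    "2. Validate Model",
    "3. Calibration",
]


def list_notebooks(repo_root: str = ".") -> Dict[str, List[str]]:
    """Map each notebook folder to the sorted names of the .ipynb files in it.
    Folders that do not exist are mapped to an empty list."""
    root = Path(repo_root) / "notebooks"
    found = {}
    for folder in NOTEBOOK_DIRS:
        path = root / folder
        if path.is_dir():
            found[folder] = sorted(p.name for p in path.glob("*.ipynb"))
        else:
            found[folder] = []
    return found


for folder, names in list_notebooks(".").items():
    print(folder, "->", names or "(no notebooks found)")
```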

How to Use


The development notebooks in notebooks/1. Model Development follow the steps below.

  1. First, define the pipeline configuration

    • For regression, binary, multiclass, and multilabel tasks, see docs/1_Model_Development_doc.md
    • For topic_modeling tasks, see docs/1_Topic_Modeling_doc.md
    • For timeseries (boosting) tasks, see docs/1_TimeSeries_doc.md
    • For amts (Prophet) tasks, see docs/1_AltModeTimeSeries_forecast.md
    • If your dataset contains text features, see docs/1_NLP_text_classification_doc.md
    • To learn more about quality metrics and loss functions, see docs/Binary_Classification_Metrics_doc.md
  2. Build the configuration storage and prepare the data for modeling

# `config` is the pipeline configuration defined in step 1
config_storage = ConfigStorage(config=config)
# Prepare the data for modeling
transformer = DataTransformer(config_storage)
data_storage = transformer.transform()
  3. Next, run the modeling pipeline
pipeline = MainPipeline(config_storage=config_storage, data_storage=data_storage)
pipeline.transform()
  4. For some tasks, you can also use LightAutoML as a model and calculate the out-of-time (OOT) potential
lama = add_lama_model(data_storage.get_eval_set(), config_storage)
oot_potential = calculate_oot_metrics(data_storage.get_eval_set(), config_storage)
  5. If needed, you can also save the modeling artifacts
saver = pipeline.artifact_saver
models = pipeline.prepared_model_dict
pipeline.oot_potential = oot_potential
models.update(lama)
nb_name = saver.get_notebook_path_and_save()
saver.save_artifacts(
    models=models,
    other_models=pipeline.other_model_dict,
    encoder=transformer.cat_transformer,
    ipynb_name=nb_name,
    feature_threshold=config_storage.feature_threshold,
)
saver.save_data(data=data_storage.get_eval_set(), dropped_data=data_storage.get_dropped_data())
  6. Finally, generate a development report. By default, it is saved to the dreamml/results folder.
get_report(
    pipeline=pipeline,
    config_storage=config_storage,
    data_storage=data_storage,
    encoder=transformer.cat_transformer,
)

Authors


Author | Email
Nikita Buts | nikitabuts2000@gmail.com
Alexander Izyurov | halfbrick845@gmail.com
Ivan Plotnikov | com.gateway.api@gmail.com
Maidari Tsydenov | maidaritsydenov@gmail.com
Evgeny Tkachenko | e_t@inbox.ru
Ilya Ivanov | morwes4@gmail.com
Nikita Varganov | -

LICENSE


This project is licensed under the Apache License, Version 2.0. See LICENSE for details.

Download files

Source Distribution

dreamml-3.5.3.tar.gz (302.4 kB)

Uploaded Source

Built Distribution

dreamml-3.5.3-py3-none-any.whl (425.1 kB)

Uploaded Python 3

File details

Details for the file dreamml-3.5.3.tar.gz.

File metadata

  • Download URL: dreamml-3.5.3.tar.gz
  • Upload date:
  • Size: 302.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for dreamml-3.5.3.tar.gz
Algorithm | Hash digest
SHA256 | 7a0967290ac04022c855a38d4d83d1a957e6011dbcf37939e145765ab98cfdb4
MD5 | 8c54616d90a06df5895328ea18a774f5
BLAKE2b-256 | bc5a48a7c0806dd3d5073295c88dc8ffeb6791a502d941d7e330026babf95340
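
The digests listed above can be used to check a downloaded archive before installing it. The sketch below uses only Python's standard `hashlib` module. `file_digest` and `verify` are hypothetical helper names, and the local file path in the usage note is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for dreamml-3.5.3.tar.gz.
EXPECTED_SHA256 = "7a0967290ac04022c855a38d4d83d1a957e6011dbcf37939e145765ab98cfdb4"


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    if algorithm == "blake2b-256":
        # BLAKE2b configured for a 32-byte (256-bit) digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True when the file's digest equals the expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify("dreamml-3.5.3.tar.gz", EXPECTED_SHA256)` should return True for an intact copy of the source distribution saved in the current directory. A False result means the file differs from the published one and should not be installed.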

File details

Details for the file dreamml-3.5.3-py3-none-any.whl.

File metadata

  • Download URL: dreamml-3.5.3-py3-none-any.whl
  • Upload date:
  • Size: 425.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for dreamml-3.5.3-py3-none-any.whl
Algorithm | Hash digest
SHA256 | b2c55380ba4cbb666613d0267e2b32c8a511e674eeb05bab09659a6570058e06
MD5 | 6027492bb09598c506e2703e40274bd9
BLAKE2b-256 | 90bfd312730d99fc7718ce309246f75ba6d89db2b8dba5fa5f18d5677bc8ee19
