
Framework for creating, running and validating ML models on tabular data


DreamML - Self Machine Learning ❤️

The next stage in the evolution of DS-Template


About DreamML


DreamML is a machine learning framework aimed at industrial use. Its main goal is to select a simple model while balancing complexity, quality and metrics. The framework also produces dedicated development reports for reviewing model quality and, for some tasks, a validation report built according to the central bank's methodology.

*This is the project's first open-source release; we plan to publish more materials and continue improving the framework.


Installation

📦 Python Package

pip install dreamml

📂 Repository

Step 1: Install Anaconda or Python 3.8

Step 2: Create environment

  • Anaconda: conda create --name dreamml_env python=3.8
  • Python 3.8: python -m venv dreamml_env

Step 3: Activate environment

  • Anaconda: conda activate dreamml_env
  • Python: source dreamml_env/bin/activate (on Windows: dreamml_env\Scripts\activate)

Step 4: Clone the repository and change into the DreamML root folder

git clone https://gitverse.ru/dreamml/DreamML.git
cd DreamML

Step 5: Install dreamml in your environment

pip install -e .
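After either installation route (pip or the editable install from the repository), you can check that the package is visible to the active interpreter. The sketch below is a minimal, hypothetical helper (`installed_version` is not part of DreamML); it only relies on the standard-library `importlib.metadata` module, available since Python 3.8, and assumes the distribution is registered under the name `dreamml`, as in the `pip install dreamml` command above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The installation steps above target Python 3.8; flag a different interpreter version.
    if sys.version_info[:2] != (3, 8):
        print(f"Note: running Python {sys.version_info.major}.{sys.version_info.minor}, not 3.8")
    print("dreamml version:", installed_version("dreamml"))
```

If the script prints `None`, the package is most likely installed into a different environment than the one that is currently activated.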

🐳 Docker

git clone https://gitverse.ru/dreamml/DreamML.git
cd DreamML
docker build -t dreamml:v3.5.4 .
docker run -d -p 8888:8888 -v $(pwd):/app --name dreamml_container dreamml:v3.5.4

(!) If $(pwd) does not work (for example, in older versions of PowerShell), use the absolute path instead:

docker run -d -p 8888:8888 -v C:\path\to\DreamML:/app --name dreamml_container dreamml:v3.5.4

Then open http://localhost:8888 in your browser.
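One way to avoid shell-specific syntax such as $(pwd) altogether is to build the `docker run` arguments in Python with an already-resolved absolute path. The helper below, `docker_run_command`, is a hypothetical sketch that reproduces the command above (same image tag, container name and port); it only builds the argument list and does not start anything itself.

```python
from pathlib import Path
from typing import List


def docker_run_command(project_dir: str, image: str = "dreamml:v3.5.4",
                       name: str = "dreamml_container", port: int = 8888) -> List[str]:
    """Build the `docker run` command above with an absolute host path for the volume."""
    # Resolving the path here replaces $(pwd); it yields an absolute path on Windows and POSIX.
    host_path = str(Path(project_dir).resolve())
    return [
        "docker", "run", "-d",
        "-p", f"{port}:{port}",
        "-v", f"{host_path}:/app",
        "--name", name,
        image,
    ]


if __name__ == "__main__":
    cmd = docker_run_command(".")
    print(" ".join(cmd))
    # To start the container, pass the list to subprocess.run(cmd, check=True).
```

Passing an argument list (rather than a single string) to `subprocess.run` also sidesteps quoting problems when the checkout path contains spaces.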

Get started


To develop a model, use the notebooks in notebooks/1. Model Development and choose the one that matches your task type.

To validate models, use the notebooks in notebooks/2. Validate Model.

To calibrate models, use the notebooks in notebooks/3. Calibration.
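To see which notebooks are available for each stage, a small script can list the `.ipynb` files in the three folders named above. This is a hypothetical sketch: `list_notebooks` and `NOTEBOOK_DIRS` are illustrative names, and the folder names are copied from this README, so adjust them if your checkout differs.

```python
from pathlib import Path
from typing import Dict, List

# Folder names as given in this README (relative to <repo_root>/notebooks).
NOTEBOOK_DIRS: Dict[str, str] = {
    "development": "1. Model Development",
    "validation": "2. Validate Model",
    "calibration": "3. Calibration",
}


def list_notebooks(repo_root: str, stage: str) -> List[str]:
    """Return the sorted .ipynb file names for a stage; empty if the folder is missing."""
    folder = Path(repo_root) / "notebooks" / NOTEBOOK_DIRS[stage]
    return sorted(p.name for p in folder.glob("*.ipynb"))


if __name__ == "__main__":
    # Run from the DreamML root folder.
    for stage in NOTEBOOK_DIRS:
        print(stage, list_notebooks(".", stage))
```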

How to Use


Using the development notebooks (notebooks/1. Model Development)

  1. First, define the pipeline configuration:

    • For regression, binary, multiclass and multilabel tasks, see docs/1_Model_Development_doc.md
    • For the topic_modeling task, see docs/1_Topic_Modeling_doc.md
    • For the timeseries task (boosting models), see docs/1_TimeSeries_doc.md
    • For the amts task (Prophet), see docs/1_AltModeTimeSeries_forecast.md
    • If your dataset contains text features, see docs/1_NLP_text_classification_doc.md
    • To learn more about quality metrics and loss functions, see docs/Binary_Classification_Metrics_doc.md
  2. Next, build the configuration storage and prepare the data for modeling:

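# `config` is the pipeline configuration described in the documents from step 1
config_storage = ConfigStorage(config=config)
transformer = DataTransformer(config_storage)
data_storage = transformer.transform()  # prepares the data for modeling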
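The choice of document in step 1 depends only on the task type and on whether the data contains text features. The lookup below is a hypothetical sketch of that rule: `TASK_DOCS` and `docs_for` are illustrative names, the task strings are copied from the list in step 1, and whether they match the exact values DreamML expects in its configuration is not confirmed here.

```python
from typing import Dict, List

# Task names and document paths as listed in step 1 of this README.
TASK_DOCS: Dict[str, str] = {
    "regression": "docs/1_Model_Development_doc.md",
    "binary": "docs/1_Model_Development_doc.md",
    "multiclass": "docs/1_Model_Development_doc.md",
    "multilabel": "docs/1_Model_Development_doc.md",
    "topic_modeling": "docs/1_Topic_Modeling_doc.md",
    "timeseries": "docs/1_TimeSeries_doc.md",
    "amts": "docs/1_AltModeTimeSeries_forecast.md",
}
NLP_DOC = "docs/1_NLP_text_classification_doc.md"


def docs_for(task: str, has_text_features: bool = False) -> List[str]:
    """Return the documents to read before configuring a pipeline for `task`."""
    if task not in TASK_DOCS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_DOCS)}")
    docs = [TASK_DOCS[task]]
    if has_text_features:
        # Text features need the NLP document in addition to the task document.
        docs.append(NLP_DOC)
    return docs
```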
  3. Next, run the modeling pipeline:
pipeline = MainPipeline(config_storage=config_storage, data_storage=data_storage)
pipeline.transform()
  4. For some tasks, you can also add a LightAutoML model and calculate the out-of-time (OOT) potential:
lama = add_lama_model(data_storage.get_eval_set(), config_storage)
oot_potential = calculate_oot_metrics(data_storage.get_eval_set(), config_storage)
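The internals of `calculate_oot_metrics` are not described here. As a generic illustration of out-of-time evaluation (not DreamML's own calculation), the sketch below compares a model's Gini coefficient, Gini = 2·AUC − 1, a metric widely used in credit scoring, on an in-time sample and on an out-of-time sample from a later period. The helpers `roc_auc` and `gini` and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient as used in credit scoring: 2 * AUC - 1."""
    return 2.0 * roc_auc(y_true, scores) - 1.0


if __name__ == "__main__":
    # Toy predictions of a hypothetical model on an in-time and an out-of-time sample.
    y_train, s_train = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
    y_oot, s_oot = [0, 1, 0, 1], [0.2, 0.3, 0.6, 0.5]
    print("train Gini:", gini(y_train, s_train))
    print("OOT Gini:", gini(y_oot, s_oot))
```

The pairwise loop is quadratic and only meant to make the definition explicit; on real data, `sklearn.metrics.roc_auc_score` computes the same AUC efficiently.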
  5. If needed, save the modeling artifacts:
saver = pipeline.artifact_saver
models = pipeline.prepared_model_dict
# The next two lines use `oot_potential` and `lama` from step 4
pipeline.oot_potential = oot_potential
models.update(lama)
nb_name = saver.get_notebook_path_and_save()
saver.save_artifacts(
    models=models,
    other_models=pipeline.other_model_dict,
    encoder=transformer.cat_transformer,
    ipynb_name=nb_name,
    feature_threshold=config_storage.feature_threshold,
)
saver.save_data(data=data_storage.get_eval_set(), dropped_data=data_storage.get_dropped_data())
  6. Finally, generate the development report. By default, it is saved to the dreamml/results folder:
get_report(pipeline=pipeline, config_storage=config_storage, data_storage=data_storage, encoder=transformer.cat_transformer)

Authors


Author Email
Nikita Buts nikitabuts2000@gmail.com
Alexander Izyurov halfbrick845@gmail.com
Ivan Plotnikov com.gateway.api@gmail.com
Maidari Tsydenov maidaritsydenov@gmail.com
Evgeny Tkachenko e_t@inbox.ru
Ilya Ivanov morwes4@gmail.com
Nikita Varganov -

LICENSE


This project is licensed under the Apache License, Version 2.0. See LICENSE for details.



Download files

Download the file for your platform.

Source Distribution

dreamml-3.5.4.tar.gz (321.4 kB)

Uploaded Source

Built Distribution


dreamml-3.5.4-py3-none-any.whl (449.0 kB)

Uploaded Python 3

File details

Details for the file dreamml-3.5.4.tar.gz.

File metadata

  • Download URL: dreamml-3.5.4.tar.gz
  • Upload date:
  • Size: 321.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for dreamml-3.5.4.tar.gz
Algorithm Hash digest
SHA256 965a94073b8456c2921cd1c9aec822ef2ed1dc631efb51f7a035764bc47a702b
MD5 eccab00cbde39c29a9e132b375bb7e05
BLAKE2b-256 d8219f5f02df2f11012ff8fcd3dd0595a7d15f46e518a5c3cafab88bb8588cf8


File details

Details for the file dreamml-3.5.4-py3-none-any.whl.

File metadata

  • Download URL: dreamml-3.5.4-py3-none-any.whl
  • Upload date:
  • Size: 449.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

File hashes

Hashes for dreamml-3.5.4-py3-none-any.whl
Algorithm Hash digest
SHA256 6ae94b883cd71aff4e1e639f3aa53c409af812062f2954c1d55f2fc36577cdb8
MD5 02224afb8edba3de30d06e16ecf741ce
BLAKE2b-256 2da0fa4d2f6392da277b3e126d36f08871ac1a3a97160f248bb6babd654d3265
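To check a downloaded release file against the SHA256 digests listed above, you can hash it with the standard-library `hashlib` module. The sketch below is hypothetical (`sha256_of`, `verify` and `EXPECTED_SHA256` are illustrative names); the digests are copied from this page, and the file is read in chunks so that large downloads need not fit in memory.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for the 3.5.4 release files.
EXPECTED_SHA256 = {
    "dreamml-3.5.4.tar.gz": "965a94073b8456c2921cd1c9aec822ef2ed1dc631efb51f7a035764bc47a702b",
    "dreamml-3.5.4-py3-none-any.whl": "6ae94b883cd71aff4e1e639f3aa53c409af812062f2954c1d55f2fc36577cdb8",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Return True if the file's SHA256 matches the digest listed for its file name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, `verify("dreamml-3.5.4.tar.gz")` should return True for an unmodified download saved under its original file name.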

