
Framework for creating, running and validating ML models on tabular data


DreamML - Self Machine Learning ❤️

The next stage in the evolution of DS-Template


About DreamML


DreamML is a machine learning framework designed for industrial model development. Its main goal is to select a simple model that balances complexity, quality, and target metrics. Model quality can be reviewed in dedicated development reports and, for some tasks, in a validation report built according to the central bank's methodology.

*This is the project's first open-source release cycle; we plan to publish more materials and keep improving the framework.
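To make the balance between complexity and quality concrete, here is a minimal selection rule in Python. It is a sketch under stated assumptions, not DreamML's actual selection logic: it assumes complexity is measured by the number of features and that a higher validation metric is better, and it picks the least complex model whose metric is within a tolerance of the best one. `Candidate`, `choose_simple_model`, and the toy values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical illustration, not DreamML's selection algorithm.
@dataclass
class Candidate:
    name: str
    n_features: int  # assumed proxy for model complexity
    metric: float    # validation metric, higher is better


def choose_simple_model(candidates: List[Candidate], tolerance: float = 0.01) -> Candidate:
    """Return the least complex candidate whose metric is within `tolerance` of the best."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best_metric = max(c.metric for c in candidates)
    # Keep the models that are "good enough" compared with the best one
    acceptable = [c for c in candidates if best_metric - c.metric <= tolerance]
    # Prefer fewer features; break ties by the higher metric
    return min(acceptable, key=lambda c: (c.n_features, -c.metric))


# Toy values for illustration only
models = [
    Candidate("all_features", n_features=120, metric=0.612),
    Candidate("top_30_features", n_features=30, metric=0.605),
    Candidate("top_10_features", n_features=10, metric=0.571),
]
print(choose_simple_model(models, tolerance=0.01).name)  # top_30_features
```

Raising the tolerance trades quality for simplicity: with `tolerance=0.1` the 10-feature model would be chosen.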


DreamML Concepts

  • Flexibility. DreamML can be used to automate building solutions for various problem types, data types (text, tables), and models.

  • Tunability. Various hyperparameter tuning methods are supported, including custom evaluation metrics and search spaces.

  • Validability. DreamML provides the ability to validate models, ensuring they meet necessary quality standards and are ready for use in real-world conditions.

  • Integrability. DreamML supports widely used ML libraries (Scikit-learn, CatBoost, XGBoost, Optuna, etc.).

  • Reproducibility. The generated pipelines and model artifacts are automatically saved in the experiment folder for reproducibility. Additionally, there is an option to resume training from checkpoints.

  • Customizability. DreamML lets you manage model complexity and thereby reach the desired quality.

  • Production-orientability. Saved model artifacts and code can easily be packaged for deployment to production.
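To illustrate what "custom evaluation metrics and search spaces" mean in hyperparameter tuning, here is a minimal random-search sketch in plain Python. It uses neither DreamML nor Optuna; `random_search`, the sampler-based search space, and the toy threshold model are assumptions for illustration only.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical sketch, not DreamML's or Optuna's API.
# A search space maps each hyperparameter name to a sampler function.
Params = Dict[str, float]
SearchSpace = Dict[str, Callable[[random.Random], float]]


def random_search(
    objective: Callable[[Params], float],
    space: SearchSpace,
    n_trials: int = 50,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample `n_trials` configurations from `space`; return the best (params, score).

    `objective` is the custom evaluation metric (higher is better).
    """
    rng = random.Random(seed)  # fixed seed for reproducible trials
    best_params: Params = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data: the target is 1 exactly when x >= 5
xs = list(range(10))
ys = [1 if x >= 5 else 0 for x in xs]


def accuracy_metric(params: Params) -> float:
    """Custom metric: accuracy of the rule `predict 1 if x >= threshold`."""
    preds = [1 if x >= params["threshold"] else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Custom search space: threshold sampled uniformly from [0, 10]
space: SearchSpace = {"threshold": lambda rng: rng.uniform(0.0, 10.0)}
best_params, best_score = random_search(accuracy_metric, space, n_trials=200, seed=0)
print(best_params, best_score)
```

Swapping in a different metric or sampler only changes the arguments, not the search loop; dedicated tuners such as Optuna replace the random sampling with smarter strategies.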


Installation

pip install dreamml

Get started


To develop a model, use the notebooks in notebooks/1. Model Development and pick the one that matches your task type.

To validate models, use the notebooks in notebooks/2. Validate Model.

To calibrate models, use the notebooks in notebooks/3. Calibration.

How to Use


Working with the development notebooks (notebooks/1. Model Development)

  1. First, define the pipeline configuration.

  2. Then build the configuration storage and prepare the data for modeling:

```python
# `config` is the pipeline configuration defined in the previous step
config_storage = ConfigStorage(config=config)
transformer = DataTransformer(config_storage)
# Prepare the data; the resulting storage is passed to the pipeline
data_storage = transformer.transform()
```
  3. Next, run the modeling pipeline:

```python
pipeline = MainPipeline(config_storage=config_storage, data_storage=data_storage)
# Run the pipeline on the prepared data
pipeline.transform()
```
  4. For some tasks, you can also add a LightAutoML model and compute the out-of-time (OOT) potential:

```python
# Optional: LightAutoML model built on the same evaluation set
lama = add_lama_model(data_storage.get_eval_set(), config_storage)
# Optional: out-of-time potential metrics
oot_potential = calculate_oot_metrics(data_storage.get_eval_set(), config_storage)
```
  5. If needed, save the modeling artifacts:

```python
saver = pipeline.artifact_saver
models = pipeline.prepared_model_dict
# The next two lines use `oot_potential` and `lama` from the optional LightAutoML/OOT step
pipeline.oot_potential = oot_potential
models.update(lama)
# Notebook path, passed to save_artifacts as ipynb_name
nb_name = saver.get_notebook_path_and_save()
saver.save_artifacts(
    models=models,
    other_models=pipeline.other_model_dict,
    encoder=transformer.cat_transformer,
    ipynb_name=nb_name,
    feature_threshold=config_storage.feature_threshold,
)
saver.save_data(data=data_storage.get_eval_set(), dropped_data=data_storage.get_dropped_data())
```
  6. Finally, generate the development report. By default, it is saved to the dreamml/results folder.

```python
get_report(
    pipeline=pipeline,
    config_storage=config_storage,
    data_storage=data_storage,
    encoder=transformer.cat_transformer,
)
```
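Out-of-time (OOT) validation evaluates a model on data from a later period than the one it was developed on. The sketch below only illustrates that general idea with a date-based split and a toy threshold rule; it does not reproduce how `calculate_oot_metrics` or DreamML's OOT potential are computed, and `out_of_time_split`, `accuracy`, and the records are hypothetical.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical sketch; does not reproduce calculate_oot_metrics.
Record = Tuple[date, float, int]  # (observation date, score, target)


def out_of_time_split(records: List[Record], oot_start: date) -> Tuple[List[Record], List[Record]]:
    """Development sample: dates before `oot_start`; OOT sample: dates on or after it."""
    dev = [r for r in records if r[0] < oot_start]
    oot = [r for r in records if r[0] >= oot_start]
    return dev, oot


def accuracy(records: List[Record], threshold: float) -> float:
    """Share of records where the rule `score >= threshold` agrees with the target."""
    hits = sum((score >= threshold) == bool(target) for _, score, target in records)
    return hits / len(records)


# Toy records for illustration only
records = [
    (date(2023, 1, 15), 0.2, 0),
    (date(2023, 2, 15), 0.8, 1),
    (date(2023, 3, 15), 0.3, 0),
    (date(2023, 4, 15), 0.9, 1),
    (date(2023, 5, 15), 0.6, 0),
    (date(2023, 6, 15), 0.7, 1),
]
dev, oot = out_of_time_split(records, oot_start=date(2023, 5, 1))
print(accuracy(dev, 0.5), accuracy(oot, 0.5))  # 1.0 0.5
```

The rule that looks perfect on the development period loses accuracy on the later period, which is the kind of degradation an OOT check is meant to surface.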

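In DreamML, `pipeline.artifact_saver` takes care of saving artifacts. The general idea of storing models and configuration in an experiment folder so a run can be reproduced later can be sketched with the standard library. This is not DreamML's artifact format; `save_experiment`, `load_model`, and the file layout (`<name>.pkl`, `config.json`) are hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


# Hypothetical sketch, not DreamML's artifact format.
def save_experiment(folder: Path, models: Dict[str, Any], config: Dict[str, Any]) -> Path:
    """Pickle each model to `<name>.pkl` and write the config to `config.json` in `folder`."""
    folder.mkdir(parents=True, exist_ok=True)
    for name, model in models.items():
        with open(folder / f"{name}.pkl", "wb") as f:
            pickle.dump(model, f)
    (folder / "config.json").write_text(json.dumps(config, indent=2, sort_keys=True))
    return folder


def load_model(folder: Path, name: str) -> Any:
    """Load a model previously saved by `save_experiment`."""
    with open(folder / f"{name}.pkl", "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    exp_dir = save_experiment(
        Path(tmp) / "experiment_001",
        models={"toy_model": {"threshold": 0.5}},  # stand-in for a fitted model
        config={"seed": 42},
    )
    restored = load_model(exp_dir, "toy_model")
    saved_files = sorted(p.name for p in exp_dir.iterdir())

print(restored, saved_files)
```

Only unpickle files you trust: `pickle.load` can execute arbitrary code embedded in the file.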
Authors


| Author | Email |
|---|---|
| Nikita Buts | nikitabuts2000@gmail.com |
| Alexander Izyurov | halfbrick845@gmail.com |
| Ivan Plotnikov | com.gateway.api@gmail.com |
| Maidari Tsydenov | maidaritsydenov@gmail.com |
| Evgeny Tkachenko | e_t@inbox.ru |
| Ilya Ivanov | morwes4@gmail.com |
| Nikita Varganov | - |

LICENSE


This project is licensed under the Apache License, Version 2.0. See LICENSE for details.




Download files

Source distribution: dreamml-3.6.3.tar.gz (33.3 MB)

Built distribution: dreamml-3.6.3-py3-none-any.whl (33.6 MB, Python 3)

File details

Details for the file dreamml-3.6.3.tar.gz.

File metadata

  • Size: 33.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

Hashes for dreamml-3.6.3.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f07a0cdf82de3aa1b91572cff4964c4b51fb79c2f72b7b4abbb50f22dccaad0 |
| MD5 | 8c6afc0bc3d2b13113db108c263f635e |
| BLAKE2b-256 | da2819b513e1785e40ed5534c0a6dcd264ccfecb2b690680899f2b5c82d1f6ab |

File details

Details for the file dreamml-3.6.3-py3-none-any.whl.

File metadata

  • Size: 33.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.20

Hashes for dreamml-3.6.3-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b0d04d2495f6a474386b496fc34ff820a461d51e0567b16b0193e006b1636b4 |
| MD5 | 484640c49c0d7e7879a9944f4e8e81bf |
| BLAKE2b-256 | be5286ee09feaa77aecc12e85c4a5260003081e894f52639c2ea20e650716ef2 |
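The published hashes let you check that a downloaded file has not been altered. A minimal sketch using Python's `hashlib`; `sha256_of`, `verify_download`, and `EXPECTED_SHA256` are illustrative names, and the digests are the SHA256 values listed above for release 3.6.3.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for dreamml 3.6.3
EXPECTED_SHA256 = {
    "dreamml-3.6.3.tar.gz": "8f07a0cdf82de3aa1b91572cff4964c4b51fb79c2f72b7b4abbb50f22dccaad0",
    "dreamml-3.6.3-py3-none-any.whl": "3b0d04d2495f6a474386b496fc34ff820a461d51e0567b16b0193e006b1636b4",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA256 hex digest, reading it in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file name is known and its SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


# Example (run in the folder where the wheel was downloaded):
# verify_download(Path("dreamml-3.6.3-py3-none-any.whl"))
```

pip can perform the same check at install time in hash-checking mode, by pinning `dreamml==3.6.3 --hash=sha256:<digest>` in a requirements file.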
