Skip to main content

Welcome to hana_automl - Automated Machine Learning library based on SAP HANA.

Project description

Read the Docs Lines of code GitHub issues GitHub Repo stars GitHub contributors


Logo

Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Science tasks.
📚 Explore the docs »

🐞 Report Bug · 🆕 Request Feature

Table of Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgements

About the project

What's this?

This is a simple but accurate Automated Machine Learning library. Based on SAP HANA powerful in-memory algorithms, it provides high accuracy in multiple machine learning tasks. Our library also uses numerous data preprocessing functions to automate routine data cleaning tasks. So, hana_automl goes through all AutoML steps and makes Data Science work easier.

What is SAP HANA?

From www.sap.com: SAP HANA is a high-performance in-memory database that speeds data-driven, real-time decisions and actions.

Documentation

https://sap-hana-automl.readthedocs.io/en/latest/index.html

Benchmarks

https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb

ML tasks:

  • Binary classification
  • Regression
  • Multiclass classification
  • Forecasting

Steps automated:

  • Data exploration
  • Data preparation
  • Feature engineering
  • Model selection
  • Model training
  • Hyperparameter tuning

👇 By the end of summer 2021, blue part will be fully automated by our library Logo

Clients

  • GUI (Streamlit app)
  • Python library
  • CLI (coming soon)

Streamlit client Streamlit client

Built With

Getting Started

To get a package up and running, follow these simple steps.

Prerequisites

Make sure you have the following:

  1. ✅ Setup SAP HANA (skip this step if you have an instance with PAL enabled). There are 2 ways to do that.
    In HANA Cloud:

    • Create a free trial account
    • Setup an instance
    • Enable PAL - Predictive Analysis Library. It is vital to enable it because we use their algorithms.

    In Virtual Machine:

    • Rent a virtual machine in Azure, AWS, Google Cloud, etc.
    • Install HANA instance there or on your PC (if you have >32 Gb RAM).
    • Enable PAL - Predictive Analysis Library. It is vital to enable it because we use their algorithms.
  2. ✅ Installed software

  • Python > 3.6
    Skip this step if python --version returns > 3.6
  • Cython
    pip3 install Cython
    

Installation

There are 2 ways to install the library

  • Stable: from pypi
    pip3 install hana_automl
    
  • Latest: from the repository
    pip3 install https://github.com/dan0nchik/SAP-HANA-AutoML/archive/dev.zip
    
    Note: latest version may contain bugs, be careful!

After installation

Check that PAL (Predictive Analysis Library) is installed and roles are granted

  • Read docs section about that.
  • If you don't want to read docs, run this code
    from hana_automl.utils.scripts import setup_user
    from hana_ml.dataframe import ConnectionContext
    
    cc = ConnectionContext(address='address', user='user', password='password', port=39015)
    
    # replace with credentials of user that will be created or granted a role to run PAL.
    setup_user(connection_context=cc, username='user', password="password")
    

Usage

From code

Our library in a few lines of code

Connect to database.

from hana_ml.dataframe import ConnectionContext

cc = ConnectionContext(address='address',
                     user='username',
                     password='password',
                     port=1234)

Create AutoML model and fit it.

from hana_automl.automl import AutoML

model = AutoML(cc)
model.fit(
  file_path='path to training dataset', # it may be HANA table/view, or pandas DataFrame
  steps=10, # number of iterations
  target='target', # column to predict
  time_limit=120 # time limit in seconds
)

Predict.

model.predict(
file_path='path to test dataset',
id_column='ID',
verbose=1
)

For more examples, please refer to the Documentation

How to run Streamlit client

  1. Clone repository: git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
  2. Install dependencies: pip3 install -r requirements.txt
  3. Run GUI: streamlit run ./web.py

Roadmap

See the open issues for a list of proposed features (and known issues). Feel free to report any bugs :)

Contributing

Any contributions you make are greatly appreciated 👏!

  1. Fork the Project

  2. Create your Feature Branch (git checkout -b feature/NewFeature)

  3. Install dependencies

    pip3 install Cython
    
    pip3 install -r requirements.txt
    
  4. Create credentials.py file in tests directory Your files should look like this:

    SAP-HANA-AutoML
    │   README.md
    │   all other files   
    │   .....
    |
    └───tests
        │   test files...
        │   credentials.py
    

    Copy and paste this piece of code there and replace it with your credentials:

    host = "host"
    user = "username"
    password = "password"
    port = 39015 # or any port you need
    schema = "your schema"
    

    Don't worry, this file is in .gitignore, so your credentials won't be seen by anyone.

  5. Make some changes

  6. Write tests that cover your code in tests directory

  7. Run tests (under SAP-HANA-AutoML directory)

    pytest
    
  8. Commit your changes (git commit -m 'Add some amazing features')

  9. Push to the branch (git push origin feature/AmazingFeature)

  10. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.
Don't really understand license? Check out the MIT license summary.

Contact

Authors: @While-true-codeanything, @DbusAI, @dan0nchik

Project Link: https://github.com/dan0nchik/SAP-HANA-AutoML

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hana_automl-0.0.5.tar.gz (3.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hana_automl-0.0.5-py3-none-any.whl (53.7 kB view details)

Uploaded Python 3

File details

Details for the file hana_automl-0.0.5.tar.gz.

File metadata

  • Download URL: hana_automl-0.0.5.tar.gz
  • Upload date:
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.25.1

File hashes

Hashes for hana_automl-0.0.5.tar.gz
Algorithm Hash digest
SHA256 a1962ed8ff34bc7b8ae1b5eed51e48697bfe0888937018ab05a770f1f4bc2765
MD5 befcec6e6f89d025814f6405ca0ffa86
BLAKE2b-256 25f2a2d9bc8de96df301b135cd56a430329b716d538bf239a43f30f20d5d1402

See more details on using hashes here.

File details

Details for the file hana_automl-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: hana_automl-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 53.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.25.1

File hashes

Hashes for hana_automl-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 9d65270674154156519710589d53a73d15839f45b4f8510c13d5b56185b428ff
MD5 dd8005468f1daedcbb696f128cc17242
BLAKE2b-256 f4ade21bf2933a5e2a7d31e60b73116af1844f339ca32141f7e79abf81fe6cd9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page