Welcome to hana_automl - Automated Machine Learning library based on SAP HANA.
Project description
Simple but powerful Automated Machine Learning library for tabular data. It uses efficient in-memory SAP HANA algorithms to automate routine Data Science tasks.
About the project
What's this?
This is a simple but accurate Automated Machine Learning library. Built on SAP HANA's powerful in-memory algorithms, it provides high accuracy across multiple machine learning tasks. It also uses numerous data preprocessing functions to automate routine data cleaning. In short, hana_automl goes through all AutoML steps and makes Data Science work easier.
What is SAP HANA?
From www.sap.com: SAP HANA is a high-performance in-memory database that speeds data-driven, real-time decisions and actions.
Documentation
https://sap-hana-automl.readthedocs.io/en/latest/index.html
Benchmarks
https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb
ML tasks:
- Binary classification
- Regression
- Multiclass classification
- Forecasting
Steps automated:
- Data exploration
- Data preparation
- Feature engineering
- Model selection
- Model training
- Hyperparameter tuning
👇 By the end of summer 2021, the blue part will be fully automated by our library.
Clients
- GUI (Streamlit app)
- Python library
- CLI (coming soon)
Getting Started
To get the package up and running, follow these simple steps.
Prerequisites
Make sure you have the following:
- ✅ Set up SAP HANA (skip this step if you have an instance with PAL enabled). There are 2 ways to do that.
  In HANA Cloud:
  - Create a free trial account
  - Set up an instance
  - Enable PAL (Predictive Analysis Library). It is vital to enable it because the library uses its algorithms.
  In Virtual Machine:
- ✅ Installed software
  - Python > 3.6
    Skip this step if python --version returns > 3.6
  - Cython
    pip3 install Cython
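The software prerequisites above can be checked from Python itself. The following is a minimal sketch, not part of hana_automl: the helper names `python_is_supported` and `module_is_installed` are hypothetical, and the check reads "Python > 3.6" literally, so a 3.6.x interpreter is treated as too old (an assumption about what the requirement means).

```python
import importlib.util
import sys


def python_is_supported(version_info=sys.version_info):
    """Return True when the major.minor version is strictly newer than 3.6."""
    return tuple(version_info[:2]) > (3, 6)


def module_is_installed(name):
    """Return True when the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python > 3.6:", python_is_supported())
    print("Cython installed:", module_is_installed("Cython"))
```

If the second line prints False, running pip3 install Cython as described above should fix it.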
Installation
There are 2 ways to install the library:
- Stable: from PyPI
  pip3 install hana_automl
- Latest: from the repository
  pip3 install https://github.com/dan0nchik/SAP-HANA-AutoML/archive/dev.zip
Note: the latest version may contain bugs, so be careful!
After installation
Check that PAL (Predictive Analysis Library) is installed and roles are granted
- Read docs section about that.
- If you don't want to read docs, run this code
from hana_automl.utils.scripts import setup_user
from hana_ml.dataframe import ConnectionContext

cc = ConnectionContext(address='address', user='user', password='password', port=39015)

# replace with credentials of user that will be created or granted a role to run PAL.
setup_user(connection_context=cc, username='user', password="password")
Usage
From code
Our library in a few lines of code
Connect to database.
from hana_ml.dataframe import ConnectionContext
cc = ConnectionContext(address='address',
                       user='username',
                       password='password',
                       port=1234)
Create AutoML model and fit it.
from hana_automl.automl import AutoML
model = AutoML(cc)
model.fit(
    file_path='path to training dataset',  # it may be HANA table/view, or pandas DataFrame
    steps=10,  # number of iterations
    target='target',  # column to predict
    time_limit=120  # time limit in seconds
)
Predict.
model.predict(
    file_path='path to test dataset',
    id_column='ID',
    verbose=1
)
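The fit call above takes both a number of iterations (steps) and a time limit in seconds (time_limit). The sketch below illustrates one plausible way two such budgets can interact: the search stops at whichever limit is reached first. This is an assumption for illustration only, not the library's actual implementation; `budgeted_search` and the toy objective are hypothetical.

```python
import random
import time


def budgeted_search(evaluate, sample, steps, time_limit, clock=time.monotonic):
    """Try up to `steps` candidates, stopping early once `time_limit` seconds pass.

    Returns (best_candidate, best_score, number_of_candidates_tried).
    """
    start = clock()
    best_candidate, best_score, tried = None, None, 0
    for _ in range(steps):
        # The time budget is checked before each new candidate is evaluated.
        if clock() - start >= time_limit:
            break
        candidate = sample()
        score = evaluate(candidate)
        tried += 1
        if best_score is None or score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score, tried


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy objective with its maximum at x = 3; a stand-in for a validation score.
    best, score, tried = budgeted_search(
        evaluate=lambda x: -(x - 3.0) ** 2,
        sample=lambda: rng.uniform(0.0, 10.0),
        steps=10,
        time_limit=120,
    )
    print(f"tried {tried} candidates, best x = {best:.3f}, score = {score:.3f}")
```

Under this reading, raising steps only helps while the time limit has not been hit, and vice versa.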
For more examples, please refer to the Documentation
How to run Streamlit client
- Clone repository:
  git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
- Install dependencies:
  pip3 install -r requirements.txt
- Run GUI:
  streamlit run ./web.py
Roadmap
See the open issues for a list of proposed features (and known issues). Feel free to report any bugs :)
Contributing
Any contributions you make are greatly appreciated 👏!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/NewFeature)
- Install dependencies
  pip3 install Cython
  pip3 install -r requirements.txt
- Create a credentials.py file in the tests directory. Your files should look like this:
  SAP-HANA-AutoML
  │   README.md
  │   all other files
  │   .....
  │
  └───tests
      │   test files...
      │   credentials.py
  Copy and paste this piece of code there and replace it with your credentials:
  host = "host"
  user = "username"
  password = "password"
  port = 39015  # or any port you need
  schema = "your schema"
  Don't worry, this file is in .gitignore, so your credentials won't be seen by anyone.
- Make some changes
- Write tests that cover your code in the tests directory
- Run tests (under the SAP-HANA-AutoML directory)
  pytest
- Commit your changes (git commit -m 'Add some amazing features')
- Push to the branch (git push origin feature/NewFeature)
- Open a Pull Request
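As an illustration of how the values in credentials.py might be used by a test, here is a minimal sketch that maps them onto the ConnectionContext keyword arguments shown in the Usage section (address, user, password, port). The helper `connection_kwargs` is hypothetical and is not claimed to exist in the repository; a SimpleNamespace stands in for the imported credentials module so the example runs without a database.

```python
from types import SimpleNamespace


def connection_kwargs(credentials):
    """Map credentials.py-style attributes to ConnectionContext keyword arguments."""
    return {
        "address": credentials.host,
        "user": credentials.user,
        "password": credentials.password,
        "port": int(credentials.port),
    }


if __name__ == "__main__":
    # Stand-in for `import credentials`; the values are the placeholders from the list above.
    credentials = SimpleNamespace(
        host="host", user="username", password="password", port=39015, schema="your schema"
    )
    kwargs = connection_kwargs(credentials)
    # With hana_ml installed, these could be passed on as ConnectionContext(**kwargs).
    print({**kwargs, "password": "***"})
```

The schema value is left out of the mapping because the connection examples in this README do not pass it.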
License
Distributed under the MIT License. See LICENSE for more information.
Don't really understand the license? Check out the MIT license summary.
Contact
Authors: @While-true-codeanything, @DbusAI, @dan0nchik
Project Link: https://github.com/dan0nchik/SAP-HANA-AutoML
File details
Details for the file hana_automl-0.0.5.tar.gz.
File metadata
- Download URL: hana_automl-0.0.5.tar.gz
- Upload date:
- Size: 3.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.25.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1962ed8ff34bc7b8ae1b5eed51e48697bfe0888937018ab05a770f1f4bc2765 |
| MD5 | befcec6e6f89d025814f6405ca0ffa86 |
| BLAKE2b-256 | 25f2a2d9bc8de96df301b135cd56a430329b716d538bf239a43f30f20d5d1402 |
File details
Details for the file hana_automl-0.0.5-py3-none-any.whl.
File metadata
- Download URL: hana_automl-0.0.5-py3-none-any.whl
- Upload date:
- Size: 53.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.25.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d65270674154156519710589d53a73d15839f45b4f8510c13d5b56185b428ff |
| MD5 | dd8005468f1daedcbb696f128cc17242 |
| BLAKE2b-256 | f4ade21bf2933a5e2a7d31e60b73116af1844f339ca32141f7e79abf81fe6cd9 |
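The SHA256 digests listed for the release files can be compared against a downloaded file. The sketch below shows one way to do that with Python's standard hashlib module; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example usage (assumes the source archive was downloaded next to this script):
# matches_published_hash(
#     "hana_automl-0.0.5.tar.gz",
#     "a1962ed8ff34bc7b8ae1b5eed51e48697bfe0888937018ab05a770f1f4bc2765",
# )
```

A False result would suggest the download is incomplete or is not the file the digest was published for.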