No project description provided

These details have not been verified by PyPI

Project links

Homepage

Project description

hermione

Hermione

A Data Science Project struture in cookiecutter style.

Developed with ❤️ by A3Data

What is Hermione?

Hermione is the newest open source library that will help Data Scientists on setting up more organized codes, in a quicker and simpler way. Besides, there are some classes in Hermione which assist with daily tasks such as: column normalization and denormalization, data view, text vectoring, etc. Using Hermione, all you need is to execute a method and the rest is up to her, just like magic.

Why Hermione?

To bring in a little of A3Data experience, we work in Data Science teams inside several client companies and it’s undeniable the excellence of notebooks as a data exploration tool. Nevertheless, when it comes to data science products and their context, when the models needs to be consumed, monitored and have periodic maintenance, putting it into production inside a Jupyter Notebook is not the best choice (we are not even mentioning memory and CPU performance yet). And that’s why Hermione comes in! We have been inspired by this brilliant, empowered and awesome witch of The Harry Potter saga to name this framework!

This is also our way of reinforcing our position that women should be taking more leading roles in the technology field. #CodeLikeAGirl

Installing

Dependencies

Python (>= 3.8)
docker

Hermione does not depend on conda to build and manage virtual environments anymore. It uses venv instead.

Install

pip install -U hermione-ml

Enabling autocompletion (unix users):

For bash:

echo 'eval "$(_HERMIONE_COMPLETE=source_bash hermione)"' >> ~/.bashrc

For Zsh:

echo 'eval "$(_HERMIONE_COMPLETE=source_zsh hermione)"' >> ~/.zshrc

How do I use Hermione?

After installed Hermione:

Create you new project:

hermione project new project_hermione

Hit Enter if you want to start with an example code

Please select one of the following templates 
	(0) starter 
	(1) barebones 
	(2) sagemaker 
Option [0]:

Hermione already creates a virtual environment for the project. For Windows users, activate it with

<project_name>_env\Scripts\activate

For linux and MacOS users, do

source <project_name>_env/bin/activate

After activating, you should install some libraries. There are a few suggestions in “requirements.txt” file:

pip install -r requirements.txt

Now, if you selected the starter version, we will train some models from the example, using MLflow ❤. To do so, inside project directory, just type: hermione train. The “hermione run train” command will search for a train.py file and execute it. In the example, models and metrics are already controlled via MLflow.

After that, a mlflow experiment is created. To verify the experiment in mlflow, type: mlflow ui. The application will go up.

mlflow ui

[2020-10-19 23:23:12 -0300] [15676] [INFO] Starting gunicorn 19.10.0
[2020-10-19 23:23:12 -0300] [15676] [INFO] Listening at: http://127.0.0.1:5000 (15676)
[2020-10-19 23:23:12 -0300] [15676] [INFO] Using worker: sync
[2020-10-19 23:23:12 -0300] [15678] [INFO] Booting worker with pid: 15678

To access the experiment, just enter the path previously provided in your preferred browser. Then it is possible to check the trained models and their metrics.

To make batch predictions using your predict.py file, type hermione run predict. The default implemented version will print some predictions for you in the terminal.

hermione run predict

In the Titanic example, we also provide a step by step notebook. To view it, just type jupyter notebook inside directory notebooks.

If you selected the Sagemaker version, click here to check a tutorial.

Do you want to create your project from scratch? There click here to check a tutorial.

Docker

Hermione comes with a default Dockerfile which implements a FastAPI application that serves your ML model. You should take a look at the api/app.py module and rewrite predict_new() function as you see fit.

Also, in the newest version, hermione brings two CLI commands that helps us abstract a little bit the complexity regarding docker commands. To build an image (remember you should have docker installed), you should be in the project's root directory. Than, do:

hermione run build <IMAGE_NAME>

After you have built you're docker image, run it with:

hermione run container <IMAGE_NAME>

[2020-10-20 02:13:20 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-10-20 02:13:20 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2020-10-20 02:13:20 +0000] [1] [INFO] Using worker: sync
[2020-10-20 02:13:20 +0000] [7] [INFO] Booting worker with pid: 7
[2020-10-20 02:13:20 +0000] [8] [INFO] Booting worker with pid: 8
[2020-10-20 02:13:20 +0000] [16] [INFO] Booting worker with pid: 16

THAT IS IT! You have a live model up and running. To test your API, hermione provides a api/myrequests.py module. This is not part of the project; it's a "ready to go" code to make requests to the API. Help yourself!

cd src/api
python myrequests.py

Sending request for model...
Data: {"Pclass": [3, 2, 1], "Sex": ["male", "female", "male"], "Age": [4, 22, 28]}
Response: "[0.24630952 0.996      0.50678968]"

Play a little with the 'fake' data and see how far can the predictions go.

Documentation

This is the class structure diagram that Hermione relies on:

Here we describe briefly what each class is doing:

Data Source

DataBase - should be used when data recovery requires a connection to a database. Contains methods for opening and closing a connection.
Spreadsheet - should be used when data recovery is in spreadsheets/text files. All aggregation of the bases to generate a "flat table" should be performed in this class.
DataSource - abstract class which DataBase and Spreadsheet inherit from.

Preprocessing

Preprocessing - concentrates all preprocessing steps that must be performed on the data before the model is trained.
Normalization - applies normalization and denormalization to reported columns. This class contains the following normalization algorithms already implemented: StandardScaler e MinMaxScaler.
TextVectorizer - transforms text into vector. Implemented methods: Bag of words, TF_IDF, Embedding: mean, median e indexing.
DataQuality - concentrates all data validation steps that must be performed on the data to ensure its quality.

Visualization

Visualization - methods for data visualization. There are methods to make static and interactive plots.
App Streamlit - streamlit example consuming Titanic dataset, including pandas profilling.

Model

Trainer - module that centralizes training algorithms classes. Algorithms from scikit-learn library, for instance, can be easily used with the TrainerSklearn implemented class.
Wrapper - centralizes the trained model with its metrics. This class has built-in integration with MLFlow.
Metrics - it contains key metrics that are calculated when models are trained. Classification, regression and clustering metrics are already implemented.

Tests

test_project - module for unit testing.

Contributing

Have a look at our contributing guide.

Make a pull request with your implementation.

For suggestions, contact us: hermione@a3data.com.br

Licence

Hermione is open source and has Apache 2.0 License:

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.10.0

Apr 18, 2022

0.9.2

Sep 14, 2021

0.9.1

Sep 13, 2021

0.9.0

Sep 13, 2021

0.8.0

Jul 29, 2021

0.7.1

Jul 12, 2021

0.6.0

May 27, 2021

0.5.4

May 18, 2021

0.5.3

May 18, 2021

0.5.2

Apr 22, 2021

0.5.1

Apr 14, 2021

0.5.0

Apr 12, 2021

0.4.0

Apr 8, 2021

0.3.4

Feb 25, 2021

0.3.2

Nov 24, 2020

0.3.1

Nov 12, 2020

0.3

Oct 20, 2020

0.2

Aug 13, 2020

0.1.5

Jun 23, 2020

0.1.4

Jun 4, 2020

0.1.3

Jun 4, 2020

0.1.2

Jun 1, 2020

0.1.1

Jun 1, 2020

0.1

Jun 1, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hermione-ml-0.10.0.tar.gz (58.3 kB view details)

Uploaded Apr 18, 2022 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

hermione_ml-0.10.0-py3-none-any.whl (451.6 kB view details)

Uploaded Apr 18, 2022 Python 3

File details

Details for the file hermione-ml-0.10.0.tar.gz.

File metadata

Download URL: hermione-ml-0.10.0.tar.gz
Upload date: Apr 18, 2022
Size: 58.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for hermione-ml-0.10.0.tar.gz
Algorithm	Hash digest
SHA256	`8062dfe18388875562a9c0d5d416dbe76b4d8a27dcff3928edb5448789b954a7`
MD5	`b28cd7548347315feed2473497fa65a4`
BLAKE2b-256	`8c74359dff472ee1881143cb114266106a76b74bfa502c884e12ab04618247fb`

See more details on using hashes here.

File details

Details for the file hermione_ml-0.10.0-py3-none-any.whl.

File metadata

Download URL: hermione_ml-0.10.0-py3-none-any.whl
Upload date: Apr 18, 2022
Size: 451.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for hermione_ml-0.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`552a6e4a94404d6c524008df3ec2708ac0bbfea346d65114a381cf55f2255c2b`
MD5	`1bf3aabd41845e416d1cf93a3d4dcf67`
BLAKE2b-256	`47b54907dc010feaee28f4a4f7881d16f250a3a3bf232604552e12ee16a4643e`

See more details on using hashes here.

hermione-ml 0.10.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

What is Hermione?

Why Hermione?

Installing

Dependencies

Install

Enabling autocompletion (unix users):

How do I use Hermione?

Docker

Documentation

Data Source

Preprocessing

Visualization

Model

Tests

Contributing

Licence

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes