An end-to-end machine learning pipeline that trains an ML model and deploys it to a real-time inference endpoint

Project description

personalization


An end-to-end demo machine learning pipeline to provide an artifact for a real-time inference service

Requirements: we want to create a machine learning pipeline that satisfies the following properties:

1. Multiple Models Support: The code should support a
wide range of machine learning algorithms, such as
linear regression, decision trees, random forests,
and deep learning models, to meet diverse business requirements.
2. Configurability: The API should be highly configurable to
allow users to customize
the machine learning models to their specific use cases.
This may include hyperparameter tuning, feature selection, and feature engineering.
3. Flexibility: The API should be flexible enough to handle a wide range of data formats,
such as CSV, JSON, and Parquet. It should also support various
deployment environments, such as on-premises, cloud-based, and hybrid environments.
4. Scalability: The API should be designed with scalability in mind,
meaning it can handle large volumes of data, high request rates, and multiple concurrent users.
This may involve incorporating distributed computing
and parallel processing techniques to handle the workload.
5. Versioning: The pipeline should support model versioning with MLflow.
6. Documentation: The API should be accompanied by comprehensive documentation,
including user manuals, API reference guides, and developer documentation.
This will make it easier for users to learn
how to use the API and integrate it into their applications.
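
As an illustration of the Flexibility requirement above, here is a minimal sketch of loading CSV, JSON, and Parquet inputs through one entry point. It is not taken from the personalization package: the names READERS, register, and load_records are hypothetical, and the Parquet reader assumes pandas with a Parquet engine is installed.

```python
import csv
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry (not part of the package): file suffix -> reader.
READERS: Dict[str, Callable[[Path], List[dict]]] = {}


def register(suffix: str):
    """Decorator that registers a reader for a file suffix such as '.csv'."""
    def wrap(func):
        READERS[suffix] = func
        return func
    return wrap


@register(".csv")
def read_csv(path: Path) -> List[dict]:
    # Each row becomes a dict keyed by the header line; values stay strings.
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


@register(".json")
def read_json(path: Path) -> List[dict]:
    # Assumes the file holds a JSON array of objects.
    with path.open() as handle:
        return json.load(handle)


@register(".parquet")
def read_parquet(path: Path) -> List[dict]:
    # Assumption: pandas and a Parquet engine are available. The import is
    # lazy so the CSV and JSON readers work without them.
    import pandas as pd
    return pd.read_parquet(path).to_dict(orient="records")


def load_records(path: str) -> List[dict]:
    """Pick a reader from the file suffix and return a list of row dicts."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported data format: {suffix!r}")
    return READERS[suffix](file_path)
```

With this layout, supporting another format means registering one more reader function; callers keep using load_records("sessions.csv") unchanged.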

How to run

  1. Clone the repository: git clone git@github.com:ra312/personalization.git && cd personalization
  2. Obtain sessions.csv and venues.csv and move them to the root folder.
  3. Install Poetry (Linux, macOS):
     curl -sSL https://install.python-poetry.org | python3 - --version 1.3.2

How to train the pipeline and produce the model artifact: run the following command in bash

python3 -m personalization \
    --sessions-bucket-path sessions.csv \
    --venues-bucket-path venues.csv \
    --objective lambdarank \
    --num_leaves 100 \
    --min_sum_hessian_in_leaf 10 \
    --metric ndcg --ndcg_eval_at 10 20 \
    --learning_rate 0.8 \
    --force_row_wise True \
    --num_iterations 10 \
    --trained-model-path trained_model.joblib
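
The command above mixes hyphenated path flags with underscore-style training flags. The training flags share their names with LightGBM parameters. The sketch below shows one way such a command line could be parsed and split into file paths and a training parameter dictionary. It is an assumption for illustration, not the package's actual parser: build_parser, split_args, str_to_bool, and PATH_KEYS are hypothetical names.

```python
import argparse
from typing import Dict, Tuple


def str_to_bool(value: str) -> bool:
    # argparse's type=bool treats any non-empty string as True,
    # so "True"/"False" are parsed explicitly here.
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(prog="personalization")
    # Input and output locations.
    parser.add_argument("--sessions-bucket-path", required=True)
    parser.add_argument("--venues-bucket-path", required=True)
    parser.add_argument("--trained-model-path", required=True)
    # Training parameters, named as in the command.
    parser.add_argument("--objective")
    parser.add_argument("--num_leaves", type=int)
    parser.add_argument("--min_sum_hessian_in_leaf", type=float)
    parser.add_argument("--metric")
    parser.add_argument("--ndcg_eval_at", type=int, nargs="+")
    parser.add_argument("--learning_rate", type=float)
    parser.add_argument("--force_row_wise", type=str_to_bool)
    parser.add_argument("--num_iterations", type=int)
    return parser


# argparse turns hyphens in long option names into underscores.
PATH_KEYS = {"sessions_bucket_path", "venues_bucket_path", "trained_model_path"}


def split_args(namespace: argparse.Namespace) -> Tuple[Dict, Dict]:
    """Separate file paths from training parameters, dropping unset ones."""
    values = vars(namespace)
    paths = {key: values[key] for key in PATH_KEYS}
    params = {
        key: value
        for key, value in values.items()
        if key not in PATH_KEYS and value is not None
    }
    return paths, params
```

Keeping the training parameters in a plain dictionary means they can be handed to a training routine as-is, while the paths stay separate for data loading and artifact export.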



personalization: an endpoint service that provides real-time personalization

Source Distributions

No source distribution files are available for this release.

Built Distribution

personalization-0.0.1-py3-none-any.whl (13.0 kB view hashes)

Uploaded Python 3
