Streamlined Recommender System workflows with TensorFlow and Kubeflow
Rexify is a library to streamline recommender systems model development. It is built on top of TensorFlow Recommenders models and Kubeflow Pipelines.
In essence, Rexify adapts dynamically to your data and outputs high-performing TensorFlow models that may be used wherever you want, independently of your data. Rexify also includes modules that handle feature engineering as scikit-learn Transformers and Pipelines.
Installation
For now, you'll have to install Rexify from source:
```shell
pip install git+https://github.com/joseprsm/rexify.git
```
Quick Tour
Rexify is meant to be usable right out of the box. All you need to set up your model is interaction data - something that kind of looks like this:
| user_id | item_id | timestamp  | item_name   | event_type  |
|---------|---------|------------|-------------|-------------|
| 22      | 67      | 2021/05/13 | Blue Jeans  | Purchase    |
| 37      | 9       | 2021/04/11 | White Shirt | Page View   |
| 22      | 473     | 2021/04/11 | Red Purse   | Add to Cart |
| ...     | ...     | ...        | ...         | ...         |
| 358     | 51      | 2021/04/11 | Bracelet    | Purchase    |
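For reference, an interactions table like the one above can be mocked with pandas. The column names below simply follow the sample table; your own data may use different names:

```python
import pandas as pd

# Mock interaction data matching the sample table above.
# Column names follow the example; your dataset may differ.
events = pd.DataFrame({
    "user_id": [22, 37, 22, 358],
    "item_id": [67, 9, 473, 51],
    "timestamp": ["2021/05/13", "2021/04/11", "2021/04/11", "2021/04/11"],
    "item_name": ["Blue Jeans", "White Shirt", "Red Purse", "Bracelet"],
    "event_type": ["Purchase", "Page View", "Add to Cart", "Purchase"],
})
```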
Additionally, you'll need to configure a schema for the data. This schema is what allows Rexify to generate a dynamic model and preprocessing steps. The schema is comprised of three dictionaries: `user`, `item`, and `context`. Each of these dictionaries maps feature names to internal data types, such as `id`, `categorical`, `timestamp`, or `text`. More data types will be available in the future.
```json
{
  "user": {
    "user_id": "id"
  },
  "item": {
    "item_id": "id",
    "timestamp": "timestamp",
    "item_name": "text"
  },
  "context": {
    "event_type": "categorical"
  }
}
```
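Because the schema is plain JSON, a small stand-alone check can catch typos before training. The validator below is not part of Rexify's API — it is just a sketch using only the standard library, based on the three top-level keys and the data types described above:

```python
import json

# Top-level keys and data types mentioned in the docs.
# This is an illustrative check, not part of the Rexify API.
REQUIRED_KEYS = {"user", "item", "context"}
KNOWN_DTYPES = {"id", "categorical", "timestamp", "text"}

def validate_schema(schema: dict) -> None:
    missing = REQUIRED_KEYS - schema.keys()
    if missing:
        raise ValueError(f"schema is missing keys: {sorted(missing)}")
    for section, features in schema.items():
        for feature, dtype in features.items():
            if dtype not in KNOWN_DTYPES:
                raise ValueError(
                    f"unknown data type {dtype!r} for {section}.{feature}"
                )

schema = json.loads("""
{
  "user": {"user_id": "id"},
  "item": {"item_id": "id", "timestamp": "timestamp", "item_name": "text"},
  "context": {"event_type": "categorical"}
}
""")
validate_schema(schema)  # no exception for a well-formed schema
```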
Essentially, Rexify will take the schema and dynamically adapt to the data.
As a package
There are two main components in Rexify workflows: `FeatureExtractor` and `Recommender`.

The `FeatureExtractor` is a scikit-learn Transformer that takes the schema of the data and transforms the event data accordingly. Its `.make_dataset()` method converts the transformed data into a `tf.data.Dataset`, correctly configured to be fed to the `Recommender` model. You can read more about how the `FeatureExtractor` works here.

`Recommender` is a `tfrs.Model` that implements the Query and Candidate towers. During training, the Query tower takes the user ID, user features, and context to learn an embedding; the Candidate tower does the same for the item ID and its features. More information about the `Recommender` model can be found here.
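To make the two-tower idea concrete, here is a minimal NumPy sketch — not Rexify's actual implementation. Each tower ultimately produces an embedding per user or item (here random stand-ins rather than learned values), and a user-item affinity score is the dot product of the two embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 100, 50, 8

# In the real model these embeddings are learned by the two towers;
# random matrices stand in for them in this sketch.
user_embeddings = rng.normal(size=(n_users, dim))   # "Query tower" output
item_embeddings = rng.normal(size=(n_items, dim))   # "Candidate tower" output

def score(user_id: int, item_id: int) -> float:
    # User-item affinity is the dot product of the two embeddings.
    return float(user_embeddings[user_id] @ item_embeddings[item_id])

def top_k(user_id: int, k: int = 5) -> np.ndarray:
    # Retrieval: score every candidate item for one user, rank descending.
    scores = item_embeddings @ user_embeddings[user_id]
    return np.argsort(scores)[::-1][:k]

recommended = top_k(22)
```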
A sample Rexify workflow should sort of look like this:
```python
import json

import pandas as pd

from rexify.features import FeatureExtractor
from rexify.models import Recommender

events = pd.read_csv('path/to/events/data')

with open('path/to/schema') as f:
    schema = json.load(f)

feat = FeatureExtractor(schema)
prep_data = feat.fit_transform(events)
ds = feat.make_dataset(prep_data)

model = Recommender(**feat.model_params)
model.compile()
model.fit(ds)
```
When training is complete, you'll have a trained `tf.keras.Model` ready to be used, as you normally would.
As a prebuilt pipeline
After cloning this project and setting up the necessary environment variables, you can run:
```shell
python -m rexify.pipeline
```
This should output a `pipeline.json` file. You can then upload this file manually to either a Kubeflow Pipelines or Vertex AI Pipelines instance, and it should run seamlessly.
You can also check the Kubeflow Pipelines and Vertex AI documentation to learn how to submit these pipelines programmatically.
The prebuilt pipeline consists of 5 components:
1. `download`, which downloads the event data from the URLs set in the `$INPUT_DATA_URL` and `$SCHEMA_URL` environment variables
2. `load`, which prepares the data downloaded in the previous step
3. `train`, which trains a `Recommender` model on the preprocessed data
4. `index`, which trains a ScaNN model to retrieve the nearest neighbors
5. `retrieval`, which retrieves the nearest k neighbors for each of the known users
Via the demo application
After cloning the project, install the demo dependencies and run the Streamlit application:
```shell
pip install -r demo/requirements.txt
streamlit run demo/app.py
```
Or, if you're using docker:
```shell
docker run joseprsm/rexify-demo
```
You can then follow the steps here to set up your pipeline.
During setup, you'll be asked to either input a publicly available dataset URL or use a sample dataset. After that, you'll have a form to help you set up the schema for the data.
Finally, after hitting "Compile", you'll have your Pipeline Spec ready. The resulting JSON file can then be uploaded seamlessly to Vertex AI Pipelines or Kubeflow.
The key difference between this pipeline and the prebuilt one is that instead of using the `download` component to fetch the schema, it passes the schema as an argument to the pipeline and then uses a `copy` component to pass it down as an artifact.
Who is this for?
Rexify is a project that simplifies and standardizes the workflow of recommender systems. It is mostly geared towards people with little to no machine learning knowledge who want to implement somewhat scalable Recommender Systems in their applications.