
aws-ml-inference-demo

We aim to build a product that can truly teach languages to our learners. To do this we need to provide a personalized experience that suits each learner's needs. As part of this personalized learning, imagine that we want to prompt learners (within the app or via notifications) with a personalized message encouraging them to review the material they have learned recently. The message is personalized in the sense that it recommends whichever of the Listening, Speaking, Writing, or Flashcards exercise types is most likely to be attempted by the learner. For the sake of this exercise, assume the content and every other aspect of the messages are fixed and static.

API and Architecture Design:

For our service, we need to expose two endpoints, one for training and one for inference. An incoming inference request will carry a learner identifier (UUID), and the service will respond with one of four possible values: "listening", "speaking", "writing", or "flashcard".
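For illustration, a request/response pair for the inference endpoint could look as follows (the field names are a sketch, not a fixed contract):

```python
# Illustrative payloads for POST /inference; the field names are assumptions, not a fixed contract.
example_request = {"learner_uuid": "9f3c2a1e-7b4d-4e2a-9c1d-5a6b7c8d9e0f"}
example_response = {"recommendation": "speaking"}  # one of: "listening", "speaking", "writing", "flashcard"
```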

Scale of the Service: There are around 1 million daily active users, concentrated mainly in North America and Europe. This number may grow, since we see 100k new app downloads every day.

Assumptions on the regions and user base distribution

There are 8 AWS regions with 24 availability zones in Europe and 7 regions in North America. We assume the number of learners elsewhere is negligible.

With 100k downloads per day, the number of daily active users can rise to 15 million within 3-4 months if new users start learning after download. So the maximum (unlikely to be reached) is 1 million users in every AWS geographical region. To keep network overhead low, we aim to deploy an AWS SageMaker endpoint in each of the 15 regions.

Size of the user records table

We want to keep network overhead costs low, so that we can increase the payload of the messages, or their granularity, in the future. We expect each user to have at most 2 lessons (of one hour maximum duration) per day, with at most 6 exercises per lesson. So each user requires at most 12 recommendations per day. That makes up to 12 million recommendations per region, which may be concentrated in a 2-hour window during the day.
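A quick back-of-the-envelope check of this volume, under the worst-case assumptions above:

```python
# Back-of-the-envelope volume check (worst-case assumptions from the text above).
users_per_region = 1_000_000        # assumed maximum of active learners per AWS region
lessons_per_day = 2                 # at most 2 one-hour lessons per learner per day
exercises_per_lesson = 6            # at most 6 exercises per lesson

recs_per_user = lessons_per_day * exercises_per_lesson   # 12 recommendations per learner per day
recs_per_region = users_per_region * recs_per_user       # 12,000,000 recommendations per region
print(recs_per_user, recs_per_region)
```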

Endpoints

With these assumptions, we deploy the AWS SageMaker endpoints and the accompanying Lambda functions into each of the 15 global regions. Traffic between availability zones is rerouted automatically to the Step Function.

We will have two API endpoints:

POST /inference: This endpoint will accept a learner UUID in the request body and respond with a recommendation for an exercise type based on the Thompson Sampling algorithm.

POST /training: This endpoint will accept a learner UUID and the learner's response in the request body. The Lambda function will use the response to update the underlying models in the Thompson Sampling algorithm.
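A minimal sketch of the two Lambda handlers is given below. The in-memory priors store and the field names are placeholders for illustration; in the deployed service the priors would be loaded from the S3 locations described in the syncing section below.

```python
# Minimal sketch of the two Lambda handlers; the in-memory priors store and the
# field names are placeholders, not part of the actual service.
import json
import random

ARMS = ("listening", "speaking", "writing", "flashcard")
# Beta(1, 1) priors stand in here; the real priors are synced from S3 (see below).
PRIORS = {arm: {"alpha": 1.0, "beta": 1.0} for arm in ARMS}


def inference_handler(event, context):
    """POST /inference: choose an exercise type via Thompson Sampling."""
    body = json.loads(event["body"])
    learner_uuid = body["learner_uuid"]  # illustrative field name
    samples = {arm: random.betavariate(p["alpha"], p["beta"]) for arm, p in PRIORS.items()}
    recommendation = max(samples, key=samples.get)
    return {"statusCode": 200,
            "body": json.dumps({"learner_uuid": learner_uuid, "recommendation": recommendation})}


def training_handler(event, context):
    """POST /training: update the Beta prior of the recommended arm with the observed response."""
    body = json.loads(event["body"])
    arm = body["exercise_type"]
    completed = bool(body["completed"])  # did the learner finish the recommended exercise?
    PRIORS[arm]["alpha"] += 1 if completed else 0
    PRIORS[arm]["beta"] += 0 if completed else 1
    return {"statusCode": 200, "body": json.dumps(PRIORS[arm])}
```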

About training

Training in batches results in the same sequence of prior parameter values as event-based training on new user interactions with the app in the region. There are many ways to distribute training, but the methodology is still in the research phase [1]. Therefore, we adopt on-event training for the bandits, which simply amounts to updating the priors of the recommended arm. We do not use the updated priors in the production endpoint immediately; they are synced daily as described below.
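The claim that batch and on-event updates lead to the same prior parameters can be checked directly for the Beta-Bernoulli model (a sketch with made-up rewards):

```python
# Batch vs. on-event updates give the same Beta posterior (the order of updates does not matter).
rewards = [1, 0, 1, 1, 0, 1]          # made-up completion outcomes for one arm

# On-event: update after every interaction.
alpha, beta = 1.0, 1.0
for r in rewards:
    alpha += r
    beta += 1 - r

# Batch: one update over the whole day's interactions.
alpha_batch = 1.0 + sum(rewards)
beta_batch = 1.0 + len(rewards) - sum(rewards)

assert (alpha, beta) == (alpha_batch, beta_batch)   # Beta(5, 3) either way
```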

API design: Syncing model updates

Every day, we update the model with the latest priors for the day.

The structure of the prior files: "s3://personalization-service/priors/region=us-east2/timestamp=2022-01-01T08:04:15.1256/priors.json"

The structure of the user experience data used for updating the priors: "s3://personalization-service/experience/region=us-east2/timestamp=2022-01-01T08:04:15.1256/data.parquet"

Each regional AWS SageMaker instance selects the latest (by timestamp) priors for its region and re-runs the cloud build job to update the endpoint.
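A sketch of how a regional instance could pick the latest priors from the bucket layout above (boto3; the build/redeploy step is omitted):

```python
# Sketch: pick the newest priors file for a region from the bucket layout described above.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "personalization-service"


def latest_priors(region: str) -> dict:
    prefix = f"priors/region={region}/"
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix)
    keys = [obj["Key"] for page in pages for obj in page.get("Contents", [])]
    # The timestamp=... path component is ISO-formatted, so lexicographic order matches chronological order.
    newest = max(keys)
    body = s3.get_object(Bucket=BUCKET, Key=newest)["Body"].read()
    return json.loads(body)
```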

Storage size per region

Each request/response is only a few bytes (about 10 bytes), so the overall data exchanged over the network is roughly 0.12 GB per day. We record around 10 columns of user activity in the app, which makes about 40 bytes per record. Each exercise completion generates a record in an SQL table, so we expect 40 bytes * 12 million recommendation requests ≈ 0.48 GB per day of new data. We might store the last 30 days of user interaction records, which amounts to roughly 14 GB per region.
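The same estimate as a small script (numbers taken from the assumptions above):

```python
# Storage and network estimate per region (numbers from the assumptions above).
bytes_per_request = 10
bytes_per_record = 40            # ~10 recorded columns of user activity
requests_per_day = 12_000_000
retention_days = 30

network_gb_per_day = bytes_per_request * requests_per_day / 1e9   # ~0.12 GB
new_data_gb_per_day = bytes_per_record * requests_per_day / 1e9   # ~0.48 GB
stored_gb = new_data_gb_per_day * retention_days                  # ~14.4 GB per region
print(network_gb_per_day, new_data_gb_per_day, stored_gb)
```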

Ingesting user experience data

  1. As soon as the learner has studied 1/4, 2/4, 3/4, or 4/4 of the lesson, an AWS Lambda function is triggered and writes an event to the Kinesis Data Stream; a possible events.json is attached (see the sketch after this list).
  2. The Kinesis Data Stream feeds a Firehose delivery stream, which stores the file in an S3 bucket.
  3. As soon as the file is uploaded to the bucket, another Lambda is triggered and calls the Glue job.
  4. As soon as the file is uploaded to the bucket, we also trigger the "training" endpoint to produce new priors.
  5. The AWS Glue job runs and converts the JSON file to the Parquet file format.
  6. The Glue job places this new file into the destination bucket.
  7. Snowpipe is configured on the destination bucket and triggers once a file is uploaded to it.
  8. The data gets ingested into Snowflake.
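For step 1, a minimal sketch of the progress-event producer is shown below. The stream name and the event fields are assumptions; the actual schema is the attached events.json.

```python
# Sketch of step 1: a Lambda that emits a lesson-progress event to Kinesis Data Streams.
# The stream name and event fields are assumptions; see the attached events.json for the real schema.
import json
import datetime
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "lesson-progress-events"   # hypothetical stream name


def progress_handler(event, context):
    record = {
        "learner_uuid": event["learner_uuid"],
        "lesson_id": event["lesson_id"],
        "progress": event["progress"],   # 0.25, 0.5, 0.75 or 1.0
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=record["learner_uuid"],   # spread learners across shards
    )
    return {"statusCode": 200}
```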

Modelling Problem:

We want to model the probability that the suggested exercise type is useful for the user, i.e., that he/she will start and finish the exercise. This probability is represented by a Beta-Bernoulli model. We model the relevancy of the exercises as a multi-armed bandit, with one arm per suggested exercise type, and the probability of completion of each of the four exercise types is represented by the Beta-Bernoulli model. In other words, each user UUID (plus a possibly attached vector of features) is classified with one of four labels: "listening", "speaking", "writing", or "flashcard", where the label indicates that an exercise of this type will be completed by the user. The reward is modelled by the Beta-Bernoulli model.
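To make the bandit concrete, a small self-contained simulation is sketched below; the "true" completion probabilities are made up purely for illustration.

```python
# Self-contained Thompson Sampling simulation for the four-armed Beta-Bernoulli bandit.
# The "true" completion probabilities below are made up purely for illustration.
import random

TRUE_P = {"listening": 0.30, "speaking": 0.20, "writing": 0.25, "flashcard": 0.45}
priors = {arm: {"alpha": 1.0, "beta": 1.0} for arm in TRUE_P}   # uniform Beta(1, 1) priors

for _ in range(5_000):
    # Sample a completion probability per arm from its Beta posterior and pick the best.
    samples = {arm: random.betavariate(p["alpha"], p["beta"]) for arm, p in priors.items()}
    arm = max(samples, key=samples.get)
    # Bernoulli reward: did the (simulated) learner complete the suggested exercise?
    reward = 1 if random.random() < TRUE_P[arm] else 0
    priors[arm]["alpha"] += reward
    priors[arm]["beta"] += 1 - reward

# After enough interactions, the best arm ("flashcard" here) should collect most of the pulls.
print({arm: round(p["alpha"] + p["beta"] - 2) for arm, p in priors.items()})
```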

Billing

Snowflake instance

12 million requests per day per region at 0.2 USD per 1M requests: 0.2 * 12 = 2.4 USD per day per region.

The Snowflake instance incurs costs for storage of up to 14 GB per region and for on-demand compute (running SQL queries against the data). We do not plan to run SQL queries every day.
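Putting the per-region numbers together across the 15 regions assumed earlier (a rough estimate, not a quote):

```python
# Rough daily request-cost estimate across regions (numbers from the sections above).
regions = 15
requests_per_region_per_day = 12_000_000
price_per_million_requests = 0.20   # USD

request_cost_per_region = requests_per_region_per_day / 1e6 * price_per_million_requests  # 2.4 USD
print(request_cost_per_region, request_cost_per_region * regions)   # 2.4 USD/region, 36 USD globally
```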

Bibliography

  1. https://arxiv.org/pdf/2105.10590.pdf
