Serving ML models is one of the most complex steps of putting AI/ML into production: you have to assemble all the pieces into a unified system while considering:

  • throughput/latency requirements

  • infrastructure costs

  • data and model access

  • training-serving skew

Because we started this project with production in mind, building on the Hopsworks AI Lakehouse, we can sidestep most of these issues:

  • the query and ranking models are served from the model registry;

  • the customer and H&M article features are read from the feature store, using the offline or online store depending on throughput/latency requirements;

  • all features come from a single source of truth (the feature store), which eliminates training-serving skew.

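To make this concrete, here is a minimal, self-contained sketch of what an online inference request looks like: a low-latency feature lookup, candidate retrieval with the query model, then ranking. All names and data below are hypothetical stand-ins; in the real system, the lookup would go through a Hopsworks feature view (e.g., `feature_view.get_feature_vector(...)`) and the models would be loaded from the model registry.

```python
# Hypothetical sketch of the online inference flow: feature lookup,
# candidate retrieval, then ranking. All names and data are
# illustrative stand-ins, NOT the real Hopsworks/KServe API.

# Stand-in for the online feature store: customer_id -> feature vector.
ONLINE_FEATURES = {
    "c1": {"age": 31, "club_member_status": "ACTIVE"},
}

# Stand-in candidate catalog: article_id -> toy relevance score.
CANDIDATES = {"a1": 0.2, "a2": 0.9, "a3": 0.5}


def get_customer_features(customer_id: str) -> dict:
    """Low-latency lookup, as the online store would provide."""
    return ONLINE_FEATURES[customer_id]


def retrieve_candidates(features: dict, k: int = 2) -> list[str]:
    """Query-model retrieval, reduced here to a top-k over toy scores."""
    return sorted(CANDIDATES, key=CANDIDATES.get, reverse=True)[:k]


def rank(features: dict, articles: list[str]) -> list[str]:
    """Ranking-model stand-in: reorder the candidates by score."""
    return sorted(articles, key=CANDIDATES.get, reverse=True)


def recommend(customer_id: str, k: int = 2) -> list[str]:
    """One end-to-end online request: features -> candidates -> ranking."""
    feats = get_customer_features(customer_id)
    return rank(feats, retrieve_candidates(feats, k))
```

The key design point is the separation of stages: retrieval narrows the catalog cheaply, and the (more expensive) ranking model only scores the short candidate list.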
Estimating infrastructure costs in a PoC is more complicated. Still, we will leverage a Kubernetes cluster managed by Hopsworks, which uses KServe to scale our real-time personalized recommender up and down with traffic.
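As a rough sketch of what that autoscaling looks like, a KServe `InferenceService` delegates scaling to Knative, so replica bounds and the per-replica concurrency target keep costs in check (the name and image below are placeholders):

```yaml
# Hypothetical InferenceService sketch: name and image are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ranking-recommender   # placeholder name
  annotations:
    autoscaling.knative.dev/target: "100"  # concurrent requests per replica
spec:
  predictor:
    minReplicas: 0   # scale to zero when there is no traffic
    maxReplicas: 4   # cap infrastructure cost under load spikes
    containers:
      - name: predictor
        image: registry.example.com/recommender:latest  # placeholder image
```

With `minReplicas: 0`, the deployment costs nothing while idle, which is exactly the property we want for a PoC with unpredictable traffic.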

Thus, in this lesson, you will learn how to:

  • Architect offline and online inference pipelines using MLOps best practices.

  • Implement offline and online pipelines for an H&M real-time personalized recommender.

  • Deploy the online inference pipeline using the KServe engine.

  • Test the H&M personalized recommender from a Streamlit app.

  • Deploy the offline ML pipelines using GitHub Actions.

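For the last bullet, the usual pattern is a cron-scheduled workflow that runs the offline pipelines on a schedule. A hedged sketch (workflow name, schedule, secret name, and module path are all placeholders, not the lesson's actual files):

```yaml
# Hypothetical GitHub Actions workflow; all names and paths are placeholders.
name: offline-pipelines
on:
  schedule:
    - cron: "0 2 * * *"   # nightly run
  workflow_dispatch: {}    # allow manual triggers from the UI
jobs:
  run-pipelines:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Run offline inference pipeline
        env:
          HOPSWORKS_API_KEY: ${{ secrets.HOPSWORKS_API_KEY }}
        run: python -m pipelines.offline_inference  # placeholder module
```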
Enjoy!

Deploy scalable TikTok-like recommenders
Dec 26, 2024, 8:03 AM