Implementing Canary Releases of TensorFlow Model Deployments with Kubernetes and Istio

search share Unirse Acceder

Implementing Canary Releases of TensorFlow Model Deployments with Kubernetes and Istio

2 horas 7 créditos


Google Cloud Self-Paced Labs


Istio is an open source framework for connecting, securing, and managing microservices, including services running on Kubernetes Engine. It lets you create a mesh of deployed services with load balancing, service-to-service authentication, monitoring, and more, without requiring any changes in service code.

This lab shows you how to use Istio on Google Kubernetes Engine (GKE) and TensorFlow Serving to create canary deployments of TensorFlow machine learning models.


In this lab, you will learn how to:

  • Prepare a GKE cluster with the Istio add-on for TensorFlow Serving.

  • Create a canary release of a TensorFlow model deployment.

  • Configure various traffic splitting strategies.


To successfully complete the lab you need to have a solid understanding of how to save and load TensorFlow models and a basic familiarity with Kubernetes and Istio concepts and architecture. Before proceeding with the lab we recommend reviewing the following resources:

Lab scenario

In the lab, you will walk through a canary deployment of two versions of the ResNet model. The idea behind a canary deployment is to introduce a new version of a service (model deployment) by first testing it using a small percentage of user traffic, and then if the new model meets the set requirements, redirect, possibly gradually in increments, the traffic from the old version to the new one.

In its simplest form, the traffic sent to the canary version is a randomly selected percentage of requests send to a common endpoint that exposes both models. The more sophisticated traffic splitting schemas can also be used. For example, the traffic can be split based on the orginating region, user or user group, or other properties of the request. When the traffic is split based on well defined groups of originators, the canary deployment can be used as a foundation of A/B testing.

You will use TensorFlow Serving to deploy two versions of ResNet: ResNet50 and ResNet101. Both models expose the same interface (inputs and outputs). ResNet50 will be a simulated production model. ResNet101 will be a new, canary release.

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data. TensorFlow Serving can be run in a docker container and deployed and managed by Kubernetes. In the lab, you will deploy TensorFlow Serving as a Kubernetes Deployment on Google Cloud Kubernetes Engine (GKE).

Istio will be used to configure transparent traffic splitting between both deployments. Both models will be exposed through the same external endpoint. You will use Istio's traffic management features to experiment with various traffic splitting strategies.

Summary of the tasks performed during the lab:

  • Creating a GKE cluster with Istio add-on

  • Deploying ResNet models using TensorFlow Serving

  • Configuring Istio Ingress gateway

  • Configuring Istio virtual services and destination rules

  • Configuring weight based routing

  • Configuring content based routing

Únase a Qwiklabs para leer este lab completo… y mucho más.

  • Obtenga acceso temporal a Google Cloud Console.
  • Más de 200 labs para principiantes y niveles avanzados.
  • El contenido se presenta de a poco para que pueda aprender a su propio ritmo.
Únase para comenzar este lab