Introduction to Kubeflow on Google Kubernetes Engine


1 hour 20 minutes 7 Credits


Google Cloud Self-Paced Labs



As datasets continue to expand and models become more complex, distributing machine learning (ML) workloads across multiple nodes has become more attractive. Unfortunately, breaking up and distributing a workload can add both computational overhead and a great deal more complexity to the system. Data scientists should be able to focus on ML problems, not DevOps.

Fortunately, distributed workloads are becoming easier to manage, thanks to Kubernetes. Kubernetes is a mature, production-ready platform that gives developers a simple API to deploy programs to a cluster of machines as if they were a single piece of hardware. Using Kubernetes, computational resources can be added or removed as desired, and the same cluster can be used for both training and serving ML models.

This lab is an introduction to Kubeflow, an open-source project that aims to make running ML workloads on Kubernetes simple, portable, and scalable. Kubeflow installs a set of components on your cluster to help with a variety of tasks, including training and serving models and running Jupyter Notebooks. It also extends the Kubernetes API by adding new Custom Resource Definitions (CRDs) to your cluster, so machine learning workloads can be treated as first-class citizens by Kubernetes.
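As a rough illustration of what a CRD-backed ML workload looks like, the sketch below builds a minimal TFJob custom resource in Python and prints it as JSON, which `kubectl apply -f -` accepts just like YAML. It assumes the `kubeflow.org/v1` TFJob API of the Kubeflow training operator; the job name, namespace, and image path (`gcr.io/PROJECT_ID/mnist-train:latest`) are hypothetical placeholders, and the lab itself may use a different TFJob version or manifest layout.

```python
import json


def make_tfjob(name: str, image: str, namespace: str = "kubeflow") -> dict:
    """Build a minimal TFJob custom resource.

    Assumes the kubeflow.org/v1 TFJob schema; the default namespace
    is a hypothetical choice, not taken from the lab.
    """
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "tfReplicaSpecs": {
                # One CPU-only worker: no GPU resources are requested.
                "Worker": {
                    "replicas": 1,
                    "restartPolicy": "OnFailure",
                    "template": {
                        "spec": {
                            "containers": [
                                {
                                    # The training operator looks for a container named "tensorflow".
                                    "name": "tensorflow",
                                    "image": image,
                                }
                            ]
                        }
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical image path; substitute your own project ID and tag.
    job = make_tfjob("mnist-train", "gcr.io/PROJECT_ID/mnist-train:latest")
    # Pipe the output to `kubectl apply -f -` to submit it to the cluster.
    print(json.dumps(job, indent=2))
```

Because the TFJob is an ordinary Kubernetes object, it can be created, listed, and deleted with the same tooling as built-in resources such as Deployments or Pods.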

What You'll Build


In this lab you will train and serve a TensorFlow model, and then deploy a web interface to allow users to interact with the model over the public internet. You will build a classic handwritten digit recognizer using the MNIST dataset.

The purpose of this lab is to give a brief overview of how to interact with Kubeflow. To keep things simple, the lab uses CPU-only training on a single node. Kubeflow's user guide has more information when you are ready to explore further.


What You'll Learn

  • How to set up Kubeflow on a Kubernetes Engine cluster

  • How to package a TensorFlow program in a container and upload it to Google Container Registry

  • How to submit a tf-train job and save the resulting model to Google Cloud Storage

  • How to serve and interact with a trained model
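To make "interact with a trained model" concrete, the sketch below builds (but does not send) a prediction request for a served MNIST model. It assumes the model is exposed through TensorFlow Serving's REST API on its default port 8501 and that the model accepts a flattened 28x28 image of 784 floats; the host, model name, and input shape are assumptions, not details taken from the lab, which may serve the model differently (for example over gRPC).

```python
import json
import urllib.request


def build_predict_request(
    host: str, model_name: str, image: list, port: int = 8501
) -> urllib.request.Request:
    """Build a TensorFlow Serving REST predict request for one MNIST image.

    Assumes the model's input is a flattened 28x28 image (784 values).
    """
    if len(image) != 28 * 28:
        raise ValueError("expected a flattened 28x28 MNIST image (784 values)")
    # TensorFlow Serving REST endpoint: /v1/models/<name>:predict
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # "Row" format: the "instances" list holds one entry per example in the batch.
    body = json.dumps({"instances": [image]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, something like `with urllib.request.urlopen(req) as resp: preds = json.load(resp)["predictions"]` would return one prediction per instance, assuming the server is reachable from where the code runs.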
