Data Science on Google Cloud Platform: Machine Learning
Level: Advanced | 6 Steps | 8 hours | 42 Credits
This is the second of two Quests of hands-on labs derived from the exercises in the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan, published by O'Reilly Media, Inc. In this second Quest, which covers chapter 9 through the end of the book, you extend the skills practiced in the first Quest and run full-fledged machine learning jobs on real-world datasets with state-of-the-art tools, all using Google Cloud Platform services.
Prerequisites
This Quest assumes two prerequisites. 1) You have access to the O'Reilly book Data Science on the Google Cloud Platform; the labs include only the exercises from the end of each chapter and do not contain the concepts or teaching from the text itself. 2) You have already completed the first Quest in this sequence, Data Science on the Google Cloud Platform, as well as all of the prerequisites for that Quest. Without these prerequisites, students will not have the skills or experience needed to succeed here.
In this lab, you will implement logistic regression using MLlib, the machine learning library for Apache Spark, on a Google Cloud Dataproc cluster to build a model from a multivariable dataset.
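The lab relies on Spark MLlib running across the Dataproc cluster. To show what logistic regression actually computes, here is a minimal single-machine sketch in plain Python that fits weights by batch gradient descent on the log-loss. The optimizer, learning rate, epoch count, and the toy two-feature data are assumptions for illustration; this is not MLlib's implementation, which uses its own optimizers (for example L-BFGS).

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of the positive class for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    rows: list of equal-length feature lists; labels: list of 0/1 ints.
    """
    n_features = len(rows[0])
    m = len(rows)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # The gradient of the log-loss with respect to the score is (p - y).
            err = predict_proba(w, b, x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical toy data: two scaled features; label 1 when both are small.
rows = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2], [0.8, 0.9], [0.9, 0.8], [1.0, 1.0]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(rows, labels)
print(predict_proba(w, b, [0.05, 0.05]) > 0.5)  # True
print(predict_proba(w, b, [0.95, 0.95]) < 0.5)  # True
```

The model's output is a probability rather than a hard label, so a decision threshold (0.5 here) is a separate choice that can be tuned to the cost of each kind of error.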
Deploy a Java application with Maven to process data on Cloud Dataflow. The application uses time-windowed aggregation to augment the raw data, so that the training and test datasets are produced consistently.
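Cloud Dataflow pipelines typically express this kind of feature with Apache Beam sliding time windows. The plain-Python sketch below only illustrates the underlying logic: for each event, it computes the average value for that event's key over the preceding window. The event layout (minutes since some origin, a key such as an airport code, and a numeric value such as a departure delay) and the 60-minute window are assumptions for illustration, not the lab's actual schema.

```python
from collections import defaultdict, deque


def windowed_average(events, window_minutes=60):
    """For each (timestamp_minutes, key, value) event, processed in time order,
    return (timestamp, key, average) over that key's events in
    (timestamp - window_minutes, timestamp], including the current event."""
    windows = defaultdict(deque)  # key -> deque of (timestamp, value) in the window
    sums = defaultdict(float)     # key -> running sum of values in the window
    results = []
    for t, key, value in sorted(events):
        q = windows[key]
        q.append((t, value))
        sums[key] += value
        # Evict events that have fallen out of the window for this key.
        while q and q[0][0] <= t - window_minutes:
            _, old_value = q.popleft()
            sums[key] -= old_value
        results.append((t, key, sums[key] / len(q)))
    return results


# Hypothetical events: (minutes, airport, departure delay in minutes).
events = [(0, "ORD", 10.0), (30, "ORD", 20.0), (90, "ORD", 30.0), (10, "SFO", 5.0)]
for row in windowed_average(events):
    print(row)
# (0, 'ORD', 10.0)
# (10, 'SFO', 5.0)
# (30, 'ORD', 15.0)
# (90, 'ORD', 30.0)
```

Because every record passes through the same function, rows destined for training and rows destined for testing get their aggregate features computed in exactly the same way.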
In this lab, you will learn how to use Google Cloud Machine Learning and TensorFlow to develop and evaluate predictive models.
Learn how to partition a dataset into two separate parts: a training set for developing a model and a test set for evaluating its accuracy. This split lets you evaluate predictive models independently and in a repeatable manner.
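One common way to make a split repeatable is to assign each record by hashing a stable key instead of drawing a fresh random number on every run. The sketch below assumes, hypothetically, that records carry a date string and that all records sharing a date should land on the same side, so that closely correlated records do not leak between the training and test sets. The 70/30 proportion and MD5 as the hash are illustrative choices, not necessarily what the lab uses.

```python
import hashlib


def hash_bucket(split_key: str, num_buckets: int = 100) -> int:
    """Map a key to a stable bucket in [0, num_buckets) using MD5."""
    digest = hashlib.md5(split_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def is_training_example(split_key: str, train_percent: int = 70) -> bool:
    """True if the record belongs to the training set.

    The same key always lands in the same set, so the split is repeatable
    across runs and machines (unlike Python's built-in hash() on strings,
    which is randomized per process).
    """
    return hash_bucket(split_key) < train_percent


def split_records(records, key_fn, train_percent=70):
    """Partition records into (train, test) lists using key_fn(record)."""
    train, test = [], []
    for record in records:
        target = train if is_training_example(key_fn(record), train_percent) else test
        target.append(record)
    return train, test


# Hypothetical records keyed by flight date.
flights = [{"date": f"2015-01-{d:02d}", "delay": d} for d in range(1, 32)]
train, test = split_records(flights, key_fn=lambda r: r["date"])
print(len(train) + len(test))  # 31
```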
Using a Cloud Dataproc cluster (managed Hadoop), you will analyse a dataset with Bayes classification.
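One simple Bayes-style classifier discretizes the input features into bins and estimates, for each bin, the empirical probability of the outcome by counting examples. On Dataproc that counting would run in Spark; the sketch below does it in plain Python on a few rows. The two features (distance and departure delay), the bin edges, the 0/1 on-time label, and the 0.7 decision threshold are hypothetical choices for illustration. This is a histogram estimate of P(label | bin), not naive Bayes.

```python
from bisect import bisect_right
from collections import defaultdict


def bin_index(value, edges):
    """Index of the bin that value falls into, given sorted bin edges."""
    return bisect_right(edges, value)


def fit_binned_bayes(rows, labels, edges_x, edges_y):
    """Estimate P(label == 1 | bin_x, bin_y) by counting examples per 2-D bin."""
    counts = defaultdict(lambda: [0, 0])  # (bin_x, bin_y) -> [total, positives]
    for (x, y), label in zip(rows, labels):
        cell = counts[(bin_index(x, edges_x), bin_index(y, edges_y))]
        cell[0] += 1
        cell[1] += label
    return {key: pos / total for key, (total, pos) in counts.items()}


def predict(model, x, y, edges_x, edges_y, threshold=0.7, default=False):
    """Predict the positive class when its estimated probability meets the threshold.

    Bins that never appeared in training fall back to `default`.
    """
    p = model.get((bin_index(x, edges_x), bin_index(y, edges_y)))
    return default if p is None else p >= threshold


# Hypothetical data: (distance, departure delay) -> 1 if the flight arrived on time.
edges_distance = [500.0, 1000.0]
edges_delay = [10.0, 30.0]
rows = [(300, 5), (400, 2), (350, 8), (300, 45), (1200, 40), (1500, 50)]
labels = [1, 1, 1, 0, 0, 0]
model = fit_binned_bayes(rows, labels, edges_distance, edges_delay)
print(predict(model, 320, 4, edges_distance, edges_delay))    # True
print(predict(model, 1300, 60, edges_distance, edges_delay))  # False
```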
In this lab, you will deploy a Google Cloud Dataproc cluster with Datalab pre-installed, then use Spark to quantize the data (group continuous values into discrete bins) to improve the accuracy of data modelling.
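Quantization here means replacing a continuous value with the index of the bin it falls into. Placing the bin edges at quantiles gives bins with roughly equal numbers of examples, so every bin has enough data for a reliable estimate. In Spark, `DataFrame.approxQuantile` can compute such edges at scale; the sketch below instead uses Python's `statistics.quantiles` (Python 3.8+) on a small in-memory list, and the choice of four equal-count bins is an assumption for illustration.

```python
import statistics
from bisect import bisect_right
from collections import Counter


def quantile_edges(values, num_bins):
    """Cut points that split values into num_bins groups of roughly equal size."""
    return statistics.quantiles(values, n=num_bins)


def quantize(value, edges):
    """Replace a continuous value with the index of its bin (0 .. len(edges))."""
    return bisect_right(edges, value)


# Hypothetical continuous feature, e.g. departure delays in minutes.
delays = list(range(1, 101))
edges = quantile_edges(delays, num_bins=4)
bins = [quantize(d, edges) for d in delays]
print(edges)                          # [25.25, 50.5, 75.75]
print(sorted(Counter(bins).items()))  # [(0, 25), (1, 25), (2, 25), (3, 25)]
```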