Machine Learning with Spark on Google Cloud Dataproc
In this lab you will implement logistic regression with a machine learning library for Apache Spark, running on a Google Cloud Dataproc cluster, to develop a model from a multivariable dataset.
Google Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simple, cost-efficient way. Cloud Dataproc integrates with other Google Cloud Platform (GCP) services, giving you a complete platform for data processing, analytics, and machine learning.
Apache Spark is an analytics engine for large-scale data processing. Logistic regression is available as a module in Apache Spark's machine learning library, MLlib. Spark MLlib, also called Spark ML, includes implementations of most standard machine learning algorithms, such as k-means clustering, random forests, alternating least squares, decision trees, and support vector machines. Spark can run on a Hadoop cluster, such as one managed by Google Cloud Dataproc, in order to process very large datasets in parallel.
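To make the idea of logistic regression concrete, here is a minimal sketch in plain Python. It is not how MLlib implements the algorithm and it does not use Spark at all; it only illustrates what the model fits: a weighted sum of the features passed through a sigmoid to give a probability. The toy data, the function names, and the learning rate are all made up for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, intercept, features):
    """Probability of the positive class for one feature vector."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def train(examples, learning_rate=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the log loss.

    examples is a list of (label, features) pairs with label 0 or 1.
    """
    n_features = len(examples[0][1])
    weights = [0.0] * n_features
    intercept = 0.0
    m = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for label, features in examples:
            # Gradient of the log loss with respect to the score is (p - y).
            error = predict_proba(weights, intercept, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        intercept -= learning_rate * grad_b / m
    return weights, intercept


# Made-up toy data: one feature, label 1 for small values and 0 for large ones.
toy = [(1, [0.0]), (1, [1.0]), (1, [2.0]), (0, [4.0]), (0, [5.0]), (0, [6.0])]
weights, intercept = train(toy)
```

After training, `predict_proba(weights, intercept, [0.0])` should be above 0.5 and `predict_proba(weights, intercept, [6.0])` below it. A library such as MLlib does the same kind of fitting, but distributes the work across the cluster and uses a more sophisticated optimizer.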
The base dataset provides historical information about internal flights in the United States, retrieved from the US Bureau of Transportation Statistics website. This dataset can be used to demonstrate a wide range of data science concepts and techniques and is used in all of the other labs in the Data Science on the Google Cloud Platform and Data Science on Google Cloud Platform: Machine Learning quests. In this lab the data is provided for you as a set of CSV-formatted text files.
In this lab, you will learn how to:
Prepare the Spark interactive shell on a Google Cloud Dataproc cluster.
Create a training dataset for machine learning using Spark.
Develop a logistic regression machine learning model using Spark.
Evaluate the predictive behavior of a machine learning model using Spark on Google Cloud Datalab.
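The steps listed above could look roughly like the following PySpark sketch, which uses the RDD-based `pyspark.mllib` API. This is an illustration under stated assumptions, not the lab's actual code: the column names (`DEP_DELAY`, `TAXI_OUT`, `DISTANCE`, `ARR_DELAY`), the 15-minute on-time cutoff, the 80/20 random split, and the helper names `parse_flight` and `train_and_evaluate` are all hypothetical, since the text here does not describe the CSV schema or how the training set is chosen.

```python
import csv

# Hypothetical column names; the lab text does not list the CSV schema.
FEATURE_COLUMNS = ["DEP_DELAY", "TAXI_OUT", "DISTANCE"]
LABEL_COLUMN = "ARR_DELAY"
ON_TIME_THRESHOLD_MIN = 15.0  # assumed cutoff: "on time" means under 15 minutes late


def parse_flight(line, header):
    """Turn one CSV line into (label, features), or None if it cannot be used."""
    values = next(csv.reader([line]))
    row = dict(zip(header, values))
    try:
        features = [float(row[name]) for name in FEATURE_COLUMNS]
        arrival_delay = float(row[LABEL_COLUMN])
    except (KeyError, ValueError):
        return None  # header line, blank field, or missing column
    label = 1.0 if arrival_delay < ON_TIME_THRESHOLD_MIN else 0.0
    return label, features


def train_and_evaluate(sc, csv_path, header, model_path):
    """Train on about 80% of the flights, measure accuracy on the rest, save the model."""
    # Imported lazily so parse_flight can be tried without Spark installed.
    from pyspark.mllib.classification import LogisticRegressionWithLBFGS
    from pyspark.mllib.regression import LabeledPoint

    examples = (
        sc.textFile(csv_path)
        .map(lambda line: parse_flight(line, header))
        .filter(lambda parsed: parsed is not None)
        .map(lambda parsed: LabeledPoint(parsed[0], parsed[1]))
    )
    train_set, test_set = examples.randomSplit([0.8, 0.2], seed=42)
    model = LogisticRegressionWithLBFGS.train(train_set, intercept=True)

    # With the default threshold in place, predict() returns 0 or 1.
    correct = test_set.map(
        lambda p: 1 if model.predict(p.features) == p.label else 0
    ).sum()
    total = test_set.count()
    accuracy = correct / total if total else float("nan")

    model.save(sc, model_path)  # for example, a gs:// path on Cloud Storage
    return model, accuracy
```

In the Spark interactive shell on the cluster, where a SparkContext is already available as `sc`, this might be called as `train_and_evaluate(sc, "gs://<bucket>/<path>/*.csv", header, "gs://<bucket>/<path>/model")`, with `header` being the list of column names and the bucket and paths being placeholders.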
Check that the Spark ML model files have been saved to Cloud Storage
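One way to perform this check from Python is to list the objects under the model's path and confirm that the listing is not empty. The sketch below assumes the `google-cloud-storage` client library is installed and that credentials are configured; the helper names `model_objects` and `list_bucket`, and any bucket or prefix you pass in, are hypothetical.

```python
def model_objects(object_names, model_prefix):
    """Return the listed object names that sit under the model's path prefix."""
    prefix = model_prefix.rstrip("/") + "/"
    return sorted(name for name in object_names if name.startswith(prefix))


def list_bucket(bucket_name, prefix):
    """List object names in a bucket (needs google-cloud-storage and credentials)."""
    from google.cloud import storage  # lazy import: only needed for the live check

    client = storage.Client()
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

For example, `model_objects(list_bucket("<bucket>", "<path>/model"), "<path>/model")` returns the saved files; an empty list suggests the model was not written to that location.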