Machine Learning with Spark on Google Cloud Dataproc





1 hour 30 minutes, 7 credits


Google Cloud Self-Paced Labs



In this lab you will learn how to implement logistic regression using a machine learning library for Apache Spark running on a Google Cloud Dataproc cluster to develop a model for data from a multivariable dataset.

Google Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simple, cost-efficient way. Cloud Dataproc integrates with other Google Cloud services, giving you a complete platform for data processing, analytics, and machine learning.

Apache Spark is an analytics engine for large-scale data processing. Logistic regression is available as a module in Apache Spark's machine learning library, MLlib. Spark MLlib, also called Spark ML, includes implementations of most standard machine learning algorithms, such as k-means clustering, random forests, alternating least squares, decision trees, and support vector machines. Spark can run on a Hadoop cluster, such as one managed by Google Cloud Dataproc, in order to process very large datasets in parallel.
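To make concrete what a logistic regression model computes, here is a minimal single-machine sketch in plain Python. It is not Spark MLlib's implementation and not the lab's code: the function names, the toy data, the learning rate, and the use of plain batch gradient descent are all assumptions chosen for illustration. MLlib fits the same kind of model, but with its own optimizers and distributed across the cluster.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y  # dLoss/dScore
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy, made-up data: one feature, class 1 when the feature is positive.
rows = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(rows, labels)
```

After training, `predict_proba(weights, bias, [2.0])` is above 0.5 and `predict_proba(weights, bias, [-2.0])` is below it, which is the decision rule a logistic regression classifier applies.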

The base data set provides historical information about domestic flights in the United States, retrieved from the US Bureau of Transportation Statistics website. This data set can be used to demonstrate a wide range of data science concepts and techniques, and it is used in all of the other labs in the Data Science on the Google Cloud and Data Science on Google Cloud: Machine Learning quests. In this lab the data is provided for you as a set of CSV-formatted text files.


In this lab, you will learn how to:

  • Prepare the Spark interactive shell on a Google Cloud Dataproc cluster.

  • Create a training dataset for machine learning using Spark.

  • Develop a logistic regression machine learning model using Spark.

  • Evaluate the predictive behavior of a machine learning model using Spark on Google Cloud Datalab.
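The objectives above can be sketched end to end with PySpark's DataFrame-based ML API. This is a hedged outline, not the lab's actual code: the four-column CSV layout (departure delay, taxi-out time, distance, arrival delay), the 15-minute on-time threshold, the helper names `parse_flight` and `train_and_save`, and the paths passed in are all assumptions made for illustration. It also assumes the CSV files have no header row.

```python
def parse_flight(line):
    """Turn one CSV line into (label, features).

    Hypothetical layout: dep_delay,taxi_out,distance,arr_delay
    """
    dep_delay, taxi_out, distance, arr_delay = (float(v) for v in line.split(","))
    # Assumed labeling rule: a flight is "on time" if it arrives < 15 min late.
    label = 1.0 if arr_delay < 15 else 0.0
    return label, [dep_delay, taxi_out, distance]


def train_and_save(spark, csv_path, model_path):
    """Build a training set, fit logistic regression, evaluate, and save."""
    # Imported lazily so parse_flight can be used without Spark installed.
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.linalg import Vectors

    # 1. Create the training dataset from CSV text files.
    examples = (
        spark.sparkContext.textFile(csv_path)
        .map(parse_flight)
        .map(lambda lf: (lf[0], Vectors.dense(lf[1])))
    )
    train_df = spark.createDataFrame(examples, ["label", "features"])

    # 2. Fit the model (default column names are "label" and "features").
    model = LogisticRegression(maxIter=20).fit(train_df)

    # 3. Evaluate; the evaluator's default metric is area under the ROC curve.
    #    Scoring the training data only checks fit, not generalization.
    auc = BinaryClassificationEvaluator().evaluate(model.transform(train_df))

    # 4. Save the model files, for example to a Cloud Storage path.
    model.write().overwrite().save(model_path)
    return model, auc
```

On a Dataproc cluster, `spark` would be the `SparkSession` provided by the interactive shell, and `csv_path` and `model_path` would typically be `gs://` locations in a bucket you control.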
