
Processing Data with Google Cloud Dataflow


Checkpoints


Create a BigQuery Dataset

Copy the airport geolocation file to your Cloud Storage bucket

Process the Data using Cloud Dataflow (submit Dataflow job)


1 hour 15 minutes, 7 credits

GSP198

Google Cloud Self-Paced Labs

Overview

In this lab you will simulate a real-time, real-world data set from a historical data set. The simulated data set will be processed from a set of text files using Python and Google Cloud Dataflow, and the resulting simulated real-time data will be stored in BigQuery. You will then use BigQuery to analyse some features of the real-time data set.
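One way to turn a historical data set into a simulated real-time feed is to replay its records in timestamp order, compressing the gaps between them by a speed-up factor. The sketch below is plain Python, not the lab's own code; the `Event` record, the `replay_schedule` function, and the `speed_up` parameter are hypothetical names used only for illustration. It only computes when each event would be emitted; a real simulator would also sleep until that moment and publish the event.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    # Hypothetical event record: when it happened and a free-form payload.
    event_time: datetime
    payload: str


def replay_schedule(events, speed_up):
    """Return (wall_clock_offset_seconds, event) pairs for replaying history.

    Events are sorted by their historical timestamp; each one is scheduled
    at (event_time - first_event_time) / speed_up seconds after the replay
    starts, so a speed_up of 60 replays one hour of history in one minute.
    """
    if speed_up <= 0:
        raise ValueError("speed_up must be positive")
    ordered = sorted(events, key=lambda e: e.event_time)
    if not ordered:
        return []
    start = ordered[0].event_time
    return [
        ((e.event_time - start).total_seconds() / speed_up, e)
        for e in ordered
    ]


if __name__ == "__main__":
    # Illustrative, made-up events (not taken from the lab's data set).
    t0 = datetime(2015, 1, 1, 6, 0)
    history = [
        Event(t0 + timedelta(minutes=30), "flight B departed"),
        Event(t0, "flight A departed"),
        Event(t0 + timedelta(hours=1), "flight A arrived"),
    ]
    for offset, event in replay_schedule(history, speed_up=60):
        print(f"{offset:5.1f}s  {event.payload}")
```

Sorting first matters because historical files are not guaranteed to be in event-time order, and a replay that emits events out of order would not resemble a live feed.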

Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes via Java and Python APIs with the Apache Beam SDK. Cloud Dataflow provides a serverless architecture that can shard and process very large batch data sets, or high-volume live data streams, in parallel.
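Dataflow parallelizes work by splitting the input into shards, processing each shard independently, and merging the partial results. The sketch below imitates that shard, process, merge pattern on a single machine with Python's standard `concurrent.futures`; it is not Apache Beam code, threads merely stand in for Dataflow workers, and the function names and the (airport, flight) record shape are hypothetical. In a Beam pipeline you would express the same idea with transforms such as `beam.Map` and `beam.CombinePerKey` and let Dataflow choose the sharding.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shard(records, num_shards):
    """Split records into num_shards round-robin shards."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return [records[i::num_shards] for i in range(num_shards)]


def count_per_key(records):
    """Per-shard work: count how many records carry each key."""
    return Counter(key for key, _ in records)


def parallel_count(records, num_shards=4):
    """Count records per key by processing shards concurrently, then merging."""
    shards = shard(records, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(count_per_key, shards))
    total = Counter()
    for partial in partials:
        # Adding counts is associative and commutative, so the merge
        # order does not affect the result.
        total.update(partial)
    return total


if __name__ == "__main__":
    flights = [("ORD", "AA101"), ("SFO", "UA7"), ("ORD", "UA88"), ("JFK", "DL4")]
    print(parallel_count(flights, num_shards=2))
```

The merge step only works because counting is associative; that same property is what lets a combiner in Beam compute partial results on many workers before combining them.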

BigQuery is a RESTful web service that enables interactive analysis of very large data sets and works in conjunction with Google Cloud Storage.
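BigQuery is queried with SQL, so interactive analysis typically means running aggregate queries over a table. The sketch below uses Python's built-in `sqlite3` module as a local stand-in so it runs without a Google Cloud project; the `flights` table, its columns, and the sample rows are hypothetical and do not reflect the lab's schema or data. The same `GROUP BY` query shape carries over to BigQuery, although table references and some functions differ between the two SQL dialects.

```python
import sqlite3

# In-memory SQLite database standing in for a BigQuery dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (origin TEXT, dest TEXT, dep_delay REAL)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [
        ("ORD", "SFO", 12.0),
        ("ORD", "JFK", -3.0),
        ("SFO", "ORD", 25.0),
    ],
)

# Aggregate query of the kind you might run interactively in BigQuery:
# number of flights and average departure delay per origin airport.
QUERY = """
SELECT origin, COUNT(*) AS num_flights, AVG(dep_delay) AS avg_dep_delay
FROM flights
GROUP BY origin
ORDER BY origin
"""

for origin, num_flights, avg_delay in conn.execute(QUERY):
    print(origin, num_flights, avg_delay)
```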

The data set used in this lab provides historical information about domestic flights in the United States, retrieved from the website of the US Bureau of Transportation Statistics. It can be used to demonstrate a wide range of data science concepts and techniques, and it is used in all of the other labs in the Data Science on Google Cloud Platform quest.
