Run a Big Data Text Processing Pipeline in Cloud Dataflow
Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Because Dataflow is a managed service, it can allocate resources on demand to minimize latency while maintaining high utilization efficiency.
The Dataflow model combines batch and stream processing so developers don't have to make tradeoffs between correctness, cost, and processing time. In this lab, you'll learn how to run a Dataflow pipeline that counts the occurrences of unique words in a text file.
What you'll learn
- How to create a Maven project with the Cloud Dataflow SDK
- Run an example pipeline using the Google Cloud Platform Console
- How to delete the associated Cloud Storage bucket and its contents
이 실습의 나머지 부분과 기타 사항에 대해 알아보려면 Qwiklabs에 가입하세요.
- Google Cloud Console에 대한 임시 액세스 권한을 얻습니다.
- 초급부터 고급 수준까지 200여 개의 실습이 준비되어 있습니다.
- 자신의 학습 속도에 맞춰 학습할 수 있도록 적은 분량으로 나누어져 있습니다.
Create a new Cloud Storage bucket
Run a text processing pipeline on Cloud Dataflow